
Writing a bot from scratch
 
 
  [ # 16 ]
Erwin Van Lun - Jun 4, 2010:

@Richard:
ps I’ve split your private contest idea to another thread, it’s a separate discussion.

I have basically no objections if you want to recycle this idea to the general public, but you could have at least quoted me.

But then again… My offer here was specifically addressed to Victor as I see no point in slaughtering technologically inferior bots. wink So hopefully he will find this proposal there. @Victor: Will you? grin


Richard

 

 
  [ # 17 ]

Oh, okay, sorry, I didn’t understand you well.

I actually asked for technical assistance to put the new thread under your name, as that is not a standard functionality (didn’t expect you would notice it that fast :-s ).

Now I also see what you mean: a private contest, so between two bots?

What do you want me to do?

 

 
  [ # 18 ]

No worries, Erwin; I have clarified my idea in the new “contest-thread”, so it should be ready if any two botmasters would like to have a dance.

As far as I’m concerned, no additional action is needed.

Richard

 

 
  [ # 19 ]

OK, this is the new URL for those who’d like to know: http://www.chatbots.org/ai_zone/viewthread/140/ and it’s now under your name grin. It’s already turning into a dialogue between Richard and Richard grin

 

 
  [ # 20 ]
Richard Jelinek - Jun 4, 2010:

Well, your posting was of course just Perl advocacy. wink
Which is OK, but when it is done, it should be done right.

Yes, Perl advocacy, and good enough for a quick sample; it did not need the entire set of ‘business logic’ rules for postal code validation.

Richard Jelinek - Jun 4, 2010:

It is hard to explain/make clear the merits of a programming language if your examples show no clear benefit compared to other languages.

They can compare it to how they would do it in the languages they know, by themselves.

Richard Jelinek - Jun 4, 2010:

because from the example provided, the machine that could solve this (as the general case - not hardcoded) would not only show true machine intelligence, but also true machine clairvoyance.

Why? If the machine can truly understand natural language, it could respond this way.

Richard Jelinek - Jun 4, 2010:

Because I do. OTOH, I could now start picking on that and show how full of imprecision, contradictions and ambiguities it is.

Absolutely! In case you haven’t noticed, human language is absolutely riddled with “imprecision, contradictions and ambiguities”!  Real intelligence in a chatbot would be the ability to ask clarifying questions and interactively resolve and understand.  This is the approach I’m taking with my bot.

Richard Jelinek - Jun 4, 2010:

Which brings us to the topic of knowledge representation. What do you use if I might ask?
Because from what I’ve read so far, you probably (hopefully) have left the evolutionary
step of predicate-logic behind already.

  My bot’s knowledge base of facts will be stored as natural language statements compiled as parse trees.

Erwin - yes, mathematical equations will also be strings.  Which brings me to a new topic which I won’t introduce here.  My bot will use ‘two levels of logic’—execution logic and symbolic logic.
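To give a rough flavour of the parse-tree idea, here is a minimal Perl sketch of what one stored fact could look like; the field names (type, head, children, text) are only illustrative, not my actual format:

use strict;
use warnings;

# Illustrative parse tree for the fact "John's phone number is 123-4567".
# The field names below are placeholders, not a final representation.
my $fact = {
    type     => 'statement',
    text     => "John's phone number is 123-4567",   # original wording is kept
    children => [
        { type => 'np', head => 'number',
          children => [ { type => 'possessive', head => 'John'  },
                        { type => 'noun',       head => 'phone' } ] },
        { type => 'vp', head => 'is',
          children => [ { type => 'value', head => '123-4567' } ] },
    ],
};

print $fact->{text}, "\n";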

 

 
  [ # 21 ]
Victor Shulist - Jun 4, 2010:
Richard Jelinek - Jun 4, 2010:

... machine clairvoyance.

Why? If the machine can truly understand natural language, it could respond this way.

Let’s look at it forensically:

User: A valid postal code is : letter, number, letter, number, letter, number
Computer: OK
User: is K0J1B0 a valid postal code?
Computer: Yes

IMHO No. According to what you have stated, the computer would have to accept e.g.

Ä,123,ξ,01010101,æ,-0.005

as a valid postal code, but never K0J1B0

letter? Sure. ฬ is a nice letter. No one said a Latin letter without diacritics.
number? Yes. 123 is a number. So is FF. Or even MCMIX. Negative or positive? Or did you mean digit? Decimal digit? Arabic-Indic ones hopefully.
What - if not clairvoyance - should prevent the computer from expecting commas in a valid postal code if they are in your definition?
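Just to show how far apart the readings are, here is a tiny Perl sketch of my own, purely for illustration, contrasting the strict reading most people silently assume with a perfectly literal reading of “letter” and “number”:

use strict;
use warnings;
use utf8;                                   # the source contains non-ASCII literals
binmode STDOUT, ':encoding(UTF-8)';

# Two readings of "letter, number, letter, number, letter, number":
my $strict  = qr/^[A-Za-z][0-9][A-Za-z][0-9][A-Za-z][0-9]$/;  # ASCII letters, decimal digits
my $literal = qr/^\p{L}\p{N}\p{L}\p{N}\p{L}\p{N}$/;           # any Unicode letter / number

for my $code ('K0J1B0', 'ฬ๕Ä٣æ¼') {
    printf "%-8s strict=%-3s literal=%s\n", $code,
        ($code =~ $strict  ? 'yes' : 'no'),
        ($code =~ $literal ? 'yes' : 'no');
}
# K0J1B0   strict=yes literal=yes
# ฬ๕Ä٣æ¼   strict=no  literal=yes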

Absolutely! In case you haven’t noticed, human language is absolutely riddled with “imprecision, contradictions and ambiguities”!  Real intelligence in a chatbot would be the ability to ask clarifying questions and interactively resolve and understand.  This is the approach I’m taking with my bot.

And this is a good approach. I’d love to see it evolving.

My bot’s knowledge base of facts will be stored as natural language statements compiled as parse trees.

That is interesting. With this, every fact also carries its contextual information, which helps with disambiguation. If done right, this could even be a large chunk of a formidable machine translation solution. I hope your parse trees are not hardcoded to English (or at least tagged, so you could have e.g. English and German parse trees living side by side in the same repository).

Richard

 

 
  [ # 22 ]

Ä,123,ξ,01010101,æ,-0.005 as a postal code at first? Sure, OK.  That is fine.  Then, again, through interactive conversation, you would say: “no, it must only be 6 characters in length.”

Now you are assuming the system knows about hexadecimal digits, and assuming it knows about roman numerals.  It won’t know these things at first.

My approach is a “child machine” (see Turing’s writings).  It will start with some base amount of knowledge.  Yes, making incorrect assumptions and errors is quite alright.  By a process of positive and negative feedback, it will continue to refine its internal rules.  In other words, I will allow it to make mistakes, take guesses, and get feedback - true learning.

Yes, so far the parse trees are a combination of English and internal “key word identifiers”, but all the original English words are stored also, so it knows the original text.

I think this approach is promising.  For example I am right now able to do some “fuzzy” matching involving parse trees.  I can tell the bot “John’s phone number is 123-4567”.  And later ask “What’s John’s cell phone number?” 
It will then ask “I’m not sure if it is his *cell* phone number, but the only number I have of John’s is 123-4567”

Or, I can say “John’s cell phone number is 123-4567” and ask “What’s John’s number?”  it will then say, “Do you mean *cell* phone number?”

And yes, it puts in the asterisks around the words it is emphasizing.
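Stripped way down to flat attribute lists instead of full parse trees, the matching looks roughly like this sketch (every name below is only illustrative):

use strict;
use warnings;

# A stored fact and an incoming question, reduced to flat attributes for illustration.
my %fact  = ( owner => 'John', modifiers => [],       value => '123-4567' );
my %query = ( owner => 'John', modifiers => ['cell'] );

# Which modifiers does the question use that the stored fact never mentioned?
my %known   = map { $_ => 1 } @{ $fact{modifiers} };
my @missing = grep { !$known{$_} } @{ $query{modifiers} };

if ($fact{owner} eq $query{owner}) {
    if (@missing) {
        my $emph = join ' ', map { "*$_*" } @missing;   # emphasize the unmatched words
        print "I'm not sure if it is his $emph phone number, ",
              "but the only number I have of John's is $fact{value}\n";
    }
    else {
        print "$fact{value}\n";
    }
}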

Now these are very simple examples, but it is a start, and I believe this kind of “fuzzy” matching is the path to take.  Also, making assumptions, being wrong, and then having to be told more information to clarify (like “no, the postal code must be only 6 characters long.”) is a small price to pay.

One of the things I have realized is there doesn’t seem to be any knowledge that is absolute.  There is always some way that you have to further clarify.

Your examples above are excellent.  Having all that information “hand coded” or manually entered will never give rise to intelligence - it must grow, step by step, make assumptions, ask clarifying questions.  That is where the intelligence is, not in sitting and coding all possible rules and making it perfect from the start, because someone will come along and say “well… what about….”.

Information always changes as well.  If you coded in your program a few years ago that there were 9 planets, it was true: Pluto was a planet a few years back; now it is not considered a planet because new criteria were defined for what counts as a planet.

Right now my bot is only aware of the alphabet, the digits 0 to 9, and +/-.  When you type in just a number like 45 or -45.1, it knows to think of that as a number.

So it won’t think that a postal code could contain Chinese characters for example.  Also, it does not know hexadecimal yet.  Later it will, and if it makes the assumption that K0J FBF is a postal code because it considered F a digit, I will then provide a clarifying fact that it must be a decimal digit.  Now… the ‘fun’ part is going to be writing code that takes that English fact (or whatever language), coded, as I say, as a parse tree, integrates it into the existing knowledge base parse trees, and updates them.  The updated parse tree will then be the data source for other code which translates it into Perl code (or whatever language) that actually does the validation.
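Just as a rough picture of the end product (hand-written here, not actually generated by my bot), the Perl that falls out of those accumulated rules might be as small as this:

use strict;
use warnings;

# Rules accumulated through clarifying conversation:
#   1. letter, digit, letter, digit, letter, digit
#   2. decimal digits only (so 'F' does not count as a digit)
#   3. exactly 6 characters long
sub is_valid_postal_code {
    my ($code) = @_;
    return $code =~ /^[A-Za-z][0-9][A-Za-z][0-9][A-Za-z][0-9]$/ ? 1 : 0;
}

for my $code ('K0J1B0', 'K0JFBF', 'K0J 1B0') {
    printf "%-8s %s\n", $code, is_valid_postal_code($code) ? 'valid' : 'invalid';
}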

One last thought - it is very interesting that we expect a machine to be 100% correct from the start, yet we think nothing of teaching humans, letting them make errors and false assumptions, and there is no question that humans are intelligent!!

There is almost nothing that is absolute.  I say almost, otherwise that would be absolute.

 

 
  [ # 23 ]
Victor Shulist - Jun 4, 2010:

Ä,123,ξ,01010101,æ,-0.005 as a postal code at first? Sure, OK.  That is fine.  Then, again, through interactive conversation, you would say: “no, it must only be 6 characters in length.”

Now you are assuming the system knows about hexadecimal digits, and assuming it knows about roman numerals.  It won’t know these things at first.

But it will know the semantics of the word “it”? In your clarification example - “no, *it* must only be 6 characters in length.” - you are assuming the system is able to perform anaphora resolution. Now that’s a big fish. Compared to this, knowledge about numeral systems is easy - I think.

My point was to clarify that IF you would like your bot to be able to learn interactively BY NATURAL LANGUAGE, it must already know a hell of a lot of information about the language, the vocabulary, its meanings (semantics) and so on.

Hence my pedantic ride on numbers and digits.

For example I am right now able to do some “fuzzy” matching involving parse trees.  I can tell the bot “John’s phone number is 123-4567”.  And later ask “What’s John’s cell phone number?”  It will then ask “I’m not sure if it is his *cell* phone number, but the only number I have of John’s is 123-4567”
Or, I can say “John’s cell phone number is 123-4567” and ask “What’s John’s number?”  it will then say, “Do you mean *cell* phone number?”

Now that I call a skill - worthy of contesting. wink


And yes, it puts in the asterisks around the words it is emphasizing.

One of the things I have realized is there doesn’t seem to be any knowledge that is absolute.

Mathematical axioms are quite “atomic”. Behaviour of logical operators is also relatively absolute. grin

Information always changes as well.  If you coded in your program a few years ago that there were 9 planets, it was true: Pluto was a planet a few years back; now it is not considered a planet because new criteria were defined for what counts as a planet.

Yes, astronomy is my favorite exercise field when teaching the bot. And as the new status of Pluto as a dwarf planet was decided here in Prague, it was of course a must to get this info right. Not that easy, I might add:

A dwarf planet, as defined by the International Astronomical Union (IAU), is a celestial body orbiting the Sun that is massive enough to be rounded by its own gravity but has not cleared its neighbouring region of planetesimals and is not a satellite.

Uh… gravity, satellite, orbiting, region… every single one of these words has a definition not much easier than that of a “dwarf planet”. As soon as you start teaching your bot like that…

And it gets worse. Information changes, as you stated correctly. But it is not enough to only adapt. You basically have to keep old knowledge like “Until 2006 Pluto was considered a planet…”, or be able to answer “How many planets did our solar system have in 2005?”

So it won’t think that a postal code could contain Chinese characters for example.

Actually - my postal code is 90765.


Richard

 

 
  [ # 24 ]
Richard Jelinek - Jun 4, 2010:

My point was to clarify that IF you would like your bot to be able to learn interactively BY NATURAL LANGUAGE, it must already know a hell of a lot of information about the language, the vocabulary, its meanings (semantics) and so on.

NO KIDDING!!  Absolutely true; I have all kinds of data and logic in it right now so that it can understand, for example,

“While I was in Africa, I shot an elephant in my pajamas”.

and it knows that “in my pajamas” applies to “I” and not the elephant, based on the safe assumption that since an elephant is an animal, and animals USUALLY don’t wear clothes, while “I” is a person and people do wear pajamas.
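A toy Perl sketch of that attachment preference, with made-up feature tables standing in for the real knowledge base:

use strict;
use warnings;

# Toy semantic facts: which kinds of things usually wear clothes.
my %wears_clothes = ( person => 1, elephant => 0 );
my %kind_of       = ( 'I' => 'person', 'elephant' => 'elephant' );

# Candidate attachment points for "in my pajamas".
my @candidates = ('I', 'elephant');

# Prefer the candidate that can plausibly wear pajamas.
my ($best) = sort {
    $wears_clothes{ $kind_of{$b} } <=> $wears_clothes{ $kind_of{$a} }
} @candidates;

print qq{"in my pajamas" attaches to: $best\n};   # prints: I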

For example I am right now able to do some “fuzzy” matching involving parse trees.  I can tell the bot “John’s phone number is 123-4567”.  And later ask “What’s John’s cell phone number?”  It will then ask “I’m not sure if it is his *cell* phone number, but the only number I have of John’s is 123-4567”
Or, I can say “John’s cell phone number is 123-4567” and ask “What’s John’s number?”  it will then say, “Do you mean *cell* phone number?”

Now that I call a skill - worthy of contesting. wink

Yes, but I want to generalize it a lot more first.

One of the things I have realized is there doesn’t seem to be any knowledge that is absolute.

Mathematical axioms are quite “atomic”. Behaviour of logical operators is also relatively absolute. grin

Nope. What is 5 + 5?  Answer: 10… wrong, maybe I wanted it in hex; that would be A.
What is 5 + 5?  Answer: it’s a number… that’s a correct answer.
What is 5 + 5?  Answer: it is a mathematical expression.

You see, you STILL need to clarify what you mean by the question “What is 5 + 5?”
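Two lines of Perl are enough to make the base point concrete:

use strict;
use warnings;

printf "decimal: %d\n", 5 + 5;   # prints 10
printf "hex:     %X\n", 5 + 5;   # prints A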

Even the axioms of Euclidean geometry break down when Einstein’s curved-space theory comes into play!!


*maybe this statement is true*

There are almost no absolutes - ‘almost’ because otherwise this statement would be absolute.

 

 
  [ # 25 ]

Hi,
Great discussion!! Allow me to ‘water down’ this thread.

I hope your parse trees are not hardcoded to English (or at least tagged, so you could have e.g. English and German parse trees living side by side in the same repository).

What we need is an internationally recognized and accepted symbolic representation of language. There are only two such systems that I am aware of: mathematics and pornography. ;D I’d better stick with numbers… or my wife might never allow me to finish.


Red dwarfs? Pluto? Hexadecimal? Absolutes?  My family ancestry consists of wonderful but uneducated people… just plain, simple, hard-working folks. In terms of absolutes, there is only 1 answer for each of these.
  * A red dwarf is a short person who spent too much time in the sun.
  * Pluto is an animated dog in Disney movies.
  * Witches curse… they only hear ‘hex’.
My only and most likely ‘insignificant’ point is that chatbots, even if they could truly carry on such a lofty and ‘academic’ conversation, still would not be able to communicate with many of my older family members… and/or many of today’s younger generation.  They just don’t talk that way!

Incidentally, I am grateful for threads like these. They’re terrific for teaching me new terms and concepts.  Keep it up!

Regards,
Chuck

 

 
  [ # 26 ]

Yes, absolutely, and that is the same point that Richard brought up.  Currently, that is my focus… I am retaining all the terms of English, but I am also tagging key words.  Now, since I am unilingual, those keywords are English words, but that doesn’t matter.

Keep in mind that not only do I need a “standard internal” set of “language-agnostic keywords”, but I also need a “standard language-agnostic” STRUCTURE for the encoded parse tree.

 

 
  [ # 27 ]

In America a woman has a baby every 15 minutes. Our job is to find that woman and stop her.

Using Victor’s patchup, we know women have babies.  Do we know that they don’t have them every fifteen minutes, or does that take some heavy computation to arrive at?

And once we get there, what do we do with the information?  That’s where I’m at.  Is it dialog management I need or goals in an agenda?

At least in Chuck’s world the bot has a place to do things.  I’m still looking for that higher drive, that motivation to explore and create, which I think the bot needs.  I’m about ready to just give the bot an imagination so it will have a desired state to pursue.  Only how do I know it won’t just jabber instead?

 

 
  [ # 28 ]

Again, this issue relates to assumptions and the power of natural language to specify details.

        “In America a woman has a baby every 15 minutes.”

You are assuming that the chatbot would assume it is a single, specific woman.  It could mean a single woman, or it could mean any woman.

The adjective modifier ‘a’ does not necessarily mean that.  It would if the sentence were

        “In America the woman has a baby every 15 minutes.”

And this goes back to what I was stating earlier about interactive conversation and clarifications.  A chatbot would take in a clarifying statement such as “It is not the same woman.”  Yes, this is an anaphora, and the system must be able to deal with it.  The system must maintain 2 or more theories on what the correct interpretation of the modifier ‘a’ applied to ‘woman’ is, until it gets this clarifying statement.

Also, it would have to correlate a relevant piece of information - it takes about 9 months for a woman to have a baby, and 9 months is considerably longer than 15 minutes.

Thus, the chatbot needs to be able to

a) generate all the possible ways to interpret “a woman”, and
b) evaluate each interpretation against known facts to rule it out.

At that point, there may still be more than one possible interpretation, or only one.  If only one remains, it would ‘believe’ it understands the meaning of the statement (see the sketch below).
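A very rough Perl sketch of that generate-and-filter step, with a toy rule standing in for the real reasoning over known facts (every name here is only illustrative):

use strict;
use warnings;

# Candidate readings of "a woman has a baby every 15 minutes".
my @interpretations = (
    { meaning => 'one specific woman gives birth every 15 minutes' },
    { meaning => 'somewhere, some woman gives birth every 15 minutes' },
);

# Known fact: a pregnancy takes about 9 months, far longer than 15 minutes,
# so the "one specific woman" reading can be ruled out.
sub consistent_with_facts {
    my ($i) = @_;
    return $i->{meaning} !~ /one specific woman/;   # toy rule, stands in for real deduction
}

my @surviving = grep { consistent_with_facts($_) } @interpretations;

if (@surviving == 1) {
    print "Understood: $surviving[0]{meaning}\n";
}
else {
    print "Still ambiguous; ask a clarifying question.\n";
}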

 

 
  [ # 29 ]

OK, I am wondering how a computer would determine from the input that we are talking about a woman giving birth.  Clarifying “a” and “the” seems very tedious too.

The point I meant to make is that the facts alone don’t answer the problem.  It takes some connections between the facts, including the grammar, in order to arrive at what we do almost automatically.  The bottom line here is: what do we understand from this input?  When would a program stop its analysis and declare it understands?  You say clarifying in the ongoing dialog, but won’t you come up against a situation that cannot be solved - you know, the halting problem?

 

 
  [ # 30 ]

The halting problem won’t be an issue:

# Roughly, in Perl; deduce() is a placeholder for "correlate these statements
# against the knowledge base @kb, append any new facts to @kb, and return them":

my @new_data;
my $new_data_generated = 1;
my $i = 0;

$new_data[0] = deduce($user_input, \@kb);      # user input + knowledge base >> New_Data(0)

while ($new_data_generated) {
    my $entries_before = scalar @kb;

    $new_data[$i + 1] = deduce($new_data[$i], $user_input, \@kb);   # New_Data(i) + user input + KB >> New_Data(i+1)
    $i++;

    $new_data_generated = 0 if $entries_before == scalar @kb;       # KB did not grow: no new deductions, stop
}

In other words, you keep looping, generating more deductions, until the number of statements that existed before your correlations is equal to the number after; no new facts were deduced, so you break out of the loop.

Gary Dubuque - Jun 6, 2010:

OK, I am wondering how a computer would determine from the input that we are talking about a woman giving birth.  Clarifying “a” and “the” seems very tedious too.

tedious - that is an understatement, but that is what it will take.

After it breaks out of the loop, it checks to see if it was able to rule out all possible interpretations except one.

If there is still more than one possible interpretation, it knows it doesn’t fully understand; otherwise it believes it understands the user input.

As for the question of how the chatbot would know we’re talking about a woman having a baby… I’m not going to try to write that in this textbox… it will be a while before I put together a document explaining my approach to that… if my chatbot algorithm works. I hope it does grin

 
