AI Zone: chatbots.org


Chatbots, AI, wildcards, random phrases and Pattern Matching

Posted: Dec 13, 2011

[ # 16 ]

Laura Patterson

Senior member

Total posts: 250

Joined: Oct 29, 2011

E-mail Laura

I have one last thing to say on this topic, although Andrew summed it up nicely.

There is nothing wrong with using pattern matching. As a matter of fact, it is a very useful strategy when all else fails. I use a variation of pattern matching along with wildcards and regex as a fail-safe if my parser and knowledge base fail to respond.

Just like any well-designed application, you have dependencies and you have dynamic libraries that you call upon within your routines. What you should always avoid is bottlenecking your runtime or dead-ending your data. My understanding of the goal in designing chatbots is to keep the user engaged. We do this by using a combination of tactics. IMO just about anything is fair game. Of course none of these methods are groundbreaking in terms of AI research, but they are effective nonetheless.

Posted: Dec 13, 2011

[ # 17 ]

Victor Shulist

Senior member

Total posts: 974

Joined: Oct 21, 2009

E-mail Victor

wow, i’m impressed, first “really good” thread on this site in a long time….
first off… sorry for any spelling mistakes, i’m not going to bother checking 100 times… too much to go through lol…

Steve
——-

Sure, I would say your bot is understanding. My take is - it doesn’t matter the design of your system, how it works, the algorithms, or if it is made of cogs and pulleys - I’m a functionalist, it is what it does that counts.

but consider….

I have a red truck.
or
I had a red truck.
Q:what do I have
A: a red truck
Q: what kind of truck do i have?
A: a red one

sure, that is cool, you are recursively applying regexes. First is to match

I have (*)
with
I have a (red truck)

then splitting apart “red truck” with something like “(red|blue|green|yellow) (truck|car)” ..or however you are doing it… but…

Fact: I had a red truck which I bought from my uncle Henry when I was going to college in the early 2000’s.

Question: what red truck did I buy from my uncle henry?

Question: what color of truck did I buy from my uncle henry in the early 2000’s?
Answer: red one

Question: what red truck did I buy from my uncle henry in the early 1980’s?
Answer: i’m not aware of any truck you bought from uncle henry in the early 1980’s.

Fact: I had a red truck which I bought from my uncle Henry when I was going to college in the early 2000’s.

Question: did I buy any vehicles from any of my relatives between 1985 and 2005 ?

So you are going to set “*” variable to be equal to “a red truck which I bought from my uncle Henry when I was going to college in the early 2000’s” ??

Then you will have to go *recursively* with regular expressions from there down. The complexity will be unreal—you may as well go with parse tree generation. I very much doubt using “any-depth-of-recursion” of applying regexes to that will be effective.

“In theory” (like communism ) it could work, but it would be astronomically difficult to manage with examples like above.

The main point is - sure, regexes could work, but you will have such an immensely complex set of nested arrangements of them that you are basically, at that point, doing the same as parse tree generation.. but in a very “nightmare” way.
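To make the two-stage matching above concrete, here is a minimal Python sketch. It is not Steve's actual implementation; the pattern names and the `extract` helper are hypothetical, and the colour and vehicle lists are just the ones from the example. It shows the short sentence passing both stages, and the long "uncle Henry" sentence failing at stage 2 because the wildcard has swallowed the whole trailing clause.

```python
import re

# Stage 1: the wildcard pattern "I have (*)" / "I had (*)".
STAGE1 = re.compile(r"^I (?:have|had) (.+?)\.?$", re.IGNORECASE)

# Stage 2: split the captured text into optional article, colour and vehicle.
STAGE2 = re.compile(
    r"^(?:a |an |the )?(red|blue|green|yellow) (truck|car)$", re.IGNORECASE
)


def extract(sentence):
    """Return (colour, vehicle) if both stages match, otherwise None."""
    first = STAGE1.match(sentence)
    if first is None:
        return None
    second = STAGE2.match(first.group(1))
    if second is None:
        return None
    return second.group(1).lower(), second.group(2).lower()


simple = "I have a red truck."
hard = ("I had a red truck which I bought from my uncle Henry "
        "when I was going to college in the early 2000's.")

print(extract(simple))  # both stages match
print(extract(hard))    # stage 1 matches, stage 2 does not
```

Handling the second sentence would need further nested patterns for the relative clause, the seller and the date, which is the growth in complexity the post is warning about.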

if all you’re doing is “I have a (*)”, where (*) is say one or more nouns with perhaps one or more adjectives, then it’s ok. But then i’d just have a GUI app with drop downs to interact with it. Click first drop down, it is “I have a”, or “I want a”, or “I bought a”... then second drop down with list of nouns as its direct object. Then perhaps a “+” to click and add a variable number of adjectives.

I *am* interested though, on how far you got with this approach and how it compares in functionality to my approach. I have to post another update but right now “Abel” (the new name) is able to do stuff like…

F: Bob went to a great party given by his former company
or
Q: Did Bob go to a celebration?
A: yes, if you mean *great* celebration.
Q: Did bob go to a celebration given by his great company?
A: yes, if you mean “former” company, and ignore “great” company.

meaning it is pointing out the differences in modifiers.

Abel can be controlled to the degree at which it considers “party” and “celebration” to be basically synonymous—context comes into play.
you can also have the information in the fact as the main clause or a subordinate clause….

F: Bob went to his closet and took out his new suit because he was going to a party given by his great company
Q: Did bob go to a party given by his rich company?
A: yes, if you mean ‘great’ and ignore ‘rich’

(side note: if you notice the same old information in my examples, it is because I am doing a huge amount of work on the framework (tree comparison etc), and not that much on world knowledge. When all the framework is done, then I’ll spend 6 months or so on world knowledge to feed Abel)

Posted: Dec 13, 2011

[ # 18 ]

Victor Shulist

Senior member

Total posts: 974

Joined: Oct 21, 2009

E-mail Victor

Laura
———

True, pattern matching isn’t completely out of a job. And I agree with you - in my design it may be used, but not as the primary means of extracting meaning. Sometimes (perhaps a lot) the bot may encounter someone using a noun as a verb. For example, I overheard someone at work here say “We’re not going visiting the family this xmas, so I set up the computer so the kids could webcam with each other”. It makes no sense grammatically to use a word like ‘webcam’, which is clearly only a noun, as a verb. It is human laziness that causes this, instead of saying “so the kids could use the webcam to see and talk with each other”. Now patterns can give hints to this - often we see an auxiliary verb like “could” followed by a verb, and then a phrase such as “each other”. This is still far from just using regexes though - we must understand “<noun> could <verb> <pre>”, so it is not fixed sets of text. We are detecting not patterns of text, but patterns of “sub parse trees”. So that is how I will use “pattern matching” (if it could still be called that).

So, personally, for me the “weights” that determine meaning are: #1 parse tree generation and semantic tagging, #2 pattern matching hints (which will allow the system to temporarily ‘break the rules’, such as using a noun as a verb), and #3 statistics. #2 and #3 will have about the same weight, but #1 will have the highest weight. What I mean by statistics is that the bot would consider the speaker. For example, teens these days have a different meaning for the word ‘sick’. If the word comes from someone under 19, it can mean either unhealthy or something else (which I haven’t figured out the meaning of yet… someone let me know what it really means please lol)... but if my mother says ‘sick’, it usually just means unhealthy.

Now, some may think.. well.. if grammar can’t always manage on its own.. just go with regexes all the way. I think that would be a huge, unwarranted leap. The reason being - people *do* use language in a sloppy way (nouns as verbs being the most common… ‘webcam with each other’).. *BUT*, for the most part, people do speak with proper grammar. Also, after you have “bent” a rule (and allowed that one word in the input to be used as a verb), you can re-evaluate. The bot can ask itself: “ok, if I were to allow this one word to be used as a verb (based on the “semantics of adjacent sub trees” pattern), would it *THEN* make grammatical sense?” In other words, grammar helps guide the bot to know when it has “converged” on a grammatically meaningful sentence.
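The "bend one rule, then re-check the grammar" idea can be illustrated with a toy Python sketch. This is not how Abel works internally (that is not described here); the lexicon, the tag names and the one-line "grammar" over tag sequences are all hypothetical stand-ins for a real parser. The point is only that the pattern is applied to part-of-speech tags rather than to raw text, and that the grammar check decides whether the coercion converged.

```python
import re

# Hypothetical toy lexicon: word -> part-of-speech tag.
LEXICON = {
    "the": "DET", "kids": "NOUN", "could": "AUX", "webcam": "NOUN",
    "talk": "VERB", "with": "PREP", "each": "PRON", "other": "PRON",
}

# Toy stand-in for a grammar: DET? NOUN AUX VERB (PREP PRON+)?
GRAMMATICAL = re.compile(r"^(DET )?NOUN AUX VERB( PREP( PRON)+)?$")


def is_grammatical(tags):
    """Check a tag sequence against the toy grammar."""
    return GRAMMATICAL.match(" ".join(tags)) is not None


def parse_with_coercion(sentence):
    """Return (tags, coerced_word); coerced_word is None if no rule was bent.

    Returns (None, None) when no reading is grammatical.
    Words missing from the toy lexicon raise KeyError.
    """
    words = sentence.lower().split()
    tags = [LEXICON[w] for w in words]
    if is_grammatical(tags):
        return tags, None
    # Hint pattern: a noun directly after an auxiliary may be acting as a verb.
    for i in range(1, len(tags)):
        if tags[i] == "NOUN" and tags[i - 1] == "AUX":
            trial = tags[:i] + ["VERB"] + tags[i + 1:]
            if is_grammatical(trial):  # re-evaluate after bending the rule
                return trial, words[i]
    return None, None


print(parse_with_coercion("the kids could talk with each other"))
print(parse_with_coercion("the kids could webcam with each other"))
```

In the second call the first pass fails, the noun after "could" is retagged as a verb, and the grammar check then succeeds, so "webcam" is reported as the coerced word.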

Dave
——

your comment that, in the human mind, “red truck” is a pattern similar to a regex .. hum… perhaps speaking EXTREMELY GENERALLY, that could be true. If it is, it is more of a pattern of semantically tagged sub-parse trees. I don’t think it is something like (yellow|blue|red|green)\s+(car|truck|bike)

Posted: Dec 13, 2011

[ # 19 ]

Andrew Smith

Senior member

Total posts: 473

Joined: Aug 28, 2010

E-mail Andrew

Welcome back Victor, great to hear from you!

How much progress have you made since you were here last?

Posted: Dec 13, 2011

[ # 20 ]

Dave Morton

Administrator

Total posts: 3111

Joined: Jun 14, 2010

E-mail Dave

Victor! Great to see you! Welcome back!

I’ll get to your comment about my “comment” in a moment. First:

Aw, shucks! Now I can’t remember what I was going to say! Drat!

Ok, back to Victor:

Actually, I only implied that the process used a pattern to match input to data to output. But since we’re limited to text, it’s difficult to convey concepts in any but the crudest of ways. Upon re-reading what I wrote, I can see how one would take my meaning as being a match of the text “red truck” (after all, I did place it in quotes), rather than the concept of a red truck. What I’m envisioning in my own brain is much higher up in the hierarchy that Andrew referred to than a simple RegEx. Sorry for the misunderstanding.

Posted: Dec 13, 2011

[ # 21 ]

Andrew Smith

Senior member

Total posts: 473

Joined: Aug 28, 2010

E-mail Andrew

Dave Morton - Dec 13, 2011:
Actually, I only implied that the process used a pattern to match input to data to output. But since we’re limited to text, it’s difficult to convey concepts in any but the crudest of ways. Upon re-reading what I wrote, I can see how one would take my meaning as being a match of the text “red truck” (after all, I did place it in quotes), rather than the concept of a red truck. What I’m envisioning in my own brain is much higher up in the hierarchy that Andrew referred to than a simple RegEx.

Ok I think I understand what you’re getting at. There’s a point where it’s no longer a question of syntax and it becomes a question of semantics. The kind of pattern that you’re talking about could just as easily be “red with six wheels and a flat tray on the back” as “red truck.”

I’ve got one of the classic books on the subject, “Conceptual Structures” by J. F. Sowa. One day when I’m less obsessed with syntax and parsing I’ll be able to read it and maybe understand some of it.

Posted: Dec 13, 2011

[ # 22 ]

Victor Shulist

Senior member

Total posts: 974

Joined: Oct 21, 2009

E-mail Victor

Thanks Andrew and Dave for the warm welcome back.

Apologies for not being on here much…. but I decided to spend a huge amount of time on my project. Since last time, the main focus was the ability to compare trees and define the “confidence formula”, which *currently* consists of x weights. So when Abel is trying to find a fact, f, to satisfy all the requirements of your question, q, it may have several facts, say f1…f(n), to choose from. But what if all those facts have a variable number of modifiers in them that the question didn’t have? So a good example is (to use Steve’s “red truck” example)......

fact1: Joe bought a big red truck with big tires.
fact2: Joe bought a truck.

if those were your facts, but your question was less complex, say “Did Joe buy a green truck?”, the system calculates the number of “sub tree deltas” between the fact and question parse trees. In this case, Abel would calculate a higher confidence for fact2 (because of the smaller number of “modifier deficiencies”), so it would use fact2 (it is closer to the question than fact1), because we simply have to state that it wasn’t a “green” truck, just “a” truck.

It turned out to be quite a job for it to handle these *on its own*. It has to calculate:

a) “fact deficiencies” (if any): this is when your question has more modifiers than your *closest fact*. Example: “Did Joe buy a big red truck?” when the closest fact is “Joe bought a truck”, so the *fact* is deficient by the modifiers ‘big’ and ‘red’ (which were in the question).

b) “question deficiencies”: the reverse.. the fact has more modifiers than the question…. Fact: “Joe bought a big green expensive truck”, Q: “Did Joe buy a truck?”
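A rough Python sketch of the two kinds of deficiency, using plain sets of modifiers instead of real parse trees. Everything here is an assumption for illustration: the `Statement` class, the `deltas` and `closest_fact` helpers, and above all the scoring, which simply counts deltas. Abel's actual confidence formula and its weights are not given in this thread, and articles such as "a" and "the" are ignored to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical flattened stand-in for a parse tree."""
    subject: str
    verb: str             # lemma, e.g. "buy"
    obj: str
    modifiers: frozenset  # modifiers attached to the object


def deltas(fact, question):
    """Return (fact_deficiencies, question_deficiencies) as sets."""
    fact_deficiencies = question.modifiers - fact.modifiers
    question_deficiencies = fact.modifiers - question.modifiers
    return fact_deficiencies, question_deficiencies


def closest_fact(facts, question):
    """Pick the matching fact with the fewest modifier deltas (assumed scoring)."""
    core = (question.subject, question.verb, question.obj)
    candidates = [f for f in facts if (f.subject, f.verb, f.obj) == core]
    if not candidates:
        return None

    def delta_count(fact):
        missing_in_fact, missing_in_question = deltas(fact, question)
        return len(missing_in_fact) + len(missing_in_question)

    return min(candidates, key=delta_count)


fact1 = Statement("Joe", "buy", "truck",
                  frozenset({"big", "red", "with big tires"}))
fact2 = Statement("Joe", "buy", "truck", frozenset())
question = Statement("Joe", "buy", "truck", frozenset({"green"}))

best = closest_fact([fact1, fact2], question)
print(best)                    # fact2, the fact with fewer deltas
print(deltas(best, question))  # the fact lacks "green"; the question lacks nothing
```

With the "green truck" question, fact1 has four deltas and fact2 has one, so fact2 is selected, which matches the choice described in the post.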

That was relatively simple, because you are only calculating “term modifier differences”. But Abel has a *start* on being able to handle cases where there is a variable number of what I call “sub tree modifiers” (STM)..... example…

“Joe bought the truck that his brother sold him”—here, an entire subordinate clause (“that his brother sold him”), is actually a modifier of truck in the parse tree. Then, the question can either have or not have that STM :

F: Joe bought the truck that his brother sold him
Q: Did Joe buy a big truck? (delta is: ‘the’ in fact, ‘a’ and ‘big’ in question, also fact has SC STM “that his brother sold him”)
OR, the question has a more complex parse….
Q: Did Joe buy a truck that his big brother sold him?

The system can actually compare the direct objects *OF* the STMs. An entire perl module was written just to be able to handle that—in the general case (that is , variable number of STMs and term modifiers).

things REALLY got cute when the fact has two STMs and question has zero, one or two….

The tree comparison routines are not finished yet, but I -do- have a considerable percentage completed (can’t give an estimate off the top of my head). For example, I don’t yet have it down to comparing adverbs….

Fact: Bob bought a very expensive car.
Question - Did bob buy a nice car

right now it would report that there is a diff .. that the fact has “expensive” but the question has “nice” .. to inform you of that difference, but if you asked

Did bob buy a very expensive car

it would , right now, report “Yes, that exactly matches my knowledge”. Later, the STM/Term Tree compare routines will go right down to the adverb level.

Things really got cute when I started on functionality (which is where I am right now)....

(note: this is only an example. It doesn’t ‘understand’ this statement yet, more world knowledge is needed, but once I provide that, the tree-compare library will handle finding the ‘delta’).....

fact- “Bob bought the movie that I liked but my wife hated”

when i provide the world knowledge inference rules so it can deduce that “that i liked” and “my wife hated” can possibly modify “movie”, then the following questions can be asked:

question - “Did Bob buy the movie that my dear wife hated?”

as you can imagine, it gets pretty tricky - the system has to first enumerate both the “term” modifiers (like ‘the’ and ‘dear’) and the STMs, and then iterate through the STMs of the question and the STMs of the fact to find a suitable match. Another thing achieved is that you can have not only *term synonyms* but also what I call ‘sub-tree-synonyms’... which means it can consider things like “that my wife hated” and “that my dear wife despised” to be synonymous.

Oh… and the very last “piece of the puzzle” was that I built a web GUI to help with adding new ‘grammar production knowledge’. A functionality I will have to add really soon is to hook up the web GUI to the command-line “add new word” utility (which takes in, say, a verb form, conjugates it, asks you to confirm, and then adds it to its lexicon.. same for adjectives and nouns). So it will be nice to add that to “Kned” (knowledge editor), the new GUI tool.

So…. yeah… pretty darn busy on the project !!!!!!!!!!!!! What about you guys?!

Posted: Dec 13, 2011

[ # 23 ]

Steve Worswick

Administrator

Total posts: 2048

Joined: Jun 25, 2010

E-mail Steve

Victor Shulist - Dec 13, 2011:

I *am* interested though, on how far you got with this approach and how it compares in functionality to my approach.

Hi Victor

I am not yet at the stage where it can work with sentences like, “I had a red truck which I bought from my uncle Henry when I was going to college in the early 2000’s.” and I agree that this is going to be pretty tricky using my approach (although not impossible).

I’m currently at the stage where it can pass the Loebner Prize qualifying type questions. Things like:

Human: Jim threw the ball to Bob.
Bot: Ok I will add that fact to my database.

Human: Who threw the ball?
Bot: Jim.

Human: Who did Jim throw the ball to?
Bot: Bob.

Human: What did Bob catch?
Bot: Bob was thrown a ball by Jim.
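For readers who want to see how far plain pattern matching gets on this kind of exchange, here is a small Python sketch. It is not Steve's bot; the `facts` list, the three patterns and the `respond` function are hypothetical and cover only the "threw the X to Y" sentence shape from the example.

```python
import re

# Hypothetical in-memory fact store: dicts with agent, object and recipient.
facts = []

STATEMENT = re.compile(r"^(\w+) threw the (\w+) to (\w+)\.?$", re.IGNORECASE)
WHO_THREW = re.compile(r"^Who threw the (\w+)\?$", re.IGNORECASE)
THREW_TO = re.compile(r"^Who did (\w+) throw the (\w+) to\?$", re.IGNORECASE)


def respond(line):
    """Store a matching statement, or answer a matching question."""
    m = STATEMENT.match(line)
    if m:
        facts.append(
            {"agent": m.group(1), "object": m.group(2), "recipient": m.group(3)}
        )
        return "Ok I will add that fact to my database."

    m = WHO_THREW.match(line)
    if m:
        for fact in facts:
            if fact["object"].lower() == m.group(1).lower():
                return fact["agent"] + "."

    m = THREW_TO.match(line)
    if m:
        for fact in facts:
            same_agent = fact["agent"].lower() == m.group(1).lower()
            same_object = fact["object"].lower() == m.group(2).lower()
            if same_agent and same_object:
                return fact["recipient"] + "."

    return "I don't know."


for line in ["Jim threw the ball to Bob.",
             "Who threw the ball?",
             "Who did Jim throw the ball to?",
             "What did Bob catch?"]:
    print("Human:", line)
    print("Bot:", respond(line))
```

The last question falls through to "I don't know." in this sketch, because answering it needs either another pattern or an inference from "thrown to" to "catch", which is where the knowledge behind the patterns starts to matter.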

My current target is to have my bot in the finals of the Loebner Prize next year (and as a bonus, to also win the bronze medal). I think I am currently on track but time will tell. My mistake last year was not realising that the qualifying questions remain pretty much the same style with just a few words changed. eg:

2010: The ball was hit by Bill. What did Bill hit?
2011: The football was kicked by Fred. Who kicked the football?

2010: Are you a human or a computer?
2011: Are you a human or a computer?

I assumed that since this style of questioning had already been asked, the organisers would ask something different. Knowing my luck, it will all change next year!

My take is - it doesn’t matter the design of your system, how it works, the algorithms, or if it is made of cogs and pulleys - I’m a functionalist, it is what it does that counts.

I couldn’t agree more. To me, it is all about the end result, whether this be with NLP or pattern matching or any other method.

Nice to see you back again.

Posted: Dec 14, 2011

[ # 24 ]

Laura Patterson

Senior member

Total posts: 250

Joined: Oct 29, 2011

E-mail Laura

So the Loebner Prize is mainly judged on the memory retention and logical recognition of the judge’s input?
Do they also test the bot’s knowledge base? If so, are the bots allowed to use the Internet to access data other than their host database?

Just curious, but depending on the progress with My Marie, I may be up for it.

Posted: Dec 14, 2011

[ # 25 ]

Victor Shulist

Senior member

Total posts: 974

Joined: Oct 21, 2009

E-mail Victor

As far as I can recall, internet access is strictly forbidden; the main reason, of course, being that a human could be at the other end of the TCP connection.

Steve, thanks for the update. And Andrew, I understand you made some good progress on your GLR parser?

Posted: Dec 14, 2011

[ # 26 ]

Laura Patterson

Senior member

Total posts: 250

Joined: Oct 29, 2011

E-mail Laura

So, your bot is not tested online?
Is the data and files installed onto a private network for the judging?

Posted: Dec 14, 2011

[ # 27 ]

Steve Worswick

Administrator

Total posts: 2048

Joined: Jun 25, 2010

E-mail Steve

Laura - You may be interested in this page which details the questions that the entrants are expected to cope with:
http://loebner.exeter.ac.uk/selection-results/

Yes - the bots are installed on individual computers on a private network. No internet access allowed for the reason Victor mentioned.

If you want to enter an online bot, this competition will hopefully kick off again next year:
http://www.chatterboxchallenge.com/

Posted: Dec 14, 2011

[ # 28 ]

Laura Patterson

Senior member

Total posts: 250

Joined: Oct 29, 2011

E-mail Laura

Thanks for the links and info Steve.

I could compile my bot into a self-contained exe that will install, build the directories, copy the files and populate data tables. My Marie by design performs Internet searches and will query certain online databases but also performs the standard chatbot functions.

I will definitely check into the Chatterbox Challenge as well.

Posted: Dec 14, 2011

[ # 29 ]

Merlin

Guru

Total posts: 1081

Joined: Dec 17, 2010

E-mail Merlin

Andrew Smith - Dec 13, 2011:

http://en.wikipedia.org/wiki/Chomsky_hierarchy

In a nutshell, you can divide all languages (i.e. patterns of interest) into four broad categories according to how difficult it is to parse (i.e. recognise) them.

The simplest are called “regular expressions” and these have a precise mathematical definition. These are what I’m referring to when I say “regular expression” but I’ve got no idea what Jan is referring to when he uses the term. I suspect that neither does he. Regular expressions can be parsed with a finite state automaton and require no additional storage.

Andrew,
Either you are a bit out of touch with modern regular expressions, or we are suffering from a semantic problem of using the same term to describe 2 different things.

If you follow the Chomsky link to the Regular Expression web page, you can find some interesting comments about Regular Expressions:

Larry Wall, author of the Perl programming language, writes in an essay about the design of Perl 6: ‘Regular expressions’ [...] are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I’m not going to try to fight linguistic necessity here.

Many features found in modern regular expression libraries provide an expressive power that far exceeds the regular languages. For example, many implementations allow grouping subexpressions with parentheses and recalling the value they match in the same expression (backreferences). This means that a pattern can match strings of repeated words like “papa” or “WikiWiki”, called squares in formal language theory. The pattern for these strings is (.*)\1.

Pattern matching with an unbounded number of back references, as supported by numerous modern tools, is NP-complete.
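The backreference pattern quoted above can be tried directly with Python's standard `re` module. This is a minimal sketch; the `is_square` helper name is made up for the example, and `fullmatch` is used so that the whole string has to be one repeated half.

```python
import re

# (.*) captures a prefix; \1 requires that same text to appear again.
SQUARE = re.compile(r"(.*)\1")


def is_square(text):
    """True if text consists of some string written twice in a row."""
    return SQUARE.fullmatch(text) is not None


for word in ["papa", "WikiWiki", "papaya"]:
    print(word, is_square(word))
```

No finite state automaton can recognise this set of strings, because it would have to remember an arbitrarily long first half, which is why such patterns go beyond the "regular" level of the Chomsky hierarchy.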

Regular expressions are very flexible, but they don’t stand by themselves. You need a framework/memory/recall system. Recognizing a pattern is only the start of the process.
Regular Expression Comparison

Victor’s
Question: what red truck did I buy from my uncle henry?
Has as much to do with the memory subsystem as anything.
(Glad to see you again Victor)

The first reason is that context free grammars are capable of expressing all the patterns that are to be found in almost all natural languages.

The second reason is that when you are using a context free grammar you can add and remove rules without having to worry about changing the nature of it.

#1 - I have yet to find a pattern in natural language that a RegExp could not pick up.
#2 - Complexity and size are something that everyone dealing in AI will have to manage.

As luck would have it, this week the Stanford AI course is covering NLP. Peter Norvig (Director of Research at Google) gives a good lecture on the limitations of grammars.

Posted: Dec 14, 2011

[ # 30 ]

Laura Patterson

Senior member

Total posts: 250

Joined: Oct 29, 2011

E-mail Laura

When I hear the term “regular expressions” or RegEx, I think in terms of using them in JavaScript or other simplified languages as an alternative to much more involved methods for parsing arrays and variables. RegEx is very efficient and outperforms many internal scripting functions found in JScript and even Java.

I have built a library of RegEx routines that I use for forming replies by using the memory back ordering, which is easily accessed. On the other hand, RegEx is limited, so you need to have a supporting framework as already mentioned.
