|
Posted: Jul 26, 2010 |
[ # 151 ]
|
|
Senior member
Total posts: 974
Joined: Oct 21, 2009
|
awesome.. now, like you mentioned, calculations…..
Bob has 80 coins.
Sally has three times the number of apples that Bob has coins, plus 5. How many apples does Sally have?
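Worked by hand, the problem comes down to one line of arithmetic. Here is a minimal Python sketch of that calculation; the variable names are only illustrative and say nothing about how a bot would extract the numbers from the English text.

```python
# Facts taken from the word problem; the variable names are illustrative only.
bob_coins = 80

# "three times the number ... that Bob has coins, plus 5"
sally_apples = 3 * bob_coins + 5

print(sally_apples)  # 245
```

The hard part for a chatbot is the parsing that produces these two lines, not the arithmetic itself.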
On an earlier post: yes, the “unless” keyword comes from me working in Perl for so long. Some people don’t like it and say it is ‘confusing’... bah! I use that keyword in Perl all the time!
Also, as mentioned, my bot will use rules written in English, so it is not limited to just the “IF” and “THEN” keywords. Rules can use conjunctions, prepositions, prepositional phrases, participle phrases, infinitive phrases, noun clauses, you name it.
Actually, knowledge and rules are both data coded in plain English. Basically, an ‘if-then’ rule is simply another knowledge-entry parse tree that is able to generate (conclude) more knowledge entries.
|
|
|
|
|
Posted: Jul 27, 2010 |
[ # 152 ]
|
|
Senior member
Total posts: 257
Joined: Jan 2, 2010
|
Hi Dave,
Thanks for the offer. I’ll definitely keep that in mind. I’m working on this small math processing application right now. I hope I can have something to test. It will allow the user to enter mathematical statements (varied kinds) and respond based upon the type of sentence. I’ll certainly need someone to test the module to see what works…and what fails. The app generates a log file…I would benefit from that data. Perhaps you can help test it?
Good separators…you’re right. I need to allow punctuation and other marks…such as hyphenated words.
Bob has 80 coins. Sally has three times the number of apples that Bob has coins, plus 5. How many apples does Sally have?
Great question!! I’m having a great time simply thinking about how to solve problems such as these. In fact it probably makes sense to work on a chat bot one capability at a time.
I was working on ‘where are you’ questions but I’ve shifted focus to math.
TANGENT!! Has anyone heard from Erwin in the past two weeks? He seems to have disappeared.
Regards,
Chuck
|
|
|
|
|
Posted: Jul 27, 2010 |
[ # 153 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
It’s my belief that Erwin is done with his conferences, but still on holiday. I expect him back within the next few days.
As to helping to test, sure. I’m more than happy to do so. Just let me know what’s needed, and I’m your guy.
Another tangent! I got my StickyPad program built and running. There’s a post about it in the Tools forum. You may find it marginally useful. {wink!}
|
|
|
|
|
Posted: Jul 27, 2010 |
[ # 154 ]
|
|
Senior member
Total posts: 974
Joined: Oct 21, 2009
|
Chuck Bolin - Jul 27, 2010:
In fact it probably makes sense to work on a chat bot one capability at a time.
Absolutely! The first functionality I want my bot to have is storing and retrieving information, such as being told “Bob’s phone number is 123-4567” and then being asked “What’s Bob’s phone number?”
Yes, very simple… but then it gets very complicated very quickly.
User- “Bob’s cell phone number is 123-4567”.
User- “What’s bob’s number?”
Computer- You mean his cell phone?
Here it has to make a ‘fuzzy’ match.
Or...
User- ‘Bob’s phone number is 123-4567’
User - What is bob’s cell phone number?
Computer- I’m not sure if it is his *cell* number, but you told me his phone number is 123-4567.
So you have a variable number of adjectives describing a noun when you tell the system something, and then a variable number of adjectives modifying a noun when you ask about it. Sometimes they match up exactly (perhaps in a different order), other times the original fact has more detail than the question, and other times vice versa.
Other times, synonyms must be used…
User- Bob’s telephone # is 123-4567
User- what’s Bobby’s number?
here ‘Bob’ = Bobby, ‘#’ = ‘number’ or ‘telephone’ number
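The cases above can be sketched in Python as a comparison of modifier sets. This is only a rough illustration under several assumptions: the owner and the noun phrase are already separated (no possessive parsing), the head noun is taken to be the last word, and the alias and synonym tables (`NAME_ALIASES`, `TERM_SYNONYMS`) are hypothetical stand-ins for whatever the bot would really know.

```python
# Hypothetical alias tables; a real bot would learn or load these.
NAME_ALIASES = {"bobby": "bob"}
TERM_SYNONYMS = {"#": "number", "telephone": "phone"}

def normalize(words):
    """Lower-case each word and map known synonyms to one canonical term."""
    return [TERM_SYNONYMS.get(w.lower(), w.lower()) for w in words]

def make_entry(owner, noun_phrase, value=None):
    """Split a noun phrase into its head noun (last word) and its modifiers."""
    words = normalize(noun_phrase.split())
    owner = owner.lower()
    return {
        "owner": NAME_ALIASES.get(owner, owner),
        "head": words[-1],
        "modifiers": frozenset(words[:-1]),  # a set, so word order does not matter
        "value": value,
    }

def compare(fact, query):
    """Classify how well a stored fact answers a query."""
    if fact["owner"] != query["owner"] or fact["head"] != query["head"]:
        return "no match"
    if fact["modifiers"] == query["modifiers"]:
        return "exact"
    if query["modifiers"] < fact["modifiers"]:
        return "fact is more specific"   # e.g. "You mean his cell phone?"
    if fact["modifiers"] < query["modifiers"]:
        return "query is more specific"  # e.g. "I'm not sure if it is his *cell* number, but..."
    return "partial"

fact = make_entry("Bob", "cell phone number", "123-4567")
print(compare(fact, make_entry("bob", "number")))  # fact is more specific
```

Each return value corresponds to one of the reply styles in the examples, so the bot could pick its wording from the classification.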
For me this functionality will be wonderful... my wife says I’m very good at programming, but not so organized!! A chatbot that I can just say stuff to like that and ask about later will be great. I won’t have to bother writing things in text files all over my computer, and I won’t even have to hit an online phone directory. It will be ‘one stop shopping’ for everything I tell the chatbot, in plain English, and I won’t need a SQL table pre-defined for each type of information. Instead, I just type it in!
|
|
|
|
|
Posted: Jul 28, 2010 |
[ # 155 ]
|
|
Senior member
Total posts: 257
Joined: Jan 2, 2010
|
Vic,
I like the concept of stating a fact to the bot and then asking for that information in an immediate follow-up question.
Since coding the other day I’ve come up with some more test requirements for the ‘math module’. I’ll post them after I’ve written them into the spec.
I need to create some ‘scouting’ functions. A scouting function (my term) performs a quick check on the human input and then indicates what sort of information is being provided or requested. Complex inputs would result in multiple classifications. These could be used to apply the correct module(s) to assist in ‘gleaning the meaning’. I just made that one up too. =)
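One way the scouting idea might look in Python is a table of cheap checks, each of which either fires or does not, so that a complex input can pick up several labels at once. The labels and regular expressions below are illustrative guesses, not part of any actual spec.

```python
import re

# Each scout is a cheap check; the labels and patterns are illustrative guesses.
SCOUTS = {
    "question": lambda text: text.strip().endswith("?"),
    "math": lambda text: bool(
        re.search(r"\d|\b(times|plus|minus|how many)\b", text, re.I)
    ),
    "location": lambda text: bool(re.search(r"\bwhere\b", text, re.I)),
}

def scout(text):
    """Return every label whose quick check fires for this input."""
    return [label for label, check in SCOUTS.items() if check(text)]

print(scout("How many apples does Sally have?"))  # ['question', 'math']
```

The returned labels could then decide which module(s) get a chance to process the input.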
Chuck
|
|
|
|
|
Posted: Jul 28, 2010 |
[ # 156 ]
|
|
Senior member
Total posts: 257
Joined: Jan 2, 2010
|
Hi,
A friend asked me to write an application that would search Craigslist. The program would email him each time his specific search criteria surfaced in a posting.
Using VB6, my app currently searches and populates a listbox that may be clicked to show the content in another text box. I need to add a timer. The email feature is a bit ‘iffy’.
What I have learned is that the craigslist web pages are formatted in a straightforward manner and are very easy to ‘scrape’ for data. This may serve as another data source for Walter.
I hope to return to Walter this evening.
Regards,
Chuck
|
|
|
|
|
Posted: Jul 30, 2010 |
[ # 157 ]
|
|
Senior member
Total posts: 257
Joined: Jan 2, 2010
|
Hi,
I’m taking a week or two to complete an entry for this game programming competition. http://www.gameinstitute.com/challenges.php The contest theme is:
Objective : Music and Movement
Create a game using basic geometry that is connected to sound in some way. An example would be to move a pointer through an invisible 2D maze making some form of audio become louder or quieter depending how close to the wall they are.
I’m using this opportunity to improve my 2D engine and sound capability. I only need to solve a couple of interface issues and the program can serve as a framework for Walter.
Regards,
Chuck
|
|
|
|
|
Posted: Jul 30, 2010 |
[ # 158 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
Best of luck to you, Chuck! I’m rootin’ fer ya!
|
|
|
|
|
Posted: Aug 2, 2010 |
[ # 159 ]
|
|
Senior member
Total posts: 257
Joined: Jan 2, 2010
|
*Update*
Took a break from game programming to spend time on my numbers module. It occurred to me that the design I created and tested (results posted above) is inadequate. I’m laying out a new design (data structures) and functions to facilitate improved management of lists with numbers, handling of calculations, and retrieval of information.
I’m also working on a pattern matching class. Here’s a concept I can share.
Word A = effect
Word B = affect
These two words are often confused. Despite the difference in meaning, English readers allow for the mistake and continue to read, but the same mistake may break input pattern matching. So…here is my concept and solution.
To determine a match I will tabulate a score. The reference word is A or ‘effect’. User types ‘affect’.
* 2 pts for each letter pair that matches the sequence of the reference word. [4 matches x 2 pts = 8 pts]
* 1 pt for each letter in word B that is in A. [5 letters x 1 pt = 5 pts]
* 2 pts if the first letter matches. [0 first letter matches x 2 pts = 0 pts]
* 2 pts if the last letter matches. [1 last letter match x 2 pts = 2 pts]
A perfect match is 20 pts. This match yields 15 pts, or 75%.
So, I want to allow for common spelling errors without having to store each misspelled word.
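The scoring rules above could be sketched in Python roughly as follows. Two readings are assumed here: a "letter pair" means two adjacent letters, and the perfect score is whatever the reference word scores against itself. The function names are made up for illustration.

```python
def letter_pairs(word):
    """Adjacent two-letter pairs, e.g. 'effect' -> ef, ff, fe, ec, ct."""
    return [word[i:i + 2] for i in range(len(word) - 1)]

def match_score(reference, candidate):
    """Score a typed word against a reference word using the four rules."""
    reference_pairs = set(letter_pairs(reference))
    # 2 pts for each adjacent letter pair of the candidate found in the reference.
    pair_pts = 2 * sum(1 for p in letter_pairs(candidate) if p in reference_pairs)
    # 1 pt for each letter of the candidate that occurs anywhere in the reference.
    letter_pts = sum(1 for ch in candidate if ch in reference)
    # 2 pts each for a matching first and last letter.
    first_pts = 2 if candidate[:1] == reference[:1] else 0
    last_pts = 2 if candidate[-1:] == reference[-1:] else 0
    return pair_pts + letter_pts + first_pts + last_pts

def match_percent(reference, candidate):
    """Score as a percentage of a perfect match (reference must be non-empty)."""
    return 100.0 * match_score(reference, candidate) / match_score(reference, reference)

print(match_score("effect", "affect"))    # 15
print(match_percent("effect", "affect"))  # 75.0
```

With these readings the sketch reproduces the 8 + 5 + 0 + 2 = 15 points and the 20-point perfect match from the example.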
Any thoughts?
Regards,
Chuck
|
|
|
|
|
Posted: Aug 2, 2010 |
[ # 160 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
I’m not sure, Chuck. Morti handles spelling errors by looking up a list of commonly misspelled words in his database, storing them, along with the proper spelling, in an array, and then using simple substitution to correct “known” spelling errors. Currently, the database holds only about 900 words, so performance isn’t affected all that much. But if I were to expand on that list, the response times would increase to an unacceptable level. This is an issue that I also need to address, but my lack of experience in this area somewhat limits me.
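For comparison, here is a small Python sketch of substitution driven by a hash table (a dictionary) rather than a scan through an array. A dictionary lookup takes roughly constant time on average however long the list grows, which might help if the array scan is what slows things down; that is only a guess about where the time goes. The three sample entries and the function name are hypothetical, not Morti's actual data or code.

```python
import re

# Hypothetical sample of a misspelling table; the real list lives in a database.
CORRECTIONS = {"teh": "the", "recieve": "receive", "definately": "definitely"}

def correct_known_errors(text):
    """Replace each known misspelling; other words pass through unchanged."""
    def fix(match):
        word = match.group(0)
        # Lookup is case-insensitive; the replacement is always lower-case.
        return CORRECTIONS.get(word.lower(), word)
    return re.sub(r"[A-Za-z']+", fix, text)

print(correct_known_errors("I will recieve teh package"))
# I will receive the package
```

This sketch does not restore capitalization and knows nothing about context, so it is only the "known errors" step described above.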
I think, however, that there will need to be some knowledge of context, when using a scoring system like that which you’ve outlined. Consider the following phrases:
“I’m hoping to positively affect the outcome.”
“I’m hoping to have a positive effect on the outcome.”
Of course, the largest difference here is the whole “noun/verb” thing, so in this instance, the scoring would have to take that into account. The challenge seems to lie more in Victor’s area of expertise, to be perfectly honest.
|
|
|
|
|
Posted: Aug 3, 2010 |
[ # 161 ]
|
|
Member
Total posts: 2
Joined: Aug 3, 2010
|
In terms of spelling correction scoring, you may want to take a look at the (minimum) edit distance algorithm. The goal is to have a simple measure for ‘how far’ two strings are apart. At its most basic, insertions, deletions, and substitutions all have a particular fixed penalty (for example, 1 point for any insert/delete/substitute operation or 1 point for insert/delete, 2 for substitute). More complex versions will take into account that different errors are more or less likely. For typed text, position on the keyboard might affect the measure for particular errors (substituting m for n is a more likely typo than m for q and so would receive a lower score).
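As a concrete illustration, here is a short Python sketch of the basic fixed-penalty version, using the standard dynamic-programming approach. The function and parameter names are illustrative; the keyboard-aware weighting mentioned above is not included.

```python
def edit_distance(source, target, insert_cost=1, delete_cost=1, substitute_cost=1):
    """Minimum total cost of the edits that turn `source` into `target`."""
    # `previous` is the row for the source prefix one character shorter than
    # the one being processed; entry j is the cost of reaching target[:j].
    previous = [j * insert_cost for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        current = [i * delete_cost]  # delete everything to reach an empty target
        for j, t_char in enumerate(target, start=1):
            same = s_char == t_char
            current.append(min(
                previous[j] + delete_cost,      # drop s_char
                current[j - 1] + insert_cost,   # add t_char
                previous[j - 1] + (0 if same else substitute_cost),
            ))
        previous = current
    return previous[-1]

print(edit_distance("affect", "effect"))                     # 1
print(edit_distance("affect", "effect", substitute_cost=2))  # 2
```

A lower number means the two strings are closer, so ‘affect’ and ‘effect’ are one substitution apart under the first penalty scheme and cost 2 under the second.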
Edit distances can be weighted with some measure of the probability of a given string. This can be a basic probability over the entire language (just the raw probability of a given string) or some variety of N-gram model to measure the likelihood of a word given its immediate context. This could help with the affect/effect case (although, depending on the corpora used for training, the results may reinforce common errors rather than catch them).
There are apparently a lot of tricks to upping accuracy and performance and some of the biggest systems supposedly use entirely different categories of methods, but hopefully this basic info is useful.
|
|
|
|
|
Posted: Aug 3, 2010 |
[ # 162 ]
|
|
Senior member
Total posts: 974
Joined: Oct 21, 2009
|
For now, handling misspelled words is my lowest priority. I feel that kind of functionality is a huge distraction from the core functionality of bot development. Currently, my bot simply reports which portions of the input text it did not recognize (which words or terms it has no information on, that is, words it cannot classify as nouns, verbs, adjectives, etc).
This of course will not ‘cut it’ for the Loebner prize, and that is why I have pretty much changed the direction of my bot’s development toward more practical research, toward developing something useful.
At most, I will allow a single string of characters to be unknown, but I do not want to support millions of permutations of guessing correct word spelling. The other reason is that there is no way to differentiate a misspelled word from a word that the system simply does not know yet. If entering text, and a word is not known/misspelled, it will have a red underline, and the user can click and select the word they want. This way, the main bot core engine does not have to deal with that; you only need to develop that ‘spelling suggestion filtering front end’.
If the word is not misspelled, they can (when I develop a GUI), click the ‘menu’ and select ‘new word’ and then enter the text for the new word. Then, use that word in their input.
Later, and we’re talking about maybe late 2011 or more likely 2012, I may have a ‘spelling wrapper’ around the whole thing, which will of course add a huge burden of processing and will likely require the use of several computer systems networked together in order to parallel process the input.
As you mentioned Dave, yes, my bot will be able to consider whether the best spelled word is a noun, verb, adjective, etc. What it can do is, for each one of those, try the misspelled word as each part of speech, get the highest merit of each, and decide from there. This is, again, assuming that word is even misspelled and not simply a word it hasn’t seen or doesn’t know yet. The word could even be an acronym the user just made up on the fly.
Another fundamental question is whether your bot is using simple templates or completely free form grammar. With templates, perhaps the extra processing for spell check won’t be much of a problem. But once the bot’s vocabulary increases to 30,000+ words, I think you will run into problems and your bot will take unacceptable amounts of time to respond. Now, if you are not using simple templates, and are allowing completely free form input like I am in my bot, then this functionality is absolutely out of the question unless the system is massively parallel, with something like 10, 20 or more CPUs working concurrently on different parts of the problem. The reason is that a recent test with my bot, using a 22-word ‘complex sentence’ (as defined in English grammar), took 33 seconds and generated 2,808 different parse trees. If 6 of those words were unknown or misspelled, and each one had, say, 3 different words that it could equally be (based on your spell check compare algorithm), then you are talking about 3 to the power of 6, times 2,808 (every combination of spell check suggestions times all parse trees)... not going to be possible. Nope, not with a single-CPU von Neumann architecture computer system!! However, if your bot doesn’t allow a sentence of more than, say, 8 words, using simple templates and a limit of, say, 5000 words, then perhaps you’ll be OK.
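To put numbers on that estimate, here is the arithmetic as a small Python sketch. The inputs come from the test described above; the time estimate additionally assumes that every candidate sentence costs about as much as the original 33-second run, which is only a rough guess.

```python
suggestions_per_word = 3
unknown_words = 6
parse_trees_per_sentence = 2808   # from the 22-word test
seconds_per_sentence = 33         # from the same test

# Every combination of suggestions is a separate candidate sentence.
candidate_sentences = suggestions_per_word ** unknown_words
total_parse_trees = candidate_sentences * parse_trees_per_sentence

# Assumption: each candidate takes about as long as the original run.
estimated_hours = candidate_sentences * seconds_per_sentence / 3600

print(candidate_sentences)        # 729
print(total_parse_trees)          # 2047032
print(round(estimated_hours, 1))  # 6.7
```

So the example works out to roughly two million parse trees and, under that timing assumption, several hours on a single CPU.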
Dave - given all these complications (the number of CPU cycles required and not being able to tell the difference between a not-yet-known word and a misspelled word), I think your approach is best. I will store a set of commonly misspelled words and what they equate to. For example, unless context tells it differently, “teh” will be considered a misspelled version of “the”.
Justin - the algorithm you give sounds like it has merit. When I have spent another year or two on core functionality, I will probably employ it… someday !
|
|
|
|
|
Posted: Aug 3, 2010 |
[ # 163 ]
|
|
Senior member
Total posts: 257
Joined: Jan 2, 2010
|
Justin,
Welcome to the forums. Thanks for sharing the ‘edit distance algorithm’. I’ll certainly give that a look. I need to learn more about ‘insert/delete/substitute’ operations. Not really sure what that means in this context. However I appreciate the points. Assigning point values is what I’m attempting.
Dave,
I like the spelling list idea. In addition to misspelled words I’m concerned about the use of homonyms (I think that’s the correct word). I notice it is common for folks to select the wrong word when writing. Now regarding speech, the meaning is understood. But swapping a word like ‘here’ with ‘hear’ or ‘to’ with ‘too’ in writing alters the meaning of the sentence. However, when we read a sentence in which this occurs we still grasp the intended meaning. I notice I do this a lot.
Vic,
I agree that you should avoid spelling checks for the moment. It’s safe to assume correct spelling and grammar for the purpose of developing your bot. Like you wrote, there may come a time in the future when that bit is working and then you’ll need to make your bot smart so it can interact with us ‘real’ humans.
I’m thinking of a modified stimulus-response system. Here are a few points.
* I like stimulus patterns for common expressions. If the human input matches a predefined pattern then that makes much more sense than ‘free form grammar’ with 2000+ parse trees. However…
* I don’t necessarily like ‘canned’ responses…even with mixed responses using randomization. I do think these responses are okay when you’re not interested in a meaningful conversation. Kind of like two people passing by saying “How’s it going?” and “Fine thanks!”...or your wife is trying to talk with you and you respond “Yes dear!” =) I like how Dave is trying to extend Morti’s capability with the telescope and other such features. It would be interesting to see how far this stimulus-response technology can be pushed.
* I would like to take user inputs (spelling errors, homonyms and all) and extract the intended ‘meaning’. In Walter’s case, the meaning will be associated with retrieving data about his world and allowing Walter to manipulate this data through calculations, comparisons, logical deductions, etc. Then the results are translated into a human-like response.
* Regarding free form grammar bots: I find this technology very interesting. It seems most useful for evaluating human input for which there are no stored stimulus patterns, instead of issuing a canned “I don’t understand” response. I imagine a small percentage of human chatting is not predictable, so this capability would make the bot seem ‘most intelligent’.
All,
I’m starting to see Walter and his world as a ‘living and dynamic database’. Conversation with Walter is all about learning of his world and drawing numerically based conclusions translated into human language. E.g.
* What is the temperature? - 98°
* How far is the ranger station from the swamp? - About 400 meters.
* How many geese have you seen today? - I saw 7….3 in the marsh and 4 flying south.
* Are you available today at 3 pm to chat? - I’m sorry. I need to catch and tag two new bear cubs.
It’s all still conceptual at the moment.
Thanks for the input.
Regards,
Chuck
|
|
|
|
|
Posted: Aug 4, 2010 |
[ # 164 ]
|
|
Senior member
Total posts: 974
Joined: Oct 21, 2009
|
Chuck,
So I’ve decided there will be two flavors of my bot: a ‘Turing Test’ mode (which I’m also calling “Eliza+” mode) and FTU (full text understanding) mode.
For FTU mode, unless all words are known, it won’t let you submit your input. Instead, all words that are unknown will show a red underline, and you can click the word you want to correct. OR... if that word is not in the list because the system doesn’t know it, you will click “Add Word” on the menu (when I make a GUI!! lol). Also, FTU mode may work like Google, where, as you type, it will auto-display suggestions of words it knows. Otherwise, you will always be guessing which words this bot knows.
Now, you can’t get away with that in a Turing test (instant bust), so “Eliza+” mode will first identify to itself all unknown/misspelled words, and simply consider the unknowns as nouns. (later, I’ll add a spell check, but I’m not sure how much CPU power that will take, given all the other work the system has to do !!!)
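The difference between the two modes might be sketched in Python like this. The tiny vocabulary, the function name and the returned dictionary are all hypothetical; a real bot would consult its own lexicon and feed a GUI instead.

```python
import re

# Tiny illustrative vocabulary; a real bot would look words up in its lexicon.
KNOWN_WORDS = {"bob": "noun", "has": "verb", "coins": "noun", "the": "determiner"}

def tag_words(text, mode):
    """Tag each word; handle unknown words according to the mode."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    unknown = [w for w in words if w not in KNOWN_WORDS]
    if mode == "FTU" and unknown:
        # FTU mode refuses the input and reports which words to underline.
        return {"accepted": False, "unknown": unknown, "tags": []}
    # Eliza+ mode (or fully known input): unknown words default to nouns.
    tags = [(w, KNOWN_WORDS.get(w, "noun")) for w in words]
    return {"accepted": True, "unknown": unknown, "tags": tags}

print(tag_words("Bob has teh coins", "FTU")["unknown"])  # ['teh']
```

In Eliza+ mode the same input is accepted, with ‘teh’ simply treated as a noun.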
I call it Eliza+ because, even if it is basically ignoring words it doesn’t know, it will do much, much more than Eliza did. That program only used keywords; my bot will do advanced grammatical analysis.
Seriously though, I think having to deal with so many issues simultaneously during development, just so it can pass a Turing test, is a distraction from real, profitable research!
|
|
|
|
|
Posted: Aug 4, 2010 |
[ # 165 ]
|
|
Senior member
Total posts: 971
Joined: Aug 14, 2006
|
I’m a little bit lost. Victor: is this about your chatbot? Suggestion: shall we set up different threads for different projects? We can even set up a category ‘chatbot projects’: Walter, Morti, and your bot could be threads in this category. What do you think? Sorry for being so strict again (I’m back).
|
|
|
|