|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
One of the real challenges to any sort of chatbot lies in “understanding” the spirit of the intended input, even if it’s not spelled correctly. One of the simplest approaches (though certainly not the “best”) is to check each word against a database of commonly misspelled words and their properly spelled replacements, making substitutions as needed. This is quite obviously far from perfect, and would remain so even with an algorithm that took context into account (e.g. there/their/they’re, pair/pear/pare, etc.).
However, even though it’s far from perfect, having at least some minimal form of spelling correction can make the difference between success and failure in any type of Turing test. What we humans do without thinking when we interpret the following statements:
“Fred went for a wauk in the citty, stoped at a flower shop, and baught a boukay of daisies.”
“Paul and Becky went to the park to celebrate there birthday they’re. They had a pare of pairs each, and now their going home.”
can be an impossible task for the average chat agent. While a spell checker may be able to correct the first statement without much difficulty, the second would prove impossible, since there isn’t a “spelling problem”, but one of semantics, instead. Either way, in order to pass a Turing test at all, I think that both of these types of challenges need to be overcome in some form.
Ok, slipped a little bit off topic, but Andy’s post brought this to mind, and I felt it was at least a little relevant.
|
|
|
|
|
Posted: Dec 8, 2011 |
[ # 1 ]
|
|
Senior member
Total posts: 141
Joined: Apr 24, 2011
|
@Dave
I am with you completely! Good comment, thank you!
I can also add that my efforts towards a good spellchecker were rewarded: I presented it (and it was accepted) at a very good NLPCS congress in Madeira, Portugal, in June 2010. The paper is free, written in plain (no pain) English, and available on my blog-website here: http://web.fi.uba.ar/~ahohenda
The engine we built is, of course, language dependent (but trainable in any language) and is comparable to, or even better than, most of the best world-class word-level spell-checkers, including GNU Aspell, Ispell and derived works (used in OpenOffice). The main problem in a bot is that you cannot build an “optional” list of candidate words, each with some sort of “coincidence factor”, because you would have to test every candidate against all patterns, and the task becomes too large!
Even so, the library is capable of spell-checking and correcting unknown words, based on parasynthetic composition evidence (prefix and suffix) and also using valid inflections.
|
|
|
|
|
Posted: Dec 8, 2011 |
[ # 2 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
Ok, now I’m jealous. The spellchecking that Morti has is a simple search/replace function that checks a table in his database of over 7,000 misspelled words. It works well enough for an AIML chatbot, but it would be nice to have something just a bit more advanced.
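For readers who have not seen this kind of lookup, here is a minimal sketch of table-driven search/replace in Python. It is not Morti's actual code: the `CORRECTIONS` table and the `substitute_misspellings` name are hypothetical, and a real table would live in a database rather than in a dict.

```python
import re

# Hypothetical stand-in for a database table of misspelling -> correction.
CORRECTIONS = {
    "wauk": "walk",
    "citty": "city",
    "stoped": "stopped",
    "baught": "bought",
    "boukay": "bouquet",
}


def substitute_misspellings(sentence, corrections=CORRECTIONS):
    """Replace each word found in the table; leave every other word alone."""

    def fix(match):
        word = match.group(0)
        # Lookup is case-insensitive; the replacement is always lower case.
        return corrections.get(word.lower(), word)

    # Only runs of ASCII letters and apostrophes are treated as words.
    return re.sub(r"[A-Za-z']+", fix, sentence)


fixed = substitute_misspellings(
    "Fred went for a wauk in the citty, stoped at a flower shop, "
    "and baught a boukay of daisies."
)
print(fixed)
```

A lookup like this only repairs misspellings that are already in the table, and it cannot touch a sentence in which every word is a valid word used in the wrong place.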
|
|
|
|
|
Posted: Dec 8, 2011 |
[ # 3 ]
|
|
Senior member
Total posts: 141
Joined: Apr 24, 2011
|
@Dave Don’t feel sad!
I can teach you how to make a much better spellchecker for AIML than ever!
Just make a function to extract all the unique words from your patterns (you won’t need anything else to match).
Then build a simple phonetic index double-metaphone algorithm, get the index code for every word, and sort the codes along with your words. After that, build a simple edit-distance algorithm biased with a letter-similarity coefficient (anything reasonable you can imagine will work fine). Push all the words into a Blum filter and into a TST (ternary search tree).
When an unknown word arrives, check the Blum filter. If the word is not there, you may have to spell-check it: go to the TST and look for candidates one character change away, then evaluate them against your phonetic distance. If one falls under some threshold, you’re done! If not, check for 2 or 3 character changes. (How far to go depends on some heuristic based on the length of your word, for example the square root of the length rounded to an integer.)
This will do the best job with your AIML, fast and painless, guaranteed!
enjoy!
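The recipe above can be sketched roughly as follows. This is an illustration under several assumptions, not the poster's engine: the filter is a standard Bloom filter (written “Blum filter” above), a linear scan over the vocabulary stands in for the TST, the edit distance is plain Levenshtein with no letter-similarity bias, and the phonetic check is left out entirely. All class and function names are made up for the example.

```python
import hashlib
import math


class BloomFilter:
    """Bit array probed by several hashes: no false negatives, rare false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, word):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(word))


def edit_distance(a, b):
    """Plain Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def max_edits(word):
    """Edit budget: square root of the word length, rounded, at least 1."""
    return max(1, round(math.sqrt(len(word))))


def correct(word, vocabulary, bloom):
    """Return word if the filter knows it, else the nearest word within budget."""
    if word in bloom or not vocabulary:
        return word
    # Linear scan as a stand-in for the TST search; ties break alphabetically.
    best = min(vocabulary, key=lambda w: (edit_distance(word, w), w))
    return best if edit_distance(word, best) <= max_edits(word) else word


vocabulary = ["walk", "city", "stopped", "bought", "bouquet"]
bloom = BloomFilter()
for known in vocabulary:
    bloom.add(known)

print([correct(w, vocabulary, bloom) for w in ["wauk", "citty", "stoped", "city"]])
```

With this budget, “boukay” is left uncorrected, because it sits four edits away from “bouquet”; that is the kind of case where a phonetic comparison would have to do the work.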
|
|
|
|
|
Posted: Dec 8, 2011 |
[ # 4 ]
|
|
Senior member
Total posts: 623
Joined: Aug 24, 2010
|
Dave Morton - Dec 8, 2011: One of the real challenges to any sort of chatbot lies in “understanding” the spirit of the intended input, even if it’s not spelled correctly. [...]
“Fred went for a wauk in the citty, stoped at a flower shop, and baught a boukay of daisies.”
“Paul and Becky went to the park to celebrate there birthday they’re. They had a pare of pairs each, and now their going home.”
can be an impossible task for the average chat agent. While a spell checker may be able to correct the first statement without much difficulty, the second would prove impossible, since there isn’t a “spelling problem”, but one of semantics, instead.
There are ways of hacking your way around sentences like these without relying on hand-picked lists of words. I plan to employ a variant of this idea. Though I wonder about the computation time required to work through possible variants and suss out a sentence that makes contextual sense.
Dave Morton - Dec 8, 2011: Ok, slipped a little bit off topic, but Andy’s post brought this to mind, and I felt it was at least a little relevant.
Whoops, guess I’m not helping.
|
|
|
|
|
Posted: Dec 8, 2011 |
[ # 5 ]
|
|
Senior member
Total posts: 623
Joined: Aug 24, 2010
|
Andres Hohendahl - Dec 8, 2011: Then build a simple phonetic index double-metaphone algorithm [...]
Ha, Andy beat me to it apparently.
|
|
|
|
|
Posted: Dec 8, 2011 |
[ # 6 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
I’ve gone ahead and split this into its own thread, since we strayed significantly off topic.
Andy and CR, this discussion has led me to do a little research, and I’ve found that PHP already has Metaphone support built in. There’s also a module available to handle Double Metaphone, but it’s quite likely that most servers won’t have that particular module installed.
@Andy, you lost me at “Blum Filter”, but that’s what research is for.
|
|
|
|
|
Posted: Dec 8, 2011 |
[ # 7 ]
|
|
Senior member
Total posts: 473
Joined: Aug 28, 2010
|
Dave Morton - Dec 8, 2011: @Andy, you lost me at “Blum Filter”, but that’s what research is for.
I’m pretty sure he meant “Bloom Filter”
http://en.wikipedia.org/wiki/Bloom_filter
|
|
|
|
|
Posted: Dec 8, 2011 |
[ # 8 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
Thanks, Andrew. That would, of course, make more sense.
|
|
|
|
|
Posted: Dec 8, 2011 |
[ # 9 ]
|
|
Senior member
Total posts: 141
Joined: Apr 24, 2011
|
Dave Morton - Dec 8, 2011: Thanks, Andrew. That would, of course, make more sense.
Wow! What a hell of an idea exchange! (A brain hurricane!)
Thanks Andrew! Of course, Bloom filtering. (It is a kind of unified hash and very efficient: recall is 100% and precision is quite high, so it is good for a first check. It is also very fast, since only N hashes are computed, where N depends on the memory used for the “contained” items and on the precision needed.)
My error: I speak English, German and Spanish as well, so my memory is basically phonetic (as all of ours might be, as humans). When you pronounce Bloom, the double “oo” sounds like “u”, and in German and Spanish this sound is also written as “u”. Blume is the German word for flower, and surely the English name derives from the word for ‘flower’ in Old English (Anglo-Saxon) or Old German, through a phonetic-orthographic transcription. Blossom also means something related in English; here are the roots! Part of my concern with phonetics comes from here too, and it is what I mostly based my spell-repair algorithm on. (The method I described earlier was something I tried; it worked fine for English… but not for Spanish.) :(
Hope it helps!
@Andrew Congratulations on your doctorate (PhD)!
I am headed towards that too, maybe some day. I’m in no hurry!
|
|
|
|
|
Posted: Dec 8, 2011 |
[ # 10 ]
|
|
Senior member
Total posts: 141
Joined: Apr 24, 2011
|
C R Hunt - Dec 8, 2011: Dave Morton - Dec 8, 2011: One of the real challenges to any sort of chatbot lies in “understanding” the spirit of the intended input, even if it’s not spelled correctly. [...]
“Fred went for a wauk in the citty, stoped at a flower shop, and baught a boukay of daisies.”
“Paul and Becky went to the park to celebrate there birthday they’re. They had a pare of pairs each, and now their going home.”
can be an impossible task for the average chat agent. While a spell checker may be able to correct the first statement without much difficulty, the second would prove impossible, since there isn’t a “spelling problem”, but one of semantics, instead.
There are ways of hacking your way around sentences like these without relying on hand-picked lists of words. I plan to employ a variant of this idea. Though I wonder about the computation time required to work through possible variants and suss out a sentence that makes contextual sense.
Dave Morton - Dec 8, 2011: Ok, slipped a little bit off topic, but Andy’s post brought this to mind, and I felt it was at least a little relevant.
Whoops, guess I’m not helping.
The problem you describe is easily NP-hard if you plan to check all combinations, so doing it that way is not an option; best try my method! Also, Metaphone is not the best phonetic code (it’s an index code, not a measure), so when two words give you the same code you can guess that they sound somewhat similar, but you won’t know how similar “somewhat” is!
This is the shortcoming of the whole Soundex and Meta.. family!
best!
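To make the “index code, not a measure” point concrete, here is a small sketch using classic American Soundex, chosen only because it is short enough to write out in full (it is not Metaphone, and not the poster's own phonetic distance). The function name `soundex` is just a label for this example.

```python
# Classic Soundex digit groups; vowels and h, w, y carry no digit.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(word):
    """Return the 4-character Soundex code: first letter plus three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        # Emit a digit only when it differs from the previous letter's digit.
        if code and code != prev:
            result += code
        # h and w do not separate two consonants that share a digit.
        if c not in "hw":
            prev = code
    return (result + "000")[:4]


for w in ["city", "citty", "cat", "there", "their"]:
    print(w, soundex(w))
```

Here “citty” lands in the same bucket as “city”, but so does “cat”, and the code alone gives no way to say which of the two is the closer match. All it offers is a yes/no answer about bucket membership.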
|
|
|
|
|
Posted: Dec 8, 2011 |
[ # 11 ]
|
|
Senior member
Total posts: 250
Joined: Oct 29, 2011
|
It was my first priority when designing my chatbot. My parser first checks to see if it’s a valid word or phrase. The word(s) are checked against a 58,000 word and a 164,000 phrase database. During the parsing process, if an invalid word is found, it checks to see if the word has at least an 80% match to a valid word. If so, it marks that word to be compared in context with the rest of the input. As the parser continues, it checks for associative words and assigns grammar tags. The “taggedStr” is then processed by the interpreter, where the taggedStr is further analyzed to form a reply.
It’s a complicated process that is yielding excellent results in testing thus far.
I agree that spelling and grammar correction are both cornerstones of an effective AI conversation with a bot.
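One possible reading of the “at least an 80% match” step, sketched in Python with the standard library's `difflib`. The post does not say how the percentage is computed (and the original is written in Ruby), so the similarity ratio used here is an assumption, as are the function name and the tiny word list.

```python
import difflib


def closest_valid_word(word, valid_words, threshold=0.8):
    """Return the best match scoring at least threshold, or None if there is none."""
    # get_close_matches ranks candidates by SequenceMatcher.ratio(), which is
    # 2 * matched_characters / total_characters_in_both_words.
    matches = difflib.get_close_matches(word, valid_words, n=1, cutoff=threshold)
    return matches[0] if matches else None


valid_words = ["city", "walk", "flower", "bouquet"]
print(closest_valid_word("citty", valid_words))
print(closest_valid_word("xyzzy", valid_words))
```

A word that clears the threshold would then be marked for comparison in context, as the post describes; a word with no match stays flagged as invalid.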
|
|
|
|
|
Posted: Dec 9, 2011 |
[ # 12 ]
|
|
Senior member
Total posts: 147
Joined: Oct 30, 2010
|
Here’s the statistical NLP (and therefore, Google’s) approach: http://norvig.com/spell-correct.html
|
|
|
|
|
Posted: Dec 9, 2011 |
[ # 13 ]
|
|
Senior member
Total posts: 250
Joined: Oct 29, 2011
|
That is very similar to the approach that I have taken with the exception that mine is written in Ruby.
|
|
|
|
|
Posted: Dec 9, 2011 |
[ # 14 ]
|
|
Guru
Total posts: 1081
Joined: Dec 17, 2010
|
If you haven’t done it, you should add the spellcheck=“true” attribute to your input box.
|
|
|
|
|
Posted: Dec 9, 2011 |
[ # 15 ]
|
|
Senior member
Total posts: 250
Joined: Oct 29, 2011
|
Posting on an iPhone sucks, don’t you know?
|
|
|
|