

Misspelled words
 
 

Interested in comments on various ways of handling spelling errors: spell-check algorithms, etc.

 

 
  [ # 1 ]

Two methods I’ve used:

1. A table of misspellings mapped to correct spellings, e.g. freind -> friend.

The problem with this is that updating the table is a never-ending job. Also, do you include abbreviations, etc., or should these go in a separate table? And where do you stop? Do you include things like little -> small?
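A minimal Python sketch of method 1, assuming a small hand-maintained dictionary (the `MISSPELLINGS` table and the `correct_with_table` / `correct_sentence` helpers are illustrative names, not from any library). Any word not in the table passes through unchanged, which is exactly why the table needs constant updating.

```python
# Hypothetical hand-maintained table: known misspelling -> correct spelling.
MISSPELLINGS = {
    "freind": "friend",
    "recieve": "receive",
    "teh": "the",
}


def correct_with_table(word: str) -> str:
    """Return the table's correction for word, or word unchanged if it isn't listed.

    Lookup is case-insensitive, and corrections come back lowercase.
    """
    return MISSPELLINGS.get(word.lower(), word)


def correct_sentence(sentence: str) -> str:
    # Naive whitespace tokenization; punctuation is not handled.
    return " ".join(correct_with_table(w) for w in sentence.split())


print(correct_sentence("my freind will recieve it"))  # my friend will receive it
```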

2. Approximate matching. Each word is checked against a list of words and, if it doesn’t match any of them, the most similar word is substituted.

The problem with this is how to define the “most similar” word. One way is to use an edit-distance similarity metric, but this can sometimes give silly results. For instance, teh -> ten, when it’s obviously (to a human) a transposition of “the”.
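To make the “teh -> ten” problem concrete, here is a sketch (helper names and the `FREQ` counts are hypothetical) comparing plain Levenshtein distance with the optimal string alignment variant, which also counts swapping two adjacent letters as a single edit. Under plain Levenshtein, “ten” is one edit from “teh” but “the” is two; with transpositions both are one edit, and an assumed word-frequency tie-break then prefers “the”.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Levenshtein distance; with transpositions=True, the optimal string
    alignment variant, where swapping two adjacent letters also costs 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


# Hypothetical word frequencies, used only to break ties between equally close words.
FREQ = {"the": 1000, "ten": 10, "tea": 5}


def closest(word: str, vocabulary: list, transpositions: bool) -> str:
    # Smallest distance wins; among equal distances, the more frequent word wins.
    return min(vocabulary,
               key=lambda w: (edit_distance(word, w, transpositions), -FREQ.get(w, 0)))


vocab = ["tea", "ten", "the"]
print(closest("teh", vocab, transpositions=False))  # ten
print(closest("teh", vocab, transpositions=True))   # the
```

Counting transpositions alone only ties “the” with “ten”; some extra signal (frequency here, context in later posts) is still needed to pick the right one.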

Perhaps the best approach would be to combine both methods. Method 1 could cope with the most frequently occurring errors and method 2 could act as a backstop.
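A sketch of that combination, assuming a tiny illustrative vocabulary and table. For the backstop it swaps in Python’s standard `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` similarity ratio rather than by edit distance; the 0.6 cutoff is that function’s default, not a tuned value.

```python
import difflib

# Hypothetical vocabulary and table of frequent misspellings.
VOCABULARY = ["the", "friend", "apples", "receive", "spelling"]
KNOWN = set(VOCABULARY)
MISSPELLINGS = {"teh": "the", "freind": "friend"}


def correct_word(word: str) -> str:
    w = word.lower()
    if w in KNOWN:
        return w                # already a known word
    if w in MISSPELLINGS:
        return MISSPELLINGS[w]  # method 1: table of frequent errors
    # Method 2 (backstop): closest vocabulary word by difflib similarity.
    matches = difflib.get_close_matches(w, VOCABULARY, n=1, cutoff=0.6)
    return matches[0] if matches else word  # nothing close: leave unchanged


print([correct_word(w) for w in ["teh", "freind", "speling", "qqqq"]])
# ['the', 'friend', 'spelling', 'qqqq']
```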

David

 

 
  [ # 2 ]

That was basically the approach I was going to take; it seems to make sense. Why can’t humans just spell correctly!!!!! My engine is very CPU-intensive, evaluating many parse trees, so taxing it further with spell checking is just such a pain!!

I think I will go with your suggestion. The acronym issue is another big pain. I think I will keep acronyms in a separate table that is specific to the field of study. So first it will figure out which words it knows and which it doesn’t. For the words it still doesn’t know after the first pass (using the methods you suggest), it will consult the field-specific acronym list. So if the topic is, say, sports, it will figure out that “O.T.” means overtime.
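A rough sketch of that second pass, assuming spelling correction has already run and that acronyms are stored per topic (the `ACRONYMS` tables, `KNOWN_WORDS`, and `expand_unknowns` are hypothetical). The same token can expand differently depending on the field, which is the argument for keeping the lists topic-specific.

```python
# Hypothetical acronym tables, one per field of study.
ACRONYMS = {
    "sports": {"o.t.": "overtime", "mvp": "most valuable player"},
    "medicine": {"o.t.": "occupational therapy"},
}

# Hypothetical set of words the system already knows after the spelling pass.
KNOWN_WORDS = {"the", "game", "went", "to", "she", "needs"}


def expand_unknowns(words: list, topic: str) -> list:
    """Replace unknown words with the topic's acronym expansion, if there is one."""
    table = ACRONYMS.get(topic, {})
    result = []
    for w in words:
        if w.lower() in KNOWN_WORDS:
            result.append(w)
        else:
            # Unknown and not a listed acronym: keep the word as-is.
            result.append(table.get(w.lower(), w))
    return result


print(expand_unknowns(["the", "game", "went", "to", "O.T."], "sports"))
# ['the', 'game', 'went', 'to', 'overtime']
print(expand_unknowns(["she", "needs", "O.T."], "medicine"))
# ['she', 'needs', 'occupational therapy']
```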

Of course, my solution would be completely different depending on whether the system is trying to pass a Turing test or is meant to be a useful production application. In “the real world” I would simply underline the misspelled word (just like the Chatbots.org text box I’m writing in now does). The user would not be able to submit until the system knows every word in the sentence, so misspelled or unknown words would be a non-issue.
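For the “real world” option, a minimal check that blocks submission while any word is unknown might look like this. The tokenizing regex and the `KNOWN_WORDS` set are simplifying assumptions: a real dictionary would be far larger, and this ignores digits and hyphenated words.

```python
import re

# Hypothetical dictionary; a real one would hold tens of thousands of words.
KNOWN_WORDS = {"i", "am", "writing", "in", "this", "text", "box"}


def unknown_words(sentence: str) -> list:
    """Return the words the system does not know, in the order they appear."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    return [t for t in tokens if t.lower() not in KNOWN_WORDS]


def can_submit(sentence: str) -> bool:
    # Submission is allowed only when every word is known.
    return not unknown_words(sentence)


print(unknown_words("I am writting in this text box"))  # ['writting']
```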

 

 
  [ # 3 ]

@Victor: it gets even worse with speech.

I’ve had surgery on my right hand, so I use speech recognition software. But even in Dutch, my native language, which I speak without a regional accent (ABN = Algemeen Beschaafd Nederlands), apparently my pronunciation isn’t 100% correct. It picks wrong words and sometimes even completely wrong sentences. In English, it’s a real pain in the a*s. You don’t want to know how often I had to say the word ‘scroll’, but that’s probably due to my accent.

On the other hand (grin), the speech program might choose the wrong words, but they are always spelled correctly. So eventually, for speechbots, spelling shouldn’t be a problem any longer.

(this reply took me 10 minutes :-s)

 

 
  [ # 4 ]

Oh my god, I’m not even considering a speech recognition “front end” for my bot… there’s only so much you can do in a LIFETIME! grin

Although perhaps one idea for that would be a “user-specific override”.

A specific user could train it so that it knows this particular audio waveform (with your accent, etc.) maps to this particular word, overriding the “global default” waveform-to-text mapping.
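One way to layer a per-user override over a global default is Python’s `collections.ChainMap`, which looks keys up in each mapping in order and writes only to the first. This sketch works at the level of already-recognized pattern IDs mapped to words; the IDs are hypothetical stand-ins, and matching the actual audio waveform is out of scope here.

```python
from collections import ChainMap

# Hypothetical global default mapping: recognized sound-pattern ID -> word.
GLOBAL_DEFAULTS = {"pattern_17": "school", "pattern_42": "scroll"}


def make_user_lexicon(user_overrides: dict) -> ChainMap:
    # Lookups try the user's overrides first, then fall back to the global defaults.
    return ChainMap(user_overrides, GLOBAL_DEFAULTS)


lexicon = make_user_lexicon({"pattern_17": "scroll"})
print(lexicon["pattern_17"])  # scroll (user override)
print(lexicon["pattern_42"])  # scroll (global default)

# Training adds a mapping for this user only; the global defaults stay unchanged.
lexicon["pattern_99"] = "speech"
print("pattern_99" in GLOBAL_DEFAULTS)  # False
```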

 

 
  [ # 5 ]

Good question, and a good solution. How about a third step based on “meaning”? Say, convert “teh apples” into two options, “the apples” and “ten apples”, then pick the better one according to context and subject?
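A sketch of that idea: expand each unknown word into its candidate spellings, build every candidate sentence, and keep the one whose neighboring word pairs look most plausible. The candidate lists and bigram counts below are invented for illustration, and summing raw counts is a crude stand-in for a proper language-model probability.

```python
from itertools import product

# Hypothetical candidate spellings for words the system does not know.
CANDIDATES = {"teh": ["the", "ten"]}

# Invented bigram counts; a real system would estimate these from a corpus.
BIGRAM_COUNTS = {
    ("ate", "the"): 50, ("ate", "ten"): 5,
    ("counted", "the"): 5, ("counted", "ten"): 60,
    ("the", "apples"): 40, ("ten", "apples"): 8,
}


def readings(words: list) -> list:
    """Every sentence obtainable by choosing one candidate per word."""
    options = [CANDIDATES.get(w, [w]) for w in words]
    return [list(choice) for choice in product(*options)]


def score(words: list) -> int:
    # Crude plausibility: sum the counts of adjacent word pairs.
    return sum(BIGRAM_COUNTS.get(pair, 0) for pair in zip(words, words[1:]))


def best_reading(sentence: str) -> str:
    return " ".join(max(readings(sentence.split()), key=score))


print(best_reading("I ate teh apples"))        # I ate the apples
print(best_reading("she counted teh apples"))  # she counted ten apples
```

The number of readings multiplies with every ambiguous word, so this gets expensive quickly on long, misspelling-heavy input.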

 

 
  [ # 6 ]

Yes, the system could get ‘hints’ from several sources to determine the spelling that makes the most sense: context from the sentence structure, and context from the state of the conversation (for example, does what the computer said just before the user’s reply give us any clues?). It just means more permutations. I’m not going to worry about it too much for now, though, but I appreciate everyone’s comments so far.

 

 
  [ # 7 ]

Right, and knowing the context will help not only with spelling but also with the response… good luck.

 

 
  [ # 8 ]

One way is to swap letters around, much like the swaps in a bubble sort. For example, a user says “Bey” to your bot. The bot then swaps characters until it finds a match:
B > E. Result: Eby. No match.
B > Y. Result: Eyb. No match.
E > Y. Result: Yeb. No match.
E > B. Result: Ybe. No match.
Y > B. Result: Bye. Found match: Bye.
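The letter-swapping idea amounts to searching rearrangements (permutations) of the word’s letters for one that is in the word list. A brute-force sketch with a hypothetical `WORD_LIST` is below. Since a word of n letters has n! orderings, the second function shows a common shortcut (assumed here, not from the post) that indexes the word list by its sorted letters so each lookup is a single dictionary access.

```python
from itertools import permutations
from typing import Optional

# Hypothetical word list.
WORD_LIST = {"bye", "buy", "hello"}


def unscramble(word: str) -> Optional[str]:
    """Try rearrangements of the letters until one is in WORD_LIST; None if none is."""
    for perm in permutations(word.lower()):
        candidate = "".join(perm)
        if candidate in WORD_LIST:
            return candidate
    return None


# Shortcut: words that are rearrangements of each other share the same sorted letters.
ANAGRAM_INDEX = {}
for w in sorted(WORD_LIST):
    ANAGRAM_INDEX.setdefault("".join(sorted(w)), []).append(w)


def unscramble_indexed(word: str) -> list:
    return ANAGRAM_INDEX.get("".join(sorted(word.lower())), [])


print(unscramble("Bey"))          # bye
print(unscramble_indexed("Bey"))  # ['bye']
```

Both only fix letter-order mistakes; a missing or extra letter (like “byee”) still needs one of the similarity methods discussed earlier in the thread.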

Another way is to have your bot ask what the user means when it doesn’t recognise a word, just as Google asks “Did you mean: *** ?” when you search for an unknown phrase.
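A “Did you mean” prompt can be sketched with Python’s standard `difflib.get_close_matches`, which returns up to `n` words whose similarity ratio reaches `cutoff`, best first. The vocabulary is a hypothetical stand-in, and the bot would still need to ask the user and wait for a yes or no.

```python
import difflib

# Hypothetical vocabulary.
VOCABULARY = ["bye", "buy", "hello", "help", "friend"]


def did_you_mean(word: str, n: int = 3):
    """Return a prompt suggesting close matches, or None if the word is known
    or nothing is close enough."""
    w = word.lower()
    if w in VOCABULARY:
        return None
    suggestions = difflib.get_close_matches(w, VOCABULARY, n=n, cutoff=0.6)
    if not suggestions:
        return None
    return "Did you mean: " + ", ".join(suggestions) + "?"


print(did_you_mean("helo"))  # Did you mean: hello, help?
```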

 

 
  [ # 9 ]

Just found a Java spell-checking API; thought you might be interested in it.
JOrtho - a Java spell checking library: http://www.inetsoftware.de/other-products/jortho

 

 
  [ # 10 ]
Victor Shulist - Apr 16, 2010:

Interested in comments on various ways of handling spelling errors.  Spell check algorithms, etc

You mean “typo” versus “tpyo”?

Or “einkommensteuer” (German for income tax) versus “einkomenssteuer”?

t(yp|py)o

einkomm?enss?teuer

As simple as that.
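Richard’s patterns can be tried directly with Python’s `re` module. This sketch (the `PATTERNS` table and `canonical_spelling` are illustrative names) uses `re.fullmatch` so a pattern must cover the whole word; each pattern has to be written by hand for one specific word and its anticipated misspellings.

```python
import re

# Each pattern accepts the correct spelling and the anticipated misspellings of one word.
PATTERNS = {
    "typo": re.compile(r"t(yp|py)o"),
    "einkommensteuer": re.compile(r"einkomm?enss?teuer"),
}


def canonical_spelling(word: str) -> str:
    """Return the canonical word whose pattern matches the whole input, else the input."""
    for canonical, pattern in PATTERNS.items():
        if pattern.fullmatch(word.lower()):
            return canonical
    return word


print(canonical_spelling("tpyo"))             # typo
print(canonical_spelling("einkomenssteuer"))  # einkommensteuer
```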

@David:

Please don’t mix typo handling with synonym handling. It will just give you trouble. Address both, but separately.
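A sketch of keeping the two concerns apart: one hypothetical table for typos and another for synonyms, applied as separate stages so each can be maintained and tested on its own.

```python
# Hypothetical tables, deliberately kept separate.
TYPOS = {"freind": "friend", "teh": "the"}
SYNONYMS = {"little": "small", "pal": "friend"}


def fix_typos(words: list) -> list:
    # Stage 1: repair spelling only.
    return [TYPOS.get(w, w) for w in words]


def map_synonyms(words: list) -> list:
    # Stage 2: map correctly spelled words to canonical ones.
    return [SYNONYMS.get(w, w) for w in words]


def normalize(sentence: str) -> str:
    words = sentence.lower().split()
    return " ".join(map_synonyms(fix_typos(words)))


print(normalize("teh little freind"))  # the small friend
```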


Richard

 

 