AI Zone: chatbots.org


Misspelled words

Posted: Apr 16, 2010

Victor Shulist

Senior member

Total posts: 974

Joined: Oct 21, 2009

E-mail Victor

Interested in comments on various ways of handling spelling errors: spell-check algorithms, etc.

Posted: Apr 17, 2010

[ # 1 ]

David Hamill

Member

Total posts: 20

Joined: Aug 17, 2009

Two methods I’ve used:

1. A table of misspellings mapped to correct spellings, e.g. freind -> friend.

The problem with this is that it’s a never-ending job to update the table. Also, do you include abbreviations etc., or should these be in a separate table? And where do you stop? Do you include things like little -> small?

2. Approximate matching. Each word is checked against a list of words and if it doesn’t match any of them, the most similar word is substituted.

The problem with this is how to define the “most similar” word. One way is to use an edit-distance similarity metric, but this can sometimes give silly results. For instance, teh -> ten, when it’s obviously (to a human) a transposition of “the”.
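The “teh” example can be checked directly. The sketch below is only an illustration, not code from anyone in this thread: `levenshtein` is the classic edit distance, and `osa_distance` is a common variant (optimal string alignment) that also counts swapping two adjacent letters as a single edit. Both function names are made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Levenshtein plus adjacent transpositions (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Two adjacent letters swapped counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(levenshtein("teh", "ten"), levenshtein("teh", "the"))
print(osa_distance("teh", "ten"), osa_distance("teh", "the"))
```

With plain Levenshtein, “ten” is one edit away and “the” is two, which is why “ten” wins. Counting transpositions puts the two candidates level, so something else (word frequency, context) still has to break the tie.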

Perhaps the best approach would be to combine both methods. Method 1 could cope with the most frequently occurring errors and method 2 could act as a backstop.
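A minimal sketch of that combination, under some assumptions: the table and word list are tiny placeholders, the function name `correct` is made up, and the backstop uses Python’s standard `difflib.get_close_matches`, which ranks by a similarity ratio rather than a true edit distance. It stands in for whatever approximate matcher you prefer.

```python
import difflib

# Method 1: hand-maintained table of frequent misspellings (placeholder entries).
MISSPELLINGS = {"freind": "friend", "teh": "the"}

# Words the bot already knows (placeholder list).
VOCABULARY = ["friend", "the", "ten", "apple", "apples", "spelling"]
KNOWN = set(VOCABULARY)


def correct(word: str, cutoff: float = 0.75) -> str:
    """Return a corrected word, or the input unchanged if nothing fits."""
    lower = word.lower()
    if lower in KNOWN:            # already a known word
        return lower
    if lower in MISSPELLINGS:     # method 1: exact table lookup
        return MISSPELLINGS[lower]
    # Method 2: approximate matching as a backstop.
    matches = difflib.get_close_matches(lower, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word


print(correct("teh"), correct("speling"), correct("xyzzy"))
```

Because the table is consulted first, “teh” never reaches the approximate matcher, so the “ten” problem does not arise for errors you have already catalogued.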

David

Posted: Apr 19, 2010

[ # 2 ]

Victor Shulist

Senior member

Total posts: 974

Joined: Oct 21, 2009

E-mail Victor

That was basically the approach I was going to take; it seems to make sense. Why can’t humans just spell correctly!!!!! My engine is very CPU-intensive, evaluating many parse trees. To further tax it with spell checking is just such a pain!!

I think I will go with your suggestion. The acronym issue is another big pain. I think I will keep acronyms in a separate table, and it will be “field of study” specific. So, first it will figure out which words it knows and which it does not. For the words it still does not know (after the first pass using the methods you suggest), it will then consult the field-of-study-specific acronym list. So if the topic is, say, sports, then it will figure “O.T.” is overtime.

Of course, my solution would be completely different depending on whether the system is being used to attempt to pass a Turing test or as a useful in-production application. In “the real world” I would simply underline the misspelled word (just like this chatbots.org text box I’m writing in now does). The user would not be able to submit until the system knows every word in the sentence, so misspelled/unknown words would be a non-issue.
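One way the field-specific acronym pass could look, as a rough sketch only: the table layout, the function name `expand_unknown`, and the non-sports entry are all assumptions made up for illustration. Only “O.T.” = overtime in sports comes from the post above.

```python
# Acronym tables keyed by field of study (illustrative entries only).
ACRONYMS_BY_FIELD = {
    "sports": {"o.t.": "overtime"},
    "medicine": {"o.t.": "occupational therapy"},  # assumed example entry
}


def expand_unknown(tokens, known_words, field):
    """Replace unknown tokens using the acronym table for the current field.

    Known words pass through untouched; unknown tokens with no acronym
    entry are also left as-is so a later step can deal with them.
    """
    table = ACRONYMS_BY_FIELD.get(field, {})
    result = []
    for token in tokens:
        key = token.lower()
        if key in known_words:
            result.append(token)
        else:
            result.append(table.get(key, token))
    return result


known = {"the", "game", "went", "to"}
print(expand_unknown(["the", "game", "went", "to", "O.T."], known, "sports"))
```

Keeping the acronyms in their own per-field table means the same token can expand differently depending on the topic, without touching the misspelling table.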

Posted: Apr 19, 2010

[ # 3 ]

Erwin van Lun

Senior member

Total posts: 971

Joined: Aug 14, 2006

E-mail Erwin1

@Victor: it gets even worse with speech.

I’ve had surgery on my right hand, so I use speech recognition software. But even in Dutch, my native language, which I speak without a regional accent (ABN = Algemeen Beschaafd Nederlands), I apparently don’t pronounce everything 100% correctly. It picks wrong words, and sometimes it even picks totally wrong sentences. In English, it’s a real pain in the a*s. You don’t want to know how often I had to say the word ‘scroll’, but that’s probably due to my accent.

On the other hand, the speech program might choose the wrong words, but they are always spelled correctly. So eventually, when creating speechbots, misspelling shouldn’t be a problem any longer.

(this reply took me 10 minutes :-s)

Posted: Apr 20, 2010

[ # 4 ]

Victor Shulist

Senior member

Total posts: 974

Joined: Oct 21, 2009

E-mail Victor

Oh my god, I’m not even considering a speech recognition “front end” to my bot... only so much you can do in a LIFETIME!

Although, perhaps, an idea for that would be a “user specific over-ride”.

So a specific user could train it, so that it would know that a specific audio waveform (with your accent etc.) maps to a specific word, which would override the “global default” waveform-to-text mapping.

Posted: Apr 20, 2010

[ # 5 ]

John Li

Member

Total posts: 22

Joined: Apr 20, 2010

E-mail John

Good question and solution. How about a third point: “meaning”? Say, convert “teh apples” into two options, “the apples” and “ten apples”, then pick the best according to context and subject?
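A toy sketch of picking between candidates by context. Here “context” is reduced to the next word only, scored with word-pair counts. The counts are invented purely for illustration (in practice they would come from a corpus or from the conversation state), and `pick_by_context` is a hypothetical name.

```python
# Made-up word-pair counts, for illustration only.
BIGRAM_COUNTS = {
    ("the", "apples"): 120,
    ("ten", "apples"): 15,
    ("the", "minutes"): 30,
    ("ten", "minutes"): 90,
}


def pick_by_context(candidates, next_word):
    """Return the candidate that most often precedes next_word."""
    return max(candidates, key=lambda c: BIGRAM_COUNTS.get((c, next_word), 0))


print(pick_by_context(["the", "ten"], "apples"))
print(pick_by_context(["the", "ten"], "minutes"))
```

The same misspelling can resolve to different words depending on what follows it, which is exactly what a pure edit-distance match cannot do.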

Posted: Apr 22, 2010

[ # 6 ]

Victor Shulist

Senior member

Total posts: 974

Joined: Oct 21, 2009

E-mail Victor

Yes, the system could get ‘hints’ from several sources to determine the spelling that makes the most sense: context provided by the sentence structure, and context provided by the state of the conversation. For example, does what the computer said just before the user’s input give us any clues? It just means more permutations. I’m not going to worry about it too much for now, but I appreciate everyone’s comments so far.

Posted: Apr 26, 2010

[ # 7 ]

John Li

Member

Total posts: 22

Joined: Apr 20, 2010

E-mail John

Right, and knowing the context will be helpful not only for spelling but also for the response... good luck.

Posted: May 14, 2010

[ # 8 ]

Roy van Berkum

Member

Total posts: 7

Joined: May 5, 2010

E-mail Roy

One way is to use bubble-sort-style swapping. For example, a user says “Bey” to your bot. The bot then swaps neighbouring characters, one pair at a time, until it finds a match:
B > E. Result: Eby. No match.
B > Y. Result: Eyb. No match.
E > Y. Result: Yeb. No match.
E > B. Result: Ybe. No match.
Y > B. Result: Bye. Found match: Bye.
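A small sketch in the spirit of the swapping example. It is not a bubble sort as such: it simply tries every reordering of the letters and returns the first one found in the word list. The name `match_by_reordering` and the `max_len` guard are assumptions added here, since the number of reorderings grows factorially with word length.

```python
from itertools import permutations


def match_by_reordering(word, vocabulary, max_len=7):
    """Return a known word that uses the same letters, or None.

    Only practical for short words: n letters have n! orderings.
    """
    lower = word.lower()
    if len(lower) > max_len:
        return None
    for perm in permutations(lower):
        candidate = "".join(perm)
        if candidate in vocabulary:
            return candidate
    return None


print(match_by_reordering("Bey", {"bye", "hello"}))
```

For larger vocabularies, a cheaper variant is to index every known word by its sorted letters once, then look up the sorted letters of the input. Either way, this only catches reordered letters, not missing or extra ones.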

Another way is to have your bot ask what the user means when it doesn’t recognise a word, like Google does when you search for an unknown phrase and it asks “Did you mean: *** ?”

Posted: May 14, 2010

[ # 9 ]

Roy van Berkum

Member

Total posts: 7

Joined: May 5, 2010

E-mail Roy

Just found a Java spelling API, thought you might be interested in it.
JOrtho - a Java spell checking library: http://www.inetsoftware.de/other-products/jortho

Posted: Jun 3, 2010

[ # 10 ]

Richard Jelinek

Experienced member

Total posts: 42

Joined: Oct 16, 2009

E-mail Richard

Victor Shulist - Apr 16, 2010:
Interested in comments on various ways of handling spelling errors: spell-check algorithms, etc.

You mean “typo” versus “tpyo”?

Or “einkommensteuer” versus “einkomenssteuer”?

t(yp|py)o

einkomm?enss?teuer

as simple as that
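The two patterns above can be tried out with Python’s standard `re` module. This is just a sketch: the `PATTERNS` table and the `normalize` function are made-up names, and mapping each pattern to a canonical spelling is an assumption about how the patterns would be used.

```python
import re

# Canonical spelling -> pattern covering its tolerated variants.
PATTERNS = {
    "typo": re.compile(r"t(yp|py)o"),
    "einkommensteuer": re.compile(r"einkomm?enss?teuer"),
}


def normalize(word):
    """Map a word to its canonical spelling if any pattern matches it fully."""
    lower = word.lower()
    for canonical, pattern in PATTERNS.items():
        if pattern.fullmatch(lower):
            return canonical
    return word


print(normalize("tpyo"), normalize("einkomenssteuer"), normalize("hello"))
```

`fullmatch` is used so that a pattern only fires on the whole word, not on a fragment inside a longer one.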

@David:

Please do not mix typo handling with synonym handling. It will just give you trouble. Address them both, but separately.

Richard
