Posted: Feb 15, 2011
[ # 16 ]
Senior member
Total posts: 974
Joined: Oct 21, 2009
Steve Worswick - Feb 14, 2011:
Unfortunately, if I turned everyone away who spoke like that, I would have a fraction of the chatlogs I have now. Keeping up with teen talk is a necessary evil and so I can’t place too much importance on spelling and grammar.
For me, quality, not quantity
Yeah, so it depends on your target audience for sure. My bot won’t care, it will just say, I don’t know what “m8” is.
Well, first version anyway. Later it will do spell check and ask: "m8: do you mean 'mate'?" or something.
And if they don’t like it, I couldn’t care less if they leave and don’t come back
For me, I'm not going to go into "Eliza" mode, picking out only the words I *do* know from the bunch of misspelled crap. Too high a price to pay to satisfy stupid kids.
I know a lot of teenagers, and they spell as well as I do. Those are the ones that can speak with my bot.
Because seriously, I know people that check their chatbot logs - and the kid that can't put a little effort into typing his words, and doesn't care, is basically only there to "screw around" with your bot, and ends up saying stupid stuff like "you're gay".
So, for me, my audience is going to be people that WANT to talk to the bot, and are interested in it (and have a brain).
Sorry for the rant
Posted: Feb 15, 2011
[ # 17 ]
Senior member
Total posts: 971
Joined: Aug 14, 2006
Posted: Feb 15, 2011
[ # 18 ]
Administrator
Total posts: 2048
Joined: Jun 25, 2010
Please feel free to grab it if it’s any use.
My main chatbot is on an internet games site and so I don't get many visitors asking about "case based reasoning" and "natural language processing". It's usually people asking her if she is gay and if she will sleep with them. Ah well, give them what they want, I suppose.
Posted: Feb 15, 2011
[ # 19 ]
Senior member
Total posts: 974
Joined: Oct 21, 2009
Steve Worswick - Feb 14, 2011: awrite m8, howz u 2nite?
I’m interested, as a human being, what do you make of that?
Looking at it, I can convert the “howz u 2nite?” to “how are you tonight?”
In my bot, I will support “u” = “you” and “2nite” = “tonight”, that’s not bad,
but what about the "awrite m8"? You have to wonder about the mental state of the author. I can't make sense of it, and I'm a human.
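Still, the "u"/"2nite" part is mechanical. A rough sketch of that substitution step in Python (the table contents and function name are invented for illustration, not anyone's actual bot code):

```python
import re

# Illustrative chat-speak substitution table (made up for this example).
CHATSPEAK = {"u": "you", "2nite": "tonight", "howz": "how are", "m8": "mate"}

def normalize(sentence: str) -> str:
    """Expand known chat-speak tokens; leave everything else untouched."""
    return re.sub(r"[A-Za-z0-9']+",
                  lambda m: CHATSPEAK.get(m.group(0).lower(), m.group(0)),
                  sentence)

print(normalize("howz u 2nite?"))  # how are you tonight?
```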
Steve Worswick - Feb 14, 2011:
u dum bot
Funny thing is, you'd have to do a spell check just to know your bot was being insulted, before you could counter the insult.
Steve Worswick - Feb 14, 2011:
I once made a list of how many different ways my users had said FAVORITE/FAVOURITE:
FAFORITE
FAOURITE
FAOURTIE
...
...
So I believe those examples probably should be handled by the bot. Again, within reason. I am not worrying about spell check for now, not until I am done with the core and most important things. But if someone types in a lengthy 20+ word sentence and only gets that one word wrong, we should be forgiving and ask "Did you mean _____?"
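For that "Did you mean _____?" step, a fuzzy match against the bot's own vocabulary is often enough. A minimal sketch using Python's standard difflib module (the vocabulary here is made up):

```python
import difflib

# Hypothetical vocabulary; in practice this would be the bot's own word list.
KNOWN_WORDS = {"what", "is", "your", "favourite", "favorite", "movie", "colour"}

def did_you_mean(word, cutoff=0.75):
    """Return the closest known word, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word.lower(), KNOWN_WORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(did_you_mean("faforite"))  # favorite
```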
Posted: Feb 15, 2011
[ # 20 ]
Experienced member
Total posts: 61
Joined: Jan 2, 2011
Dave Morton - Feb 15, 2011: If I were to keep “up to date” with just the substitutions/spell-check list, I would be spending at least a couple of hours every day. Perhaps I should write a script that gathers up words, and then maps unknown words to the “most likely” correct spelling. Hmmm…
I have a feeling the word “most” will appear in front of the word “favorite” in many sentences.
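The log-scanning script Dave muses about in the quote could start out very simply: tally every word that isn't in the bot's dictionary, so the most frequent offenders surface first. A sketch with a placeholder dictionary and made-up log lines:

```python
import re
from collections import Counter

KNOWN_WORDS = {"what", "is", "your", "most", "favorite", "movie"}  # placeholder dictionary

def unknown_word_counts(log_lines):
    """Count every word in the logs that the bot doesn't already know."""
    counts = Counter()
    for line in log_lines:
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            if word not in KNOWN_WORDS:
                counts[word] += 1
    return counts

logs = ["what is your faforite movie", "what is ur most faforite colour"]
print(unknown_word_counts(logs).most_common(3))
# [('faforite', 2), ('ur', 1), ('colour', 1)]
```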
Posted: Feb 15, 2011
[ # 21 ]
Senior member
Total posts: 494
Joined: Jan 27, 2011
This discussion again shows the ‘trap’ of using ‘grammatical parsing’ as the base for an AI-system. There is no end to all the little grammatical deviations you must be able to handle. You will have to keep on adding more logic to your parser.
Victor Shulist - Feb 15, 2011: But if someone types in a lengthly 20+ word sentence and only gets that word wrong, we should be forgiving and ask “Did you mean _____?”
My system would answer such a sentence (with that ‘one wrong word’) with ‘I understand’, because it would be able to comprehend the ‘meaning’ even with misspelled words, and then just formulate a response.
Posted: Feb 15, 2011
[ # 22 ]
Administrator
Total posts: 3111
Joined: Jun 14, 2010
Hans Peter Willems - Feb 15, 2011: This discussion again shows the ‘trap’ of using ‘grammatical parsing’ as the base for an AI-system…
My system would answer such a sentence (with that ‘one wrong word’) with ‘I understand’, because it would be able to comprehend the ‘meaning’ even with misspelled words, and then just formulate a response.
That's correct, but in Victor's case, a database of commonly misspelled words can be accessed as a "first step" toward resolving an unknown word, and if found, replaced with the correct spelling. If the unknown word isn't found in the spell checking routine, then it can ask for a correction or definition. That's a rather simple (though potentially time consuming) step to take. And Victor has it easy, because he doesn't have to run every word through the routine; only the ones that CLUES can't figure out. Morti has to run the substitution engine first, which means that every word is checked, and that's going to take some time. I think I'll make up an array of the 100 to 500 most common words in Morti's pattern database (input side), and compare the words in the input to see if they exist in the array. That will make things a LOT faster, I think.
I found out that Wikipedia has a list of commonly misspelled words, intended to be used to correct typos within the Wiki pages. This list is nice, because it's one of very few that I've seen that provides the misspellings, along with the proper spelling/spellings. And the list is long, too. I've got a total of 5,023 words that I'm combing through, to see if there are any that I can add to Morti's substitution engine/spell checker. It's going to be a lot of work, but I think it's going to be well worth it.
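That two-stage check (skip very common words, then consult a misspelling-to-correction table such as one built from Wikipedia's list) could look roughly like this; both tables below are tiny stand-ins for the real data:

```python
COMMON_WORDS = {"the", "a", "is", "you", "what", "your"}        # fast-path whitelist
MISSPELLINGS = {"faforite": "favorite", "beleive": "believe"}   # misspelling -> correction

def correct_word(word):
    """Whitelist check first, then the misspelling table, else leave the word alone."""
    w = word.lower()
    if w in COMMON_WORDS:      # very common word: skip the expensive checks
        return word
    if w in MISSPELLINGS:      # known misspelling: substitute the correction
        return MISSPELLINGS[w]
    return word                # unknown: the caller can ask for clarification

print(" ".join(correct_word(w) for w in "what is your faforite movie".split()))
# what is your favorite movie
```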
Posted: Feb 15, 2011
[ # 23 ]
Senior member
Total posts: 494
Joined: Jan 27, 2011
Dave Morton - Feb 15, 2011: That’s correct, but in Victor’s case, a database of commonly misspelled words can be accessed as a “first step” toward resolving an unknown word, and if found, replaced with the correct spelling. If the unknown word isn’t found in the spell checking routine, then it can ask for a correction or definition.
Indeed, and this is actually not much different from the approach that I'm taking. The difference is in the underlying 'mind-model' that executes this logic. Working from grammar, it seems logical to write a 'spell-checking' algo to handle misspellings. However, in my own model there is no such thing; it handles a misspelling like any other 'representation' of a concept (including 'tags' in relation to experience). So the handling up front (check for existence, if not ask for clarification) is similar, but the underlying paradigm is totally different.
Posted: Feb 15, 2011
[ # 24 ]
Senior member
Total posts: 623
Joined: Aug 24, 2010
I'm planning to use Damerau-Levenshtein distances as a first step to correct misspelled/mistyped words. And it seems not too difficult for the bot to compose its own list of commonly misspelled words if it interacts with users and asks if its correction guesses are right.
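Damerau-Levenshtein distance counts insertions, deletions, substitutions and adjacent transpositions. A sketch of the restricted variant (optimal string alignment), which is usually enough for ranking correction candidates:

```python
def osa_distance(a, b):
    """Optimal-string-alignment distance: Levenshtein plus adjacent transpositions
    (the restricted variant of Damerau-Levenshtein)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]

print(osa_distance("faourtie", "favourite"))  # 2: one insertion plus one transposition
```

For a "did you mean" feature you would compute this against candidate words and keep only those within a distance of 1 or 2.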
Posted: Feb 15, 2011
[ # 25 ]
Experienced member
Total posts: 61
Joined: Jan 2, 2011
The more I think about it, the more I realize that correcting spelling is a good measure of a bot's intelligence, provided the correction is done in a non-mechanical way. If the bot understands the meaning of the sentence, there's a good chance it can correct the spelling.
Posted: Feb 15, 2011
[ # 26 ]
Senior member
Total posts: 494
Joined: Jan 27, 2011
Toby Graves - Feb 15, 2011: If the bot understands the meaning of the sentence there’s a good chance it can correct the spelling.
My model is aimed at this exactly: it will (at some point) understand the meaning of the sentence even if it is misspelled. So it will 'correct the spelling' in an 'understanding way'. There is no need to actually 'correct' anything (as in giving feedback to the user, unless the AI is itself a teacher); the AI will already understand what was meant, and that some things were misspelled.
Based on my model the AI will not only be able to ‘understand’ misspelling, but also be able to translate freely (as in: not based on grammar but based on understanding) between languages.
Posted: Feb 15, 2011
[ # 27 ]
Senior member
Total posts: 974
Joined: Oct 21, 2009
Hans Peter Willems - Feb 15, 2011: This discussion again shows the ‘trap’ of using ‘grammatical parsing’ as the base for an AI-system.
On the contrary, my good sir: grammar will actually give CLUES all kinds of hints as to what the correct spelling of a word could be. You see, grammar feeds into the semantic system, which can provide hints as to the context of the word.
Later, when I write the spell check algorithm (or, more likely, just research existing ones, since that's been done long ago), the hints from the semantic inference (which is powered by grammar) will let the system know which word is most likely the one the user wanted.
If we have several possible words the user could have meant, but no context, it is pretty much a guess. But when grammar provides semantic context suggestions, the choice is much easier.
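A toy illustration of that tie-breaking idea (the lexicon and function are invented; CLUES itself works from full parse trees rather than a lookup like this):

```python
# If "raed" is one edit away from both "read" and "red", the part of speech
# the grammar expects at that slot can decide between them.
LEXICON = {"read": "verb", "red": "adjective"}

def rank_candidates(candidates, expected_pos):
    """Put candidates whose part of speech matches the expected slot first."""
    return sorted(candidates, key=lambda w: LEXICON.get(w) != expected_pos)

# "I will raed the book": after "will" the grammar expects a verb, so "read" wins.
print(rank_candidates(["red", "read"], expected_pos="verb"))  # ['read', 'red']
```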
@Dave - CLUES will first generate the parse trees. Misspelled words (which are really the same as unknown words) are marked as being nouns (as a first guess), but also marked as "assumed to be nouns". Later, it will consider, based on context, whether that unknown word is perhaps a verb. I want to handle unknown words, because otherwise it cannot learn via NLP. For example, if it has never seen "France", I want to be able to enter:
“France is a country in Europe”
France will be marked as “assumed to be a noun”. Then, if the particular code that ends up dealing with that statement doesn’t require the subject to be a known noun, it will update its database. CLUES knows what “is a country in Europe” means (no, not atomically, it figures out the grammar and semantic tree), thus it knows to update its database so that next time it knows “France” is a noun (and of course a country in Europe). So later if you asked “List the countries in Europe” it would be included in the response.
Then, knowing what France means, it could learn
“Paris is a city in France”
the same way
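As a toy sketch of that learning behaviour (not CLUES itself, which derives this from grammar and a semantic tree; the "X is a <category> in <place>" pattern below is an invented stand-in):

```python
import re

facts = []           # (entity, category, container) triples
known_nouns = set()  # words promoted from "assumed noun" to known noun

def learn(sentence):
    """Learn from a statement of the form 'X is a <category> in <place>'."""
    m = re.match(r"(\w+) is a (\w+) in (\w+)", sentence)
    if m:
        entity, category, container = m.groups()
        known_nouns.add(entity)  # the unknown subject is now a known noun
        facts.append((entity, category, container))

def list_in(category, container):
    """Answer queries like 'List the countries in Europe'."""
    return [e for e, c, p in facts if c == category and p == container]

learn("France is a country in Europe")
learn("Paris is a city in France")
print(list_in("country", "Europe"))  # ['France']
print(list_in("city", "France"))     # ['Paris']
```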
Posted: Feb 15, 2011
[ # 28 ]
Senior member
Total posts: 494
Joined: Jan 27, 2011
Victor Shulist - Feb 15, 2011: the hints from the semantic inference (which is powered by grammar), the system will know which word is more likely the one the user wanted.
But this is still only grammatical inference based on how a 'correct sentence' is assembled. Your bot still won't gain any 'understanding' of what the user is trying to say. It can only calculate 'the sentence', because that is what grammar is: calculating the correctness of a sentence based on fixed rules.
Example: when the user puts in 'I don't want to talk with you right now', based on grammar alone there is no way for a bot to figure out what the correct response to that should be. For a bot to be able to respond correctly to that, it must have an understanding of what that means to the user himself. No grammar-based model is going to help you there.
Posted: Feb 15, 2011
[ # 29 ]
Experienced member
Total posts: 66
Joined: Feb 11, 2011
Well, I cannot cast stones:
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a total mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Amzanig huh?
Posted: Feb 15, 2011
[ # 30 ]
Senior member
Total posts: 974
Joined: Oct 21, 2009
Again, I will state, yes it is important to have spell check. But let’s do first things first, core logic.
The rest is “layers and layers” of processing.
Spell check for me is later on.
Good example though!