Very well put Thunder Walk!
But we wouldn’t say the AI fails if it doesn’t understand a new way of someone saying something, would we? I mean, would a human child learning language understand every possible way things could be said? Carrying a conversation, and learning via conversation the new synonyms and sentence structures that mean the same thing, would be the ultimate goal of NLP. I can imagine how difficult a time a young child would have parsing and understanding the sloppy and confusing English I see on some people’s Facebook walls!
The way I am handling all these complexities in my design is with what I’m calling “permutation wrappers”. I will first complete a bot that can understand language and expression, one that can derive a common meaning regardless of which synonyms are used and regardless of what sentence structure is used. Then, if the bot fails to derive the meaning, a permutation wrapper will start replacing different words. It could try replacing common prepositions, perhaps replacing ‘of’ with ‘from’, then test again: does the input generate a parse tree that makes sense now? Yes? Go to the next stage of processing. No? Try another permutation wrapper, which might try different punctuation marks. Still doesn’t make sense? OK, the next permutation wrapper handles misspelled words.
Another permutation wrapper I have to work on is the apostrophe problem.
X’s Y
Should that expand to:
a) X is Y. Example: if X = John and Y = smart, then X’s Y = “John is smart”.
OR
b) Stay as is. Example: if X = John and Y = car, then X’s Y = “John’s car”.
But we don’t want “John’s car” to expand to “John is car”.
So basically I am working on a CORE functionality where, yes, you will have to use proper English, proper words (teach / learn), correct spelling, etc.
THEN, layer by layer, add on the “pre-processing” stages to run the input through the algorithm multiple times with different words replaced (‘learning’ doesn’t generate any meaningful parse trees? OK, people often incorrectly use ‘learning’ instead, so try the substitution and parse again).
The CPU / computing power you will require will depend on how tolerant your system is to incorrect usage. For my system, CLUES, I believe it will be fast enough even on a single thread with proper English. I will know within the next 1-3 years (when I have, say, 10 layers of permutation wrappers / pre-processing) what it will be like. I’m not really too worried about it, though; now that I have taken the time to convert the engine to C++, I can make use of multi-threading (Perl multi-threading really let me down!).
As a last note, consider a comparison of a calculator with an NLP engine. It is as if I were to enter, say, 10 + 10, and when the calculator says 20, I say WRONG!! FAIL!! I “misspelled” the first 10, I *mean* 100, your system didn’t know that, so it fails. That is about how unfair NLP is!! Or perhaps every single word in a sentence is misspelled or has some vernacular definition the system didn’t know about.
I say we shouldn’t worry about those things when first building a chat bot. Have it learn common, normal, proper English first. Then add on the layers of pre-processing (permutation wrappers) one by one, and cross your fingers you have enough horsepower to run it!! Perhaps on 10,000 machines in cloud computing. Watson, for example, has 2,880 cores!
Sorry if there are errors in the above; please point them out, but I don’t have the time for proofreading. Short day, and I’m dying to get back to coding!!