Hi,
I am not an expert in RiveScript, but I faced this problem before, with an early AIML system about 10 years ago.
I wanted to make a Spanish chatbot, and the real problem was not only the spell-checking: the whole scheme of simple pattern matching collapses on any highly inflected language like Spanish, French, Italian, Portuguese, etc.
Each word in those languages has too many forms, so a simple pattern fails (lots of misses, or you are forced to write huge numbers of patterns, too many for the system to stay responsive...) and the combinational power of AIML reduction systems goes to hell!
Therefore I decided to build my own dictionary, capable of dealing with Spanish (and other languages') inflections. But the way Spanish is used, with so many compound verbs and so many adjectives inflected for gender, number, etc., forced me to walk away from simple pattern-matching schemas. So I decided to design a new system; at that time there was no ChatScript or RiveScript (or at least I had not met them), and I began to build a whole new concept: a semantic + syntactic pattern matcher. Also, the dialog flow of AIML was not very controllable, and its handling of conversation flow was too weak for me.
The inflection-capable dictionary worked fine! But several problems remained. Spanish has many diacritics (written accents and marks) and people often omit them, so the pattern-matching capability was severely undermined by simple spelling errors, which are very common among chat users. So I took on a second challenge: a Spanish one-shot spelling corrector. (I say one-shot because Word, for example, only corrects by giving you a list of possible words, and in a chatbot you are not there to select one!) Also, MS Word spells words too slowly: I tried to use it as a backend through its COM model and was severely disappointed, because the spelling speed was only 2-3 words per second and the precision was less than 60%. Getting 2 of 3 words corrected while spending 3 seconds on each word is useless!
Then, as a second stage, I set out to build a Spanish spell-correction system capable of correcting at least 90% of the input word stream while staying usefully fast; at least 10 words/second was my goal!
After 4 years of hard work, including my thesis on NLP at the Electronics Engineering department of the University of Buenos Aires, done with two PhD engineers and a recognized PhD linguist, plus a great deal of research and many international research publications, I succeeded (I guess) with a fair spell corrector (not a spell checker).
It is now capable of fixing severe spelling errors in context (close-syntactic, not yet semantic) at a speed of up to 900 words/second, about 300 times faster than Word, and with a precision of at least 98% on the most common spelling errors such as missing diacritics. It also handles multiple errors in the same word by making sound-alike (phonetic) corrections. As an example, you may type (in Spanish) VAYEMA meaning BALLENA: 67% of the characters are misspelled, including an insertion, yet the system does not hesitate to give the unique correct answer.
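To give a rough idea of the sound-alike mechanism, here is a toy sketch in Python (this is only an illustration with made-up phonetic rules and a tiny vocabulary, not my actual engine): collapse common Spanish homophone spellings into a phonetic key, then pick the dictionary word whose key is closest by edit distance.

```python
import unicodedata

def phonetic_key(word):
    """Collapse common Spanish homophone spellings into one key."""
    w = word.lower()
    # Strip diacritics, since users often omit written accents.
    w = ''.join(c for c in unicodedata.normalize('NFD', w)
                if unicodedata.category(c) != 'Mn')
    for src, dst in [('ll', 'y'), ('qu', 'k'), ('v', 'b'),
                     ('z', 's'), ('ce', 'se'), ('ci', 'si'), ('h', '')]:
        w = w.replace(src, dst)
    return w

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct(word, vocabulary):
    """Pick the vocabulary word whose phonetic key is closest."""
    key = phonetic_key(word)
    return min(vocabulary, key=lambda v: edit_distance(key, phonetic_key(v)))

vocab = ["ballena", "vajilla", "bella", "valija", "camina"]
print(correct("vayema", vocab))  # -> ballena
```

Note how "vayema" and "ballena" share the keys "bayema" / "bayena", so even with most characters misspelled the phonetic distance is only 1.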
As a side effect, the system is capable of correcting well-written words. What does that mean? If a word is well written but does not fit its syntactic slot, and another word, reachable through a spelling error, is the best candidate for that place, the system changes it, no doubt about it. For example: "el komia carrne ezta manana" -> "él comía carne esta mañana", in a breeze!
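The contextual part can be sketched with a toy bigram scorer (again, just an illustration with invented counts and a tiny confusion table, not my real close-syntactic model): each input word expands to a set of candidate spellings, and the sequence with the best bigram score wins.

```python
from itertools import product

# Toy bigram counts standing in for a syntactic language model.
BIGRAMS = {("el", "comia"): 5, ("comia", "carne"): 4,
           ("carne", "esta"): 3, ("esta", "manana"): 4}

# Candidate spellings per input word (the word itself + variants).
CANDIDATES = {"komia": ["komia", "comia"],
              "carrne": ["carrne", "carne"],
              "ezta": ["ezta", "esta"]}

def score(words):
    """Sum bigram counts over adjacent word pairs."""
    return sum(BIGRAMS.get(pair, 0) for pair in zip(words, words[1:]))

def correct_sentence(sentence):
    """Pick the candidate sequence with the best bigram score."""
    options = [CANDIDATES.get(w, [w]) for w in sentence.split()]
    return " ".join(max(product(*options), key=score))

print(correct_sentence("el komia carrne ezta manana"))
# -> el comia carne esta manana  (accents omitted in this toy model)
```

The same machinery would also swap out a correctly spelled word whenever a spelling variant of it scores better in its slot.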
I designed a Dialog Description Language, acronymed DDL, a whole new concept. It is like AIML but not an XML-like pattern-matching script; instead of that crap I designed a whole new language, defining a regular syntax and afterwards building a full-fledged compiler (not an interpreter) for speed. It has over 100 built-in linguistic functions, is extensible, can include any .NET code, and is aimed at high availability and speed. It is so robust that you can change parts of a compiled and running program on the fly, dynamically recompiling all internal links, like no other computer language I know. (The main idea was taken from web frameworks, where you drop in a new source file and the system recompiles it and puts it to work.)
After 3 years, the system includes many state-of-the-art semantic matching algorithms, including SVM + LSA methods for matching whole bunches of text, training them on the fly (as you drop new text into the repository), and allowing any SQL-selected data from any database to be accessed as a single pattern, with phonetic matching and spell correction included. We also built a question classifier capable of extracting the subject and matter of any question, even with ill-formed grammar; the F-score of this multi-class classifier is over 88%, close to 100% for persons, times, and places, failing only on common objects vs. matters.
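My matcher uses SVM + LSA; as a simpler stand-in you can get the flavor with plain TF-IDF cosine matching (LSA adds a singular value decomposition on top of these vectors), sketched here with an invented three-document repository:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors for a list of text documents."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))
    n = len(docs)
    return [{t: tf[t] * math.log(n / df[t]) for t in tf}
            for tf in (Counter(doc) for doc in tokenized)]

def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(query, docs):
    """Return the repository document most similar to the query."""
    vecs = tfidf_vectors(docs + [query])
    qv = vecs[-1]
    scores = [cosine(qv, v) for v in vecs[:-1]]
    return docs[scores.index(max(scores))]

repo = ["la ballena es un mamifero marino",
        "el gato come carne todos los dias",
        "los precios suben cada manana"]
print(best_match("que come el gato", repo))
# -> el gato come carne todos los dias
```

Training "on the fly" then amounts to rebuilding (or incrementally updating) the vectors whenever new text lands in the repository.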
to be continued…