Well, I certainly don't know all of the specifics, as most of that work is done by my linguistics and computer science collaborators, but we/they have been trying to use machine learning (ML) to help with our virtual patient dialogues for the past several years. Using ML by itself does not work well because of the sparse-data problem: we only have a few hundred dialogues with a few thousand (<15,000) questions, and ML generally needs far more than that to be any good. So we have looked at ways to combine ChatScript (CS) with ML to see if we could improve our accuracy.
As we examined the questions missed by both systems, we found that the CNN (we currently use a word- and character-based CNN classifier but have tested others) was bad at rare questions (no surprise) but good at questions with mangled words. There are other pieces to the model (for example, deciding whether to suggest a match when CS does not match anything), but one of the biggest gains was on horribly misspelled words. Based on these data, we built a classifier that chooses either CS or the CNN based on the probability that each one is correct.
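To make the idea concrete, here is a minimal sketch of such a chooser, assuming a plain logistic regression over a few hand-picked confidence features. The features, toy data, and threshold are all illustrative, not our actual model:

```python
# Hypothetical sketch of a CS-vs-CNN chooser: a logistic regression over
# simple confidence features. Feature set and training rows are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per student question: [CS matched a rule (0/1), CNN top-class
# probability, question length in tokens].
# Label: 1 if the CNN's answer was correct, 0 if ChatScript's was.
X = np.array([
    [1, 0.42, 7],   # CS matched, CNN unsure -> trust CS
    [0, 0.91, 5],   # CS found nothing, CNN confident -> trust CNN
    [1, 0.97, 4],   # both plausible; the training label decides
    [0, 0.30, 9],
])
y = np.array([0, 1, 1, 0])

chooser = LogisticRegression().fit(X, y)

def pick_system(cs_matched: bool, cnn_prob: float, n_tokens: int) -> str:
    """Return which system's answer to send to the student."""
    p_cnn = chooser.predict_proba([[int(cs_matched), cnn_prob, n_tokens]])[0, 1]
    return "CNN" if p_cnn >= 0.5 else "ChatScript"
```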
The classifier is part of a web services back end which gets the input from the student, routes it to CS, gets the response from CS, asks CS :why to get the corresponding label, and decides whether CS is likely correct or whether the CNN is likely correct. If CS is chosen then the original answer is provided to the student. If CNN is chosen then the system sends that label (as the canonical form of the question) back to CS and gets the corresponding answer and sends it back to the student.
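For anyone who wants the shape of that round trip, here is a self-contained toy sketch. The dictionaries and helper functions are hypothetical stand-ins (the real back end talks to a running CS server and a trained CNN over web services), but the control flow is the same:

```python
# Toy stand-in for ChatScript: canonical question -> answer, plus the
# label that :why would report for a matched input.
CS_ANSWERS = {"how long has the pain lasted": "About two days now."}

def ask_chatscript(text: str) -> str:
    return CS_ANSWERS.get(text.lower(), "")          # "" = no rule matched

def chatscript_why(text: str) -> str:
    return "pain_duration" if text.lower() in CS_ANSWERS else "no_match"

# Toy stand-in for the CNN: returns (predicted label, confidence).
def cnn_classify(text: str):
    has_pain = "pain" in text or "pian" in text
    return ("pain_duration", 0.93) if has_pain else ("no_match", 0.10)

CANONICAL = {"pain_duration": "how long has the pain lasted"}

def answer_student(question: str) -> str:
    cs_answer = ask_chatscript(question)
    cs_label = chatscript_why(question)
    cnn_lbl, cnn_conf = cnn_classify(question)
    # Stand-in chooser: prefer the CNN when CS found nothing or the CNN
    # is very confident (the real chooser is the trained classifier above).
    if cs_label == "no_match" or cnn_conf > 0.9:
        # Resubmit the canonical form so CS produces the answer for that label.
        return ask_chatscript(CANONICAL.get(cnn_lbl, question))
    return cs_answer

print(answer_student("how long has the pian lasted"))  # misspelled -> CNN route
```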
It is a roundabout approach, I know (we were in a hurry to get this implemented), and we are working on making it more efficient, but the drag on the system is the chooser, not CS. CS is so fast that it matches the input, provides the answer, answers :why, and (if necessary) responds again in a quarter of the time it takes the chooser to decide. The other drawback is that resubmitting the canonical form of the question does not ALWAYS result in the correct match, but we can solve that pretty easily.
We are currently working on paraphrasing and memory-augmented models to combat the sparse-data problem, as well as on ways to generate more training data, such as collecting all patterns that match an input, translating questions from English to other languages and back, and even putting a version of our Virtual Patient in the local Center of Science and Industry to get random schoolchildren to ask him questions. What could go wrong with that? The problem with all of these is that some human generally has to decide whether the responses/matches are correct, and that human is me.
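The round-trip-translation trick, for the curious, can be sketched in a few lines with the Helsinki-NLP MarianMT models on Hugging Face. The model names are real, but treating this as our exact augmentation pipeline is an assumption:

```python
# Back-translation sketch: English -> French -> English to generate a
# paraphrase of a training question.
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tok(texts, return_tensors="pt", padding=True)
    out = model.generate(**batch)
    return [tok.decode(t, skip_special_tokens=True) for t in out]

question = ["How long has the pain lasted?"]
french = translate(question, "Helsinki-NLP/opus-mt-en-fr")
paraphrase = translate(french, "Helsinki-NLP/opus-mt-fr-en")
print(paraphrase)  # a (hopefully) reworded version of the original question
```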
Probably more than you wanted to know, but I hope this helps,
Doug