I have used ternary search tries (TSTs) for years, and they are integrated into my own code. They are robust for word lookup, but not for pattern matching: if you use them for that, you need to build a new TST for each pattern, which makes the whole thing ugly and slow at runtime. They are also not compact; they waste a lot of space, and access is not logarithmic, it is asymptotically linear.
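For reference, a minimal character-level TST in C++ looks something like this; a sketch with my own names, storing plain words, nothing pattern-related:

```cpp
#include <memory>
#include <string>

// Minimal character-level ternary search trie (TST) node.
struct TstNode {
    char ch;                      // split character at this node
    bool terminal = false;        // true if a stored word ends here
    std::unique_ptr<TstNode> lo, eq, hi;
    explicit TstNode(char c) : ch(c) {}
};

// Insert a word, one character per level of the 'eq' chain.
void insert(std::unique_ptr<TstNode>& node, const std::string& word, size_t i = 0) {
    if (i >= word.size()) return;
    if (!node) node = std::make_unique<TstNode>(word[i]);
    if (word[i] < node->ch)        insert(node->lo, word, i);
    else if (word[i] > node->ch)   insert(node->hi, word, i);
    else if (i + 1 == word.size()) node->terminal = true;
    else                           insert(node->eq, word, i + 1);
}

// Exact lookup: walks lo/hi like a BST and eq like a trie.
bool contains(const TstNode* node, const std::string& word, size_t i = 0) {
    if (!node || word.empty()) return false;
    if (word[i] < node->ch) return contains(node->lo.get(), word, i);
    if (word[i] > node->ch) return contains(node->hi.get(), word, i);
    if (i + 1 == word.size()) return node->terminal;
    return contains(node->eq.get(), word, i + 1);
}
```

Each input character may cost several lo/hi hops before the eq link advances, which is where the space and access-time complaints above come from.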
I used something like a binary tree for AIML pattern matching, so no, you don't need a new TST for each pattern. Building one matcher per pattern is what you would do with regular expressions, trying each expression individually, which is why context becomes so important for reducing the number of expressions to try.
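Roughly, that regex route looks like this sketch, with a context key used to shrink the candidate list; the containers and sample patterns here are only illustrative assumptions:

```cpp
#include <iostream>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Patterns grouped by conversational context, so only a small
// subset of expressions is tried against each input.
std::map<std::string, std::vector<std::regex>> patternsByContext = {
    {"greeting", {std::regex("hello.*"), std::regex("hi there.*")}},
    {"weather",  {std::regex(".*rain.*"), std::regex(".*sunny.*")}},
};

// Try each expression in the current context individually.
bool matchInContext(const std::string& context, const std::string& input) {
    for (const auto& re : patternsByContext[context])
        if (std::regex_match(input, re)) return true;
    return false;
}

int main() {
    std::cout << matchInContext("greeting", "hello world") << '\n';  // prints 1
}
```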
It sounds like you consume a single character at each node when you employ a TST. I am considering consuming a word instead, that is, the actual text of the word and not a symbol derived by using another TST to convert the words into symbols (something like a spell checker, except that a match returns a token). I don't want to deal with a lexicon of all the possible inputs and then have to handle things that may not be in my dictionary, such as a person's name.
With the binary tree I use now for AIML, each node matches the word leading to the next node using a hash table. With a TST it appears I'd be replacing that hash lookup with a binary search. Either way, this is still much faster than your next suggestion; a sketch of what I mean follows.
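Here is a rough sketch of the word-per-node idea from the last two paragraphs, with a hash table (std::unordered_map) choosing the child; a TST would swap that lookup for a binary search among the children. The structure and names are mine, not the actual AIML engine:

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>

// Word-level trie: each edge consumes a whole word of the input, keyed
// by the raw word text, so unknown tokens such as personal names need
// no lexicon entry.
struct WordNode {
    bool terminal = false;
    std::unordered_map<std::string, std::unique_ptr<WordNode>> children;
};

void insertPattern(WordNode& root, const std::string& pattern) {
    WordNode* node = &root;
    std::istringstream words(pattern);
    for (std::string w; words >> w; ) {
        auto& child = node->children[w];
        if (!child) child = std::make_unique<WordNode>();
        node = child.get();
    }
    node->terminal = true;
}

bool matches(const WordNode& root, const std::string& input) {
    const WordNode* node = &root;
    std::istringstream words(input);
    for (std::string w; words >> w; ) {
        auto it = node->children.find(w);
        if (it == node->children.end()) return false;  // unknown word: no match
        node = it->second.get();
    }
    return node->terminal;
}
```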
The best approach is to build an automaton and implement the graph so that one or more symbols can be optional; this works fine and is fast and scalable. You can also compile the whole thing into a transition table: the automaton then only jumps from row to column as the symbols are input, and if you are in a terminal state when the input ends, you've got a match. Voilà!
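A minimal table-driven sketch of that, with made-up symbol ids and states; it recognizes the word sequence "hello (there)? world", where "there" is optional:

```cpp
#include <vector>

// Table-driven automaton: rows are states, columns are symbol ids.
// REJECT marks a missing transition.
const int REJECT = -1;

bool runAutomaton(const std::vector<std::vector<int>>& table,
                  const std::vector<bool>& accepting,
                  const std::vector<int>& symbols) {
    int state = 0;                    // row 0 is the start state
    for (int s : symbols) {
        state = table[state][s];      // jump from row to column
        if (state == REJECT) return false;
    }
    return accepting[state];          // terminal state at end of input?
}

int main() {
    // Assumed symbol ids: 0 = "hello", 1 = "there", 2 = "world".
    // The optional word is just an extra edge in the graph,
    // no backtracking needed.
    std::vector<std::vector<int>> table = {
        {1, REJECT, REJECT},       // state 0: expect "hello"
        {REJECT, 2, 3},            // state 1: optional "there", or "world"
        {REJECT, REJECT, 3},       // state 2: saw "there", expect "world"
        {REJECT, REJECT, REJECT},  // state 3: terminal, no more input
    };
    std::vector<bool> accepting = {false, false, false, true};
    bool ok = runAutomaton(table, accepting, {0, 1, 2})   // hello there world
           && runAutomaton(table, accepting, {0, 2});     // hello world
    return ok ? 0 : 1;
}
```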
A TST is a graph. If it is implemented in C or C++, you follow the nodes through the heap using pointers, so there would be little overhead, probably less than when the graph is compiled into an automaton. If you are using a runtime engine like Java or the .NET CLR, it will be much slower (JIT loading, automatic garbage collection, and a number of other things). Furthermore, the symbols will typically be matched sequentially in the automaton.
The apparent speed probably comes from the input being converted into symbols, which turns each comparison into a register-to-register compare instead of a string-to-string match. But as I pointed out, you lose that speed advantage once you factor in the cost of converting the input into those symbols, and you introduce another set of issues around maintaining the lexicon.
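To make that trade-off concrete, here is a minimal interning sketch (my own names, no particular library):

```cpp
#include <string>
#include <unordered_map>

// Interning: map each distinct word to a small integer id, so later
// comparisons are single register-to-register integer compares instead
// of character-by-character string matches.
class Interner {
    std::unordered_map<std::string, int> ids_;
public:
    int intern(const std::string& word) {
        // A new word's id is the table size before insertion.
        auto it = ids_.try_emplace(word, static_cast<int>(ids_.size())).first;
        return it->second;
    }
};
```

The catch sits in intern() itself: every input word is hashed and looked up before any fast compares happen, and ids_ is exactly the lexicon, with all the maintenance issues that implies.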
Also, an automaton could run into issues if your optional symbols recurse: a finite automaton has no stack to track nesting depth, so recursive patterns push you toward a pushdown automaton or a full parser.
Andy identified that you'd need a transition graph or automaton for each pattern to compare. You might relieve this by representing all the patterns in a single graph (see the sketch below); at that point we are talking about a grammar. Several developers here prefer grammars. They follow the initial run of the automaton over the syntax, the parsing, with additional rule processing to get at the semantics that clarifies the meaning of the input because, as you know, an utterance may have more than one interpretation.
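Under the assumption that patterns are plain word sequences, the single-graph idea might look like this: all patterns live in one structure, terminal nodes carry a pattern id, and one walk over the input reports which pattern, if any, matched:

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>

// One shared graph for all patterns; a terminal node stores the id of
// the pattern that ends there, so a single walk finds the match.
struct GraphNode {
    int patternId = -1;  // -1: no pattern terminates here
    std::unordered_map<std::string, std::unique_ptr<GraphNode>> next;
};

void addPattern(GraphNode& root, const std::string& pattern, int id) {
    GraphNode* n = &root;
    std::istringstream words(pattern);
    for (std::string w; words >> w; ) {
        auto& child = n->next[w];
        if (!child) child = std::make_unique<GraphNode>();
        n = child.get();
    }
    n->patternId = id;
}

int matchPattern(const GraphNode& root, const std::string& input) {
    const GraphNode* n = &root;
    std::istringstream words(input);
    for (std::string w; words >> w; ) {
        auto it = n->next.find(w);
        if (it == n->next.end()) return -1;  // no pattern matches
        n = it->second.get();
    }
    return n->patternId;
}
```

Because the patterns share prefixes in one graph, adding a pattern does not add another full matcher to try; the common front of the graph is walked once per input.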
Pattern matching is what you do when your eyes scan ahead in text to get the gist of what you are reading. Parsing is what you do when reading aloud, or when reading to yourself so that you hear the words in your consciousness, that inner voice. Ultimately your mind must process the material, which is where the real salesmanship begins. An effective sale is typically made in the presentation; pattern matching or parsing only needs to cue the best sales presentation for the client.