A novel technique of achieving multilingual (including regionals) speech recognition by combining existing single language recognizers has a promise of
Offering for sale or license, US Patent 7,689,404, a method of enabling multilingual speech recognition by reduction to single language recognizer engine components
The purpose of the invention is to significantly reduce the computational complexity of multilingual and/or large-vocabulary speech recognition while simplifying productization.
In addition to the instance where the language of an utterance is not known in advance, applications exist where the language may also change during an utterance: “I shouldn’t have been schadenfreude but c’est la vie”.
Business needs thus increasingly call for multi-language speech recognition across a wide range of applications. This patented technology offers a more efficient means of enabling existing single-language recognizers to support multiple languages simultaneously. It overcomes a traditional complexity of multilingual speech recognition: building a huge monolithic multilingual recognizer that supports all languages. So,
- Your existing single-language recognizers become reusable assets, and therefore
- Overall system maintenance costs are reduced
In contrast, the method described in the patent utilizes existing components of single-language speech recognizer engines by combining and controlling them in a way that enables automatic multilingual speech recognition across a range of supported languages and dialects.
A new component, the ‘Multilingual Dispatcher’ (MLD), envelops language-independent components and invokes language-specific components to perform language-dependent processing. The MLD dispatches certain requests to individual recognizers, aggregates their responses and keeps track of the recognized sequence. The dispatcher is agnostic to how the single-language recognizers work internally. Thus, the hypothesis space is decomposed into sub-spaces visible to individual recognizers, which reduces the complexity. Moreover, language-specific components themselves are not affected when a language is added to or removed from the application. So,
- Modular recognizers are added or removed without affecting the rest of the system
- This simplifies implementation and improves time to market, and
- Reduces incremental maintenance/upgrade/deployment costs caused by a single language
- Smaller footprint and faster execution can be achieved, enabling smaller platforms and/or larger vocabularies
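The dispatcher idea can be illustrated with a minimal sketch. This is not the patented implementation: the names `Hypothesis`, `MultilingualDispatcher`, `add_language`, `recognize_segment` and `toy_recognizer` are hypothetical, each recognizer is modeled as an opaque callable, and scores are assumed to be already comparable across languages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    language: str  # tag of the recognizer that produced the hypothesis
    text: str
    score: float   # higher is better; assumed comparable across languages


# Hypothetical interface: a recognizer maps an audio segment to hypotheses.
Recognizer = Callable[[str], List[Hypothesis]]


class MultilingualDispatcher:
    """Illustrative dispatcher; it knows nothing about recognizer internals."""

    def __init__(self) -> None:
        self._recognizers: Dict[str, Recognizer] = {}
        self.sequence: List[Hypothesis] = []  # recognized sequence so far

    def add_language(self, tag: str, recognizer: Recognizer) -> None:
        self._recognizers[tag] = recognizer

    def remove_language(self, tag: str) -> None:
        self._recognizers.pop(tag, None)

    def recognize_segment(self, segment: str) -> Hypothesis:
        # Dispatch the request to every recognizer and aggregate the responses.
        candidates = [h for rec in self._recognizers.values() for h in rec(segment)]
        if not candidates:
            raise ValueError(f"no recognizer produced a hypothesis for {segment!r}")
        best = max(candidates, key=lambda h: h.score)
        self.sequence.append(best)
        return best


def toy_recognizer(tag: str, lexicon: Dict[str, Tuple[str, float]]) -> Recognizer:
    """Stand-in for a real single-language engine: a lookup table of segments."""
    def recognize(segment: str) -> List[Hypothesis]:
        if segment not in lexicon:
            return []
        word, score = lexicon[segment]
        return [Hypothesis(tag, word, score)]
    return recognize


mld = MultilingualDispatcher()
mld.add_language("en", toy_recognizer("en", {"a1": ("but", -1.0), "a2": ("say", -5.0)}))
mld.add_language("fr", toy_recognizer("fr", {"a2": ("c'est", -2.0)}))
tagged = [(h.language, h.text) for h in map(mld.recognize_segment, ["a1", "a2"])]
```

In this sketch, adding or removing a language touches only the dispatcher's registry. The other recognizers are unchanged, and each one is consulted once per segment.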
A ‘language’ here means anything for which a recognizer exists, so the invention applies both to different spoken languages and to different recognizer models or engines for subsets of the same spoken language, such as regional or ethnic accents or gender differences. So,
- This naturally leads to language tagging, e.g. a mischievous toy or bot may react differently to commands issued by a male, a female or a young child.
Key elements of the invention include a heuristic for making numeric hypothesis scores (such as Viterbi scores) comparable even when they are produced by different language-specific recognizers, and a heuristic for seeding a hypothesis in one language from a hypothesis in a different language. Support for a specific language works like a replaceable plug-in, creating a structure that enables scalable deployment of any subset of supported languages in short order. So,
- The complexity is at worst linear in the number of languages
- Pruning of active unlikely hypotheses is automatically aggressive in unlikely languages
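The text above does not spell out the patent's actual heuristics, so the following is only a sketch under stated assumptions. It assumes each recognizer's raw score can be mapped to a common scale with a per-language offset and scale factor, that pruning uses a single beam shared by all languages, and that seeding copies a hypothesis into another language minus a switch penalty. The names `to_common_scale`, `prune`, `seed` and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hyp:
    language: str
    text: str
    score: float  # on the assumed common scale; higher is better


def to_common_scale(raw_score: float, offset: float, scale: float) -> float:
    """Assumed affine calibration of one recognizer's raw score."""
    return (raw_score - offset) / scale


def prune(hyps: List[Hyp], beam: float) -> List[Hyp]:
    """Keep hypotheses within `beam` of the best one, whatever their language."""
    if not hyps:
        return []
    best = max(h.score for h in hyps)
    return [h for h in hyps if h.score >= best - beam]


def seed(source: Hyp, target_language: str, switch_penalty: float) -> Hyp:
    """Start a hypothesis in another language from an existing one."""
    return Hyp(target_language, source.text, source.score - switch_penalty)


# Hypothetical calibration constants: language -> (offset, scale).
calibration: Dict[str, Tuple[float, float]] = {"en": (0.0, 1.0), "de": (-10.0, 20.0)}

# Raw scores as each engine might report them, on incompatible scales.
raw = [("en", "but", -4.0), ("de", "Schadenfreude", -70.0), ("de", "Schaden", -150.0)]
hyps = [Hyp(lang, text, to_common_scale(score, *calibration[lang]))
        for lang, text, score in raw]

survivors = prune(hyps, beam=2.0)
seeded = seed(hyps[0], "fr", switch_penalty=1.5)
```

Because the beam is shared, a language whose hypotheses all score far below the global best loses them at the pruning step. That is one way the aggressive pruning in unlikely languages could arise.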
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&f=G&l=50&d=PALL&S1=07689404&OS=PN/07689404&RS=PN/07689404