A conversation can be thought of as a series of utterances between two entities, where each utterance is determined by the previous context and the *intentions* of each entity.
Polite conversation proceeds as alternating utterances, like tennis volleys, rather than both parties talking at the same time.
One might think of the context as being built up from the set of past volleys: {s1 --> r1, s2 --> r2, ...}
Some of the more recent conversational systems, such as IBM Watson and Amazon Lex, also speak of “slots.” This has evolved from what we used to call “frame-based learning.” The slots pick up useful words or phrases, which are linked together in order to determine the intention of the interlocutor, that is, the person talking to the chatbot.
Thus, the goal of this type of system is to chat until an intention is detected, and then issue a yes/no question to verify that the correct one has been found. In these models, context means something closer to keeping noun/pronoun agreement and correct verb conjugation, along with the facts or slots that are filled in as the chat goes on.
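To make the idea concrete, here is a minimal sketch of slot filling in Python. This is purely illustrative, not the Watson or Lex API: the intent names, keywords, and regex slot patterns are all invented for the example. An intent is guessed by keyword match, then each required slot is filled from the user's utterances until none are missing.

```python
import re

# Hypothetical intent definitions: keywords trigger the intent,
# regex patterns extract slot values from utterances.
INTENTS = {
    "book_hotel": {
        "keywords": ["hotel", "room"],
        "slots": {
            "city": re.compile(r"\bin ([A-Z][a-z]+)\b"),
            "nights": re.compile(r"\b(\d+) (?:days|nights)\b"),
        },
    },
    "book_flight": {
        "keywords": ["flight", "airline", "fly"],
        "slots": {
            "destination": re.compile(r"\bto ([A-Z][a-z]+)\b"),
        },
    },
}

def detect_intent(utterance):
    """Return the first intent whose keywords appear in the utterance."""
    text = utterance.lower()
    for name, spec in INTENTS.items():
        if any(k in text for k in spec["keywords"]):
            return name
    return None

def fill_slots(intent, utterance, slots):
    """Update the slot dict with any values found in this utterance."""
    for slot, pattern in INTENTS[intent]["slots"].items():
        match = pattern.search(utterance)
        if match:
            slots[slot] = match.group(1)
    return slots

# One "volley": a single utterance both triggers the intent and fills slots.
utterance = "I need a hotel room in Denver for 5 days"
intent = detect_intent(utterance)
slots = fill_slots(intent, utterance, {})
missing = [s for s in INTENTS[intent]["slots"] if s not in slots]
```

Once `missing` is empty, the system would issue the yes/no confirmation question described above ("So you want a hotel room in Denver for 5 nights?"). Real systems replace the keyword and regex matching with trained classifiers and entity extractors, but the chat-until-slots-are-filled loop is the same.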
A conversation can have more than one intention, but in practical applications, intentions like “get me an airline reservation” or “I need a hotel room in Denver for 5 days” mean the conversation will likely be short once the intention is determined.
It is worth noting that these systems are speech-recognition enabled.
For more about Lex, see: https://aws.amazon.com/lex/
Here is a slideshow depicting benchmarks of several NLU systems.
https://www.slideshare.net/KonstantinSavenkov/nlu-intent-detection-benchmark-by-intento-august-2017
Robby.