For a more detailed description, please see:
Bot Colony paper presented at ACL JapTAL2012, Kanazawa, Japan Oct. 23, 2012
Manages the dialog flow and makes decisions about how to handle specific user utterances. Utterances (or dialog acts) include stating facts, answering questions, making requests, stating opinions and so on.
The parser breaks down the input text into constituents (phrases), identifies functional components (subject, verb, object, etc.), and generates a parse tree.
Disambiguation (Word Sense Disambiguation or WSD)
Determines the sense of each constituent. For example, a word like ‘structure’ can mean a building (‘the structure consisted of arches’), the makeup of someone’s body (‘good bone structure’), or how knowledge is organized (‘the lecture structure’).
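The idea can be illustrated with a minimal sketch that picks the sense whose description shares the most words with the surrounding text (a simplified Lesk-style overlap). This is not the method used in the pipeline; the sense labels, the glosses, the stopword list and the function names are all assumptions made up for the example.

```python
import re

# Hypothetical sense inventory for "structure"; the glosses are invented for this sketch.
SENSES = {
    "building": "a building or other thing constructed from parts such as arches walls and beams",
    "anatomy": "the arrangement of bone and tissue in a body",
    "organization": "the way knowledge or a lecture is organized into topics",
}

# Function words carry no sense information, so they are ignored when counting overlap.
STOPWORDS = {"the", "a", "of", "in", "or", "and", "is", "as", "from", "such", "into", "other"}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def disambiguate(context, senses=SENSES):
    """Return the sense label whose gloss overlaps most with the context."""
    context_words = content_words(context)
    return max(senses, key=lambda label: len(context_words & content_words(senses[label])))


for phrase in ("the structure consisted of arches",
               "good bone structure",
               "the lecture structure"):
    print(phrase, "->", disambiguate(phrase))
```

A real disambiguator would draw on a full lexicon and richer context than word overlap, but the selection step has the same shape: score each candidate sense against the context and keep the best one.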
Determines to what entity a word refers. In ‘The plane crashed, but the crew survived.’, the crew is presumably the crew of the plane that crashed. Presumably, the plane had already been introduced in the discourse, so we also know which plane ‘the plane’ refers to.
The reasoner matches queries with a fact base, and attempts to return an answer. Suppose Alice was in Singapore on Monday, in Paris on Tuesday and in Montreal on Wednesday. If one asks “Where was she on Tuesday?”, co-reference resolution will first resolve ‘she’ to Alice, and then the reasoner will answer ‘Paris’.
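The Alice example can be sketched in a few lines. This is only an illustration under simple assumptions: facts are stored as flat records, a pronoun is resolved to the most recently mentioned entity, and the names LocationFact, resolve_pronoun and where_was are hypothetical, not part of the actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationFact:
    person: str
    place: str
    day: str


# Fact base from the example above.
FACTS = [
    LocationFact("Alice", "Singapore", "Monday"),
    LocationFact("Alice", "Paris", "Tuesday"),
    LocationFact("Alice", "Montreal", "Wednesday"),
]


def resolve_pronoun(word, discourse_entities):
    """Toy co-reference step: map a pronoun to the most recently mentioned entity."""
    if word.lower() in {"she", "he", "they"} and discourse_entities:
        return discourse_entities[-1]
    return word


def where_was(person, day, facts=FACTS):
    """Toy reasoning step: return the matching place, or None if no fact matches."""
    for fact in facts:
        if fact.person == person and fact.day == day:
            return fact.place
    return None


# "Where was she on Tuesday?" with Alice already introduced in the discourse.
person = resolve_pronoun("she", ["Alice"])
print(where_was(person, "Tuesday"))  # Paris
```

Keeping the two steps separate mirrors the description: the reasoner only ever sees the resolved entity, never the pronoun.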
Converts the internal representation useful for reasoning (called a logic form) into text that can be understood by people.
Script Engine (not shown)
This is where application code connects, to handle utterances from a user, and other events.
This component receives events from various sources (utterances, timers, changes in variables of interest, geometry events, etc.) and reacts to them using a context/goal/plan paradigm.
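One possible reading of the context/goal/plan paradigm is sketched below: each rule names the event type it reacts to, a context test on the current state, the goal it pursues, and a plan given as an ordered list of actions. The class names, fields and action names are assumptions for illustration only and do not reflect the real script engine or its scripting language.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    kind: str                                   # e.g. "utterance", "timer", "geometry"
    payload: dict = field(default_factory=dict)


@dataclass
class Rule:
    kind: str                                   # event type this rule reacts to
    context: Callable[[dict], bool]             # does the rule apply in the current state?
    goal: str                                   # what the rule tries to achieve
    plan: list                                  # ordered action names (hypothetical)


class ScriptEngine:
    def __init__(self):
        self.rules = []
        self.state = {}
        self.log = []                           # (goal, plan) pairs that were selected

    def add_rule(self, rule):
        self.rules.append(rule)

    def dispatch(self, event):
        """Select the first rule whose event type and context match; return its goal."""
        for rule in self.rules:
            if rule.kind == event.kind and rule.context(self.state):
                self.log.append((rule.goal, list(rule.plan)))
                return rule.goal
        return None


engine = ScriptEngine()
engine.state["room"] = "kitchen"
engine.add_rule(Rule(
    kind="utterance",
    context=lambda state: state.get("room") == "kitchen",
    goal="answer_question",
    plan=["parse", "reason", "reply"],
))
print(engine.dispatch(Event("utterance", {"text": "Where is the kettle?"})))
```

In this sketch the plan is only recorded, not executed; an actual engine would run each action and could re-plan when one of them fails.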
Our scripts are written in a controlled natural language with a strict syntax, but other APIs are conceivable.
3D world management
The 3D environment is ‘the world’. Since the conversation often refers to objects visible in the environment, the 3D environments we manage can be used in co-reference resolution and reasoning.
Converts the user’s spoken input into a stream of characters that can be processed by the dialog pipeline.
Converts text originating from the dialog pipeline into speech that can be heard by the user.