Thanks to progress in computer vision and human sensing technologies, human behaviors such as gaze and head pose can be measured accurately in real time. Previous studies on multimodal user interfaces and intelligent virtual agents have presented many interesting applications that exploit such sensing technologies [1, 2]. However, little work has examined how to extract communication signals from a huge amount of data, or how to use such data in dialogue management in conversational...