In the past few years, advances in speech recognition and computing power have made it possible to develop and deploy user interfaces that use spoken language as input and output. Spoken language user interfaces have great potential to increase access to information, especially through telephone-based applications. This is due in part to the near ubiquity of telephones and the rapid embrace of cellular phones for mobile communication. Recently, several companies [1,2] have started offering "voice portal" services that provide access to online information about weather, sports, and movies, as well as the ability to contact businesses by name rather than by phone number, all through a telephone-based speech interface. Several personal assistant services [3,4] have been available for a number of years that offer the ability to review and reply to voice mail and email, forward calls, and dial the phone by voice. Most of the user interfaces in these services are based on restricted forms of dialogue that are designed to seem natural while also helping the system limit misrecognition rates.
I am interested in the use of shared dialogue context to streamline and enhance spoken language user interfaces for applications that allow users to access information. Shared context may be built up over time, or it may be based on common knowledge such as local geographic information or expressions. In other words, the shared context may be constructed between a user and a system, or it may be part of the context and knowledge shared by a community. In addition, context may be built in a variety of environments. For example, a user may build shared context while working at her personal computer and then make use of that context while using a telephone-based personal assistant service the next day. I believe that exploring the use of shared dialogue context may lead to more robust spoken language user interfaces.
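The cross-environment scenario above can be illustrated with a minimal sketch: a per-user context store populated during one interaction (say, at a desktop) and later consulted by a different channel (say, a telephone assistant) to resolve a vague spoken reference. This is a hypothetical illustration of the idea only; the class and method names here are my own inventions, not part of any existing system.

```python
from dataclasses import dataclass, field

@dataclass
class SharedContext:
    """Per-user dialogue context that persists across interaction channels."""
    user_id: str
    facts: dict = field(default_factory=dict)

    def remember(self, key: str, value: str) -> None:
        """Record a piece of shared context, e.g. a personal expression."""
        self.facts[key] = value

    def resolve(self, phrase: str) -> str:
        """Resolve a vague reference against stored context; pass it
        through unchanged if no shared context applies."""
        return self.facts.get(phrase, phrase)

# Context built during a desktop session...
ctx = SharedContext(user_id="alice")
ctx.remember("the usual place", "coffee shop on Main St.")

# ...later reused by a telephone-based assistant to interpret speech.
print(ctx.resolve("the usual place"))  # coffee shop on Main St.
print(ctx.resolve("the airport"))      # the airport (no stored context)
```

In a real system the store would of course be persistent and shared between the desktop and telephony front ends; the sketch only shows how a common context object lets one channel interpret references established in another.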
Some of the fundamental theoretical research I have been exploring is in the area of dialogue and conversation. Theories of turn-taking, grounding, and conversational implicature, as reviewed in Jurafsky and Martin's recent book, Speech and Language Processing [5], form a starting point for my exploration. Approaches to the interpretation and inference of dialogue acts (also outlined in [5]) are relevant as well. James Allen's classic book, Natural Language Understanding [6], has also been a good reference and refresher.
In the area of spoken language user interface design, a good deal of work has taken place in the past few years. Some of this work goes unpublished because it takes place in corporate R&D labs or in smaller start-up companies. Some is shared at industry conferences such as AVIOS and SpeechTek or in groups such as the VoiceXML Forum. A good deal of work is also being published in academic journals and conference proceedings; examples include [7, 8] and a recent issue of Communications of the ACM [9] that featured conversational interfaces.
Prior to starting my Ph.D. at Virginia Tech, I worked for five years in the Speech and Language Technology group at SBC Technology Resources, Inc., the R&D subsidiary of SBC Communications Inc. (parent company of Southwestern Bell and Pacific Bell). While there, I developed, researched, and prototyped a variety of spoken language user interfaces and related technologies.
I have been discussing the research areas described here with Dr. Manuel Pérez-Quiñones, Dr. Naren Ramakrishnan, and Dr. Jack Carroll at Virginia Tech. Much of my current interest in and thinking about dialogue context has been motivated by Dr. Pérez-Quiñones.