Robert G. Capra III
Department of Computer Science, Virginia Tech
Using Shared Context in Spoken Language User Interfaces for Mobile Applications

Introduction
First, I must state that my research is very much in its formative stages and is evolving on an almost daily basis. The description presented here is an early view on a possible direction I may pursue. That said, here it is!

In the past few years, advances in speech recognition and computing power have made it possible to develop and deploy user interfaces that use spoken language as input and output. Spoken language user interfaces have great potential to increase access to information, especially through telephone-based applications, due in part to the near ubiquity of telephones and the rapid embrace of cellular phones for mobile communication. Recently, several companies [1,2] have started offering "voice portal" services that provide access to on-line information about weather, sports, and movies, as well as the ability to contact businesses by name rather than by phone number, all through a telephone-based speech interface. Personal assistant services [3,4] have been available for several years that offer the ability to review and reply to voice mail and email, forward calls, and dial the phone by voice. Most of the user interfaces in these services are based on restricted forms of dialogue that are designed to seem natural yet also help the system limit misrecognition rates.
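One common way such systems limit the impact of misrecognition is to vary the confirmation style with the recognizer's confidence score. The following is a minimal sketch of that idea; the function name and threshold values are my own illustration, not taken from any particular deployed system:

```python
# Illustrative sketch: choose a confirmation strategy based on the
# speech recognizer's confidence in its top hypothesis.
# Thresholds (0.40, 0.75) are hypothetical values for illustration.

def confirmation_prompt(hypothesis: str, confidence: float) -> str:
    if confidence < 0.40:
        # Low confidence: reject and re-prompt.
        return "Sorry, I didn't catch that. Please say it again."
    elif confidence < 0.75:
        # Medium confidence: confirm explicitly before acting.
        return f"Did you say {hypothesis}?"
    else:
        # High confidence: confirm implicitly by echoing and proceeding.
        return f"{hypothesis}. One moment..."

print(confirmation_prompt("Chicago weather", 0.62))
# prints: Did you say Chicago weather?
```

Explicit confirmation costs an extra dialogue turn but catches errors before they propagate; implicit confirmation keeps the dialogue moving while still giving the user a chance to correct the system.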

I am interested in the use of shared dialogue context to streamline and enhance spoken language user interfaces for applications that allow users to access information. Shared context may be built over time between a user and a system, or it may draw on knowledge shared by a community, such as local geographic information or expressions. Context may also be built in one environment and used in another. For example, a user may build shared context while working at her personal computer and then make use of this context while using a telephone-based personal assistant service the next day. I believe that exploring the use of shared dialogue context may help lead to more robust spoken language user interfaces.
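The cross-environment scenario above can be sketched as a per-user context store that records where each piece of context was learned and makes it available to any channel. All names and structures here are my own hypothetical illustration of the idea, not an existing system:

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical sketch: class and attribute names are illustrative only.

@dataclass
class ContextEntry:
    key: str              # e.g. "home_city"
    value: str            # e.g. "Blacksburg"
    source: str           # channel where it was learned: "desktop", "phone"
    learned_at: datetime = field(default_factory=datetime.now)

class SharedContext:
    """A per-user store of dialogue context that persists across channels."""

    def __init__(self):
        self._entries = {}

    def remember(self, key, value, source):
        self._entries[key] = ContextEntry(key, value, source)

    def resolve(self, key, default=None):
        entry = self._entries.get(key)
        return entry.value if entry else default

# Context built at the desktop...
ctx = SharedContext()
ctx.remember("home_city", "Blacksburg", source="desktop")

# ...is later reused by a telephone assistant to interpret an
# underspecified request like "What's the weather?"
city = ctx.resolve("home_city", default=None)
print(f"Weather lookup for: {city}")
```

Recording the source channel and timestamp is one possible design choice: it would let a system weight, expire, or audit context entries differently depending on where and when they were acquired.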

Theoretical Background and Previous Research
Areas that relate to this research include: speech recognition, natural language processing, dialogue and conversation, collaboration, knowledge representation, linguistics, and language modeling.

Some of the fundamental theoretical research I have been exploring is in the area of dialogue and conversation. Theories of turn-taking, grounding, and conversational implicature, as reviewed in Jurafsky and Martin's recent book, Speech and Language Processing [5], form a starting point for my exploration. Approaches to interpretation and inference of dialogue acts (also outlined in [5]) are relevant. James Allen's classic book, Natural Language Understanding [6], has also been a good reference and refresher.

In the area of spoken language user interface design, a good deal of work has taken place in the past few years. Some of this work is not published because it occurs in corporate R&D labs or in smaller start-up companies. Some is shared at industry conferences such as AVIOS and SpeechTek or in groups such as the VoiceXML Forum. There is also a good deal of work published in academic journals and conference proceedings; examples include [7, 8] and the recent issue of Communications of the ACM [9] that featured conversational interfaces.

Prior to starting my Ph.D. at Virginia Tech, I worked for five years in the Speech and Language Technology group at SBC Technology Resources, Inc., the R&D subsidiary of SBC Communications Inc. (parent company of Southwestern Bell and Pacific Bell). While there, I developed, researched, and prototyped a variety of spoken language user interfaces and related technologies.

Goals of the research
Issues that may be explored include: how to represent context so that it can be reasoned over efficiently, how to make inferences from context, and how to manage the complexity of capturing context. Additionally, I believe there are potential privacy and confidentiality issues to be examined in storing and transmitting shared context and history.
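As a toy illustration of the representation and inference questions above, context facts might be stored as attribute-value pairs and consulted by simple rules that fill in slots a user's utterance leaves unspecified. The fact keys and rule below are hypothetical, chosen only to make the idea concrete:

```python
# Illustrative sketch only: a toy rule over stored context facts.
# Fact keys like ("user", "home_city") are hypothetical.

facts = {
    ("user", "home_city"): "Blacksburg",
    ("user", "last_topic"): "movies",
}

def infer_default_location(facts):
    """If the user asks a location-dependent question without naming a
    place, fall back on context: a city mentioned earlier in the
    dialogue, then the user's home city."""
    return (facts.get(("dialogue", "mentioned_city"))
            or facts.get(("user", "home_city")))

# A request like "What's playing tonight?" omits the location slot;
# the system fills it from shared context instead of re-prompting.
location = infer_default_location(facts)
print(f"Searching movie listings near {location}")
```

Even this toy version surfaces the research questions: the flat fact table does not scale to rich context, the precedence ordering inside the rule is a design decision, and the stored facts themselves raise the privacy concerns noted above.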

Current Status
I am currently in an exploratory phase of this research: conducting a literature review and talking with others working in related fields. This may turn out to be the research area I pursue further, or it may lead me to other, related problems.

I have been discussing the research areas described here with Dr. Manuel Pérez-Quiñones, Dr. Naren Ramakrishnan, and Dr. Jack Carroll at Virginia Tech. Much of my current interest and thoughts about dialogue context have been motivated by Dr. Pérez.

Current Stage of Program of Study
I passed the Ph.D. qualifier exam in August 2000. During Fall 2000, I have been exploring several directions for my research in the area of spoken language user interfaces. The next stage of my program of study is to complete my dissertation proposal.

What I Hope to Gain
I have a strong interest in an academic career and hope to gain insights, advice, and encouragement from participating in the Doctoral Consortium.

Bibliographic References
[1] TellMe website. http://www.tellme.com
[2] BeVocal website. http://www.bevocal.com
[3] Wildfire website. http://www.wildfire.com
[4] Portico website. http://www.genmagic.com/portico/portico_home.shtml
[5] Jurafsky, D., and Martin, J. Speech and Language Processing. Prentice Hall. 2000.
[6] Allen, J. Natural Language Understanding. Benjamin Cummings. 1987.
[7] Yankelovich, N. "How Do Users Know What to Say?" ACM Interactions, Vol. 3, No. 6 (Nov/Dec), 1996.
[8] Marx, M., and Schmandt, C. "Putting People First: Specifying Proper Names in Speech Interfaces," Proceedings of the ACM Symposium on User Interface Software and Technology (UIST '94), November 1994, 29-37.
[9] Communications of the ACM. Vol. 43, No. 9 (Sept), 2000.




Prepared by Rob Capra, rcapra@vt.edu
Copyright 2000 by Robert G. Capra III