Andrew Begel
University of California, Berkeley

Spoken Language Support for Software Development

Introduction

Software development environments have not changed very much in the past thirty years; developers are forced to use low-level text editors and program representations designed for compiler input. With one another, however, developers discuss software artifacts in terms of high-level conceptual notions. Our research enables developers to work at this more conceptual level by programming via speech. By reducing their dependence on typing and text, our approach also lowers barriers for the growing number of software developers who suffer from repetitive strain injuries and related disabilities that make typing difficult or impossible.

Background

When not in front of the computer, software developers communicate with one another all the time using voice, diagrams, presentations, etc. This is how teachers communicate with students, and how students communicate with each other when working on a software project. Unfortunately, however expressive these interactions might be, none of them is understood by current software development environments, which support only text entry into a text editor and batch compilation services. Even though there is a long history of research in support of program development [Bahlke 86, Borras 88, Reps 89, Van den Brand 95 and 96], very few environments provide higher-level services, such as online syntactic and semantic analysis [VisualStudio, CodeGuide], that developers can use to explore their source code more effectively than with "grep". Even in these environments, however, program entry is still consigned to text editing.

Text-based program editing has served programmers for fifty years and has earned great merit. We do not advocate replacing it; rather, in addition to plain text editing, we will enable programmers to dictate, compose, navigate, browse, and edit their software in high-level linguistic terms. We believe this can enhance programmer productivity. Moreover, if the programmer could express himself verbally (through speech recognition), he might find it easier to speak in pseudocode, or in some stylized, formalized high-level language -- informal with respect to an actual programming language -- and get his ideas down on screen more efficiently.

Efforts to apply conventional speech-to-text tools [IBM, Dragon] to programming tasks such as authoring, navigation, and modification have had limited success. English-language parsing provides poor recognition of most traditional programming language text because the grammars are structurally very dissimilar. Some researchers have attempted to adapt speech recognizers for programming [Desilets], but their work suffers from awkward, over-stylized code entry and an inability to exploit the structure and semantics of the program. A few researchers [Snell 00, Price 00, Arnold 00] have applied programming language analysis technology to understand spoken program code, but all of these systems have limitations.

Goals of the Research

The main thrust of this work is to build a software development system that understands spoken program dictation, composition, navigation, browsing, and editing, making these tasks easier for the software developer. The major technical challenge is to resolve the ambiguities that the new input modes introduce. For instance, if the user is programming in Java and says FOO SUB BAR BAZ PLUS PLUS, what does he mean? It could be foo[barBaz]++, but perhaps it is foo[bar.baz]++, or even foo[bar].baz++. Not only are there lexical ambiguities (Is bar capitalized? Is it concatenated with baz?), but also syntactic ambiguities (Where does the closing bracket go? Is baz a field dereference from bar, or is bar perhaps a method that takes baz as an argument?) and semantic ambiguities (What is the element type of foo? Can it be incremented by ++? Is bar a method in the current scope?).
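To make the ambiguity concrete, the following hypothetical Java fragment gives each reading just enough surrounding declarations to be legal; the class and variable names are illustrative only and are not output of our system.

    // Hypothetical declarations under which each reading of the spoken
    // utterance "FOO SUB BAR BAZ PLUS PLUS" is legal Java.
    class SpokenAmbiguity {

        static class Bar { int baz; }

        // Reading 1: "bar baz" is one concatenated identifier, barBaz.
        void readingOne() {
            int[] foo = new int[4];
            int barBaz = 2;
            foo[barBaz]++;
        }

        // Reading 2: baz is a field dereferenced from bar inside the subscript.
        void readingTwo() {
            int[] foo = new int[4];
            Bar bar = new Bar();
            foo[bar.baz]++;
        }

        // Reading 3: the subscript ends after bar, and baz is a field of
        // the array element that gets incremented.
        void readingThree() {
            Bar[] foo = { new Bar(), new Bar() };
            int bar = 1;
            foo[bar].baz++;
        }
    }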

Resolving these kinds of ambiguity requires new algorithms in which lexical, syntactic, semantic, and program-specific analyses interact. We will also create new methods of accommodating lexical, syntactic, and semantic errors and inconsistencies, in order to sustain language-based services when the artifacts are incomplete or malformed. We will build these analysis techniques on top of Harmonia, our programming language analysis framework, which provides language-based tools and services to applications.
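As a rough illustration of how these analyses might interact (a minimal sketch with hypothetical interfaces, not Harmonia's actual API), ambiguity resolution can be viewed as enumerating the lexical candidates for an utterance and discarding those that fail syntactic or semantic checks against the surrounding program:

    import java.util.ArrayList;
    import java.util.List;

    // Minimal sketch of candidate pruning; the interfaces below are
    // hypothetical stand-ins, not Harmonia's actual API.
    class AmbiguityResolver {

        interface Candidate { String text(); }
        interface Parser { boolean parses(Candidate c); }
        interface SemanticAnalyzer { boolean typeChecks(Candidate c); }

        List<Candidate> resolve(List<Candidate> lexicalCandidates,
                                Parser parser,
                                SemanticAnalyzer semantics) {
            List<Candidate> survivors = new ArrayList<Candidate>();
            for (Candidate c : lexicalCandidates) {
                // Keep a candidate only if it parses and is consistent
                // with the types and names in the surrounding program.
                if (parser.parses(c) && semantics.typeChecks(c)) {
                    survivors.add(c);
                }
            }
            // More than one survivor means context or the user must decide;
            // zero survivors indicates an error that must be accommodated.
            return survivors;
        }
    }

In practice, the syntactic and semantic checks must run over incomplete and evolving programs, which is why the error-accommodation methods described above are needed.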

We are also asking questions about the cognitive issues behind spoken software development. How is code spoken? How does this differ from writing code? What effect might speaking code have on our problem-solving ability? How do novices, who have an incomplete understanding of computer programming, talk about code? How does this differ from the way an expert talks about code? How easy is it to program verbally with speech recognition systems? We hope to gain insight into these questions and contribute to the body of research on how programmers express themselves through code.

To answer these questions, we will conduct several user studies to discover how students and experts express themselves verbally during the software development process, both alone and with others. We will use the results from these user studies to design a more easily spoken version of the Java programming language, as well as design the higher-level composition, navigation and editing commands that most naturally formalize how students already verbalize these tasks.

Finally, we will evaluate our system to see how it is used by students and expert programmers, and determine if it improves programmer efficiency and productivity.

Current Status

We have conducted one user study (results below) related to the design of the spoken form of a programming language. In addition, we have released a first version of Harmonia, our programming language analysis framework, as a plugin to the popular editor XEmacs. This first release demonstrates the maturity of the analysis techniques we will use in developing the rest of the speech-based editing system. Finally, we have developed two prototype editors in which speech can be used to activate and instantiate code templates during program composition.
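The template mechanism can be summarized as a mapping from spoken trigger phrases to code templates with placeholders that the user then fills in by voice. The sketch below is illustrative only; the trigger phrases and template syntax are hypothetical and do not reflect the prototypes' actual vocabulary or implementation.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch of speech-activated code templates; the phrases
    // and template syntax here are hypothetical examples.
    class TemplateExpander {

        private final Map<String, String> templates = new HashMap<String, String>();

        TemplateExpander() {
            // Spoken trigger phrase -> template with numbered placeholders.
            templates.put("for loop",
                "for (int ${1:i} = 0; ${1:i} < ${2:n}; ${1:i}++) {\n    ${3}\n}");
            templates.put("if statement",
                "if (${1:condition}) {\n    ${2}\n}");
        }

        // Returns the template text to insert at the cursor, or null if
        // the spoken phrase does not name a known template.
        String expand(String spokenPhrase) {
            return templates.get(spokenPhrase.toLowerCase().trim());
        }
    }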

Interim Conclusions

We conducted an experiment in which participants read a one-page pre-existing Java program out loud. The experiment showed interesting differences between spoken and written communication about code.

Future Work

A future prototype of our spoken programming editor will support dictation of spoken Java, as well as spoken interfaces to higher-level program templates, transformations, refactorings, and design patterns. We plan to conduct several Wizard of Oz studies to help design and debug the Spoken Java language dialect.

Afterwards, in collaboration with several other computer science and education researchers, we will use this editor in CS61B, a sophomore-level course on data structures taught in Java. First, we will observe students to explore the vocabulary -- oral, diagrammatic, and gestural -- that inexperienced programmers use to communicate solutions and designs to experts and to one another. We will then incorporate corresponding primitives into the programming environment and examine their effect on productivity. Effects will not necessarily be beneficial. For example, the use of gestures is often unconscious; making students aware of their gestural communication may prove distracting rather than helpful. However, we expect that some of the facilities we provide will significantly improve productivity for some users. The challenge is to invent ways to enable other students to take advantage of these features.


Current Stage of Study

5th year Ph.D. student (beginning the second year of dissertation).

What I hope to gain from participation in the Doctoral Consortium


References

[Arnold 00]

Arnold S., Mark L., and Goldthwaite J. Programming By Voice, Vocal Programming. Proceedings of the ACM 2000 Conference on Assistive Technologies.

[Bahlke 86]

Rolf Bahlke and Gregor Snelting. The PSG system: From formal language definitions to interactive programming environments. ACM Transactions on Programming Languages and Systems, 8(4):547-576, October 1986.

[Borras 88]

P. Borras, D. Clement, Th. Despeyroux, J. Incerpi, G. Kahn, B. Lang, and V. Pascual. CENTAUR: The system. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, pages 14-24, November 1988.

[Desilets]

Alain Desilets. VoiceGrip 3. http://ai.iit.nrc.ca/il_public/VoiceCode.

[Dragon]

Dragon Systems, Inc. Dragon Dictate Product Home Page. http://www.dragonsystems.com/products/dragondictate.

[IBM]

IBM Inc. IBM ViaVoice Product Home Page. http://www-4.ibm.com/software/speech/.

[CodeGuide]

Omnicore Software. CodeGuide 3.0: The Next Generation Java IDE. http://www.omnicore.com/.

[Price 00]

Price, David, Riloff, Ellen, Zachary, Joseph and Harvey, Brandon. NaturalJava: A Natural Language Interface for Programming in Java. Proceedings of the 2000 International Conference on Intelligent User Interfaces, January 2000.

[Reps 89]

Thomas W. Reps and Tim Teitelbaum. The Synthesizer Generator: A System for Constructing Language-Based Editors. Springer-Verlag, 1989.

[Snell 00]

Snell, Lindsey. An Investigation Into Programming By Voice and Development of a Toolkit for Writing Voice-Controlled Applications. M.Eng. Report. Imperial College of Science, Technology and Medicine, London. June, 2000.

[Van den Brand 95]

Mark Van den Brand and E. Visser. The ASF+SDF meta-environment: Documentation tools for free! Lecture Notes in Computer Science, 915:803-??, 1995.

[Van den Brand 96]

Mark Van den Brand, A. Van Deursen, P. Klint, and S. Klusener. Industrial applications of ASF+SDF. Lecture Notes in Computer Science, 1101:9-??, 1996.

[VisualStudio]

Microsoft Corp. Visual Studio.NET. http://msdn.microsoft.com/vstudio.