SIGCSE 2002 DC Application

Eliciting Pseudocode in Novice Program Design

H. Chad Lane
hcl@cs.pitt.edu

University of Pittsburgh
Dept. of Computer Science
CIRCLE: Center for Interdisciplinary Research on Constructive Learning Environments

INTRODUCTION:
When faced with a typical introductory programming assignment, students tend to "jump right in" and start writing code (that's what programmers do all day, right?). For students with little or no programming experience, the resulting programs (operational or not) are often of low quality and suggestive of a "just get it to work" mentality [Jon86]. In my opinion, teaching novices to write pseudocode supports the idea that problem solving is distinct from writing (source) code. If students can learn that problems should be solved before their solutions are implemented, I believe they will become better programmers in the long run.

To address this problem, I am building an intelligent tutoring system, named the Program Development Tutor (PDT), aimed at supporting students in the earliest stages of the programming process. The objectives of PDT are to help the student...

understand the problem being solved
design a high level solution in natural language-style pseudocode
develop a set of incremental programming goals culminating in a complete program.

The only explicitly supported design strategy is iterative enhancement (or incremental program development), which encourages a student to solve a simplified version of a problem first and then add features iteratively until the program is complete. At this time, my focus is on establishing the efficacy of a pseudocode-based approach to novice programming, as well as on eliciting and understanding novice-produced pseudocode.

BACKGROUND:
The target audience for PDT is novice programmers with little or no programming experience. Its intended role is in an introductory programming course, as a replacement for the traditional handout or web page style of assignment distribution: an interactive session intended to give the student a stronger foundation on which to base a real implementation. No programming language specifics are involved; the purpose is to help the student focus on the core algorithm without worrying about the syntactic details of any particular language.

RELATED RESEARCH:
A central theme underlying many CIRCLE projects is that students learn with greater understanding when they explain ideas in their own words [Chi89], [Chi94]. Two such projects are WHY2000 [Gra01], a culmination of several tutoring-oriented NLP projects, and the Geometry Explanation Tutor, a system that engages students by having them state explanations that justify their problem solving steps [Ale01a], [Ale01b].

The relationship between natural languages and programming languages has been studied in a number of contexts. Of particular interest is the finding that the meanings of many natural language words and constructions differ from those of related ideas and similar terminology in programming [Bon85], [Pan01]. Thus PDT could be viewed as supporting a student's "shift" into a programmer's mindset while maintaining the familiar footing of natural language. One system that attempted to leverage natural language in a similar way was BRIDGE [Bon88], which helped students "bridge" the gap between natural language (via menu selections) and Pascal.

With respect to program design, PDT is not intended to be a complete design tool. In contrast to SODA, a system that supports novice program design from beginning to end and includes knowledge about programming plans and goals [Hoh92], PDT is intended only to prepare students for implementation, which they then carry out on their own with traditional tools.

RESEARCH GOALS:
I am addressing problems related to both instructional design and artificial intelligence. The instructional design problem is to develop an approach and environment supportive of pseudocode construction. The goal is to not only help the student write small programs now, but also to lay a foundation that will lead to success on harder problems. The AI problem, of course, is determining how best to automate tutoring within PDT.

My primary instructional objective is to investigate the hypothesis that students will have more success at writing programs when they first analyze and solve the problem in their own words. Indeed, if the ideas have been worked out in advance, students should produce fewer logical errors during implementation. For example, if the student already knows the termination condition(s) on a particular loop, s/he will likely not be faced with an infinite or non-executing loop during implementation. Similar situations involving data flow and the proper ordering of steps are easy to imagine.
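To make the loop example concrete, here is a minimal sketch in Python (used only for illustration; PDT itself involves no programming language). The problem (averaging numbers until a sentinel is entered), the sentinel value -1, and the name average_until_sentinel are hypothetical and are not taken from the pilot projects. The pseudocode in the docstring settles the termination condition and the "loop never runs" case before any code is written.

```python
from typing import Iterable, Optional

SENTINEL = -1  # hypothetical stop value, decided at the pseudocode stage


def average_until_sentinel(values: Iterable[float]) -> Optional[float]:
    """Average the inputs that appear before the sentinel.

    Pseudocode worked out in advance:
        set total and count to zero
        repeat until the next value is the sentinel:
            add the value to total, add one to count
        if count is zero, report that there is nothing to average
        otherwise report total divided by count
    """
    total = 0.0
    count = 0
    for value in values:
        if value == SENTINEL:  # termination condition from the pseudocode
            break
        total += value
        count += 1
    if count == 0:  # covers the case where the loop body never executes
        return None
    return total / count


print(average_until_sentinel([4, 8, 6, -1, 100]))  # values after -1 are ignored
print(average_until_sentinel([-1]))                # nothing to average
```

A student who skips the pseudocode stage might only discover the empty-input case (a division by zero) after running the program; here it is one explicit line of the plan.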

My primary technological objective is to build a system capable of classifying novice expressions of pseudocode steps and coaching pseudocode construction. Because novice programmers frequently struggle to express themselves in the context of programming, many student utterances are vague and unrefined. For example, students can typically recognize the need for a loop in a program (e.g., "I think I need a loop"), but need help laying out the particulars. These issues of vagueness and incompleteness in student language are relatively untouched problems within the ITS community.
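As a rough illustration of the classification problem, the following Python sketch labels an utterance with a step category and flags it as vague. It is a deliberately naive keyword matcher under stated assumptions, not the approach used or planned for PDT: the categories, cue words, vagueness markers, and the name classify_step are all hypothetical.

```python
import re
from typing import Dict, Set, Tuple

# Hypothetical cue words per step category; not drawn from any PDT corpus.
CUES: Dict[str, Set[str]] = {
    "loop": {"loop", "repeat", "again", "until", "while", "each"},
    "input": {"ask", "read", "get", "input", "enter"},
    "output": {"print", "show", "display", "output", "tell"},
    "update": {"add", "increment", "count", "total", "sum", "set"},
}

# Hypothetical markers of a tentative, unrefined utterance.
VAGUE_MARKERS: Set[str] = {"think", "need", "maybe", "somehow", "something", "guess"}


def classify_step(utterance: str) -> Tuple[str, bool]:
    """Return (category, is_vague) for one student utterance."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    # Score each category by how many of its cue words appear.
    scores = {category: len(words & cues) for category, cues in CUES.items()}
    best = max(scores, key=scores.get)  # ties go to the first category listed
    category = best if scores[best] > 0 else "unknown"
    is_vague = bool(words & VAGUE_MARKERS)
    return category, is_vague


print(classify_step("I think I need a loop"))
print(classify_step("increment count"))
```

Even this toy version shows why the problem is hard: "I think I need a loop" is recognizable as a loop request, yet it says nothing about what is repeated or when to stop, which is exactly the information a tutor has to elicit.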

CURRENT STATUS:
The interface for PDT is complete. It is written entirely in Java and runs as a standalone program (although conversion to a web-based version is imminent). The interface consists of three windows: (1) a browser, (2) a dialogue window, and (3) a pseudocode construction area. In addition, modules for extensive logging, session replay, and human-to-human use over a network are complete.

Problem understanding is addressed by giving the student worked-out examples to read in the browser (often in the form of sample output of the desired program). In addition, interactive examples are carried out in the dialogue window with the student "playing computer" and the tutor taking the role of a user. Pseudocode is built up by the student through a process of proposal (of a step, to the tutor), refinement (via dialogue with the tutor), creation (by the tutor), and finally placement (via drag-and-drop, into the pseudocode area). To support iterative enhancement, incremental programming goals are met along the way, each time the pseudocode reaches a state that accomplishes one of them. The first goal is always a simple subproblem of the whole (hopefully the simplest), while subsequent goals incorporate more and more aspects of the final desired program. Each time a goal is met, a "snapshot" can be taken to save a copy of the code at that stage, helping the student reify this progression of goal attainment.
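The placement and snapshot steps described above can be sketched as a small data structure. This is a hypothetical Python model for illustration only (PDT itself is written in Java, and its internal design is not described here): the class PseudocodeArea, its methods, and the example problem are assumptions. Proposal, refinement, and creation happen in dialogue and are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PseudocodeArea:
    """Hypothetical model of the pseudocode construction area."""

    steps: List[str] = field(default_factory=list)
    snapshots: Dict[str, List[str]] = field(default_factory=dict)

    def place(self, step: str, position: int) -> None:
        """Placement: insert a tutor-created step at the chosen position."""
        self.steps.insert(position, step)

    def snapshot(self, goal: str) -> None:
        """Save a copy of the current steps when a programming goal is met."""
        self.snapshots[goal] = list(self.steps)


area = PseudocodeArea()

# Goal 1 (simplified problem): count the numbers the user enters.
area.place("set count to zero", 0)
area.place("repeat until the user enters -1", 1)
area.place("    increment count", 2)
area.place("print count", 3)
area.snapshot("goal 1: count the inputs")

# Goal 2 (enhancement): also report the average.
area.place("set total to zero", 1)
area.place("    add the number to total", 4)
area.place("print total divided by count", 6)
area.snapshot("goal 2: report the average")

for goal, steps in area.snapshots.items():
    print(goal)
    for step in steps:
        print("  " + step)
```

Because each snapshot stores a copy rather than a reference, the goal 1 version stays intact after the goal 2 steps are added, so the student can compare the stages side by side.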

At this time (Nov 2001), I am completing a pilot study using a "Wizard of Oz" framework (human-to-human), with me as the tutor. A total of 13 students volunteered to participate: 7 as control subjects (no use of PDT) and 6 as experimental subjects (users of PDT). Testing was done for three separate projects. The purpose was threefold:

find out how comfortable students were creating pseudocode within PDT
obtain a corpus of student generated pseudocode and dialogues with a tutor
determine whether the instructional design reduces the time and/or effort needed to complete assignments

I am analyzing this data and simultaneously considering approaches to automate some of the observed tutor behavior. Some preliminary observations are mentioned below.

For more information about PDT, including examples and screenshots, please visit the PDT homepage.

INTERIM CONCLUSIONS:
As of this writing, it is unclear whether the data supports the prescribed process of pseudocode writing. Students in the pilot study who used PDT completed their assignments in less time on average, but the difference was not statistically significant. Currently I am analyzing intermediate programs submitted to the compiler in order to construct a chronology of program development detailing how much time was spent on fixing syntax errors vs. logical errors, and how much was spent creating new code. Data is still being collected, including a final (more advanced) project with no use of PDT at all. This will allow a preliminary assessment of PDT's long-term impact (if any) on student design habits.

With respect to my technological goals, most of the subdialogues geared towards pseudocode elicitation seem to follow the same pattern. Following an open-ended prompt by the tutor (e.g., "What should we do next?"), students begin with a vague programming goal (e.g., "we need to do the count"). This is followed by a subdialogue refining that goal into an actual step. Once the step is agreed upon (e.g., "increment count"), the process of placing it in the pseudocode area begins. Although I haven't thought as much about this part of the process, the proper location of steps seems mostly constraint-based (e.g., indentation, ordering, vertical spacing, etc.).

OPEN ISSUES:

Parsing and Classification: As stated earlier, my current focus is on parsing and classifying individual expressions of pseudocode steps. In addition, some way of classifying the quality of each step will be needed.
Dialogue Management: The data collected thus far very clearly shows dialogues and subdialogues much like those found in other tutorial corpora. Thus, the problems of reference resolution, turn taking, dialogue acts, and plan recognition are all currently open issues in PDT.
Problem and Solution Representation: Problem-specific knowledge poses a true challenge in this domain. It must be present in some form to do proper analysis of pseudocode; principles alone are not enough. The problem representation must include the program requirements, and the solution representation needs to capture all possible (or at least, all reasonable) approaches to solving the particular problem.
Program Design: Currently, I am only addressing programming problems that fit nicely with the iterative enhancement style of design. Clearly, a top-down approach (or some hybrid of the two) is superior for some problems. PDT can handle a top-down approach in its current form, although I did not use one in any of the experiments.

PROGRAM STATUS:
This is my fifth year as a Ph.D. student and I've been working on this project since November of 2000. My immediate goal is to complete my dissertation proposal and present it in the spring of 2002.

WHAT I HOPE TO GAIN:
I am interested in getting feedback on the pedagogical efficacy of this approach to novice programming. In addition, I'm hoping the panel will help me refine and/or redirect my technological aims with respect to my pedagogical objectives. Most generally, I am looking for a CSE perspective on what is primarily an ITS project.

BIBLIOGRAPHIC REFERENCES:

[Ale01a] Aleven, V.A., Popescu, O., Koedinger, K.R. (2001). A Tutorial Dialogue System with Knowledge-Based Understanding and Classification of Student Explanations. In Working Notes of the 2nd IJCAI Workshop on Knowledge and Reasoning In Practical Dialogue Systems, August. http://www-2.cs.cmu.edu/~aleven/AlevenEtAlIJCAI2001WS.pdf

[Ale01b] Aleven, V.A., Popescu, O., Koedinger, K.R. (2001). Pedagogical Content Knowledge in a Tutorial Dialogue System to Support Self-Explanation. In Working Notes of the AIED 2001 Workshop "Tutorial Dialogue Systems". http://www-2.cs.cmu.edu/~aleven/AlevenEtAlAIED2001DialWS.pdf

[Bon85] Bonar, J.G., Soloway, E. (1985). Preprogramming Knowledge: A Major Source of Misconceptions in Novice Programmers. In Human-Computer Interaction, 1, 133-161.

[Bon88] Bonar, J.G., Cunningham, R. (1988). Bridge: Tutoring the Programming Process. In J. Psotka, L. D. Massey, & S. A. Mutter (Eds.), Intelligent Tutoring Systems: Lessons Learned. Hillsdale, Lawrence Erlbaum Assoc. Inc.

[Chi89] Chi, M.T.H., Bassok, M., Lewis, M.W., Reimann, P., & Glaser, R. (1989). Self-explanations: How students study and use examples in learning to solve problems. In Cognitive Science, 13, 145-182. http://www.pitt.edu/~chi/papers/ChiBassokLewisReimannGlaser.pdf

[Chi94] Chi, M.T.H., de Leeuw, N., Chiu, M.H., & LaVancher, C. (1994). Eliciting Self-explanations Improves Understanding. In Cognitive Science, 18, 439-477. http://www.pitt.edu/~chi/papers/Self-explanations94.pdf

[Gra01] Graesser, A., VanLehn, K., Rosé, C., Jordan, P., & Harter, D. (2001). Intelligent Tutoring Systems with Conversational Dialogue. To appear in AI Magazine. http://andes1.lrdc.pitt.edu/~pjordan/atlas-papers/AIMag.ps

[Hoh92] Hohmann, L., Guzdial, M., Soloway, E. (1992). SODA: A Computer-Aided Design Environment for the Doing and Learning of Software Design. In Computer Assisted Learning: 4th International Conference, ICCAL '92, Berlin, Springer-Verlag.

[Jon86] Joni, S.J., Soloway, E. (1986). But My Program Runs! Discourse Rules for Novice Programmers. In Journal of Educational Computing Research, 2(1).

[Pan01] Pane, J.F., Ratanamahatana, C.A., Myers, B.A. (2001). Studying the Language and Structure in Non-programmers' Solutions to Programming Problems. In Int. Journal of Human-Computer Studies, 54, 237-264. http://www-2.cs.cmu.edu/~pane/IJHCS.html

Last Modified on 11/29/01 by hcl