SIGCSE 2003 DC Application - Hector G. Perez-Gonzalez

Hector G. Perez-Gonzalez
University of Colorado at Colorado Springs
Research:


LEARNING OBJECT ORIENTED DESIGN THROUGH
 AUTOMATIC MODEL GENERATION
 FROM NATURAL LANGUAGE PROBLEM STATEMENTS

INTRODUCTION
A group of n software (SW) analysts may produce n different solutions (perhaps all of them correct) from a single SW requirements document. This happens because Natural Language (NL) understanding involves syntactic, semantic and pragmatic issues (people with different backgrounds produce different interpretations) and because analysts have distinct design experiences. I propose a technique that can be automated (called role posets) and a semi-natural language (4WL) as the main vehicles of a methodology to accelerate the production of reliable agreements between different stakeholders. The supporting software tool (GOOAL: Graphic Object Oriented Analysis Laboratory) automatically produces object models (UML diagrams) from NL (English) statements that describe the original software requirements. The models are generated by analyzing, sentence by sentence, the intermediate language (4W) version of the original sentence set.
With this methodology and supporting software tool, students of Object Oriented technology courses can see which design decisions the system makes as it iterates through the requirement sentences.

THEORETICAL BACKGROUND
Although it has been shown that Natural Language processing with holistic objectives is a very complex task (McDonald 1992)(Hobbs 1992), it is possible to extract sufficient meaning from NL sentences to produce reliable models. The complexities of language range from simple synonyms and antonyms to issues as difficult as idioms, anaphoric relations and metaphors. Efforts in this particular area have had some success in generating static object models from some complex NL requirement sentences. This research proposes a tool-supported methodology that helps in teaching and learning object oriented design.

PREVIOUS RESEARCH
Some methodologies aim to formalize the process of creating software products, with limited intention to automate it. This approach is expected to cover the complete process (even to the coding stage). Another approach clearly aims to use specific techniques to construct an automated software tool that helps the user. Under this tool-oriented approach, we consider both the user's and the tool developer's points of view.
 The user's point of view sees the techniques and methodologies considering the goals and nature of the user.
 The tool developer's point of view considers the general view of constructing a software tool that proves the proposed methodologies.

We review the methodologies and tools for Software Design from Natural Language system requirements specifications (SDNL).

The first relevant published technique attempting a systematic procedure to produce design models from NL requirements was Abbott (1983). Abbott suggested a non-automatic methodology that produces only static analysis and design products, obtained by an informal technique that relies heavily on the user for decisions.
Cordes and Carver (1988) presented a methodology by which the English requirements document is converted into a reliable set of knowledge for use within the specification process. Although this is a tool oriented effort, its objective is the production of a set of knowledge for reuse when a new problem is an extension of a previous one.
Saeki, Horai and Enomoto (1989) published the first effort oriented to a more comprehensive analysis of the software process. Their research presents a process to incrementally derive a formal specification from an informal specification written in NL. They suggested the use of simple "verb patterns" to identify linguistic subjects and objects. Their methodology depends heavily on user interaction. The technique produces a list of nouns and verbs, and the user decides which nouns become classes and which verbs become methods.
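The noun and verb listing step described above can be illustrated with a minimal sketch. It assumes a tiny hand-written lexicon and a single subject-verb-object pattern; the lexicon, the function names and the sample sentences are all hypothetical and are not taken from Saeki et al.'s work. The sketch only produces candidate lists; deciding which nouns become classes and which verbs become methods is left to the user, as in the original technique.

```python
# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would rely on a full dictionary or a part-of-speech tagger.
LEXICON = {
    "customer": "noun", "order": "noun", "clerk": "noun", "invoice": "noun",
    "places": "verb", "prints": "verb",
    "the": "det", "a": "det", "an": "det",
}


def tag(sentence):
    """Lower-case the sentence, strip the final period, and tag each word."""
    words = sentence.lower().rstrip(".").split()
    return [(word, LEXICON.get(word, "unknown")) for word in words]


def match_svo(tagged):
    """Match a 'noun verb noun' pattern, ignoring determiners.

    Returns (subject, verb, object), or None if the pattern does not match.
    """
    content = [(word, kind) for word, kind in tagged if kind != "det"]
    if [kind for _, kind in content] == ["noun", "verb", "noun"]:
        return tuple(word for word, _ in content)
    return None


def candidates(sentences):
    """Collect candidate nouns and verbs, in order of first appearance."""
    nouns, verbs = [], []
    for sentence in sentences:
        match = match_svo(tag(sentence))
        if match is None:
            continue  # sentence does not fit the single pattern; skip it
        subject, verb, obj = match
        for noun in (subject, obj):
            if noun not in nouns:
                nouns.append(noun)
        if verb not in verbs:
            verbs.append(verb)
    return nouns, verbs


if __name__ == "__main__":
    nouns, verbs = candidates([
        "The customer places an order.",
        "The clerk prints the invoice.",
    ])
    print("Candidate classes:", nouns)
    print("Candidate methods:", verbs)
```

Sentences that do not fit the pattern are simply skipped here, which is one reason a technique of this kind needs the user in the loop.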
Cockburn (1992) presented a more detailed analysis of NL elements. He related relational nouns to objects and adverbs to polymorphism. He analyzes the roles of objects at a high level of abstraction.
Osborne and MacNish (1996) present a methodology whose objective is to eliminate ambiguity in NL requirements. They establish the problems associated with processing unrestricted NL. The impact of multiple-sense problems can be reduced by the use of a Controlled Language (CL). A CL can have several disadvantages: it can restrict and irritate the user, and it can be difficult to learn. The proposed CL is called Newspeak.
Da Silva's thesis (1996) proposed an object oriented model whose value at the time (before UML) was very high. He presents a formal diagrammatic notation (Object-Z) and a set of rules to transform semiformal specifications into formal ones.
Mich and Garigliano (1996) presented a very serious effort to produce good results automatically. They centered their publication on a software tool that is based on previous software using NL processing, but with wider objectives (such as translation or summary production). This effort is improved upon by the work of Borstler, Cordes and Carver (2001).
Burg and Van de Riet (1996) presented an interesting example that tries to minimize the participation of the user in extracting classes and relationships from the text. Little information is available about their research, but it appears to include a very large lexicon to aid in semantic model validation and to provide a new modeling language to illustrate results.
Hars and Marchewka (1997) presented one of the first important efforts to produce dynamic analysis products. They presented an interesting classification of the obstacles "within", "between" and "among" the users of a proposed technique or tool. Their effort claims to be the first technique (and tool) that considers the meaning of the concepts involved.
 The core of the supporting tool is a word dictionary (about 23,000 words). The entries are single root words, and each one includes a concept category (about 15 categories: event, person, location, etc.). The dictionary also contains word frequency information, computed from the number of occurrences of words in a set of volumes from one year of Time magazine. The process starts by transforming the textual sentences into an internal representation; a parser then constructs an equivalent tree structure. A separate algorithm resolves complex or composite words.
 This technique's tools have the capacity to ask the user to resolve ambiguities not just at a syntactic level but at a semantic level as well.
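A dictionary of this kind can be pictured with the following minimal sketch. It assumes only what is stated above: each entry is a root word with a concept category and a frequency count. The entries, the frequency values and the function name are hypothetical placeholders, not data from Hars and Marchewka's tool. Words missing from the dictionary are returned separately, standing in for the cases where the tool would ask the user to resolve the ambiguity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    root: str        # single root word
    category: str    # concept category, e.g. event, person, location
    frequency: int   # corpus occurrence count (placeholder values below)


# Hypothetical entries with made-up frequencies, for illustration only.
DICTIONARY = {
    entry.root: entry
    for entry in [
        Entry("customer", "person", 120),
        Entry("meeting", "event", 85),
        Entry("warehouse", "location", 40),
    ]
}


def categorize(words):
    """Split words into those with a known concept category and those
    that would need to be resolved by asking the user."""
    known, unresolved = {}, []
    for word in words:
        key = word.lower()
        entry = DICTIONARY.get(key)
        if entry is not None:
            known[key] = entry.category
        else:
            unresolved.append(key)
    return known, unresolved


if __name__ == "__main__":
    known, unresolved = categorize(["Customer", "meeting", "warehouse", "zorp"])
    print("Known categories:", known)
    print("Ask the user about:", unresolved)
```

The sketch stores the frequency field but does not use it, since the way the original tool applies frequency information is not described here.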
Boyd (1999) gave a good summary of past efforts. He discusses linguistic metaphors in software design and emphasizes the analysis of prepositions, articles, interjections and so on, in addition to nouns and verbs. A process of syntactic normalization is introduced. This technique produces a more precise sentence through two manual steps: syntax normalization and semantic exploration. This work is valuable but does not generate results automatically.
 Borstler, Cordes and Carver (2001) presented what may be the most elaborate tool developed so far to address this problem. The prototype accepts well-formed NL textual use cases and produces, with minimal user participation, UML static diagrams with classes, objects and simple relationships. A valuable feature is the final traceability supported by hypertext technology.
Overmyer, Lavoie and Rambow (2001) presented a very complete interactive methodology and prototype. It is a usable tool that provides linguistic assistance to produce a subset of UML results. However, the text analysis remains largely a manual process: the tool gives the user reliable aids for analyzing texts, but it does not analyze the text itself.

GOALS
The supporting software tool in this research (GOOAL: Graphic Object Oriented Analysis Laboratory) will automatically produce static and dynamic object models from NL statements (in English). These statements describe the original software requirements. The models are generated by analyzing, sentence by sentence, the intermediate language (4W) version of the original sentence set.
With this methodology and supporting software tool, students of Object Oriented technology courses can see which design decisions the system makes as it iterates through the requirement sentences.

CURRENT STATUS AND CONCLUSIONS
The proposed tool has demonstrated reliable results with simple sentences and has been validated with different versions of the original sentences. The core techniques of the tool have also been applied manually to complex domains (such as chess rules), with promising results.
The novel aspects of this proposal are the integrated tool-assisted methodology and the automatic generation of dynamic analysis models with educational objectives.


OPEN ISSUES
Although the proposed tool presents promising results using simple sentences, it is intended for use with more elaborate cases.
The current focus of this research is the automatic decision about classes, objects, relationships, collaborations and sequences. By the end of this research the tool must produce more complete sets of models and must be capable of producing reliable results with more complex sets of sentences. Students of Object Oriented courses will use this tool, and their results will be compared and validated against a control group. The intent is to show that this methodology is a valuable tool for producing memorable learning experiences.

 

CURRENT STAGE IN MY PROGRAM OF STUDY
I have completed the required regular course credits and 16 of the 30 required PhD dissertation credit hours. I expect to complete the remaining credits in summer 2003. I hope to gain valuable feedback that can help me define reliable methods to validate the results of my research and software tool, and to refine the specific objectives of the project.


BIBLIOGRAPHIC REFERENCES
(Abbott 1983) Abbott R. "Program Design by Informal English Descriptions"
Communications of the ACM, 26(11): 882-894, Nov 1983

(Borstler 1992) Borstler, Jurgen, Cordes, David and Carver, Doris.
"An Object-Based Requirements Modeling Method"
Journal of the American Society for Inf. Science 43(1): 62-71, Jan. 1992

(Boyd 1999) Boyd, Nik. "Using Natural Language in Software Development"
Journal of Object Oriented Programming, February 1999
http://www.jps.net/nikboyd/papers/rhetoric/road/index.htm 1999

(Burg 1996) Burg J.F.M. and Van De Riet R.P. "Analyzing Informal Requirements Specifications: A First Step towards Conceptual Modeling"
Proceedings of the 2nd International Workshop on Applications of Natural Language to Information Systems, Amsterdam,
The Netherlands, IOS Press 1996

(Cockburn 1992) Cockburn A. "Using Natural Language as a Metaphorical Basis for Object Oriented Modeling and Programming"
IBM Technical Report TR-36.0002, 1992

(Cordes 1988) Cordes A.W., Carver D.L. "Generating a Requirements Specifications Knowledge Base"
Proceedings of the 1988 ACM 16th Annual Conference on Computer Science, February 1988

(Da Silva 1990) Da Silva Araujo Joao Baptista Junior M.Sc.
"Metamorphosis: An Integrated Object Oriented Requirements Analysis and Specification Method"
Lancaster University http://citeseer.nj.nec.com

(Hars 1997) Hars, Alexander and Marchewka, Jack T.
"The Application of Natural Language Processing requirements Analysis"
Journal of Management Information Systems, March 1997 http://www-ref.usc.edu/~hars/pub

(Hobbs 1992) Hobbs, Jerry R., Douglas E. Appelt, John Bear, Mabry Tyson and David Magerman. (1992)
"Robust Processing of Real World Natural-Language Texts" SRI International.
Text Based Intelligent Systems. Pages 13-34. Lawrence Erlbaum Associates Publishers.

(McDonald 1992) McDonald, David D. (1992)
"Robust Partial-Parsing Through Incremental Multi-Algorithm Processing"
Text Based Intelligent Systems. Pages 83-100. Lawrence Erlbaum Associates Publishers.

(Mich 1994) Mich L., Garigliano R. [1994]
(NL-OOPS) "NL-OOPS A Tool for Object Oriented Requirements Analysis"
http://nl-oops.cs.unitn.it/

(Osborne 1996) Osborne, Miles and MacNish, C.
"Processing Natural Language Software Requirement Specifications"
(1996) Department of Computer Science, Univ. of York, Heslington
http://citeseer.nj.nec.com/97557.html

(Overmyer 2001) Overmyer S., Lavoie Benoit and Rambow O.
"Conceptual Modeling through Linguistics Analysis Using LIDA"
23rd International Conference on Software Engineering, July 2001

(Saeki 1989) Saeki M., Horai H., Enomoto H. "Software Development Process from Natural Language Specification" Proceedings of the 11th International Conference on SW Engineering, IEEE Computer Society Press 1989

(Umeå 2001) Umeå University project - http://www.cs.umu.se/~jubo/RECORD.html