submit itec360-01
README.txt
editdis.adb anyotherfiles.ads
You can see the assignment checker for this program at itec-noki01/360.
Many search engines give you a list of words that are close to your search terms. This feature can be implemented by listing the words that are within a certain edit distance of the search terms. For this assignment, the edit distance from word X to word Y is defined to be the minimal cost sequence of changes that transforms X into Y. Possible changes, and example associated costs, include
computer scomputer insertion scomputer copy scimputer substitution scieputer substitution scienuter substitution sciencter substitution sciencer deletion sciencer copy science deletionThus, the edit distance from computer to science is (2 * 0) + (2 * 1) + (1 * 1) + (4 * 1) = 7 n`&nbs;;&nbs;;
Search engines might select words with a low edit distance when making suggestions.
Assignment:
Your assignment is to use a dynamic programming
algorithm to efficiently calculate the edit distance between pairs of words.
A word is defined to be a maximal sequence
of letters from upper and lower case 'a' to 'z'.
You will use your algorithm to find and mark all of the words in a body of text
that can be transformed into a given, specified word
in less than or equal to a specified permissible edit distance.
You do not have to search for substrings that are identical to the specified word.
Your program should ignore the case of letters when calculating edit
distance, but not when producing output.
Input and output:
The first line of the input will be only a single, left-justified word, known as the search word.
The search word will consist of only letters from upper and lower case 'a' to 'z'.
The second line of the input will be an Natural, the maximum permissible edit distance.
The third line of the input will be four Natural numbers, which represent the weights for copy, deletion, insertion, substitution, respectively.
The remainder of the input will be a body of text containing one or more words that are separated by punctuation marks or white space (ie spaces, tabs, and end of line characters).
The output from the program is an EXACT copy of the input, with two exceptions. The first exception is that parentheses are added around each word within the body of text that can be transformed into the search word within the permissible edit distance. The second exception is the formatting of the distance and weights: The maximum distance and the operation weights should be printed without leading zeros and they should be left justified, with the weights separated by a single space.
Your program should read from standard input and write to standard output.
Is Edit Distance Commutative: Notice that edit distance calculations are not commutative. The distance from "computer" to "science" may be different from the distance from "science" to "computer". You may want to ponder the circumstances under which the calculation is commutative, although doing so is not necessary for the assignment. You will want to make sure that you are doing your calculation in the correct order.
Sample run:
Example standard input:
lovely 2 0 1 1 1 You have always been my love my love, for the love I love is lovely as love itself. Love? did you hang up? How many "loves" in this text?
lovely 2 0 1 1 1 You have always been my (love) my (love), for the (love) I (love) is (lovely) as (love) itself. (Love)? did you hang up? How many "(loves)" in this text?
Readme: Please submit file README.txt that contains instructions to compile and run your program. Also describe how well your program works.
Documentation: You will, of course, want to follow good style in writing your program. Please include the following two items in the documentation of your main routine:
Hint 1: If you find it difficult to break the program into words and to copy the input to the output, then I highly suggest that you design your algorithm so that it works as a finite state machine. Your machine will have two main states: one for processing inside words and one for processing outside words. This pseudocode, which doesn't necessarily adequately consider eol and eof, gives the general idea:
type States is (outside_a_word, inside_a_word); state: States := outside_a_word loop until eof get(c); case state is when outside_a_word => if c is not a letter put(c) else -- c is a letter token := c state := inside_a_word end if when inside_a_word => if c is a letter token := token + c else -- c not a letter check word and print as is or with () state := outside_a_word end if end case end loopIn summary, you should carefully consider what your program does to process a character when your program is in either of these two states, and then code your algorithm so that it handles the character appropriately based based upon which of the two states it is in. This pseudocode implies that you process the input one character at a time; however, it is up to you whether your input is read one character at a time or whether the input is read one line at a time (and processed one character at a time).
Hint 2: First solve the problem with all costs as 1.
Submission and language: You should use a command similar to the one above to submit your program. Include instructions on how to compile and run your code in your main routine and in your README file. If you want to use any language other than Ada, C, C++, or Java, please discuss it with me first.
Credits: This assignment relies heavily on an assignment developed by Dr. Uppuluri. The sample input above comes from page 403 of The Algorithm Design Manual by Steven S. Skeina and from the Stoney Brook Algorithm Repository.