ITEC 320: Programming Assignment 5 - Huffman Coding

Due Date: 11:59:59 p.m., Saturday 4/29/2017

Late programs NOT accepted

Submit command: submit itec320-01 huffman_pkg.ads huffman_pkg.adb tree_pkg.adb tree_pkg.ads code_strings.adb


Last modified:
Updates:

  1. 2017 Apr 27 11:26:56 AM: Added a suggested print_tree procedure to Huffman_Pkg.
  2. 2017 Apr 25 02:41:07 PM: Added assignment checker
  3. 2017 Apr 25 09:21:16 AM: I will not test your program with codes of only 1 character. You can assume that the input is correct; I will not test with erroneous input.
  4. 2017 Apr 24 10:43:47 PM: I will not test your program with strings of length 0 and 1.
  5. 2017 Apr 24 10:43:47 PM: Clarification: When combining nodes, the lowest frequency node should be on the left. When frequencies are tied, the alphabetically earlier string should be on the left.
  6. 2017 Apr 22 06:19:35 PM: Do not copy pre and post conditions from the spec to the body. Simply declare the routines in the body as usual. The pre and post conditions will not change anything that you; they only change what runtime errors occur, and when.
  7. 2017 Apr 21 11:04:48 PM: Another Hint: Another important section to look at to help you figure out how to do Trees is the section on "Dynamic Stacks and Queues" in the notes "Introduction, linked structures, Stacks" in weeks 13 and 14.
  8. 2017 Apr 21 07:02:25 PM: Corrected name of huffman_pkg and to_huffman in a few places.
  9. 2017 Apr 21 07:02:25 PM: Hint: Your treepkg should use the Stackpkg as an example. See the notes in week 11 on generic stack package and generic child package.
  10. 2017 Apr 21 09:55:11 AM: Removed "DRAFT". Updated Huffman package specification and client.

Now available: You can see an example of this program at assignment checker (which uses gnoga to integrate the GUI, web server, and problem solution). On-campus or VPN session required.



In this assignment you will work with pointers, trees, and Huffman codes. You will implement procedures to_huffman and to_ascii that convert between Huffman and ASCII representations of a string. Huffman codes are used for compressing data by using shorter bit strings to represent more frequent characters and longer bit strings for less frequent ones. In this assignment you will find the frequency of the characters of a string, then create the code for that string, and finally convert the string to its Huffman encoding.

Example:

      Original String: "The cat in the Hat"
      Frequency and character encodings:
          :  4   00
         H:  1   11010
         T:  1   11011
         a:  2   010
         c:  1   0110
         e:  2   100
         h:  2   101
         i:  1   0111
         n:  1   1100
         t:  3   111

      Encoded string: 110111011000001100101110001111100001111011000011010010111
      With spacing  : 11011 101 100 00 0110 010 111 00 0111 1100 00 111 101 100 00 11010 010 111
The encoded string would take about 57 bits vs 18*8=144 bits for ASCII. The original string contains four spaces (coded as 00), three 't's (coded as 111), etc. To help the human reader, the second encoded string includes spaces between the bits for the individual letters. As you can see, this is a prefix code: no codeword is a prefix of another word (hmm, prefix-free would be a better name). Having a prefix code is important for converting coded strings back into ASCII.

Creating Huffman Codes:
Follow these steps to create the code:
  1. Build a frequency table for the string.
  2. For each character whose frequency is not zero, create a tree consisting of a single node. In that node store the character and its frequency.
  3. Use a greedy algorithm to combine lowest frequency trees until one tree remains
  4. The path to a leaf represents the code for that character:
See examples online.

Huffman_Pkg:
You will implement the following package:
      subtype Bit is Character range '0' .. '1';

      -- ASCII String
      subtype A_String is String;

      -- Huffman String
      subtype H_String is String with 
         dynamic_predicate => (for all c of H_String => c in Bit);

      -- Readable Huffman String
      subtype RH_String is String with 
         dynamic_predicate => (for all c of RH_String => c in Bit | ' ');

      type A_String_Ptr is access A_String;
      type H_String_Ptr is access H_String;

      type Frequency_Table is array (Character) Of Natural;
      type Huffman_Table is array (Character) Of H_String_Ptr;

      function create_freq_table (s : A_String) return Frequency_Table;
      function create_huffman_table (s : A_String) return Huffman_Table;

      function to_huffman (s: A_String; c : Huffman_Table; spaces : Boolean := false) return RH_String;

      function to_huffman (s: A_String; spaces : Boolean := false) return RH_String with
         post => to_huffman'result = to_huffman(s, create_huffman_table(s), spaces)
                 and ((    spaces and to_huffman'result in RH_String) or
                      (not spaces and to_huffman'result in  H_String));

      function to_ascii (s : H_String; ft : Frequency_Table) return A_String with
         post => to_huffman(to_ascii'result, false) = s;

      -- You might want to add this procedure to your package:
      -- Print the Huffman tree for s
      -- procedure print_tree (s: A_String);
   end huffman_pkg;
The notation to_huffman'result, for example, refers to the result returned by the function to_huffman. In to_huffman'result, spaces are included between letters when the parameter spaces is true. The subtypes A_String and H_String would be better if they were defined as new types, but subtypes are simpler! Of course, to_ascii and to_huffman are inverses.

Tree_Pkg:
You need trees, and so you will implement this package.
   generic
      type Item_Type is private;

      with procedure put(i : Item_Type);

   package Tree_Pkg is
      type Tree is private;

      Empty : constant Tree;

      function is_leaf (t : Tree) return Boolean with pre => not is_empty (t);
      function is_empty (t : Tree) return Boolean;

      function new_tree (i : Item_Type; l, r : Tree) return Tree;

      function left (t : Tree) return Tree with pre => not is_empty (t);
      function right (t : Tree) return Tree with pre => not is_empty (t);

      function data (t : Tree) return Item_Type with pre => not is_empty (t);

      procedure put_pre_order (t : Tree);
   private
      ...
   end Tree_Pkg;
   


Client:
Your client should read lines of text (use get_line) until EOF. For each line it should print the frequency table, the Huffman table, the original string, the encoded string without spaces, and the encoded string with spaces. Here is a simple client that shows how the routines work.
with huffman_pkg; use huffman_pkg;
procedure someclient is
   s : constant String := "The cat in the Hat";

   f_table : Frequency_Table := create_freq_table (s);
   h_table : Huffman_Table := create_huffman_table (s);
begin
   put_f_table (f_table);
   put_h_table (h_table);

   put_line (to_huffman (s, false));
   put_line (to_huffman (s, true));

   put_line (to_ascii (to_huffman (s, false), create_freq_table (s)));


Other Points:

Your program should have well-designed data, good decomposition, and make good use of language features. Use good style, including good indentation, descriptive constant, variable, type, function, procedure, and package names, and named constants. Also use good commenting style, remembering that the first thing in any program file should be a comment that gives a brief overview of what the file contains (and should do). Follow these commenting guidelines.

When debugging programs, run-time exceptions are a common occurrence. One way to find the line at which an exception occurs is to place put statements in your program so you can follow it's execution. Another way is to use the gdb debugger (ie gnu debugger): these instructions show you how. Note that you can also use gvd (the gnu visual debugger) to debug your program and find the source of an exception as well as other information.

Use the submit command to turn your program in for grading. You may submit as many times as you like, but since only the last submission is kept, if the last one is late, then your assignment is late. Always remember to use the -ls option of submit to check that your submission was received.



Dr. Okie's Home Page,