ITEC 360: Program 2


Program due by: 11:59:59 p.m. Monday, April 6, 2020

Example submit command: submit itec360-01 README.txt results.pdf closest_pair.adb



Updates:

  1. 2020 Apr 11 09:29:51 PM: Information on phase 2
  2. 2020 Mar 30 10:40:46 PM: Set due date

You can see an example of this program at the assignment checker (which uses gnoga to integrate the GUI, web server, and problem solution). An on-campus or VPN session is required. If you'd like to download the files, see the links at the bottom of this page.

Overview: This assignment involves solving the two-dimensional closest pair problem using either a brute force algorithm or a divide and conquer algorithm, or both. Specifically, do the following:

  1. Submit a program that reads a dataset, runs and times either or both algorithms (as specified), and then prints results from the run.
  2. Submit a file called README.txt (case sensitive) that briefly describes the following:
    1. the commands on rucs that will compile and run your program
    2. whether each algorithm works correctly, and for what dataset sizes, and if it does not work correctly, what kind of error to expect
  3. Turn in a pdf of a short paper that analyzes the actual performance of your program to confirm the expected n**2 and n lg n behavior of the algorithms.
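The brute force algorithm is simple enough to sketch directly. The C below is a minimal illustration, not a required design: the `Point` type, `dist2`, `closest_brute`, and the global `distance_count` are hypothetical names. It compares squared distances, so no square root is needed, and it counts distance calculations for the performance report.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; a sketch, not the required design. */
/* One input point; 64-bit fields so differences and squares cannot overflow. */
typedef struct { int64_t x, y; } Point;

/* Counter of distance calculations, reported alongside the times. */
static long long distance_count = 0;

/* Squared Euclidean distance; comparing squares avoids calling sqrt. */
static int64_t dist2(Point a, Point b) {
    int64_t dx = a.x - b.x, dy = a.y - b.y;
    distance_count++;
    return dx * dx + dy * dy;
}

/* Brute force: checks all n(n-1)/2 pairs, so the work grows as n**2.
   Requires n >= 2. Stores the indices of one closest pair in *i_out and
   *j_out and returns their squared distance. */
int64_t closest_brute(const Point *p, size_t n, size_t *i_out, size_t *j_out) {
    int64_t best = INT64_MAX;
    for (size_t i = 0; i + 1 < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            int64_t d = dist2(p[i], p[j]);
            if (d < best) {
                best = d;
                *i_out = i;
                *j_out = j;
            }
        }
    }
    return best;
}
```

For n = 2**20 this loop performs 2**39 - 2**19 (about 5.5 * 10**11) distance calculations, which suggests why brute force timings are practical only for the smaller files.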

Command Line Argument: To receive credit for this assignment, your program must use its command line argument to determine which algorithm(s) to use:

>closest [BRUTE | DIVIDE | BOTH]
The keywords (eg BRUTE) are case insensitive.
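A minimal C sketch of the argument check follows, assuming a hypothetical `Algorithm` enum and `parse_algorithm` helper. Standard C has no case-insensitive string comparison (POSIX provides `strcasecmp` in `<strings.h>`), so the sketch compares character by character with `tolower`.

```c
#include <ctype.h>

/* Hypothetical names; a sketch of case-insensitive argument handling. */
/* Which algorithm(s) the command line asked for. */
typedef enum { ALG_INVALID, ALG_BRUTE, ALG_DIVIDE, ALG_BOTH } Algorithm;

/* Case-insensitive equality of two NUL-terminated strings. */
static int equals_ignore_case(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Maps argv[1] to an Algorithm; exactly one argument is expected. */
Algorithm parse_algorithm(int argc, char **argv) {
    if (argc != 2) return ALG_INVALID;
    if (equals_ignore_case(argv[1], "brute"))  return ALG_BRUTE;
    if (equals_ignore_case(argv[1], "divide")) return ALG_DIVIDE;
    if (equals_ignore_case(argv[1], "both"))   return ALG_BOTH;
    return ALG_INVALID;
}
```

When `parse_algorithm` returns `ALG_INVALID`, a program would typically print the usage line and exit with a nonzero status.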

Input: The first line of an input file will be a positive integer n, and each of the remaining n lines of the file will contain a pair of numbers that specifies one point. The two numbers of the pair will be separated by white space. Both numbers will be non-negative integers in the range 0 .. 2 ** 31 - 1 (ie 0 .. 2_147_483_647, roughly two billion). This range guarantees that (x1-x2)^2 + (y1-y2)^2 can be calculated in a signed 64 bit integer without overflow. Your program is to read from standard input. DO NOT code a filename into your program!
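The 64-bit claim can be checked directly: the largest possible squared distance is between (0, 0) and (2**31 - 1, 2**31 - 1), which is 2 * (2**31 - 1)**2 = 2**63 - 2**33 + 2, just below the signed 64-bit maximum 2**63 - 1. The C sketch below assumes a C11 compiler (for `_Static_assert`); `MAX_COORD` and `dist2` are illustrative names. It encodes that bound at compile time.

```c
#include <stdint.h>

#define MAX_COORD INT64_C(2147483647)   /* 2**31 - 1, the largest coordinate */

/* Worst case: dx = dy = MAX_COORD, so the sum is 2 * MAX_COORD**2.
   Checking MAX_COORD <= INT64_MAX / (2 * MAX_COORD) proves that
   2 * MAX_COORD * MAX_COORD <= INT64_MAX without itself overflowing. */
_Static_assert(MAX_COORD <= INT64_MAX / (2 * MAX_COORD),
               "squared distances must fit in int64_t");

/* Illustrative point type. */
typedef struct { int64_t x, y; } Point;

/* Exact squared distance for coordinates in 0 .. MAX_COORD. */
static inline int64_t dist2(Point a, Point b) {
    int64_t dx = a.x - b.x;
    int64_t dy = a.y - b.y;
    return dx * dx + dy * dy;
}
```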

Output: Output from your program should include:

If multiple pairs are separated by the minimum distance, then you can display any one (or more) of them. Having two implementations (ie brute force and divide and conquer) will, of course, give you a way to check your program results (at least for smaller values).

Algorithm: Your program should implement an n lg n divide and conquer algorithm, and it should sort only twice (once by x and once by y). It should not sort on every recursive call. Your algorithm will need to keep the Y-sorted list in sorted order as it recurses. There are three ways to do this:

  1. Careful distribution (ie anti merge) [Distribute]. (Problems can occur when there are values with the same x coordinate.)
  2. Storing the x-sort-index along with the point to guide distribution [Distribute 2].
  3. Merging on returning from the recursion [Merge]. (This method seems to perform n lg n distance calculations but shows n**2 time performance. Use this method only if you can explain the n**2 performance.)
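The C sketch below shows one way to implement method 2 [Distribute 2]. It is an illustration under stated assumptions, not the reference solution. Points are sorted once by x and once by y with `qsort`, and each point carries its x-sort index (`rank`). Because a call on x-ranks lo .. hi-1 splits at mid, a single stable pass that sends rank < mid left and the rest right keeps both halves in y order without re-sorting. The names (`rec`, `closest_divide`, and a `work` buffer of 2n + 64 points) are hypothetical; the buffer is reused depth-first, so one allocation serves the whole recursion.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; a sketch of Distribute 2, not the required design. */
/* rank is the point's index in the x-sorted array (the x-sort-index). */
typedef struct { int64_t x, y; size_t rank; } Point;

static long long distance_count = 0;   /* distance calculations, for the report */

static int64_t dist2(const Point *a, const Point *b) {
    int64_t dx = a->x - b->x, dy = a->y - b->y;
    distance_count++;
    return dx * dx + dy * dy;
}

static int cmp64(int64_t a, int64_t b) { return (a > b) - (a < b); }

static int cmp_x(const void *a, const void *b) {
    const Point *p = a, *q = b;
    int c = cmp64(p->x, q->x);
    return c != 0 ? c : cmp64(p->y, q->y);
}

static int cmp_y(const void *a, const void *b) {
    const Point *p = a, *q = b;
    return cmp64(p->y, q->y);
}

/* Smallest squared distance among the points with x-ranks lo .. hi-1.
   x is the whole x-sorted array; y holds exactly those hi-lo points in
   y order; work is free scratch space. */
static int64_t rec(const Point *x, const Point *y, size_t lo, size_t hi,
                   Point *work) {
    size_t n = hi - lo;
    int64_t best = INT64_MAX;
    if (n <= 3) {                                  /* small case: brute force */
        for (size_t i = lo; i < hi; i++)
            for (size_t j = i + 1; j < hi; j++) {
                int64_t d = dist2(&x[i], &x[j]);
                if (d < best) best = d;
            }
        return best;
    }
    size_t mid = lo + n / 2;
    int64_t mid_x = x[mid].x;                      /* dividing line */

    /* Distribute: one stable pass keeps both halves sorted by y. */
    Point *y_left = work, *y_right = work + (mid - lo);
    size_t nl = 0, nr = 0;
    for (size_t i = 0; i < n; i++) {
        if (y[i].rank < mid) y_left[nl++] = y[i];
        else                 y_right[nr++] = y[i];
    }
    int64_t dl = rec(x, y_left, lo, mid, work + n);
    int64_t dr = rec(x, y_right, mid, hi, work + n);
    best = dl < dr ? dl : dr;

    /* Strip: points closer than sqrt(best) to the line, still in y order.
       The halves are no longer needed, so work is reused. */
    Point *strip = work;
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t dx = y[i].x - mid_x;
        if (dx * dx < best) strip[m++] = y[i];
    }
    for (size_t i = 0; i < m; i++)
        for (size_t j = i + 1; j < m; j++) {
            int64_t dy = strip[j].y - strip[i].y;
            if (dy * dy >= best) break;            /* later points are farther in y */
            int64_t d = dist2(&strip[i], &strip[j]);
            if (d < best) best = d;
        }
    return best;
}

/* Squared distance of the closest pair among p[0 .. n-1], n >= 2.
   Sorts exactly twice; returns -1 if memory cannot be allocated. */
int64_t closest_divide(const Point *p, size_t n) {
    Point *x = malloc(n * sizeof *x);
    Point *y = malloc(n * sizeof *y);
    /* Depth-first reuse needs about n + n/2 + n/4 + ... slots, plus rounding. */
    Point *work = malloc((2 * n + 64) * sizeof *work);
    int64_t best = -1;
    if (x != NULL && y != NULL && work != NULL) {
        memcpy(x, p, n * sizeof *x);
        qsort(x, n, sizeof *x, cmp_x);             /* sort 1: by x */
        for (size_t i = 0; i < n; i++) x[i].rank = i;
        memcpy(y, x, n * sizeof *y);
        qsort(y, n, sizeof *y, cmp_y);             /* sort 2: by y */
        best = rec(x, y, 0, n, work);
    }
    free(x);
    free(y);
    free(work);
    return best;
}
```

Points that share the dividing point's x coordinate can land on either side; splitting by rank keeps the halves the right sizes, and every left point has x <= mid_x while every right point has x >= mid_x, so the strip test stays valid.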

Performance Measurement and Report: Run your program on a number of different dataset sizes and measure the time required to solve the problem for that size. Also count the number of distance calculations performed.

Write a short paper (a page or two) that shows your results. Your paper should analyze whether your results are as expected. Your paper should also include the processor architecture, speed, and compiler optimization level.

For example, for sizes 1024 and 4096 the expected ratio of the times would be (c * 4096 * 12) / (c * 1024 * 10) = 4 * (12 / 10), since lg 4096 = 12 and lg 1024 = 10. (What happens to these ratios as n increases?) Distance counts should also have this behavior. Your paper should compare your results with the expected ones.
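The expected ratios can also be computed rather than worked out by hand. The C sketch below (function names are hypothetical) assumes T(n) = c * n lg n for divide and conquer and T(n) = c * n**2 for brute force, so the constant c cancels; its `lg` is exact only for powers of 2, which are the sizes the report must analyze.

```c
#include <stdint.h>

/* Hypothetical helpers for the report's "expected" column. */
/* lg n for n >= 1, rounded down; exact when n is a power of 2. */
static unsigned lg(uint64_t n) {
    unsigned k = 0;
    while (n >>= 1) k++;
    return k;
}

/* Expected T(n2) / T(n1) when T(n) = c * n lg n; needs n1 >= 2. */
double expected_ratio_nlgn(uint64_t n1, uint64_t n2) {
    return ((double)n2 * lg(n2)) / ((double)n1 * lg(n1));
}

/* Expected T(n2) / T(n1) when T(n) = c * n**2. */
double expected_ratio_n2(uint64_t n1, uint64_t n2) {
    return ((double)n2 * (double)n2) / ((double)n1 * (double)n1);
}
```

For example, `expected_ratio_nlgn(1024, 4096)` gives 4 * (12 / 10) = 4.8, matching the calculation above, and `expected_ratio_n2(1024, 4096)` gives 16.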

When measuring the time taken by the algorithm, you should include only the time used to actually solve the closest pair problem, without including the time required to read the file and to sort the arrays. You can, of course, report the read and sort times as separate items (as does the assignment checker).
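One way to time only the solving phase in C is a monotonic clock. The sketch below assumes rucs is a POSIX system (for `clock_gettime` and `CLOCK_MONOTONIC`); `now_seconds` and `time_call` are hypothetical helpers. Reading and sorting would be bracketed by their own calls to `now_seconds` so they can be reported separately.

```c
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime under strict -std=c99 */
#include <time.h>

/* Hypothetical timing helpers; a sketch assuming a POSIX system. */
/* Seconds on a monotonic clock: only differences between calls are meaningful. */
double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs solve(arg) once and returns the elapsed seconds. Only this call is
   timed, so reading and sorting done beforehand are excluded. */
double time_call(void (*solve)(void *), void *arg) {
    double start = now_seconds();
    solve(arg);
    return now_seconds() - start;
}
```

For the smallest datasets a single run may be too short to measure reliably; repeating the call several times and dividing is a common remedy.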

You should measure, report, and analyze the time required for datasets whose sizes are powers of 2. You may also measure powers of 10, but if you do, report and analyze them separately from the powers of 2.

Language and Platform: In general you should use a traditional, non-scripting language. If you want to use a language other than Ada, C, C++, or Java, please discuss it with me and get my approval first. I will test your program on rucs, so you should verify that it works correctly there.

Program Name, Program Execution, and Comments: You may name your program anything that you like. Give the commands needed for compiling and running in both the comments of your main routine and in the README.txt file that you submit along with your program.

No specific commenting style is required, except that your name should be in all files. You should of course make good use of white space, and your comments should help the reader by describing the purpose of routines and sections of your code and by explaining any non-obvious and/or interesting design and implementation decisions.

Sharing Datasets: Of course, you will not share code, but it's fine if you want to share datasets so that you can compare the performance of your code.

Efficiency: Make your program reasonably efficient. Do not, for example, unnecessarily copy or sort arrays. On rucs, a reasonably efficient program should be able to find the closest pair of a million points in roughly one second. (Hint 1: To handle a million points when running on rucs I had to raise the stack size. Hint 2: Are you calling sqrt more often than needed?).
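In C, one way to keep large arrays off the stack is to allocate them once on the heap; allocating them only once also avoids repeated copying. The sketch below assumes a hypothetical `Workspace` holding the x-sorted array, the y-sorted array, and one scratch array of n points; the scratch size your distribution scheme needs may differ. In the spirit of Hint 2, distances can stay squared (`int64_t`) throughout, with a single square root taken only when printing the final answer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names; a sketch of one-time heap allocation. */
typedef struct { int64_t x, y; size_t rank; } Point;

/* All solver arrays, allocated once on the heap instead of as large locals. */
typedef struct {
    size_t n;
    Point *by_x;   /* points sorted by x */
    Point *by_y;   /* the same points sorted by y */
    Point *work;   /* scratch for distribution and the strip (n assumed) */
} Workspace;

void workspace_free(Workspace *w) {
    free(w->by_x);
    free(w->by_y);
    free(w->work);
    w->by_x = w->by_y = w->work = NULL;
    w->n = 0;
}

/* Returns 0 on success, -1 if n is zero, too large, or memory runs out. */
int workspace_init(Workspace *w, size_t n) {
    w->n = 0;
    w->by_x = w->by_y = w->work = NULL;
    if (n == 0 || n > SIZE_MAX / sizeof(Point)) return -1;
    w->by_x = malloc(n * sizeof(Point));
    w->by_y = malloc(n * sizeof(Point));
    w->work = malloc(n * sizeof(Point));
    if (w->by_x == NULL || w->by_y == NULL || w->work == NULL) {
        workspace_free(w);   /* free(NULL) is a no-op, so partial failure is safe */
        return -1;
    }
    w->n = n;
    return 0;
}
```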

Errors: In general you can assume that the input is correct. However, your program should be robust enough to handle common errors and should terminate gracefully for errors such as unexpected EOF or non-numeric input. You do not have to check whether there are exactly two numbers per line.
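A C sketch of graceful error handling with `fscanf` follows; `read_points` is a hypothetical helper, and the real program would pass `stdin` rather than opening a named file. `fscanf` returns the number of items converted, and `feof` distinguishes running out of input from a non-numeric token. As the requirements allow, it does not check that each line holds exactly two numbers, since any white space (including newlines) separates values.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names; a sketch of robust input reading. */
typedef struct { int64_t x, y; } Point;

#define MAX_COORD INT64_C(2147483647)   /* 2**31 - 1 */

/* Reads a positive count n, then n "x y" pairs, from f.
   On success returns 0 and sets *out (malloc'ed, caller frees) and *n_out.
   On bad input prints a message to stderr and returns -1. */
int read_points(FILE *f, Point **out, size_t *n_out) {
    int64_t n;
    if (fscanf(f, "%" SCNd64, &n) != 1 || n <= 0) {
        fprintf(stderr, "error: expected a positive point count\n");
        return -1;
    }
    if ((uint64_t)n > SIZE_MAX / sizeof(Point)) {
        fprintf(stderr, "error: too many points\n");
        return -1;
    }
    Point *p = malloc((size_t)n * sizeof *p);
    if (p == NULL) {
        fprintf(stderr, "error: out of memory\n");
        return -1;
    }
    for (int64_t i = 0; i < n; i++) {
        if (fscanf(f, "%" SCNd64 "%" SCNd64, &p[i].x, &p[i].y) != 2) {
            if (feof(f))
                fprintf(stderr, "error: unexpected end of input at point %" PRId64 "\n", i + 1);
            else
                fprintf(stderr, "error: non-numeric input at point %" PRId64 "\n", i + 1);
            free(p);
            return -1;
        }
        if (p[i].x < 0 || p[i].x > MAX_COORD || p[i].y < 0 || p[i].y > MAX_COORD) {
            fprintf(stderr, "error: coordinate out of range at point %" PRId64 "\n", i + 1);
            free(p);
            return -1;
        }
    }
    *out = p;
    *n_out = (size_t)n;
    return 0;
}
```

The main program would call `read_points(stdin, &points, &n)` and exit with a nonzero status if it returns -1.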

Files for download:

  1. 2**01 points: 2points.txt
  2. 2**03 points: 8points.txt
  3. 2**05 points: 32points.txt

  4. 2**10 points: 1024points.txt
  5. 2**11 points: 2048points.txt
  6. 2**12 points: 4096points.txt
  7. 2**13 points: 8192points.txt
  8. 2**14 points: 16384points.txt

  9. 2**15 points: 32768points.txt
  10. 2**16 points: 65536points.txt
  11. 2**17 points: 131072points.txt
  12. 2**18 points: 262144points.txt
  13. 2**19 points: 524288points.txt

  14. 2**20 points: 1048576points.txt