COSC 3327 Assignment #5

DUE DATE: Wednesday, November 3, 2004

Your assignment is to write a simplified spell checker program. Your program will read in a dictionary of words, open a user-specified text file, and check each word in the file against the dictionary. Your program will print the words that are not in the dictionary to the screen.
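
The overall flow can be sketched as follows. This is a minimal Python sketch for illustration only, under several assumptions: `load_dictionary`, `misspelled_words`, and `run` are hypothetical names, a built-in `set` stands in for the hash table you are required to write yourself, matching ignores case, and the punctuation rules for words are not applied yet.

```python
def load_dictionary(lines):
    """Build the dictionary from an iterable of lines, one word per line.

    A built-in Python set stands in for the hash table here; the assignment
    requires your own hash table implementation instead.
    """
    words = set()
    for line in lines:
        word = line.strip()
        if word:                      # skip blank lines
            words.add(word.lower())   # assumption: matching ignores case
    return words


def misspelled_words(dictionary, text_lines):
    """Return, in order, the tokens in text_lines that are not in dictionary."""
    result = []
    for line in text_lines:
        for token in line.split():    # words are separated by whitespace
            if token.lower() not in dictionary:
                result.append(token)  # report the word as it appears in the file
    return result


def run(dictionary_path, text_path):
    """Hypothetical driver: print each misspelled word on its own line."""
    with open(dictionary_path) as dict_file:
        dictionary = load_dictionary(dict_file)
    with open(text_path) as text_file:
        for word in misspelled_words(dictionary, text_file):
            print(word)
```

Keeping loading, checking, and printing in separate functions makes each piece easy to test on a small file before trying the full dictionary.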

Your data structure to store the dictionary of words MUST be implemented as a HASH table. In your program DESCRIPTION, you should explain your hash table method and your HASH function.
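
One possible design, sketched in Python under assumptions: open addressing with linear probing, and a polynomial string hash computed with Horner's rule (multiplier 31, reduced modulo the table size at each step so intermediate values stay small). The class and method names are hypothetical; separate chaining or a different hash function would be equally valid choices to explain in your DESCRIPTION.

```python
class HashTable:
    """Open-addressing hash table of strings using linear probing (a sketch)."""

    def __init__(self, size):
        self.size = size               # ideally a prime near 400,000
        self.slots = [None] * size     # None marks an empty slot
        self.count = 0                 # number of words stored

    def _hash(self, word):
        # Polynomial hash via Horner's rule: h = h * 31 + code, kept mod size
        h = 0
        for ch in word:
            h = (h * 31 + ord(ch)) % self.size
        return h

    def insert(self, word):
        i = self._hash(word)
        for _ in range(self.size):     # visit each slot at most once
            if self.slots[i] is None:
                self.slots[i] = word
                self.count += 1
                return
            if self.slots[i] == word:
                return                 # duplicate word: already stored
            i = (i + 1) % self.size    # linear probing: try the next slot
        raise OverflowError("hash table is full")

    def contains(self, word):
        i = self._hash(word)
        for _ in range(self.size):
            if self.slots[i] is None:  # an empty slot ends the probe sequence
                return False
            if self.slots[i] == word:
                return True
            i = (i + 1) % self.size
        return False
```

Because words are never deleted from the dictionary, an empty slot safely ends a search. With roughly 104,000 words in a table of about 400,000 slots, the table stays about one quarter full, which keeps probe sequences short.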

For the purposes of this program, a "word" is any sequence of one or more alphabetic characters. The only other characters considered part of a word are the ' (apostrophe) and the - (hyphen); no other punctuation is part of a word. Words are separated by whitespace characters in the input file. If a word ends with a punctuation character such as a comma, exclamation point, semicolon, colon, or period, make sure that character is stripped off the word. Likewise, if a word begins with a double quote, (, <, or [, or ends with the matching character (", ), >, or ]), strip that character as well. (Only the first and last characters are checked.)
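
These rules can be sketched in Python as a small cleaning function. It assumes, as a reading of the rules rather than a requirement, that at most one character is removed from each end, that a cleaned word must contain at least one letter, and that tokens which are still not words afterward (for example `42`) are skipped rather than reported. `clean_token` is a hypothetical name.

```python
import string

LEADING = '"(<['             # stripped if it is the first character
TRAILING = ',!;:.")>]'       # stripped if it is the last character
WORD_CHARS = set(string.ascii_letters + "'-")


def clean_token(token):
    """Apply the word rules to one whitespace-separated token.

    Removes at most one character from each end, then returns the word,
    or None if the result is not a word (assumption: such tokens are skipped).
    """
    if token and token[0] in LEADING:
        token = token[1:]
    if token and token[-1] in TRAILING:
        token = token[:-1]
    has_letter = any(ch in string.ascii_letters for ch in token)
    if has_letter and all(ch in WORD_CHARS for ch in token):
        return token
    return None
```

With only one character removed from each end, a token such as `end."` still ends in a period after cleaning, so this sketch skips it.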

I will provide 2 test data files for you to test your program with. Because my test files are quite long, I suggest that you make up some simpler files for preliminary testing. Format your output in 3 NEATLY aligned columns per page, each entry containing the misspelled word as it appears in the file.
For example:

  theat     testi     anx
  waas      cant      ist
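
A Python sketch of the three-column layout, assuming words are filled row by row and each column is as wide as the longest word plus two spaces; page breaks are not handled. `format_columns` is a hypothetical name.

```python
def format_columns(words, columns=3, width=None):
    """Lay words out row by row in left-aligned, fixed-width columns."""
    if not words:
        return ""
    if width is None:
        width = max(len(w) for w in words) + 2   # widest word plus a 2-space gap
    lines = []
    for start in range(0, len(words), columns):
        row = words[start:start + columns]
        # ljust pads each entry to `width`; rstrip drops padding at line end
        lines.append("".join(w.ljust(width) for w in row).rstrip())
    return "\n".join(lines)
```

Calling `print(format_columns(["theat", "testi", "anx", "waas", "cant", "ist"]))` prints the two rows from the example above, with each column starting at the same position.
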
The test files are asg5a.dat and asg5b.dat. You may open them directly on the CS system as ~lbaker/www/cosc3327/asg5a.dat and ~lbaker/www/cosc3327/asg5b.dat. They will be released by 10am on Monday, October 25, 2004, so until then you should test your program using your own simple input files.

In order to verify that your hash table works well, count the number of collisions that occur in your table. Since there are well over 100,000 words in the spell checker dictionary, the size of your table should be a prime number close to 400,000 (NOT much larger, though; it would be best to use the smallest prime number at or above 400,000).
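
A Python sketch for choosing the table size and measuring collisions. Trial division is fast enough to find the first prime at or above 400,000. The collision count assumes separate-chaining semantics (a word collides when its home bucket already holds a different word) and uses one possible polynomial hash; `is_prime`, `next_prime`, `poly_hash`, and `count_collisions` are hypothetical names.

```python
def is_prime(n):
    """Trial division; quick enough for numbers around 400,000."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime(n):
    """Return the smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


def poly_hash(word, size):
    """Polynomial string hash (Horner's rule, multiplier 31), reduced mod size."""
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % size
    return h


def count_collisions(words, size):
    """Count distinct words whose home bucket already holds another word."""
    occupied = [False] * size
    collisions = 0
    for word in set(words):            # duplicates are stored once, so skip them
        bucket = poly_hash(word, size)
        if occupied[bucket]:
            collisions += 1
        else:
            occupied[bucket] = True
    return collisions
```

Calling `count_collisions(dictionary_words, next_prime(400000))` reports how many dictionary words landed in an already-used bucket; comparing this count across candidate hash functions is one way to judge which spreads the words more evenly.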

The spell check dictionary is located in my www/cosc3327 directory on the CS system, at "~lbaker/www/cosc3327/spellcheck.txt". I suggest that each of you hard-code this file name directly into your program rather than making your own copy; the file is rather large, with approximately 104,217 words, one word per line.

BONUS: 25 points
For each misspelled word found in the input data file, print the line number along with the misspelled word, using the same 3-column output format.
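
For the bonus, line numbers can be tracked while reading the input. A minimal Python sketch, assuming line numbers start at 1 and that `is_known` is a hypothetical callback wrapping your hash table lookup; token cleaning is omitted for brevity.

```python
def misspelled_with_line_numbers(text_lines, is_known):
    """Return (line_number, word) pairs for tokens that is_known rejects."""
    result = []
    for line_no, line in enumerate(text_lines, start=1):   # first line is line 1
        for token in line.split():
            if not is_known(token):
                result.append((line_no, token))
    return result
```

Each pair can then be formatted as a string such as `2: teh` before being laid out in three columns.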

Grade Based on:
20% well-organized testing of code with handwritten labels.
20% well-documented, modular code, including use of a hash table and hash function.
40% correctness of code.
20% style and readability, neatness of output.

