Your assignment is to write a simplified spell checker program. Your program will read in a dictionary of words, then open a user-specified text file and check each word in the file against the dictionary. Your program will output the words that are not in the dictionary to the screen.
Your data structure to store the dictionary of words MUST be implemented as a HASH table. In your program DESCRIPTION, you should explain your hash table method (for example, how collisions are resolved) and your HASH function.
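One possible design, shown as a minimal Python sketch (the handout names no language, so Python is an assumption here), uses open addressing with linear probing and a polynomial string hash computed with Horner's rule. The class name `HashTable`, the multiplier 31, and the choice of linear probing are hypothetical illustrations, not requirements. The sketch counts a collision once for each word that cannot be stored in its home slot.

```python
class HashTable:
    """Open-addressing hash table of strings with linear probing.

    Hypothetical sketch: the class name, the multiplier 31, and linear
    probing are illustrative choices, not requirements of the assignment.
    """

    def __init__(self, size):
        self.size = size            # should be prime, e.g. near 400,000
        self.slots = [None] * size  # None marks an empty slot
        self.count = 0              # number of words stored
        self.collisions = 0         # insertions whose home slot was taken

    def _hash(self, word):
        # Polynomial string hash evaluated with Horner's rule; reducing
        # mod size at every step keeps the intermediate value small.
        h = 0
        for ch in word:
            h = (h * 31 + ord(ch)) % self.size
        return h

    def insert(self, word):
        if self.count >= self.size:
            raise RuntimeError("hash table is full")
        home = self._hash(word)
        i = home
        # Linear probing: step forward until the word or an empty slot.
        while self.slots[i] is not None:
            if self.slots[i] == word:
                return              # duplicate entry; store it only once
            i = (i + 1) % self.size
        if i != home:
            self.collisions += 1    # the word could not use its home slot
        self.slots[i] = word
        self.count += 1

    def contains(self, word):
        i = self._hash(word)
        # With no deletions, reaching an empty slot proves absence.
        for _ in range(self.size):
            if self.slots[i] is None:
                return False
            if self.slots[i] == word:
                return True
            i = (i + 1) % self.size
        return False
```

Linear probing is the simplest scheme to explain but tends to form clusters. Quadratic probing, double hashing, and separate chaining are common alternatives, and collisions can be counted the same way with any of them.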
For the purposes of this program, a "word" is any sequence of one or more alphabetic characters. The apostrophe (') and the hyphen (-) are the only exceptions: they ARE considered part of a word, and no other punctuation is. Words are separated by whitespace characters in the input file. If a word ends with a punctuation character such as a comma, exclamation point, semicolon, colon, or period, make sure that character is stripped off. Likewise, if a word begins with a double quote, (, <, or [, or ends with the matching closing character ", ), >, or ], strip that character as well. Only the first and last characters of a word are stripped.
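A minimal Python sketch of these stripping rules follows (Python is an assumption; the handout names no language). The function name `clean_word` is hypothetical. Two further choices are assumptions as well: a token that still contains other characters after stripping is treated as "not a word" and skipped, and a word must contain at least one letter.

```python
# Characters that may be stripped from the front or the back of a token.
LEADING = '"(<['
TRAILING = ',!;:.' + '")>]'   # listed punctuation plus closing chars


def clean_word(token):
    """Strip at most one leading and one trailing character.

    Returns the cleaned word, or None if the result is not a word.
    Assumption: a word needs at least one letter; apostrophes and
    hyphens are allowed, any other character rejects the token.
    """
    if token and token[0] in LEADING:
        token = token[1:]
    if token and token[-1] in TRAILING:
        token = token[:-1]
    if not any(c.isalpha() for c in token):
        return None
    if all(c.isalpha() or c in "'-" for c in token):
        return token
    return None
```

Because only the first and last characters are examined, a token such as `Hello,"` keeps its comma after the quote is removed and is therefore skipped by this sketch.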
I will provide two test data files for you to test your program with. I suggest that you make up some simpler files for preliminary testing, as my test files are quite long.
Format your output in three NEATLY aligned columns per page, each entry containing the misspelled word as it appears in the file.
For example:
    theat     testi     anx
    waas      cant      ist
Test files are asg5a.dat and asg5b.dat. You may open them directly on the CS system.
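One way to get aligned columns is to pad every entry to a common width. The Python sketch below (language and the function name `format_columns` are assumptions) sizes the columns from the longest word plus a gap, so no entry breaks the alignment, and fills the rows left to right.

```python
def format_columns(words, ncols=3, gap=4):
    """Lay out words in ncols left-aligned columns, filled row by row.

    Hypothetical helper: the column width is the longest word plus
    `gap` spaces, so every entry fits and the columns line up.
    """
    if not words:
        return ""
    width = max(len(w) for w in words) + gap
    rows = []
    for start in range(0, len(words), ncols):
        row = words[start:start + ncols]
        # Pad each entry to the shared width; drop trailing spaces.
        rows.append("".join(w.ljust(width) for w in row).rstrip())
    return "\n".join(rows)
```

For the sample words above, `print(format_columns(["theat", "testi", "anx", "waas", "cant", "ist"]))` prints two rows of three aligned columns.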
To verify that your hash table works well, count the number of collisions that occur in your table. Since there are well over 100,000 words in the spell checker dictionary, the size of your table should be a prime number at or just above 400,000 (NOT much larger, though; it is best to use the smallest prime number at or above 400,000).
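The table size can be computed rather than looked up. This Python sketch (Python and the names `is_prime`/`next_prime` are assumptions) finds the smallest prime at or above a bound by trial division, which is fast enough for numbers near 400,000.

```python
def is_prime(n):
    """Trial division by odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime(n):
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


# Table size suggested by the assignment: closest prime at or above 400,000.
TABLE_SIZE = next_prime(400_000)
```

With roughly 104,217 dictionary words, a table of this size has a load factor of about 104,217 / 400,000 ≈ 0.26, which keeps probe sequences short.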
The spell check dictionary is located in my www/cosc3327 directory on the CS system, which would be "~lbaker/cosc3327/spellcheck.txt". I suggest that each of you hard-code the name of this file directly into your program so that not everyone makes a copy of it; it is rather large, with approximately 104,217 words, one word per line.
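A minimal Python sketch of loading the hard-coded dictionary one word per line (Python and the names `DICTIONARY_PATH`/`load_dictionary` are assumptions). The table is assumed to expose a hypothetical `insert(word)` method. The handout does not say how to treat capitalization, so this sketch stores each word exactly as read. Python's `open` does not expand `~`, so the path is passed through `os.path.expanduser`.

```python
import os

# Path from the assignment; open() does not expand "~", expanduser() does.
DICTIONARY_PATH = os.path.expanduser("~lbaker/cosc3327/spellcheck.txt")


def load_dictionary(path, table):
    """Insert each non-blank line of `path` into `table`.

    Assumption: `table` has an insert(word) method (hypothetical
    interface). Words are stored as-is, with no case folding.
    Returns the number of words read.
    """
    loaded = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip()   # drop the newline and stray spaces
            if word:
                table.insert(word)
                loaded += 1
    return loaded
```

Reading line by line avoids holding a second copy of the whole file in memory while the table is being built.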
BONUS: 25 points
With each misspelled word found in the input data file, print
the line number along with the misspelled word, using
the same three-column output format.
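Line numbers come almost for free if the input is read line by line. In this Python sketch (Python and the name `find_misspelled` are assumptions), the dictionary lookup and the word-cleaning step are passed in as callables, standing in for your own hash table lookup and punctuation-stripping functions.

```python
def find_misspelled(lines, contains, clean):
    """Yield (line_number, word) for each word not in the dictionary.

    lines:    iterable of text lines (e.g. an open file), numbered from 1.
    contains: callable, True if a word is in the dictionary (hypothetical
              hook for your hash table lookup).
    clean:    callable returning the cleaned word, or None for a token
              that is not a word (hypothetical hook).
    """
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():        # split on any whitespace
            word = clean(token)
            if word is not None and not contains(word):
                yield line_no, word
```

Each pair can then be turned into an entry such as `f"{line_no}: {word}"` and laid out in the same three columns as the basic output.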
Grade based on:
20% well-organized testing of code with handwritten labels.
20% well-documented, modular code, including use of a hash table and hash function.
40% correctness of code.
20% style and readability, neatness of output.