APPENDIX 2: DEMO.TXT

[Note: some of the examples discussed in this section will not produce the behavior implied in the text. I have included them in the misspelling-correction dictionary and, consequently, the program will handle them appropriately. The important point is that there are other similar words that will fail as described and you should be alert for them. You can always prevent the program from accepting any particular misspelling by putting it in the misspelling-correction dictionary.]

This file is intended to demonstrate the capabilities and limitations of MicroSpell. To familiarize yourself with the program, I suggest you first read the section of the manual entitled "An Overview of MicroSpell", then read this file, and finally run MicroSpell on this text.

If you want to test your spelling ability vis a vis the program, see how many spelling errors you can find in the list below.

    hindrance         goverment         Arknasas          Kentucky
    did'nt            vicious           resemblance       villiage
    sentance          versatile         apparant          resistence
    mispelled         modualtion        pretain           certian
    presistant        disasterous       aproximate        maintenence
    neccessary        embarass          miniscule         heruistic
    preceding         minimun           noticeable        siezed
    sargeant          congratulations   newsstand         twelfth
    inoculate         suppress          writting          attorneys
    categorys         auxiliary         gurantee          maneuver
    vengence          seperate          possesses         nickle
    echos             facinate          forty             reconnaissance
    changeable        precede           liscense          irrelevent
    counterfeit       shiek

Exactly 21 of the words above are spelled correctly. (Some may argue with "miniscule" and "nickle"; however, these do not appear to be the preferred spellings.) Run the program to see how most of the others should be spelled.

A prime consideration in MicroSpell's design was to minimize the number of misspellings that could "sneak through" undetected. There are a few ways this can happen, though, and you should be aware of them. The most obvious is when the dictionary contains the misspelling.
Of course, I have taken pains to ensure the accuracy of the dictionary supplied, but no system or undertaking is immune to Murphy's law. Also, don't forget that the dictionary expands dynamically as the program runs, so if the suffix stripping routine inserts a misspelled stem/suffix combination into the dictionary, all subsequent occurrences will be accepted by the program. Since you will have to use a text editor to correct the errors that the suffix stripping routine accepts, your best defense against this problem is to search for ALL occurrences of the word you are correcting, not just the first (and only) one that the program displays. You can also use the /I switch as suggested in the "Limitations" section to circumvent this problem.

Another category of "invisible errors" comprises nonsense "plurals". Since plurals are so common, the program attempts to handle these transparently so the output will not be cluttered with all the decompositions. Unfortunately, this allows it to be fooled into accepting nonsense plurals like "thes" or "nices". It does, however, have heuristics for detecting ill-formed plurals and will not allow words like "indexs" or "echos" to pass unchallenged.

Since the program obviously has no understanding of the text it is reading, it will always pass words that are incorrectly or inappropriately used, if they happen to be spelled correctly. Thus, "principle" used for "principal" will always be accepted, as will "the" when you meant "they". This is one of the main reasons why you cannot rely solely on MicroSpell for proofreading.

One of the first things you will notice about MicroSpell is that its dictionary is composed primarily of word stems and contains relatively few stem + suffix combinations. Instead, it relies heavily on its suffix stripping routine to process words appearing in other than canonical form.
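
As a rough illustration of the stem-plus-suffix lookup and the dynamically growing dictionary described above, here is a minimal Python sketch. It is not MicroSpell's code: the stem list, the suffix table, and the names STEMS, SUFFIXES, learned, and check are all made up for the example, and the plural heuristics are left out.

```python
# Hypothetical miniature of a split dictionary: a few stems plus a suffix table.
STEMS = {"the", "nice", "encounter", "insert"}   # toy stem list, not MicroSpell's
SUFFIXES = ("ing", "ed", "s")                    # toy suffix table

learned = set()  # stem+suffix combinations accepted so far in this run


def check(word):
    """Return 'stem', 'learned', a 'STEM+SUFFIX' decomposition, or None."""
    w = word.lower()
    if w in STEMS:
        return "stem"
    if w in learned:
        return "learned"
    for suffix in SUFFIXES:
        stem = w[: -len(suffix)]
        if w.endswith(suffix) and stem in STEMS:
            learned.add(w)  # the dictionary grows as the run proceeds
            return f"{stem.upper()}+{suffix.upper()}"
    return None  # unknown word: would be reported as a possible misspelling
```

In this sketch, check("thes") returns "THE+S" the first time and "learned" on every later call, which mirrors how a nonsense combination, once accepted, keeps being accepted for the rest of the run.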
This "split dictionary" format greatly increases the number of distinct word types that the program can handle, at the expense of burdening the operator with the task of verification and of a somewhat greater probability of error. In general, when the program prints a decomposition, you can assume the stem is correctly spelled (since it appears in the dictionary) and direct your attention to whether the suffix is properly formed and appropriate.

There is a small class of words, however, whose members change spelling slightly with the addition of certain suffixes. For example, the word "disaster" loses its "e" with the addition of "ous", becoming "disastrous". Since the suffix stripping routine gladly accepts "disasterous" as "disaster" + "ous", you should pay close attention to all decompositions.

The program occasionally prints a question mark on the line preceding the decomposition of certain words such as "occuring" and "encountered". The reason is that the word meets the three criteria which the program knows for doubling the final consonant: (1) a suffix beginning with a vowel, (2) added to a stem ending in a single final consonant, (3) preceded by a single vowel. Unfortunately, it cannot check the fourth condition (whether the final syllable of the stem is accented), so it calls your attention to this fact by printing the question mark. If the final syllable of the stem IS accented, the word is probably misspelled and you will have to use an editor to fix it (always search for ALL occurrences of the word in question because MicroSpell only displays the FIRST).

You might wonder why the program can correctly guess "encounter" when it encounters "enconuter" but not "encounters" when it encounters "enconuters"; or why it correctly guesses "inserted" for "insreted" only if it has previously encountered "inserted". The reason is that it can only guess words which are contained in its dictionary.
In the former example, "encounter" is in the dictionary, whereas "encounters" is not. In the latter example, although "inserted" is not normally in the dictionary, the suffix stripping routine has temporarily inserted it, so the guessing routines will find the desired correction. This can be quite convenient since the program "learns" certain words it can't afford to keep around permanently, but it can also result in errors on the guess list if the suffix stripping routine has inserted a misspelling (see the "Limitations" section for more details).

Occasionally the program produces a "creative" stem/suffix expansion. You will see an example of this with the (misspelled) word "writting" in this file. The program expands it as: WRIT+T+ING. Since "writ" is a valid English word, and since the suffix is properly formed, the program gladly accepts the nonsense word it has found, and the unexpected decomposition is your only clue that something is amiss. You will have to use a text editor to correct this error. If you make this particular mistake a lot, put "writting" into the misspelling-correction dictionary.

I have deliberately omitted certain relatively uncommon words from the dictionary when their correct spelling happened to coincide with a likely misspelling of a common word. A case in point is the word "thew" which, if it were to appear in a text file, would likely be an instance of a misspelling of "the", "threw", "them", or "then", rather than of itself. Omitting such words from the dictionary is obviously the only way to give a non-intelligent program the ability to find this type of error.

The suffixes "ence", "ance", "able", "ible", "ant", and "ent" were deliberately omitted from the suffix table because words ending with either member of each pair sound alike and consequently are difficult to spell.
Since there is no easy way to determine which one is appropriate, the suffix stripping routine would have to accept either, and numerous (potential) errors would slip by. I have averted this problem by including many common words ending in these suffixes in the dictionary; thus, the program will find all such misspellings and will often be able to guess the correct spelling. Hint: virtually all words for which "ant" is the appropriate suffix also require "ance" rather than "ence" (and likewise, words taking "ent" require "ence"), so if the dictionary only contains one of the pair, you can, with reasonable confidence, deduce the other.

I have included "woman", "women", "man", and "men" on the suffix list even though they are not usually considered suffixes. Since many (easy-to-spell) compounds are formed by appending one of these nouns, it seemed wasteful to include all of them in the dictionary when they could easily be handled by the suffix routine.
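
For readers who want to see the consonant-doubling check from earlier in this appendix spelled out, the three criteria the program can test might be sketched in Python as follows. This is only an illustration under stated assumptions: the function name doubling_is_possible is invented, input is assumed to be lowercase letters, "y" is treated as a consonant, and MicroSpell's actual routine may differ.

```python
VOWELS = set("aeiou")  # assumption: "y" counts as a consonant in this sketch


def doubling_is_possible(stem, suffix):
    """True when all three checkable criteria for doubling are met.

    The fourth condition (an accented final syllable) cannot be tested
    from spelling alone, so a True result only means "look at this word".
    """
    if len(stem) < 2 or not suffix:
        return False
    # (1) the suffix begins with a vowel
    suffix_starts_with_vowel = suffix[0] in VOWELS
    # (2) the stem ends in a single consonant (the letter before it is a vowel)
    single_final_consonant = stem[-1] not in VOWELS and stem[-2] in VOWELS
    # (3) that vowel stands alone (the letter before it is not another vowel)
    single_vowel = len(stem) < 3 or stem[-3] not in VOWELS
    return suffix_starts_with_vowel and single_final_consonant and single_vowel
```

With this sketch, both "occur" + "ing" and "encounter" + "ed" satisfy the three criteria, so a caller would flag "occuring" and "encountered" alike when the consonant has not been doubled; "insert" + "ed" and "appear" + "ing" do not, so they would pass without a flag. Deciding which flagged word is really misspelled still depends on where the accent falls, which is left to the reader.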