Two methods I’ve used:
1. A table of misspellings mapped to correct spellings, e.g. freind -> friend.
The problem with this is that keeping the table up to date is a never-ending job. Also, do you include abbreviations etc. or should these go in a separate table? And where do you stop? Do you include things like little -> small?
2. Approximate matching. Each word is checked against a list of known words, and if it doesn’t match any of them, the most similar word from the list is substituted.
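The problem with this is how to define the “most similar” word. One way is to use an edit distance similarity metric, but this can sometimes give silly results. For instance, teh -> ten, when it’s obviously (to a human) a transposition of “the”. Plain (Levenshtein) edit distance counts that swap as two substitutions, so “ten” is one edit away from “teh” but “the” is two.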
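One way to see the teh/ten problem is to compare plain Levenshtein distance with a variant that also counts a swap of two adjacent letters as a single edit. That variant is the “optimal string alignment” form of Damerau-Levenshtein distance, swapped in here purely as an illustration. The sketch below assumes a tiny made-up vocabulary with made-up frequency counts, and breaking ties by frequency is my own assumption, not something either metric does by itself.

```python
# Hypothetical vocabulary: known words mapped to made-up relative frequencies.
VOCAB = {"the": 1000, "ten": 20, "tea": 10}


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: each insertion, deletion or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Levenshtein plus a swap of two adjacent letters at cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "eh" <-> "he".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def closest(word: str, vocab: dict, distance) -> str:
    """Return the known word nearest to `word`; ties go to the more frequent word."""
    return min(vocab, key=lambda w: (distance(word, w), -vocab[w]))


print(closest("teh", VOCAB, levenshtein))   # ten
print(closest("teh", VOCAB, osa_distance))  # the
```

With plain Levenshtein, “ten” and “tea” tie at distance 1 and “the” is at 2, so “ten” wins. Counting the transposition as one edit puts all three at distance 1, and only then does the frequency tie-break pick “the”. So the metric alone still doesn’t settle it; some notion of which word is more likely is needed too.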
Perhaps the best approach would be to combine both methods. Method 1 could cope with the most frequently occurring errors and method 2 could act as a backstop.
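A minimal sketch of the combined approach, under some assumptions: the correction table and word list are tiny made-up examples, input is taken to be single lowercase words, and the backstop uses Python’s standard `difflib.get_close_matches`, which ranks candidates by a sequence-similarity ratio rather than edit distance (swapped in here for brevity). Words with nothing above the cutoff are left unchanged rather than forced onto the nearest word.

```python
import difflib

# Method 1: hypothetical table of frequent misspellings -> corrections.
CORRECTIONS = {"freind": "friend", "teh": "the"}

# Hypothetical list of known words for the approximate-matching backstop.
KNOWN_WORDS = ["the", "friend", "ten", "small", "little"]


def correct(word: str, cutoff: float = 0.6) -> str:
    """Try the correction table first, then fall back to approximate matching."""
    if word in CORRECTIONS:   # method 1: a known frequent error
        return CORRECTIONS[word]
    if word in KNOWN_WORDS:   # already a valid word, leave it alone
        return word
    # Method 2: nearest known word by difflib's similarity ratio (0.0 to 1.0),
    # but only if some candidate scores at least `cutoff`.
    matches = difflib.get_close_matches(word, KNOWN_WORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else word


for w in ["teh", "freind", "frend", "xyzzy"]:
    print(w, "->", correct(w))
# teh -> the, freind -> friend, frend -> friend, xyzzy -> xyzzy
```

Because the table is checked before the word list, a table entry wins even when the “misspelling” is itself a real word (an entry like little -> small would rewrite every “little”), which is one more thing to weigh when deciding what belongs in the table.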
David