- Make a webpage which explains the Wikipedia n-grams.
- Make the tools multi-language, i.e. allow extended character sets, not only a-z.
- For more TODOs, please look for the TODO marks in the *.py and *.c files.
- Also build Markov/Sinkov statistics from normal dictionaries (non-mangled!).
  Pro:
  - n-grams with a zero probability will have a score of zero. With Wikipedia
    articles, it is possible that an impossible n-gram appears when concatenating
    two words during the mangling process.
  Contra:
  - The statistic does not reflect how often an n-gram is used, since the
    dictionary does not record how often each word is used.
  - Scoring of already mangled input texts (e.g. telegrams) will be bad, since
    the n-grams that form between words will have a low probability in the
    scoring table.
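The zero-probability point above can be sketched as follows. This is a minimal illustrative example, not the project's actual implementation: function names and the tiny training dictionary are made up, and it multiplies raw n-gram probabilities so that a single unseen n-gram collapses the whole score to zero.

```python
# Hypothetical sketch: n-gram counts built from a plain (non-mangled)
# dictionary; any n-gram absent from the training data zeroes the score.
from collections import Counter


def build_ngram_counts(words, n=3):
    """Count character n-grams over a word list (illustrative API)."""
    counts = Counter()
    for word in words:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def score(text, counts, n=3):
    """Multiply per-n-gram probabilities; an unseen n-gram gives zero."""
    total = sum(counts.values())
    result = 1.0
    for i in range(len(text) - n + 1):
        result *= counts[text[i:i + n]] / total  # Counter returns 0 if unseen
    return result


if __name__ == "__main__":
    counts = build_ngram_counts(["hello", "help", "hero"])
    assert score("hel", counts) > 0
    # "xyz" never occurs in the dictionary, so the score collapses to zero --
    # the same effect as an impossible n-gram created by concatenating words.
    assert score("xyz", counts) == 0.0
```

This also illustrates the contra point: a text containing n-grams that only arise at word boundaries (as in mangled or concatenated input) is scored near zero even if its individual words are common.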