Word frequency counters introduced on the site

Recently I created a new online tool that I hope will help you learn foreign languages more effectively: online word frequency counters. So, how can they help you?

Well, we all sometimes face the following problem: you want to watch a new movie or read a book in a foreign language, but you don't know whether it suits your level. Of course, you can start watching the movie or reading the first chapter of the book, and it may seem OK at first, but later on you realize that it contains too many unknown words. Now you have a dilemma: drop it or keep watching (or reading). Dropping it can be hard, and so can continuing if the movie (or book) is too complicated.

Now you have a solution! You can just copy and paste the first chapter of your book into the word frequency counter, and it will show you detailed statistics for your text: how many words it contains from each frequency interval. The counter will also highlight high-frequency words in different colors (based on their frequency rating). The counter supports subtitle files as well, in case you want to analyze the vocabulary of a movie before watching it.
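To give a rough idea of what "statistics by frequency interval" means, here is a minimal Python sketch. It is not the code that runs on the site. The function names, the tiny `ranks` dictionary and the interval size of 1000 are all assumptions made up for illustration.

```python
import re
from collections import Counter

def frequency_band(rank, band_size=1000):
    """Return the (low, high) interval that contains a 1-based rank."""
    low = (rank - 1) // band_size * band_size + 1
    return (low, low + band_size - 1)

def band_statistics(text, ranks, band_size=1000):
    """Count the running words of `text` per frequency interval.

    `ranks` is a hypothetical dict mapping a lowercase word to its
    1-based frequency rank. Words missing from `ranks` are counted
    under the key None.
    """
    # Runs of letters, optionally with one inner apostrophe ("don't").
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    stats = Counter()
    for word in words:
        rank = ranks.get(word)
        key = frequency_band(rank, band_size) if rank is not None else None
        stats[key] += 1
    return stats

# Made-up ranks, only to show the shape of the result.
ranks = {"the": 1, "cat": 1500, "sat": 2500}
stats = band_statistics("The cat sat. The zyx!", ranks)
```

With this toy data, `stats` holds two words in the interval 1-1000, one word in each of the next two intervals, and one unknown word. The share of words that land in rare intervals or under `None` is what tells you whether a text is likely to be too hard.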

The word frequency counters are available for the following languages:

If you want me to create a word frequency counter for another language, please feel free to contact me. If a word frequency list for that language is available under a suitable license, I think I will be able to do that.

For those of you wondering where I found the frequency lists, I used two sources:

  1. Frequency word lists on the Invoke IT blog, released under a Creative Commons license. These frequency lists were compiled using subtitles from opensubtitles.org. The downside of that approach is that these lists contain a frequency rating only for individual inflected forms of words, not for the base word as a whole. That may not matter much for English, where each word usually has at most 2-4 forms. But take French, for example, where each verb may have about 40 forms! For such languages you should keep in mind that a rare form of even a very frequent word may have a low frequency rating.
  2. For English I also used a word frequency list based on the Corpus of Contemporary American English (COCA), compiled by Professor Mark Davies. I purchased the full version of this frequency list, and the professor approved this use of it. If you choose this option, your text will be analyzed by lemmas. That means that all forms of a particular word will have the same frequency index. For example, the words "count", "counts", "counted" will all fall in the interval of the 1001-2000 most frequent words. By the way, this frequency list was used to create A Frequency Dictionary of Contemporary American English.
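The difference between the two kinds of list can be sketched in a few lines of Python. Everything below is hypothetical: the ranks are invented numbers, not values from the Invoke IT or COCA lists, and the lookup tables only stand in for the real data.

```python
# Invented data for illustration only.
FORM_RANKS = {"count": 1200, "counts": 3900, "counted": 2700}   # one rank per word form
LEMMA_OF = {"count": "count", "counts": "count", "counted": "count"}
LEMMA_RANKS = {"count": 1500}                                    # one rank per lemma

def interval(rank, size=1000):
    """Format the frequency interval that contains a 1-based rank."""
    low = (rank - 1) // size * size + 1
    return f"{low}-{low + size - 1}"

def form_interval(word):
    """Look the word up exactly as written (form-based list)."""
    rank = FORM_RANKS.get(word.lower())
    return interval(rank) if rank is not None else None

def lemma_interval(word):
    """Map the word to its lemma first, then look up the lemma's rank."""
    lemma = LEMMA_OF.get(word.lower())
    rank = LEMMA_RANKS.get(lemma)
    return interval(rank) if rank is not None else None

for word in ("count", "counts", "counted"):
    print(word, form_interval(word), lemma_interval(word))
```

With the form-based table, each form can land in a different interval. With the lemma-based lookup, all three forms share the interval of "count".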

More good news: the highlighting of high-frequency words is now integrated into all phonetic translators on the site (except for the Japanese one, where it is still to come). This option works only if you choose to convert your text to "transcription under each word" or "transcription under each line of text".

