Frequency Analysis Investigation

Every letter in a language has a particular 'personality', and this helps when cracking monoalphabetic substitution ciphers, in which every letter has been replaced with another letter. Some letters in a given language are used far more often than others. In virtually all writing in English, 'E' is the most common letter, followed by 'T' and then 'A'.

This can provide the basis of some great mathematical investigations.
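As a starting point for the investigations below, students need a way to count letters. The following is a minimal sketch in Python, not a prescribed method; the function name `letter_frequencies` and the sample sentence are illustrative assumptions, and only the 26 unaccented letters a-z are counted.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return {letter: percentage} for the letters a-z in text, most common first."""
    # Ignore case, and drop spaces, digits and punctuation.
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    # most_common() yields (letter, count) pairs in descending order of count.
    return {letter: 100 * n / total for letter, n in counts.most_common()}


# Illustrative sample only; any text can be substituted.
sample = "The quick brown fox jumps over the lazy dog"
for letter, pct in letter_frequencies(sample).items():
    print(f"{letter}: {pct:.1f}%")
```

Running the same function on a page of ciphertext and on a page of ordinary English gives two tables that can be matched up letter by letter.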

1. Statistics

A short message will almost certainly not follow the average frequency distribution of letters in English. This is highlighted on the Code Book CD-ROM (http://www.virtualimage.co.uk/html/thecodebook.htm) ('Cracking the Substitution Cipher'; 'Statistics'). The larger the sample, the more closely its letter frequencies match the average. But there will always be exceptions: a book about Zorba riding a zebra across Zimbabwe will not follow the average letter frequency distribution. The book A Void by Georges Perec (translated by Gilbert Adair) is written entirely without using the letter E. It can be an eye-opening experience for students to try to hold a conversation using only words without the letter E.
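One way to make the effect of sample size measurable is to compare the first few hundred characters of a long text with the text as a whole. The sketch below does this under stated assumptions: the names `letter_shares`, `distance` and `drift_by_sample_size` are hypothetical, and the measure used (half the sum of the absolute differences between letter shares) is just one reasonable choice, not something taken from the CD-ROM.

```python
from collections import Counter
import string


def letter_shares(text):
    """Return each letter a-z's share of the letters in text (0.0 to 1.0)."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    return {c: (counts[c] / total if total else 0.0)
            for c in string.ascii_lowercase}


def distance(shares_a, shares_b):
    """Half the sum of absolute differences: 0 means identical, 1 means no overlap."""
    return 0.5 * sum(abs(shares_a[c] - shares_b[c])
                     for c in string.ascii_lowercase)


def drift_by_sample_size(text, sizes):
    """For each size n, measure how far the first n characters are from the whole text."""
    whole = letter_shares(text)
    return {n: distance(letter_shares(text[:n]), whole) for n in sizes}
```

Given a long passage in a variable such as `book_text` (an assumed name), `drift_by_sample_size(book_text, [50, 500, 5000])` returns one distance per sample size, which students can plot to see how the distances change as the sample grows. The same `distance` function can also compare two different texts.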

2. Different Texts

An extension of this is to investigate different English texts. The Code Book CD-ROM includes a video clip that shows Simon Singh comparing the distribution of letters in a Shakespeare sonnet with that in an article from the Sun ('How frequency analysis works'). The words may be different, but the distribution of letters should be very similar.

3. Different Languages

Every language has its own particular pattern of letter frequencies. In German, the letter 'E' is much more common than it is in English. In Arabic, the letters 'A' and 'L' are the most common. If your students are studying other languages, or if English is their second language, they can investigate how the frequencies differ.
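Counting letters in other languages means handling characters outside a-z, such as accented letters or a different script. A possible sketch, assuming Python's built-in `str.isalpha()` is an acceptable definition of 'letter' for the language being studied (the function name `letter_counts_any_alphabet` and the German sample are illustrative):

```python
from collections import Counter


def letter_counts_any_alphabet(text):
    """Count every character that Python classes as a letter, ignoring case."""
    # str.isalpha() is True for letters in any script and False for
    # digits, spaces and punctuation.
    return Counter(c for c in text.lower() if c.isalpha())


# Illustrative sample only.
counts = letter_counts_any_alphabet("Straße, Äpfel und Öl")
for letter, n in counts.most_common(5):
    print(letter, n)
```

Whether accented letters should be counted separately from their unaccented forms is a decision students can make and justify as part of the investigation.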

4. Texting

Our language is constantly evolving, and today's school students are part of the 'Texting' generation, where 'be' and 'you' are one-letter words and whole words are represented by numbers or symbols. It would be very interesting to see how the frequency of letters in a student's mobile inbox compares with the average for traditional English.