Mathematical research on languages

This topic is one of the chapters of my dissertation research, and it was the trigger for starting the Tone-da project. My dissertation was about using mathematical methods to analyse information, find the relationships between subjects, and apply them to design.

The following chart is an analysis I did summarising the relationship between word length, measured in letters, and the number of words of that length. The statistics on the left can also be shown as the line chart on the right. Interestingly, the line appears to be a 'bell curve' (normal distribution) rather than an exponential function (y = a^x). The mean of the curve is at around 8.5. That is to say, words composed of 8 or 9 letters are the most common in English.
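A count like the one described above can be reproduced with a short script. The following is only a minimal sketch in Python, not the method used for the chart: the function names `length_distribution` and `mean_length` are my own, and `sample_words` is a tiny hypothetical list standing in for a real dictionary word list, so its numbers will not match the chart.

```python
from collections import Counter


def length_distribution(words):
    """Count how many distinct words there are of each letter length."""
    # Lower-case and de-duplicate so each word is counted once;
    # skip entries containing digits, hyphens or other non-letters.
    unique = {w.lower() for w in words if w.isalpha()}
    return Counter(len(w) for w in unique)


def mean_length(dist):
    """Mean word length, weighted by the number of words of each length."""
    total = sum(dist.values())
    return sum(length * count for length, count in dist.items()) / total


# Hypothetical sample; replace with a full dictionary word list.
sample_words = [
    "cat", "tone", "word", "letter", "chinese",
    "english", "language", "syllable", "dissertation",
]

dist = length_distribution(sample_words)
for length in sorted(dist):
    print(length, dist[length])
print("mean length:", round(mean_length(dist), 2))
```

Plotting the printed pairs (length on the x-axis, count on the y-axis) gives the kind of line chart discussed here. With a full dictionary as input, the shape of that line can be compared against a bell curve.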


English is a phonetic language. A combination of 8 to 9 letters usually translates into a pronunciation of 2 to 4 syllables. Therefore, words of 2 to 4 syllables are the most common in English. As a matter of fact, 2 to 4 syllables is not only the most common word length in English but also the most common pronunciation length of Chinese words. Because of this similarity between the two languages, I decided to start a project that helps learners from an English-speaking background to speak Chinese.

Ruohui Xu