
The Chinese language (汉语, 华语, or 中文) is a member of the Sino-Tibetan language family. Although most Chinese speakers think of it as a single language, it encompasses great variation in both spoken and written forms. In speech, the differences among varieties of Chinese are larger than those among the Romance languages. In writing, differences in style and changes in the language over time also result in varieties that are not mutually intelligible.

The terms and concepts Chinese speakers use to think about language differ from those used in the West, largely because of differences between the political and social development of China and that of Europe. After the fall of the Roman Empire, Europe fragmented into small nation-states whose identity was often defined by language, whereas China was able to preserve cultural and political unity through the same period. Hence Chinese speakers tend to regard linguistic differences as vast as those between separate European languages as variations of a single language rather than as different languages.

Chinese is a tonal language related to Tibetan and Burmese, but genetically unrelated to other neighbouring languages such as Korean, Vietnamese, Thai or Japanese. However, these languages were strongly influenced by Chinese in the course of history, both linguistically and extralinguistically. Korean and Japanese both have writing systems employing Chinese characters, called Hanja and Kanji respectively. Like those two languages, Vietnamese contains many Chinese loanwords, and it formerly used Chinese characters as well.

About one-fifth of the world's population speaks some form of Chinese as a native language, making it the language with the most native speakers in the world. Chinese (in its Standard Mandarin form) is the official language of the People's Republic of China and the Republic of China, one of four official languages of Singapore, and one of six official languages of the United Nations.

Table of contents
1 Spoken Chinese
2 Written Chinese
3 Development of Chinese
4 Related topics
5 References
6 External links

Spoken Chinese

Main article: Chinese dialects

Spoken Chinese comprises many regional and mutually unintelligible varieties. In the West, many people are familiar with the fact that the Romance languages all derive from Latin and so share many underlying features while being mutually unintelligible. The linguistic evolution of Chinese is similar, but the socio-political context is quite different.

In Europe, political fragmentation created independent states roughly the size of Chinese provinces. This produced a political desire to establish separate cultural and literary standards for each nation-state and to standardize the language within it. In China, a single cultural and literary standard continued to exist, while there was no comparable desire to standardize the spoken language across different cities and counties. The resulting linguistic context is very different from that of Europe, with profound implications for how to describe spoken varieties of Chinese.

For example, in Europe the language of a nation-state was usually standardized to resemble that of the capital, which makes it easy to classify a given variety as, say, French or Spanish. This had the effect of sharpening linguistic differences: a farmer on one side of the border would start to model his speech on that of Paris, while a farmer on the other side would model his speech on that of Madrid. In China, this standardization did not happen, so even categorizing varieties can be difficult, in part because different dialects merge into each other. As a result, linguists disagree among themselves about classification.

Within the Sinitic system, linguists distinguish seven to ten main groups (fang1yan2 qu1). These main groups can be further subdivided into several more levels of differentiation (for example, the pian4 and xiao3pian4 of the Wu group, or anywhere from five to seven subdivisions of Min).

In describing dialects, Chinese speakers typically use the term "[location] hua" (the speech of a place), for example Beijing hua for the speech of Beijing or Shanghai hua for the speech of Shanghai. In parts of south China, each major city can have its own dialect, which is only marginally intelligible to neighboring cities and completely unintelligible to people further afield.

Most linguists classify all of the varieties of Chinese as part of the Sino-Tibetan language family and believe that there was an original language, analogous to Proto-Indo-European, from which the Sinitic and Tibeto-Burman languages descended. The relations between Chinese and the other Sino-Tibetan languages are still unclear and an area of active research, as is the attempt to reconstruct proto-Sino-Tibetan. The main difficulty in this effort is that while there is very good documentation that allows us to reconstruct the ancient sounds of Chinese, there is no written documentation of the split between proto-Sino-Tibetan and Chinese. In addition, many of the languages that would allow us to reconstruct proto-Sino-Tibetan are very poorly documented or understood.

The confusion of terms above for the varieties and subvarieties of Chinese reflects differing views on the nature of the Chinese language(s). Most Chinese laypeople consider Chinese to be a single language, and some linguists follow this convention, since it is the "self-perception" of the language held by the majority of its speakers. Others consider Chinese to be a group of anywhere from seven to seventeen related languages, since these languages are not mutually intelligible and show variation comparable to that of the Romance languages. This distinction can have political overtones, in that describing Chinese as several languages can imply that China should actually be several different nations.

It will perhaps be easier to understand how these groups are associated with geographical areas of China by examining the maps above and to the right. The seven main groups are Mandarin (represented by the lines drawn from Beijing), Wu, Xiang, Gan, Hakka, Yue, and Min (which linguists further divide into five to seven subdivisions, all mutually unintelligible). Linguists who distinguish ten rather than seven major groups separate Jin from Mandarin, Pinghua from Yue, and Hui from Wu. There are also many smaller groups that confound efforts at classification: for example, Dungan, a dialect of northwestern Mandarin spoken among Chinese-descended Muslims in Kyrgyzstan; Danzhou-hua, spoken on Hainan Island; Xiang-hua 乡话 (not to be confused with Xiang 湘), spoken in western Hunan; and Shaozhou-Tuhua, in northern Guangdong. (An informative article written in Chinese may be found at [1].)

A large, detailed map of the Chinese dialects/languages (in Chinese) can be found at http://www.uijin.idv.tw/download/地圖/中國漢語地圖.jpg. Note that Dungan is not shown. This is an interesting case: even though Dungan is very closely related to Mandarin, it is generally not considered "Chinese", because it is written in Cyrillic and spoken by people outside China who are not considered Chinese in any sense.

It is common for speakers of Chinese to be able to speak several variations of the language. Typically in southern China, a person will be able to speak the official Mandarin Chinese, the local dialect, and occasionally either speak or understand another regional dialect, such as Cantonese Chinese.

Chinese speakers frequently code-switch between Mandarin and the local dialect, depending on the situation. Speakers may also mix elements of different dialects, depending on geographical influence. A person living in Taiwan, for example, will commonly mix pronunciations, phrases, and words from Mandarin and Min-nan, and this mixture is considered socially appropriate under many circumstances.

Written Chinese

The Chinese written language employs Han characters (漢字, pinyin Hànzì), which are named after the Han culture to which they are largely attributed. In Japan and Korea, Han characters were adopted and integrated into the local languages, where they became Kanji and Hanja, respectively. Japan still uses Kanji as an integral part of its writing system; however, Korea's use of Hanja has diminished (indeed, it is not used at all in North Korea). In the field of software and communications internationalization, CJK is a collective term for Chinese, Japanese, and Korean. All three have traditionally been handled with double-byte encodings, because their character sets contain far more characters than the 256 values a single byte can represent. Computer processing of Chinese characters therefore raises special issues in both input methods and character encoding schemes: a standard keyboard of around 100 keys cannot provide a separate key for each of thousands of characters.
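The point about single-byte versus double-byte encodings can be checked directly. The following is a minimal sketch in Python that uses only the interpreter's built-in codecs (`latin-1`, `gb2312`, `big5`, `utf-8`). The helper name `byte_lengths` is hypothetical and chosen for illustration. A single-byte encoding cannot represent the characters at all. GB2312 (for Simplified characters) and Big5 (for Traditional characters) use two bytes per character, and UTF-8 uses three bytes for these particular characters.

```python
from typing import Dict, List, Optional


def byte_lengths(text: str, encodings: List[str]) -> Dict[str, Optional[int]]:
    """Return the encoded length of `text` in each encoding, or None if it cannot be encoded.

    `byte_lengths` is a hypothetical helper written for this illustration.
    """
    result: Dict[str, Optional[int]] = {}
    for enc in encodings:
        try:
            result[enc] = len(text.encode(enc))
        except UnicodeEncodeError:
            # The encoding's repertoire does not include these characters.
            result[enc] = None
    return result


simplified = "汉字"   # Simplified form, as used in mainland China and Singapore
traditional = "漢字"  # Traditional form, as used in Taiwan, Hong Kong and Macau

print(byte_lengths(simplified, ["latin-1", "gb2312", "utf-8"]))
# {'latin-1': None, 'gb2312': 4, 'utf-8': 6}
print(byte_lengths(traditional, ["latin-1", "big5", "utf-8"]))
# {'latin-1': None, 'big5': 4, 'utf-8': 6}

# A single byte has only 256 possible values, but these code points lie far beyond that range.
print([hex(ord(ch)) for ch in traditional])
# ['0x6f22', '0x5b57']
```

GB2312 and Big5 are shown because they are the long-established national double-byte standards for Simplified and Traditional characters respectively. UTF-8 is included because it is the dominant encoding today. It is a variable-width encoding rather than a double-byte one.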

The Chinese writing system is mostly logographic, i.e., each character expresses a monosyllabic word part, also known as a morpheme. This is made possible by the fact that more than 90% of Chinese morphemes are monosyllabic. Multisyllabic words are written with a separate character for each syllable. Some Han characters are ideographs, but most have forms based partly on their pronunciation rather than purely on their meaning, so they do not directly express ideas.

Relationship between spoken and written Chinese

The relationship between the Chinese spoken and written languages is complex. This complexity is compounded by the fact that the numerous varieties of spoken Chinese have gone through centuries of evolution since at least the late Han dynasty. Written Chinese, however, has changed much less than the spoken language.

Until the 20th century, most formal Chinese writing was done in classical Chinese, which was very different from any of the spoken varieties of Chinese, in much the same way that Classical Latin is different from the modern Romance languages. A written style closer to the spoken language was used for informal works such as colloquial novels.

Since the May Fourth Movement, the formal standard for written Chinese has been Vernacular Chinese, whose grammar and vocabulary are similar, but not identical, to those of modern spoken Mandarin. Although few new works are written in classical Chinese, reading classical Chinese is taught in middle and high school and forms part of college entrance examinations.

Chinese characters are understood as representing morphemes independently of phonetic change. Thus, although the number one is "yi" in Mandarin, "yat" in Cantonese and "tsit" in Hokkien, these words derive from a common ancient Chinese word and still share an identical character: 一. Nevertheless, the orthographies of Chinese dialects are not identical, and the vocabularies used in the different dialects have also diverged. In addition, while literary vocabulary is often shared among all dialects (at least in orthography; the readings are different), colloquial vocabularies often differ.

The complex interaction between the Chinese written and spoken languages can be illustrated with Cantonese. There are two standard forms used in writing Cantonese: formal written Cantonese and colloquial written Cantonese. Formal written Cantonese is very similar to written Mandarin and can be read by a Mandarin speaker without much difficulty. However, formal written Cantonese is rather different from spoken Cantonese. Colloquial written Cantonese is more similar to spoken Cantonese but is largely unreadable by an untrained Mandarin speaker.

Cantonese is unique among non-Mandarin dialects in having a widely used written standard. The other dialects do not have widely used alternative written standards, but many have local characters or use characters which are archaic in "bai hua".

Classification of writing styles

One can classify Chinese writings into four basic types:
  • bai hua (白話) (Vernacular Chinese)
  • wen yan (文言) (Classical Chinese)
  • "written colloquial Chinese", in particular written colloquial Cantonese. Cantonese is unique in that it has a commonly used written character system that is different from "bai hua" or "wen yan". Written colloquial Chinese usually involves the use of "dialectal characters".
  • Poetry and other forms of constrained writing.

As with other aspects of the Chinese language, the contrast between different written standards is not sharp and there can be a socially accepted continuum between the written standards. For example, in writing an informal love letter, one may use informal bai hua. In writing a newspaper article, the language used is different and begins to include aspects of wen yan. In writing a ceremonial document, one would use even more wen yan. The language used in the ceremonial document may be completely different from that of the love letter, but there is a socially accepted continuum existing between the two. Pure "wen yan", however, is rarely used.

Character forms

There are currently two standards for printed Chinese characters. One is the Traditional system, used in Hong Kong, Macau, and Taiwan. Mainland China and Singapore use the Simplified system (developed by the PRC government in the 1950s), which uses simplified forms for many of the more complicated characters. In addition, most Chinese writers use some personal simplifications in handwriting.

Development of Chinese

Old Chinese, sometimes known as 'Archaic Chinese', was the language common during the early and middle Zhou Dynasty (11th to 7th centuries B.C.), whose texts include inscriptions on bronze artifacts, the poetry of the Shijing, the history of the Shujing, and portions of the Yijing (I Ching). Work on reconstructing Old Chinese started with Qing dynasty philologists. The pioneer of Western study of Old Chinese is the Swedish linguist Bernhard Karlgren, whose work is based on the forms of the characters and the rhymes of the 'Shijing'. The phonetic elements found in the majority of Chinese characters also provide hints to their Old Chinese pronunciations. Old Chinese was not wholly uninflected. It possessed a rich sound system in which aspiration or rough breathing differentiated the consonants.

Middle Chinese was the language used during the Sui, Tang, and Song dynasties (7th through 10th centuries A.D.). It can be divided into an early period, reflected in the 切韻 'Qieyun' rhyme dictionary (A.D. 601), and a late period in the 10th century, reflected in the 廣韻 'Guangyun' rhyme dictionary. Bernhard Karlgren called this phase 'Ancient Chinese'. Linguists are confident that they have a good reconstruction of what Middle Chinese sounded like. The evidence for its pronunciation comes from several sources: modern dialects, rhyme dictionaries, and foreign transcriptions. Just as Proto-Indo-European can be reconstructed from modern Indo-European languages, Middle Chinese can be reconstructed (very tentatively) from modern dialects. In addition, ancient Chinese philologists devoted a great deal of effort to summarizing the Chinese phonetic system in "rhyme tables", and these tables serve as a basis for the work of modern linguists. Finally, Chinese phonetic transcriptions of foreign words provide plenty of clues about the nature of Middle Chinese phonetics.

The development of the spoken Chinese languages from early historical times to the present has been complex. The language tree shown here shows how the present main divisions of the Chinese language developed out of an early common language. Comparison with the map above will give some idea of the complexities that have been left out of the tree. For instance, the Min language that is centered in Fujian Province contains five subdivisions, and the so-called northern language (which is called Mandarin in the West), also contains named subdivisions such as Yun-nan hua, Si-chuan hua, etc.

Most Chinese living in northern China, in Sichuan, and in a broad arc from the north-east (Manchuria) to the south-west (Yun-nan) use various Mandarin dialects as their home language. (See the three regions colored yellow and brown in the map above.) The prevalence of Mandarin throughout northern China is largely the result of geography, namely the plains of north China. By contrast, the mountains and rivers of southern China have promoted linguistic diversity. The presence of Mandarin in Sichuan is largely due to a plague in the 12th century. This plague, which may have been related to the Black Death, depopulated the area, leading to later settlement from north China.

Until the mid-20th century, most Chinese living in southern China did not speak any Mandarin. However, despite the mix of officials and commoners speaking various Chinese dialects, Beijing Mandarin became dominant at least during the officially Manchu-speaking Qing Empire. Since the 17th century, the Empire had set up Orthoepy Academies (正音書院 Zhengyin Shuyuan) in an attempt to make pronunciation conform to the Beijing standard, but these attempts had little success.

This situation changed with the creation (in both the PRC and the ROC) of an elementary school education system committed to teaching Mandarin. As a result, Mandarin is now spoken fluently by most people in Mainland China and in Taiwan. In Hong Kong, the language of education and formal speech remains Cantonese but Mandarin is becoming increasingly influential.

Chinese characters appear to have originated in the Shang dynasty as pictograms depicting concrete objects. Over the course of the Zhou and Han dynasties, the characters became increasingly stylized. In addition, new characters were created for words on the basis of the words' sounds.

Related topics



External links