Corpus-based Learning of Cantonese for Mandarin Speakers

Since its return to China in 1997, Hong Kong has received a large number of visitors from mainland China who come to study and work in the city. There has thus been a marked increase in the need to teach Cantonese, the dialect spoken in Hong Kong, to mainland Chinese, who predominantly speak Mandarin Chinese as their first language.

While computer-aided learning of Mandarin vocabulary and idioms has been much investigated (e.g. Yang & Xie 2013; Lewis, Luk & Ng 1998), less attention has been paid to Cantonese. Learning Cantonese as a second language can be challenging because it is used mostly in spoken form and rarely in written form. Indeed, most previous studies on Cantonese have focused on pronunciation (e.g. Shi 2002; Rao 2003; Ki 2006).

To learn Cantonese vocabulary and grammar, it is helpful to supplement spoken examples with written ones. Unfortunately, examples given in Cantonese textbooks tend to be artificially created, and do not always reflect the most colloquial or current usage. To address this dearth of teaching material, a parallel corpus of Cantonese and Mandarin Chinese has recently been compiled (Lee 2011). The corpus contains Cantonese sentences, transcribed from current television programs, and their corresponding Mandarin Chinese captions. In addition, equivalent words in the Cantonese and Mandarin sentences are aligned. On a web interface, one can search for sentences containing a particular Mandarin word, retrieve the original (transcribed) Cantonese utterance, and observe the aligned Cantonese word (Lee et al. 2013). In this paper, we report our experience in using this interface to teach Cantonese vocabulary to Mandarin-speaking undergraduates.

Although Cantonese and Mandarin share much common vocabulary, some Mandarin words have multiple translations in Cantonese; these pose the greatest challenge to the students. For instance, Mandarin “le” can function both as a perfect aspect particle and as a mood particle. However, in Cantonese, the corresponding perfect aspect particle is “jó”, while the corresponding mood particle is “la”.
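The kind of lookup that a word-aligned parallel corpus makes possible can be illustrated with a minimal sketch. It is not the implementation of the web interface: the AlignedPair structure, the aligned_equivalents function and the toy data are all hypothetical. Only the tokens "le", "jó" and "la" come from the example above; the tokens m1, c1 and so on are placeholders standing in for ordinary words.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedPair:
    """One Mandarin caption, its Cantonese utterance, and word alignments."""
    mandarin: List[str]
    cantonese: List[str]
    # Each tuple is (index into mandarin, index into cantonese).
    alignment: List[Tuple[int, int]]


def aligned_equivalents(corpus: List[AlignedPair], word: str) -> Counter:
    """Count the Cantonese words aligned to a given Mandarin word."""
    counts = Counter()
    for pair in corpus:
        for m_idx, c_idx in pair.alignment:
            if pair.mandarin[m_idx] == word:
                counts[pair.cantonese[c_idx]] += 1
    return counts


# Invented toy data: placeholder tokens except for "le", "jó" and "la".
toy_corpus = [
    AlignedPair(["m1", "m2", "le"], ["c1", "c2", "jó"], [(0, 0), (1, 1), (2, 2)]),
    AlignedPair(["m3", "le"], ["c3", "la"], [(0, 0), (1, 1)]),
    AlignedPair(["m4", "le", "m5"], ["c4", "jó", "c5"], [(0, 0), (1, 1), (2, 2)]),
]

equivalents = aligned_equivalents(toy_corpus, "le")
for cantonese_word, count in equivalents.most_common():
    print(cantonese_word, count)
```

A single Mandarin query thus surfaces every Cantonese word aligned to it, which is what allows a learner to notice that one Mandarin word has more than one Cantonese equivalent.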

We conducted an experiment to evaluate the effect of a corpus-based method of teaching Cantonese vocabulary. In a pre-test, each student was asked to translate a set of Mandarin words into Cantonese; the words that the student failed to translate correctly were noted. The student then worked through a computer-assisted language learning (CALL) session on the web interface of the parallel corpus mentioned above. Given a list of Mandarin words, the student was asked to search for sentences in which they appear, retrieve the Cantonese translation, and study the Cantonese words to which they were aligned. The list was personalized for each student: it contained half of the Mandarin words that he or she had failed to translate in the pre-test. In a post-test and a delayed post-test, the student was assessed again on these Mandarin words. Our results show that students learned the Cantonese equivalents of a Mandarin word significantly better when they had worked on it in the CALL session.
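The personalized word list can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the abstract does not say how the half was chosen, so a random split is assumed here, and the function names split_failed_words and accuracy are hypothetical.

```python
import random
from typing import Iterable, List, Optional, Set, Tuple


def split_failed_words(
    failed_words: Iterable[str], seed: Optional[int] = None
) -> Tuple[List[str], List[str]]:
    """Split the words missed in the pre-test into two halves.

    The first half is studied in the CALL session; the second half is not.
    Assumption: the split is random. With an odd number of words, the
    unstudied half receives the extra word.
    """
    rng = random.Random(seed)
    words = sorted(failed_words)  # fixed order so a seed gives a stable split
    rng.shuffle(words)
    half = len(words) // 2
    return words[:half], words[half:]


def accuracy(words: List[str], correct_in_post_test: Set[str]) -> float:
    """Proportion of the given words translated correctly in a later test."""
    if not words:
        return 0.0
    return sum(w in correct_in_post_test for w in words) / len(words)


# Hypothetical student who missed four words in the pre-test.
studied, unstudied = split_failed_words({"w1", "w2", "w3", "w4"}, seed=0)
```

Comparing accuracy(studied, ...) with accuracy(unstudied, ...) for each student, on the post-test and again on the delayed post-test, gives the contrast between words that were and were not worked on in the CALL session.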


John Lee
Department of Chinese, Translation and Linguistics
City University of Hong Kong
Hong Kong

Tak-sum Wong
Hong Kong Polytechnic University
Hong Kong


