
The goal of Sinica Treebank is to provide a syntactically annotated, structure-tagged corpus for Chinese natural language processing. Grammatical information extracted from the Treebank can be used to improve parser performance and to deepen our knowledge of Chinese syntax.

Sinica Treebank was built by CKIP in 1997 with texts taken from the Sinica Corpus. The texts are parsed automatically according to Information-based Case Grammar (ICG) and then checked manually. The present version, Sinica Treebank v3.0, contains 61,087 trees (361,834 words). A set of 1,000 tree structures is open to the public for researchers to download. A search interface on the website also serves users who are interested in Chinese syntax and semantic relations.

The structural frame of Sinica Treebank is based on the Head-Driven Principle: a sentence or phrase is composed of a core Head and its arguments or adjuncts. The Head determines the category of the phrase and its relations with the other constituents. For example, the Head of a sentence (S) or verb phrase (VP) is a verb (V). See Chen et al. (1999), "The Construction of Sinica Treebank", for details of supplementary principles, symbol illustrations, semantic roles, and phrasal structures.
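The Head-Driven Principle can be illustrated with a small data structure. The following is a minimal sketch in Python, not the Sinica Treebank file format or any CKIP tool: the class `Node`, its fields, the role labels "agent" and "goal", and the English placeholder words are all assumptions made for illustration. It shows a phrase built from one Head plus other constituents, and how the lexical head of a phrase is found by following Head links.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical illustration only: these names are not part of the
# Sinica Treebank format or of any CKIP software.
@dataclass
class Node:
    category: str                 # phrasal or word-level category, e.g. "S", "NP", "V"
    role: str = ""                # "Head", or an illustrative argument/adjunct label
    word: Optional[str] = None    # surface word; set only on leaves
    children: List["Node"] = field(default_factory=list)

    def head(self) -> Optional["Node"]:
        """Return the child marked as Head, or None if there is none."""
        for child in self.children:
            if child.role == "Head":
                return child
        return None

    def lexical_head(self) -> "Node":
        """Follow Head links downward until a leaf is reached."""
        node = self
        while node.children:
            h = node.head()
            if h is None:
                raise ValueError(f"phrase {node.category} has no Head")
            node = h
        return node


# A sentence (S) whose Head is a verb (V), with two NP arguments.
sentence = Node("S", children=[
    Node("NP", role="agent", children=[Node("N", role="Head", word="dogs")]),
    Node("V", role="Head", word="chase"),
    Node("NP", role="goal", children=[Node("N", role="Head", word="cats")]),
])

head = sentence.lexical_head()
print(head.category, head.word)
```

In this sketch the Head of S is the verb, matching the example in the text, and each NP is headed by its noun.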


Li Shih-Min, Su-Chu Lin, Keh-Jiann Chen, 2005, "A Probe into Ambiguities of Determinative-Measure Compounds", The 17th ROCLING Conference on Computational Linguistics and Speech Processing, September 15-16, 2005, National Cheng Kung University, Tainan, Taiwan, ROC.

Li Shih-Min, Su-Chu Lin and Keh-Jiann Chen, 2005. "Feature Representations and Logical Compatibility between Temporal Adverbs and Aspects", International Journal of Computational Linguistics & Chinese Language Processing, Vol. 10, No. 4, pp. 445-457.

Li Shih-Min, Su-Chu Lin, Keh-Jiann Chen. 2004. "Feature Representations and Logical Compatibility between Temporal Adverbs and Aspects", 5th Chinese Lexical Semantics Workshop (CLSW-5). Singapore (June 14-16, 2004) & Genting Highland, Malaysia (June 17-19, 2004).

Lin Su-Chu, Shu-Ling Huang, Keh-Jiann Chen. 2004. "Taxonomy of Fine-grain Semantic Roles for Nominal Modifiers", 5th Chinese Lexical Semantics Workshop (CLSW-5). Singapore (June 14-16, 2004) & Genting Highland, Malaysia (June 17-19, 2004).

You Jia-Ming, Keh-Jiann Chen, 2004, "Automatic Semantic Role Assignment for a Tree Structure", Proceedings of SIGHAN workshop.

Chen Keh-Jiann, Yu-Ming Hsieh, 2004, "Chinese Treebanks and Grammar Extraction", Proceedings of IJCNLP-04, pp. 560-565.

Chen Keh-Jiann, Chu-Ren Huang, Feng-Yi Chen, Chi-Ching Luo, Ming-Chung Chang, Chao-Jan Chen, and Zhao-Ming Gao, 2003, "Sinica Treebank: Design Criteria, Representational Issues and Implementation". In Anne Abeillé (Ed.) Treebanks: Building and Using Parsed Corpora. Language and Speech series. Dordrecht: Kluwer, pp. 231-248.

Huang Chu-Ren, Feng-Yi Chen, Keh-Jiann Chen, Zhao-Ming Gao and Kuang-Yu Chen. 2000. "Sinica Treebank: Design Criteria, Annotation Guidelines, and On-line Interface". Proceedings of 2nd Chinese Language Processing Workshop (Held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, ACL-2000), pp. 29-37. October 7, 2000, Hong Kong.

Chen Keh-Jiann, et al. 1999. "The CKIP Chinese Treebank: Guidelines for Annotation", ATALA Workshop - Treebanks, Paris, June 18-19, 1999, pp. 85-96.

Chen Feng-Yi, Pi-Fang Tsai, Keh-Jiann Chen, Chu-Ren Huang. 1999. "The Construction of Sinica Treebank", Computational Linguistics and Chinese Language Processing, vol. 4, No. 2, pp. 87-104.

Chen Keh-Jiann, Chu-Ren Huang, Li-Ping Chang, Hui-Li Hsu. 1996. "Sinica Corpus: Design Methodology for Balanced Corpora", Proceedings of the 11th Pacific Asia Conference on Language, Information, and Computation (PACLIC 11), Seoul, Korea, pp. 167-176.

Chen Keh-Jiann. 1996. "A Model for Robust Chinese Parser", Computational Linguistics and Chinese Language Processing, vol. 1, No. 1. pp.183-204.

Chen Keh-Jiann, Chu-Ren Huang. 1994. "Features Constraints in Chinese Language Parsing", Proceedings of ICCPOL '94, pp. 223-228.

Chen Keh-Jiann. 1992. "Design Concepts for Chinese Parsers", 3rd International Conference on Chinese Information Processing, pp.1-22.


Su-Chu Lin
