A Study of Semantic Disambiguation Based on HowNet

         Yang Xiaofeng, Li Tangqiu


This thesis describes a semantic disambiguation model that is applied during syntactic parsing in a machine translation system.

The model uses HowNet as its main semantic resource. HowNet is a common-sense knowledge base that reveals the inter-conceptual relations and inter-attribute relations of concepts as connoted in the lexicons of Chinese and their English equivalents. It provides rich semantic information for disambiguation.

The model performs word sense disambiguation and structural disambiguation by “preferring”: the candidate results produced by the parsing process are scored, and the best-scoring one is preferred. The approach combines rule-based and statistical methods.

First, we extract the co-occurrence information of each sense-atom from a large corpus. The corpus is untagged, so the extraction is unsupervised. From the co-occurrence information we construct restricted rules according to a transfer template. Because the semantic entry of a word in HowNet is composed of sense-atoms, we can derive restricted rules for each entry of any word.
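As an illustration of the extraction step, the following is a minimal sketch, not the thesis's actual procedure. It assumes a hypothetical toy lexicon (`LEXICON`) that maps each word to its entries, each entry being a set of sense-atoms, and it assumes that co-occurrence is counted per sentence. Because the corpus is untagged, every entry of a word contributes its sense-atoms. The function `restricted_rule` stands in for the transfer template, whose real form is not given here; it simply normalizes the counts into weights.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy lexicon: word -> list of entries, each a set of sense-atoms.
LEXICON = {
    "bank": [{"institution", "money"}, {"land", "waters"}],
    "deposit": [{"put", "money"}],
    "loan": [{"lend", "money"}],
    "river": [{"waters"}],
}

def atoms_of(word):
    """Union of the sense-atoms of all entries of a word.

    The corpus is untagged, so no single entry can be selected.
    """
    atoms = set()
    for entry in LEXICON.get(word, []):
        atoms |= entry
    return atoms

def cooccurrence(sentences):
    """Count how often two sense-atoms appear in the same sentence."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        known = [w for w in sentence if w in LEXICON]
        for w1, w2 in combinations(known, 2):
            for a in atoms_of(w1):
                for b in atoms_of(w2):
                    counts[a][b] += 1
                    if a != b:
                        counts[b][a] += 1  # keep the table symmetric
    return counts

def restricted_rule(entry, counts):
    """Assumed stand-in for the transfer template: merge the co-occurrence
    counts of an entry's sense-atoms and normalize them into weights."""
    merged = Counter()
    for atom in entry:
        merged.update(counts.get(atom, {}))
    total = sum(merged.values())
    return {atom: n / total for atom, n in merged.items()} if total else {}

corpus = [["deposit", "bank"], ["loan", "bank"], ["river", "bank"]]
counts = cooccurrence(corpus)
rule = restricted_rule(LEXICON["bank"][0], counts)
```

With real data the lexicon would come from HowNet and the corpus would be far larger; the toy data here only shows the shape of the computation.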

During disambiguation, the model constructs a context-related word set for each notional word in the input sentence. The semantic collocation relations between notional words play an important role in resolving syntactic structure, so we evaluate each candidate by how tightly the notional words in the structure match one another. We compare the context-related word set of a word in the current structure with all the restricted rules of that word in the lexicon and find the best match. The entry with the best match is taken as the sense of the word. The degree of similarity also shows how well the word fits the other notional words in the structure, so it serves as the score of that notional word. Because candidate parses of a structure differ, the same word has a different context-related word set in each of them and therefore receives a different score. We select the best parse according to the scores of all the notional words in the sentence. In this way we can resolve most word sense ambiguity and structural ambiguity at the same time.
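The scoring idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of the thesis: the restricted rules (`RULES`) and the sense-atoms of context words (`CONTEXT_ATOMS`) are hypothetical toy data, and the similarity measure is assumed to be a plain weighted overlap, whereas the thesis computes a similarity between sense-atoms that is not specified here. Each candidate parse is represented as a mapping from a notional word to its context-related word set.

```python
# Hypothetical restricted rules: word -> list of (entry label, {sense-atom: weight}).
RULES = {
    "bank": [
        ("financial institution", {"money": 0.6, "lend": 0.3, "put": 0.1}),
        ("river bank", {"waters": 0.7, "land": 0.3}),
    ],
}

# Hypothetical sense-atoms contributed by context words.
CONTEXT_ATOMS = {
    "loan": {"lend", "money"},
    "river": {"waters"},
    "approve": {"agree"},
}

def match_score(rule, context_words):
    """Assumed similarity: total weight of rule atoms found in the context."""
    atoms = set()
    for word in context_words:
        atoms |= CONTEXT_ATOMS.get(word, set())
    return sum(weight for atom, weight in rule.items() if atom in atoms)

def best_entry(word, context_words):
    """Return (score, label) of the entry whose rule best matches the context."""
    scored = [(match_score(rule, context_words), label)
              for label, rule in RULES[word]]
    return max(scored, key=lambda pair: pair[0])

def score_parse(parse):
    """Sum the best-match scores of all notional words that have rules."""
    total, senses = 0.0, {}
    for word, context_words in parse.items():
        if word in RULES:
            score, label = best_entry(word, context_words)
            total += score
            senses[word] = label
    return total, senses

def choose_parse(candidates):
    """Prefer the candidate parse with the highest total score."""
    return max(range(len(candidates)),
               key=lambda i: score_parse(candidates[i])[0])

# Two candidate parses attach "bank" to different context words.
parse_a = {"bank": {"loan", "approve"}}
parse_b = {"bank": {"approve"}}
best_index = choose_parse([parse_a, parse_b])
total_a, senses_a = score_parse(parse_a)
```

Because every candidate receives a graded score rather than a pass or fail verdict, a parse that matches no rule perfectly can still be preferred over weaker alternatives.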

The semantic disambiguation model proposed in this thesis has been implemented in the MTG system. Our experiments show that the model is effective for this purpose and that it is more tolerant than the traditional clear-cut yes-or-no method, with better results.

In this thesis we first put forward the general idea of the method and give a brief introduction to the HowNet dictionary. We then describe how co-occurrence information for each sense-atom is extracted from the corpus and how this information is transformed into restricted rules. Next, the disambiguation algorithm is presented in detail, including the construction of context-related word sets and the calculation of similarity between sense-atoms and between restricted rules and context-related word sets. The experimental results given at the end of the thesis show that the method is effective.



Word Sense Disambiguation, HowNet, Interlingua, Sense Atom, Corpus, Semantic Environment

