Title:

      基於《知網》的辭彙語義相似度計算
      Word Similarity Computing Based on How-net

 

Authors:

劉群、李素建

Qun LIU , Sujian LI

 

Abstract:

Word similarity is widely used in applications such as information retrieval, information extraction, text classification, word sense disambiguation, and example-based machine translation.  There are two different approaches to computing similarity: one is based on an ontology or a semantic taxonomy; the other is based on collocations of words in a corpus.

As a lexical knowledge base with rich semantic information, How-net has been employed in a variety of research.  Unlike other thesauri, such as WordNet and Tongyici Cilin, in which word similarity is defined based on the distance between words in a semantic taxonomy tree, How-net defines a word using a complex multi-dimensional knowledge description language.  As a result, a series of problems arise when computing word similarity using How-net.  The difficulties are outlined below:

The description of each word consists of a group of sememes.  For example, the Chinese word “暗箱” (camera obscura) is described as “part|部件, #TakePicture|拍攝, %tool|用具, body|身”, and the Chinese word “寫信” (write a letter) is described as “write|寫, ContentProduct=letter|信件”;

The meaning of a word is not a simple combination of these sememes.  Sememes are organized using a specific knowledge description language.
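To make the structure of such a description concrete, the following is a minimal sketch that splits a definition string like the “暗箱” example into its sememe items. The `Item` class, the `parse_definition` function, and the way leading symbols and `attribute=` prefixes are separated are illustrative assumptions, not part of How-net or of the method described here; the sketch only exposes the surface structure and does not interpret what the symbols mean.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Item:
    """One comma-separated item of a definition (hypothetical representation)."""
    symbol: str               # leading marker such as '#' or '%', '' if none
    attribute: Optional[str]  # text before '=', or None if there is no '='
    english: str              # English half of the sememe
    chinese: str              # Chinese half of the sememe


def parse_definition(definition: str) -> List[Item]:
    """Split a definition string into items (assumed surface syntax only)."""
    items = []
    for raw in definition.split(","):
        token = raw.strip()
        if not token:
            continue
        attribute = None
        if "=" in token:
            attribute, token = token.split("=", 1)
        symbol = ""
        if token and not token[0].isalnum():
            symbol, token = token[0], token[1:]
        english, _, chinese = token.partition("|")
        items.append(Item(symbol, attribute, english, chinese))
    return items


for item in parse_definition("part|部件, #TakePicture|拍攝, %tool|用具, body|身"):
    print(item)
```

Even in this small case the items are not interchangeable: some carry a leading symbol and others do not, which is why treating the description as a flat bag of sememes loses information.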

To meet these challenges, our work includes:

A study of the How-net knowledge description language.  We rewrite the How-net definition of a word in a more structured format, using the abstract data structures of sets and feature structures.

A study of the algorithm used to compute word similarity based on How-net.  We define the similarity between sememes, between sets, and between feature structures.  To compute the similarity between two sememes, we use the distance between the sememes in the semantic taxonomy, as is done in WordNet and Tongyici Cilin.  To compute the similarity between two sets or two feature structures, we first establish a one-to-one mapping between the elements of the sets or the feature structures.  Then, the similarity between the sets or feature structures is defined as the weighted average of the similarity between their mapped elements.  For feature structures, the one-to-one mapping is established according to the attributes.  For sets, the one-to-one mapping is established according to the similarity between their elements.
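The three levels of similarity described above can be sketched as follows. This is an illustration under stated assumptions, not the algorithm as specified in the paper: the toy taxonomy `PARENT`, the distance-to-similarity conversion `alpha / (d + alpha)` with its default `alpha`, the greedy best-first pairing for sets, the equal weights in the average, and the `unmatched` score for elements left without a partner are all hypothetical choices made for this example.

```python
from itertools import product

# Hypothetical toy taxonomy (child -> parent); not real How-net data.
PARENT = {
    "tool": "artifact",
    "part": "artifact",
    "artifact": "entity",
    "animal": "entity",
    "entity": None,
}


def path_to_root(sememe):
    """Return the chain of nodes from a sememe up to the taxonomy root."""
    path = [sememe]
    while PARENT.get(sememe) is not None:
        sememe = PARENT[sememe]
        path.append(sememe)
    return path


def distance(a, b):
    """Number of edges between two sememes via their lowest common ancestor."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_b = {node: i for i, node in enumerate(path_b)}
    for i, node in enumerate(path_a):
        if node in depth_in_b:
            return i + depth_in_b[node]
    raise ValueError("no common ancestor")


def sememe_sim(a, b, alpha=1.6):
    """Assumed conversion from taxonomy distance to a similarity in (0, 1]."""
    return alpha / (distance(a, b) + alpha)


def set_sim(xs, ys, sim=sememe_sim, unmatched=0.0):
    """Pair elements one-to-one, most similar first, then average (equal weights)."""
    if not xs and not ys:
        return 1.0
    candidates = sorted(
        ((sim(x, y), i, j)
         for (i, x), (j, y) in product(enumerate(xs), enumerate(ys))),
        reverse=True,
    )
    used_x, used_y, scores = set(), set(), []
    for score, i, j in candidates:
        if i not in used_x and j not in used_y:
            used_x.add(i)
            used_y.add(j)
            scores.append(score)
    # Elements without a partner contribute the assumed `unmatched` score.
    scores.extend([unmatched] * (max(len(xs), len(ys)) - len(scores)))
    return sum(scores) / len(scores)


def feature_sim(f, g, sim=sememe_sim, unmatched=0.0):
    """Pair values by attribute name, then average (equal weights)."""
    keys = set(f) | set(g)
    if not keys:
        return 1.0
    scores = [sim(f[k], g[k]) if k in f and k in g else unmatched for k in keys]
    return sum(scores) / len(scores)


print(sememe_sim("tool", "part"))
print(set_sim(["tool", "part"], ["part", "tool"]))
print(feature_sim({"instrument": "tool"}, {"instrument": "tool", "whole": "part"}))
```

The difference between the two mappings is visible in the code: `feature_sim` never compares values that belong to different attributes, while `set_sim` has no attribute names to rely on and therefore pairs elements by how similar they are.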

Finally, we present experimental results to show the validity of the algorithm and compare them with results obtained using other algorithms.  Our word similarity results agree with human intuition to a large extent, and they are better than the results of two comparative experiments.

 

Keywords:

How-net, Word Similarity Computing, Natural Language Processing

 

PDF Version: paper3.pdf