A Chinese Toponym Recognition Method Based on Conditional Random Field

Authors
Wu L. [1]
Liu L. [1]
Li H. [1]
Gao Y. [1]
Affiliation
[1] Institute of Geographic Information System and Remote Sensing, Peking University, Beijing
Source
Gao, Yong (gaoyong@pku.edu.cn) | 2017 / Editorial Board of Medical Journal of Wuhan University / No. 42
Funding
National Natural Science Foundation of China
Keywords
Chinese toponym; Conditional random field; Natural language processing; Toponym recognition
DOI
10.13203/j.whugis20141009
Abstract
With the rapid development of the World Wide Web, a huge quantity of geographic information is hidden in unstructured text. Toponym recognition is the foundation for mining this latent geographic information. Traditional toponym recognition methods based on natural language processing ignore the structure of Chinese toponyms and the conventions users follow when writing them, which results in low recall and precision. In this paper, linguistic knowledge is introduced to analyze Chinese toponyms, and more specific morpheme categories are identified. Toponym recognition is then transformed into an equivalent sequence labeling problem based on the conditional random field. A labeling schema suited to Chinese toponyms is also designed to improve recognition accuracy. In the experiments, the 1.7 million tagged corpus of The People's Daily is used to test the proposed method. The recall, precision and F value of the result are 92.69%, 96.73% and 94.67%, respectively, which are better than those of other machine learning models. The results show that the proposed method is effective for recognizing Chinese toponyms. This research can provide more precise toponym services for geographic information applications. © 2017, Research and Development Office of Wuhan University. All rights reserved.
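To make the sequence labeling formulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple character-level B/I/O tag set (the paper's own labeling schema is not given in this record), and the names `decode_toponyms` and `f_value` are hypothetical. It shows how a tag sequence, such as one a conditional random field might predict, maps back to toponym spans, and checks that the reported F value is the harmonic mean of the reported precision and recall.

```python
from typing import List, Tuple


def decode_toponyms(chars: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end, text) spans from B/I/O tags; end is exclusive.

    Assumed tag set (illustrative only): B = first character of a toponym,
    I = continuation of a toponym, O = outside any toponym.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if start is not None and tag != "I":
            spans.append((start, i, "".join(chars[start:i])))
            start = None
        if tag == "B":
            start = i
    return spans


def f_value(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hand-written example sentence and tags (not taken from the paper's corpus).
chars = list("我住在北京市海淀区")
tags = ["O", "O", "O", "B", "I", "I", "B", "I", "I"]
spans = decode_toponyms(chars, tags)

# Precision and recall as reported in the abstract, in percent.
f = round(f_value(96.73, 92.69), 2)
```

Here `spans` holds the two adjacent toponyms as separate entries, and `f` reproduces the 94.67% figure from the abstract.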
Pages: 150-156
Page count: 6
Related papers
18 in total
  • [1] Salton G., McGill M.J., Introduction to Modern Information Retrieval, (1986)
  • [2] Hill L.L., Georeferencing: The Geographic Associations of Information, (2009)
  • [3] Longley P.A., Goodchild M.F., Maguire D.J., Geographic Information Systems: Principles, Techniques, Applications and Management, (2008)
  • [4] Tan H., Zheng J., Liu K., et al., Automatic Recognition Method of Chinese Toponym, Collected Works of Computational Linguistics, pp. 174-179, (1999)
  • [5] Lafferty J., McCallum A., Pereira F.C., Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, Proceedings of the 18th International Conference on Machine Learning, (2001)
  • [6] Sha F., Pereira F., Shallow Parsing with Conditional Random Fields, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, (2003)
  • [7] Sarawagi S., Cohen W.W., Semi-Markov Conditional Random Fields for Information Extraction, Advances in Neural Information Processing Systems, 17, pp. 1185-1192, (2004)
  • [8] Liao W., A Study on Chinese Location Names Recognition Based on CRF, (2010)
  • [9] Qiu S., A Y., Wang F., et al., Study on Automatic Recognition of Chinese Location Names Based on Statistical Method, Computer Technology and Development, 21, 11, pp. 35-38, (2011)
  • [10] Dong X., Human Geography and Spatial Distribution Feature Analysis of Chinese Toponym, (2012)