A Seed-Based Method for Generating Chinese Confusion Sets

Cited by: 6
Authors
Liu, Liangliang [1]
Cao, Cungen [2]
Affiliations
[1] Shanghai Univ Int Business & Econ, Sch Business Informat, Shanghai 201620, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Confusion set; pattern matching; context probability; pinyin similarity; shape similarity
DOI
10.1145/2933396
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In natural language, people often misuse a word (called a "confused word") in place of other words (called "confusing words"). In misspelling correction, many approaches to finding and correcting errors are based on a simple notion called a "confusion set." The confusion set of a confused word consists of its confusing words. In this article, we propose a new method of building Chinese character confusion sets. Our method is composed of two major phases. In the first phase, we build a list of seed confusion sets for each Chinese character, based on measuring similarity in character pinyin or similarity in character shape. In this phase, all confusion sets are constructed manually, and the confusion sets are organized into a graph, called a "seed confusion graph" (SCG), in which vertices denote characters and edges are pairs of characters in the form (confused character, confusing character). In the second phase, we extend the SCG by acquiring more pairs of (confused character, confusing character) from a large Chinese corpus. For this, we use several word patterns to generate new confusion pairs and then verify the pairs before adding them to the SCG. Comprehensive experiments show that our method of extending confusion sets is effective. We also apply the confusion sets to Chinese misspelling correction to show the utility of our method.
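
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_scg`, `candidate_pairs`, `extend_scg`), the single "one character differs from a lexicon word" pattern, the frequency threshold `min_count`, and the toy data are all assumptions made for illustration. The abstract does not specify the actual word patterns or the verification step.

```python
from collections import defaultdict


def build_scg(seed_sets):
    """Phase 1: store manually built seed sets as a directed graph.

    An edge confused -> confusing means `confused` is sometimes
    written where `confusing` was intended.
    """
    graph = defaultdict(set)
    for confused, confusing_chars in seed_sets.items():
        for confusing in confusing_chars:
            if confusing != confused:
                graph[confused].add(confusing)
    return graph


def candidate_pairs(corpus_words, lexicon):
    """Phase 2a (hypothetical pattern): a corpus word that is not in the
    lexicon but differs from a lexicon word in exactly one character
    yields one candidate (confused, confusing) pair."""
    counts = defaultdict(int)
    for word in corpus_words:
        if word in lexicon:
            continue
        for known in lexicon:
            if len(known) != len(word):
                continue
            diffs = [(w, k) for w, k in zip(word, known) if w != k]
            if len(diffs) == 1:
                counts[diffs[0]] += 1
    return counts


def extend_scg(graph, counts, min_count=2):
    """Phase 2b (assumed verification): keep only pairs observed at
    least `min_count` times, then add them to the graph."""
    for (confused, confusing), n in counts.items():
        if n >= min_count:
            graph[confused].add(confusing)
    return graph


# Toy data, invented for this example.
seeds = {"在": ["再"], "的": ["得", "地"]}
lexicon = {"已经", "因为"}
corpus = ["己经", "己经", "已经", "因为", "困为"]

scg = build_scg(seeds)
scg = extend_scg(scg, candidate_pairs(corpus, lexicon))
```

In this toy run the pair (己, 已) is seen twice and is added to the graph, while (困, 因) is seen only once and is rejected by the threshold. A realistic verification step would more plausibly use the pinyin similarity, shape similarity, and context probability named in the keywords.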
Pages: 16