Improving homograph disambiguation with supervised machine learning

Authors
Gorman, Kyle [1]
Mazovetskiy, Gleb [1]
Nikolaev, Vitaly [1]
Affiliation
[1] Google Inc, Mountain View, CA 94043 USA
Keywords
Homograph disambiguation; machine learning; text normalization; text-to-speech synthesis
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We describe a pre-existing rule-based homograph disambiguation system used for text-to-speech synthesis at Google, and compare it to a novel system which performs disambiguation using classifiers trained on a small amount of labeled data. An evaluation of these systems, using a new, freely available English data set, finds that hybrid systems (making use of both rules and machine learning) are significantly more accurate than either hand-written rules or machine learning alone. The evaluation also finds minimal performance degradation when the hybrid system is configured to run on limited-resource mobile devices rather than on production servers. The two best systems described here are used for homograph disambiguation on all US English text-to-speech traffic at Google.
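The abstract does not say how the hybrid systems combine rules and machine learning. The sketch below assumes one simple combination: a rule-then-classifier cascade. A hand-written rule is tried first, and a classifier trained on a few labeled contexts handles every case the rule does not cover. The rule, the multinomial naive Bayes model, the toy training data, the X-SAMPA-style labels ("ri:d", "rEd"), and all names (`read_rule`, `NaiveBayes`, `hybrid_disambiguate`) are hypothetical illustrations. None of them is Google's system or the paper's classifier.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Optional

# Hypothetical rule type: (left context, right context) -> label, or None if it does not fire.
Rule = Callable[[list[str], list[str]], Optional[str]]


def read_rule(left: list[str], right: list[str]) -> Optional[str]:
    """Hypothetical hand-written rule for the homograph "read"."""
    prev = left[-1].lower() if left else ""
    if prev in {"will", "to", "can", "should"}:
        return "ri:d"  # present/infinitive, e.g. "will read"
    if prev in {"had", "has", "have"}:
        return "rEd"  # past participle, e.g. "had read"
    return None  # rule does not apply; defer to the classifier


class NaiveBayes:
    """Multinomial naive Bayes over context words with add-one smoothing (assumed model)."""

    def __init__(self) -> None:
        self.label_counts: Counter = Counter()
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.vocab: set[str] = set()

    def fit(self, examples: list[tuple[list[str], str]]) -> "NaiveBayes":
        for words, label in examples:
            self.label_counts[label] += 1
            for w in words:
                self.word_counts[label][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, words: list[str]) -> Optional[str]:
        total = sum(self.label_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + v
            score = math.log(n / total)  # log prior
            for w in words:
                # Add-one smoothed log likelihood of each context word.
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def hybrid_disambiguate(left: list[str], right: list[str],
                        rule: Rule, model: NaiveBayes) -> Optional[str]:
    """Cascade: use the rule's answer when it fires, else the classifier's."""
    label = rule(left, right)
    if label is not None:
        return label
    return model.predict([w.lower() for w in left + right])


# Toy labeled contexts (target word removed), invented for illustration only.
TRAIN = [
    (["yesterday", "i", "the", "book"], "rEd"),
    (["she", "the", "letter", "last", "week"], "rEd"),
    (["i", "every", "day"], "ri:d"),
    (["they", "often", "novels"], "ri:d"),
]
model = NaiveBayes().fit(TRAIN)

print(hybrid_disambiguate(["I", "will"], ["it"], read_rule, model))  # ri:d (rule fires)
print(hybrid_disambiguate(["She"], ["it", "last", "night"], read_rule, model))  # rEd (classifier)
```

In this cascade, a rule acts as a high-precision override in contexts its author anticipated. The classifier supplies a default for everything else, so adding labeled data improves coverage without rewriting the rules.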
Pages: 1349-1352
Page count: 4