Correcting the Autocorrect: Context-Aware Typographical Error Correction via Training Data Augmentation

Cited by: 0
Authors
Shah, Kshitij [1]
de Melo, Gerard [1]
Affiliations
[1] Rutgers Univ New Brunswick, Dept Comp Sci, New Brunswick, NJ 08854 USA
Keywords
Corpus; Error Generation; Deep Learning
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this paper, we explore the artificial generation of typographical errors based on real-world statistics. We first draw on a small set of annotated data to compute spelling error statistics. These are then invoked to introduce errors into substantially larger corpora. The generation methodology allows us to generate particularly challenging errors that require context-aware error detection. We use it to create a set of English language error detection and correction datasets. Finally, we examine the effectiveness of machine learning models for detecting and correcting errors based on this data.
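
As a rough illustration of the two-stage idea in the abstract (estimate error statistics from a small annotated set, then use them to introduce errors into a larger clean corpus), here is a minimal Python sketch. It is not the authors' implementation. The word-level confusion counts, the function names `learn_error_stats` and `inject_errors`, the `error_rate` parameter, and the toy `ANNOTATED` pairs are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def learn_error_stats(pairs):
    """Count how often each correct word was observed as each misspelling.

    `pairs` is an iterable of (misspelling, correction) tuples, standing in
    for a small annotated dataset (hypothetical format).
    """
    stats = defaultdict(Counter)
    for wrong, right in pairs:
        stats[right.lower()][wrong.lower()] += 1
    return stats


def inject_errors(tokens, stats, error_rate, rng):
    """Corrupt a clean token sequence using the learned statistics.

    Returns (noisy_tokens, labels), where labels[i] is 1 if token i was
    replaced by a sampled misspelling and 0 otherwise. Replacements are
    lowercased in this sketch, so original casing is not preserved.
    """
    noisy, labels = [], []
    for tok in tokens:
        candidates = stats.get(tok.lower())
        if candidates and rng.random() < error_rate:
            words = list(candidates)
            weights = [candidates[w] for w in words]
            # Sample a misspelling in proportion to its observed frequency.
            noisy.append(rng.choices(words, weights=weights, k=1)[0])
            labels.append(1)
        else:
            noisy.append(tok)
            labels.append(0)
    return noisy, labels


# Toy annotated pairs (invented for this example, not from the paper).
ANNOTATED = [
    ("teh", "the"),
    ("form", "from"),
    ("form", "from"),
    ("fro", "from"),
    ("their", "there"),
]
STATS = learn_error_stats(ANNOTATED)

if __name__ == "__main__":
    rng = random.Random(0)
    clean = "I came from the shop".split()
    print(inject_errors(clean, STATS, error_rate=0.5, rng=rng))
```

Replacements such as "form" for "from" are themselves valid words, so a dictionary lookup cannot flag them. This is the kind of error the abstract describes as requiring context-aware detection. The per-token labels could serve as targets for a detection model, and the clean tokens as targets for correction.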
Pages: 6930-6936
Number of pages: 7