On-Device Language Detection and Classification of Extreme Short Text from Calendar Titles Across Languages

Cited by: 0
Authors
Muni, Rajasekhara Reddy Duvvuru [1]
Jayakumar, Devanand [1]
Sivakumar, Tadi Venkata [1]
Lee, ChangKu [2]
Hwang, YoungHa [2]
Kumaraguru, Karthikeyan [1]
Affiliations
[1] Samsung R&D Inst India Bangalore, Bengaluru 560037, India
[2] Samsung Digital City, R3 Bldg 25F, Suwon, South Korea
Keywords
Language detection; Short text classification; Event classification; fastText
DOI
10.1007/978-3-031-08473-7_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Smartphones have become an indispensable part of day-to-day human life. These devices provide rapid access to digital calendars, enabling users to schedule their personal and professional activities with short titles, referred to as event titles. Event titles provide valuable information for the personalization of various services. However, the very nature of event titles, which are short and contain only a few words, poses a challenge for identifying the language and the exact event the user is scheduling. Deploying robust machine learning pipelines that continuously learn from data on the server side is not feasible, as event titles are private user data and raise significant privacy concerns. To tackle this challenge, we propose a privacy-preserving on-device solution, the Calendar Event Classifier (CEC), which classifies calendar titles into a set of 22 event types grouped into 3 categories using the fastText library. Our language detection models, with accuracies of 96%, outperform existing language detection tools by 20%, and our event classifiers achieve accuracies of 92%, 94%, 87%, and 90% for English, Korean, German, and French, respectively. The currently tested CEC module architecture delivers the fastest predictions (4 ms/event) with a memory footprint of less than 8 MB and caters to multiple personalization services. Taken together, we present the need for customization of machine learning models for language detection and information extraction from extremely short text documents such as calendar titles.
Pages: 47-59
Page count: 13