Building a large-scale commonsense knowledge base by converting an existing one in a different language

Cited by: 0
Authors
Jung, Yuchul [1]
Lee, Joo-Young [2]
Kim, Youngho [1]
Park, Jaehyun [2]
Myaeng, Sung-Hyon [1]
Rim, Hae-Chang [2]
Affiliations
[1] Informat & Commun Univ, Sch Engn, Taejon 305732, South Korea
[2] Korea Univ, Dept Comp Sci & Engn, Seoul 136701, South Korea
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes our effort to build a large-scale commonsense knowledge base in Korean by converting a pre-existing English one, ConceptNet. The English commonsense knowledge base is essentially a huge network of concepts and relations. Its triplets, in the form Concept-Relation-Concept, were extracted from English sentences that volunteers interested in contributing commonsense knowledge entered through a Web site. Our effort is an attempt to obtain a Korean version by utilizing a variety of language resources and tools. We not only employed a morphological analyzer and existing commercial machine translation software but also developed our own special-purpose translation and out-of-vocabulary handling methods. To handle ambiguity, we also devised noisy concept filtering and concept generalization methods. Out of the 2.4 million assertions, i.e., Concept-Relation-Concept triplets, in the English ConceptNet, we have generated about 200,000 Korean assertions so far. Based on our manual judgments of a 5% sample, the accuracy was 84.4%.
Pages: 23 / +
Page count: 3