Modeling Common Real-Word Relations Using Triples Extracted from n-Grams

Cited by: 0
Authors
Sipos, Ruben [1]
Mladenic, Dunja [1]
Grobelnik, Marko [1]
Brank, Janez [1]
Affiliations
[1] Jozef Stefan Inst, Ljubljana 1000, Slovenia
Source
SEMANTIC WEB, PROCEEDINGS | 2009 / Vol. 5926
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we present an approach providing generalized relations for automatic ontology building based on frequent word n-grams. Using publicly available Google n-grams as our data source, we can extract relations in the form of triples and compute generalized and more abstract models. We propose an algorithm for building abstractions of the extracted triples using WordNet as background knowledge. We also present a novel approach to triple extraction using heuristics, which achieves notably better results than deep parsing applied to n-grams. This allows us to represent information gathered from the web as a set of triples modeling the common and frequent relations expressed in natural language. Our results have potential for use in different settings, including providing a knowledge base for reasoning or simply serving as statistical data useful in improving the understanding of natural languages.
Pages: 16-30
Page count: 15
Related papers
34 in total
  • [1] SPEECH RECOGNITION USING FUNCTION-WORD N-GRAMS AND CONTENT-WORD N-GRAMS
    ISOTANI, R
    MATSUNAGA, S
    SAGAYAMA, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1995, E78D (06) : 692 - 697
  • [2] Relation Extraction with Word Graphs from N-grams
    Qin, Han
    Tian, Yuanhe
    Song, Yan
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 2860 - 2868
  • [3] Language Distance using Common N-Grams Approach
    Kosmajac, Dijana
    Keselj, Vlado
    2020 19TH INTERNATIONAL SYMPOSIUM INFOTEH-JAHORINA (INFOTEH), 2020,
  • [4] Using Word N-Grams as Features in Arabic Text Classification
    Al-Thubaity, Abdulmohsen
    Alhoshan, Muneera
    Hazzaa, Itisam
    SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2015, 569 : 35 - 43
  • [5] Turkish Spelling Error Detection and Correction by Using Word N-grams
    Dalkilic, Gokhan
    Cebi, Yalcin
    2009 FIFTH INTERNATIONAL CONFERENCE ON SOFT COMPUTING, COMPUTING WITH WORDS AND PERCEPTIONS IN SYSTEM ANALYSIS, DECISION AND CONTROL, 2010, : 63 - 66
  • [6] The use of word n-grams and parts of speech for hierarchical cluster language modeling
    Tang, Wen
    Vergyri, Dimitra
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 1057 - 1060
  • [7] Classifying True and False Hebrew Stories Using Word N-Grams
    HaCohen-Kerner, Yaakov
    Dilmon, Rakefet
    Friedlich, Shimon
    Cohen, Daniel Nissim
    CYBERNETICS AND SYSTEMS, 2016, 47 (08) : 629 - 649
  • [8] Dissimilarities Detections in Texts Using Symbol n-grams and Word Histograms
    Andrejkova, Gabriela
    Almarimi, Abdulwahed
    OPEN COMPUTER SCIENCE, 2016, 6 (01) : 168 - 177
  • [9] Modeling documents for structure recognition using generalized N-grams
    Brugger, R
    Zramdini, A
    Ingold, R
    PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, 1997, : 56 - 60
  • [10] Measuring similarity between Karel programs using character and word n-grams
    G. Sidorov
    M. Ibarra Romero
    I. Markov
    R. Guzman-Cabrera
    L. Chanona-Hernández
    F. Velásquez
    Programming and Computer Software, 2017, 43 : 47 - 50