Retrieving domain-specific collocations by co-occurrences and word order constraints

Authors
Shimohata, S [1]
Sugio, T [1]
Nagata, J [1]
Affiliations
[1] Oki Elect Ind Co Ltd, Res & Dev Grp, Kansai Labs, Chuo Ku, Osaka 5406025, Japan
Keywords
corpus-based approach; collocation; machine translation
DOI
10.1111/0824-7935.00085
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method comprises the following stages: (1) extracting strings of characters as units of collocations, and (2) extracting recurrent combinations of strings as collocations. Through this method, various types of domain-specific collocations can be retrieved simultaneously. This method is practical because it uses plain text with no language-specific information, such as lexical knowledge and parts of speech. Experimental results using English and Japanese text corpora show that the method is equally applicable to both languages.
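The two stages named in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not say how strings are selected or how combinations are scored, so the sketch assumes plain frequency thresholds as a stand-in. The function names `recurrent_ngrams` and `ordered_combinations`, the whitespace tokenization, and the parameters `min_n`, `max_n`, and `min_count` are all hypothetical choices made for illustration.

```python
from collections import Counter


def recurrent_ngrams(sentences, min_n=2, max_n=3, min_count=2):
    """Stage 1 (simplified assumption): keep word n-grams that recur
    at least min_count times across the corpus as candidate units."""
    counts = Counter()
    for sent in sentences:
        words = sent.split()  # assumption: whitespace tokenization
        for n in range(min_n, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return {gram for gram, c in counts.items() if c >= min_count}


def ordered_combinations(sentences, units, min_count=2):
    """Stage 2 (simplified assumption): keep pairs of units that appear,
    without overlapping and in the same left-to-right order, in at least
    min_count sentences."""
    pair_counts = Counter()
    for sent in sentences:
        words = sent.split()
        # Locate every occurrence of every unit as (start, end, unit).
        found = []
        for i in range(len(words)):
            for unit in units:
                if tuple(words[i:i + len(unit)]) == unit:
                    found.append((i, i + len(unit), unit))
        # Collect ordered, non-overlapping pairs once per sentence.
        seen = set()
        for _, end1, unit1 in found:
            for start2, _, unit2 in found:
                if end1 <= start2:
                    seen.add((unit1, unit2))
        pair_counts.update(seen)
    return {pair for pair, c in pair_counts.items() if c >= min_count}


if __name__ == "__main__":
    corpus = [
        "in order to reduce the cost please refer to the manual",
        "in order to improve the speed please refer to the guide",
    ]
    units = recurrent_ngrams(corpus)
    for first, second in sorted(ordered_combinations(corpus, units)):
        print(" ".join(first), "...", " ".join(second))
```

Because the sketch relies only on raw token sequences and counts, it needs no lexicon or part-of-speech tags, which mirrors the language-independence claim in the abstract. For text without word delimiters, such as Japanese, the whitespace split would have to be replaced, for example by character-level n-grams.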
Pages: 92-100
Page count: 9