Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words

Times cited: 0
Authors
Davidov, Dmitry [1]
Rappoport, Ari [1]
Affiliation
[1] Hebrew Univ Jerusalem, ICNC, IL-91904 Jerusalem, Israel
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel approach for discovering word categories, sets of words sharing a significant aspect of their meaning. We utilize meta-patterns of high-frequency words and content words in order to discover pattern candidates. Symmetric patterns are then identified using graph-based measures, and word categories are created based on graph clique sets. Our method is the first pattern-based method that requires no corpus annotation or manually provided seed patterns or words. We evaluate our algorithm on very large corpora in two languages, using both human judgments and WordNet-based evaluation. Our fully unsupervised results are superior to previous work that used a POS-tagged corpus, and computation times for huge corpora are orders of magnitude shorter than previously reported.
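The abstract only outlines the three stages (pattern candidates from meta-patterns, symmetric-pattern selection via a graph measure, categories from clique sets). The Python sketch below illustrates them on a toy corpus under simplifying assumptions that are not taken from the paper: a single meta-pattern (content word, high-frequency word, content word) is used; the set of high-frequency words is supplied directly instead of being derived from corpus frequency thresholds; symmetry is scored as the fraction of a pattern's directed word-pair edges whose reverse edge also occurs; and a category is a reciprocated word pair plus every word that forms a triangle with it. All function names, the toy corpus and the `min_score` threshold are hypothetical, and the paper's actual meta-patterns, measures, thresholds and category-merging steps are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy corpus: each sentence is a list of tokens.
SENTENCES = [
    ["cats", "and", "dogs"], ["dogs", "and", "cats"],
    ["cats", "and", "mice"], ["mice", "and", "cats"],
    ["dogs", "and", "mice"], ["mice", "and", "dogs"],
    ["king", "of", "spain"], ["queen", "of", "spain"],
]
# Assumption: high-frequency words (HFWs) are given directly here,
# not derived from corpus frequency thresholds.
HFWS = {"and", "of"}


def pattern_graphs(sentences, hfws):
    """Collect pattern candidates for the assumed meta-pattern C H C.

    Each window "x h y" with h an HFW and x, y content words (here: any
    non-HFW) adds the directed edge x -> y to the graph of pattern h.
    """
    graphs = defaultdict(set)
    for tokens in sentences:
        for x, h, y in zip(tokens, tokens[1:], tokens[2:]):
            if h in hfws and x not in hfws and y not in hfws and x != y:
                graphs[h].add((x, y))
    return graphs


def symmetry_score(edges):
    """Fraction of directed edges whose reverse edge also occurs."""
    if not edges:
        return 0.0
    return sum((y, x) in edges for x, y in edges) / len(edges)


def symmetric_patterns(graphs, min_score):
    """Keep patterns whose symmetry score reaches the (hypothetical) threshold."""
    return {p for p, edges in graphs.items() if symmetry_score(edges) >= min_score}


def clique_set_categories(graphs, patterns):
    """Group each reciprocated word pair with all words forming a triangle with it."""
    neighbours = defaultdict(set)
    for p in patterns:
        for x, y in graphs[p]:
            if (y, x) in graphs[p]:  # keep only bidirectional edges
                neighbours[x].add(y)
                neighbours[y].add(x)
    categories = set()
    for a in list(neighbours):
        for b in neighbours[a]:
            common = neighbours[a] & neighbours[b]
            if common:
                categories.add(frozenset({a, b} | common))
    return categories


if __name__ == "__main__":
    graphs = pattern_graphs(SENTENCES, HFWS)
    patterns = symmetric_patterns(graphs, min_score=0.5)
    print(sorted(patterns))
    for category in clique_set_categories(graphs, patterns):
        print(sorted(category))
```

On this toy corpus every edge of the "and" pattern is reciprocated (score 1.0) while no edge of the "of" pattern is (score 0.0), so only "and" is kept as symmetric, and the triangle cats/dogs/mice yields a single category.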
Pages: 297-304
Page count: 8