Effect of Text Color on Word Embeddings

Cited by: 4
Authors
Ikoma, Masaya [1]
Iwana, Brian Kenji [1]
Uchida, Seiichi [1]
Affiliations
[1] Kyushu Univ, Fukuoka, Japan
Source
DOCUMENT ANALYSIS SYSTEMS | 2020 / Vol. 12116
Keywords
Word embedding; Text color
DOI
10.1007/978-3-030-57058-3_24
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In natural scenes and documents, we can find a correlation between text and its color. For instance, the word, "hot," is often printed in red, while "cold" is often in blue. This correlation can be thought of as a feature that represents the semantic difference between the words. Based on this observation, we propose the idea of using text color for word embeddings. While text-only word embeddings (e.g. word2vec) have been extremely successful, they often represent antonyms as similar since they are often interchangeable in sentences. In this paper, we try two tasks to verify the usefulness of text color in understanding the meanings of words, especially in identifying synonyms and antonyms. First, we quantify the color distribution of words from the book cover images and analyze the correlation between the color and meaning of the word. Second, we try to retrain word embeddings with the color distribution of words as a constraint. By observing the changes in the word embeddings of synonyms and antonyms before and after re-training, we aim to understand the kind of words that have positive or negative effects in their word embeddings when incorporating text color information.
Pages: 341 - 355
Page count: 15