Automatic Creation of a Domain Specific Thesaurus Using Siamese Networks

Cited by: 1
Authors
Dhaliwal, Mehak Preet [1]
Tiwari, Hemant [1]
Vala, Vanraj [1]
Affiliation
[1] Samsung R&D Inst Bangalore, Bangalore 560037, Karnataka, India
Keywords
BOOLEAN SEARCH; KEYWORD SEARCH;
DOI
10.1109/ICSC50631.2021.00066
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent trends increasingly indicate a shift in search technologies across all applications from syntactic and lexical matching toward semantic methods, which aim to understand the intent and contextual meaning of a query and thereby return more relevant and accurate results. Such methods often rely on semantic ontologies to map query words to concepts and to aid query expansion. However, most applications require a domain-specific language definition to overcome ambiguity and misinterpretation of meaning; general-purpose ontologies often fall short here and yield inappropriate results in specific applications. In this paper, we propose a novel method for building a domain-specific thesaurus to aid semantic search: we first automatically create a refined general thesaurus and then train a Siamese network in two phases to classify candidate synonyms as relevant or non-relevant to the particular domain. We focus on tag-based gallery image retrieval, and we extract and utilise information from Google's Conceptual Captions dataset to improve our model's performance. To investigate and justify our training method and architecture, we conduct an ablation study and compare its results with those of our model. We further demonstrate, analytically and empirically, the advantage of representing terms in a domain-specific environment through semantic vectors fine-tuned on domain-related corpora. Although our experiments focus on building a word ontology specific to image retrieval, the method is generic and can be applied to any field that requires a domain-specific semantic language.
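The abstract describes the core step only at a high level: a Siamese network scores (term, candidate synonym) pairs, and the candidates it rejects are pruned from a general thesaurus before the result is used for query expansion. The Python sketch below illustrates that filtering step under loudly stated assumptions: the word vectors, thesaurus entries, encoder weights, and the absolute-difference pair feature are all hypothetical choices, not the authors' architecture, and no training (two-phase or otherwise) is shown. The weights are hand-set so that the toy example behaves sensibly.

```python
import math

# Hypothetical toy word vectors standing in for embeddings fine-tuned on a
# domain corpus (e.g. image captions). Values are invented for illustration.
EMBEDDINGS = {
    "beach":    [0.9, 0.1, 0.0],
    "seashore": [0.85, 0.15, 0.05],
    "shore":    [0.8, 0.2, 0.1],
    "strand":   [0.1, 0.9, 0.3],   # general synonym of "beach", off-domain sense
    "dog":      [0.0, 0.1, 0.9],
    "puppy":    [0.05, 0.15, 0.85],
    "cad":      [0.2, 0.8, 0.1],   # general synonym of "dog", off-domain sense
}

# Hypothetical general-thesaurus candidates to be filtered for the domain.
GENERAL_THESAURUS = {
    "beach": ["seashore", "shore", "strand"],
    "dog": ["puppy", "cad"],
}

# Shared encoder: the SAME weights process both words of a pair, which is what
# makes the two branches "Siamese". Hand-set here; a real model would learn them.
W_SHARED = [
    [1.0, -0.5, 0.0],
    [0.0, 1.0, -0.5],
    [-0.5, 0.0, 1.0],
]
HEAD_WEIGHTS = [-4.0, -4.0, -4.0]  # a larger encoded distance lowers the score
HEAD_BIAS = 3.0


def encode(vec):
    """Shared branch: linear projection followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in W_SHARED]


def relevance(term, candidate):
    """Probability that `candidate` is a domain-relevant synonym of `term`."""
    a = encode(EMBEDDINGS[term])
    b = encode(EMBEDDINGS[candidate])
    diff = [abs(x - y) for x, y in zip(a, b)]  # symmetric pair feature
    z = sum(w * d for w, d in zip(HEAD_WEIGHTS, diff)) + HEAD_BIAS
    return 1.0 / (1.0 + math.exp(-z))  # logistic output


def build_domain_thesaurus(general, threshold=0.5):
    """Keep only the candidates the pair classifier marks as relevant."""
    return {
        term: [c for c in candidates if relevance(term, c) >= threshold]
        for term, candidates in general.items()
    }


def expand_query(tags, thesaurus):
    """Expand query tags with their domain synonyms, preserving order."""
    expanded = []
    for tag in tags:
        for word in [tag] + thesaurus.get(tag, []):
            if word not in expanded:
                expanded.append(word)
    return expanded


if __name__ == "__main__":
    domain = build_domain_thesaurus(GENERAL_THESAURUS)
    print(domain)  # {'beach': ['seashore', 'shore'], 'dog': ['puppy']}
    print(expand_query(["dog", "beach"], domain))
    # ['dog', 'puppy', 'beach', 'seashore', 'shore']
```

Using the absolute difference of the two encoded vectors keeps the score symmetric in the pair, so "beach/strand" and "strand/beach" receive the same relevance. In this toy setup, the off-domain senses ("strand", "cad") are pruned while the image-relevant synonyms survive and are then used to expand a tag query.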
Pages: 355-361
Page count: 7
Related papers (50 in total)
  • [1] Automatic thesaurus construction using Bayesian networks
    Park, YC
    Choi, KS
    [J]. INFORMATION PROCESSING & MANAGEMENT, 1996, 32 (05) : 543 - 553
  • [2] Automatic generation of domain representations using thesaurus structures
    Lloréns, J
    Velasco, M
    de Amescua, A
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2004, 55 (10): 846 - 858
  • [3] Use of automatic keyphrase generation for creation of a construction thesaurus
    Kosovac, B
    Vanier, DJ
    [J]. DURABILITY OF BUILDING MATERIALS AND COMPONENTS 8, VOLS 1-4, PROCEEDINGS, 1999: 2507 - 2516
  • [4] Automatic creation of domain-specific reconfigurable CPLDs for SoC
    Holland, M
    Hauck, S
    [J]. FCCM 2005: 13TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2005: 289 - 290
  • [5] Automatic creation of domain-specific reconfigurable CPLDs for SoC
    Holland, Mark
    Hauck, Scott
    [J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2007, 26 (02) : 291 - 295
  • [6] Automatic Diagnosis of Mild Cognitive Impairment Using Siamese Neural Networks
    Estella-Nonay, E.
    Bachiller-Mayoral, M.
    Valladares-Rodriguez, S.
    Rincon, M.
    [J]. ARTIFICIAL INTELLIGENCE IN NEUROSCIENCE: AFFECTIVE ANALYSIS AND HEALTH APPLICATIONS, PT I, 2022, 13258 : 416 - 425
  • [7] An automatic method to generate domain-specific investigator networks using PubMed abstracts
    Yu, Wei
    Yesupriya, Ajay
    Wulf, Anja
    Qu, Junfeng
    Gwinn, Marta
    Khoury, Muin J.
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2007, 7 (1)
  • [8] An automatic method to generate domain-specific investigator networks using PubMed abstracts
    Yu, Wei
    Yesupriya, Ajay
    Wulf, Anja
    Qu, Junfeng
    Gwinn, Marta
    Khoury, Muin J.
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 7
  • [9] Automatic indexing from a thesaurus using Bayesian networks: Application to the classification of parliamentary initiatives
    de Campos, Luis M.
    Fernandez-Luna, Juan M.
    Huete, Juan F.
    Romero, Alfonso E.
    [J]. SYMBOLIC AND QUANTITATIVE APPROACHES TO REASONING WITH UNCERTAINTY, PROCEEDINGS, 2007, 4724 : 865 - +
  • [10] Semi-automatic construction of Thesaurus applying Domain Analysis techniques
    Díaz, I
    Velasco, M
    Lloréns, J
    Martínez, V
    [J]. INTERNATIONAL FORUM ON INFORMATION AND DOCUMENTATION, 1998, 23 (02): 11 - 19