Vector representation of Internet Domain Names using a Word Embedding technique

Authors
Lopez, Waldemar [1]
Merlino, Jorge [1]
Rodriguez-Bocca, Pablo [1]
Affiliations
[1] Univ Republica, Fac Ingn, Inst Comp, Julio Herrera y Reissig 565, Montevideo 11300, Uruguay
Keywords
DNS; Word embeddings; word2vec; TensorFlow; Semantic Similarity; Natural Language Processing
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Word embeddings are a well-known set of techniques widely used in natural language processing (NLP), and word2vec is a computationally efficient predictive model for learning such embeddings. This paper explores the use of word embeddings in a new scenario. We create a vector representation of Internet domain names by taking the core ideas from NLP techniques and applying them to real, anonymized Domain Name System (DNS) query logs from a large Internet Service Provider (ISP). Our main objective is to find semantically similar domains using only information from DNS queries, without any prior knowledge about the content of those domains. We use the word2vec unsupervised learning algorithm with a Skip-Gram model to create the embeddings. We validate the quality of our results by expert visual inspection of similarities and by comparing them with a third-party source, namely the similar-sites service offered by Alexa Internet, Inc.
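As a rough illustration of the Skip-Gram idea mentioned in the abstract, the sketch below treats domain names as "words" and an ordered sequence of queried domains as a "sentence", then enumerates the (target, context) pairs a Skip-Gram model would be trained on. The abstract does not say how the authors build their sequences, so grouping queries per anonymized client, the window size, the function name `skipgram_pairs`, and the sample domains are all assumptions made for illustration only.

```python
from typing import List, Tuple


def skipgram_pairs(sequence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for one ordered sequence of domains.

    Each domain is paired with every other domain that lies at most
    `window` positions before or after it in the sequence.
    """
    pairs = []
    for i, target in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a domain is never its own context
                pairs.append((target, sequence[j]))
    return pairs


# Hypothetical, made-up query sequences, one per anonymized client
# (assumption: the real logs would be grouped and ordered in a similar way).
sessions = [
    ["news.example", "sports.example", "weather.example"],
    ["mail.example", "news.example"],
]

# Flatten the pairs from all sessions into one training set.
training_pairs = [p for s in sessions for p in skipgram_pairs(s, window=1)]

for target, context in training_pairs:
    print(target, "->", context)
```

Pairs like these would then be fed to a word2vec trainer, which learns one vector per domain so that domains sharing contexts end up close together. The training step itself is omitted here.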
Pages: 8