Vector representation of Internet Domain Names using a Word Embedding technique

Cited by: 0
Authors
Lopez, Waldemar [1]
Merlino, Jorge [1]
Rodriguez-Bocca, Pablo [1]
Affiliations
[1] Univ Republica, Fac Ingn, Inst Comp, Julio Herrera y Reissig 565, Montevideo 11300, Uruguay
Keywords
DNS; Word embeddings; word2vec; TensorFlow; Semantic Similarity; Natural Language Processing
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Word embeddings are a well-known set of techniques widely used in natural language processing (NLP), and word2vec is a computationally efficient predictive model for learning such embeddings. This paper explores the use of word embeddings in a new scenario. We create a vector representation of Internet domain names (DNS) by taking the core ideas from NLP techniques and applying them to real, anonymized DNS query logs from a large Internet Service Provider (ISP). Our main objective is to find semantically similar domains using only information from DNS queries, without any prior knowledge of the content of those domains. We use the unsupervised word2vec learning algorithm with a Skip-Gram model to create the embeddings. We validate the quality of our results by expert visual inspection of similarities and by comparing them with a third-party source, namely the similar-sites service offered by Alexa Internet, Inc.
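As an illustration of the idea in the abstract, the sketch below shows how DNS queries might be turned into Skip-Gram training pairs. It is not the authors' pipeline: the abstract does not say how queries are grouped, so treating each client's time-ordered queries as one "sentence" is an assumption, and the log format, the `.example` domains, and the helper names `sessions_from_log` and `skipgram_pairs` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy log: (anonymized client id, timestamp, queried domain).
# The real log format is not given in the abstract.
LOG = [
    ("c1", 1, "news.example"),
    ("c1", 2, "sports.example"),
    ("c1", 3, "weather.example"),
    ("c2", 1, "mail.example"),
    ("c2", 2, "news.example"),
]


def sessions_from_log(log):
    """Group queries by client in time order, one 'sentence' per client.

    Assumption: per-client grouping plays the role of a sentence in NLP.
    """
    by_client = defaultdict(list)
    for client, _ts, domain in sorted(log, key=lambda r: (r[0], r[1])):
        by_client[client].append(domain)
    return list(by_client.values())


def skipgram_pairs(sentence, window=1):
    """Return (target, context) pairs for every domain within `window` positions."""
    pairs = []
    for i, target in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sentence[j]))
    return pairs


sessions = sessions_from_log(LOG)
pairs = [p for s in sessions for p in skipgram_pairs(s, window=1)]
```

Pairs like these are what a Skip-Gram model is trained on: it learns a vector per domain such that domains appearing in similar query contexts end up close together.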
Pages: 8