Evaluating of Word Embeddings Hyper-parameters of the Master Data in Russian-Language Information Systems

Cited by: 0
Authors
Dudnikov, Sergey [1]
Mikheev, Petr [1]
Grinkina, Tatyana [1]
Affiliations
[1] Bauman Moscow State Tech Univ, Moscow, Russia
Keywords
Master data; Quality of master data; Word embeddings; Word2Vec model; Hyper-parameter settings
DOI
10.1007/978-3-030-39216-1_7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This work evaluates word-embedding hyper-parameters for the task of supporting master data quality. We introduce the structure and management of master data. We also describe a method for training the embedding model for elements and methods for evaluating the resulting vectors. Using a corpus of training and validation data sets totaling 264 thousand records, we ran an experiment on models with different parameter sets and report the results. Our vectors give good results on mapping and classification problems for industry-specific texts in comparison with standard approaches. In conclusion, we present the main recommendations for setting hyper-parameters in master data management under industry conditions. Our methods are successfully used in RPA systems and in the Data Warehouse as the text analysis module.
Pages: 64-75
Page count: 12