Mathematical formula information retrieval system

Author
Hou, Yong [1]
Affiliation
[1] Bengbu Univ, Sch Comp & Informat Engn, Bengbu 233030, Anhui, Peoples R China
Keywords
Mathematical formula; index; retrieval; mathematical content representation; document sorting; retrieval engine
DOI
10.3233/JCM-226961
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
This paper presents the design and implementation of MFIRS, a system for retrieving information about mathematical formulas. The system is organized into the following modules: input normalization, mathematical formula unification, mathematical formula encoding, text feature extraction, mathematical formula feature extraction, mathematical formula indexing, and retrieval and ranking. A method for extracting mathematical formulas and keywords based on FastText word embeddings is proposed. The method captures the structural features of a formula, and its vector output makes it straightforward to compute the similarity between formulas. The model also incorporates semantic features drawn from the context of each formula to improve the domain relevance of search results. The MathRetEval dataset was built from about 7.9 × 10^5 arXiv documents and about 1.5 × 10^8 mathematical formulas, and it is used to verify the scalability of the system. Formulas can be written in TeX or MathML. A query written in TeX is converted to a MathML tree representation and then indexed. MFIRS is thus a math-aware information retrieval system that supports similarity search over partial formulas.
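To make the idea of vector-based formula similarity concrete, the sketch below embeds a TeX formula as a bag of hashed character n-grams and compares two formulas by cosine similarity. It is only an illustration under stated assumptions, not the MFIRS implementation: untrained feature hashing stands in for learned FastText vectors, and the names `DIM`, `tokenize`, `ngrams`, `embed`, and `cosine` are hypothetical.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical vector size chosen for this toy example


def tokenize(tex):
    """Split a TeX string into commands, letters, numbers, and symbols."""
    return re.findall(r"\\[A-Za-z]+|[A-Za-z]|\d+|\S", tex)


def ngrams(token, n_min=2, n_max=3):
    """Return the padded token plus its character n-grams (FastText-style subwords)."""
    padded = f"<{token}>"
    grams = {padded}
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams


def bucket(gram):
    """Map an n-gram to a stable bucket index via MD5 (assumption: no trained model)."""
    digest = hashlib.md5(gram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % DIM


def embed(tex):
    """Sum hashed n-gram counts over all tokens to get one vector per formula."""
    vec = [0.0] * DIM
    for token in tokenize(tex):
        for gram in ngrams(token):
            vec[bucket(gram)] += 1.0
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    query = r"\frac{a}{b}"
    for candidate in (r"\frac{a}{c}", r"x^2 + y^2"):
        score = cosine(embed(query), embed(candidate))
        print(f"{query} vs {candidate}: {score:.3f}")
```

Because every formula maps to a fixed-length vector, ranking candidates against a query reduces to sorting by `cosine`. A real system would replace the hashing step with trained embeddings and operate on the MathML tree rather than raw TeX tokens.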
Pages: 2949-2973
Number of pages: 25