Mathematical formula information retrieval system

Times cited: 0
Author
Hou, Yong [1]
Affiliation
[1] Bengbu Univ, Sch Comp & Informat Engn, Bengbu 233030, Anhui, Peoples R China
Keywords
Mathematical formula; index; retrieval; mathematical content representation; document sorting; retrieval engine
DOI
10.3233/JCM-226961
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08 ;
Abstract
This paper presents the design and implementation of MFIRS, a system for retrieving information about mathematical formulas. The system is organized into the following modules: input normalization, mathematical formula unification, mathematical formula encoding, text feature extraction, mathematical formula feature extraction, mathematical formula indexing, and retrieval and ranking. A method for extracting mathematical formulas and keywords based on FastText word embeddings is proposed. The method captures the structural features of a formula, and the resulting vectors make it straightforward to compute the similarity between formulas. The model also incorporates semantic features from the context surrounding each formula to improve the domain relevance of search results. The MathRetEval dataset was created from about 7.9 × 10^5 arXiv documents and about 1.5 × 10^8 mathematical formulas, and it is used to verify the scalability of the system. Formulas can be written in TeX or MathML. A query written in TeX is converted to a MathML tree representation and then indexed. MFIRS is thus a math-aware information retrieval system for mathematical formulas that supports similarity search over partial formulas.
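The idea of comparing formulas through their vectors can be illustrated with a minimal sketch. It is not the paper's implementation: the tokenizer, the hashed character n-gram embedding (a rough stand-in for FastText-style subword vectors, with no trained model), the dimension `DIM`, and the function names are all assumptions made for illustration.

```python
import math
import re
import zlib

DIM = 64  # hypothetical embedding dimension, chosen only for this sketch


def tokenize_tex(formula):
    """Split a TeX string into commands (e.g. \\frac) and single symbols."""
    return re.findall(r"\\[A-Za-z]+|\S", formula)


def char_ngrams(token, n_min=2, n_max=3):
    """Return the padded token plus its character n-grams (subword units)."""
    padded = f"<{token}>"
    grams = [padded]
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def embed_formula(formula):
    """Sum signed hashed n-gram features into one fixed-size vector.

    Assumption: feature hashing replaces a trained embedding table.
    """
    vec = [0.0] * DIM
    for token in tokenize_tex(formula):
        for gram in char_ngrams(token):
            h = zlib.crc32(gram.encode("utf-8"))  # deterministic hash
            sign = 1.0 if (h >> 31) & 1 == 0 else -1.0
            vec[h % DIM] += sign
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_formulas(query, corpus):
    """Return (score, formula) pairs sorted by decreasing similarity."""
    q = embed_formula(query)
    scored = [(cosine(q, embed_formula(f)), f) for f in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_formulas(r"x^2 + y^2", corpus)` on a list of TeX strings returns the candidates ordered by cosine similarity to the query. A real system would replace the hashed vectors with trained embeddings and operate on the normalized tree representation described in the abstract.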
Pages: 2949-2973
Number of pages: 25