Text-based paper-level classification procedure for non-traditional sciences using a machine learning approach

Authors
Daniela Moctezuma
Carlos López-Vázquez
Lucas Lopes
Norton Trevisan
José Pérez
Affiliations
[1] Centro de Investigación en Ciencias de Información Geoespacial, LatinGEO Lab IGM+ORT
[2] Universidad ORT, School of Arts Sciences and Humanities
[3] University of São Paulo
Source
Knowledge and Information Systems, 2024, 66 (02)
Keywords
Gold standard; NLP; Machine learning; Annotator agreement
DOI
Not available
Abstract
Science as a whole is organized into broad fields, and as a consequence research, resources, students, and so on are classified, assigned, or invited following a similar structure. Some fields have been established for centuries, while others are only now flourishing. Funding, staff, and other support are offered to a field if it shows some activity, commonly measured by the number of published scientific papers. How can those papers be found? Well-respected listings exist in which scientific journals are ascribed to one or more knowledge fields. Such lists are human-made, and complications arise when a field covers more than one area of knowledge. How can one discern whether a particular paper is devoted to a field not considered in such lists? In this work, we propose a methodology that classifies the universe of papers into two classes: those belonging to the field of interest and those that do not. The proposed procedure learns from the titles and abstracts of papers published in monothematic, or “pure”, journals. Provided that such journals exist, the procedure could be applied to any field of knowledge. We tested the process with Geographic Information Science, a field that overlaps with Computer Science, Mathematics, Cartography, and others, which makes the task very difficult. We also analyzed the results of our procedure under three different criteria, illustrating its power and capabilities. The proposed solution reached results similar to those of human taggers and to those of state-of-the-art related work.
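The abstract does not say which classifier or text features the authors used, so the following is only a minimal sketch of the general idea: learn a two-class model from the titles and abstracts of papers in "pure" journals, then compare its labels with those of a human tagger. The multinomial Naive Bayes model, the toy training texts, the label names `in_field` and `other`, and the use of Cohen's kappa as the agreement measure are all assumptions made for illustration, not the paper's method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesText:
    """Tiny multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for t in tokens:
                score += math.log(
                    (self.word_counts[c][t] + 1) / (total_words + len(self.vocab))
                )
            scores[c] = score
        return max(scores, key=scores.get)


def cohen_kappa(a, b):
    """Chance-corrected agreement between two equally long label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


# Hypothetical title + abstract snippets standing in for "pure" journal papers.
train_texts = [
    "Spatial interpolation of elevation data in a GIS. We compare kriging methods for raster maps.",
    "Map generalization for cartographic databases. Geospatial features are simplified across scales.",
    "Protein folding with molecular dynamics. We simulate enzyme structures.",
    "Monetary policy and inflation. We estimate interest rate effects on markets.",
]
train_labels = ["in_field", "in_field", "other", "other"]

model = NaiveBayesText().fit(train_texts, train_labels)
predicted = [
    model.predict("Kriging of geospatial raster data"),
    model.predict("Inflation effects on interest rate markets"),
]
human = ["in_field", "other"]  # hypothetical human tagger labels
agreement = cohen_kappa(predicted, human)
```

On real data the training set would come from journals known to be monothematic, and the agreement score would be computed against a gold standard built by several human taggers rather than the two hand-written labels used here.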
Pages: 1503 - 1520
Page count: 17
Related papers
  • [1] Text-based paper-level classification procedure for non-traditional sciences using a machine learning approach
    Moctezuma, Daniela
    Lopez-Vazquez, Carlos
    Lopes, Lucas
    Trevisan, Norton
    Perez, Jose
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (02) : 1503 - 1520
  • [2] Analyzing journal category assignment using a paper-level classification system: multidisciplinary sciences journals
    Zhang, Jiandong
    Shen, Zhesi
    [J]. SCIENTOMETRICS, 2024,
  • [3] Stemming Text-based Web Page Classification using Machine Learning Algorithms: A Comparison
    Razali, Ansari
    Daud, Salwani Mohd
    Zin, Nor Azan Mat
    Shahidi, Faezehsadat
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (01) : 570 - 576
  • [4] Machine learning in bank merger prediction: A text-based approach 
    Katsafados, Apostolos G.
    Leledakis, George N.
    Pyrgiotakis, Emmanouil G.
    Androutsopoulos, Ion
    Fergadiotis, Manos
    [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2024, 312 (02) : 783 - 797
  • [5] Evaluating Content-Related Validity Evidence Using a Text-Based Machine Learning Procedure
    Anderson, Daniel
    Rowley, Brock
    Stegenga, Sondra
    Irvin, P. Shawn
    Rosenberg, Joshua M.
    [J]. EDUCATIONAL MEASUREMENT-ISSUES AND PRACTICE, 2020, 39 (04) : 53 - 64
  • [6] Stock Market Prediction using Text-based Machine Learning
    Jordan, Tristan
    Elgazzar, Heba
    [J]. 2020 IEEE INTERNATIONAL IOT, ELECTRONICS AND MECHATRONICS CONFERENCE (IEMTRONICS 2020), 2020, : 322 - 326
  • [7] Multi-Label Emotion Classification of Online Learners' Reviews Using Machine Learning Text-Based Multi-Label Classification Approach
    Makhoukhi, Hajar
    Roubi, Sarra
    [J]. 2024 5TH INTERNATIONAL CONFERENCE ON EDUCATION DEVELOPMENT AND STUDIES, ICEDS 2024, 2024, : 59 - 64
  • [8] Text-Based Emotion Recognition Using Deep Learning Approach
    Bharti, Santosh Kumar
    Varadhaganapathy, S.
    Gupta, Rajeev Kumar
    Shukla, Prashant Kumar
    Bouye, Mohamed
    Hingaa, Simon Karanja
    Mahmoud, Amena
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [10] Fruit Classification Using Traditional Machine Learning and Deep Learning Approach
    Saranya, N.
    Srinivasan, K.
    Kumar, S. K. Pravin
    Rukkumani, V
    Ramya, R.
    [J]. COMPUTATIONAL VISION AND BIO-INSPIRED COMPUTING, 2020, 1108 : 79 - 89