Novel approach for quantitative and qualitative authors research profiling using feature fusion and tree-based learning approach

Cited by: 0
Authors
Umer M. [1]
Aljrees T. [2]
Ullah S. [1]
Bashir A.K. [3]
Affiliations
[1] Department of Computer Science, Khwaja Fareed University of Engineering & IT, Punjab, Rahim Yar Khan
[2] Department of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin
[3] Department of Computing and Mathematics, The Manchester Metropolitan University, Manchester
Keywords
Authors research profiling; Citation sentiment analysis; Ensemble learning; Feature engineering; Feature fusion; Intelligent recommendation and text analysis; Self citation analysis
DOI
10.7717/PEERJ-CS.1752
Abstract
An article citation links the citing and cited articles and underlies several measures of scientific achievement, such as author and journal impact factor, H-index, and i10-index. Citations also include self-citations, in which authors cite their own articles. Self-citation is important for evaluating an author’s research profile and has recently gained attention. Although the literature offers differing criteria for what counts as appropriate self-citation, self-citation has a large impact on a researcher’s scientific profile. This study examines two cases in this regard. In case 1, the qualitative aspect of the author’s profile is analyzed using hand-crafted feature engineering techniques. The sentiments conveyed through citations are integral to assessing research quality, as they can signify appreciation or critique, or serve as a foundation for further research. Analyzing sentiments within in-text citations remains a formidable challenge, even with automated sentiment annotation. For this purpose, this study employs machine learning models trained on term frequency (TF) and term frequency-inverse document frequency (TF-IDF) features. Random forest using TF features with the Synthetic Minority Oversampling Technique (SMOTE) achieved an accuracy of 0.9727. Case 2 deals with quantitative analysis and investigates direct and indirect self-citation. The list of the top 2% of researchers in 2020 is taken as the baseline. Data for the top 25 Pakistani researchers are manually retrieved from this dataset, along with citation information from the Web of Science (WoS). Self-citation is estimated using the proposed model, and the results are compared with those obtained from WoS. Experimental results show a substantial difference between the two: the self-citation ratio from the proposed approach is higher than that from WoS. It is observed that the citation counts reported by WoS for authors are overstated. For a comprehensive evaluation of a researcher’s profile, both direct and indirect self-citation must be included. © 2023 Umer et al.
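
The abstract does not define how direct and indirect self-citation are counted, so the following is only a minimal sketch under assumed definitions, not the authors' model. It assumes a citation is "direct" when the profiled author appears on the citing paper, and "indirect" when the author does not appear but a co-author of the cited paper does. The `Paper` class, the `self_citation_ratios` function, and the toy data are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """A paper reduced to an identifier and its set of author names."""
    paper_id: str
    authors: frozenset


def self_citation_ratios(author, cited_papers, citations):
    """Return (direct_ratio, indirect_ratio) for `author`.

    cited_papers: papers written by `author`.
    citations: list of (citing_paper, cited_paper_id) pairs.
    Assumed definitions (NOT taken from the paper):
      direct   = the citing paper lists `author` as an author;
      indirect = the citing paper does not list `author`, but lists at
                 least one co-author of the cited paper.
    """
    by_id = {p.paper_id: p for p in cited_papers}
    total = direct = indirect = 0
    for citing, cited_id in citations:
        cited = by_id.get(cited_id)
        if cited is None:
            continue  # citation to a paper outside the author's profile
        total += 1
        if author in citing.authors:
            direct += 1
        elif (cited.authors - {author}) & citing.authors:
            indirect += 1
    if total == 0:
        return 0.0, 0.0
    return direct / total, indirect / total


# Toy data, invented purely for illustration.
p1 = Paper("p1", frozenset({"A", "B"}))
citations = [
    (Paper("c1", frozenset({"A", "C"})), "p1"),  # direct self-citation
    (Paper("c2", frozenset({"B"})), "p1"),       # indirect (co-author B)
    (Paper("c3", frozenset({"D"})), "p1"),       # independent citation
    (Paper("c4", frozenset({"E"})), "p1"),       # independent citation
]
direct_ratio, indirect_ratio = self_citation_ratios("A", [p1], citations)
print(direct_ratio, indirect_ratio)
```

Under these assumed definitions, a count that considers only direct self-citation would miss the co-author case, which is one way a database figure and a fuller estimate could diverge.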
Related papers
50 in total
  • [41] Survival analysis with time-varying regression effects using a tree-based approach
    Xu, RH
    Adak, S
    BIOMETRICS, 2002, 58 (02) : 305 - 315
  • [42] Towards tree-based systems disturbance monitoring of tropical mosaic landscape using a time series ensemble learning approach
    Abera, Temesgen
    Pellikka, Petri
    Johansson, Tino
    Mwamodenyi, James
    Heiskanen, Janne
    REMOTE SENSING OF ENVIRONMENT, 2023, 299
  • [43] Classification of multiple and single power quality disturbances using a decision tree-based approach
    Barbosa B.H.G.
    Ferreira D.D.
    Journal of Control, Automation and Electrical Systems, 2013, 24 (05) : 638 - 648
  • [44] Tree-based data aggregation approach in wireless sensor network using fitting functions
    Atoui, Ibrahim
    Ahmad, Ali
    Medlej, Maguy
    Makhoul, Abdallah
    Tawbe, Samar
    Hijazi, Abbas
    2016 SIXTH INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION PROCESSING AND COMMUNICATIONS (ICDIPC), 2016, : 146 - 150
  • [45] Snow avalanche susceptibility mapping using novel tree-based machine learning algorithms (XGBoost, NGBoost, and LightGBM) with eXplainable Artificial Intelligence (XAI) approach
    Iban, Muzaffer Can
    Bilgilioglu, Suleyman Sefa
    STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT, 2023, 37 (06) : 2243 - 2270
  • [46] Snow avalanche susceptibility mapping using novel tree-based machine learning algorithms (XGBoost, NGBoost, and LightGBM) with eXplainable Artificial Intelligence (XAI) approach
    Muzaffer Can IBAN
    Suleyman Sefa BILGILIOGLU
    Stochastic Environmental Research and Risk Assessment, 2023, 37 : 2243 - 2270
  • [47] Speech emotion recognition using multimodal feature fusion with machine learning approach
    Sandeep Kumar Panda
    Ajay Kumar Jena
    Mohit Ranjan Panda
    Susmita Panda
    Multimedia Tools and Applications, 2023, 82 : 42763 - 42781
  • [48] Speech emotion recognition using feature fusion: a hybrid approach to deep learning
    Khan, Waleed Akram
    ul Qudous, Hamad
    Farhan, Asma Ahmad
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (31) : 75557 - 75584
  • [49] Speech emotion recognition using multimodal feature fusion with machine learning approach
    Panda, Sandeep Kumar
    Jena, Ajay Kumar
    Panda, Mohit Ranjan
    Panda, Susmita
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (27) : 42763 - 42781
  • [50] Machine learning approach of speech emotions recognition using feature fusion technique
    Paul, Bachchu
    Bera, Somnath
    Dey, Tanushree
    Phadikar, Santanu
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (03) : 8663 - 8688