A Multiple Feature Category Data Mining and Machine Learning Approach to Characterize and Detect Health Misinformation on Social Media

Cited by: 6
Authors
Safarnejad, Lida [1]
Xu, Qian [2]
Ge, Yaorong [3]
Chen, Shi [3]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Elon Univ, Elon, NC 27244 USA
[3] Univ N Carolina, Charlotte, NC 28223 USA
Keywords
Feature extraction; Social networking (online); Data mining; Measurement; Internet; Blogs; Support vector machines;
DOI
10.1109/MIC.2021.3063257
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
In this article, we characterize health misinformation infiltration as a dynamic dissemination process on social media, in addition to using content-based features. Using the Zika discussion on Twitter in 2016 as the study system, we identified the 264 most influential tweets with misinformation and 455 matched tweets with real information. We developed an algorithm to infer the information dissemination network through retweeting for each tweet, and extracted nine network metrics. We then approximated information dissemination as a nonhomogeneous Poisson process (NHPP) signal and extracted 40 signal features to characterize each NHPP. For content-based features, we applied both Linguistic Inquiry and Word Count (LIWC) and document-to-vector (Doc2Vec) to extract a further 63 and 50 features for each tweet, respectively. Finally, we also considered four user features. Based on these extracted feature categories, we trained support vector machine (SVM) and random forest (RF) classifiers. Using all feature categories combined as input, an RF classifier achieved more than 83% accuracy and more than 90% AUC in detecting misinformation.
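The NHPP step in the abstract can be illustrated with a minimal sketch. It assumes that retweet times are given in hours since the original tweet, estimates a piecewise-constant rate by binning, and summarizes the resulting signal with a few features. The function names, the bin width, the sample timestamps, and the five features shown are hypothetical illustrations; the abstract does not specify which 40 signal features the authors extracted or how they estimated the rate.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def binned_intensity(event_times, bin_width, horizon):
    """Piecewise-constant estimate of an NHPP rate (events per unit time per bin).

    Hypothetical helper: binning is one simple estimator, not necessarily
    the one used in the article.
    """
    if bin_width <= 0 or horizon <= 0:
        raise ValueError("bin_width and horizon must be positive")
    n_bins = math.ceil(horizon / bin_width)
    # Count events per bin, ignoring anything outside [0, horizon).
    counts = Counter(int(t // bin_width) for t in event_times if 0 <= t < horizon)
    return [counts.get(i, 0) / bin_width for i in range(n_bins)]


def signal_features(rates, bin_width):
    """A few illustrative summary features of the rate signal (assumed, not from the article)."""
    peak = max(rates)
    return {
        "total_events": sum(rates) * bin_width,
        "peak_rate": peak,
        "time_to_peak": rates.index(peak) * bin_width,
        "mean_rate": mean(rates),
        "rate_std": pstdev(rates),
    }


if __name__ == "__main__":
    # Made-up retweet times (hours after the original tweet), for illustration only.
    retweet_hours = [0.2, 0.5, 0.9, 1.4, 1.8, 2.5, 3.1, 5.7]
    rates = binned_intensity(retweet_hours, bin_width=1.0, horizon=6.0)
    print(rates)
    print(signal_features(rates, bin_width=1.0))
```

In a pipeline like the one described, a feature vector of this kind would be concatenated with the network, content, and user features before being passed to a classifier.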
Pages: 43-51
Number of pages: 9