Beyond Beall's Blacklist: Automatic Detection of Open Access Predatory Research Journals

Cited by: 5
Authors
Adnan, Awais [1]
Anwar, Sajid [1]
Zia, Tehseen [2]
Razzaq, Saad [3]
Maqbool, Fahad [3]
Rehman, Zia Ur [1]
Affiliations
[1] Inst Management Sci Peshawar, Phase 7, Peshawar, Pakistan
[2] COMSATS Inst Informat Technol, Islamabad 44000, Pakistan
[3] Univ Sargodha, Dept Comp Sci, Sargodha, Pakistan
Keywords
component; formatting; style; styling; insert (key words);
DOI
10.1109/HPCC/SmartCity/DSS.2018.00274
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The term "predatory journal" refers to journals that exploit the open access publishing model. They accept manuscripts that may have flaws in scholarly quality, or they charge authors fees without delivering the necessary editorial and publishing services. In the interest of high-quality scholarly publication, predatory journals should be avoided. Predatory journals are currently detected manually, using either blacklists or unverified heuristics. The former is frail because, as the number of such journals keeps growing, it may be infeasible to act as a constant watchdog. The latter is fragile because the heuristics are based on the observations of a few authors and have no documented proof of effectiveness. This paper presents a methodology and analysis for designing an automated predatory journal detection system. The detection task is posed as a classification problem, and the utility of two different feature sets is tested for training classifiers. The first feature set is based on heuristics, while the second is composed of text features. Three classification algorithms are used to measure the effectiveness of the feature sets: kNN, support vector machine (SVM), and naive Bayes. Since each classifier makes some assumptions about the dataset, a comparative analysis of the classifiers is an additional objective of this study. The highest performance, 0.98, is achieved by SVM with heuristics-based features, while SVM with text-based features achieves 0.96, the second highest. However, one advantage of text-based features over heuristics-based features is that they are comparatively easy to extract.
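The text-feature variant of the classification setup can be illustrated with a minimal sketch, given here under stated assumptions rather than as the authors' implementation. It trains a multinomial naive Bayes classifier (one of the three algorithms named in the abstract) on bag-of-words counts. The class name `NaiveBayesText`, the whitespace tokenizer, and the four toy training sentences are all hypothetical and invented for illustration; the abstract does not describe the dataset, the exact text features, or the preprocessing actually used.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


class NaiveBayesText:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of documents per class
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_docs = len(docs)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior of the class.
            score = math.log(self.doc_counts[c] / self.total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical toy data, not taken from the paper.
train_docs = [
    "rapid publication guaranteed within days pay fee now",
    "submit today fast acceptance low fee guaranteed",
    "rigorous peer review by an international editorial board",
    "manuscripts undergo double blind peer review before acceptance",
]
train_labels = ["predatory", "predatory", "legitimate", "legitimate"]

model = NaiveBayesText().fit(train_docs, train_labels)
print(model.predict("guaranteed fast publication for a small fee"))
print(model.predict("double blind peer review by the editorial board"))
```

Swapping in kNN or an SVM would change only the classifier; the same word-count features could feed all three, which is what makes a comparison between classifiers on a fixed feature set possible.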
Pages: 1692-1697
Page count: 6