Cloud-based textual analysis as a basis for document classification

Cited by: 4
Authors
Weir, George R. S. [1]
Owoeye, Kolade [1]
Oberacker, Alice [1]
Alshahrani, Haya [1]
Affiliations
[1] Univ Strathclyde, Dept Comp & Informat Sci, Glasgow, Lanark, Scotland
Keywords
data mining; textual analysis; classification; feature-set; Cloud-service; Posit; SELECTION
DOI
10.1109/HPCS.2018.00110
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Growing trends in data mining and developments in machine learning have encouraged interest in analytical techniques that can contribute insights into data characteristics. The present paper describes an approach to textual analysis that generates extensive quantitative data on target documents, with output including frequency data on tokens, types, parts-of-speech and word n-grams. These analytical results enrich the available source data and have proven useful in several contexts as a basis for automating manual classification tasks. In the following, we introduce the Posit textual analysis toolset and detail its use in data enrichment as input to supervised learning tasks, including automating the identification of extremist Web content. Next, we describe the extension of this approach to the Arabic language. Thereafter, we recount the move of these analytical facilities from local operation to a Cloud-based service. This transition affords easy remote access for other researchers seeking to explore the application of such data enrichment to their own text-based data sets.
Pages: 672-676
Page count: 5
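
The abstract mentions frequency data on tokens, types and word n-grams as features for supervised learning. As a rough illustration only, the sketch below computes such counts for a single document. It is not the Posit toolset or its API: the function names (`tokenize`, `word_ngrams`, `frequency_features`), the regex tokenizer and the feature names are all assumptions made for this example, and part-of-speech frequencies are left out because they would require a tagger.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    The regex is an assumption for this sketch, not Posit's tokenizer.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_features(text, max_n=3):
    """Build a dictionary of simple frequency features for one document."""
    tokens = tokenize(text)
    token_counts = Counter(tokens)
    features = {
        "total_tokens": len(tokens),        # every word occurrence
        "total_types": len(token_counts),   # distinct word forms
        "type_token_ratio": len(token_counts) / len(tokens) if tokens else 0.0,
        "token_frequencies": token_counts,
    }
    # Word n-gram frequencies for n = 2 .. max_n
    for n in range(2, max_n + 1):
        features[f"{n}gram_frequencies"] = Counter(word_ngrams(tokens, n))
    return features


if __name__ == "__main__":
    feats = frequency_features("The cat sat on the mat")
    print(feats["total_tokens"], feats["total_types"])
    print(feats["2gram_frequencies"].most_common(3))
```

Counts like these could be flattened into a numeric feature vector per document and passed to any supervised classifier; which features and which classifier the authors used is not stated in this record.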