Active Manifold Learning with Twitter Big Data

Cited by: 2
Authors
Silva, Catarina [1 ,2 ]
Antunes, Mario [1 ,3 ]
Costa, Joana [1 ,2 ]
Ribeiro, Bernardete [2 ]
Affiliations
[1] Polytech Inst Leiria, Sch Technol & Management, Leiria, Portugal
[2] Univ Coimbra, Ctr Informat & Syst, P-3000 Coimbra, Portugal
[3] Univ Porto, INESC TEC, Ctr Res Adv Comp Syst, P-4100 Oporto, Portugal
Keywords
Big data; Support Vector Machine; Manifold; Twitter; CLASSIFICATION; VISUALIZATION;
DOI
10.1016/j.procs.2015.07.296
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
The volume of data produced by Internet applications has increased substantially. Big data is a rapidly growing field that deals with this deluge by using storage techniques, dedicated infrastructures, and development frameworks for the parallelization of defined tasks and their subsequent reduction. These solutions, however, fall short in online and highly data-demanding scenarios, since users expect swift feedback. Reduction techniques are used effectively in online big data applications to improve classification. Reduction in big data usually follows one of two main approaches: (i) reducing the dimensionality by pruning or reformulating the feature set; (ii) reducing the sample size by choosing the most relevant examples. Both approaches offer benefits, not only in the time needed to build a model but potentially also in performance, usually by reducing overfitting and improving generalization. In this paper we investigate reduction techniques that tackle both the dimensionality and the size of big data. We propose a framework that combines a manifold learning approach to reduce dimensionality with an active learning SVM-based strategy to reduce the size of the labeled sample. Results on Twitter data show the potential of the proposed active manifold learning approach.
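As a rough illustration of the sample-size reduction idea in the abstract, the sketch below shows one common SVM-based active learning heuristic: query the unlabeled examples whose decision values lie closest to the separating hyperplane. The abstract does not state which query criterion the authors use, so margin-based selection is an assumption here, and the function name `select_queries` and the example scores are hypothetical. The manifold learning step is not shown.

```python
from typing import List, Sequence


def select_queries(decision_values: Sequence[float], batch_size: int) -> List[int]:
    """Return indices of the unlabeled examples closest to the SVM boundary.

    decision_values[i] is assumed to be the signed SVM output f(x_i) for
    the i-th unlabeled example; a small |f(x_i)| means low confidence.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    # Rank indices by distance to the hyperplane (ties broken by index).
    ranked = sorted(range(len(decision_values)),
                    key=lambda i: (abs(decision_values[i]), i))
    return ranked[:batch_size]


# Hypothetical decision values for five unlabeled tweets.
scores = [1.8, -0.1, 0.4, -2.3, 0.05]
queries = select_queries(scores, batch_size=2)
print(queries)  # [4, 1]
```

In an active learning loop, the selected examples would be labeled, added to the training set, and the SVM retrained, so that only a small labeled sample is needed.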
Pages: 208-215
Number of pages: 8
Related papers
50 in total
  • [1] Scalable Manifold Learning for Big Data with Apache Spark
    Schoeneman, Frank
    Zola, Jaroslaw
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 272 - 281
  • [2] Rejoinder on: "On active learning methods for manifold data"
    Li, Hang
    Del Castillo, Enrique
    Runger, George
    [J]. TEST, 2020, 29 : 42 - 49
  • [3] Comments on: On Active Learning Methods for Manifold Data
    Gahrooei, Mostafa Reisi
    Yan, Hao
    Paynabar, Kamran
    [J]. TEST, 2020, 29 (01) : 38 - 41
  • [4] Comments on: On active learning methods for manifold data
    Ghosh, Abhik
    [J]. TEST, 2020, 29 : 34 - 37
  • [5] Rejoinder on: "On active learning methods for manifold data"
    Li, Hang
    Del Castillo, Enrique
    Runger, George
    [J]. TEST, 2020, 29 (01) : 42 - 49
  • [6] Active Learning for Mining Big Data
    Jahan, Sadia
    Shatabda, Swakkhar
    Farid, Dewan Md
    [J]. 2018 21ST INTERNATIONAL CONFERENCE OF COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2018,
  • [7] Twitter: Big data opportunities Response
    Lazer, David
    Kennedy, Ryan
    King, Gary
    Vespignani, Alessandro
    [J]. SCIENCE, 2014, 345 (6193) : 148 - 149
  • [8] TWITTER AS A SOURCE OF BIG SPATIAL DATA
    Kocich, David
    Horak, Jiri
    [J]. INFORMATICS, GEOINFORMATICS AND REMOTE SENSING CONFERENCE PROCEEDINGS, SGEM 2016, VOL I, 2016, : 921 - 928
  • [9] A Manifold Learning Framework for Reducing High-dimensional Big Text Data
    Salem, Rashed
    [J]. 2017 12TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND SYSTEMS (ICCES), 2017, : 347 - 352
  • [10] The big data mining forecasting model based on combination of improved manifold learning and deep learning
    Chen, Xiurong
    Tian, Yixiang
    [J]. INTERNATIONAL JOURNAL OF GRID AND UTILITY COMPUTING, 2019, 10 (02) : 119 - 131