Content Based Audiobooks Indexing using Apache Hadoop Framework

Cited by: 0
Authors
Shetty, Sonal [1]
Sabarad, Akash [1]
Hebballi, Harish [1]
Husain, Moula [1]
Meena, S. M. [1]
Nagaralli, Shiddu [1]
Affiliation
[1] BVBCET, Vidya Nagar, Hubli, India
Keywords
Hadoop; MapReduce; tf-idf; CMU SPHINX-4
DOI
10.1145/2791405.2791485
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In recent years, content-based audio indexing has become a key research area, as the audio itself defines the content more precisely and has comparatively low density. In this paper, we present the conversion of audiobooks into text using the CMU SPHINX-4 speech transcriber, and efficient indexing of audiobooks using term frequency-inverse document frequency (tf-idf) weights on the Apache Hadoop MapReduce framework. In the first phase, audiobook datasets are converted into text by training the CMU SPHINX-4 speech recognizer with acoustic models. In the next phase, the keywords in the text file generated by the speech recognizer are filtered using tf-idf weights. Finally, we index the audio files based on the keywords extracted from the transcribed text. Because speech-to-text conversion and audio indexing are space- and time-intensive tasks, we ported these algorithms to the Hadoop MapReduce framework. Porting content-based indexing of audiobooks to the distributed Hadoop framework resulted in a considerable improvement in time and space utilization. As the amount of data being uploaded and downloaded keeps growing, this approach can be further extended to indexing images, video, and other multimedia.
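The keyword-filtering and indexing phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs in plain Python rather than on Hadoop, it imitates the map and reduce steps with ordinary functions, and it assumes the common tf-idf variant tf = count / document length, idf = log(N / document frequency). All names (`map_term_counts`, `reduce_sum`, `build_keyword_index`, `top_k`) and the sample transcripts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def map_term_counts(doc_id, text):
    """Map step (illustrative): emit ((term, doc_id), 1) for every token."""
    for token in text.lower().split():
        yield (token, doc_id), 1


def reduce_sum(pairs):
    """Reduce step (illustrative): sum the emitted values per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def build_keyword_index(transcripts, top_k=2):
    """Return {keyword: set of doc_ids} from {doc_id: transcript text}.

    Assumed weighting: tf = count / doc length, idf = log(N / doc frequency).
    Only the top_k terms per document with a positive weight are indexed.
    """
    pairs = [pair
             for doc_id, text in transcripts.items()
             for pair in map_term_counts(doc_id, text)]
    term_doc_counts = reduce_sum(pairs)  # (term, doc_id) -> occurrences

    doc_lengths = Counter()  # doc_id -> total tokens
    doc_freq = Counter()     # term -> number of documents containing it
    for (term, doc_id), count in term_doc_counts.items():
        doc_lengths[doc_id] += count
        doc_freq[term] += 1

    n_docs = len(transcripts)
    weights = defaultdict(dict)  # doc_id -> {term: tf-idf weight}
    for (term, doc_id), count in term_doc_counts.items():
        tf = count / doc_lengths[doc_id]
        idf = math.log(n_docs / doc_freq[term])
        weights[doc_id][term] = tf * idf

    index = defaultdict(set)
    for doc_id, term_weights in weights.items():
        # Sort by descending weight, then alphabetically for stable ties.
        ranked = sorted(term_weights.items(), key=lambda kv: (-kv[1], kv[0]))
        for term, weight in ranked[:top_k]:
            if weight > 0:  # terms found in every document get weight 0
                index[term].add(doc_id)
    return dict(index)


# Hypothetical transcripts standing in for speech-recognizer output.
transcripts = {
    "a.wav": "the whale and the sea",
    "b.wav": "the garden and the rose",
}
index = build_keyword_index(transcripts)
```

In this toy input, words that occur in every transcript receive an idf of zero and are filtered out, so only the distinguishing words of each recording end up as index keys. On an actual cluster the map and reduce functions would run as distributed tasks over transcript files, which this sketch does not attempt to reproduce.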
Pages: 496-501
Page count: 6