Content Based Audiobooks Indexing using Apache Hadoop Framework

Cited by: 0
Authors
Shetty, Sonal [1]
Sabarad, Akash [1]
Hebballi, Harish [1]
Husain, Moula [1]
Meena, S. M. [1]
Nagaralli, Shiddu [1]
Affiliation
[1] BVBCET, Vidya Nagar, Hubli, India
Keywords
Hadoop; MapReduce; tf-idf; CMU SPHINX-4
DOI
10.1145/2791405.2791485
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, content-based audio indexing has become a key research area, as the audio content defines the content more precisely and has comparatively subservient density. In this paper, we present the conversion of audiobooks into text using the CMU SPHINX-4 speech transcriber and efficient indexing of audiobooks using term frequency-inverse document frequency (tf-idf) weights on the Apache Hadoop MapReduce framework. In the first phase, audiobook datasets are converted into text by training the CMU SPHINX-4 speech recognizer with acoustic models. In the next phase, the keywords in the text file generated by the speech recognizer are filtered using tf-idf weights. Finally, we index the audio files based on the keywords extracted from the transcribed text. Because speech-to-text conversion and audio indexing are space- and time-intensive tasks, we ported these algorithms to the Hadoop MapReduce framework. Porting content-based indexing of audiobooks to a Hadoop distributed framework resulted in considerable improvement in time and space utilization. As the amount of data being uploaded and downloaded is escalating, this approach can be further extended to indexing images, video, and other multimedia forms.
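The keyword-filtering and indexing phases described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function names (`tf_idf`, `build_index`), the `top_k` cutoff, and the toy transcripts are all assumptions made for illustration, and the code runs in plain Python rather than on Hadoop. The two passes are only loosely analogous to a map step (per-document term counts) and a reduce step (corpus-wide document frequencies).

```python
import math
from collections import Counter, defaultdict

def tf_idf(docs):
    """docs maps a document id to its list of tokens.
    Returns {doc_id: {term: tf-idf weight}}."""
    n_docs = len(docs)
    # Map-like pass: count terms inside each transcript.
    counts = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    # Reduce-like pass: in how many transcripts does each term occur?
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())
    weights = {}
    for doc_id, term_counts in counts.items():
        total = sum(term_counts.values())
        weights[doc_id] = {
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in term_counts.items()
        }
    return weights

def build_index(weights, top_k=2):
    """Keep the top_k highest-weighted terms per document (hypothetical cutoff)
    and build an inverted index: keyword -> list of document ids."""
    index = defaultdict(list)
    for doc_id, term_weights in weights.items():
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(term_weights.items(), key=lambda kv: (-kv[1], kv[0]))
        for term, score in ranked[:top_k]:
            if score > 0:  # terms present in every document carry no weight
                index[term].append(doc_id)
    return dict(index)

# Toy transcripts standing in for speech-recognizer output (made-up data).
transcripts = {
    "book_a": "the whale hunted the whale ship".split(),
    "book_b": "the detective found the clue".split(),
}
weights = tf_idf(transcripts)
index = build_index(weights)
```

In this sketch a word such as "the", which occurs in every transcript, receives a weight of zero and never reaches the index, which is the filtering effect that tf-idf weighting is meant to provide.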
Pages: 496-501
Page count: 6