Content Based Audiobooks Indexing using Apache Hadoop Framework

Cited by: 0
Authors
Shetty, Sonal [1]
Sabarad, Akash [1]
Hebballi, Harish [1]
Husain, Moula [1]
Meena, S. M. [1]
Nagaralli, Shiddu [1]
Affiliation
[1] BVBCET, Vidya Nagar, Hubli, India
Keywords
Hadoop; MapReduce; tf-idf; CMU SPHINX-4
DOI
10.1145/2791405.2791485
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In recent years, content-based audio indexing has become a key research area, because audio content describes the underlying material more precisely and has a comparatively low data density. In this paper, we present the conversion of audiobooks into textual information using the CMU SPHINX-4 speech transcriber, and the efficient indexing of audiobooks using term frequency-inverse document frequency (tf-idf) weights on the Apache Hadoop MapReduce framework. In the first phase, the audiobook datasets are converted into text by training the CMU SPHINX-4 speech recognizer with acoustic models. In the next phase, the keywords in the text file generated by the speech recognizer are filtered using tf-idf weights. Finally, we index the audio files based on the keywords extracted from the transcribed text. Because speech-to-text conversion and audio indexing are space- and time-intensive tasks, we ported the execution of these algorithms to the Hadoop MapReduce framework. Porting content-based indexing of audiobooks to a Hadoop distributed framework resulted in a considerable improvement in time and space utilization. As the volume of data being uploaded and downloaded continues to grow, this approach can be extended to the indexing of images, video, and other multimedia forms.
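The abstract does not specify the tf-idf variant or how the Hadoop jobs are laid out. The Python sketch below illustrates the keyword-filtering and indexing phases under stated assumptions; it is not the authors' implementation:

- Transcripts are plain strings keyed by hypothetical audio file names, standing in for SPHINX-4 output.
- tf is a term's count divided by transcript length.
- idf is ln(N/df).
- Hadoop's map, shuffle, and reduce steps are simulated in-process rather than run on a cluster.

```python
import math
import re
from collections import defaultdict
from itertools import groupby


def tokenize(text):
    # Assumed normalization: lowercase word tokens only.
    return re.findall(r"[a-z']+", text.lower())


def map_phase(audio_id, transcript):
    # Mapper: emit ((term, audio_id), 1) for every token of one transcript.
    for term in tokenize(transcript):
        yield (term, audio_id), 1


def shuffle(pairs):
    # Stand-in for Hadoop's shuffle/sort: group intermediate values by key.
    pairs = sorted(pairs, key=lambda kv: kv[0])
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


def tfidf_keywords(transcripts, top_k=3):
    # Job 1 (simulated): count each (term, transcript) pair via map -> shuffle -> sum.
    intermediate = [kv for aid, text in transcripts.items() for kv in map_phase(aid, text)]
    counts = {key: sum(values) for key, values in shuffle(intermediate)}

    doc_len = defaultdict(int)   # total tokens per transcript
    doc_freq = defaultdict(int)  # number of transcripts containing each term
    for (term, aid), c in counts.items():
        doc_len[aid] += c
        doc_freq[term] += 1

    n_docs = len(transcripts)
    scores = defaultdict(dict)
    for (term, aid), c in counts.items():
        tf = c / doc_len[aid]                       # assumed tf: relative frequency
        idf = math.log(n_docs / doc_freq[term])     # assumed idf: ln(N / df)
        scores[aid][term] = tf * idf

    # Keep terms with positive weight, highest first (ties alphabetical), then take top_k.
    return {
        aid: [t for t, s in sorted(terms.items(), key=lambda ts: (-ts[1], ts[0])) if s > 0][:top_k]
        for aid, terms in scores.items()
    }


def build_index(keywords):
    # Job 2 (simulated): invert per-file keyword lists into keyword -> audio files.
    index = defaultdict(set)
    for aid, terms in keywords.items():
        for term in terms:
            index[term].add(aid)
    return {term: sorted(aids) for term, aids in index.items()}


# Hypothetical transcripts; a real pipeline would read SPHINX-4 output files.
transcripts = {
    "book1.wav": "the whale and the sea and the whale",
    "book2.wav": "the detective and the crime",
    "book3.wav": "the sea and the storm",
}
keywords = tfidf_keywords(transcripts, top_k=2)
index = build_index(keywords)
print(keywords)
print(index)
```

Words that occur in every transcript, such as "the" and "and", get an idf of zero, so they never become keywords. A query for "sea" would return both `book1.wav` and `book3.wav`. On a real cluster, each simulated job would become a separate MapReduce job, with HDFS holding the intermediate counts.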
Pages: 496-501
Page count: 6