Content Based Audiobooks Indexing using Apache Hadoop Framework

Cited by: 0

Authors:

Shetty, Sonal [1]

Sabarad, Akash [1]

Hebballi, Harish [1]

Husain, Moula [1]

Meena, S. M. [1]

Nagaralli, Shiddu [1]

Affiliations:

[1] BVBCET, Vidya Nagar, Hubli, India

Source:

PROCEEDING OF THE THIRD INTERNATIONAL SYMPOSIUM ON WOMEN IN COMPUTING AND INFORMATICS (WCI-2015) | 2015

Keywords:

Hadoop; MapReduce; tf-idf; CMU SPHINX-4

DOI:

10.1145/2791405.2791485

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In recent years, content-based audio indexing has become a key research area, because the audio content defines the material more precisely and has comparatively subservient density. In this paper, we present the conversion of audiobooks into textual information using the CMU SPHINX-4 speech transcriber, and efficient indexing of audiobooks using term frequency-inverse document frequency (tf-idf) weights on the Apache Hadoop MapReduce framework. In the first phase, audiobook datasets are converted into text by training the CMU SPHINX-4 speech recognizer with acoustic models. In the next phase, the keywords present in the text file generated by the speech recognizer are filtered using tf-idf weights. Finally, we index the audio files based on the keywords extracted from the transcribed text. Because converting speech to text and indexing audio are space- and time-intensive tasks, we ported these algorithms to the Hadoop MapReduce framework. Porting content-based indexing of audiobooks to a distributed Hadoop framework resulted in a considerable improvement in time and space utilization. As the amount of data being uploaded and downloaded keeps growing, this approach can be extended to indexing images, video, and other multimedia.
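
The keyword-filtering and indexing phases described in the abstract can be sketched roughly as follows. This is not the authors' implementation: it is a minimal in-memory Python simulation of a map step and a reduce step, with no Hadoop or CMU SPHINX-4 involved. The function names, the sample transcripts, the `top_k` cutoff, and the particular tf-idf variant (term frequency normalized by transcript length, idf = log(N / df)) are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter, defaultdict


def map_term_counts(doc_id, text):
    """Map step (hypothetical): emit ((doc_id, term), 1) for each token."""
    for term in text.lower().split():
        yield (doc_id, term), 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def tf_idf(transcripts):
    """Return {doc_id: {term: weight}} with weight = tf * log(N / df)."""
    pairs = [pair for doc_id, text in transcripts.items()
             for pair in map_term_counts(doc_id, text)]
    term_counts = reduce_counts(pairs)

    doc_lengths = Counter()  # total tokens per transcript
    doc_freq = Counter()     # number of transcripts containing each term
    for (doc_id, term), count in term_counts.items():
        doc_lengths[doc_id] += count
        doc_freq[term] += 1  # each (doc_id, term) key occurs exactly once

    n_docs = len(transcripts)
    weights = defaultdict(dict)
    for (doc_id, term), count in term_counts.items():
        tf = count / doc_lengths[doc_id]
        idf = math.log(n_docs / doc_freq[term])
        weights[doc_id][term] = tf * idf
    return weights


def build_index(weights, top_k=2):
    """Inverted index: keyword -> sorted audiobook ids.

    Keeps at most top_k positively weighted keywords per audiobook
    (top_k is an assumed cutoff, not taken from the paper).
    """
    index = defaultdict(list)
    for doc_id, term_weights in weights.items():
        ranked = sorted(term_weights.items(), key=lambda kv: (-kv[1], kv[0]))
        for term, weight in ranked[:top_k]:
            if weight > 0:
                index[term].append(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Stand-ins for text produced by a speech recognizer (made-up data).
transcripts = {
    "book_a": "the whale hunts the ship",
    "book_b": "the detective hunts the thief",
}
index = build_index(tf_idf(transcripts))
```

In this sketch, words that occur in every transcript receive an idf of zero and never reach the index, which is the filtering effect tf-idf weighting is meant to provide. On a real cluster the map and reduce functions would run as distributed jobs over transcript files rather than over an in-memory dictionary.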


Pages: 496-501

Page count: 6
