Reintroducing KAPD as a Dataset for Machine Learning and Data Mining Applications

Cited by: 3
Authors
Seddiq, Yasser [1, 2]
Meftah, Ali [2]
Alghamdi, Mansour [1]
Alotaibi, Yousef [2]
Affiliations
[1] King Abdulaziz City Sci & Technol, Riyadh, Saudi Arabia
[2] King Saud Univ, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
Keywords
imbalanced dataset; speech processing; KAPD; Arabic; corpus; data mining; machine learning
DOI
10.1109/EMS.2016.21
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract
The KACST Arabic Phonetic Database (KAPD) has been used by researchers for around fifteen years since its initial release. Research in acoustics and phonetics has benefited from its phonetically rich content, and KAPD has the potential to serve the research community further. In this work, KAPD is enhanced so that it can serve as a dataset for machine learning and data mining applications. The work involves reviewing and refining KAPD's existing metadata and adding new material that is necessary for such applications. The updated phoneme statistics after the corpus upgrade are presented from different perspectives. The data format and time units are made compatible with those of HTK. The paper discusses the potential of KAPD to serve as either a balanced or an imbalanced dataset.
Pages: 70-74
Page count: 5
Related papers
50 in total
  • [1] FOWD: A Free Ocean Wave Dataset for Data Mining and Machine Learning
    Hafner, Dion
    Gemmrich, Johannes
    Jochum, Markus
    [J]. JOURNAL OF ATMOSPHERIC AND OCEANIC TECHNOLOGY, 2021, 38 (07) : 1305 - 1322
  • [2] Exploration of Machine Learning and Data Mining techniques on a horse racing dataset
    Kyriacou, E
    Toolan, F
    Dunnion, J
    [J]. MLMTA '05: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MACHINE LEARNING MODELS TECHNOLOGIES AND APPLICATIONS, 2005, : 161 - 166
  • [3] Machine learning, data mining, and computational statistics applications
    Wegman, Edward J.
    Said, Yasmin H.
    Scott, David W.
    [J]. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2011, 3 (03) : 187 - 187
  • [4] Machine Learning and Data Mining Applications in Power Systems
    Leonowicz, Zbigniew
    Jasinski, Michal
    [J]. ENERGIES, 2022, 15 (05)
  • [5] A Syllabus on Data Mining and Machine Learning with Applications to Cybersecurity
    Epishkina, Anna
    Zapechnikov, Sergey
    [J]. 2016 THIRD INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION PROCESSING, DATA MINING, AND WIRELESS COMMUNICATIONS (DIPDMWC), 2016, : 194 - 199
  • [6] Data Mining and Machine Learning Applications for Educational Big Data in the University
    Abe, Keisuke
    [J]. IEEE 17TH INT CONF ON DEPENDABLE, AUTONOM AND SECURE COMP / IEEE 17TH INT CONF ON PERVAS INTELLIGENCE AND COMP / IEEE 5TH INT CONF ON CLOUD AND BIG DATA COMP / IEEE 4TH CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2019, : 350 - 355
  • [7] Machine learning and data mining
    Mitchell, TM
    [J]. COMMUNICATIONS OF THE ACM, 1999, 42 (11) : 30 - 36
  • [8] Dataset of cannabis seeds for machine learning applications
    Chumchu, Prawit
    Patil, Kailas
    [J]. DATA IN BRIEF, 2023, 47
  • [9] Applications of data mining and machine learning framework in aquaculture and fisheries: A review
    Gladju, J.
    Kamalam, Biju Sam
    Kanagaraj, A.
    [J]. SMART AGRICULTURAL TECHNOLOGY, 2022, 2
  • [10] Research on the Structured Data Mining Algorithm and the Applications on Machine Learning Field
    Deng, Xiaodui
    [J]. PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON SOCIAL SCIENCE AND TECHNOLOGY EDUCATION (ICSSTE 2016), 2016, 55 : 865 - 870