SpeechToText: An open-source software for automatic detection and transcription of voice recordings in digital forensics

Cited by: 6
Authors
Negra, Miguel [1, 2]
Domingues, Patricio [1, 2, 3]
Affiliations
[1] Polytechnic Institute of Leiria, School of Technology and Management, Leiria, Portugal
[2] Computer Science and Communication Research Centre, Leiria, Portugal
[3] Instituto de Telecomunicações, Aveiro, Portugal
Keywords
Voice recordings; Automatic speech recognition; Automatic speech transcription; Digital forensics; Android applications; RECOGNITION;
DOI
10.1016/j.fsidi.2021.301223
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Voice is the most natural way for humans to communicate with each other and, more recently, to interact with voice-controlled digital machines. Although text is predominant on digital platforms, voice and video are becoming increasingly important, with communication applications supporting voice messages and videos. This is relevant for digital forensic examinations, as voice content can hold evidence relevant to the investigation. In this paper, we present the open-source SpeechToText software, which uses state-of-the-art Voice Activity Detection (VAD) and Automatic Speech Recognition (ASR) modules to detect voice content and then transcribe it to text. This allows voice content to be integrated into the regular flow of a digital forensic investigation, with the transcriptions indexed by text search engines. Although SpeechToText can be run independently, it also provides a Jython-based module for the well-known Autopsy digital forensics platform. The paper also analyzes the availability, storage location and audio format of voice-recorded content in 14 popular Android applications featuring voice recordings. SpeechToText achieves 100% accuracy for detecting voice in unencrypted audio/video files, a word error rate (WER) of 27.2% when transcribing English voice messages by non-native speakers, and a WER of 7.80% on the test-clean set of LibriSpeech. It achieves a real-time factor of 0.15 for the detection and transcription process on a mid-range laptop, meaning that 1 min of speech is processed in roughly 9 s. (c) 2021 Elsevier Ltd. All rights reserved.
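The abstract reports transcription accuracy as a word error rate (WER) and speed as a real-time factor (RTF). The following is a minimal Python sketch of how these two standard metrics are commonly computed; it is not taken from the SpeechToText code base, and the function names (`word_error_rate`, `real_time_factor`, `processing_time`) are hypothetical. It assumes plain whitespace tokenization with no case or punctuation normalization, whereas published evaluations usually normalize transcripts before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: whitespace tokenization only, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    # Total edits divided by the number of reference words N
    return prev[len(hyp)] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def processing_time(audio_seconds: float, rtf: float) -> float:
    """Expected processing time for a recording of the given length at a given RTF."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # The abstract's figure: RTF 0.15 applied to 60 s of speech
    print(processing_time(60.0, 0.15))
```

In the usage example, two edits over six reference words give a WER of 1/3, and an RTF of 0.15 applied to one minute of audio gives about 9 s of processing, matching the arithmetic stated in the abstract. Note that WER can exceed 100% when the hypothesis contains many insertions, since the denominator counts only reference words.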
Pages: 10