A framework for efficient development of Slovenian written language resources used in speech processing applications

Authors
Rojc, Matej [1]
Verdonik, Darinka [1]
Kacic, Zdravko [1]
Affiliations
[1] Univ Maribor, Fac Elect Engn & Comp Sci, Smetanova Ulica 17, Maribor 2000, Slovenia
Keywords
Written language resources; Morphology lexicon; Phonetic lexicon; Heterogeneous relation graphs (HRG); Finite-state machines (FSM); Slovenian language
DOI
10.1007/s10772-009-9032-x
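Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809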
Abstract
This paper presents a framework for the efficient development and representation of morphological and phonetic lexicons for use in speech technology applications. Developing such lexicons requires analyzing which solutions are most appropriate for building speech technologies for a specific language. The paper addresses the development of resources, good word coverage in general texts, efficient coding and representation of lexicons (with respect to time and memory space), and the integration of lexicons into speech processing applications. The construction process within the proposed framework is based on finite-state machines and heterogeneous relation graph structures. It significantly reduces the time and effort needed to construct large-scale lexicons, minimizes analysis errors, and represents the lexicons efficiently with respect to time and memory usage. The wordlist construction process presented in the paper also guarantees that the constructed lexicons achieve high word coverage in general texts. SIlex lexicons are large-scale phonetic and morphology lexicons for the Slovenian language, constructed within the new framework using a purpose-built toolset, and are valuable language resources for developing various speech processing applications for the Slovenian language.
Pages: 121-141
Page count: 21