Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language

Cited by: 10
Authors
Verdonik, Darinka
Rojc, Matej
Stabej, Marko
Affiliations
[1] Univ Maribor, Fac Elect Engn & Comp Sci, Maribor 2000, Slovenia
[2] Univ Ljubljana, Fac Arts, Ljubljana, Slovenia
Keywords
discourse markers; speech corpora; annotating; conversation; discourse analysis; speech-to-speech translation; spontaneous speech; Slovenian language
DOI
10.1007/s10579-007-9035-7
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Speech-to-speech translation technology has difficulties processing elements of spontaneity in conversation. We propose a discourse marker attribute in speech corpora to help overcome some of these problems. There have already been some attempts to annotate discourse markers in speech corpora. However, as there is no consistency on what expressions count as discourse markers, we have to reconsider how to set a framework for annotating, and, in order to better understand what we gain by introducing a discourse marker category, we have to analyse their characteristics and functions in discourse. This is especially important for languages such as Slovenian where no or little research on the topic of discourse markers has been carried out. The aims of this paper are to present a scheme for annotating discourse markers based on the analysis of a corpus of telephone conversations in the tourism domain in the Slovenian language, and to give some additional arguments based on the characteristics and functions of discourse markers that confirm their special status in conversation.
Pages: 147-180
Page count: 34
Related papers
50 in total
  • [1] Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language
    Darinka Verdonik
    Matej Rojc
    Marko Stabej
    [J]. Language Resources and Evaluation, 2007, 41 : 147 - 180
  • [2] CONNECTIVES AND OTHER DISCOURSE MARKERS IN WRITTEN LANGUAGE AND SPONTANEOUS SPEECH
    Hrzica, Gordana
    Kosutar, Sara
    Posavec, Kristina
    [J]. FLUMINENSIA, 2021, 33 (01) : 25 - 52
  • [3] Toolbox for annotating spontaneous speech corpora (Computational Linguistics Lab - UAM)
    Moreno Sandoval, Antonio
    Guirao Miras, Jose Ma.
    Torre Toledano, Doroteo
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2008, (41) : 301 - 302
  • [4] Annotating structural constraints in discourse corpora
    Sassen, C
    Kühnlein, P
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2005, 3658 : 435 - 442
  • [5] Annotating the meaning of discourse connectives in multilingual corpora
    Zufferey, Sandrine
    Degand, Liesbeth
    [J]. CORPUS LINGUISTICS AND LINGUISTIC THEORY, 2017, 13 (02) : 399 - 422
  • [6] War in Slovenian and Polish language corpora
    Bednarska, Katarzyna
    [J]. CONTRIBUTIONS TO THE 21ST ANNUAL SCIENTIFIC CONFERENCE OF THE ASSOCIATION OF SLAVISTS (POLYSLAV), 2018, 64 : 12 - 19
  • [7] Spontaneous Speech Corpora for language learners of Spanish, Chinese and Japanese
    Moreno-Sandoval, Antonio
    Campillos, Leonardo
    Dong, Yang
    Takamori, Emi
    Guirao, Jose M.
    Gozalo, Paula
    Kimura, Chieko
    Matsui, Kengo
    Garrote, Marta
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012 : 2695 - 2701
  • [8] A Large Language Model Approach to Detect Hate Speech in Political Discourse Using Multiple Language Corpora
    de Oliveira, Aillkeen Bezerra
    Baptista, Claudio de Souza
    Firmino, Anderson Almeida
    de Paiva, Anselmo Cardoso
    [J]. 39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024 : 1461 - 1468
  • [9] Using ELAN for annotating sign language corpora in a team setting
    Crasborn, Onno
    Sloetjes, Han
    [J]. LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010 : A61 - A64
  • [10] Discourse markers in spontaneous speech: Oh what a difference an oh makes
    Tree, JEF
    Schrock, JC
    [J]. JOURNAL OF MEMORY AND LANGUAGE, 1999, 40 (02) : 280 - 295