Text Incoherence, or Some Pitfalls of Automatic Text Processing

Authors
Inkova, O. Yu. [1, 2]
Affiliations
[1] Univ Geneva, Geneva, Switzerland
[2] Russian Acad Sci, Inst Informat Problems, Moscow, Russia
Keywords
text coherence; semantics; automatic text processing; rhetorical relations; frame expressions
DOI
10.17223/19986645/74/5
Chinese Library Classification (CLC) number
H [Language, Writing]
Discipline classification code
05
Abstract
The article discusses little-studied aspects of the analysis of frame expressions, i.e. units that divide information into semantic blocks interpreted according to criteria (temporal, spatial, or communicative) that the frame expressions themselves set, thereby ensuring text coherence. The analysis is based on examples from the Russian National Corpus. Most studies take an ascending approach to text coherence, i.e. the integration of minimal discourse units into higher-level units, as in Rhetorical Structure Theory. This article instead takes a descending approach and analyzes the segmentation of the text into smaller units. The approach proves productive: it shows that texts contain not only signals of coherence but also signals of discreteness, which warn that there is no direct connection between the preceding and the subsequent context, or no connection at all. Frame expressions function as such signals. Without claiming to be exhaustive, the article raises questions that arise in describing frame expressions, and the author answers some of them. First, the semantic and functional properties of frame expressions are described: 1) weak syntactic dependence on the predication (peripheral syntactic position), 2) topic status, 3) certain semantic features. The author then analyzes how the scopes of frame expressions interact and gives a number of possible configurations: 1) the state of affairs q is integrated into the U-1 frame, which opens with the state of affairs p and remains open; 2) the state of affairs q is integrated into the U-2 frame, which closes the U-1 frame; 3) the state of affairs q is integrated into the U-2 frame, which is subordinate to the U-1 frame. Finally, the author analyzes the interaction of frame expressions with connectives as markers of logical-semantic relations. This interaction manifests itself in the fact that the frame defines the boundaries of the text spans between which the relation is established. The final section shows, using four resources as examples (Penn Discourse Treebank, Supra-corpora database of connectives, RST Discourse Treebank, ANNODIS), how frame expressions are used in the text annotation process. Having established the heterogeneity of the linguistic units that ensure text coherence, the author concludes that each category of these units should be annotated separately, and that only then can the mechanisms of their interaction be shown. The results can be used in the study of discourse structure and in text annotation.
Pages: 81-98
Number of pages: 18