Stable Classification of Text Genres

Cited by: 27
Authors
Petrenz, Philipp [1]
Webber, Bonnie [1]
Affiliations
[1] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland
DOI
10.1162/COLI_a_00052
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Every text has at least one topic and at least one genre. Evidence for a text's topic and genre comes, in part, from its lexical and syntactic features, which are used in both Automatic Topic Classification and Automatic Genre Classification (AGC). Because an ideal AGC system should be stable in the face of changes in topic distribution, we assess five previously published AGC methods with respect to both performance on the same topic-genre distribution on which they were trained and stability of that performance across changes in topic-genre distribution. Our experiments lead us to conclude that (1) stability in the face of changing topical distributions should be added to the evaluation criteria for new approaches to AGC, and (2) part-of-speech features should be considered individually when developing a high-performing, stable AGC system for a particular, possibly changing corpus.
Pages: 385-393
Number of pages: 9