Representation of structured data of the text genre as a technique for automatic text processing

Authors
Fonseca, Claudia Aparecida [1]
Carvalho Guelpeli, Marcus Vinicius [2]
de Souza Netto, Rafael Santiago [3]
Affiliations
[1] Univ Fed Vales Jequitinhonha & Mucuri, Dept Letras, Diamantina, MG, Brazil
[2] Univ Fed Vales Jequitinhonha & Mucuri, Dept Sistema Informacao, Diamantina, MG, Brazil
[3] Ctr Univ Barra Mansa, Dept Ciencia Comp, Barra Mansa, Rio De Janeiro, Brazil
Keywords
Corpus linguistics; Natural language processing; Scientific article; Text genre; Corpora annotation
DOI
10.35699/1983-3652.2021.35445
Chinese Library Classification
H [Language, Writing]
Discipline classification code
05
Abstract
This article was developed in the fields of Natural Language Processing (NLP) and Language Studies and is based on a corpus compiled with computational tools. The study assumes that it is useful to trace a close relationship between corpus generation and annotation and the assessment of the constitutive elements of the source text genre. It aims to demonstrate, through specific studies of structured data from the text genre 'scientific article', alternatives to automatic text processing techniques. To reach this goal, the authors created a computational model for compiling CorpACE, a specialized linguistic corpus representative of the genre Scientific Article. The object of study comprises the constitutive elements of scientific articles, marked up in XML, extracted and collected from the SciELO (Scientific Electronic Library Online) database. The final product is a database of information extracted and structured in XML format, which designates and identifies the markups of the genre being analyzed and is available for many tools and applications. The results demonstrate how the representation of the constitutive elements of the genre can condense the available information through hierarchical and dynamic processes built during the compilation. The authors conclude that more research is required to bring Language Science and Computer Science closer together, with emphasis on NLP, in the attempt to represent and manipulate linguistic knowledge at its many levels (morphological, syntactic, semantic and discursive) in order to improve the implementation and manipulation of automatic text processing.
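The abstract describes constitutive elements of scientific articles marked up in XML and made available to other tools. A minimal sketch of how such a record might be read is given here, assuming a hypothetical layout: the tag names (`article`, `title`, `abstract`, `keyword`, `section`) and the helper `extract_elements` are illustrative only and are not taken from the actual CorpACE schema, which the abstract does not specify.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tag names are illustrative assumptions,
# not the actual CorpACE schema.
SAMPLE = """
<article>
  <title>Sample title</title>
  <abstract>Short abstract text.</abstract>
  <keywords>
    <keyword>corpus linguistics</keyword>
    <keyword>text genre</keyword>
  </keywords>
  <section name="Introduction">Intro text.</section>
  <section name="Conclusion">Closing text.</section>
</article>
"""


def extract_elements(xml_text):
    """Return the genre elements of one article as a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("title", default=""),
        "abstract": root.findtext("abstract", default=""),
        # Collect every <keyword> element, wherever it is nested.
        "keywords": [k.text for k in root.iter("keyword")],
        # Map each top-level section name to its text, in document order.
        "sections": {
            s.get("name"): (s.text or "").strip()
            for s in root.findall("section")
        },
    }


elements = extract_elements(SAMPLE)
```

Keeping each element under its own tag is what lets downstream tools select, for instance, only abstracts or only conclusions without re-parsing the running text.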
Pages: 26