Do We Still Need Gold Standards for Evaluation?

Cited by: 0

Authors:

Poibeau, Thierry [1]

Messiant, Cedric

Affiliations:

[1] CNRS, UMR 7030, Lab Informat Paris Nord, 99 Ave Jean Baptiste Clement, F-93430 Villetaneuse, France

Source:

SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008 | 2008

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

H0 [Linguistics];

Discipline classification code:

030303 ; 0501 ; 050102 ;

Abstract:

The availability of a huge mass of textual data in electronic format has increased the need for fast and accurate techniques for textual data processing. Machine learning and statistical approaches have been increasingly used in NLP since the 1990s, mainly because they are quick, versatile and efficient. However, despite this evolution of the field, evaluation still relies (most of the time) on a comparison between the output of a probabilistic or statistical system on the one hand, and a non-statistical, usually hand-crafted, gold standard on the other. To compare these two sets of data, which are inherently different in nature, it is first necessary to modify the statistical data so that they fit the hand-crafted reference. For example, a statistical parser, instead of producing a score of grammaticality, will have to produce a binary value for each sentence (grammatical vs. ungrammatical) or a tree similar to the one stored in the treebank used as a reference. In this paper, we take the acquisition of subcategorization frames (SCFs) from corpora as a practical example. Our study is motivated by the fact that, even if a gold standard is an invaluable resource for evaluation, a gold standard is always partial and does not really show how accurate and useful the results are. We describe the task (SCF acquisition) and show how it is a typical NLP task. We then very briefly describe our SCF acquisition system before discussing different issues related to evaluation using a gold standard. Lastly, we adopt the classical distinction between intrinsic and extrinsic evaluation and show why this framework is relevant for SCF acquisition. We show that, even if intrinsic evaluation correlates with extrinsic evaluation, these two evaluation frameworks give complementary insights into the results. In the conclusion, we briefly discuss the case of other NLP tasks.


Pages: 547-552

Page count: 6
