The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions

Cited by: 27
Authors
Futrell, Richard [1]
Gibson, Edward [2]
Tily, Harry J. [3]
Blank, Idan [4]
Vishnevetsky, Anastasia [2]
Piantadosi, Steven T. [5]
Fedorenko, Evelina [2]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92697 USA
[2] MIT, Cambridge, MA 02139 USA
[3] Viome Inc, Seattle, WA USA
[4] Univ Calif Los Angeles, Los Angeles, CA USA
[5] Univ Calif Berkeley, Berkeley, CA 94720 USA
Funding
US National Science Foundation
Keywords
Cognitive modeling; Reading time; Psycholinguistics; Predictability; Frequency; Words; Tracking
DOI
10.1007/s10579-020-09503-7
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
It is now a common practice to compare models of human language processing by comparing how well they predict behavioral and neural measures of processing difficulty, such as reading times, on corpora of rich naturalistic linguistic materials. However, many of these corpora, which are based on naturally-occurring text, do not contain many of the low-frequency syntactic constructions that are often required to distinguish between processing theories. Here we describe a new corpus consisting of English texts edited to contain many low-frequency syntactic constructions while still sounding fluent to native speakers. The corpus is annotated with hand-corrected Penn Treebank-style parse trees and includes self-paced reading time data and aligned audio recordings. We give an overview of the content of the corpus, review recent work using the corpus, and release the data.
Pages: 63-77
Page count: 15
Source
Language Resources and Evaluation, 2021, 55: 63-77