Do Long-Range Language Models Actually Use Long-Range Context?

Cited by: 0

Authors:

Sun, Simeng [1]

Krishna, Kalpesh [1]

Mattarella-Micke, Andrew [2]

Iyyer, Mohit [1]

Affiliations:

[1] Univ Massachusetts Amherst, Amherst, MA 01003 USA

[2] Intuit AI, Mountain View, CA USA

Source:

2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021) | 2021

Funding:

National Science Foundation (USA);

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Language models are generally trained on short, truncated input sequences, which limits their ability to use discourse-level information present in long-range context to improve their predictions. Recent efforts to improve the efficiency of self-attention have led to a proliferation of long-range Transformer language models, which can process much longer sequences than models of the past. However, the ways in which such models take advantage of the long-range context remain unclear. In this paper, we perform a fine-grained analysis of two long-range Transformer language models (including the Routing Transformer, which achieves state-of-the-art perplexity on the PG-19 long-sequence LM benchmark dataset) that accept input sequences of up to 8K tokens. Our results reveal that providing long-range context (i.e., beyond the previous 2K tokens) to these models only improves their predictions on a small set of tokens (e.g., those that can be copied from the distant context) and does not help at all for sentence-level prediction tasks. Finally, we discover that PG-19 contains a variety of different document types and domains, and that long-range context helps most for literary novels (as opposed to textbooks or magazines).

Pages: 807-822

Page count: 16
