From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology

Cited by: 0
Authors
Dingemanse, Mark [1]
Liesenfeld, Andreas [1]
Affiliations
[1] Radboud Univ Nijmegen, Ctr Language Studies, Nijmegen, Netherlands
Funding
Dutch Research Council
Keywords
NATURAL-LANGUAGE; TURN-TAKING; REACTIVE TOKENS; ORGANIZATION; UNIVERSALS; ENGLISH; CORPUS; SPEAKERS;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Informal social interaction is the primordial home of human language. Linguistically diverse conversational corpora are an important and largely untapped resource for computational linguistics and language technology. Through the efforts of a worldwide language documentation movement, such corpora are increasingly becoming available. We show how interactional data from 63 languages (26 families) harbours insights about turn-taking, timing, sequential structure and social action, with implications for language technology, natural language understanding, and the design of conversational interfaces. Harnessing linguistically diverse conversational corpora will provide the empirical foundations for flexible, localizable, humane language technologies of the future.
Pages: 5614-5633
Page count: 20