Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium

Authors
Maeda, Kazuaki [1]
Lee, Haejoong [1]
Medero, Shawn [1]
Medero, Julie [1]
Parker, Robert [1]
Strassel, Stephanie [1]
Affiliation
[1] Univ Penn, Linguist Data Consortium, Philadelphia, PA 19104 USA
Chinese Library Classification number
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
The Linguistic Data Consortium (LDC) creates a variety of linguistic resources - data, annotations, tools, standards and best practices - for many sponsored projects. The programming staff at LDC has created the tools and technical infrastructure that support every aspect of these data creation projects: data scouting, data collection, data selection, annotation, search, data tracking and workflow management. This paper introduces a sample of the LDC programming staff's work, with particular focus on recent additions and updates to the suite of software tools developed by LDC. Tools introduced include the GScout Web Data Scouting Tool, LDC Data Selection Toolkit, ACK - Annotation Collection Kit, XTrans Transcription and Speech Annotation Tool, GALE Distillation Toolkit, and the GALE MT Post Editing Work flow Management System.
Pages: 3052-3056
Number of pages: 5