Language Homogeneity in the Japanese Wikipedia

Author
Skevik, Karl-Andre [1]
Affiliation
[1] Inferno Nettverk AS, Forskningspk, Gaustadalleen 21, NO-0349 Oslo, Norway
Source
PROCEEDINGS OF THE 24TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION | 2010
Keywords
Wikipedia; Japanese; NLP
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Wikipedia is a potentially very useful source of information, but intuitively it is difficult to have confidence in the quality of an encyclopedia that anyone can modify. One aspect of correctness is writing style, which we examine in a computer-based study of the full Japanese Wikipedia. This is possible because Japanese is a language with clearly distinct writing styles that use, for example, different verb forms. We find that the writing style of the Japanese Wikipedia is largely consistent with the style guidelines for the project. Exceptions appear to occur primarily in articles with a small number of changes and editors.
Pages: 527-534
Page count: 8