Domain-independent Data-to-Text Generation for Open Data

Authors
Burgdorf, Andreas [1]
Barkmann, Micaela [1]
Pomp, Andre [1]
Meisen, Tobias [1]
Affiliations
[1] Univ Wuppertal, Chair Technol & Management Digital Transformat, Wuppertal, Germany
Keywords
Open Data; Data to Text Generation; Natural Language Generation; Transformer; Semantic Data Management;
DOI
10.5220/0011272900003269
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a result of the efforts of the Open Data movements, the number of Open Data portals and the amount of data published in them are steadily increasing. One aspect that greatly increases the usability of data, but is nevertheless often neglected, is the enrichment of data with textual documentation. However, creating descriptions of sufficient quality is time-consuming and thus cost-intensive. One approach to solving this problem is data-to-text generation, which creates descriptions of raw data. In the past, promising results were achieved on data from Wikipedia. Based on a seq2seq model developed for such purposes, we investigate whether this technique can also be applied in the Open Data domain and which challenges are associated with doing so. In three studies, we reproduce the results obtained in previous work and apply the approach to additional datasets that pose new challenges in terms of data nature and data volume. We conclude that previous methods are not suitable for the Open Data sector without further modification, but the results still exceed our expectations and show the potential of the approach.
Pages: 95-106
Number of pages: 12