Information Extraction Challenges in Managing Unstructured Data

Cited by: 0
Authors
Doan, AnHai [1]
Naughton, Jeffrey F. [1]
Ramakrishnan, Raghu [1]
Baid, Akanksha [1]
Chai, Xiaoyong [1]
Chen, Fei [1]
Chen, Ting [1]
Chu, Eric [1]
DeRose, Pedro [1]
Gao, Byron [1]
Gokhale, Chaitanya [1]
Huang, Jiansheng [1]
Shen, Warren [1]
Vuong, Ba-Quy [1]
Affiliation
[1] Univ Wisconsin, Madison, WI 53706 USA
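Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;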
Abstract
Over the past few years, we have been trying to build an end-to-end system at Wisconsin to manage unstructured data, using extraction, integration, and user interaction. This paper describes the key information extraction (IE) challenges that we have run into, and sketches our solutions. We discuss in particular developing a declarative IE language, optimizing for this language, generating IE provenance, incorporating user feedback into the IE process, developing a novel wiki-based user interface for feedback, best-effort IE, pushing IE into RDBMSs, and more. Our work suggests that IE in managing unstructured data can open up many interesting research challenges, and that these challenges can greatly benefit from the wealth of work on managing structured data that has been carried out by the database community.
Pages: 14 - 20
Number of pages: 7