Information Extraction Challenges in Managing Unstructured Data

Cited by: 0
Authors
Doan, AnHai [1]
Naughton, Jeffrey F. [1]
Ramakrishnan, Raghu [1]
Baid, Akanksha [1]
Chai, Xiaoyong [1]
Chen, Fei [1]
Chen, Ting [1]
Chu, Eric [1]
DeRose, Pedro [1]
Gao, Byron [1]
Gokhale, Chaitanya [1]
Huang, Jiansheng [1]
Shen, Warren [1]
Vuong, Ba-Quy [1]
Affiliation
[1] Univ Wisconsin, Madison, WI 53706 USA
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Over the past few years, we have been trying to build an end-to-end system at Wisconsin to manage unstructured data, using extraction, integration, and user interaction. This paper describes the key information extraction (IE) challenges that we have run into, and sketches our solutions. We discuss in particular developing a declarative IE language, optimizing for this language, generating IE provenance, incorporating user feedback into the IE process, developing a novel wiki-based user interface for feedback, best-effort IE, pushing IE into RDBMSs, and more. Our work suggests that IE in managing unstructured data can open up many interesting research challenges, and that these challenges can greatly benefit from the wealth of work on managing structured data that has been carried out by the database community.
Pages: 14-20
Page count: 7