Structured Data Extraction from Emails

Authors
Mahlawi, Ashraf Q. [1]
Sasi, Sreela [1]
Affiliations
[1] Gannon Univ, Dept Comp & Informat Sci, Erie, PA 16541 USA
Keywords
NLP; Text mining; Email summarization; Structured data extraction; Knowledge extraction
DOI
Not available
Chinese Library Classification number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Structured data is typically predefined data. Semi-structured and unstructured data are not predefined and include documents, emails, social media posts, images, and videos. This research presents a novel process for extracting structured data from emails about a domain, such as a project or product. The process consists of three phases: data cleaning, data extraction, and data consolidation. Data cleaning validates the format of each email. Data extraction consists of keyword extraction, sentiment analysis, regular-expression matching, entity extraction, and summary extraction. Data consolidation combines the extracted data to obtain structured data from the emails. This makes the knowledge extraction process easier to manage and analyze. In large industries, it is better to use this process to consolidate all emails regarding a project or product into one document for later use. This solution will facilitate better decision-making.
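The abstract names the three phases but not the concrete techniques behind them, so the following is only a minimal Python sketch of how such a pipeline could be wired together, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (clean, extract, consolidate), the tiny stop-word and sentiment word lists, the ISO date pattern, the capitalized-phrase heuristic standing in for entity extraction, and the first-sentence rule standing in for summary extraction.

```python
import re
from collections import Counter
from email import message_from_string

# Hypothetical word lists, far too small for real use (illustration only).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on",
             "for", "we", "it", "this", "by", "be", "will", "with"}
POSITIVE = {"good", "great", "happy", "resolved", "thanks"}
NEGATIVE = {"bad", "delay", "delayed", "problem", "issue", "failed"}


def clean(raw):
    """Phase 1 (data cleaning): accept only single-part emails that have
    From and Subject headers and a non-empty body; otherwise return None."""
    msg = message_from_string(raw)
    if not (msg["From"] and msg["Subject"]) or msg.is_multipart():
        return None
    return msg if msg.get_payload().strip() else None


def extract(msg):
    """Phase 2 (data extraction): simple stand-ins for the five extractors."""
    body = msg.get_payload().strip()
    words = re.findall(r"[a-z']+", body.lower())
    # Keyword extraction: most frequent non-stop-words (assumed heuristic).
    content = [w for w in words if w not in STOPWORDS and len(w) > 2]
    keywords = [w for w, _ in Counter(content).most_common(3)]
    # Sentiment analysis: count of positive minus negative lexicon hits.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    # Regular expression: ISO dates such as 2024-03-01 (assumed target field).
    dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", body)
    # Entity extraction: runs of two or more capitalized words (crude proxy).
    entities = sorted(set(re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", body)))
    # Summary extraction: first sentence of the body (crude proxy).
    summary = re.split(r"(?<=[.!?])\s+", body)[0]
    return {"keywords": keywords, "sentiment": sentiment, "dates": dates,
            "entities": entities, "summary": summary}


def consolidate(raw_emails):
    """Phase 3 (data consolidation): one structured record per valid email."""
    records = []
    for raw in raw_emails:
        msg = clean(raw)
        if msg is not None:
            records.append({"from": msg["From"], "subject": msg["Subject"],
                            **extract(msg)})
    return records
```

Calling consolidate on a list of raw email strings returns a list of dictionaries that could be written out as rows of a table or merged into a single document per project. A real system would likely replace each stand-in with a dedicated NLP component.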
Pages: 323-328
Number of pages: 6