Automatic spatiotemporal and semantic information extraction from unstructured geoscience reports using text mining techniques

Cited by: 4
Authors
Qinjun Qiu
Zhong Xie
Liang Wu
Liufeng Tao
Affiliations
[1] China University of Geosciences, School of Geography and Information Engineering
[2] National Engineering Research Center of Geographic Information System
Source
Earth Science Informatics | 2020 / Vol. 13
Keywords
Geoscience document; Knowledge graph; Geological text mining; Natural language processing
DOI
Not available
Abstract
A large amount of georeferenced quantitative data about rock and geoscience surveys is buried in geological documents and remains unused. Data analytics and information extraction offer opportunities to use these data to improve understanding of ore-forming processes and to enhance our knowledge. Extracting spatiotemporal and semantic information from a set of geological documents enables a rich representation of the geoscience knowledge recorded in unstructured text written in Chinese. This paper presents a workflow for spatiotemporal and semantic information extraction, a geological document analysis approach that uses automated techniques for browsing and searching relevant geological content. The workflow applies spatial and temporal gazetteer matching, pattern-based rules, and spatiotemporal relationship extraction to identify and label terms in geological text documents. It represents contextual information as a knowledge graph, extracts a set of relevant tables and figures, and retrieves a list of relevant documents by using geological topic information. Text mining techniques are used to facilitate the analysis of geological knowledge and to show the effectiveness of text analysis for the rapid assessment of a massive number of documents. Furthermore, autogenerated keyword suggestions derived from extracted keyword associations are used to reduce document search effort. This research illustrates the usefulness and effectiveness of the developed information extraction workflow and demonstrates the potential of incorporating text mining and NLP techniques into geoscience.
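As a rough illustration of the gazetteer matching and pattern-based rules mentioned in the abstract, the sketch below labels place names and geological periods from small lookup lists, picks up absolute ages with one regular expression, and links co-occurring terms as knowledge-graph-style triples. It is not the authors' implementation: the gazetteer entries, the `AGE_PATTERN` rule, the `co_occurs_with` relation, and all function names are hypothetical, and English text is used for readability although the paper targets Chinese documents.

```python
import re

# Hypothetical toy gazetteers; a real workflow would load curated term lists.
PLACE_GAZETTEER = {"Hubei", "Daye", "Yangtze River"}
PERIOD_GAZETTEER = {"Jurassic", "Triassic", "Cretaceous"}

# Hypothetical pattern-based rule: an absolute age such as "150 Ma" or "2.5 Ma".
AGE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*Ma\b")


def match_gazetteer(text, gazetteer, label):
    """Return sorted (start, end, term, label) tuples for gazetteer terms in text."""
    hits = []
    for term in gazetteer:
        for m in re.finditer(re.escape(term), text):
            hits.append((m.start(), m.end(), term, label))
    return sorted(hits)


def find_ages(text):
    """Return every absolute age (in Ma) matched by the pattern rule."""
    return [float(m.group(1)) for m in AGE_PATTERN.finditer(text)]


def extract_triples(sentence):
    """Link each place to each period mentioned in the same sentence."""
    places = match_gazetteer(sentence, PLACE_GAZETTEER, "PLACE")
    periods = match_gazetteer(sentence, PERIOD_GAZETTEER, "PERIOD")
    return [(p[2], "co_occurs_with", t[2]) for p in places for t in periods]


sentence = "Jurassic granite near Daye, Hubei, was dated at 150 Ma."
print(extract_triples(sentence))
print(find_ages(sentence))
```

Sentence-level co-occurrence is the simplest possible linking rule and is used here only to keep the sketch short; the relationship extraction the abstract describes would presumably be more selective.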
Pages: 1393-1410
Number of pages: 17
Related papers
50 in total
  • [1] Automatic spatiotemporal and semantic information extraction from unstructured geoscience reports using text mining techniques
    Qiu, Qinjun
    Xie, Zhong
    Wu, Liang
    Tao, Liufeng
    EARTH SCIENCE INFORMATICS, 2020, 13 (04) : 1393 - 1410
  • [2] Automatic information extraction from unstructured mammography reports using distributed semantics
    Gupta, Anupama
    Banerjee, Imon
    Rubin, Daniel L.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2018, 78 : 78 - 86
  • [3] EXTRACTION OF MANUFACTURING RULES FROM UNSTRUCTURED TEXT USING A SEMANTIC FRAMEWORK
    Kang, SungKu
    Patil, Lalit
    Rangarajan, Arvind
    Moitra, Abha
    Jia, Tao
    Robinson, Dean
    Dutta, Debasish
    INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2015, VOL 1B, 2016
  • [4] Semantic Representation Extraction from Unstructured Arabic Text
    Zakria, Gehad
    Farouk, Mamdouh
    Fathy, Khaled
    Makar, Malak N.
    PROCEEDINGS OF 2019 8TH INTERNATIONAL CONFERENCE ON SOFTWARE AND INFORMATION ENGINEERING (ICSIE 2019), 2019, : 222 - 226
  • [5] Towards Automatic Semantic Models by Extraction of Relevant Information from Online Text
    Krupp, Lars
    Gruenerbl, Agnes
    Bahle, Gernot
    Lukowicz, Paul
    2019 IEEE INTERNATIONAL CONFERENCE ON SMART COMPUTING (SMARTCOMP 2019), 2019, : 481 - 483
  • [6] Automatic Construction of Amharic Semantic Networks From Unstructured Text Using Amharic WordNet
    Tefera, Alelgn
    Assabie, Yaregal
    PROCEEDINGS OF THE SEVENTH GLOBAL WORDNET CONFERENCE, GWC 2014, 2014, : 172 - 177
  • [7] Spatiotemporal and semantic information extraction from Web news reports about natural hazards
    Wang, Wei
    Stewart, Kathleen
    COMPUTERS ENVIRONMENT AND URBAN SYSTEMS, 2015, 50 : 30 - 40
  • [8] Automatic problem extraction and analysis from unstructured text in IT tickets
    Agarwal, S.
    Aggarwal, V.
    Akula, A. R.
    Dasgupta, G. B.
    Sridhara, G.
    IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2017, 61 (01) : 41 - 52
  • [10] An Approach for Analyzing Unstructured Text Data Using Topic Modeling Techniques for Efficient Information Extraction
    Zadgaonkar, Ashwini
    Agrawal, Avinash J.
    NEW GENERATION COMPUTING, 2024, 42 (01) : 109 - 134