A Framework for Linking RDF Datasets for Thailand Open Government Data Based on Semantic Type Detection

Cited by: 3
Authors
Krataithong, Pattama [1, 2]
Buranarach, Marut [1]
Hongwarittorrn, Nattanont [2]
Supnithi, Thepchai [1]
Affiliations
[1] Natl Elect & Comp Technol Ctr NECTEC, Language & Semant Technol Lab, Pathum Thani, Thailand
[2] Thammasat Univ, Fac Sci & Technol, Dept Comp Sci, Pathum Thani, Thailand
Keywords
Finding semantic types; Named Entity Recognition (NER); Automatic ontology creation; Automatic linked dataset creation
DOI
10.1007/978-3-319-49304-6_31
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Most datasets in open government data portals are published in tabular spreadsheet formats, e.g. CSV and XLS. To increase their value and reusability, these datasets should be made available in RDF, which better supports data querying and data integration. Our previous work proposed a semi-automatic framework for generating RDF datasets from existing tabular datasets. In this paper, we extend the framework to support automatic linking of the RDF datasets. One important step is mapping literal values that appear in a dataset to standard URIs. Several previous studies use a semantic search API such as DBpedia or Sindice for URI mapping. However, this approach is not appropriate for the datasets of the Thailand open data portal (Data.go.th) because there is insufficient data on Thai named entities. In addition, a name may match more than one URI, i.e. word ambiguity. For example, the name "Bangkok" may match URIs that refer to a province, a hospital or a university. To resolve these issues, our framework treats finding semantic types as an essential step for resolving word ambiguity when retrieving the proper URI for a named entity. This paper presents a framework for finding semantic types and mapping named entities to URIs, i.e. URI lookup. A Named Entity Recognition (NER) technique is applied to find the semantic type of a column in a CSV dataset. The results are used to create an ontology and RDF data that include the URI mappings for the named entities. We evaluate the two approaches by comparing the performance of a semantic search API, i.e. Wikipedia, with that of the NER technique on datasets from the Data.go.th website.
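The disambiguation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gazetteer, the `example.org` URIs, and the function names `detect_column_type` and `lookup_uri` are hypothetical, and a simple majority vote over gazetteer types stands in for the NER step. The sketch only shows how a column-level semantic type can select one URI among several candidates for an ambiguous name such as "Bangkok".

```python
from collections import Counter

# Hypothetical gazetteer (an assumption for illustration): a single name
# can refer to entities of several semantic types, each with its own URI.
GAZETTEER = {
    "Bangkok": {
        "Province": "http://example.org/resource/province/Bangkok",
        "Hospital": "http://example.org/resource/hospital/Bangkok",
        "University": "http://example.org/resource/university/Bangkok",
    },
    "Chiang Mai": {
        "Province": "http://example.org/resource/province/Chiang_Mai",
        "University": "http://example.org/resource/university/Chiang_Mai",
    },
    "Phuket": {
        "Province": "http://example.org/resource/province/Phuket",
    },
}


def detect_column_type(values):
    """Return the semantic type matched by the most cell values, or None.

    This vote is a stand-in for the NER step; it is not a real NER model.
    """
    votes = Counter()
    for value in values:
        for semantic_type in GAZETTEER.get(value, {}):
            votes[semantic_type] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def lookup_uri(name, semantic_type):
    """Return the URI for a name under the given semantic type, or None."""
    return GAZETTEER.get(name, {}).get(semantic_type)


# One CSV column whose cells are ambiguous when looked at individually.
column = ["Bangkok", "Chiang Mai", "Phuket"]
column_type = detect_column_type(column)
uris = {name: lookup_uri(name, column_type) for name in column}
```

Here every cell votes for each type it could belong to, so the type shared by the whole column wins, and "Bangkok" resolves to the province URI rather than the hospital or university one.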
Pages: 257-268
Number of pages: 12