A machine-learning approach to automatic detection of delimiters in tabular data files

Authors
Saurav, Shitesh [1]
Schwarz, Peter [2]
Affiliations
[1] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90007 USA
[2] IBM Res Almaden, San Jose, CA USA
Keywords
data ingestion; delimiters; logistic regression
DOI
10.1109/HPCC-SmartCity-DSS.2016.41
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Detection of string and column delimiters is a critical first step in the automated ingestion of files containing tabular data. In this paper we present an algorithm that uses a logistic-regression classifier to evaluate whether a particular choice of delimiters is correct. The delimiter choice that is given the highest score by the classifier is chosen as the one most likely to be correct. The algorithm makes the correct choice over 90% of the time on a test data set of files with a variety of different delimiters.
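The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate lists, the two features (row-width consistency and a multi-column indicator), and the hand-set weights in `WEIGHTS` are all assumptions made for illustration, whereas the paper's classifier would be trained on labeled files. The sketch parses the text under each (column delimiter, string delimiter) pair, passes the features through a logistic function, and keeps the highest-scoring pair.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical candidate sets; the paper's actual candidates are not given in the abstract.
COLUMN_DELIMS = [",", ";", "\t", "|"]
STRING_DELIMS = ['"', "'"]

# Hypothetical hand-set weights standing in for a trained logistic-regression model.
WEIGHTS = {"bias": -4.0, "consistency": 4.0, "multi_column": 3.0}


def features(text, column_delim, string_delim):
    """Parse the text under one delimiter choice and summarize the result."""
    reader = csv.reader(io.StringIO(text), delimiter=column_delim,
                        quotechar=string_delim)
    rows = [row for row in reader if row]
    if not rows:
        return {"consistency": 0.0, "multi_column": 0.0}
    widths = Counter(len(row) for row in rows)
    modal_width, modal_count = widths.most_common(1)[0]
    return {
        # Fraction of rows that share the most common column count.
        "consistency": modal_count / len(rows),
        # 1.0 if the most common column count is greater than one.
        "multi_column": 1.0 if modal_width > 1 else 0.0,
    }


def score(text, column_delim, string_delim):
    """Logistic score in (0, 1) for one delimiter choice."""
    feats = features(text, column_delim, string_delim)
    z = WEIGHTS["bias"] + sum(WEIGHTS[name] * value
                              for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def detect_delimiters(text):
    """Return the (column_delim, string_delim) pair with the highest score."""
    candidates = [(c, s) for c in COLUMN_DELIMS for s in STRING_DELIMS]
    return max(candidates, key=lambda pair: score(text, *pair))
```

For example, on the text `'name;city\n"Doe; Jane";Paris\nRoe;Rome\n'` the sketch selects `;` as the column delimiter and `"` as the string delimiter, because only that pair yields three rows of exactly two fields.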
Pages: 1501-1503
Page count: 3