A machine-learning approach to automatic detection of delimiters in tabular data files

Authors
Saurav, Shitesh [1]
Schwarz, Peter [2]
Affiliations
[1] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90007 USA
[2] IBM Res Almaden, San Jose, CA USA
Keywords
data ingestion; delimiters; logistic regression
DOI
10.1109/HPCC-SmartCity-DSS.2016.41
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Detection of string and column delimiters is a critical first step in the automated ingestion of files containing tabular data. In this paper we present an algorithm that uses a logistic-regression classifier to evaluate whether a particular choice of delimiters is correct. The delimiter choice that is given the highest score by the classifier is chosen as the one most likely to be correct. The algorithm makes the correct choice over 90% of the time on a test data set of files with a variety of different delimiters.
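
The scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate lists, the three features, and the weights below are hypothetical values chosen for illustration, whereas the paper's classifier would learn its weights from labeled files. The sketch parses the text under each candidate pair of column delimiter and string delimiter, computes a logistic score for the parse, and returns the highest-scoring pair.

```python
import csv
import io
import math
from collections import Counter
from itertools import product

# Hypothetical candidate sets; the paper's actual candidates are not given here.
COLUMN_DELIMS = [",", ";", "\t", "|"]
QUOTE_CHARS = ['"', "'"]

# Hypothetical hand-picked weights standing in for a trained model.
WEIGHTS = [4.0, 2.0, 0.5]  # consistency, multi_column, quote_seen
BIAS = -3.0


def features(text, delim, quote):
    """Describe how table-like the text looks under one delimiter choice."""
    reader = csv.reader(io.StringIO(text), delimiter=delim, quotechar=quote)
    counts = [len(row) for row in reader if row]
    if not counts:
        return [0.0, 0.0, 0.0]
    # Most common number of fields per row.
    mode = Counter(counts).most_common(1)[0][0]
    consistency = counts.count(mode) / len(counts)  # share of rows matching it
    multi_column = 1.0 if mode > 1 else 0.0         # more than one column found
    quote_seen = 1.0 if quote in text else 0.0      # quote char occurs at all
    return [consistency, multi_column, quote_seen]


def score(text, delim, quote):
    """Logistic score in (0, 1) for one candidate delimiter pair."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(text, delim, quote)))
    return 1.0 / (1.0 + math.exp(-z))


def detect(text):
    """Return the (column delimiter, string delimiter) pair with the top score."""
    candidates = product(COLUMN_DELIMS, QUOTE_CHARS)
    return max(candidates, key=lambda c: score(text, c[0], c[1]))


if __name__ == "__main__":
    sample = 'id,note\n1,"a;b;c"\n2,"d;e"\n'
    print(detect(sample))
```

In the sample, the semicolons sit inside quoted strings, so splitting on `;` yields rows of unequal length and a low score, while splitting on `,` yields two fields per row. A trained classifier would replace the fixed weights, and a realistic feature set would likely be richer than the three used here.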
Pages: 1501-1503
Page count: 3