A Two-Step Resume Information Extraction Algorithm

Cited by: 14
Authors
Chen, Jie [1]
Zhang, Chunxia [2]
Niu, Zhendong [1, 3, 4]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Sch Software, Beijing 100081, Peoples R China
[3] Beijing Inst Technol, Beijing Engn Res Ctr Mass Language Informat Proc, Beijing 100081, Peoples R China
[4] Univ Pittsburgh, Sch Comp & Informat, Pittsburgh, PA 15260 USA
Funding
National Natural Science Foundation of China
DOI
10.1155/2018/5761287
Chinese Library Classification number
T [Industrial Technology]
Subject classification code
08
Abstract
With the rapid growth of Internet-based recruiting, recruiting systems now hold a great number of personal resumes. To attract more attention from recruiters, most resumes are written in diverse formats, varying in font size, font colour, and table layout. This format diversity, however, hampers data-mining tasks such as resume information extraction, automatic job matching, and candidate ranking. Supervised and rule-based methods have been proposed to extract facts from resumes, but they rely heavily on hierarchical structure information and on large amounts of labelled data, both of which are hard to collect in practice. In this paper, we propose a two-step resume information extraction approach. In the first step, the raw text of a resume is segmented into different resume blocks. To do so, we design a novel feature, Writing Style, that models sentence syntax information; besides word indices and punctuation indices, it includes word lexical attributes and the prediction results of classifiers. In the second step, multiple classifiers are employed to identify the different attributes of the factual information in each block. Experimental results on a real-world dataset show that the algorithm is feasible and effective.
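The abstract describes the pipeline only at a high level. As a rough illustration, here is a minimal Python sketch of the two-step idea under loudly stated assumptions: the lexical classes (`NUM`, `PUNCT`, `CAP`, `LOW`), the 1-nearest-neighbour block classifier, the block labels (`contact`, `education`, `skills`), the training lines in `EXAMPLES`, and the regex attribute extractors are all hypothetical stand-ins chosen for brevity, not the authors' features or classifiers. The paper's Writing Style feature also folds in classifier prediction results, which this toy version omits.

```python
import re
from typing import Dict, List, Tuple

# HYPOTHETICAL coarse lexical classes; the abstract does not list the
# paper's actual lexical attributes.
CLASSES = ("NUM", "PUNCT", "CAP", "LOW")


def tokenize(line: str) -> List[str]:
    """Split a line into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", line)


def writing_style(line: str) -> List[str]:
    """Map each token to a lexical class, keeping token order so that word
    and punctuation positions survive in the sequence (toy stand-in for the
    paper's Writing Style feature, without classifier predictions)."""
    style = []
    for tok in tokenize(line):
        if tok.isdigit():
            style.append("NUM")
        elif re.fullmatch(r"[^\w\s]", tok):
            style.append("PUNCT")
        elif tok[0].isupper():
            style.append("CAP")
        else:
            style.append("LOW")
    return style


def style_vector(line: str) -> List[int]:
    """Count how often each lexical class occurs in the line."""
    style = writing_style(line)
    return [style.count(c) for c in CLASSES]


def classify_block(line: str, examples: List[Tuple[str, str]]) -> str:
    """Step 1 stand-in: assign a block label with a 1-nearest-neighbour
    classifier (squared Euclidean distance over style vectors)."""
    v = style_vector(line)

    def dist(example: Tuple[str, str]) -> int:
        u = style_vector(example[0])
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return min(examples, key=dist)[1]


# Step 2 stand-in: HYPOTHETICAL regex "extractors", one set per block type.
EXTRACTORS: Dict[str, Dict[str, re.Pattern]] = {
    "contact": {"email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")},
    "education": {"period": re.compile(r"(?:19|20)\d{2}\s*-\s*(?:19|20)\d{2}")},
}

# HYPOTHETICAL labelled lines used as the nearest-neighbour training set.
EXAMPLES: List[Tuple[str, str]] = [
    ("Email: john.doe@example.com", "contact"),
    ("2012 - 2016 Beijing Institute of Technology, B.Sc.", "education"),
    ("proficient in java and python programming", "skills"),
]


def extract(line: str, examples: List[Tuple[str, str]]) -> Dict[str, str]:
    """Run both steps: classify the line's block, then pull its attributes."""
    block = classify_block(line, examples)
    facts = {"block": block}
    for name, pattern in EXTRACTORS.get(block, {}).items():
        match = pattern.search(line)
        if match:
            facts[name] = match.group(0)
    return facts


if __name__ == "__main__":
    print(extract("Email: jane_roe@mail.com", EXAMPLES))
    # {'block': 'contact', 'email': 'jane_roe@mail.com'}
    print(extract("2017 - 2020 Peking University, M.Sc.", EXAMPLES))
    # {'block': 'education', 'period': '2017 - 2020'}
```

The count vector used for classification discards token order; the per-token sequence returned by `writing_style` keeps word and punctuation positions and could instead feed a sequence model, but that choice is likewise only an assumption of this sketch.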
Pages: 8