Effective Active Learning Strategies for the Use of Large-Margin Classifiers in Semantic Annotation: An Optimal Parameter Discovery Perspective
Cited by: 1
Authors:
Xu, Kaiquan [1]
Liao, Stephen Shaoyi [2]
Lau, Raymond Y. K. [2]
Zhao, J. Leon [2]
Affiliations:
[1] Nanjing Univ, Sch Business, Mkt & eBusiness Dept, Nanjing 210093, Jiangsu, Peoples R China
[2] City Univ Hong Kong, Dept Informat Syst, Kowloon, Hong Kong, Peoples R China
Keywords:
active learning;
machine learning;
data mining;
optimization;
business intelligence;
RANDOM-FIELDS;
WEB;
ACQUISITION;
INFORMATION;
EXTRACTION;
ONLINE;
MODELS;
DOI: 10.1287/ijoc.2013.0578
Chinese Library Classification (CLC) number:
TP39 [Computer applications];
Discipline classification codes:
081203;
0835;
Abstract:
Classical supervised machine learning techniques have been explored for semantically annotating unstructured textual data, such as consumers' comments archived at social media websites, to extract business intelligence. However, these techniques often require a large number of manually labeled training examples to produce accurate annotations. Several active learning approaches built on probabilistic sequence models have been explored to minimize the number of labeled training examples needed for semantic annotation tasks. Recent research has shown that large-margin classifiers are viable alternatives for automated semantic annotation, given their strong generalization capabilities and their ability to process high-dimensional data. However, the existing active learning methods designed for probabilistic sequence models cannot be easily adapted and applied to large-margin classifiers. The main contribution of this paper is the development of novel active learning methods for large-margin classifiers to fill this research gap. In particular, we propose an innovative perspective that treats active learning as a search for the optimal parameters of large-margin classifiers. A rigorous evaluation involving two benchmark tests and an empirical test based on real-world data extracted from Amazon.com reveals that the proposed active learning methods can train effective classifiers with significantly fewer training examples while achieving similar annotation performance, compared to a typical state-of-the-art classifier that only uses several labeled training examples. More specifically, one of our proposed active learning methods can reduce the number of training examples by 19.74% at the 68% level of F1 when compared to the best baseline method, as evaluated on the Amazon data set.
Our research opens the door to the application of intelligent semantic annotation techniques to support real-world applications such as automatically analyzing consumer comments for customer relationship management.
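Illustrative sketch (not part of the record): the abstract does not spell out the proposed query strategies, so the Python below shows only the generic idea of pool-based active learning with a large-margin classifier, using plain margin-based uncertainty sampling on a binary linear model. It is not the authors' method, which targets sequence annotation and frames selection as a search for optimal parameters. All names (`train_linear_svm`, `select_most_uncertain`, `active_learn`, `oracle`) and all hyperparameters are hypothetical.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its magnitude grows with distance from the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Fit a linear large-margin classifier by subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1. (Assumed stand-in model.)"""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * decision_value(w, b, xi) < 1
            # Regularization shrinks w on every step.
            w = [wj - lr * lam * wj for wj in w]
            if violated:
                # Hinge-loss subgradient step for margin violations.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def select_most_uncertain(w, b, pool):
    """Index of the unlabeled example closest to the current hyperplane."""
    return min(range(len(pool)), key=lambda i: abs(decision_value(w, b, pool[i])))


def active_learn(X_labeled, y_labeled, pool, oracle, rounds):
    """Repeatedly retrain, query the label of the most uncertain pool example,
    and move it into the labeled set. `oracle` stands in for the human annotator."""
    X_labeled, y_labeled, pool = list(X_labeled), list(y_labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        w, b = train_linear_svm(X_labeled, y_labeled)
        x = pool.pop(select_most_uncertain(w, b, pool))
        X_labeled.append(x)
        y_labeled.append(oracle(x))
    return train_linear_svm(X_labeled, y_labeled), X_labeled, y_labeled
```

In this sketch each round costs one annotation, so the number of `oracle` calls is the labeling budget that an active learning strategy tries to keep small for a given target accuracy.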