Opinion mining from noisy text data

Cited by: 41
Authors
Dey, Lipika [1]
Haque, Sk. Mirajul [1]
Affiliations
[1] Tata Consultancy Services, Innovation Labs, Udyog Vihar, Gurgaon, India
Keywords
Noisy text; Context-dependent cleaning; Opinion mining; WordNet; Text analytics for market knowledge discovery
DOI
10.1007/s10032-009-0090-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The proliferation of the Internet has not only led to the generation of huge volumes of unstructured information in the form of web documents; a large amount of text is also generated in the form of emails, blogs, feedback, etc. The data generated from online communication act as potential gold mines for discovering knowledge, particularly for market researchers. Text analytics has matured and is being successfully employed to mine important information from unstructured text documents. The chief bottleneck in designing text mining systems for handling blogs arises from the fact that online communication text data are often noisy. These texts are informally written. They suffer from spelling mistakes, grammatical errors, improper punctuation and irregular capitalization. This paper focuses on opinion extraction from noisy text data. It is aimed at extracting and consolidating opinions of customers from blogs and feedback, at multiple levels of granularity. We have proposed a framework in which these texts are first cleaned using domain knowledge and then subjected to mining. Ours is a semi-automated approach, in which the system aids in the process of knowledge assimilation for knowledge-base building and also performs the analytics. Domain experts ratify the knowledge base and also provide training samples for the system to automatically gather more instances for ratification. The system identifies opinion expressions as phrases containing opinion words, opinionated features and also opinion modifiers. These expressions are categorized as positive or negative with membership values varying from zero to one. Opinion expressions are identified and categorized using localized linguistic techniques. Opinions can be aggregated at any desired level of specificity, i.e., feature level or product level, user level or site level, etc.
We have developed a system based on this approach, which provides the user with a platform to analyze opinion expressions crawled from a set of pre-defined blogs.
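The pipeline outlined in the abstract (clean the noisy text with domain knowledge, identify opinion expressions made of a feature, an opinion word and an optional modifier, assign a membership value between zero and one, then aggregate) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicon entries, the modifier weights, the two-token look-back window, the fuzzy string matching via `difflib`, and the averaging used for aggregation are all assumptions chosen for illustration only.

```python
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical domain lexicon (assumption, not from the paper).
# Opinion words carry a positive-membership value in [0, 1].
FEATURES = {"battery", "screen", "camera"}
OPINION_WORDS = {"good": 0.8, "excellent": 1.0, "poor": 0.2, "terrible": 0.0}
# Hypothetical modifier weights: >1 pushes the value away from the neutral
# point 0.5, <1 pulls it toward 0.5. Negation mirrors the value around 0.5.
MODIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATIONS = {"not", "never"}
VOCAB = sorted(FEATURES | set(OPINION_WORDS) | set(MODIFIERS) | NEGATIONS)


def clean(text):
    """Lowercase, strip punctuation, and map near-misspellings to lexicon terms."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if not word:
            continue
        match = get_close_matches(word, VOCAB, n=1, cutoff=0.8)
        tokens.append(match[0] if match else word)
    return tokens


def score(base, modifier=None, negated=False):
    """Return a positive-membership value clamped to [0, 1]."""
    deviation = base - 0.5
    if modifier is not None:
        deviation *= MODIFIERS[modifier]
    if negated:
        deviation = -deviation
    return min(1.0, max(0.0, 0.5 + deviation))


def extract(tokens):
    """Pair each opinion word with the most recent feature seen before it."""
    feature = None
    results = []
    for i, tok in enumerate(tokens):
        if tok in FEATURES:
            feature = tok
        elif tok in OPINION_WORDS and feature is not None:
            window = tokens[max(0, i - 2):i]  # two tokens before the opinion word
            modifier = next((w for w in window if w in MODIFIERS), None)
            negated = any(w in NEGATIONS for w in window)
            results.append((feature, score(OPINION_WORDS[tok], modifier, negated)))
    return results


def aggregate(pairs):
    """Average values per feature, then average feature means per product."""
    by_feature = defaultdict(list)
    for feature, value in pairs:
        by_feature[feature].append(value)
    feature_scores = {f: sum(v) / len(v) for f, v in by_feature.items()}
    product_score = (
        sum(feature_scores.values()) / len(feature_scores) if feature_scores else None
    )
    return feature_scores, product_score


tokens = clean("The batery is very good, but the screan is not excelent!")
feature_scores, product_score = aggregate(extract(tokens))
print(tokens)
print(feature_scores, product_score)
```

In this sketch a value above 0.5 reads as positive and a value below 0.5 as negative. Fuzzy matching against a small lexicon can also pull unrelated words toward lexicon terms, which is one reason a domain-expert ratification step, as the abstract describes, would still be needed.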
Pages: 205-226
Number of pages: 22
Related papers
50 in total
  • [1] Opinion mining from noisy text data
    Lipika Dey
    Sk. Mirajul Haque
    [J]. International Journal on Document Analysis and Recognition (IJDAR), 2009, 12 : 205 - 226
  • [2] Analyzing Text Data for Opinion Mining
    Wei, Wei
    [J]. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2011, 6716 : 330 - 335
  • [3] Mining Opinion from Text Documents: A Survey
    Khan, Khairullah
    Baharudin, Baharum B.
    Khan, Aurangzeb
    Fazal-e-Malik
    [J]. 2009 3RD IEEE INTERNATIONAL CONFERENCE ON DIGITAL ECOSYSTEMS AND TECHNOLOGIES, 2009 : 194 - 199
  • [4] Mining Opinion and Sentiment from Arabic Text
    Malik, Asif
    Aoudi, Samer
    Alteneiji, Salem
    Khdour, Thair
    Saleh, Mohammed
    Hamdan, Issam
    [J]. 2020 SEVENTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY TRENDS (ITT 2020), 2020 : 165 - 168
  • [5] Text mining and data information analysis for network public opinion
    Hu, Yan
    [J]. Data Science Journal, 2019, 18 (01)
  • [6] Using Collaborative Tagging for Text Classification: From Text Classification to Opinion Mining
    Charton, Eric
    Meurs, Marie-Jean
    Jean-Louis, Ludovic
    Gagnon, Michel
    [J]. INFORMATICS-BASEL, 2014, 1 (01) : 32 - 51
  • [7] Emotion Analysis for Opinion Mining From Text: A Comparative Study
    Mohsen, Amr Mansour
    Idrees, Amira M.
    Hassan, Hesham Ahmed
    [J]. INTERNATIONAL JOURNAL OF E-COLLABORATION, 2019, 15 (01) : 38 - 58
  • [8] Research on Feature Extraction from Chinese Text for Opinion Mining
    Zhu, Shanzong
    Liu, Yuanchao
    Liu, Ming
    Tian, Peiliang
    [J]. 2009 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2009 : 7 - 10
  • [9] Supervised semantic relation mining from linguistically noisy text documents
    Cristina Giannone
    Roberto Basili
    Paolo Naggar
    Alessandro Moschitti
    [J]. International Journal on Document Analysis and Recognition (IJDAR), 2011, 14 : 213 - 228