Opinion mining from noisy text data

Authors
Lipika Dey
Sk. Mirajul Haque
Affiliation
[1] Tata Consultancy Services,Innovation Labs
Keywords
Noisy text; Context-dependent cleaning; Opinion mining; WordNet; Text analytics for market knowledge discovery;
DOI
Not available
Abstract
The proliferation of the Internet has led not only to huge volumes of unstructured information in the form of web documents, but also to a large amount of text in the form of emails, blogs, feedback, etc. The data generated from online communication is a potential gold mine for discovering knowledge, particularly for market researchers. Text analytics has matured and is being successfully employed to mine important information from unstructured text documents. The chief bottleneck in designing text mining systems for blogs arises from the fact that online communication text is often noisy. These texts are informally written and suffer from spelling mistakes, grammatical errors, improper punctuation and irregular capitalization. This paper focuses on opinion extraction from noisy text data. It aims to extract and consolidate customer opinions from blogs and feedback at multiple levels of granularity. We propose a framework in which these texts are first cleaned using domain knowledge and then subjected to mining. Ours is a semi-automated approach, in which the system aids knowledge assimilation for knowledge-base building and also performs the analytics. Domain experts ratify the knowledge base and also provide training samples from which the system automatically gathers more instances for ratification. The system identifies opinion expressions as phrases containing opinion words, opinionated features and opinion modifiers. These expressions are categorized as positive or negative, with membership values varying from zero to one. Opinion expressions are identified and categorized using localized linguistic techniques. Opinions can be aggregated at any desired level of specificity, i.e. feature level or product level, user level or site level, etc.
We have developed a system based on this approach, which provides the user with a platform to analyze opinion expressions crawled from a set of pre-defined blogs.
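The pipeline outlined in the abstract (clean noisy text with domain knowledge, find opinion expressions built from opinion words, features and modifiers, assign a polarity with a membership value between zero and one, then aggregate) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicons, weights, window sizes and function names below are hypothetical, and fuzzy string matching from the standard library stands in for the paper's context-dependent cleaning and WordNet-based resources.

```python
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical domain knowledge; none of these entries come from the paper.
FEATURES = {"battery", "screen", "camera"}
OPINION_WORDS = {"excellent": 0.9, "good": 0.7, "poor": -0.7, "terrible": -0.9}
MODIFIERS = {"very": 1.2, "slightly": 0.5, "not": -1.0}
DOMAIN_VOCAB = sorted(FEATURES | set(OPINION_WORDS) | set(MODIFIERS))


def clean(text):
    """Lowercase, strip edge punctuation, and map near-miss spellings
    onto the domain vocabulary (assumed cleaning strategy)."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if not word:
            continue
        match = get_close_matches(word, DOMAIN_VOCAB, n=1, cutoff=0.8)
        tokens.append(match[0] if match else word)
    return tokens


def extract_opinions(tokens, window=3):
    """Return (feature, polarity, membership) for each opinion word
    that has a known feature within `window` tokens."""
    results = []
    for i, tok in enumerate(tokens):
        if tok not in OPINION_WORDS:
            continue
        score = OPINION_WORDS[tok]
        # Apply modifiers found in the two tokens just before the opinion word.
        for prev in tokens[max(0, i - 2):i]:
            score *= MODIFIERS.get(prev, 1.0)
        # Attach the opinion to the nearest feature inside the window.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        nearby = [(abs(j - i), tokens[j]) for j in range(lo, hi)
                  if tokens[j] in FEATURES]
        if not nearby:
            continue
        feature = min(nearby)[1]
        polarity = "positive" if score > 0 else "negative"
        # Membership is the opinion strength clipped to [0, 1].
        results.append((feature, polarity, min(1.0, abs(score))))
    return results


def aggregate_by_feature(posts):
    """Average signed membership per feature across all posts."""
    signed = defaultdict(list)
    for post in posts:
        for feature, polarity, membership in extract_opinions(clean(post)):
            signed[feature].append(
                membership if polarity == "positive" else -membership)
    return {f: sum(v) / len(v) for f, v in signed.items()}


posts = ["The batery is not good.", "Screen is very GOOD!!", "camera is terrible"]
summary = aggregate_by_feature(posts)
```

Here `summary` maps each feature to a signed average, so the same grouping step could key on product, user or site instead of feature to aggregate at a different level of specificity.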
Pages: 205-226
Page count: 21