Automatic Extraction Method of Hot Words Based on Agricultural Network Information Classification

Cited by: 0
Authors
Duan Q. [1]
Zhang L. [1]
Liu Y. [1]
Wang S. [2]
Affiliations
[1] College of Information and Electrical Engineering, China Agricultural University, Beijing
[2] Agricultural Information Technology Limited Liability Company of Beijing, Beijing
Keywords
Agricultural network information; Agricultural public opinion monitoring; Heat calculation; Hot word; Multi-label classification
DOI
10.6041/j.issn.1000-1298.2018.07.020
Abstract
With the rapid development of the Internet, the volume of online information, including agricultural network information, has grown quickly. Extracting hot words from this mass of information is of great significance for monitoring and analyzing agricultural public opinion. Hot word extraction has been studied before, but existing methods still have problems such as poor pertinence, and they cannot meet the personalized needs of users in the different industries within agriculture. Therefore, a method for automatically extracting hot words based on agricultural network information classification was proposed. Firstly, the texts were classified with a multi-label classification algorithm, and a separate corpus was built for each category. Secondly, hot word candidates for each category were extracted with a method based on information entropy. Thirdly, the heat of each hot word candidate was calculated with a method based on time variation. Finally, the candidates were sorted by heat, and the hot words were selected according to the ranking. In total, 15,354 texts were collected from agricultural websites for the experiment, and the hot words for a specified time period were obtained automatically. The experimental results showed that the accuracy was over 0.9. This indicates that the proposed method can extract high-quality agricultural hot words and can help different agricultural user groups find and analyze the hot spot information of their industry. © 2018, Chinese Society of Agricultural Machinery. All rights reserved.
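The four steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: category labels are taken as already assigned by some multi-label classifier, the entropy criterion is approximated by left/right neighbor entropy, and heat is approximated as the current relative frequency minus the average relative frequency over earlier time windows. All function names and thresholds are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def build_corpora(labeled_docs):
    """Step 1 (assumed): group tokenized docs by label.
    A doc with several labels is placed in every matching corpus."""
    corpora = defaultdict(list)
    for tokens, labels in labeled_docs:
        for label in labels:
            corpora[label].append(tokens)
    return dict(corpora)


def branching_entropy(neighbors):
    """Shannon entropy (bits) of a neighbor-token count distribution."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def extract_candidates(docs, min_entropy=1.0, min_count=2):
    """Step 2 (assumed criterion): keep tokens that occur at least
    min_count times and whose left and right neighbor entropies
    both reach min_entropy."""
    left, right, counts = defaultdict(Counter), defaultdict(Counter), Counter()
    for tokens in docs:
        for i, tok in enumerate(tokens):
            counts[tok] += 1
            if i > 0:
                left[tok][tokens[i - 1]] += 1
            if i < len(tokens) - 1:
                right[tok][tokens[i + 1]] += 1
    return {
        tok
        for tok, c in counts.items()
        if c >= min_count
        and min(branching_entropy(left[tok]), branching_entropy(right[tok])) >= min_entropy
    }


def relative_freq(word, docs):
    """Occurrences of word divided by the total token count of docs."""
    total = sum(len(tokens) for tokens in docs)
    return sum(tokens.count(word) for tokens in docs) / total if total else 0.0


def heat(word, current_docs, past_windows):
    """Step 3 (assumed formula): current relative frequency minus the
    mean relative frequency over the earlier time windows."""
    past = [relative_freq(word, window) for window in past_windows]
    baseline = sum(past) / len(past) if past else 0.0
    return relative_freq(word, current_docs) - baseline


def rank_hot_words(candidates, current_docs, past_windows, top_k=10):
    """Step 4: sort candidates by heat, highest first, and keep top_k."""
    scored = [(w, heat(w, current_docs, past_windows)) for w in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]
```

In this sketch each category corpus from `build_corpora` would be processed separately, so that a word ranks as hot relative to its own category rather than to the whole collection.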
Pages: 160-167
Page count: 7