A survey on correlation analysis of big data

Authors
Liang J.-Y. [1]
Feng C.-J. [1, 2]
Song P. [1, 3]
Affiliations
[1] Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan
[2] School of Applied Mathematics, Shanxi University of Finance & Economics, Taiyuan
[3] School of Economics and Management, Shanxi University, Taiyuan
Keywords
Big data; Correlation analysis; Correlation coefficient; Information entropy
DOI
10.11897/SP.J.1016.2016.00001
Abstract
In the era of big data, correlation analysis has attracted much attention for its efficiency in revealing the inherent relations among things, and it has been applied effectively in many fields, including recommender systems, business analytics, public administration, and medical diagnosis. Big data is usually nonlinear and high-dimensional. In view of these complex characteristics and of a semantic analysis of existing correlation analysis approaches, this paper reviews existing research findings on correlation analysis for big data from four aspects: statistical correlation analysis, mutual information, matrix calculation, and distance. After summarizing the classical correlation analysis theory in statistics, the paper first elaborates on the nonlinear correlation analysis approaches between two random variables induced by mutual information, from the viewpoints of generality and equitability. Then, correlation coefficients based on matrix calculation are analyzed in terms of computability for high-dimensional data, and distance correlation is analyzed with respect to the complex form of nonlinear and high-dimensional data. Furthermore, on the basis of analyzing and comparing existing correlation analysis approaches, the challenges of correlation analysis for big data are discussed, namely high-dimensional data, multivariable data, large-scale data, incremental data, and their computability. © 2016, Science Press. All rights reserved.
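The contrast the abstract draws between classical statistical correlation and distance-based measures on nonlinear data can be illustrated with a small sketch. The code below is not taken from the paper; it assumes the standard sample definition of distance correlation based on double-centered pairwise distance matrices, and the function names (`pearson`, `distance_correlation`) and the toy data are hypothetical choices made for illustration only.

```python
import numpy as np


def pearson(x, y):
    """Sample Pearson correlation coefficient of two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences."""
    v = np.asarray(v, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    row_mean = d.mean(axis=1, keepdims=True)
    col_mean = d.mean(axis=0, keepdims=True)
    return d - row_mean - col_mean + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D arrays (value in [0, 1])."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()          # squared sample distance covariance
    dvar2_x = (a * a).mean()        # squared sample distance variance of x
    dvar2_y = (b * b).mean()        # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Toy data (an assumption for illustration): y depends on x, but not linearly.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

r = pearson(x, y)
d = distance_correlation(x, y)
print(f"Pearson correlation:  {r:.4f}")
print(f"Distance correlation: {d:.4f}")
```

On this symmetric quadratic example the Pearson coefficient is essentially zero even though `y` is a deterministic function of `x`, while the distance correlation is clearly positive. The pairwise distance matrices need memory quadratic in the sample size, which hints at the computability concerns the abstract raises for large-scale data.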
Pages: 1-18
Number of pages: 17