Information Splitting for Big Data Analytics

Cited by: 9
Authors
Zhu, Shengxin [1]
Gu, Tongxiang [1]
Xu, Xiaowen [1]
Mo, Zeyao [1]
Affiliation
[1] Inst Appl Phys & Computat Math, Lab Computat Phys, POB 8009, Beijing 100088, Peoples R China
Keywords
Observed information matrix; Fisher information matrix; Fisher scoring algorithm; linear mixed model; breeding model; genome-wide association; variance parameter estimation; GENOME-WIDE ASSOCIATION; LINEAR MIXED MODELS; ALGORITHM
DOI
10.1109/CyberC.2016.64
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Many statistical models require the estimation of unknown (co)variance parameters. The estimate is usually obtained by maximizing a log-likelihood that involves log-determinant terms. In principle, the Newton method requires the observed information (the negative Hessian matrix, i.e. the negative of the second derivative of the log-likelihood) to obtain an accurate maximum likelihood estimator. Using instead the Fisher information, the expected value of the observed information, gives a simpler algorithm than the Newton method: the Fisher scoring algorithm. With advances in high-throughput technologies in the biological sciences, recommendation systems and social networks, the sizes of data sets (and of the corresponding statistical models) have suddenly increased by several orders of magnitude. For such big data sets, neither the observed information nor the Fisher information is easy to obtain. This paper introduces an information splitting technique to simplify the computation. Splitting the mean of the observed information and the Fisher information yields a simpler approximate Hessian matrix for the log-likelihood. This approximate Hessian matrix can significantly reduce computation and makes the linear mixed model applicable to big data sets. The splitting and the simpler formulas rely heavily on matrix algebra transforms, and are applicable to large-scale breeding models and genome-wide association analysis.
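To illustrate why averaging the two information matrices simplifies things, here is a minimal sketch under stated assumptions. It is not claimed to reproduce the paper's exact formulas or its REML setting. Assume a linear mixed model y ~ N(Xβ, V) with V(θ) = Σ_k θ_k V_k linear in θ. Assume also that the residual r = y - Xβ is treated as known, so this is plain ML rather than REML, and that the matrices are tiny and dense. Under these assumptions the standard formulas are:

- Score: s_i = -½ tr(V⁻¹V_i) + ½ rᵀV⁻¹V_iV⁻¹r.
- Observed information: I_O(i,j) = -½ tr(V⁻¹V_iV⁻¹V_j) + rᵀV⁻¹V_iV⁻¹V_jV⁻¹r.
- Fisher information: I_F(i,j) = ½ tr(V⁻¹V_iV⁻¹V_j).

Their mean, I_A = (I_O + I_F)/2 = ½ rᵀV⁻¹V_iV⁻¹V_jV⁻¹r, contains no trace terms. All function names (`information`, `ai_step`, ...) and the 3×3 example data are hypothetical.

```python
"""Hypothetical sketch: score, observed, Fisher and averaged information for
theta in y ~ N(X beta, V(theta)), V(theta) = sum_k theta_k * V_k.
Assumes r = y - X beta is known (ML, not REML), symmetric V_k, SPD V(theta),
and tiny dense matrices, so pure-Python linear algebra is adequate.
"""
import math

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def mat_vec(a, x):
    return [dot(row, x) for row in a]

def mat_mul(a, b):
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def inverse(a):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(aug[k][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for k in range(n):
            if k != c:
                f = aug[k][c]
                aug[k] = [v - f * w for v, w in zip(aug[k], aug[c])]
    return [row[n:] for row in aug]

def log_det_spd(a):
    """log|A| by Gaussian elimination; valid without pivoting for SPD A."""
    m = [[float(v) for v in row] for row in a]
    total = 0.0
    for c in range(len(m)):
        total += math.log(m[c][c])
        for k in range(c + 1, len(m)):
            f = m[k][c] / m[c][c]
            m[k] = [v - f * w for v, w in zip(m[k], m[c])]
    return total

def assemble(theta, vs):
    """V(theta) = sum_k theta_k * V_k."""
    n = len(vs[0])
    return [[sum(t * v[i][j] for t, v in zip(theta, vs)) for j in range(n)]
            for i in range(n)]

def log_likelihood(theta, vs, r):
    """ML log-likelihood, dropping the constant -n/2 * log(2*pi)."""
    v = assemble(theta, vs)
    return -0.5 * (log_det_spd(v) + dot(r, mat_vec(inverse(v), r)))

def information(theta, vs, r):
    """Return (score, observed, fisher, average) for the model above."""
    vinv = inverse(assemble(theta, vs))
    a = [mat_mul(vinv, vk) for vk in vs]   # A_k = V^{-1} V_k (n x n each)
    u = mat_vec(vinv, r)                   # u = V^{-1} r
    w = [mat_vec(vk, u) for vk in vs]      # w_k = V_k V^{-1} r (vectors only)
    m = range(len(vs))
    score = [-0.5 * trace(a[k]) + 0.5 * dot(u, w[k]) for k in m]
    # Trace terms tr(V^{-1} V_i V^{-1} V_j): the expensive matrix products.
    tr = [[trace(mat_mul(a[i], a[j])) for j in m] for i in m]
    # Quadratic terms r' V^{-1} V_i V^{-1} V_j V^{-1} r = w_i' V^{-1} w_j.
    quad = [[dot(w[i], mat_vec(vinv, w[j])) for j in m] for i in m]
    observed = [[-0.5 * tr[i][j] + quad[i][j] for j in m] for i in m]
    fisher = [[0.5 * tr[i][j] for j in m] for i in m]
    average = [[0.5 * quad[i][j] for j in m] for i in m]  # trace terms cancel
    return score, observed, fisher, average

def ai_step(theta, vs, r):
    """One Newton-type update using the averaged information matrix."""
    score, _, _, average = information(theta, vs, r)
    delta = mat_vec(inverse(average), score)
    return [t + d for t, d in zip(theta, delta)]

if __name__ == "__main__":
    # Hypothetical 3x3 example: V = theta_0 * I + theta_1 * K.
    eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kin = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
    residual = [1.2, -0.4, 0.7]
    theta0 = [1.0, 0.5]
    s, i_o, i_f, i_a = information(theta0, [eye, kin], residual)
    print("average information:", i_a)
    print("updated theta:", ai_step(theta0, [eye, kin], residual))
```

The averaged matrix needs only the vectors V_k V⁻¹r plus one solve per pair. It avoids the n×n products behind tr(V⁻¹V_iV⁻¹V_j). A practical large-scale code would compute only those quadratic terms, together with the diagonal traces tr(V⁻¹V_k) that the score still needs, rather than calling `information()` as this demo does.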
Pages: 294-302
Number of pages: 9