Zero-Inflated Patent Data Analysis Using Compound Poisson Models

Times cited: 2
Authors
Park, Sangsung [1]
Jun, Sunghae [1]
Affiliation
[1] Cheongju Univ, Dept Stat, Cheongju 28503, South Korea
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 7
Funding
National Research Foundation of Singapore
Keywords
zero-inflated data; compound Poisson model; generalized linear model; Poisson distribution; document-word matrix
DOI
10.3390/app13074505
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
A large part of big data consists of text documents such as papers, patents and articles. To analyze text data, we have to preprocess the documents and build structured data in the form of a document-word matrix using various text mining techniques, because the statistical and machine learning algorithms used in text analysis require structured training data. The rows and columns of the matrix correspond to documents and words, respectively, and each element is the frequency with which a word occurs in a document. Because the number of words is generally much larger than the number of documents, most elements are zero. This sparsity, caused by inflated zeros, degrades the performance of predictive models. In this paper, we propose a method that addresses the sparsity problem and improves model performance in text data analysis. The proposed method is based on compound Poisson linear modeling. To evaluate it, we collect and analyze patent documents from patent databases. In our experiments, we compared the Akaike information criterion (AIC) of the proposed model with those of traditional models, such as the linear model, the generalized linear model and the zero-inflated Poisson model, and showed that the proposed model yields the smallest AIC. This result supports the validity of the proposed method.
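The abstract rests on two quantities: the share of zeros in a document-word matrix, and an AIC comparison between count models (AIC = 2k - 2 ln L, where k is the number of estimated parameters and L the maximized likelihood). The sketch below illustrates both under stated assumptions. It is not the authors' compound Poisson model: as a simpler stand-in, it compares an intercept-only Poisson fit with an intercept-only zero-inflated Poisson (ZIP) fit estimated by EM. The whitespace tokenizer, the toy documents, the synthetic counts and all function names (`document_word_matrix`, `zero_fraction`, `poisson_aic`, `zip_aic`) are hypothetical.

```python
import math
from collections import Counter


def document_word_matrix(docs):
    """Rows are documents, columns are words, entries are word frequencies.
    Hypothetical preprocessing: lowercase + whitespace split only."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[w] for w in vocab])
    return vocab, matrix


def zero_fraction(matrix):
    """Share of matrix cells that are zero (a simple sparsity measure)."""
    cells = [v for row in matrix for v in row]
    return sum(v == 0 for v in cells) / len(cells)


def poisson_aic(y):
    """AIC = 2k - 2 log L for an intercept-only Poisson fit (k = 1).
    Assumes at least one nonzero count so that log(lam) is defined."""
    lam = sum(y) / len(y)  # closed-form MLE of the Poisson mean
    loglik = sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in y)
    return 2 * 1 - 2 * loglik


def zip_aic(y, iters=500):
    """AIC for an intercept-only zero-inflated Poisson fit (k = 2), via EM.
    P(0) = pi + (1 - pi) e^-lam,  P(k) = (1 - pi) Poisson(k; lam) for k > 0.
    Assumes at least one nonzero count."""
    n = len(y)
    n0 = sum(1 for k in y if k == 0)
    total = sum(y)
    pi, lam = 0.5, total / n  # crude starting values
    for _ in range(iters):
        # E-step: posterior probability that an observed zero is a structural zero
        z0 = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: closed-form updates given the expected number of structural zeros
        pi = n0 * z0 / n
        lam = total / (n - n0 * z0)
    loglik = n0 * math.log(pi + (1 - pi) * math.exp(-lam))
    loglik += sum(
        math.log(1 - pi) + k * math.log(lam) - lam - math.lgamma(k + 1)
        for k in y
        if k > 0
    )
    return 2 * 2 - 2 * loglik


if __name__ == "__main__":
    docs = ["patent claims method", "method for data analysis", "zero inflated data"]
    vocab, dwm = document_word_matrix(docs)
    print(vocab)
    print(f"zero fraction: {zero_fraction(dwm):.3f}")

    # Hypothetical zero-heavy counts of one word across 100 documents
    y = [0] * 80 + [3, 4, 2, 5, 3, 4, 2, 3, 4, 5, 3, 2, 4, 3, 5, 4, 3, 2, 3, 4]
    print(f"Poisson AIC: {poisson_aic(y):.1f}, ZIP AIC: {zip_aic(y):.1f}")
```

On zero-heavy counts like the demo's, the extra zero-inflation parameter buys a much larger log-likelihood than its 2-point AIC penalty, so the ZIP model scores lower. A compound Poisson model would enter the same comparison through its own log-likelihood and parameter count.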
Pages: 13
Related papers (50 in total)
  • [1] Zero-inflated models and estimation in zero-inflated Poisson distribution
    Wagh, Yogita S.
    Kamalja, Kirtee K.
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2018, 47 (08) : 2248 - 2265
  • [2] The analysis of zero-inflated count data: Beyond zero-inflated Poisson regression
    Loeys, Tom
    Moerkerke, Beatrijs
    De Smet, Olivia
    Buysse, Ann
    [J]. BRITISH JOURNAL OF MATHEMATICAL & STATISTICAL PSYCHOLOGY, 2012, 65 (01) : 163 - 180
  • [3] Identifiability of zero-inflated Poisson models
    Li, Chin-Shang
    [J]. BRAZILIAN JOURNAL OF PROBABILITY AND STATISTICS, 2012, 26 (03) : 306 - 312
  • [4] Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples
    Uhm, Daiho
    Jun, Sunghae
    [J]. FUTURE INTERNET, 2022, 14 (07):
  • [5] Score tests for zero-inflated Poisson models
    Jansakul, N
    Hinde, JP
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2002, 40 (01) : 75 - 96
  • [6] Multivariate zero-inflated Poisson models and their applications
    Li, CS
    Lu, JC
    Park, JH
    [J]. TECHNOMETRICS, 1999, 41 (01) : 29 - 38
  • [7] Zero-inflated compound Poisson distributions in integer-valued GARCH models
    Goncalves, Esmeralda
    Mendes-Lopes, Nazare
    Silva, Filipa
    [J]. STATISTICS, 2016, 50 (03) : 558 - 578
  • [8] Zero-Inflated Poisson Regression Models with Right Censored Count Data
    Saffari, Seyed Ehsan
    Adnan, Robiah
    [J]. MATEMATIKA, 2011, 27 (01) : 21 - 29
  • [9] Zero-inflated Poisson model with group data
    Yang, Jun
    Zhang, Xin
    [J]. ADVANCED MATERIALS DESIGN AND MECHANICS, 2012, 569 : 627 - 631
  • [10] Analysis of zero-inflated Poisson data incorporating extent of exposure
    Lee, AH
    Wang, K
    Yau, KKW
    [J]. BIOMETRICAL JOURNAL, 2001, 43 (08) : 963 - 975