Zero-Inflated Patent Data Analysis Using Compound Poisson Models

Cited by: 2
Authors
Park, Sangsung [1]
Jun, Sunghae [1]
Affiliations
[1] Cheongju Univ, Dept Stat, Cheongju 28503, South Korea
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 07
Funding
National Research Foundation of Singapore;
Keywords
zero-inflated data; compound Poisson model; generalized linear model; Poisson distribution; document-word matrix;
DOI
10.3390/app13074505
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
A large part of big data consists of text documents such as papers, patents, and articles. To analyze text data, we must preprocess the documents and build structured data in the form of a document-word matrix using text mining techniques, because the statistical and machine learning algorithms used in text analysis require structured training data. The rows and columns of the matrix correspond to documents and words, respectively, and each element is the frequency of a word in a document. In general, because the number of words is much larger than the number of documents, most elements are zero. The sparsity caused by these inflated zeros degrades the performance of predictive models. In this paper, we propose a method to address the sparsity problem and improve model performance in text data analysis, based on compound Poisson linear modeling. To evaluate the proposed method, we collect and analyze patent documents from patent databases. In our experiments, we compare the Akaike information criterion (AIC) of the proposed model with that of traditional models: the linear model, the generalized linear model, and the zero-inflated Poisson model. The AIC of the proposed model is smaller than those of the other models, which supports the validity of the proposed method.
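The document-word matrix and its sparsity, as described in the abstract, can be illustrated with a minimal Python sketch. The toy corpus, the variable names, and the helper `poisson_aic` are hypothetical and are not taken from the paper. The AIC helper fits only an intercept-only Poisson baseline to a single word column; it is not the compound Poisson model the authors propose, and it serves only to show how an AIC value (2k minus twice the log-likelihood) is obtained for count data.

```python
from collections import Counter
import math

# Toy corpus: hypothetical stand-ins for preprocessed patent abstracts.
docs = [
    "poisson model for count data",
    "patent data analysis with text mining",
    "zero inflated count data model",
]

# Tokenize and build a sorted vocabulary (the matrix columns).
tokens = [d.split() for d in docs]
vocab = sorted({w for doc in tokens for w in doc})

# Document-word matrix: rows are documents, columns are words,
# and each entry is the frequency of the word in the document.
counts = [Counter(doc) for doc in tokens]
dwm = [[c[w] for w in vocab] for c in counts]

# Sparsity: fraction of matrix entries that are zero.
n_cells = len(dwm) * len(vocab)
n_zeros = sum(v == 0 for row in dwm for v in row)
sparsity = n_zeros / n_cells


def poisson_aic(y):
    """AIC of an intercept-only Poisson model fitted by maximum likelihood.

    Assumes y holds non-negative integer counts with at least one
    nonzero value, so the estimated rate is positive.
    """
    lam = sum(y) / len(y)  # the MLE of the Poisson rate is the sample mean
    loglik = sum(-lam + v * math.log(lam) - math.lgamma(v + 1) for v in y)
    k = 1  # one estimated parameter (the rate)
    return 2 * k - 2 * loglik


# Frequencies of one word across all documents (one matrix column).
col = [row[vocab.index("count")] for row in dwm]
print(f"vocabulary size: {len(vocab)}, sparsity: {sparsity:.3f}")
print(f"Poisson AIC for 'count': {poisson_aic(col):.3f}")
```

Even with three short documents, more than half of the entries are zero; with a realistic patent vocabulary the proportion of zeros is far higher, which is the situation the abstract describes as zero inflation.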
Pages: 13