Topic-based classification and identification of global trends for startup companies

Cited by: 15
Authors
Savin, Ivan [1, 2]
Chukavina, Kristina [2]
Pushkarev, Andrey [2]
Affiliations
[1] Univ Autonoma Barcelona, Inst Environm Sci & Technol, Barcelona, Spain
[2] Ural Fed Univ, Grad Sch Econ & Management, Ekaterinburg, Russia
Funding
Russian Science Foundation
Keywords
Crunchbase; Machine learning; Natural language processing; Investments; Entrepreneurship; PATENT DATA; BARRIERS; DECADE; ONLINE
DOI
10.1007/s11187-022-00609-6
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
To foresee global economic trends, one needs to understand the present-day startup companies that may soon become new market leaders. In this paper, we explore textual descriptions of more than 250 thousand startups in the Crunchbase database. We analyze the 2009-2019 period using topic modeling. We propose a novel classification of startup companies, free from expert bias, that contains 38 topics and quantifies the weight of each of these topics for all the startups. Taking the year of establishment and geographical location of the startups into account, we measure which topics increased or decreased their share over time, and which of them were predominantly present in Europe, North America, or other regions. We find that the share of startups focused on data analytics, social platforms, financial transfers, and time management has risen, while the opposite trend is observed for mobile gaming, online news, and online social networks, as well as legal and professional services. We also identify strong regional differences in topic distribution, suggesting a certain concentration of the startups. For example, sustainable agriculture is more strongly represented in South America and Africa, while pharmaceutics is more strongly represented in North America and Europe. Furthermore, we explore which pairs of topics tend to co-occur, quantify how multisectoral the startups are, and examine which startup classes attract more investment. Finally, we compare our classification to the existing one in the Crunchbase database, demonstrating how we improve on it.
Plain English Summary
We propose a novel classification of more than 250 thousand startups registered in the Crunchbase database, based on machine learning algorithms and free from expert bias. We find that the share of startups focused on data analytics, social platforms, financial transfers, and time management has risen, while the opposite trend is observed for mobile gaming, online news, and online social networks, as well as legal and professional services. We also identify strong regional differences in class distribution: for example, sustainable agriculture is more strongly represented in South America and Africa, while pharmaceutics is more strongly represented in North America and Europe. Our classification can improve analysis of the Crunchbase data, further promoting the popularity of the platform, while the trends identified will be useful for investors and policy makers. Last but not least, this paper presents the first application of topic modeling to startup companies, thus providing a new direction for academic research.
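The quantities named in the abstract (topic share over time, topic co-occurrence, and how multisectoral a startup is) can be illustrated with a minimal sketch. It assumes that a topic model has already assigned each startup a set of topic weights; the toy records, the function names, and the 0.15 presence threshold are all hypothetical and are not taken from the paper, which may define these measures differently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (founding year, topic weights that sum to 1).
# Real weights would come from a fitted topic model, not from this list.
startups = [
    (2009, {"mobile gaming": 0.7, "data analytics": 0.2, "online news": 0.1}),
    (2009, {"online news": 0.6, "mobile gaming": 0.4}),
    (2019, {"data analytics": 0.8, "financial transfers": 0.2}),
    (2019, {"data analytics": 0.5, "time management": 0.5}),
]


def topic_share_by_year(records):
    """Average weight of each topic across startups founded in each year."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for year, weights in records:
        counts[year] += 1
        for topic, weight in weights.items():
            totals[year][topic] += weight
    return {
        year: {topic: s / counts[year] for topic, s in by_topic.items()}
        for year, by_topic in totals.items()
    }


def topics_present(weights, threshold=0.15):
    """Topics whose weight exceeds the (assumed) presence threshold."""
    return sorted(t for t, w in weights.items() if w > threshold)


def multisectorality(weights, threshold=0.15):
    """Crude proxy: number of topics present in a single startup."""
    return len(topics_present(weights, threshold))


def cooccurrence(records, threshold=0.15):
    """Count startups in which both topics of a pair are present."""
    pairs = defaultdict(int)
    for _, weights in records:
        for a, b in combinations(topics_present(weights, threshold), 2):
            pairs[(a, b)] += 1
    return dict(pairs)


shares = topic_share_by_year(startups)
# Change in share between the first and last year; a missing topic counts as 0.
trend = {
    topic: shares[2019].get(topic, 0.0) - shares[2009].get(topic, 0.0)
    for topic in set(shares[2009]) | set(shares[2019])
}
pair_counts = cooccurrence(startups)
```

In this toy data the share of "data analytics" rises between the two years while "mobile gaming" falls, which mirrors the direction of the trends reported in the abstract but says nothing about their actual magnitude.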
Pages: 659-689
Page count: 31
Related papers
  • [1] Topic-based classification and identification of global trends for startup companies
    Savin, Ivan
    Chukavina, Kristina
    Pushkarev, Andrey
    [J]. Small Business Economics, 2023, 60 : 659 - 689
  • [2] Topic-based Classification through Unigram Unmasking
    HaCohen-Kerner, Yaakov
    Rosenfeld, Avi
    Sabag, Asaf
    Tzidkani, Maor
    [J]. KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES-2018), 2018, 126 : 69 - 76
  • [3] Exploring independent trends in a topic-based search engine
    Perkiö, J
    Buntine, W
    Perttu, S
    [J]. IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS, 2004, : 664 - 668
  • [4] Topic-Based Instance and Feature Selection in Multilabel Classification
    Ma, Jianghong
    Chow, Tommy W. S.
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (01) : 315 - 329
  • [5] Topic-based habitat classification using visual data
    Pizarro, Oscar
    Williams, Stefan B.
    Colquhoun, Jamie
    [J]. OCEANS 2009 - EUROPE, VOLS 1 AND 2, 2009, : 1320 - +
  • [6] Learning topic-based mixture models for factored classification
    Chen, Qiong
    Mitchell, Tom M.
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION JOINTLY WITH INTERNATIONAL CONFERENCE ON INTELLIGENT AGENTS, WEB TECHNOLOGIES & INTERNET COMMERCE, VOL 1, PROCEEDINGS, 2006, : 25 - +
  • [7] Learning topic-based mixture models for factored classification
    Chen, Qiong
    Mitchell, Tom M.
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION JOINTLY WITH INTERNATIONAL CONFERENCE ON INTELLIGENT AGENTS, WEB TECHNOLOGIES & INTERNET COMMERCE, VOL 1, PROCEEDINGS, 2006, : 1114 - +
  • [8] Topic-Based Microblog Polarity Classification Based on Cascaded Model
    Liu, Quanchao
    Hu, Yue
    Lei, Yangfan
    Wei, Xiangpeng
    Liu, Guangyong
    Bi, Wei
    [J]. COMPUTATIONAL SCIENCE - ICCS 2018, PT II, 2018, 10861 : 206 - 220
  • [9] Feature selection for the topic-based mixture model in factored classification
    Chen, Qiong
    [J]. 2006 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PTS 1 AND 2, PROCEEDINGS, 2006, : 39 - 44
  • [10] Google Trends Topic-Based Uncertainty: A Multi-National Approach
    Schuetze, Florian
    [J]. 3RD INTERNATIONAL CONFERENCE ON ADVANCED RESEARCH METHODS AND ANALYTICS (CARMA 2020), 2020, : 191 - 199