Empowering health geography research with location-based social media data: innovative food word expansion and energy density prediction via word embedding and machine learning

Authors
Wang, Jue [1, 2]
Kim, Gyoorie [1, 2]
Chang, Kevin Chen-Chuan [3 ]
Affiliations
[1] Univ Toronto, Dept Geog & Planning, 100 St George St, Toronto, ON M5S 3G3, Canada
[2] Univ Toronto Mississauga, Dept Geog Geomat & Environm, 3359 Mississauga Rd, Mississauga, ON L5L 1C6, Canada
[3] Univ Illinois, Dept Comp Sci, 201 North Goodwin Ave, Urbana, IL, USA
Keywords
Food environment; Food words; Food energy density; Machine learning; Health geography; Geographic information science; INFORMATION-SYSTEMS; FUNCTIONAL REGIONS; TWITTER DATA; WEIGHT-GAIN; URBAN AREAS; ENVIRONMENT; INFRASTRUCTURE; PERSPECTIVES; COMMUNITIES; WALKABILITY
DOI
10.1186/s12942-023-00344-5
Chinese Library Classification number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Background: The exponential growth of location-based social media (LBSM) data has opened new prospects for investigating the urban food environment in health geography research. However, previous studies have relied primarily on word dictionaries containing a limited number of food words, and have used common-sense categorizations to determine the healthiness of those words. To improve the analysis of the urban food environment using LBSM data, a more comprehensive list of food-related words is needed. In this context, this study explores how to expand the set of food-related words together with their associated energy densities.

Methods: This study addresses the research gap by introducing a novel methodology for expanding the food-related word dictionary and predicting energy densities. Seed words are generated from official and crowdsourced food composition databases, and new food words are discovered by clustering food words within the word embedding space using the Gaussian mixture model. Machine learning models are then used to predict the energy density classifications of these food words from their feature vectors. To explore the prediction problem thoroughly, ten widely used machine learning models are evaluated.

Results: The approach successfully expands the food-related word dictionary and accurately predicts food energy density (reaching 91.62%). A comparison of the newly expanded dictionary with the initial seed words, together with an analysis of Yelp reviews in the city of Toronto, shows significant improvements in identifying food words and a deeper understanding of the food environment.

Conclusions: This study proposes a novel method, based on machine learning and word embedding, to expand food-related vocabulary and predict food energy density. The method contributes to building a more comprehensive list of food words that can be used in geography and public health studies that mine geotagged social media data.
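
The two steps described in the Methods paragraph (finding new food words near seed words in an embedding space, then assigning each new word an energy density class) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the Gaussian mixture model for a plain cosine-similarity threshold and swaps the ten evaluated classifiers for a k-nearest-neighbour vote, so that it runs on the Python standard library alone. The toy vectors, the words, the "high"/"low" labels, the threshold of 0.9, and the function names `expand_food_words` and `predict_energy_class` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional "embeddings". Real word vectors would come from
# a trained embedding model and have hundreds of dimensions.
EMBEDDINGS = {
    "pizza":   (0.90, 0.80, 0.10),
    "burger":  (0.80, 0.90, 0.20),
    "salad":   (0.90, 0.10, 0.10),
    "spinach": (0.80, 0.20, 0.00),
    "fries":   (0.85, 0.85, 0.15),  # candidate word
    "kale":    (0.85, 0.15, 0.05),  # candidate word
    "laptop":  (0.00, 0.10, 0.90),  # candidate word, not a food
}

# Hypothetical seed words with assumed energy density classes.
SEED_LABELS = {"pizza": "high", "burger": "high", "salad": "low", "spinach": "low"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_food_words(embeddings, seeds, threshold=0.9):
    """Return non-seed words whose best similarity to any seed meets the threshold.

    Simplified stand-in for clustering in the embedding space.
    """
    found = []
    for word, vec in embeddings.items():
        if word in seeds:
            continue
        best = max(cosine(vec, embeddings[s]) for s in seeds)
        if best >= threshold:
            found.append(word)
    return sorted(found)


def predict_energy_class(word, embeddings, seed_labels, k=3):
    """Majority vote over the k seed words most similar to `word`."""
    ranked = sorted(
        seed_labels,
        key=lambda s: cosine(embeddings[word], embeddings[s]),
        reverse=True,
    )
    votes = Counter(seed_labels[s] for s in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for w in expand_food_words(EMBEDDINGS, SEED_LABELS):
        print(w, predict_energy_class(w, EMBEDDINGS, SEED_LABELS))
```

In practice the threshold and k would need tuning against held-out seed words, and a mixture model has the advantage over a fixed threshold of adapting to clusters of different shapes and sizes.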
Pages: 16
Published in: International Journal of Health Geographics, 22