Automated Extraction of Structured Data from the Social Network Instagram

Cited by: 0
Authors
Frantis, Petr [1]
Bures, Michel [1]
Coufalikova, Aneta [1]
Klaban, Ivo [1]
Affiliations
[1] Univ Def, Fac Mil Technol, Dept Informat & Cyber Operat, Brno, Czech Republic
Keywords
Instagram; Profiling; Instagram Private API; Automation; Osintgram; Python;
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The paper explores the extraction of structured information from the social network Instagram through a suitable application programming interface, namely the unofficial Instagram Private API. It focuses on creating a computer program that identifies which posts a user has marked with a "Like" and then stores this information for profiling specific users. The introduction of the paper highlights the general use of social media in modern society and the importance of personal data for these platforms. It specifies the aim of the study, which is to extract information from Instagram and then analyse it for user profiling. It then describes the evolution of the social network Instagram and key features such as the different types of posts. The paper further focuses on the solution and its implementation in the Python programming language, with the aim of minimizing the load on Instagram servers and reducing the risk that the automated processes are detected. It describes the process of setting up new Instagram accounts, the obstacles in obtaining login credentials, and the need to simulate human behaviour to bypass the network's defence mechanisms. It then focuses on the actual retrieval of information such as the users followed, their posts, and information about which posts the user has marked as favourites. It mentions that extracting data from closed profiles is difficult and elaborates on the technical challenges associated with this task. A significant part of the paper is a discussion of Instagram's defence mechanisms that respond to automated computer programs. It describes access denial, account blocking, and identity verification prompts such as CAPTCHA tests. Finally, the conclusion summarizes the results obtained, which indicate the acquisition of approximately 90,000 records for user profiling. It discusses the shortcomings of a fully automated solution due to Instagram's account creation conditions and defence mechanisms. It mentions the need for further research and highlights key gaps and challenges in this area. Overall, the study highlights the technical and security challenges in extracting information from Instagram and emphasises the need for further research and improvements in the technical procedures for extracting data from the platform.
Pages: 157-164
Page count: 8
Related papers
50 in total
  • [21] THE EXTRACTION OF LINE-STRUCTURED DATA FROM ENGINEERING DRAWINGS
    CLEMENT, TP
    PATTERN RECOGNITION, 1981, 14 (1-6) : 43 - 52
  • [22] Interactive tuples extraction from semi-structured data
    Gilleron, Remi
    Marty, Patrick
    Tommasi, Marc
    Torre, Fabien
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 997 - 1004
  • [23] Interactive Data Extraction from Semi-Structured Text
    Broman, Per
    Thalheim, Bernhard
    INFORMATION MODELLING AND KNOWLEDGE BASES XXIII, 2012, 237 : 1 - 19
  • [24] Web Service for Data Extraction from Semi-structured Data Sources
    Yashina, Marina V.
    Nakonechnyy, Ivan I.
    PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON DEPENDABILITY AND COMPLEX SYSTEMS DEPCOS-RELCOMEX, 2014, 286 : 499 - 510
  • [25] From Twitter to Instagram: Which social network do fashion and beauty magazines choose?
    de Travesedo Rojas, Ruth Gomez
    Gil Ramirez, Marta
    REVISTA ICONO 14-REVISTA CIENTIFICA DE COMUNICACION Y TECNOLOGIAS, 2020, 18 (01) : 179 - 202
  • [26] Automated Identification of Hookahs (Waterpipes) on Instagram: An Application in Feature Extraction Using Convolutional Neural Network and Support Vector Machine Classification
    Zhang, Youshan
    Allem, Jon-Patrick
    Unger, Jennifer Beth
    Cruz, Tess Boley
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2018, 20 (11)
  • [27] Supervised link prediction using structured-based feature extraction in social network
    Kumari, Anisha
    Behera, Ranjan Kumar
    Sahoo, Kshira Sagar
    Nayyar, Ananda
    Luhach, Ashish Kumar
    Sahoo, Satya Prakash
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (13)
  • [28] An automated integration approach for semi-structured and structured data
    Lim, SJ
    Ng, YK
    PROCEEDINGS OF THE THIRD INTERNATIONAL SYMPOSIUM ON COOPERATIVE DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2000, : 12 - 21
  • [29] Automated deconvolution of structured mixtures from heterogeneous tumor genomic data
    Roman, Theodore
    Xie, Lu
    Schwartz, Russell
    PLOS COMPUTATIONAL BIOLOGY, 2017, 13 (10)
  • [30] Data2Text Studio: Automated Text Generation from Structured Data
    Dou, Longxu
    Qin, Guanghui
    Wang, Jinpeng
    Yao, Jin-Ge
    Lin, Chin-Yew
    CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018): PROCEEDINGS OF SYSTEM DEMONSTRATIONS, 2018, : 13 - 18