Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities

Cited by: 10
Authors
Dogucu, Mine [1]
Cetinkaya-Rundel, Mine [2, 3, 4]
Affiliations
[1] Univ Calif Irvine, Dept Stat, Irvine, CA 92697 USA
[2] Univ Edinburgh, Sch Math, Edinburgh, Midlothian, Scotland
[3] RStudio, Boston, MA USA
[4] Duke Univ, Dept Stat Sci, Durham, NC USA
Keywords
Curriculum; Data science; R language; Teaching; Web scraping
DOI
10.1080/10691898.2020.1787116
Chinese Library Classification number
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
Best practices in statistics and data science courses include the use of real and relevant data as well as teaching the entire data science cycle starting with importing data. A rich source of real and current data is the web, where data are often presented and stored in a structure that needs some wrangling and transforming before they can be ready for analysis. The web is a resource students naturally turn to for finding data for data analysis projects, but without formal instruction on how to get those data into a structured format, they often resort to copy-pasting or manual entry into a spreadsheet, which are both time-consuming and error-prone. Teaching web scraping provides an opportunity to bring such data into the curriculum in an effective and efficient way. In this article, we explain how web scraping works and how it can be implemented in a pedagogically sound and technically executable way at various levels of statistics and data science curricula. We provide classroom activities where we connect this modern computing technique with traditional statistical topics. Finally, we share the opportunities web scraping brings to the classrooms as well as the challenges to instructors and tips for avoiding them.
Pages: S112-S122
Page count: 11