Building A Large Collection of Multi-domain Electronic Theses and Dissertations

Cited by: 5
Authors
Uddin, Sami [1]
Banerjee, Bipasha [2]
Wu, Jian [1]
Ingram, William A. [3]
Fox, Edward A. [2]
Affiliations
[1] Old Dominion Univ, Comp Sci, Norfolk, VA 23529 USA
[2] Virginia Polytech Inst & State Univ, Comp Sci, Blacksburg, VA USA
[3] Virginia Polytech Inst & State Univ, Univ Lib, Blacksburg, VA USA
Keywords
ETD; OAI-PMH; Big data;
DOI
10.1109/BigData52589.2021.9672058
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In this work, we report our progress on building a collection of over 450k Electronic Theses and Dissertations (ETDs), including full text and metadata. Our goal is to close the accessibility gap between long-text and short-text documents and to create a new research opportunity for the scholarly community. To that end, we developed an ETD Ingestion Framework (EIF) that automatically harvests metadata and PDFs of ETDs from university libraries. We faced multiple challenges and learned many lessons during the process, which led us to propose solutions that overcome or mitigate the limitations of the current data. We also describe the data that we have collected. We hope our methods will be useful for building similar collections from university libraries and that the data can be used for research and education.
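The keywords list OAI-PMH, the standard protocol many university repositories expose for metadata harvesting. The abstract does not describe EIF's internals, so the following is only a minimal sketch of what one harvesting step could look like, not the authors' implementation. The base URL `https://example.edu/oai`, the sample response, and the helper names `build_list_records_url` and `parse_list_records` are all hypothetical; the `ListRecords` verb, the `oai_dc` metadata prefix, the `resumptionToken` element, and the XML namespaces come from the OAI-PMH 2.0 specification.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response from an imaginary repository, used so the
# sketch runs without network access.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.edu:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
          <dc:identifier>https://example.edu/etd/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def build_list_records_url(base_url, token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL (hypothetical helper).

    Per the protocol, resumptionToken is an exclusive argument: when it
    is present, metadataPrefix must not be sent.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier",
                                   default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            # dc:identifier often holds a landing-page or file URL.
            "links": [e.text for e in rec.iterfind(".//dc:identifier", NS)
                      if e.text],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An empty or missing token means there are no further pages.
    return records, (token.strip() or None)


records, next_token = parse_list_records(SAMPLE)
next_url = build_list_records_url("https://example.edu/oai", token=next_token)
```

A real harvester would fetch each URL, repeat until `next_token` is `None`, and then resolve the links in each record to locate the PDF, a step that typically varies from one repository to another.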
Pages: 6043-6045
Page count: 3
Related papers
50 in total
  • [41] A Large Multilingual and Multi-domain Dataset for Recommender Systems
    Di Tommaso, Giorgia
    Faralli, Stefano
    Velardi, Paola
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018: 2806 - 2813
  • [42] Designing a Model for Description of Theses and Dissertations Information on a Large Scale
    Alidousti, Sirous
    Khosrowjerdi, Mahmood
    Shahriari, Parviz
    Shirani, Farhad
    Tarnoni, Hamideh Beyrami
    LIBRI, 2009, 59 (03): 180 - 197
  • [43] Enabling privacy by anonymization in the collection of similar data in multi-domain IoT
    Neto, Renato Caminha Juacaba
    Merindol, Pascal
    Theoleyre, Fabrice
    COMPUTER COMMUNICATIONS, 2023, 203 : 60 - 76
  • [45] Electronic theses and dissertations in Nigeria university libraries: Status, challenges and strategies
    Ezema, Ifeanyi J.
    Ugwu, C. I.
    ELECTRONIC LIBRARY, 2013, 31 (04): 493 - 507
  • [46] Slipping through the net: the paradox of nursing's electronic theses and dissertations
    Macduff, C.
    Goodfellow, L. M.
    Nolfi, D.
    Copeland, S.
    Leslie, G. D.
    Blackwood, D.
    INTERNATIONAL NURSING REVIEW, 2016, 63 (02) : 267 - 276
  • [47] An Analysis of Evolving Metadata Influences, Standards, and Practices in Electronic Theses and Dissertations
    Potvin, Sarah
    Thompson, Santi
    LIBRARY RESOURCES & TECHNICAL SERVICES, 2016, 60 (02): 99 - 114
  • [48] Mandatory Open Access Publishing for Electronic Theses and Dissertations: Ethics and Enthusiasm
    Hawkins, Ann R.
    Kimball, Miles A.
    Ives, Maura
    JOURNAL OF ACADEMIC LIBRARIANSHIP, 2013, 39 (01): 32 - 60
  • [49] An Experiment with the Use of ChatGPT for LCSH Subject Assignment on Electronic Theses and Dissertations
    Chow, Eric H. C.
    Kao, T. J.
    Li, Xiaoli
    CATALOGING & CLASSIFICATION QUARTERLY, 2024, 62 (05) : 574 - 588
  • [50] MetaEnhance: Metadata Quality Improvement for Electronic Theses and Dissertations of University Libraries
    Choudhury, Muntabir Hasan
    Salsabil, Lamia
    Jayanetti, Himarsha R.
    Wu, Jian
    Ingram, William A.
    Fox, Edward A.
    2023 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES, JCDL, 2023: 61 - 65