The Anatomy of a Search and Mining System for Digital Humanities

Authors
Harris, Martyn [1 ]
Levene, Mark [1 ]
Zhang, Dell [1 ]
Levene, Dan [2 ]
Affiliations
[1] Birkbeck Univ London, Dept Comp Sci, London WC1E 7HX, England
[2] Univ Southampton, Sch Humanities, Hist, Southampton SO17 1BF, Hants, England
Keywords
Digital Humanities; Statistical Language Model; Suffix Tree; Sequence Alignment; Collaborative Search
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Samtla (Search And Mining Tools with Linguistic Analysis) is an online integrated research environment designed in collaboration with historians and linguists to facilitate the study of digitised texts written in any language. It currently supports the research of two corpora: the Genizah collection held by the Taylor-Schechter Genizah Research Unit in Cambridge University, and a collection of Aramaic incantation texts from late antiquity. In contrast to standard search engines and text mining systems that rely on the bag-of-words representation of text, Samtla provides the retrieval and discovery of fuzzy text patterns/motifs (aka "formulae" to historians), which is achieved through applying a character-based n-gram statistical language model built on top of a powerful generalised suffix tree data structure. This paper briefly describes the major components of Samtla and their underlying techniques.
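As a rough illustration of how a character-based n-gram language model can rank texts against a possibly misspelled or variant query, the sketch below scores a query under a separate model for each document. It is not Samtla's implementation: it uses plain hash-table counts rather than a generalised suffix tree, and the add-one smoothing, the trigram order, and every name in it (`CharNgramModel`, `rank`, the sample documents) are assumptions made for the example.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class CharNgramModel:
    """Character n-gram model with add-one smoothing (an assumed choice)."""

    def __init__(self, text, n, vocab_size):
        self.n = n
        self.vocab_size = vocab_size
        self.ngrams = Counter(char_ngrams(text, n))
        # Count each (n-1)-character context by summing over the n-grams
        # that start with it.
        self.contexts = Counter()
        for gram, count in self.ngrams.items():
            self.contexts[gram[:-1]] += count

    def log_prob(self, query):
        """Smoothed log-probability of the query's n-grams under this model."""
        total = 0.0
        for gram in char_ngrams(query, self.n):
            numerator = self.ngrams[gram] + 1
            denominator = self.contexts[gram[:-1]] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def rank(documents, query, n=3):
    """Return document names ordered from best to worst match for the query."""
    # A shared alphabet keeps the smoothing comparable across documents.
    alphabet = set(query)
    for text in documents.values():
        alphabet.update(text)
    scores = {
        name: CharNgramModel(text, n, len(alphabet)).log_prob(query)
        for name, text in documents.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox",
        "b": "lorem ipsum dolor sit amet",
    }
    # The query contains a deliberate misspelling of "brown".
    print(rank(docs, "quick brwn"))
```

Because the model works on characters rather than whole words, a query that differs from the text by a letter or two still shares most of its n-grams with the matching document, which is what makes this kind of matching tolerant of spelling variation.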
Pages: 165-168
Page count: 4