Deep Aramaic: Towards a synthetic data paradigm enabling machine learning in epigraphy

Authors
Aioanei, Andrei C. [1]
Hunziker-Rodewald, Regine R. [1]
Klein, Konstantin M. [2]
Michels, Dominik L. [3]
Affiliations
[1] Univ Strasbourg, Fac Theol & Religious Sci, Strasbourg, France
[2] Univ Amsterdam, Fac Humanities Hist Ancient Hist, Amsterdam, Netherlands
[3] KAUST, Visual Comp Ctr, Thuwal, Saudi Arabia
Source
PLOS ONE | 2024 / Vol. 19 / Issue 04
DOI
10.1371/journal.pone.0299297
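Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract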
Epigraphy is witnessing a growing integration of artificial intelligence, notably through its subfield of machine learning (ML), especially in tasks like extracting insights from ancient inscriptions. However, scarce labeled data for training ML algorithms severely limits current techniques, especially for ancient scripts like Old Aramaic. Our research pioneers an innovative methodology for generating synthetic training data tailored to Old Aramaic letters. Our pipeline synthesizes photo-realistic Aramaic letter datasets, incorporating textural features, lighting, damage, and augmentations to mimic real-world inscription diversity. Despite minimal real examples, we engineer a dataset of 250 000 training and 25 000 validation images covering the 22 letter classes in the Aramaic alphabet. This comprehensive corpus provides a robust volume of data for training a residual neural network (ResNet) to classify highly degraded Aramaic letters. The ResNet model demonstrates 95% accuracy in classifying real images from the 8th century BCE Hadad statue inscription. Additional experiments validate performance on varying materials and styles, proving effective generalization. Our results validate the model's capabilities in handling diverse real-world scenarios, proving the viability of our synthetic data approach and avoiding the dependence on scarce training data that has constrained epigraphic analysis. Our innovative framework elevates interpretation accuracy on damaged inscriptions, thus enhancing knowledge extraction from these historical resources.
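As a rough illustration of the idea described in the abstract (labeled letter images degraded with lighting, damage, and noise across 22 classes), the following pure-Python sketch generates a tiny synthetic dataset. It is not the authors' pipeline, which produces photo-realistic renderings. The glyph shapes, the function names `render_glyph`, `degrade`, and `make_dataset`, and all parameter values are hypothetical placeholders chosen for this example.

```python
import random


def render_glyph(label, size=16):
    """Draw a placeholder glyph (one vertical stroke, one horizontal bar).

    These shapes are hypothetical stand-ins, not real Aramaic letters.
    The stroke column and bar row depend on the label so that each of
    the 22 classes gets a distinct pattern.
    """
    img = [[0.0] * size for _ in range(size)]
    col = 2 + label % 12
    row = 2 + label % 11
    for r in range(2, size - 2):
        img[r][col] = 1.0
    for c in range(2, size - 2):
        img[row][c] = 1.0
    return img


def degrade(img, rng, damage_fraction=0.15, light_strength=0.3, noise_sigma=0.05):
    """Return a degraded copy: lighting gradient, noise, then an erased patch."""
    size = len(img)
    out = [row[:] for row in img]

    # Lighting: a horizontal brightness gradient with a random direction.
    direction = rng.choice([-1, 1])
    for r in range(size):
        for c in range(size):
            shade = light_strength * direction * (c / (size - 1) - 0.5)
            noisy = out[r][c] + shade + rng.gauss(0.0, noise_sigma)
            out[r][c] = min(1.0, max(0.0, noisy))  # clamp to [0, 1]

    # Damage: erase a randomly placed square patch.
    patch = max(1, int(size * damage_fraction))
    r0 = rng.randrange(size - patch + 1)
    c0 = rng.randrange(size - patch + 1)
    for r in range(r0, r0 + patch):
        for c in range(c0, c0 + patch):
            out[r][c] = 0.0
    return out


def make_dataset(n_classes=22, per_class=4, seed=0):
    """Build a list of (image, label) pairs from degraded placeholder glyphs."""
    rng = random.Random(seed)
    data = []
    for label in range(n_classes):
        base = render_glyph(label)
        for _ in range(per_class):
            data.append((degrade(base, rng), label))
    return data


if __name__ == "__main__":
    dataset = make_dataset()
    print(len(dataset), "synthetic images over", len({y for _, y in dataset}), "classes")
```

Seeding the generator keeps the output reproducible. In a realistic setting the placeholder glyphs would be replaced by rendered letter forms and the resulting images fed to a classifier such as the ResNet mentioned in the abstract.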
Pages: 29