Deep Aramaic: Towards a synthetic data paradigm enabling machine learning in epigraphy

Authors
Aioanei, Andrei C. [1]
Hunziker-Rodewald, Regine R. [1]
Klein, Konstantin M. [2]
Michels, Dominik L. [3]
Affiliations
[1] Univ Strasbourg, Fac Theol & Religious Sci, Strasbourg, France
[2] Univ Amsterdam, Fac Humanities Hist Ancient Hist, Amsterdam, Netherlands
[3] KAUST, Visual Comp Ctr, Thuwal, Saudi Arabia
Source
PLOS ONE | 2024 / Volume 19 / Issue 04
DOI
10.1371/journal.pone.0299297
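Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract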
Epigraphy is witnessing a growing integration of artificial intelligence, notably through its subfield of machine learning (ML), especially in tasks like extracting insights from ancient inscriptions. However, scarce labeled data for training ML algorithms severely limits current techniques, especially for ancient scripts like Old Aramaic. Our research pioneers an innovative methodology for generating synthetic training data tailored to Old Aramaic letters. Our pipeline synthesizes photo-realistic Aramaic letter datasets, incorporating textural features, lighting, damage, and augmentations to mimic real-world inscription diversity. Despite minimal real examples, we engineer a dataset of 250 000 training and 25 000 validation images covering the 22 letter classes in the Aramaic alphabet. This comprehensive corpus provides a robust volume of data for training a residual neural network (ResNet) to classify highly degraded Aramaic letters. The ResNet model demonstrates 95% accuracy in classifying real images from the 8th century BCE Hadad statue inscription. Additional experiments validate performance on varying materials and styles, proving effective generalization. Our results validate the model's capabilities in handling diverse real-world scenarios, proving the viability of our synthetic data approach and avoiding the dependence on scarce training data that has constrained epigraphic analysis. Our innovative framework elevates interpretation accuracy on damaged inscriptions, thus enhancing knowledge extraction from these historical resources.
Pages: 29