Crosslinguistic Semantic Textual Similarity of Buddhist Chinese and Classical Tibetan

Cited by: 1
Authors
Felbur, Rafal [1]
Meelen, Marieke [2]
Vierthaler, Paul [3]
Affiliations
[1] Leiden Univ, Leiden, Netherlands
[2] Univ Cambridge, Cambridge, England
[3] Coll William & Mary, Williamsburg, VA USA
Funding
European Research Council
Keywords
Cross-linguistic STS; Information Retrieval; Buddhist Chinese; Classical Tibetan; Translation Studies
DOI
10.5334/johd.86
Chinese Library Classification
C [Social Sciences (General)]
Subject classification codes
03; 0303
Abstract
In this paper we present the first-ever procedure for identifying highly similar sequences of text in Chinese and Tibetan translations of Buddhist sūtra literature. We initially propose this procedure as an aid to scholars engaged in the philological study of Buddhist documents. We create a cross-lingual embedding space and take the cosine similarity of averaged sequence vectors in order to produce unsupervised cross-linguistic alignments of similar passages at word, sentence, and even paragraph level. Initial results show that our method lays a solid foundation for the future development of a fully-fledged Information Retrieval tool for these (and potentially other) low-resource historical languages.
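The core comparison step described in the abstract, averaging token vectors over a sequence and comparing a Chinese sequence with Tibetan sequences by cosine similarity, can be illustrated with a minimal Python sketch. It assumes the shared cross-lingual embedding space already exists; the tiny 3-dimensional vectors, the Wylie-transliterated token names, the `threshold` value, the greedy best-match strategy and the function names `average_vector`, `cosine` and `align` are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Toy 3-dimensional vectors, invented for illustration only. They stand in for a
# shared cross-lingual space in which Chinese and (Wylie-transliterated) Tibetan
# tokens with similar meanings lie close together.
EMBEDDINGS: Dict[str, Vector] = {
    "佛": [0.9, 0.1, 0.0],              # "Buddha"
    "說": [0.1, 0.9, 0.1],              # "to speak"
    "法": [0.0, 0.2, 0.9],              # "Dharma"
    "sangs_rgyas": [0.85, 0.15, 0.05],  # "Buddha"
    "gsungs": [0.15, 0.85, 0.1],        # "spoke" (honorific)
    "chos": [0.05, 0.1, 0.95],          # "Dharma"
}


def average_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Return the mean vector of the in-vocabulary tokens (zero vector if none)."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the k-th component of every vector together.
    return [sum(component) / len(known) for component in zip(*known)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(
    source: List[List[str]],
    target: List[List[str]],
    embeddings: Dict[str, Vector],
    threshold: float = 0.5,
) -> List[Tuple[int, int, float]]:
    """For each source sequence, return (source index, best target index, score)
    when the best cosine score reaches the threshold. No labelled parallel
    pairs are used, so the matching is unsupervised."""
    target_vecs = [average_vector(seq, embeddings) for seq in target]
    if not target_vecs:
        return []
    pairs = []
    for i, seq in enumerate(source):
        src_vec = average_vector(seq, embeddings)
        scores = [cosine(src_vec, t) for t in target_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Hypothetical toy passages: each inner list is one tokenised sequence.
zh_sequences = [["佛", "說", "法"], ["法"]]
bo_sequences = [["chos"], ["sangs_rgyas", "chos", "gsungs"]]

if __name__ == "__main__":
    for i, j, score in align(zh_sequences, bo_sequences, EMBEDDINGS):
        print(f"zh[{i}] <-> bo[{j}]  cosine = {score:.3f}")
```

Because the averaged vector does not depend on sequence length, the same `align` call can compare single words, sentences or whole paragraphs. Greedy best-match selection is only one possible alignment strategy; a real tool might instead rank all candidates or enforce one-to-one matches.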
Pages: 14