Optimal Transport-based Alignment of Learned Character Representations for String Similarity

Cited by: 0
Authors
Tam, Derek [1]
Monath, Nicholas [1]
Kobren, Ari [1]
Traylor, Aaron [2]
Das, Rajarshi [1]
McCallum, Andrew [1]
Affiliations
[1] Univ Massachusetts, Coll Informat & Comp Sci, Amherst, MA 01003 USA
[2] Brown Univ, Dept Comp Sci, Providence, RI 02912 USA
Funding
National Science Foundation (USA)
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn iteration (alignment is posed as an instance of optimal transport), and scores the alignment with a convolutional neural network. We evaluate STANCE's ability to detect whether two strings can refer to the same entity, a task we term alias detection. We construct five new alias detection datasets (and make them publicly available). We show that STANCE (or one of its variants) outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE's ability to improve downstream tasks by applying it to an instance of cross-document coreference and show that it leads to a 2.8 point improvement in B³ F1 over the previous state-of-the-art approach.
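The alignment step mentioned in the abstract can be illustrated with a minimal sketch of Sinkhorn iteration for entropy-regularized optimal transport. This is not the STANCE implementation: the function names (sinkhorn, char_cost), the parameters (reg, n_iters), the uniform marginals, and the exact-match character cost are all assumptions made for illustration. STANCE uses learned character encodings and scores the resulting alignment with a convolutional network, neither of which is reproduced here.

```python
import math


def sinkhorn(cost, reg=0.5, n_iters=200):
    """Approximate an optimal transport plan by Sinkhorn iteration.

    cost is an n x m list of non-negative costs. Both marginals are
    assumed uniform (a hypothetical choice for this sketch).
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass on each character of the first string
    b = [1.0 / m] * m  # mass on each character of the second string
    # Gibbs kernel: low cost gives a large kernel entry.
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: diag(u) * K * diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def char_cost(s, t):
    """Toy cost: 0 for identical characters, 1 otherwise.

    A stand-in for distances between learned character encodings.
    """
    return [[0.0 if cs == ct else 1.0 for ct in t] for cs in s]


plan = sinkhorn(char_cost("abc", "abd"))
for row in plan:
    print(" ".join(f"{p:.3f}" for p in row))
```

In the printed plan, matching characters receive more mass than mismatched ones, giving a soft alignment between the two strings. A smaller reg sharpens the alignment but generally needs more iterations to converge.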
Pages: 5907-5917
Number of pages: 11