Optimal Transport-based Alignment of Learned Character Representations for String Similarity

Authors
Tam, Derek [1]
Monath, Nicholas [1]
Kobren, Ari [1]
Traylor, Aaron [2]
Das, Rajarshi [1]
McCallum, Andrew [1]
Affiliations
[1] Univ Massachusetts, Coll Informat & Comp Sci, Amherst, MA 01003 USA
[2] Brown Univ, Dept Comp Sci, Providence, RI 02912 USA
Funding
National Science Foundation (USA)
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn iteration (alignment is posed as an instance of optimal transport), and scores the alignment with a convolutional neural network. We evaluate STANCE's ability to detect whether two strings can refer to the same entity, a task we term alias detection. We construct five new alias detection datasets (and make them publicly available). We show that STANCE (or one of its variants) outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE's ability to improve downstream tasks by applying it to an instance of cross-document coreference and show that it leads to a 2.8 point improvement in B³ F1 over the previous state-of-the-art approach.
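The alignment step named in the abstract can be illustrated with a minimal sketch of Sinkhorn iteration for entropy-regularized optimal transport. This is not the STANCE implementation: the 0/1 character-mismatch cost (`char_cost`), the uniform marginals, the regularization value `reg`, and all function names are assumptions made for illustration. The paper describes learned character encodings and a convolutional scorer, neither of which is reproduced here.

```python
import math

def char_cost(s, t):
    # Hypothetical stand-in for distances between learned character
    # encodings: cost 0 when two characters match, 1 otherwise.
    return [[0.0 if cs == ct else 1.0 for ct in t] for cs in s]

def sinkhorn(cost, reg=0.5, n_iters=200):
    """Return a soft alignment (transport plan) between rows and columns.

    Assumes uniform mass on the characters of each string.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass on characters of the first string
    b = [1.0 / m] * m  # mass on characters of the second string
    # Gibbs kernel: low-cost pairs receive large kernel values.
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Soft alignment between the characters of two candidate aliases.
plan = sinkhorn(char_cost("bob", "rob"))
for ch, row in zip("bob", plan):
    print(ch, [round(p, 3) for p in row])
```

Each row of `plan` shows how the mass of one character in the first string is spread over the characters of the second. In a model of the kind the abstract describes, a matrix like this would be passed to a learned scorer rather than read off directly.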
Pages: 5907-5917
Number of pages: 11
Related papers
50 in total
  • [1] Similarity and economy of scale in urban transportation networks and optimal transport-based infrastructures
    Leite, Daniela
    De Bacco, Caterina
    NATURE COMMUNICATIONS, 2024, 15 (01)
  • [2] Efficient and Effective Optimal Transport-Based Biclustering
    Fettal, Chakib
    Labiod, Lazhar
    Nadif, Mohamed
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [3] Toward Interpretable Semantic Textual Similarity via Optimal Transport-based Contrastive Sentence Learning
    Lee, Seonghyeon
    Lee, Dongha
    Jang, Seongbo
    Yu, Hwanjo
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 5969 - 5979
  • [4] An optimal transport-based characterization of convex order
    Wiesel, Johannes
    Zhang, Erica
    DEPENDENCE MODELING, 2023, 11 (01)
  • [5] Enhancing Multi-modal Contrastive Learning via Optimal Transport-Based Consistent Modality Alignment
    Zhu, Sidan
    Luo, Dixin
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XI, 2025, 15041 : 157 - 171
  • [6] Optimal Transport-Based Polar Interpolation of Directional Fields
    Solomon, Justin
    Vaxman, Amir
    ACM TRANSACTIONS ON GRAPHICS, 2019, 38 (04)
  • [7] Convergence Properties of Optimal Transport-Based Temporal Networks
    Baptista, Diego
    De Bacco, Caterina
    COMPLEX NETWORKS & THEIR APPLICATIONS X, VOL 1, 2022, 1015 : 578 - 592
  • [8] Convergence properties of optimal transport-based temporal hypergraphs
    Diego Baptista
    Caterina De Bacco
    Applied Network Science, 8
  • [9] Convergence properties of optimal transport-based temporal hypergraphs
    Baptista, Diego
    De Bacco, Caterina
    APPLIED NETWORK SCIENCE, 2023, 8 (01)
  • [10] An Unbalanced Optimal Transport-Based Approach for Robust Dictionary Learning
    Wang, Shengjia
    Wang, Zhiguo
    Zhao, Xi-Le
    Shen, Xiaojing
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025