Conversational Composed Retrieval with Iterative Sequence Refinement

Cited by: 1
Authors
Wei, Hao [1, 3]
Wang, Shuhui [1]
Xue, Zhe [2]
Chen, Shengbo [4]
Huang, Qingming [1, 3]
Affiliations
[1] Chinese Acad Sci, Inst Comput Tech, Key Lab Intell Info Proc, Beijing, Peoples R China
[2] BUPT, Beijing Key Lab Intelligent Telecommun Software, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
[4] Henan Univ, Sch Comp & Informat Engn, Kaifeng, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Cross-modal Retrieval; Conversational Search; Sequence Modeling;
DOI
10.1145/3581783.3611885
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Owing to progress in large-scale multimodal model pretraining, existing cross-modal retrieval techniques can accurately align a text description with the target image when the two show close and clear semantic correspondence. However, in real situations, users provide only ambiguous text queries, making it difficult to retrieve the desired images. To address this issue, we introduce the conversational composed retrieval paradigm, inspired by conversational search, which models complex user intent through iterative interaction. This paradigm enhances the model's capacity to learn fine-grained correspondences. To train a cross-modal conversational retrieval model, we propose the Iterative Refining Retrieval (IRR) framework. It formalizes the reference images and modification texts in each session as a multimodal sequence, which is fed into a generative model that predicts the information in the sequence autoregressively and ultimately predicts the target image feature. In the conversational retrieval paradigm, the model refines the learned correspondences based on the interaction in the later stages of the retrieval session, and thus captures fine-grained semantic correspondence to reinforce the cross-modal representation. We propose a domain-specific multimodal pretraining method and a full sequence sampling augmentation method to fully utilize the session information. Extensive experiments demonstrate that the iterative refining retrieval method achieves state-of-the-art performance on sessions of varying lengths.
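The sequence formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_sequence`, `prefix_samples`, `rank_gallery`), the toy feature vectors, and the reading of "full sequence sampling" as "every session prefix becomes a training sample" are all assumptions made for illustration. The generative model itself is omitted; the sketch covers only how a session might be laid out as a multimodal sequence and how a predicted target feature could be matched against a gallery by cosine similarity.

```python
import math

def build_sequence(images, texts):
    """Interleave features as image, text, image, text, ...

    Assumption: `images` holds one more entry than `texts`; the last
    image is the session target and is returned separately.
    """
    if len(images) != len(texts) + 1:
        raise ValueError("expected one more image than modification texts")
    sequence = []
    for img, txt in zip(images, texts):
        sequence.append(("image", img))
        sequence.append(("text", txt))
    return sequence, images[-1]

def prefix_samples(images, texts):
    """Hypothetical augmentation: each session prefix yields one
    (sequence, target) pair, so a T-turn session gives T samples."""
    return [build_sequence(images[:k + 1], texts[:k])
            for k in range(1, len(texts) + 1)]

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_gallery(predicted, gallery):
    """Gallery indices sorted by descending similarity to `predicted`."""
    return sorted(range(len(gallery)),
                  key=lambda i: cosine(predicted, gallery[i]),
                  reverse=True)

# Toy two-turn session with 2-dimensional stand-in features.
images = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
texts = [[0.1, 0.9], [0.2, 0.8]]
sequence, target = build_sequence(images, texts)
samples = prefix_samples(images, texts)
ranking = rank_gallery(target, [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

In a real system the features would come from pretrained image and text encoders, and `predicted` would be the output of the autoregressive model rather than the ground-truth target used here for demonstration.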
Pages: 6390-6399
Page count: 10