Unified Visual Preference Learning for User Intent Understanding

Cited by: 1
Authors
Wen, Yihua [1 ,3 ]
Chen, Si [2 ]
Tian, Yu [1 ,3 ]
Guan, Wanxian [2 ]
Wang, Pengjie [2 ]
Deng, Hongbo [2 ]
Xu, Jian [2 ]
Zheng, Bo [2 ]
Li, Zihao [2 ]
Zou, Lixin [1 ]
Li, Chenliang [1 ]
Affiliations
[1] Wuhan Univ, Minist Educ, Sch Cyber Sci & Engn, Key Lab Aerosp Informat Secur & Trusted Comp, Wuhan, Peoples R China
[2] Alibaba Grp China, Hangzhou, Peoples R China
[3] Alibaba Grp, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual Preference Modeling; Disentangled Representation Learning; Search and Recommendation; SEARCH;
DOI
10.1145/3616855.3635858
CLC (Chinese Library Classification) number
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the world of E-Commerce, the core task is to understand personalized preference from various kinds of heterogeneous information, such as textual reviews, item images, and historical behaviors. In current systems, this heterogeneous information is mainly exploited to generate better item or user representations. For example, in the visual search scenario, the importance of modeling the query image has been widely acknowledged. However, these existing solutions focus on improving the representation quality of the query image, overlooking the user's personalized visual preference. Note that visual features significantly affect the user's decision, e.g., a user could be more likely to click the items with her preferred design. Hence, it is fruitful to exploit the visual preference to deliver better capacity for personalization. To this end, we propose a simple yet effective target-aware visual preference learning framework (named Tavern) for both item recommendation and search. The proposed Tavern works as an individual and generic model that can be smoothly plugged into different downstream systems. Specifically, for visual preference learning, we utilize the image of the target item to derive the visual preference signals for each historical clicked item. This procedure is modeled as a form of representation disentanglement, where the visual preference signals are extracted by stripping away the noisy information irrelevant to visual preference from the visual information shared between the target and historical items. During this process, a novel selective orthogonality disentanglement is proposed to avoid significant information loss. Then, a GRU network is utilized to aggregate these signals into the final visual preference representation.
Extensive experiments over three large-scale real-world datasets covering visual search, product search, and recommendation demonstrate the superiority of our proposed Tavern against existing technical alternatives. A further ablation study also confirms the validity of each design choice.
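The abstract's pipeline (per-item preference signals derived from the target image, then GRU aggregation over the click history) can be sketched in NumPy. Everything below is an illustrative assumption, not the paper's method: the projection onto the target embedding stands in for extracting "shared visual information" (the paper's selective orthogonality disentanglement is a learned mechanism), and the `MiniGRU` class with its fixed random weights is only a stand-in for a trained GRU network.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_preference(target, hist):
    """Keep, for each historical item embedding, only its component
    along the target item's image embedding (a stand-in for the
    'shared visual information'); the orthogonal residual plays the
    role of preference-irrelevant noise and is discarded."""
    t = target / np.linalg.norm(target)
    shared = (hist @ t)[:, None] * t[None, :]  # per-row projection onto t
    return shared

class MiniGRU:
    """A single-layer GRU cell in NumPy, used to aggregate the per-item
    preference signals into one visual preference vector."""
    def __init__(self, dim, rng):
        s = 1.0 / np.sqrt(dim)
        self.Wz, self.Uz = rng.uniform(-s, s, (dim, dim)), rng.uniform(-s, s, (dim, dim))
        self.Wr, self.Ur = rng.uniform(-s, s, (dim, dim)), rng.uniform(-s, s, (dim, dim))
        self.Wh, self.Uh = rng.uniform(-s, s, (dim, dim)), rng.uniform(-s, s, (dim, dim))

    def __call__(self, xs):
        sig = lambda v: 1.0 / (1.0 + np.exp(-v))
        h = np.zeros(xs.shape[1])
        for x in xs:  # walk the click sequence in temporal order
            z = sig(self.Wz @ x + self.Uz @ h)           # update gate
            r = sig(self.Wr @ x + self.Ur @ h)           # reset gate
            h_new = np.tanh(self.Wh @ x + self.Uh @ (r * h))
            h = (1 - z) * h + z * h_new
        return h

dim, seq_len = 8, 5
target = rng.normal(size=dim)            # target item image embedding
hist = rng.normal(size=(seq_len, dim))   # historical clicked-item embeddings
signals = extract_preference(target, hist)
pref = MiniGRU(dim, rng)(signals)        # final visual preference vector
print(pref.shape)  # (8,)
```

By construction, the discarded residual `hist - signals` is orthogonal to the target direction, which is the sense in which this toy projection mimics an orthogonality-based disentanglement.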
Pages: 816-825 (10 pages)
Related papers (50 records)
  • [21] User preference through learning user profile for ubiquitous recommendation systems
    Jung, Kyung-Yong
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2006, 4251 : 163 - 170
  • [22] Multimodal Analysis of Image Search Intent: Intent Recognition in Image Search from User Behavior and Visual Content
    Soleymani, Mohammad
    Riegler, Michael
    Halvorsen, Pal
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017, : 256 - 264
  • [23] Deep Learning for Visual Understanding
    Porikli, Fatih
    Shan, Shiguang
    Snoek, Cees
    Sukthankar, Rahul
    Wang, Xiaogang
    IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06) : 24 - 25
  • [24] Understanding user intent modeling for conversational recommender systems: a systematic literature review
    Farshidi, Siamak
    Rezaee, Kiyan
    Mazaheri, Sara
    Rahimi, Amir Hossein
    Dadashzadeh, Ali
    Ziabakhsh, Morteza
    Eskandari, Sadegh
    Jansen, Slinger
    USER MODELING AND USER-ADAPTED INTERACTION, 2024, : 1643 - 1706
  • [25] An Efficient Way of Answering the Questions Asked on Social Sites by Understanding User Intent
    Kharche, Shital E.
    Mante, Ravi V.
    2017 INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRICAL, ELECTRONICS AND COMPUTING TECHNOLOGIES (ICRTEECT), 2017, : 159 - 163
  • [26] Multi-Task Deep Learning Design and Training Tool for Unified Visual Driving Scene Understanding
    Won, Woong-Jae
    Kim, Tae Hun
    Kwon, Soon
    2019 19TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2019), 2019, : 356 - 360
  • [27] Visual Query Suggestion: Towards Capturing User Intent in Internet Image Search
    Zha, Zheng-Jun
    Yang, Linjun
    Mei, Tao
    Wang, Meng
    Wang, Zengfu
    Chua, Tat-Seng
    Hua, Xian-Sheng
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2010, 6 (03)
  • [28] Learning a Unified Embedding for Visual Search at Pinterest
    Zhai, Andrew
    Wu, Hao-Yu
    Tzeng, Eric
    Park, Dong Huk
    Rosenberg, Charles
    KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 2412 - 2420
  • [29] Learning to Infer User Implicit Preference in Conversational Recommendation
    Hu, Chenhao
    Huang, Shuhua
    Zhang, Yansen
    Liu, Yubao
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 256 - 266
  • [30] A model of machine learning based on user preference of attributes
    Yao, Yiyu
    Zhao, Yan
    Wang, Jue
    Han, Suqing
    ROUGH SETS AND CURRENT TRENDS IN COMPUTING, PROCEEDINGS, 2006, 4259 : 587 - 596