Manu: A Cloud Native Vector Database Management System

Cited by: 13
Authors
Guo, Rentong [1]
Luan, Xiaofan [1]
Xiang, Long [2]
Yan, Xiao [2]
Yi, Xiaomeng [1]
Luo, Jigao [1, 3]
Cheng, Qianya [1]
Xu, Weizhi [1]
Luo, Jiarui [2]
Liu, Frank [1]
Cao, Zhenshan [1]
Qiao, Yanliang [1]
Wang, Ting [1]
Tang, Bo [2]
Xie, Charles [1]
Affiliations
[1] Zilliz, Redwood City, CA 94065 USA
[2] Southern Univ Sci & Technol, Shenzhen, Peoples R China
[3] Tech Univ Munich, Munich, Germany
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2022 / Vol. 15 / No. 12
Keywords
NEAREST-NEIGHBOR SEARCH; PRODUCT QUANTIZATION;
DOI
10.14778/3554821.3554843
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the development of learning-based embedding models, embedding vectors are widely used for analyzing and searching unstructured data. As vector collections exceed billion-scale, fully managed and horizontally scalable vector databases are necessary. In the past three years, through interaction with our 1200+ industry users, we have sketched a vision for the features that next-generation vector databases should have, which include long-term evolvability, tunable consistency, good elasticity, and high performance. We present Manu, a cloud native vector database that implements these features. It is difficult to integrate all these features if we follow traditional DBMS design rules. As most vector data applications do not require complex data models and strong data consistency, our design philosophy is to relax the data model and consistency constraints in exchange for the aforementioned features. Specifically, Manu firstly exposes the write-ahead log (WAL) and binlog as backbone services. Secondly, write components are designed as log publishers while all read-only analytic and search components are designed as independent subscribers to the log services. Finally, we utilize multi-version concurrency control (MVCC) and a delta consistency model to simplify the communication and cooperation among the system components. These designs achieve a low coupling among the system components, which is essential for elasticity and evolution. We also extensively optimize Manu for performance and usability with hardware-aware implementations and support for complex search semantics. Manu has been used for many applications, including, but not limited to, recommendation, multimedia, language, medicine and security. We evaluated Manu in three typical application scenarios to demonstrate its efficiency, elasticity, and scalability.
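The log-centric design described in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of the delta consistency model as bounded staleness: a subscriber serves a query only if its local view lags the query timestamp by less than a tolerated amount. All names here (`Log`, `SearchNode`, `can_serve`, `staleness`) are hypothetical and are not Manu's API; MVCC and the binlog are left out.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    ts: int        # logical timestamp assigned by the publisher (assumed)
    key: str
    vector: list


@dataclass
class Log:
    """Append-only log shared by publishers and subscribers (hypothetical)."""
    entries: list = field(default_factory=list)

    def publish(self, entry: LogEntry) -> None:
        self.entries.append(entry)


@dataclass
class SearchNode:
    """Read-only subscriber that applies log entries to a local view."""
    log: Log
    offset: int = 0      # next log position to consume
    synced_ts: int = 0   # timestamp of the latest applied entry
    view: dict = field(default_factory=dict)

    def poll(self, max_entries: int) -> None:
        # Consume at most max_entries new entries from the log.
        batch = self.log.entries[self.offset:self.offset + max_entries]
        for entry in batch:
            self.view[entry.key] = entry.vector
            self.synced_ts = entry.ts
            self.offset += 1

    def can_serve(self, query_ts: int, staleness: int) -> bool:
        # Bounded staleness (assumed rule): serve only if the local view
        # lags the query timestamp by less than the tolerated amount.
        return query_ts - self.synced_ts < staleness


# A write component publishes five entries; the subscriber applies three.
log = Log()
for ts in range(1, 6):
    log.publish(LogEntry(ts=ts, key=f"v{ts}", vector=[float(ts)]))

node = SearchNode(log)
node.poll(max_entries=3)
```

In this sketch the writer and the search node share only the log, so either side could be scaled or replaced without touching the other, which is the low-coupling property the abstract emphasizes. A larger `staleness` value lets a lagging node answer immediately, while a smaller one forces it to catch up first.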
Pages: 3548-3561
Number of pages: 14