Efficient in-memory extensible inverted file

Cited by: 10
Authors
Luk, Robert W. P.
Lam, Wai
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
[2] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
Keywords
information retrieval; indexing; optimization
DOI
10.1016/j.is.2006.06.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The growing amount of on-line data demands efficient parallel and distributed indexing mechanisms to manage large resource requirements and unpredictable system failures. Parallel and distributed indices built on commodity hardware such as personal computers (PCs) can save substantial cost because PCs are produced in bulk, achieving economies of scale. However, PCs have a limited amount of random access memory (RAM), so the effective utilization of RAM for in-memory inversion is crucial. This paper presents an analytical investigation and an empirical evaluation of storage-efficient in-memory extensible inverted files, which are represented by fixed- or variable-sized linked list nodes. The size of these linked list nodes is determined by minimizing storage waste or maximizing storage utilization under different conditions, which leads to different storage allocation schemes. Minimizing storage waste also reduces the number of address indirections (i.e., chainings). We evaluated our storage allocation schemes using a number of reference collections. We found that the arrival rate scheme is the best in terms of both storage utilization and the mean number of chainings per term. The final storage utilization can exceed 90% in our evaluation if a sufficient number of documents is indexed. The mean number of chainings is not large (less than 2.6 for all the reference collections). We have also shown that our best storage allocation scheme can be used for our extensible compressed inverted file. The final storage utilization of the extensible compressed inverted file can also exceed 90% in our evaluation, provided that a sufficient number of documents is indexed. The proposed storage allocation schemes can also be used by compressed extensible inverted files with word positions. © 2006 Elsevier B.V. All rights reserved.
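As an illustration of the data structure the abstract describes, the following is a minimal sketch of an extensible inverted file whose postings lists are chains of fixed-size linked list nodes. It is not the paper's implementation: the class and method names (`Node`, `ExtensibleInvertedFile`, `utilization`, `mean_chainings`) and the `node_capacity` parameter are hypothetical, the arrival rate scheme is not reproduced because the abstract does not specify it, and utilization here counts posting slots only, ignoring pointer overhead.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """One linked-list node holding up to `capacity` postings (document IDs)."""
    capacity: int
    postings: List[int] = field(default_factory=list)
    next: Optional["Node"] = None


class ExtensibleInvertedFile:
    """Hypothetical sketch: each term maps to a chain of fixed-size nodes."""

    def __init__(self, node_capacity: int = 4) -> None:
        self.node_capacity = node_capacity  # assumed fixed node size
        self.heads: Dict[str, Node] = {}    # first node of each term's chain
        self.tails: Dict[str, Node] = {}    # last node, for O(1) appends

    def add(self, term: str, doc_id: int) -> None:
        """Append a posting; the caller supplies each (term, doc) pair once."""
        tail = self.tails.get(term)
        if tail is None:
            # First posting for this term: allocate the head node.
            tail = Node(self.node_capacity)
            self.heads[term] = self.tails[term] = tail
        elif len(tail.postings) == tail.capacity:
            # Tail is full: chain a new node (one more address indirection).
            new_node = Node(self.node_capacity)
            tail.next = new_node
            self.tails[term] = tail = new_node
        tail.postings.append(doc_id)

    def postings(self, term: str) -> List[int]:
        """Collect a term's postings by following its chain of nodes."""
        result: List[int] = []
        node = self.heads.get(term)
        while node is not None:
            result.extend(node.postings)
            node = node.next
        return result

    def _nodes(self) -> Iterator[Node]:
        for head in self.heads.values():
            node: Optional[Node] = head
            while node is not None:
                yield node
                node = node.next

    def utilization(self) -> float:
        """Used posting slots divided by allocated posting slots."""
        used = allocated = 0
        for node in self._nodes():
            used += len(node.postings)
            allocated += node.capacity
        return used / allocated if allocated else 0.0

    def mean_chainings(self) -> float:
        """Mean number of next-pointer hops per term (nodes beyond the head)."""
        if not self.heads:
            return 0.0
        total_nodes = sum(1 for _ in self._nodes())
        return (total_nodes - len(self.heads)) / len(self.heads)


index = ExtensibleInvertedFile(node_capacity=2)
for doc_id, text in enumerate(["a b", "a c", "a b"]):
    for term in text.split():
        index.add(term, doc_id)
print(index.postings("a"), index.utilization(), index.mean_chainings())
```

In this sketch a larger `node_capacity` lowers the number of chainings but leaves more unused slots for rare terms, which is the trade-off between storage waste and address indirections that the abstract's allocation schemes address.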
Pages: 733-754
Page count: 22