Text compression via alphabet re-representation

Cited by: 2
Authors
Long, PM
Natsev, AI
Vitter, JS
DOI
10.1109/DCC.1997.582003
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We consider re-representing the alphabet so that a representation of a character reflects its properties as a predictor of future text. This enables us to use an estimator from a restricted class to map contexts to predictions of upcoming characters. We describe an algorithm that uses this idea in conjunction with neural networks. The performance of this implementation is compared to other compression methods, such as UNIX compress, gzip, PPMC, and an alternative neural network approach.
Pages: 161 - 170
Page count: 10
Related papers
50 in total
  • [31] Towards a category theory approach to analogy: Analyzing re-representation and acquisition of numerical knowledge
    Navarrete, Jairo A.
    Dartnell, Pablo
    PLOS COMPUTATIONAL BIOLOGY, 2017, 13 (08)
  • [32] Towards a Multi-level Exploration of Human and Computational Re-representation in Unified Cognitive Frameworks
    Olteteanu, Ana-Maria
    Schoettner, Mikkel
    Bahety, Arpit
    FRONTIERS IN PSYCHOLOGY, 2019, 10
  • [33] A large-alphabet-oriented scheme for Chinese and English text compression
    Gu, HY
    SOFTWARE-PRACTICE & EXPERIENCE, 2005, 35 (11) : 1027 - 1039
  • [34] An Efficient Technique for Representation and Compression of Bengali Text
    Mokter, Md Farhad
    Akter, Sumya
    Uddin, Md. Palash
    Ibn Afjal, Masud
    Al Mamun, Md.
    Abu Marjan, Md.
    2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018
  • [35] Feature Re-Representation and Reliable Pseudo Label Retraining for Cross-Domain Semantic Segmentation
    Li, Jing
    Zhou, Kang
    Qian, Shenhan
    Li, Wen
    Duan, Lixin
    Gao, Shenghua
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (03) : 1682 - 1694
  • [36] A Film Unfinished: Yael Hersonski's Re-representation of Archival Footage from the Warsaw Ghetto
    Boeser, Ursula
    FILM CRITICISM, 2012, 37 (02) : 38 - 56
  • [37] Procedures of Extending the Alphabet in Combined Coding for Prediction by Partial String Matching in Text Compression
    Radescu, Radu
    Pasca, Sever
    PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS, COMPUTERS AND ARTIFICIAL INTELLIGENCE - ECAI 2017, 2017
  • [38] Superior guarantees for sequential prediction and lossless compression via alphabet decomposition
    Begleiter, R
    El-Yaniv, R
    JOURNAL OF MACHINE LEARNING RESEARCH, 2006, 7 : 379 - 411
  • [39] Superior guarantees for sequential prediction and lossless compression via alphabet decomposition
    Department of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel
    J. Mach. Learn. Res., 2006 : 379 - 411
  • [40] Developing an Efficient Algorithm for Representation and Compression of Large Bengali Text
    Abu Marjan, Md.
    Uddin, Md. Palash
    Ibn Afjal, Masud
    Haque, Md. Dulal
    2014 9TH INTERNATIONAL FORUM ON STRATEGIC TECHNOLOGY (IFOST), 2014 : 22 - 25