Unitary operators on the document space

Cited by: 11
Author
Hoenkamp, E [1]
Affiliation
[1] Univ Nijmegen, Nijmegen Inst Cognit Res & Informat Technol, NL-6525 HR Nijmegen, Netherlands
DOI
10.1002/asi.10211
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
When people search for documents, they eventually want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique for doing so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts needed to represent the documents is far smaller than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational cost but also opens a spectrum of possibilities for new research.
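To make the abstract's central claim concrete, the following is a minimal Python sketch of an orthonormal Haar transform applied to toy term-count vectors. It is not the article's implementation: the function names `haar_transform` and `dot`, the power-of-two length restriction, and the example vectors are all assumptions made for illustration. The sketch only shows that such a transform can run in linear time (the passes touch n, n/2, n/4, ... entries) and that, being unitary, it leaves the inner product between a document and a query unchanged.

```python
import math


def haar_transform(vec):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Hypothetical helper for illustration; total work is n + n/2 + ... < 2n.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(x) for x in vec]
    s = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        # Scaled pairwise sums (coarse part) and differences (detail part).
        averages = [(out[2 * i] + out[2 * i + 1]) / s for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / s for i in range(half)]
        out[:length] = averages + details
        length = half  # recurse on the coarse part only
    return out


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Toy term-count vectors over a four-word vocabulary (made-up data).
doc = [2, 0, 1, 1]
query = [1, 0, 0, 1]

# A unitary operator preserves inner products, so similarity scores
# computed in the transformed space match those in the original space.
original_score = dot(doc, query)
transformed_score = dot(haar_transform(doc), haar_transform(query))
```

Because each pass keeps a coarse part and a detail part, truncating the detail coefficients gives a lower-resolution view of the same vector, which is one way to read the abstract's reference to a multiresolution representation.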
Pages: 314-320
Number of pages: 7