LOOKING IN TEXT WINDOWS - THEIR SIZE AND COMPOSITION

Cited by: 9
Authors
HAAS, SW
LOSEE, RM
Institution
[1] School of Information and Library Science, University of North Carolina, Chapel Hill, NC 27599-3360
DOI
10.1016/0306-4573(94)90074-4
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
A text window is a group of words appearing in contiguous positions in text. Intuitively, words in such close proximity should be related to one another. We can use the text window to exploit a variety of lexical, syntactic, and semantic relationships without having to analyze the text explicitly for their structure. This research supports the previously suggested idea that natural groupings of words are best treated as a unit of 7 to 11 words, that is, a central word plus or minus three to five words. Our text retrieval experiments, which vary window size both with full text and with stopwords removed, support these size ranges. The characteristics of the windows that best match query terms are examined in detail, revealing interesting differences between queries with good results and queries with poorer results. Queries with good results tend to contain more content-word phrases and fewer terms that occur with high frequency in the database. Information retrieval systems may benefit from expanding thesaurus-style relationships or incorporating statistical dependencies for terms within these windows.
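The window idea above can be sketched in a few lines of Python. This is not the authors' retrieval system: the tokenizer, the small stopword list, and the scoring rule (the largest number of distinct query terms that fall together inside any single window) are assumptions chosen for illustration. A window of radius k holds 2k + 1 words, so the 7-to-11-word range corresponds to k = 3 to 5.

```python
import re
from typing import Iterator

# Illustrative stopword list (assumption; not the list used in the paper).
STOPWORDS = {"a", "an", "and", "are", "be", "for", "in", "is",
             "of", "on", "the", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def windows(tokens: list[str], centers: set[str], radius: int) -> Iterator[list[str]]:
    """Yield the window of up to 2*radius + 1 tokens centred on each
    occurrence of a centre term (windows are clipped at text boundaries)."""
    for i, tok in enumerate(tokens):
        if tok in centers:
            lo = max(0, i - radius)
            hi = min(len(tokens), i + radius + 1)
            yield tokens[lo:hi]


def window_score(text: str, query: str, radius: int = 4,
                 remove_stopwords: bool = False) -> int:
    """Hypothetical score: the most distinct query terms found in one window."""
    tokens = tokenize(text)
    query_terms = set(tokenize(query))
    if remove_stopwords:
        # Dropping stopwords lets a fixed-size window span more content words.
        tokens = [t for t in tokens if t not in STOPWORDS]
        query_terms -= STOPWORDS
    best = 0
    for win in windows(tokens, query_terms, radius):
        best = max(best, len(query_terms & set(win)))
    return best
```

For example, `window_score("window of the in a on text", "text window", radius=1)` finds only one query term per window on the full text, but both terms once stopwords are removed, which illustrates why the experiments compare windows built both ways.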
Pages: 619-629
Page count: 11