An Automatic Method for Extracting Citations From Google Books

Times cited: 37
Authors
Kousha, Kayvan [1]
Thelwall, Mike [1]
Affiliation
[1] Wolverhampton Univ, Sch Technol, Stat Cybermetr Res Grp, Wolverhampton WV1 1LY, W Midlands, England
Keywords
citation analysis; experiments; SOCIAL-SCIENCES; HUMANITIES; MONOGRAPHS; CHAPTERS; IMPACT; OUTPUT
DOI
10.1002/asi.23170
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent studies have shown that counting citations from books can help scholarly impact assessment and that Google Books (GB) is a useful source of such citation counts, despite its lack of a public citation index. Searching GB for citations produces approximate matches, however, and so its raw results need time-consuming human filtering. In response, this article introduces a method to automatically remove false and irrelevant matches from GB citation searches, in addition to introducing refinements to a previous GB manual citation extraction method. The method was evaluated by manually checking sampled GB results and by comparing citations to about 14,500 monographs in the Thomson Reuters Book Citation Index (BKCI) against automatically extracted citations from GB across 24 subject areas. GB citations were 103% to 137% as numerous as BKCI citations in the humanities, except for tourism (72%) and linguistics (91%), 46% to 85% in the social sciences, but only 8% to 53% in the sciences. In all cases, however, GB had substantially more citing books than did BKCI, with BKCI's results coming predominantly from journal articles. Moderate correlations between the GB and BKCI citation counts in the social sciences and humanities, with most BKCI results coming from journal articles rather than books, suggest, however, that they could measure different aspects of impact.
Pages: 309-320
Number of pages: 12
Related papers (50 in total)
  • [1] Extracting Frame-Like Structures from Google Books NGram Dataset
    Ivanov, Vladimir
    [J]. HUMAN-INSPIRED COMPUTING AND ITS APPLICATIONS, PT I, 2014, 8856 : 18 - 27
  • [2] Extracting information from PDF documents for use in automatic indexing of e-books
    Gil-Leiva, Isidoro
    Lopes Fujita, Mariangela Spotti
    Redigolo, Franciele Marques
    Saran, Jordan Ferreira
    [J]. TRANSINFORMACAO, 2022, 34
  • [3] 'Books @ Google' (Jason Epstein's 'Books at Google')
    Friedman, Peter
    [J]. NEW YORK REVIEW OF BOOKS, 2006, 53 (19) : 69 - 69
  • [4] An automatic method of extracting contours from ultrasound medical images
    Zhang, H
    Song, GD
    Jiang, F
    [J]. PROCEEDINGS OF THE THIRD INTERNATIONAL SYMPOSIUM ON INSTRUMENTATION SCIENCE AND TECHNOLOGY, VOL 3, 2004: 749 - 753
  • [5] A Flexible Approach for Extracting Metadata From Bibliographic Citations
    Cortez, Eli
    da Silva, Altigran S.
    Goncalves, Marcos Andre
    Mesquita, Filipe
    de Moura, Edleno S.
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2009, 60 (06): 1144 - 1158
  • [6] Extracting semantic predications from MEDLINE citations for pharmacogenomics
    Ahlers, Caroline B.
    Fiszman, Marcelo
    Demner-Fushman, Dina
    Lang, Francois-Michel
    Rindflesch, Thomas C.
    [J]. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2007, 2007: 209 - +
  • [7] Google Scholar citations and Google Web/URL citations: A multi-discipline exploratory analysis
    Kousha, Kayvan
    Thelwall, Mike
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2007, 58 (07): 1055 - 1065
  • [8] Google Books
    McShane, Clay
    [J]. JOURNAL OF TRANSPORT HISTORY, 2007, 28 (02): 319 - 325
  • [9] 'Books @ Google' (Jason Epstein's 'Books at Google') - Reply
    Epstein, Jason
    [J]. NEW YORK REVIEW OF BOOKS, 2006, 53 (19) : 70 - 70
  • [10] A Method for Extracting Correct Links from Automatic Created Links on Folksonomy
    Kobayashi, Akio
    Sakaji, Hiroki
    Kohana, Masaki
    [J]. ADVANCES IN NETWORK-BASED INFORMATION SYSTEMS, NBIS-2017, 2018, 7 : 1144 - 1150