Identification of Original Document by Using Textual Similarities

Cited by: 3
Authors
Shrestha, Prasha [1]
Solorio, Thamar [1]
Affiliations
[1] Univ Houston, Dept Comp Sci, Houston, TX 77004 USA
Keywords
original document; bag-of-words; document provenance; plagiarism; PROVENANCE;
DOI
10.1007/978-3-319-18117-2_48
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When two documents share similar content, whether accidentally or intentionally, it is usually unknown which of the two is the original source of that content. This knowledge can be crucial for charging or acquitting someone of plagiarism, for establishing the provenance of a document, or, in the case of sensitive information, for making sure that the source of the information can be relied upon. Our system identifies the original document using the idea that pieces of text written by the same author resemble each other more than they resemble text written by different authors. Given two documents with shared content, our system compares the shared part with the remaining text in each document, treating both as bags of words. When there is no reference text by one of the authors to compare against, our system makes its prediction based on the similarity of the shared content to just one of the documents.
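The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the use of raw word counts, the choice of cosine similarity, the tie-breaking rule, and the names `bag_of_words`, `cosine_similarity`, and `guess_original` are all assumptions made for illustration. It covers only the case where both documents have remaining text to compare against.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word tokens (hypothetical tokenizer)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors.

    Cosine is an assumed similarity measure; the abstract does not name one.
    """
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def guess_original(shared, rest_a, rest_b):
    """Guess which document the shared passage originated in.

    `shared` is the text common to both documents; `rest_a` and `rest_b`
    are the remaining texts of documents A and B. The document whose
    remaining text is more similar to the shared passage is returned as
    the likely original. Ties go to "A" (an arbitrary assumption).
    """
    shared_bag = bag_of_words(shared)
    sim_a = cosine_similarity(shared_bag, bag_of_words(rest_a))
    sim_b = cosine_similarity(shared_bag, bag_of_words(rest_b))
    label = "A" if sim_a >= sim_b else "B"
    return label, sim_a, sim_b
```

Calling `guess_original(shared_text, remaining_a, remaining_b)` returns the predicted label together with both similarity scores, so the margin between the two scores can be inspected before trusting the prediction.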
Pages: 643-654
Page count: 12