Mobile Video Capture of Multi-page Documents

Cited by: 7
Authors
Kumar, Jayant [1]
Bala, Raja [2]
Ding, Hengzhou [2]
Emmett, Phillip [2]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Xerox Res Ctr, Webster, NY USA
DOI
10.1109/CVPRW.2013.10
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a mobile application for capturing images of printed multi-page documents with a smartphone camera. With today's document capture applications, the user has to carefully photograph each page individually and assemble the photographs into a document, which makes for a cumbersome and time-consuming user experience. We propose a novel approach that uses video to capture multi-page documents. Our algorithm automatically selects, from the video, the best still image for each page of the document. The technique combines video motion analysis, inertial sensor signals, and an image quality (IQ) prediction technique to make this selection. For the latter, we extend a previous no-reference IQ prediction algorithm to suit the needs of our video application. The algorithm has been implemented on an iPhone 4S. Individual pages are successfully extracted for a wide variety of multi-page documents. OCR analysis shows that the quality of document images produced by our app is comparable to that of standard still captures. At the same time, user studies confirm that in the majority of trials, video capture provides an experience that is faster and more convenient than multiple still captures.
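The frame-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this record. It assumes that each video frame already has a scalar motion score (for example, some combination of video motion analysis and inertial sensor readings) and a scalar predicted quality score. The function name `select_page_frames` and the parameters `motion_threshold` and `min_stable_frames` are hypothetical.

```python
from typing import List, Optional, Sequence


def select_page_frames(
    motion: Sequence[float],
    quality: Sequence[float],
    motion_threshold: float = 0.2,   # assumed value, not from the paper
    min_stable_frames: int = 3,      # assumed value, not from the paper
) -> List[int]:
    """Pick one frame index per low-motion segment, maximizing quality.

    A "stable segment" is a run of consecutive frames whose motion score
    is below motion_threshold; runs shorter than min_stable_frames are
    ignored as likely page turns or hand jitter.
    """
    if len(motion) != len(quality):
        raise ValueError("motion and quality must have the same length")

    selected: List[int] = []
    start: Optional[int] = None  # first index of the current stable run

    # A sentinel above any threshold closes a run that reaches the last frame.
    for i, m in enumerate(list(motion) + [float("inf")]):
        if m < motion_threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_stable_frames:
                # Highest predicted quality within the stable run.
                best = max(range(start, i), key=lambda k: quality[k])
                selected.append(best)
            start = None
    return selected
```

Under these assumptions, each returned index would correspond to one page: the camera is held still over a page (low motion), and the sharpest frame within that interval is kept.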
Pages: 35-40
Page count: 6
Related papers
50 in total
  • [1] Hidden Markov Models for Text Categorization in Multi-Page Documents
    Paolo Frasconi
    Giovanni Soda
    Alessandro Vullo
    Journal of Intelligent Information Systems, 2002, 18 : 195 - 217
  • [2] Hidden Markov models for text categorization in multi-page documents
    Frasconi, P
    Soda, G
    Vullo, A
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2002, 18 (2-3) : 195 - 217
  • [3] On Leveraging Multi-Page Element Relations in Visually-Rich Documents
    Napolitano, Davide
    Vaiani, Lorenzo
    Cagliero, Luca
    2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024, 2024, : 360 - 365
  • [4] Text categorization for multi-page documents: A hybrid Naive Bayes HMM approach
    Frasconi, Paolo
    Soda, Giovanni
    Vullo, Alessandro
    Proceedings of the ACM International Conference on Digital Libraries, 2001, : 11 - 20
  • [5] NORMA Multi-page Relational View
    Curland, Matthew
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2013 WORKSHOPS, 2013, 8186 : 487 - 491
  • [6] A Deep Learning Model for Information Loss Prevention From Multi-Page Digital Documents
    Guha, Abhijit
    Samanta, Debabrata
    Banerjee, Amit
    Agarwal, Daksh
    IEEE ACCESS, 2021, 9 : 80451 - 80465
  • [7] XWRAPComposer: A multi-page data extraction service
    Liu, Ling
    Zhang, Jianjun
    Han, Wei
    Pu, Calton
    Caverlee, James
    Park, Sungkeun
    Critchlow, Terence
    Buttler, David
    Coleman, Matthew
    INTERNATIONAL JOURNAL OF WEB SERVICES RESEARCH, 2006, 3 (02) : 33 - 60
  • [8] GRAM: Global Reasoning for Multi-Page VQA
    Blau, Tsachi
    Fogel, Sharon
    Ronen, Roi
    Golts, Alona
    Tsiper, Shahar
    Ben Avraham, Elad
    Aberdam, Aviad
    Ganz, Roy
    Litman, Ron
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 15598 - 15607
  • [9] Multi-page Document VQA with Recurrent Memory Transformer
    Dong, Qi
    Kang, Lei
    Karatzas, Dimosthenis
    DOCUMENT ANALYSIS SYSTEMS, DAS 2024, 2024, 14994 : 57 - 70
  • [10] An analysis of schedules for performing multi-page requests
    Seeger, B
    INFORMATION SYSTEMS, 1996, 21 (05) : 387 - 407