Deep Learning Reader for Visually Impaired

Cited by: 7
Authors
Ganesan, Jothi [1 ]
Azar, Ahmad Taher [2 ,3 ]
Alsenan, Shrooq [2 ]
Kamal, Nashwa Ahmad [4 ]
Qureshi, Basit [2 ]
Hassanien, Aboul Ella [5 ]
Affiliations
[1] Sona Coll Arts & Sci, Dept Comp Applicat, Salem 636005, Tamil Nadu, India
[2] Prince Sultan Univ, Coll Comp & Informat Sci, Riyadh 11586, Saudi Arabia
[3] Benha Univ, Fac Comp & Artificial Intelligence, Banha 13518, Egypt
[4] Cairo Univ, Fac Engn, Giza 12613, Egypt
[5] Cairo Univ, Fac Comp & Artificial Intelligence, Giza 12613, Egypt
Keywords
artificial intelligence; Convolutional Neural Network architectures; Long Short-Term Memory; visually impaired individuals; assistive device; deep learning; BLIND
DOI
10.3390/electronics11203335
CLC Classification
TP [automation technology; computer technology]
Discipline Code
0812
Abstract
Recent advances in machine and deep learning algorithms, together with enhanced computational capabilities, have revolutionized healthcare and medicine. Research on assistive technology has benefited from these advances in creating visual substitution for visual impairment. People with visual impairment face several obstacles in reading printed text, which is normally substituted with a pattern-based tactile display known as Braille. Over the past decade, many wearable and embedded assistive devices and solutions were created to facilitate the reading of text by people with visual impairment. However, assistive tools for comprehending the meaning embedded in images or objects are still limited. In this paper, we present a deep learning approach for people with visual impairment that addresses this issue by representing and illustrating images embedded in printed texts in a voice-based form. The proposed system is divided into three phases: collecting input images, extracting features for training the deep learning model, and evaluating performance. The approach leverages two deep learning architectures, the Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) network, for extracting salient features, captioning images, and converting written text to speech. The CNN detects features from the printed image and its associated caption, while the LSTM network serves as a captioning tool that describes the text detected in images. The identified captions and detected text are converted into a voice message for the user via a Text-To-Speech API. The proposed CNN-LSTM model is investigated using various network architectures, namely, GoogleNet, AlexNet, ResNet, SqueezeNet, and VGG16. The empirical results show that the CNN-LSTM training model with the ResNet architecture achieved the highest image-caption prediction accuracy of 83%.
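The CNN-LSTM captioning pipeline described in the abstract can be sketched in miniature: a precomputed CNN feature vector (e.g. from ResNet) initializes an LSTM state, which then greedily emits caption tokens one at a time. This is an illustrative NumPy sketch with toy random weights; all dimensions, names, and the start/end-token convention are assumptions for demonstration, not the authors' actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT, EMBED, HIDDEN, VOCAB = 10, 5, 6, 8   # toy sizes, not the paper's
END_TOKEN, MAX_LEN = 0, 12

# Toy parameters standing in for trained weights.
W_img = rng.normal(size=(FEAT, HIDDEN)) * 0.1    # image feature -> initial state
E = rng.normal(size=(VOCAB, EMBED)) * 0.1        # word embeddings
W = rng.normal(size=(EMBED, 4 * HIDDEN)) * 0.1   # input-to-gates weights
U = rng.normal(size=(HIDDEN, 4 * HIDDEN)) * 0.1  # hidden-to-gates weights
b = np.zeros(4 * HIDDEN)
W_out = rng.normal(size=(HIDDEN, VOCAB)) * 0.1   # hidden state -> vocabulary logits

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    """One LSTM cell update: input, forget, output, and candidate gates."""
    z = x @ W + h @ U + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def greedy_caption(image_feat):
    """Greedily decode token ids until END_TOKEN or MAX_LEN is reached."""
    h = np.tanh(image_feat @ W_img)   # CNN feature initializes the LSTM state
    c = np.zeros(HIDDEN)
    token, caption = END_TOKEN, []    # END_TOKEN doubles as the start token here
    for _ in range(MAX_LEN):
        h, c = lstm_step(E[token], h, c)
        token = int(np.argmax(h @ W_out))
        if token == END_TOKEN:
            break
        caption.append(token)
    return caption

caption = greedy_caption(rng.normal(size=FEAT))
print(caption)  # a list of token ids; a vocabulary would map these to words
```

In the full system, the decoded word sequence would then be passed to a Text-To-Speech API to produce the voice message for the user.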
Pages: 22
Related Papers
50 records in total
  • [1] Smart Reader For Visually Impaired
    Musale, Sandeep
    Ghiye, Vikram
    [J]. PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON INVENTIVE SYSTEMS AND CONTROL (ICISC 2018), 2018, : 339 - 342
  • [2] Identification of Visually Impaired Person with Deep Learning
    Fujisawa, Shoichiro
    Mandai, Ranmaru
    Kurozumi, Ryota
    Ito, Shin-ichi
    Sato, Katsuya
    [J]. INTELLIGENT HUMAN SYSTEMS INTEGRATION, IHSI 2018, 2018, 722 : 601 - 607
  • [3] Text Reader for Visually Impaired People: ANY READER
    Manikandan, A. V. M.
    Choudhury, Shouham
    Majumder, Souptik
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON POWER, CONTROL, SIGNALS AND INSTRUMENTATION ENGINEERING (ICPCSI), 2017, : 2389 - 2393
  • [4] Interaction with a Mobile Reader for the Visually Impaired
    Keefer, Robert
    Bourbakis, Nikolaos
    [J]. ICTAI: 2009 21ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, 2009, : 229 - 236
  • [5] LifeLens: Deep Learning Based Application for the Visually Impaired
    Balasubramanian, S.
    Teja, Surya C. V. N.
    Tomar, Aakash
    Kiran, Thanikonda Sai
    Krupa, Niranjana
    [J]. 2021 IEEE REGION 10 CONFERENCE (TENCON 2021), 2021, : 771 - 776
  • [6] Deep Learning Based Shopping Assistant For The Visually Impaired
    Pintado, Daniel
    Sanchez, Vanessa
    Adarve, Erin
    Mata, Mark
    Gogebakan, Zekeriya
    Cabuk, Bunyamin
    Chiu, Carter
    Zhan, Justin
    Gewali, Laxmi
    Oh, Paul
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2019,
  • [7] Deep Learning Technique for Serving Visually Impaired Person
    Chandankhede, Pragati
    Kumar, Arun
    [J]. 2019 9TH INTERNATIONAL CONFERENCE ON EMERGING TRENDS IN ENGINEERING AND TECHNOLOGY: SIGNAL AND INFORMATION PROCESSING (ICETET-SIP-19), 2019,
  • [8] Mobile Reader: Turkish Scene Text Reader for the Visually Impaired
    Kandemir, Hilal
    Canturk, Busra
    Bastan, Muhammet
    [J]. 2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1857 - 1860
  • [9] iReader: An Intelligent Reader System for the Visually Impaired
    Jothi, G.
    Azar, Ahmad Taher
    Qureshi, Basit
    Kamal, Nashwa Ahmad
    [J]. 2022 7TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MACHINE LEARNING APPLICATIONS (CDMA 2022), 2022, : 188 - 193
  • [10] Interactive Reader Device for Visually Impaired People
    Motto Ros, Paolo
    Paseroa, Eros
    Del Giudice, Paolo
    Dante, Vittorio
    Petetti, Erminio
    [J]. NEURAL NETS WIRN09, 2009, 204 : 306 - 313