Title | Machine Vision Inspection Systems, Machine Learning-Based Approaches |
---|---|
Author | Group of authors |
Genre | Software |
Series | |
Publisher | Software |
Year of publication | 0 |
ISBN | 9781119786108 |
1 *Corresponding author: [email protected]
2
Capsule Networks for Character Recognition in Low Resource Languages
C. Abeysinghe, I. Perera and D.A. Meedeniya*
Department of Computer Science and Engineering, University of Moratuwa, Moratuwa, Sri Lanka
Abstract
Most existing techniques for handwritten character recognition are not well suited to low resource languages, owing to the lack of labelled data and the large datasets required for image classification with deep neural networks. In contrast to recent advances in deep learning-based image classification, human cognition can quickly identify and differentiate characters without much training. As a solution to the character recognition problem in low resource languages, this chapter proposes a model that replicates the human ability to learn from small datasets. The proposed solution is a Siamese neural network that combines capsules and convolutional units to obtain a thorough understanding of the image. The presented model takes two images as input, processes them and extracts features through the capsule network, and outputs the probability that they are similar. This study attests that the capsule-based Siamese network can learn abstract knowledge about different characters, which can be extended to unforeseen characters. The proposed model is trained on the Omniglot dataset and achieves up to 94% accuracy on previously unseen alphabets. Further, the model is tested on the Sinhala language alphabet and on the MNIST (Modified National Institute of Standards and Technology) dataset, both of which are new to the trained model.
Keywords: Character recognition, capsule networks, deep learning, one-shot learning, Sinhala dataset
2.1 Introduction
The ability to learn visual concepts from a small number of examples is a distinctive feature of human cognition. For instance, even a child can correctly distinguish between a bicycle and a car after being shown a single example of each. Taking this one step further, if we show them a plane and a ship, which they have never seen before, they can correctly recognize that these are two different vehicle types. One could argue that this ability is an application of previous experience and domain knowledge to new situations. How can we reproduce the same ability in machines? In this chapter, we propose a method that transfers previously learned knowledge about characters to differentiate between new character images.
There are versatile applications of image classification using few training samples [1–3]. Being able to classify images without extensive previous training is of great importance in situations such as character recognition, signature verification, and robot vision. This paradigm, in which only one sample is used to learn and make predictions, is known as one-shot learning [4]. Especially for low resource languages, currently available deep learning techniques fail due to the lack of large labeled datasets. If a model could perform one-shot learning for an alphabet, using a single image as the training sample per class, it could make a massive impact on optical character recognition [5].
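As a rough illustration of the paradigm (not the chapter's actual model), one-shot classification can be framed as follows: given a support set containing a single labelled example per class, a query image is assigned the label of the support example it is most similar to. Here `similarity` is only a placeholder (negative squared Euclidean distance over toy feature vectors) standing in for a learned pairwise scoring function:

```python
def similarity(a, b):
    # Placeholder scorer: negative squared Euclidean distance between
    # feature vectors; a trained model would be used in practice.
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def one_shot_classify(query, support):
    """support: dict mapping class label -> its single (one-shot) example."""
    return max(support, key=lambda label: similarity(query, support[label]))

# Toy support set with one exemplar per (hypothetical) character class.
support = {"ka": [0.9, 0.1], "ga": [0.1, 0.8]}
print(one_shot_classify([0.85, 0.2], support))  # → ka
```

Because classification reduces to pairwise comparison, the same trained scorer can be reused for alphabets never seen during training, which is exactly the property needed for low resource languages.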
This chapter uses the Omniglot dataset [6] to train such a one-shot learning model. Omniglot, named after the online encyclopedia of writing systems and languages, is a dataset of handwritten characters widely used in tasks that involve a small number of samples spread across many classes. In this research, we extend the dataset by introducing a set of characters from the Sinhala language, which has around 17 million native speakers and is used mainly in Sri Lanka. Because of the scarcity of resources for the language, applying novel deep learning-based Optical Character Recognition (OCR) methods is challenging. With the trained model introduced in this chapter, significant character recognition accuracy was achieved for Sinhala using only a small dataset.
Character detection using one-shot learning has been addressed previously by researchers such as Lake et al. [6], using a generative character model, and Koch et al. [7], using Convolutional Neural Networks (CNNs). In the proposed study, we focus on integrating capsule networks into a Siamese network [8] to learn a generalized abstract function that outputs the similarity of two images. Capsule networks are a recent advancement in the computer vision domain, and they possess several advantages over traditional convolutional layers [9].
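The core Siamese idea can be sketched in a few lines. This is a minimal illustration under assumed toy shapes and weights, not the chapter's capsule architecture: both images pass through the same feature extractor (shared weights), and a sigmoid over the feature distance yields the probability that the two images show the same character:

```python
import math

def extract_features(image, weights):
    # Stand-in for the shared encoder (capsule/convolutional in the chapter):
    # a single linear layer mapping an input vector to a feature vector.
    return [sum(w * x for w, x in zip(row, image)) for row in weights]

def siamese_similarity(img_a, img_b, weights):
    fa = extract_features(img_a, weights)  # identical weights for
    fb = extract_features(img_b, weights)  # both branches (the Siamese twin)
    l1 = sum(abs(a - b) for a, b in zip(fa, fb))  # L1 distance on features
    return 1.0 / (1.0 + math.exp(l1 - 1.0))       # sigmoid -> P(same class)

W = [[1.0, 0.0], [0.0, 1.0]]  # toy shared weights
p_same = siamese_similarity([0.2, 0.9], [0.2, 0.9], W)  # identical pair
p_diff = siamese_similarity([0.2, 0.9], [0.9, 0.1], W)  # dissimilar pair
print(p_same > p_diff)  # → True
```

The weight sharing is the essential design choice: because both branches use one encoder, the network learns a general notion of character similarity rather than class-specific features, which is what allows it to generalize to unseen alphabets.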
Translation invariance, or the inability to identify the position of an object