
1 Introduction

Text is one of humanity’s most essential sources of information and is widely utilized for communication. We commonly encounter text on ID cards, driving licenses, bank passbooks, scanned lecture notes, etc. Optical character recognition (OCR) [1] software is generally used to read text from such images, and these solutions are highly reliable and accurate. In the last decade, there has been a demand for a more difficult task: the real-time detection of text in natural scene images. Text in natural scene images, which may appear on posters, banners, billboards, street names, and sign poles, differs significantly from text in document images. The key distinctions are uneven lighting, texture, orientation, perspective distortion, and variation in font color and size [2]. With the tremendous rise of the smartphone market, the vast majority of individuals can now capture images of their surroundings. Text in natural scenes conveys high-level meaning directly, as a result of human ideas and creativity. Because of this semantic property, the text in natural scene images and videos is a unique and valuable source of information. As a result, text detection has a variety of real-life applications, including multimedia information retrieval systems, assistive devices for the visually impaired, self-driving cars, text translators, and toll gate car number plate detectors [3].

The official language of the Indian state of Assam is Assamese. Assamese is the world’s 67th most-spoken language, with over 15 million native speakers. It is a major language of northeast India, a region comprising seven Indian states. Assamese derives its phonetic character set and its behavior from Sanskrit. There are 11 vowels, 41 consonants, 10 digits, and over 300 compound characters in the Assamese language. Since no major research or application has yet been developed for a real-time natural scene Assamese text detection, recognition, and translation system, our foremost motivation is to develop such a system and contribute our proposed model to a multilingual nation like India and to the rest of the world. Traditionally, text detection methods can be divided into two types: sliding window and connected component-based methods [2]. The sliding window method detects text by sliding a window across the entire image at various scales. Connected component-based methods recognize single characters and then arrange them into words or text-line regions. With the advent of CNNs and deep learning, new approaches with substantially greater accuracy than previous methods have been developed. Deep learning methods are currently widely used in general object recognition, pattern and object segmentation, and text detection in natural scene images. Our proposed model focuses solely on deep learning-based methods, which clearly outperform traditional methods.

Several factors can make natural scene text detection a challenging task, such as scene background patterns, characters from different languages, and variations in text position, size, and color. Traditional machine learning methods for scene text detection have, however, been superseded by deep learning methods, which are well suited to training models on large datasets and have become the dominant approach to scene text detection and recognition in recent years. The use of machine learning for textual region detection grew out of its development for object detection, and scene text detection frameworks are in turn built on object detection frameworks. Deep learning-based text detection is basically divided into three stages: preprocessing, feature extraction, and text detection. The fundamental idea of the basic network model is to use a CNN as the image’s feature extractor. Some of the existing basic networks are LeNet, AlexNet, VGGNet, GoogLeNet, ResNet, DenseNet, etc.

The rest of the paper is arranged as follows: Sect. 2 discusses related work in the field of natural scene text detection, Sect. 3 describes the method of our proposed detection system, Sect. 4 presents the experimental results obtained, and finally, Sect. 5 draws conclusions from our work and outlines future work.

2 Related Works

Text detection in natural scenes is a challenging task because the text in such images is affected by varying textures, noise, lighting conditions, font colors, orientations, etc. Matteo et al. [2] reviewed several text detection methods for scene images and presented the most recent state-of-the-art approaches to the challenging task of scene text detection; the accuracy and real-time performance of the approaches were compared, and the most popular scene text detection evaluation datasets were presented. The authors of [3] examined, compared, and contrasted the technical obstacles, methodologies, and performance of text detection and recognition research on color images. They also described the main issues and listed items to consider when dealing with scene detection problems. Mitra et al. [4] presented a novel scene text detection system using Fully Convolutional DenseNets. They trained an FC-DenseNet to perform semantic segmentation on photos before using it to recognize text; that is, they divided each image into three regions: text, background, and word-fence. Mayuri et al. [5] introduced a novel text detection method that improved detection accuracy while decreasing average processing time. Their text identification method used the eMSER method to retain character shape and a custom clustering algorithm to converge faster. Xinyu et al. [6] proposed a method for fast and accurate text detection in natural scenes in which a single neural network predicts words of various orientations and quadrilateral shapes in entire images, avoiding superfluous intermediate stages. The authors of [7] introduced a novel method for increasing text detection and identification performance by finding flaws in text detection results. Joseph et al. [8] presented an updated version of YOLO, i.e., YOLOv3, along with comparative and performance analyses. Huibai et al. [9] used an enhanced YOLOv3-based scene text detection technique. They found that the training duration of YOLOv3 with DarkNet for a single detection target was long because of too many layers; therefore, they experimented with replacing it with DarkNet19. In addition, the original network’s multi-scale detection was preserved, and three anchors of varying sizes were utilized for bounding box prediction. Sahil et al. [10] proposed a web-based application of Tesseract-OCR where a user can upload a document image and translate it with the help of the Google Translate API; a Python script and various modules were utilized to address issues in document-based text segmentation and translation. The authors of [11] proposed a novel open-source line recognizer that combines deep convolutional networks and LSTMs and uses CUDA to achieve better training performance in PyTorch. Mani et al. [12] proposed a model for translating English phrases into Hindi using ConceptNet for Statistical Machine Translation and Rule-Based Machine Translation in tandem. Abhash et al. [13] proposed a system that utilized a Deep Neural Network (DNN) to construct a text-to-speech system for the Assamese language; the system was trained on audio data provided through collaboration and made freely available for academic use.

3 Methodology

For real-time detection of text in natural scenes, we chose the YOLOv3-Tiny [14] and YOLOv5s [15] algorithms for bounding box prediction over textual regions. Compared to YOLOv2, YOLOv3 [8] added multi-label classification and multi-scale detection and employs the DarkNet53 deep neural network as a feature extractor, improving on the older versions of YOLO, which do not perform well when detecting small objects. As a result, YOLOv3 has emerged as one of the most effective object detection algorithms. The basic workflow of the YOLOv3 network is to receive a 2D image as input; the convolution layers extract and map the hidden features of the image using a sliding window, while the pooling layers downsample and select the important features, which drastically reduces computational cost during feature extraction. Convolution [19, 20] is used to extract visual feature information. Our proposed model based on the YOLO-DarkNet architecture has three stages: text detector, text recognizer, and neural machine translator, and the text detector stage achieves detection in real time. Several versions of YOLO are optimized and trained on Google Colab and on a laptop with a discrete GPU. Hyper-parameter tuning is performed to reduce the weight file size and improve accuracy. YOLOv3-Tiny achieves desirable detection performance, but in terms of precision, YOLOv5s outperforms all the above-mentioned algorithms. Figure 1 depicts the block diagram of our proposed work.

Fig. 1
figure 1

Block diagram of natural scene Assamese text detection and recognition system

YOLOv3 begins by scaling an input image of any aspect ratio to 416 × 416 pixels and then divides it into S × S equally sized cells; using a feature pyramid network, text detection is performed on three separate feature map scales: 13 × 13, 26 × 26, and 52 × 52. Two-times up-sampling is applied between adjacent scales to align the feature maps. A particular cell uses three anchor boxes to predict three bounding boxes. YOLOv3 consists of a total of 65 million parameters. Figure 2 depicts the flowchart of the YOLOv3 algorithm.

Fig. 2
figure 2

YOLOv3 algorithm flowchart [11]

The x and y coordinates, the text width w, and the text height h are predicted by the convolutional layers of YOLOv3 for each bounding box in each cell and are denoted \(t_{x}, t_{y}, t_{w}\), and \(t_{h}\), respectively. If a cell is offset from the top left corner of the image by \((c_{x}, c_{y})\) and the prior anchor box has width \(P_{w}\) and height \(P_{h}\), then the predicted bounding box is

$$b_{x}=\sigma (t_{x})+c_{x}$$
$$b_{y}=\sigma (t_{y})+c_{y}$$
$$b_{w}=P_{w}e^{t_{w}}$$
$$b_{h}=P_{h}e^{t_{h}}$$
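A minimal PyTorch sketch of this decoding step is given below (illustrative only; it assumes the raw regression outputs, cell offsets, and anchor dimensions are already available as tensors with matching shapes).

```python
import torch

def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Convert raw YOLOv3 regression outputs into bounding-box
    centre coordinates and dimensions on the feature-map grid."""
    b_x = torch.sigmoid(t_x) + c_x   # centre x, offset by the cell's column index
    b_y = torch.sigmoid(t_y) + c_y   # centre y, offset by the cell's row index
    b_w = p_w * torch.exp(t_w)       # width, scaled from the anchor width
    b_h = p_h * torch.exp(t_h)       # height, scaled from the anchor height
    return b_x, b_y, b_w, b_h
```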

In the training process, the loss is obtained by calculating the sum of squared errors. The gradient for the training iterations or epochs is obtained by minimizing this loss function. Denoting the ground-truth coordinates as \(\hat{t}_{*}\), the gradient is the difference between the ground-truth and predicted coordinate values, \(\hat{t}_{*}-t_{*}\) [9].

Logistic regression is used to predict whether an object is present in a bounding box. The objectness score of an anchor box is 1 if its overlap with the ground-truth bounding box is the largest among all anchor boxes [9]. A prediction is disregarded if its overlap exceeds the given threshold but is not the maximum. YOLOv3 is designed to assign one anchor box to each object; an anchor box that is not assigned to any object contributes no coordinate or classification loss. During training, YOLOv3 uses binary cross-entropy loss and logistic regression to make category predictions, allowing it to perform multi-label classification of a target [9]. YOLOv3-tiny is a lighter version of YOLOv3; it has 13 layers in total, including 7 convolutional and 6 max-pooling layers. YOLOv5 is regarded as the next version of the YOLO family; it was released in 2020 by Ultralytics only a few days after YOLOv4 and was made open source. It is essentially a PyTorch implementation of YOLOv3, and because there is no official paper, the authenticity of its reported performance cannot be guaranteed. It achieves roughly the same prediction speed as YOLOv3 with better detection precision while using less computational power. Figure 3 depicts the YOLOv5 family’s performance chart.

Fig. 3
figure 3

Performance analysis of the YOLOv5 family versus EfficientDet [16] (YOLOv5s is indicated by the yellow curve)
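To illustrate the objectness and multi-label class predictions described above, the sketch below pairs sigmoid (logistic) outputs with binary cross-entropy, which is how YOLOv3-style heads are commonly implemented in PyTorch; the tensor names and shapes are illustrative assumptions, not the exact YOLO source.

```python
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one call

def detection_losses(raw_obj, obj_target, raw_cls, cls_target):
    """raw_obj: (N,) objectness logits for anchors matched to ground truth.
    raw_cls: (N, num_classes) class logits; targets may have several 1s per row,
    which is what makes the classification multi-label."""
    obj_loss = bce(raw_obj, obj_target)  # logistic regression on "is there an object?"
    cls_loss = bce(raw_cls, cls_target)  # independent binary cross-entropy per class
    return obj_loss, cls_loss
```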

The steps involved in training our proposed natural scene text detector are as follows.

3.1 Datasets Collection

Images of Assamese text in natural scenes are captured using a 12-megapixel smartphone camera, which produces output images of 1600 × 900 pixels. Several font styles, colors, background textures, lighting conditions, angles, and positions are taken into consideration. A total of 1000 training images and 300 validation images containing over 6000 text instances are collected. Figure 4 shows a few dataset samples. The top left-most image, with the largest font, consists of Assamese text meaning “Goswami Milk Product,” the top right-most image with two white-colored words means “Hotel Lakhimi,” and so on.

Fig. 4
figure 4

Natural scene Assamese text dataset samples

3.2 Image Annotation

Each textual region of Assamese text in an image is labeled using the open-source LabelImg software. Both the labeled images and the ground truth are stored in a single folder. The ground truth is a .txt file generated after labeling, which consists of the class number and bounding box coordinates. Our dataset annotations are prepared for a single class in the COCO dataset format. Figure 5 depicts labeled images.

Fig. 5
figure 5

Labeled images of three words (left) and two words (right) with corresponding ground truth values
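For reference, a hedged sketch of reading one such ground-truth file is given below; it assumes the YOLO-style .txt layout produced by LabelImg, i.e., one line per box with a class index followed by normalized centre-x, centre-y, width, and height.

```python
from pathlib import Path

def read_labels(txt_path):
    """Parse a LabelImg/YOLO-style annotation file into (class_id, box) tuples.
    Each line: "<class> <x_center> <y_center> <width> <height>", with the box
    values normalized to [0, 1] relative to the image size."""
    boxes = []
    for line in Path(txt_path).read_text().splitlines():
        if not line.strip():
            continue
        cls, xc, yc, w, h = line.split()
        boxes.append((int(cls), float(xc), float(yc), float(w), float(h)))
    return boxes
```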

3.3 Hyper-Parameter Optimization

The optimizers used in our proposed model are as follows.

Stochastic gradient descent optimizer. This optimizer is used in our proposed model for optimizing the training iterations and reducing the loss function. The approach optimizes an objective function with suitable smoothness properties. It is a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (computed from the entire dataset) with an estimate of it (computed from a randomly selected subset of the data). This greatly reduces the processing cost, allowing for faster iterations in exchange for a lower convergence rate, especially in high-dimensional optimization problems.
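A minimal conceptual sketch of this idea is shown below (the loss-gradient function and dataset are hypothetical placeholders): each update uses a gradient estimated from a randomly sampled mini-batch rather than from the full training set.

```python
import random

def sgd_step(params, grad_fn, dataset, batch_size, lr):
    """One stochastic gradient descent update.
    grad_fn(params, batch) is assumed to return the gradient of the loss
    evaluated only on the sampled mini-batch."""
    batch = random.sample(dataset, batch_size)   # random subset instead of the full dataset
    grads = grad_fn(params, batch)               # gradient estimate from the mini-batch
    return [p - lr * g for p, g in zip(params, grads)]
```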

Adam optimizer. Adaptive Moment Estimation (Adam) is an adaptive approach for optimizing gradient descent. This optimizer is recommended for problems with a huge number of data points and parameters. It is computationally efficient and places little burden on memory. It is a hybrid of gradient descent with momentum and the RMSProp algorithm.

In the training phase of YOLOv3 on Google Colab, the momentum is set to 0.9, stochastic gradient descent is used for optimization, and the initial learning rate is set to 0.0001. Decay is set to 0.0005, which stabilizes the network in the first 1000 training iterations. Later, to avoid gradient disappearance, a step strategy is applied at the 1800th and 2200th training iterations, which lowers the learning rate. The maximum number of batches is set to the number of classes × 2000 = 1 × 2000 = 2000, i.e., training stops after 2000 iterations. With the implementation of transfer learning, we altered the last three layers of YOLOv3: the number of classes is set to 1 and the corresponding number of filters to 18. To speed up the process, the GPU acceleration, CUDNN, and OPENCV flags are set to 1. The hyper-parameter configuration for YOLOv3-tiny and YOLOv5s is the same as that of the YOLOv3 detection model. Moreover, during the training phase of YOLOv5s on the discrete RTX 3050 mobile GPU, the number of classes is set to 1 and the initial learning rate is set according to the optimizer (SGD = 1E-2, Adam = 1E-3) for 500 epochs.
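The configuration above can be expressed in PyTorch roughly as follows; this is a sketch under stated assumptions (a placeholder module stands in for the detector, and the step-decay factor of 0.1 is assumed rather than taken from the paper).

```python
import torch

model = torch.nn.Conv2d(3, 16, 3)  # placeholder standing in for the YOLO network

# YOLOv3 phase: SGD with momentum 0.9, initial learning rate 1e-4, weight decay 5e-4
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4,
                            momentum=0.9, weight_decay=5e-4)

# Step strategy: lower the learning rate at iterations 1800 and 2200 (assumed factor 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[1800, 2200], gamma=0.1)

# YOLOv5s phase: the initial learning rate depends on the chosen optimizer
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```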

3.4 Selection of Backbone Network

There are four types of YOLO algorithm backbone networks:

  1. DarkNet19

  2. DarkNet39

  3. DarkNet53

  4. CSPDarknet53.

We chose DarkNet53 and CSPDarknet53, convolutional neural networks that are 53 layers deep, for our detection models with YOLOv3, YOLOv3-tiny, and YOLOv5. This backbone network is combined with our custom YOLOv3 network to improve feature extraction. CSPDarknet53 splits the feature map of the base layer into two parts using a CSPNet strategy and then merges them through a cross-stage hierarchy. This split-and-merge strategy provides greater gradient flow through the network. YOLOv5's backbone network is CSPDarknet53, and its architecture is depicted in Fig. 6.

Fig. 6
figure 6

Overview of YOLOv5 algorithm with CSPDarknet53 [17]
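A simplified sketch of the split-and-merge idea behind a CSP block is shown below; it is illustrative only, as the real CSPDarknet53 blocks use deeper residual stacks of convolutions.

```python
import torch
import torch.nn as nn

class SimpleCSPBlock(nn.Module):
    """Toy CSP-style block: split the feature map channel-wise, transform one half,
    pass the other half through unchanged, then merge across the stage."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.transform = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(),
        )
        self.merge = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        part1, part2 = x.chunk(2, dim=1)          # split along the channel dimension
        part2 = self.transform(part2)             # process only one branch
        return self.merge(torch.cat([part1, part2], dim=1))  # cross-stage merge
```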

3.5 Training of YOLOv5s Using RTX 3050 Laptop Discrete GPU

YOLOv5 can be trained on both GPU and CPU. Training on GPU-enabled devices with the latest version of CUDA is recommended to reduce training time. If a device does not have a GPU card, a virtual machine with an Nvidia Tesla T4 on Google Colab can be used for training. Our proposed model with YOLOv5s is trained on an RTX 3050 with CUDA version 11.1, alongside a Ryzen 7 octa-core processor running at up to 4.2 GHz and 16 GB of 3200 MHz dual-channel RAM.
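Before training, it is worth confirming that PyTorch actually sees the GPU; a quick check from the notebook is sketched below.

```python
import torch

print(torch.cuda.is_available())   # True if a CUDA-capable GPU is usable
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU only")
print(torch.version.cuda)          # CUDA version PyTorch was built against
```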

Steps that are followed during training on Jupyter Notebook are as follows.

Clone the YOLOv5 repository. First, we cloned the YOLOv5 repository from the official Ultralytics GitHub repository [16]. This repository contains YOLOv5s with a CSPDarknet YAML file and all the essential hyper-parameters.

Install prerequisite libraries. To enable the GPU for training, we installed the CUDA toolkit v11.1 using the conda install command. The YOLOv5 repository also includes a “requirements.txt” file that contains all of the libraries needed to train the model, such as OpenCV, PyYAML, torchvision, and WandB.

Create a YAML data path file. YAML (originally “Yet Another Markup Language”) is a data serialization language that is frequently used to specify dataset paths during training. We created dataset.yml using Notepad++ to point to the dataset directory paths and set the number of classes to 1.
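A hedged example of such a file, written from Python for convenience, is sketched below; the directory paths are placeholders for the actual local dataset layout.

```python
from pathlib import Path

# Placeholder paths; the real directories depend on the local dataset layout.
dataset_yml = """\
train: ../datasets/assamese_text/images/train
val: ../datasets/assamese_text/images/val

nc: 1              # a single class: Assamese text region
names: ['text']
"""
Path("dataset.yml").write_text(dataset_yml)
```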

Integrate WandB. We used the Weights & Biases (WandB) Python package, which helped us track our training performance in real time. It integrates easily with popular deep learning frameworks such as PyTorch, TensorFlow, and Keras.
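Enabling the integration typically only requires logging in once from the notebook, as sketched below (the dashboard details are an assumption about a default WandB setup rather than our exact configuration).

```python
import wandb

wandb.login()  # prompts for the WandB API key on first use
# With a logged-in session, YOLOv5's train.py can stream losses, mAP,
# and sample predictions to the WandB dashboard during training.
```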

Train the custom YOLOv5s model. After creating the YAML file, we ran the command to load our custom dataset and the pre-trained model’s weight file to train a new model. We set the image size to 288, the batch size to 16, the number of epochs to 500, and the number of workers to 2. Training begins shortly after the training command (the train.py script) is executed and takes time depending on the model hyper-parameters and hardware specifications.
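The training cell corresponds roughly to the following notebook command; the flag names follow the Ultralytics train.py interface, and the weights and data file names are examples.

```python
# Jupyter/Colab cell: launch YOLOv5s training on the custom Assamese text dataset
!python train.py --img 288 --batch 16 --epochs 500 --workers 2 \
    --data dataset.yml --weights yolov5s.pt
```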

4 Results and Discussion

Our proposed model based on YOLOv3 is trained on custom datasets of Assamese text with a single class on Google Colab, which took around 1.2 h of training for 2000 iterations and produced a custom weights file of 234 MB. To reduce the size of the weights file, we also trained our proposed model based on the YOLOv3-tiny and YOLOv5s algorithms on our custom datasets. YOLOv3-tiny took around 1.01 h of training time and produced a weights file of 33 MB. The loss curves of YOLOv3 and YOLOv3-tiny are shown in Fig. 7a and b, respectively.

Fig. 7
figure 7

a YOLOv3-DarkNet53 training loss variations (left). b YOLOv3-tiny-DarkNet53 training loss variations (right)

Training YOLOv5s on the discrete RTX 3050 GPU took around 5.4 h for 500 epochs and generated a weights file of 13.6 MB. The training loss curves of YOLOv5s, using SGD as the optimizer, show that the objectness loss and box loss decrease during training, which ultimately leads to high performance and precision. The training loss and precision plots are shown in Fig. 8.

Fig. 8
figure 8

YOLOv5s training loss variations and mAP plots

From the above charts, we can see that the YOLO algorithm’s training loss decreases faster with the DarkNet53 network, the training is less volatile, and the final stable value is lower. The custom weights of YOLOv3 and YOLOv3-tiny can now be used for testing our detection model. In terms of detection quality, we independently tested the weights of the three different versions of the YOLO network for the detection of text in the same natural scene images, as shown in Fig. 9. Both the top left and right images indicate “Gauhati University Idol”, while the bottom left and right images indicate “Tarif Bakery” and “Welcoming Kharupetia Town Committee, Darang”. The candidate bounding boxes extracted by YOLOv3-tiny-DarkNet53 are less precise than those extracted by YOLOv3-DarkNet53, but the test images show no incorrect detections of textual regions, so the results still provide a sufficient foundation for subsequent recognition work. Furthermore, the detection speed of YOLOv3-tiny is far superior to that of YOLOv3.

Fig. 9
figure 9

Detection effect of YOLOv3 (top left), YOLOv3-tiny (top right), and YOLOv5s (bottom left and right)

Based on the analysis of the three detection results above on the same set of training data, YOLOv5s (as shown in Fig. 9) has a slower detection speed than YOLOv3-tiny but achieves the highest recognition rate among the models, with no incorrect detections for the given IOU threshold; the same holds when testing on a single detection target. Network simplification also improves training speed while maintaining the recognition rate. In terms of detection speed, YOLOv3-DarkNet53 takes 23 ms, YOLOv3-tiny-DarkNet53 takes 8.3 ms, and YOLOv5s takes 14.2 ms of inference time. Therefore, after optimizing the network with SGD and Adam, the recognition speed of YOLOv3-tiny-DarkNet53 is the fastest, and its frames-per-second count is significantly increased. In terms of precision, however, YOLOv5s performs significantly better, achieving a mean average precision (mAP) score of 94.3%. In terms of training duration, YOLOv3-tiny outperforms all other models, making hyper-parameter and network fine-tuning easier. A comparative analysis of YOLOv3-tiny with YOLOv3, YOLOv5s, and the reference model MobileNetV3 is shown in Table 1.

Table 1 Comparison between the proposed model and experimental results [18]

5 Conclusion

Text detection plays a vital role in the domain of computer vision, and its various applications have made our lives easier and more productive. Scene text detection is a computer vision task that detects text in natural scenes, where different objects, font sizes, colors, lighting, etc., must always be considered and unwanted characteristics such as background patterns and noise must be suppressed. Deep learning has made scene text detection much easier and more reliable, overcoming problems such as region segmentation and pattern recognition, whereas traditional text detection methods are slow and take a long time to train. The YOLO algorithm is one of the deep learning algorithms that has passed the 30 FPS real-time detection mark. Our proposed model with YOLOv3-tiny and the DarkNet53 backbone provides a higher FPS than YOLOv3, MobileNetV3, and YOLOv5s. In terms of detection metrics, we compute the mean average precision (mAP) and observe that YOLOv3-tiny is slightly lower than YOLOv3, YOLOv5s, and MobileNetV3. On the other hand, YOLOv5s achieves the highest detection performance and the best precision. The weights file of YOLOv5s is very small compared to those of YOLOv3 and YOLOv3-tiny but still larger than that of MobileNetV3, which might constrain low-end hardware systems. We will further develop our work to recognize the detected Assamese texts and translate them into English using an improved version of LSTM-based Tesseract-OCR and neural machine translation.