1 Introduction

Microscopy is an inexpensive technique for the early diagnosis of many infectious diseases [22]. It has a wide range of medical diagnostic applications, including bacterial pneumonia, urinary tract infections, genital infections, blood sample examination and pyogenic infections. Skilled technicians examine the freshly stained microbial specimen for infection and report to the pathologist. Within the microscopic examination process, staining can be done by a person with little clinical knowledge. The staining process differentiates bacteria from the background by their staining characteristics [7, 12]. The next task, specimen examination, requires a well-trained technician who maintains concentration and commitment throughout screening. For a standard specimen of size 2 × 1 cm, viewed with ocular and objective lens magnifications of 10× and 100× respectively, the technician needs to verify almost 6272 fields of view (FOVs). With the objective lens at 100×, the diameter of an FOV is 180 μm. Viewing only a few fields of view does not allow the disease to be diagnosed with good sensitivity and specificity.
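The FOV count quoted above can be reproduced with a short calculation. The following is a minimal sketch, assuming the FOV diameter is read as 180 μm (0.18 mm), that the smear is tiled by square fields of that side length, and that partial fields at the border are rounded up. The tiling rule and the function name `fov_grid` are illustrative assumptions, not taken from the original work.

```python
import math

def fov_grid(width_um, height_um, fov_um):
    """Return (columns, rows, total) of square fields needed to tile the smear."""
    cols = math.ceil(width_um / fov_um)   # fields along the long side
    rows = math.ceil(height_um / fov_um)  # fields along the short side
    return cols, rows, cols * rows

# 2 x 1 cm smear expressed in micrometres, 180 um field of view (assumed)
cols, rows, total = fov_grid(20_000, 10_000, 180)
print(cols, rows, total)  # 112 56 6272
```

Under these assumptions the grid is 112 × 56 fields, which agrees with the figure of 6272 FOVs.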

As a case study, consider tuberculosis (TB) screening by microscopy. A sputum sample is taken from the patient and stained with the Ziehl-Neelsen method for smear preparation. The slide is then thoroughly examined by a skilled laboratory technician to detect acid-fast bacilli (AFB). After viewing nearly 100 microscopic fields of view, the technician reports the level of infection for further diagnosis. More than 10 AFB per field is graded as 3+, 1–10 AFB per field as 2+, 10–99 AFB in 100 fields as 1+, and 1–9 AFB in 100 fields as scanty [1]. Manual smear examination leads to overload, long screening times and fatigue when a technician screens a large number of samples per day. In TB-epidemic regions technicians are loaded with more samples, which may lower the sensitivity and specificity of the examination. The primary goals in tuberculosis identification are rapid detection and early identification. To improve the identification of TB bacilli, image processing combined with machine learning techniques has therefore been applied.
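The grading rule can be written as a small function. This is a sketch only: it assumes the 2+ band is read as 1 to 10 AFB per field, it uses the average count over the examined fields as the "per field" value, and it returns "negative" when no AFB are seen. All three choices, and the name `grade_smear`, are assumptions made for illustration.

```python
def grade_smear(afb_per_field):
    """Grade a ZN smear from per-field AFB counts (about 100 fields expected)."""
    n_fields = len(afb_per_field)
    if n_fields == 0:
        raise ValueError("at least one field is required")
    total = sum(afb_per_field)
    per_field = total / n_fields             # average AFB per field (assumed rule)
    per_100_fields = total * 100 / n_fields  # AFB normalised to 100 fields
    if per_field > 10:
        return "3+"
    if per_field >= 1:
        return "2+"
    if per_100_fields >= 10:
        return "1+"
    if per_100_fields >= 1:
        return "scanty"
    return "negative"  # label for zero AFB is an assumption

print(grade_smear([12] * 100))           # 3+
print(grade_smear([1] * 5 + [0] * 95))   # scanty
```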

Because TB is contagious, and to reduce human involvement in diagnosis, computer-vision-based techniques have been evolving. The methodology involves data acquisition from the microscope, preprocessing, segmentation of the region of interest and extraction of pattern descriptors from the acquired image. The extracted features are fed to classifiers for detection [25]. In supervised learning, the network is trained on the samples and, at each iteration, its output class is checked against the user-defined label.

Generally, the detection of TB bacilli from microscopic images starts with image acquisition. A microscope with an objective lens of 100× magnification, together with a digital camera, is used for acquisition. The image is pre-processed by converting RGB color to NTSC (luminance, hue, saturation) [20]. The saturation component of the NTSC image is extracted to obtain a grayscale image, which is then converted to a binary image by thresholding with Otsu's method. Feature extraction for shape identification uses two parameters, eccentricity and compactness, to find the bacilli. The identified features are then fed into the classifier.

Khutlang et al. proposed the detection of TB bacilli with two stages of classification [18]. In the first stage, a one-class pixel classifier is implemented, through which geometric-transformation-invariant features are extracted. The second stage employs one-class object classification. Gaussian, mixture of Gaussians (MoG) and principal component analysis (PCA) classifiers are used. The output objects of the first-stage classifier are filtered by area, with the threshold set at a minimum of 50 pixels and a maximum of 400 pixels. The numbers of target and output objects extracted in the first stage are used as prior knowledge in the second stage. In the second stage, the accuracy of a k-nearest neighbor classifier was used to determine the number of Fourier coefficients to use. This improves the sensitivity of conventional microscopy for TB screening.

Neural networks have also been used for the detection of tuberculosis bacilli. Images are captured from ZN-stained tissue slides using a light microscope. The bacilli appear red after staining, so a CY-based color filter is used to remove pixels that are not related to red. K-means clustering is then applied to segment the TB bacilli [26]. Features are extracted based on size, perimeter and shape factors and fed into a hybrid multilayered perceptron (HMLP) network, called the HMLP-ELM network, which classifies objects as TB bacilli or non-TB bacilli.

Santiago et al. proposed a Bayesian methodology for decision making in sequential screening systems that takes the false alarm rate of the detector into account [28]. In addition to the usual bacilli detection steps, a bacilli classifier is used to eliminate non-bacillus objects. In the preprocessing stage, the image is divided into patches for identification of bacilli; several overlapping grids are used to handle a bacillus located on the edge of a patch. Features are extracted from the selected patches using the Canny edge detector. Objects classified as non-bacillus are discarded, and the bacillus objects are segmented, rotated and centered. At the object classification level, the probability of a positive TB bacillus is analyzed using the trained TB bacilli.

Osman et al. [24] used the CY color model and the K-means clustering algorithm to remove artifacts, and moment-invariant features to describe the segmented regions. The Hybrid Multi-Layer Perceptron (HMLP) network is trained on these features with an Extreme Learning Machine for better performance. The authors report an accuracy of 77.25%.
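The Otsu thresholding step mentioned in the pipeline above can be illustrated in a few lines. The following is a minimal pure-Python sketch, assuming the grayscale (saturation) channel is available as a flat list of 8-bit integers; it is not the implementation used in the cited works, and the function names are illustrative.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximises between-class variance.

    Pixels with value > t are treated as foreground.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def binarize(pixels, t):
    """Map each pixel to 1 (foreground) or 0 (background)."""
    return [1 if p > t else 0 for p in pixels]

# Toy data: dark background with a few bright (stained) pixels
sample = [40] * 60 + [45] * 40 + [200] * 10 + [210] * 5
t = otsu_threshold(sample)
mask = binarize(sample, t)
```

Shape descriptors such as eccentricity and compactness would then be computed on the connected regions of the binary mask.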

In the above techniques, tuberculosis bacilli are identified by extracting features, which are characterized as patterns and used for recognition of TB bacilli. Deep learning networks, however, are more powerful than these techniques because they perform representation learning directly from the image [23]. With this representation, infected and non-infected microbes become separable. Hence a CNN turns out to be a well-suited classifier for distinguishing non-living and living microbes. Here, we use these approaches to recognize infected and non-infected microbes. This makes the system robust, and it can be used with the Internet of Things and cloud computing to manage diagnosis in smart cities [3, 5, 6].

2 Background

This section provides basic details on three topics, namely DeepNets, the Support Vector Machine and the microscopic data acquisition system, which are the vital components of the proposed Cybernetic Microbial Detection System (CMDS).

2.1 DeepNets

The ConvNet configurations evaluated in this work are VGG16 and VGG19. The depth of the network differs from 16 weight layers in network A to 19 weight layers in network B. VGG16 has 13 convolution layers and 3 fully-connected layers [31]. Network B (VGG19) has 16 convolution layers and 3 fully-connected layers, for a total of 19 layers. The first convolution layer in the VGG architecture is narrow, with a width of 64 channels, and is followed by a pooling layer. The width of the convolution layers increases by a factor of 2 after each pooling layer until it reaches 512 channels. The total number of weights in the VGG nets is smaller than in shallower nets with large convolution widths [29]. In the VGG16 ConvNet, very small receptive fields are used in all convolution layers: a 3 × 3 receptive field is convolved over every pixel of the image with stride 1. Some DeepNets use larger receptive fields of size 5 × 5 and 7 × 7. The advantage of 3 × 3 receptive fields over larger ones is that more non-linear rectification layers are obtained, which makes the decision function more discriminative. A 3 × 3 receptive field also decreases the number of parameters in each layer. If both the input and the output of a stack of three 3 × 3 convolution layers have C channels, the stack is parameterized by 3(3²C²) = 27C² weights, whereas a single 7 × 7 convolution layer would require 7²C² = 49C² weights. A 1 × 1 convolution layer can also be included in the stack to increase the non-linearity. Although a 1 × 1 convolution is a linear projection in which the numbers of input and output channels are the same, the rectification function that follows it adds non-linearity to the network.
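The parameter saving of stacked 3 × 3 convolutions can be checked numerically. The sketch below counts weights only (biases are ignored) and assumes C input and C output channels in every layer; the helper name `conv_stack_weights` is illustrative.

```python
def conv_stack_weights(kernel, channels, layers=1):
    """Weights in `layers` stacked kernel x kernel convolutions with C in/out channels."""
    return layers * (kernel ** 2) * (channels ** 2)

C = 512
three_3x3 = conv_stack_weights(3, C, layers=3)  # 3 * (3^2 * C^2) = 27 C^2
one_7x7 = conv_stack_weights(7, C)              # 7^2 * C^2 = 49 C^2
print(three_3x3, one_7x7)  # 7077888 12845056
```

A stack of three 3 × 3 layers covers the same 7 × 7 effective receptive field with 27/49, or roughly 55%, of the weights.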

A stack of convolutional layers, whose depth differs between architectures, is followed by three Fully-Connected (FC) layers. The first two FC layers have 4096 channels each. The third performs 1000-way classification and hence contains 1000 channels (one for each class). The soft-max layer forms the final layer. All the networks have the same FC layer configuration. All hidden layers are equipped with the rectification (ReLU) non-linearity. Local Response Normalization (LRN) is not included in any of the networks except one: such normalization does not improve performance on the ILSVRC dataset, but it increases memory consumption and computation time.

Krizhevsky et al. developed AlexNet, an early deep convolutional neural network architecture [19]. The deep CNN was trained to classify ImageNet data, which contains over 15 million annotated images from over 22,000 categories. AlexNet won the ImageNet Large Scale Visual Recognition Challenge in 2012. For preprocessing, AlexNet uses data augmentation techniques such as image translation, horizontal reflection and patch extraction. Overfitting during training is addressed with dropout layers. To speed up training on the large dataset, two GTX 580 GPUs are used, and training takes five to six days. An input of size 224 × 224 × 3 is given to the first convolution layer, which has 96 kernels of size 11 × 11 that slide over the image with a stride of 4 pixels. The second convolution layer takes the output of the first as input and filters it with 256 kernels of size 5 × 5 × 48. Finally, the output of the fifth convolution layer, which has 256 kernels of size 3 × 3 × 192, is given to a fully connected layer of 4096 neurons.

2.2 Support vector machine (SVM)

The Support Vector Machine (SVM) uses powerful kernel-based functions for classification and regression tasks [14]. In general, an SVM with well-chosen parameters yields better separating hyperplanes. For a large dataset, the classification problem can be solved by many methods; SVMs are considered a powerful choice because of their linear learning technique and kernel-based functionality. An SVM can be used as a binary classifier for an unbalanced dataset, that is, one whose classes differ in size, because of its sensitivity in classification. The SVM parameters are tuned automatically using a grid search algorithm. For an unbalanced dataset, the parameters are optimized in the regression context using derivative-free methods, with the mean squared error as the criterion. These methods learn and optimize the parameters without requiring derivative information [11, 13]. Tenfold cross validation, with 10 training and testing stages, is performed to obtain a more reliable accuracy.
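The tenfold cross validation scheme can be sketched as follows. This is a minimal illustration of how the 10 training and testing stages partition the sample indices; it performs no shuffling or class stratification, both of which would normally be added in practice, and the function name is an assumption.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(25, k=10))
print(len(folds))   # 10
print(folds[0][1])  # [0, 1, 2]
```

Each sample appears in exactly one test fold, and the reported accuracy is the mean over the 10 stages.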

2.3 Microscopic data acquisition system: Survey

To provide an efficient data acquisition system, a programmable microscopic stage is proposed. The motorized microscopic stages available on the market are expensive and have some limitations. These stages move automatically in the X and Y directions to scan the specimen [16]. An automated microscopic stage combined with image recognition makes reliable diagnosis of disease possible. In 2014, Campbell et al. developed a low-cost 3-axis motorized microscopic stage for the photon microscope. The stage is actuated by open-loop stepper motors driving linear translators [9]. Although the stage can move in the X, Y and Z directions, the accuracy may decrease because of the stepper motors. During cell counting in stereological operations, volume, surface area and dimensional analyses are carried out. In dimensional analysis, the area of the slide to be analyzed is chosen by points, lines and areas. Using these stereological dimensions, the microscopic stage moves over the specimen for cell manipulation and morphology detection. Cell proliferation applications are crucial in automated microscopy because they involve tracking the migration of microbes [21]. The angle at which a microbe moves can be tracked from the location and direction of the migrated cell. For cell manipulation, a motorized stage with submicron precision was developed, using two stepper motors to actuate the shaft [8]. To increase the precision of the micro-controlling system, Bhakti et al. used an FT232RL serial communication chip along with ATMEGA 8 software. The horizontal and vertical resolutions of the stage movement are 0.198 ± 0.001 μm/step and 0.197 ± 0.004 μm/step respectively.

3 Proposed system architecture

The proposed system targets basic health care diagnosis in remote areas. In acute and chronic conditions, it helps to reduce the dependence on technicians for diagnosis. Even persons with little knowledge of, or expertise in, handling clinical microscopes can examine microbial samples with the proposed system. The proposed CMDS is also well suited to assisting health nursing in remote, disease-prone areas with microscopic specimen examination. The CMDS is divided into two sub-systems, namely the data acquisition system and the microbial recognition system. Fig. 1 shows the overall system flow and architecture of the proposed system.

Fig. 1 Proposed system architecture of CMDS

In the data acquisition system, the scanning movements for smear examination are automated by a programmable microscopic stage. The stage supports different types of scanning movement for specimen examination and incorporates all the possible horizontal and vertical movements in a single framework. After acquisition, the image or video data is passed to software for preprocessing and image recognition. In the preprocessing phase, the acquired image is re-scaled to fit the input of the image recognition system.

In the microbial image recognition system, human intelligence is simulated using deep neural network layers, and classifiers are added after the top layers to identify microbes. Infected and non-infected microscopic images are fed into the customized VGG net with its pre-trained weights. The partial representation of each image, taken just before the fully-connected layers of the VGG net, and the corresponding diagnosis result reported by the technician are given to a powerful kernel-based support vector machine (SVM). More infected and non-infected images, along with their test reports, were collected in the same way to obtain a large sample. These datasets are used to self-tune the parameters of the SVM for better classification and recognition. Automating the screening process with the programmable microscopic stage and the microbial recognition model thus reduces the reliance on skilled technicians for specimen examination and increases sensitivity [27]. The data acquisition system, bundled with the microbial recognition software, can therefore be used in remote areas for early diagnosis and detection of infectious diseases [2,3,4]. The functional details of the data acquisition system and the microbial recognition system are given in Sections 4 and 5.

4 Data acquisition system

To assist the technician in specimen examination and to increase sensitivity, a programmable robotic arm is designed for the microscope [15]. The robotic arm is attached to the microscopic stage to automate its X, Y movement. The specimen to be examined is placed on the stage and scanned for the detection of bacteria. Circular, inward spiral, standard zig-zag or interlaced scanning patterns can be programmed for the stage movement, which makes whole-slide scanning more robust against false bacteria detection. After the whole slide has been screened, all FOVs are acquired as a video and passed to the microbial recognition system.

The proposed design of the programmable microscopic stage consists of two groups of components: micro control components and machinery components. The micro control components include licensed DraftSight CAD software for preparing the input to the micro controllers. The user-defined inputs are imported as DXF files into Micro-Computer Numerical Control (Micro-CNC) software, which gives the user control over the microscopic robotic arm. The Micro-CNC software extracts the directional information from the user inputs and passes it to a Machine Control Unit (MCU). The MCU has a Programmable Logic Controller (PLC) that generates the control signals and acts as a platform between the micro control and machinery components. On receiving the control signals, the machinery components are actuated. These components include a linear driving system attached to the microscope stage for automated X, Y stage movement. The linear driving system uses closed-loop, feedback-based servo motors and drives to actuate precise movement of the stage, as shown in Fig. 2.

Fig. 2 Motorized stage with image acquisition system

4.1 Micro control components

The micro control components form the control system generation phase, which includes the input system and the machine control unit. The scanning pattern of the microscope is passed as input to the Machine Control Unit (MCU) in DXF file format or as G-code. DraftSight, a CAD package, is used to draw the 2-dimensional scanning pattern. The drawn pattern is then passed to the Micro-CNC software, which extracts the movement and X, Y direction information. The Micro-CNC software interprets the ASCII codes in the DXF file and converts them into G-codes, a machine-understandable format. The G-codes are then passed to the MCU, whose Programmable Logic Controller generates the control signals.

4.2 2-dimensional input scanning patterns

As per the WHO standard, the specimen is spread over the slide in an area of 2 × 1 cm. To view the entire specimen, the area to be covered is therefore 20,000 μm × 10,000 μm, which is fatiguing for the technician. To cover the entire specimen, the stage movement is automated with user-defined scanning patterns. DraftSight makes it easy to draw a 2D scanning pattern: to scan a complete slide, the pattern is drawn in DraftSight and the G-codes are generated by the Micro-CNC software.

The DXF file is a 2-dimensional drawing file that holds the scanning pattern of the microscopic stage. The information in a DXF file is stored in either ASCII or binary format and is imported into the Micro-CNC software to generate G-codes. In either format, floating-point values in a DXF file have a precision of 16 decimal places. The ASCII format is saved as a text file that can be read and edited. A DXF file consists of header, tables, blocks, classes, objects and entities sections and an end-of-file marker.

  • The Header section consists of variables that hold the AutoCAD settings.

  • The Tables section lists information used in the drawing, such as line types, fonts and layer names.

  • The Blocks section contains predefined drawing elements.

  • The Classes section contains descriptions of application-defined classes of objects.

  • The Objects section holds the non-graphical details of the drawing.

  • The Entities section consists of the object data of the drawing, including raw data and LINE and ARC entities.
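To make the file layout concrete, the sketch below reads LINE entities from the ENTITIES section of an ASCII drawing file. It relies on the standard ASCII DXF convention of alternating group-code and value lines (code 0 starts an entity, codes 10/20 and 11/21 hold the start and end X/Y of a LINE). It is a simplified illustration under those assumptions, not the parser used by the Micro-CNC software, and the sample drawing is hypothetical.

```python
def read_dxf_lines(text):
    """Return [(x1, y1, x2, y2), ...] for every LINE entity in an ASCII DXF string."""
    rows = [row.strip() for row in text.strip().splitlines()]
    # Group codes and values alternate line by line
    pairs = [(int(rows[i]), rows[i + 1]) for i in range(0, len(rows) - 1, 2)]
    segments, current, in_entities = [], None, False
    for code, value in pairs:
        if code == 2 and value == "ENTITIES":
            in_entities = True
        elif code == 0:
            if current is not None:  # a new record closes the open LINE
                segments.append(tuple(current[k] for k in (10, 20, 11, 21)))
                current = None
            if value == "ENDSEC":
                in_entities = False
            elif value == "LINE" and in_entities:
                current = {10: 0.0, 20: 0.0, 11: 0.0, 21: 0.0}
        elif current is not None and code in current:
            current[code] = float(value)
    return segments

# Hypothetical two-line scan pattern (units: mm)
SAMPLE_DXF = "\n".join(
    "0 SECTION 2 ENTITIES "
    "0 LINE 8 0 10 0.0 20 0.0 11 20.0 21 0.0 "
    "0 LINE 8 0 10 20.0 20 0.18 11 0.0 21 0.18 "
    "0 ENDSEC 0 EOF".split()
)
print(read_dxf_lines(SAMPLE_DXF))
# [(0.0, 0.0, 20.0, 0.0), (20.0, 0.18, 0.0, 0.18)]
```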

4.3 Algorithm for drawing the pattern

The step by step procedure for drawing the 2D input pattern for a left to right scan is as follows.

  1. Select a line to be drawn with starting coordinates (0,0)

  2. Specify the distance (mm) to be covered in X axis

  3. Specify the distance to be moved between consecutive lines in Y axis

  4. The source entity is specified and boundary for scanning the entire slide in X axis has to be given

  5. Draw the consecutive lines, until the specimen area is covered.
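The drawing procedure above can be expressed programmatically. The following minimal sketch generates the consecutive scan lines for a rectangular area. The `zigzag` option, which reverses every second line, and the 0.18 mm line spacing in the example are assumptions added for illustration; the function and parameter names are hypothetical.

```python
def raster_lines(width_mm, height_mm, step_mm, zigzag=False):
    """Return [((x1, y1), (x2, y2)), ...] covering the area line by line from (0, 0)."""
    lines = []
    # Number of lines that fit within the height, including the first at y = 0
    n_rows = int(height_mm / step_mm + 1e-9) + 1
    for row in range(n_rows):
        y = round(row * step_mm, 6)
        start, end = (0.0, y), (float(width_mm), y)
        if zigzag and row % 2 == 1:
            start, end = end, start  # reverse direction on odd rows
        lines.append((start, end))
    return lines

pattern = raster_lines(20, 10, 0.18, zigzag=True)
print(len(pattern))  # 56
print(pattern[1])    # ((20.0, 0.18), (0.0, 0.18))
```

With `zigzag=False` every line runs left to right, matching the procedure as listed.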

4.4 Micro-CNC software

The Micro-CNC software was developed to perform complicated horizontal and vertical movements and to provide a better user interface. The DXF file generated by DraftSight is passed to the Micro-CNC software to generate G-codes, which are communicated in order to the PLC. This makes the system highly versatile and usable for many applications. Numerical control programming uses G-codes to communicate with machines. G-codes are also called preparatory codes; in CNC programming their words begin with the letter G. The G-code parameters for the DXF files are specified before code generation.

The DXF files are then loaded into the Micro-CNC software for G-code generation. The major G-code parameters include the XY feed rate, the stage movement rate and the relative tool path, which specifies the starting position of the scan on the specimen. The G-codes command machine operations such as direction, acceleration, feed rate, number of steps to be moved, axis movement ratio and switching of the coordinate system. After code generation, the system is connected to the machine control unit (MCU), which is notified when it receives G-codes from the Micro-CNC software.
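As an illustration of what the generated program might look like, the sketch below converts scan-line segments into a short G-code listing. It uses only widely supported words (G21 for millimetre units, G90 for absolute coordinates, G0 for rapid positioning, G1 for a linear move at feed rate F). The actual output of the Micro-CNC software is not documented here, so the layout, the function name and the feed rate of 60 mm/min are assumptions.

```python
def segments_to_gcode(segments, feed_mm_min=60.0):
    """Convert [((x1, y1), (x2, y2)), ...] into a list of G-code lines."""
    program = ["G21", "G90"]  # millimetre units, absolute coordinates
    position = None
    for (x1, y1), (x2, y2) in segments:
        if position != (x1, y1):
            # Rapid move to the start of the line without scanning
            program.append(f"G0 X{x1:.3f} Y{y1:.3f}")
        # Scanning move at the requested feed rate
        program.append(f"G1 X{x2:.3f} Y{y2:.3f} F{feed_mm_min:.1f}")
        position = (x2, y2)
    return program

scan = [((0.0, 0.0), (20.0, 0.0)), ((20.0, 0.18), (0.0, 0.18))]
for line in segments_to_gcode(scan):
    print(line)
```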

4.5 Machinery components

The input for the machinery components of the automated microscopic stage comes from the Micro-CNC software and is passed to the machine control unit (MCU). The MCU contains the Programmable Logic Controller (PLC), which generates the control signals. The MCU thus acts as an interface between the micro control and machinery components. The generated control signals are passed to the X, Y linear driving system, which is connected to the microscopic stage.

4.6 Machine control unit

The Machine Control Unit (MCU) is the interface between the two groups of components. It enables high machining speed, better precision and fewer errors at increasing feed rates. MCUs are microprocessors with multiple input and output channels. As the intermediate system, the MCU takes its input from the Micro-CNC software and extracts the directional and control information for the linear driving system. This information is converted into digital signals by the Programmable Logic Controller (PLC). The MCU has two sub-blocks for performing these operations:

  1. Data Processing Unit

  2. Control Loop Unit

4.7 Programmable logic controller (PLC)

The Programmable Logic Controller acts as an interface between the hardware and software components. It works within the Machine Control Unit and communicates with the machinery components. The PLC is a micro controller system that makes decisions based on its input programs to control the operation of the output devices. This PLC control system is used to achieve micro-controlled machine movements. By communicating the relevant information, the PLC can change the direction of movement.

4.8 Motorized microscopic stage

Normally the microscopic stage is moved manually along the X and Y axes with a translation knob coupled to the stage. To automate this movement, a linear driving system is attached to the stage. The linear driving system is linked to servo motors and servo drives to achieve stage movement with higher accuracy. Servos are small mechanical devices whose sole purpose is to rotate a tiny shaft extending from the top of the servo housing. The drive receives low-voltage command signals and in turn applies the necessary voltage and current to the motor, resulting in the desired motion. To drive the servo controller via USB, we used Microsoft Visual Basic 6.0 APIs to connect to the servo drive, set its acceleration and speed, and stop its movement.

First, the computer is connected to the PLC servo drive controller via USB using the VB6 (Visual Basic 6.0) APIs. The port number used by the servo controller is found or assigned, and the same port is set on a Microsoft Communication control object (MSCOM) added from the VB6 library. To set the servo speed, acceleration and target, the MSCOM control is assigned the appropriate values and action parameters. The motorized stage together with the camera thus forms a complete data acquisition system, as shown in Fig. 2. In our work on bacilli detection, the data acquisition system consists of a motorized stage on an Olympus CX-21i binocular microscope, fitted with a high-definition Canon 1200D camera for image and video acquisition.

5 Microbial recognition system

Deep learning methods use convolution, max-pooling, fully-connected layers and a softmax function to learn the lower-level parameters for classification. The parameters are learned by back-propagating the error values. Here we replace the fully-connected layers with an SVM for better recognition of microbes. The SVM is then optimized on the partial representations of the images, which are taken from the DeepNets before the fully-connected layers. For our study we retained the 13 convolution layers of VGG16 and the 16 convolution layers of VGG19 and hybridized them with an SVM for better classification. Fig. 3 shows the hybridization of VGG16 and SVM.

Fig. 3 Modified VGG16-SVM Classification model

For large-scale image recognition, the Visual Geometry Group (VGG) used an effective deep convolutional network to increase accuracy. In VGG the depth of the network architecture is increased using very small convolution filters of size 3 × 3. This improves on prior-art configurations by pushing the depth to 16 and 19 weight layers. The deep convolutional network generalizes well to other datasets and achieves state-of-the-art results. The important feature of the architecture is its increased depth in weight layers. In VGG16, an RGB image of size 224 × 224 is passed through convolutional layers arranged in 5 blocks, where each block uses 3 × 3 convolution filters whose number increases from block to block. The convolution stride is fixed to 1 pixel, and the convolutional layer inputs are padded so that the spatial resolution is preserved after convolution (i.e. the padding is 1 pixel for 3 × 3 convolution filters). Max-pooling layers separate the blocks; max-pooling is done over 2 × 2 windows with a stride of 2 pixels. Pooling can in general be done using maximum pooling or average pooling. The pooling layers reduce the spatial size of the image representation and control overfitting by reducing the number of parameters and computations in the network. Our hybrid architecture has 5 blocks of convolution layers followed by a support vector machine. The final layer of the network, a softmax layer, is used to obtain the output class probabilities.
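The effect of the padding and pooling settings on the feature map size can be traced with the standard output-size formula. The sketch below assumes the usual VGG16 block layout of 2, 2, 3, 3 and 3 convolution layers with 64, 128, 256, 512 and 512 channels; the function names are illustrative.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Spatial output size of a pooling layer."""
    return (size - window) // stride + 1

# (number of convolution layers, channels) for each of the 5 blocks of VGG16
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def trace_vgg16(size=224):
    """Return the (height, width, channels) after each block's max-pooling layer."""
    shapes = []
    for n_conv, channels in VGG16_BLOCKS:
        for _ in range(n_conv):
            size = conv_out(size)  # 3 x 3, stride 1, padding 1 keeps the size
        size = pool_out(size)      # 2 x 2, stride 2 halves the size
        shapes.append((size, size, channels))
    return shapes

print(trace_vgg16())
# [(112, 112, 64), (56, 56, 128), (28, 28, 256), (14, 14, 512), (7, 7, 512)]
```

The last block therefore yields a 7 × 7 × 512 representation, i.e. 25,088 values per image, which is the size of the feature vector a classifier placed after the convolution blocks would receive.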

5.1 Transfer learning

After the partial representation of the dataset has been obtained from the VGG convolutional network, the features are taken from the layer just before the fully connected (FC) layers and classified using shallow machine learning techniques. A pre-trained ConvNet works well when it is used as a fixed feature extractor for object recognition [10]. For microbial object identification, the ConvNet is therefore used as a fixed feature extractor: the features are extracted before the fully connected layers and classified with a linear classifier such as an SVM.

5.2 Support vector machine classification

An SVM finds an optimal separating hyperplane, or decision surface. When the data are not linearly separable, they are first mapped by a nonlinear transformation Φ [32]. The best hyperplane is found by solving a quadratic programming problem. For the TB dataset, the transformation was carried out with linear, radial basis function, sigmoid and polynomial kernel functions.
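For reference, the four kernel functions named above can be written out in their common textbook forms. The sketch below is illustrative only; the parameter names `gamma`, `coef0` and `degree` and their default values follow a widespread convention and are assumptions, not settings reported in this work.

```python
import math

def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)

x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z))                # 11.0
print(polynomial_kernel(x, z, degree=2))  # 144.0
print(rbf_kernel(x, x))                   # 1.0
```

Each kernel evaluates the inner product of Φ(x) and Φ(z) without computing Φ explicitly.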

The quality of a trained SVM depends strongly on the settings of the learning parameters in the model. To determine these values automatically, a framework for unbalanced datasets is used: a gradient-free numerical optimizer maximizes a quality measure that reflects the performance of the trained SVM in cross validation tests.

SVMs can be trained efficiently once these additional learning parameters are set, so a grid search algorithm is used to determine their values. In our recognition system, the bottleneck features are extracted using the VGG16 and VGG19 DeepNets and given as input to the SVM classifier. The parameter values are then determined automatically by the kernel method.
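A grid search is a derivative-free procedure: it evaluates a quality measure at every combination of candidate parameter values and keeps the best. The sketch below shows the idea with a generic scoring callback. In the setting described here the callback would return the cross validation accuracy of an SVM trained with the given parameters; the toy score used in the example, the parameter names `C` and `gamma`, and the candidate values are stand-ins chosen for illustration.

```python
import math
from itertools import product

def grid_search(score_fn, grid):
    """Return (best_params, best_score) over all combinations in `grid`."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)  # no gradient information is needed
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(C, gamma):
    """Hypothetical quality measure that peaks at C = 10, gamma = 0.01."""
    return -((math.log10(C) - 1) ** 2 + (math.log10(gamma) + 2) ** 2)

grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
best_params, best_score = grid_search(toy_score, grid)
print(best_params)  # {'C': 10, 'gamma': 0.01}
```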

5.3 Design and development of the modified DeepNet

The network was developed using Keras, a high-level neural network API. Applications are written in Python and run on top of the TensorFlow, Theano or CNTK libraries, using the GPU and CPU for computation. Keras applications provide weights pre-trained on the ImageNet dataset, which help in feature extraction and prediction.

$$ \texttt{model = applications.VGG16(include\_top=False, weights='imagenet')} $$

Here the fully-connected top layers are removed, and the features from the remaining convolutional layers are passed to the SVM for classification.
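The combination of a headless VGG16 and an SVM might be wired together as follows. This is a sketch under stated assumptions rather than the authors' code: it assumes TensorFlow's bundled Keras and scikit-learn are installed, that the images are already resized to 224 × 224 RGB arrays, and that a cross-validated grid search over kernel, C and gamma is an acceptable stand-in for the parameter tuning described earlier. The parameter grid values and function names are hypothetical.

```python
# Candidate SVM settings (illustrative values, not taken from the paper)
PARAM_GRID = {
    "kernel": ["linear", "rbf", "sigmoid", "poly"],
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", 0.001, 0.01],
}
FEATURE_DIM = 7 * 7 * 512  # flattened output of the last VGG16 pooling layer

def extract_bottleneck_features(images):
    """images: float array of shape (n, 224, 224, 3), RGB, values in 0-255."""
    # Imported lazily so the module loads even where TensorFlow is absent
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.applications.vgg16 import preprocess_input

    model = VGG16(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    features = model.predict(preprocess_input(images.copy()))  # (n, 7, 7, 512)
    return features.reshape(len(images), -1)                   # (n, FEATURE_DIM)

def train_svm(features, labels):
    """Tune and fit an SVM on bottleneck features with tenfold cross validation."""
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    search = GridSearchCV(SVC(), PARAM_GRID, cv=10)
    search.fit(features, labels)
    return search.best_estimator_, search.best_params_
```

Typical use would be `svm, params = train_svm(extract_bottleneck_features(images), labels)`, where `labels` holds 1 for infected and 0 for non-infected fields of view.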

6 Experiments and discussion

This section describes the collection of microscopic images of infected and non-infected AFB, and the training and testing of the VGG-SVM model for classifying infected and non-infected AFB in the collected images.

6.1 TB digital image collection for learning

To train and validate the proposed VGG-SVM deep learning model, it is necessary to collect digital images or video of infected and non-infected TB bacilli from Ziehl-Neelsen (ZN) stained sputum smear specimens (the acid-fast bacilli, AFB, stain) of various patients, together with existing microscopic images obtained from an open-source database.

Two different sources of data were used to establish the TB digital corpus:

  1. Microscopic digital videos acquired from sputum smears.

  2. Infected and non-infected microscopic digital images collected from the existing public corpus.

Pondicherry Institute of Medical Sciences (PIMS), a multi-specialty hospital on the border of the state of Tamil Nadu and Puducherry in southern India, prepares thin smears with significant areas of sputum to screen for pulmonary tuberculosis. Since 2016, PIMS has shared anonymized ZN-stained sputum smears of patients infected with tuberculosis with us to establish a digital image/video TB corpus. Using the pre-defined scanning pattern of the programmable microscopic stage, the sample is scanned from the top left corner, line by line, from left to right and then from right to left, with a vertical gap of 180 μm between lines. This movement gives a zig-zag scanning pattern. The FOVs are recorded using a digital camera mounted on the microscope at a frame rate of 25 frames per second (FPS). Each frame is 1920 × 1280 pixels at 72 DPI, and the focal length of the camera is 50 mm. We then isolate the non-overlapping FOV frames from the acquired video. A total of 13 sputum specimens from different patients were digitized using the proposed data acquisition system. After digitization, the specimens were handed back to PIMS for secure disposal in accordance with medical standards.

In addition to the acquired data, we collected ZN stained sputum smear FOV images, with overlapping bacilli and without bacilli, from the Ziehl-Neelsen Sputum smear Microscopy image DataBase (ZNSMiDB) [30]. This database was developed by Shah et al. at Jaypee University of Information Technology in collaboration with Indira Gandhi Medical College, India. It includes over 500 varied microscopic images at a resolution of 800 × 600 pixels with 72 DPI. The images, which include overlapping-bacilli images, FOV images with few bacilli and FOV images without bacilli, were acquired from three different bright-field microscopes.

6.2 Classification of TB bacilli using VGG16 and VGG19

A total of 1242 images, comprising 620 infected and 622 non-infected FOVs, are first classified by fine-tuning the weights of the fully-connected layers of the VGG16 and VGG19 deep networks separately. In addition to using the parameters already trained in the VGG model, the network thus also learns features from the given TB images. During training, each input sample is resized to 224 × 224 and the mean RGB value is subtracted from each pixel. After pre-processing, the image is fed into a sequence of convolution layers with 3 × 3 filters, interleaved with pooling layers for sub-sampling. One of the VGG configurations also uses 1 × 1 convolution filters, which act as a linear transformation of the input channels followed by a non-linearity before the result is passed to the next convolution layer. The convolution stride is fixed to 1 pixel, and the input of each convolution layer is padded so that the spatial resolution is preserved after convolution, i.e. a padding of 1 pixel for 3 × 3 convolution layers. Spatial pooling is carried out by five max-pooling layers, which follow some, but not all, of the convolution layers. Max-pooling is performed on the output of the convolution layer over a 2 × 2 pixel window with stride 2. VGG16 has 16 weight layers (13 convolution layers and 3 fully-connected layers), and VGG19 has 19 weight layers (16 convolution layers and 3 fully-connected layers). The pre-trained weights of VGG16/VGG19 are imported from the model and used as the starting point for training on the TB image dataset. For classification of TB bacilli, the weights of the fully-connected layers are fine-tuned with a dropout of 0.5, and a sigmoid activation function is used in the last fully-connected layer, giving a validation accuracy of 63.45% for VGG16.
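The effect of the stride, padding and pooling settings on the feature-map size can be checked with a short calculation. The sketch below is illustrative only: the layer lists follow the standard published VGG16 and VGG19 configurations, and the helper names `conv_out`, `pool_out` and `trace` are assumptions introduced for this example.

```python
# Integers are the output channels of a 3x3 convolution layer,
# "M" marks a 2x2 max-pooling layer with stride 2.
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]
VGG19 = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
         512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial size after a max-pooling layer."""
    return (size - window) // stride + 1


def trace(config, size=224):
    """Return (final spatial size, final channels, number of conv layers)."""
    channels, n_conv = 3, 0
    for layer in config:
        if layer == "M":
            size = pool_out(size)
        else:
            size = conv_out(size)  # 3x3, stride 1, padding 1 keeps the size
            channels = layer
            n_conv += 1
    return size, channels, n_conv


print(trace(VGG16))  # (7, 512, 13)
print(trace(VGG19))  # (7, 512, 16)
```

Each 3 × 3 convolution with stride 1 and padding 1 leaves the spatial size unchanged, while each of the five pooling layers halves it, so a 224 × 224 input reaches the fully-connected layers as a 7 × 7 × 512 volume (25088 values) in both networks.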

While fine-tuning VGG16, the accuracy on the training dataset reaches 0.98936 with a loss of 0.00532. After training, the test samples are classified with a validation accuracy of 63.45%, as shown in Fig. 4a, b. Similarly, VGG19 trained and tested on the same set of images obtains a validation accuracy of 65.18%, as illustrated in Fig. 5a, b.

Fig. 4

a Accuracy for training and validation of TB dataset using VGG16 b Loss on training and validation of TB dataset using VGG16

Fig. 5

a Accuracy for training and validation of TB dataset using VGG19 b Loss on training and validation of TB dataset using VGG19

6.3 Transfer learning from VGG and hybridization with SVM

In transfer learning, the outputs of the pre-trained VGG network are taken just before the fully-connected layers, and these values are fed to a support vector machine (SVM) that classifies the TB images as infected or non-infected. The VGG ConvNet without its fully-connected layers is thus treated as a fixed feature extractor for the given TB dataset. After the features are extracted from the top of the convolution stack, they are passed to the SVM for classification. In an SVM, optimizing the parameters gives a better position and orientation of the separating hyperplane. For the TB data, these parameters are therefore important for obtaining a good threshold when computing the classes of unknown data points during validation. When training an SVM, these parameters may not be visible to the user and are known as learning parameters [17]. The trade-off between error tolerance and margin maximization in SVM classification is determined by the cost value C, which is the learning parameter for classification. If C is large, the margin is narrow with few training errors; if C is small, the margin is wide with many training points located inside it. For an unbalanced dataset, the margin should not be too narrow. A grid search is used to find the best C value and kernel function for the TB image dataset. The best kernel function is searched among linear, Gaussian Radial Basis Function (RBF), polynomial and sigmoid functions. For our TB bacilli image dataset, the best recognition accuracy is obtained with the linear kernel function and a cost value of 1.3. Table 1 compares the TB recognition accuracy of VGG16 + SVM, VGG19 + SVM, VGG16 and VGG19 ConvNets.
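The grid search over the cost value and kernel function can be sketched with scikit-learn as follows. This is a hedged illustration, not the experimental code of this work: the feature matrix is synthetic random data standing in for the extracted VGG features, and the candidate C values, the 5-fold cross-validation and the feature dimension are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for VGG features (assumption): two classes of
# 32-dimensional vectors whose means differ slightly.
rng = np.random.default_rng(0)
n_per_class, n_features = 60, 32
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, n_features)),  # non-infected
    rng.normal(0.8, 1.0, size=(n_per_class, n_features)),  # infected
])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Candidate kernels follow the text; the C values are hypothetical.
param_grid = {
    "kernel": ["linear", "rbf", "poly", "sigmoid"],
    "C": [0.1, 0.5, 1.0, 1.3, 2.0, 10.0],
}

# Evaluate every (kernel, C) pair with 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

On real data, `X` would hold one flattened feature vector per FOV image and `y` the infected/non-infected labels; the selected kernel and C then depend on the dataset.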

Table 1 Accuracy of DeepNets using transfer learning and fine tuning

7 Conclusion

Existing microscopy-based screening is a human-centric process subject to variations in case detection. In epidemic regions, a shortage of skilled technicians often delays screening and prevents prompt treatment after diagnosis, which leads to a latent/chronic stage of the disease. The proposed decision support system can help society manage the growing demand for microscopic analysis and serve patients better and faster, reducing the spread of infection. It also improves screening productivity and reduces the workload of laboratory technicians. The portability of the system makes it usable in remote areas. Experiments on the tuberculosis dataset show better case detection with VGG19+SVM (86.6% accuracy) than with either VGG16 or VGG19 alone. Thus this cybernetic system supports community health care in humanitarian operations and serves as a decision support system. However, the system is limited by the availability of data, which affects performance. Moreover, the system does not support adaptive learning for optimal experimentation. Future work is to increase sensitivity and specificity by customizing a DeepNet for specific communicable or tropical diseases. Reducing the number of layers may be attempted to lower the computational complexity of screening. A cloud-enabled service might be linked to exploit high-performance computation in the cloud, with the client/mobile device handling only input and output.