1 Introduction

A galaxy is a vast collection of stars, dark matter, dust, and gas. Galaxies come in several types with different features, and the universe contains an innumerable number of them. Studying galaxies, their types, and their properties is crucial [1] because galaxies are an important indicator of the origin and development of the universe [2]. Galaxy classification therefore plays an important role in understanding galaxy formation [3]. Galaxy classification is a system that divides galaxies morphologically, that is, on the basis of their visual appearance. Several strategies exist for classifying galaxies morphologically with accurate results, and they address challenges faced by astrophysicists. Operations are performed on large databases of galaxy information so that astrophysicists can test hypotheses and find new solutions [4]. With these results they can explain the formation of stars, the evolution of the universe, and the physical processes that govern galaxies.

Classifying galaxies thus means studying, processing, and visually inspecting two-dimensional (2-D) galaxy images and then separating them according to their appearance [5]. This process is becoming increasingly time-consuming: the evolution of cameras with charge-coupled devices (CCDs) and the increasing size of telescopes are enlarging the databases of astronomical data, for example the dataset used by the proposed model [6]. The size of these databases makes it impossible to analyze the data manually, so training and testing a model on the images to be classified becomes essential. Classification has also become more challenging because galaxies have complex structure and image quality varies [7].

The main benefit of galaxy classification is that it helps astronomers differentiate galaxies accurately. In recent years, progress in computational tools and training algorithms has advanced the research and analysis of galaxies and their appearance [8].

Deep learning plays an important role in visual detection and recognition, where it has brought considerable improvement. Deep learning techniques take unprocessed data as input and learn the feature representations or segmentation parameters themselves, so no prior expertise in feature design is needed.

2 Literature Review

  1. In recent years, sky surveys such as the Sloan Digital Sky Survey (SDSS) have generated large amounts of galaxy data, which created a need for efficient classification in the deep learning era. Algorithms were therefore designed to separate galaxies into multiple classes. Earlier work classified galaxies into six classes, whereas the model proposed in [1] classifies them into eight. Its framework extracts the important features, which makes it reliable, and it reached a testing accuracy of 84.73%.

  2. Because neural network algorithms are largely tailored to the survey they were trained on, radio galaxy classification raised the question of whether trained algorithms can identify sources across surveys. The authors built a transfer learning model to address this problem and applied it to different radio surveys. The model was also trained from random initialization and reached good accuracy. Models initialized with inherited pre-trained weights were trained in such a way that the inherited weights boosted performance. The model was trained with NVSS data [10].

  3. The paper presented a deep convolutional network for classifying galaxies. Based on morphological features, galaxies were assigned to three main classes: Elliptical, Spiral, and Irregular. The proposed framework had eight layers, including one main convolutional layer with 96 filters for extracting the main features. The model was trained on about 1356 images and gave good testing accuracy, handling the classification task efficiently [12].

  4. Morphological classification of galaxies has been gaining attention in recent years. As the universe is explored more broadly, robust and reliable models are needed to classify galaxies from their images. Researchers proposed two fundamental approaches: one based on traditional machine learning with non-parametric morphology and the other based on deep learning. For the non-parametric approach they built a system called CyMorph. The dataset was Sloan Digital Sky Survey (SDSS) Data Release 7 (DR7), and three classes were considered. The overall accuracy obtained was high, and decision tree models were used to assess the quality of the morphological classification [13].

3 Methodology

3.1 Dataset Collection

The dataset used in this research is the Galaxy10 DECaLS dataset, an improved version of the Galaxy10 dataset. The original Galaxy10 dataset was created from Galaxy Zoo (GZ) Data Release 2, in which about 270 thousand SDSS galaxy images were classified; about 22 thousand of those images were selected into 10 broad classes on the basis of the votes given to each image. Later, Galaxy Zoo used images from the DESI Legacy Imaging Surveys (DECaLS), which offer better resolution and image quality. Galaxy10 DECaLS combines all three sources (GZ DR2 with DECaLS images instead of SDSS images, and DECaLS campaign ab, c), resulting in 441 thousand unique galaxies covered by DECaLS. About 18 thousand of those images were then selected into 10 broad classes after filtering on the votes obtained for each image. One more class, Edge-on Disk with Boxy Bulge, was abandoned because it contained only 17 images. This was done to make the 10 broad classes of Galaxy10 DECaLS more distinct (Figs. 1, 2 and 3).

The dataset contains 17736 color galaxy images of 256 × 256 pixels (g, r, and z bands), classified into 10 major classes. The images come from the DESI Legacy Imaging Surveys, and the labels come from Galaxy Zoo.
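As a minimal sketch of how images of this shape might be prepared for a CNN, the snippet below scales pixel values to [0, 1] and one-hot encodes the class labels. The arrays are synthetic random data standing in for the real images, and the function name `prepare_batch` is hypothetical; the preprocessing steps actually applied in this work are not specified here, so this is an assumption.

```python
import numpy as np

NUM_CLASSES = 10               # class 0 to class 9
IMAGE_SHAPE = (256, 256, 3)    # 256 x 256 pixels, three bands (g, r, z)


def prepare_batch(images, labels, num_classes=NUM_CLASSES):
    """Scale uint8 images to [0, 1] and one-hot encode integer labels."""
    x = images.astype(np.float32) / 255.0
    y = np.eye(num_classes, dtype=np.float32)[labels]
    return x, y


# Synthetic stand-in data (assumption): 4 random images with made-up labels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, *IMAGE_SHAPE), dtype=np.uint8)
fake_labels = np.array([0, 3, 9, 3])

x, y = prepare_batch(fake_images, fake_labels)
print(x.shape, y.shape)
```

Scaling to a common range and using one-hot targets is a common convention for multi-class image classifiers, but other choices (for example per-band standardization) would work as well.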

Table 1. Type of galaxies with the number of images
Fig. 1. Sample images from each class of Galaxy10 DECaLS

3.2 Proposed Deep Galaxies CNN Model

This project performs morphological classification using a deep convolutional neural network and shows how computational cosmology can make hard classification tasks easier. Galaxies can be classified in various ways: into three classes (Elliptical, Spiral, and Irregular), or at a finer level into classes such as Disturbed, Merging, Round Smooth, Barred Spiral, Bulge, etc.

This project uses the Galaxy10 DECaLS dataset distributed with astroNN; its images come from the DESI Legacy Imaging Surveys and its labels from Galaxy Zoo.

Fig. 2. Flow diagram of the proposed model

Fig. 3. Proposed model architecture

An important advantage of the proposed model is that it classifies galaxies into more classes (class 0 to class 9, as listed in Table 1), which is useful to astrophysicists and astronomers.

3.3 Overview of Algorithms

3.3.1 Machine Learning

Machine learning is the field concerned with building software applications that predict outcomes accurately without being explicitly programmed for the task. It can help industries and companies understand and examine their customers [4]. In addition, such predictive learning methods are used in the software of self-driving cars, for example at Tesla. Consumer data is collected and correlated with consumer behavior over time. Machine learning is also a primary driver of the business models of companies such as Uber.

3.3.2 Deep Learning

Deep learning builds machines that imitate the network of neurons in the human brain. It is based on deep networks, in which a task is divided and distributed across many connected layers. A set of hidden layers lies between the input layer (the first layer) and the output layer (the last layer); this gives rise to the word deep, meaning that the neurons are arranged in more than two layers [8]. The interconnected neurons propagate the input signal forward, and each layer processes it in turn [14].
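The propagation of an input signal through hidden layers can be sketched as follows. This is only an illustration under stated assumptions: the layer sizes, the random weights, and the ReLU activation are arbitrary choices, not the configuration of the proposed model.

```python
import numpy as np


def relu(z):
    """Rectified linear unit, applied element-wise."""
    return np.maximum(z, 0.0)


def forward(x, layers):
    """Propagate x through a list of (weights, bias) pairs.

    ReLU is applied after every layer except the last one.
    """
    a = x
    for i, (w, b) in enumerate(layers):
        z = a @ w + b
        a = z if i == len(layers) - 1 else relu(z)
    return a


rng = np.random.default_rng(0)
sizes = [8, 16, 16, 10]  # input, two hidden layers, output (illustrative)
layers = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

batch = rng.normal(size=(5, sizes[0]))
output = forward(batch, layers)
print(output.shape)  # (5, 10)
```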

3.3.3 Convolutional Neural Network

Convolutional neural networks (CNNs) are a type of artificial neural network (ANN) widely used for image classification and object recognition. They are discriminative models that were initially developed to work on raw images that are not pre-processed. These models are now also applied to text and sound recognition and classification. The CNN architecture was designed in the 1990s for recognition models that read zip codes and digits, and it helped initiate deep artificial neural networks [6].
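The core operation of a CNN layer, sliding a small filter over an image, can be sketched in a few lines. The example below is a simplified illustration (single channel, no padding, stride 1, a hand-picked kernel); in a real CNN the kernel values are learned, and the function name `conv2d` is chosen here for illustration only.

```python
import numpy as np


def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A 5 x 5 image with a vertical edge between columns 1 and 2 (0-indexed).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Simple horizontal-difference kernel (illustrative choice).
kernel = np.array([[-1.0, 1.0]])

feature_map = conv2d(image, kernel)
print(feature_map[0])  # [0. 1. 0. 0.]
```

The feature map responds only where the pixel value changes, which is how a filter highlights a local pattern such as an edge.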

3.3.4 Dropout Layer

A dropout layer in a CNN architecture acts as a mask that nullifies the contribution of some neurons to the next layer while leaving the others unchanged. When dropout is applied to the input vector, it nullifies some of the input features. Dropout layers play an important role in preventing overfitting during training.
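A minimal sketch of this masking is given below. It uses the "inverted dropout" variant, in which the surviving activations are rescaled so that their expected value is unchanged; that variant and the rate of 0.5 are assumptions made for illustration, not details taken from the proposed model.

```python
import numpy as np


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return x
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob


rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout(activations, rate=0.5, rng=rng)

print((dropped == 0).mean())  # roughly 0.5 of the units are nullified
print(dropped.mean())         # roughly 1.0 because of the rescaling
```

At inference time (`training=False`) the layer passes its input through unchanged.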

3.3.5 Optimizers

Optimizers are algorithms that adjust the parameters of a neural network, such as its weights, using settings such as the learning rate, in order to minimize the loss. Because training amounts to minimizing a function, optimizers are used wherever such an optimization problem arises. The weights are updated in each epoch according to the following rule:

$$W_{new} = W_{old} - \text{lr} \cdot \left(\nabla_{W} L\right)_{W_{old}}$$
(1)

In Equation 1, W_old and W_new are the weights before and after the update, lr is the learning rate, and the last term is the gradient of the loss L with respect to the weights W, evaluated at W_old.
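The update rule of Equation 1 can be illustrated on a toy problem. The sketch below applies plain gradient descent to a quadratic loss whose minimum is known; the loss function, the target vector, and the learning rate of 0.1 are assumptions chosen purely for demonstration.

```python
import numpy as np


def sgd_step(w_old, grad, lr):
    """One gradient-descent update: W_new = W_old - lr * grad."""
    return w_old - lr * grad


target = np.array([3.0, -2.0])  # minimizer of the toy loss (assumption)


def loss(w):
    """Toy quadratic loss L(W) = 0.5 * ||W - target||^2."""
    return 0.5 * np.sum((w - target) ** 2)


def grad_loss(w):
    """Gradient of the toy loss with respect to W."""
    return w - target


w = np.zeros(2)
history = [loss(w)]
for _ in range(50):
    w = sgd_step(w, grad_loss(w), lr=0.1)
    history.append(loss(w))

print(w, history[-1])  # w approaches target, the loss approaches 0
```

Practical optimizers such as momentum-based or adaptive methods refine this rule, but they share the same structure of moving the weights against the gradient.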

3.3.6 Pooling Layer

A pooling layer reduces the spatial dimensions of the feature maps and thus the size of the representation passed to the next layer. This reduces the number of learnable parameters in later layers and the computational effort [11]. Pooling retains the important features produced by the convolution layer in summarized form. There are two types of pooling:

  a. Max pooling: the maximum value is selected from each window of a specified size (commonly 2 × 2), so the strongest feature responses are retained.

  b. Average pooling: the average of all values in each window (commonly 2 × 2) is taken.
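Both pooling types can be sketched with a single helper. The function name `pool2d` and the 4 × 4 example feature map are hypothetical; the sketch assumes non-overlapping windows, which corresponds to a stride equal to the window size.

```python
import numpy as np


def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling over size x size windows."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop incomplete border windows
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "average":
        return windows.mean(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")


fm = np.array([
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [2, 0, 6, 2],
], dtype=float)

max_pooled = pool2d(fm, mode="max")      # values: [[7, 2], [2, 8]]
avg_pooled = pool2d(fm, mode="average")  # values: [[4, 1], [1, 5]]
print(max_pooled)
print(avg_pooled)
```

In both cases the 4 × 4 map shrinks to 2 × 2, which is the dimension reduction described above.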

3.3.7 Dense Layer

After pooling, the output is passed to the dense layer. The dense layer requires its input to be a one-dimensional (1-D) array [14]. Image classification takes place in this layer on the basis of the output of the preceding convolutional layers. The layer is called dense because each of its neurons receives input from every neuron of the previous layer.
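A dense layer used for classification can be sketched as a matrix product followed by a softmax that turns the outputs into class probabilities. The softmax output, the 32 input features, and the random weights are assumptions made for illustration; only the 10 output classes follow from the dataset described earlier.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def dense(x, weights, bias):
    """Fully connected layer: every output unit sees every input feature."""
    return x @ weights + bias


rng = np.random.default_rng(0)
features = rng.normal(size=(3, 32))       # 3 samples, 32 flattened features
w = rng.normal(scale=0.1, size=(32, 10))  # 10 output classes
b = np.zeros(10)

probs = softmax(dense(features, w, b))
predicted_class = probs.argmax(axis=1)
print(probs.shape, predicted_class)
```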

3.3.8 Flatten Method

The flatten method converts a multi-dimensional matrix into a one-dimensional one so that it can be passed to the dense layer. The pipeline also includes data preprocessing and data augmentation, which help to avoid overfitting (Table 2).
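Flattening amounts to a reshape that keeps the batch dimension and merges all others. The sketch below assumes a batch of feature maps in (N, H, W, C) layout; the shapes are illustrative.

```python
import numpy as np


def flatten(batch):
    """Reshape (N, H, W, C) feature maps into (N, H * W * C) vectors."""
    return batch.reshape(batch.shape[0], -1)


feature_maps = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)
flat = flatten(feature_maps)
print(flat.shape)  # (2, 48)
```

No values are changed or discarded, so the operation adds no learnable parameters.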

4 Comparative Results

Table 2. Comparative analysis of existing and proposed model

4.1 Model Accuracy

Accuracy is an evaluation metric for classification problems: the fraction of samples whose predicted class matches the true class. The overall accuracy of the proposed model is 84.04% (Figs. 4, 5, 6 and 7).
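As a small worked example of how accuracy and a confusion matrix are computed, consider the sketch below. The labels are hypothetical values for a 3-class toy problem and are not the results of the proposed model.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Entry [i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


# Hypothetical labels for a 3-class toy problem (not the paper's results).
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0, 2, 1]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(accuracy(y_true, y_pred))  # 0.75
print(np.trace(cm) / cm.sum())   # same value, read off the diagonal
```

The diagonal of the confusion matrix holds the correctly classified samples, so accuracy equals the trace divided by the total count.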

Fig. 4. Graph of model accuracy

Fig. 5. Graph of model loss

Fig. 6. Confusion matrix of result

Fig. 7. Final result of classified galaxies

(1) Actual: Edge-on Galaxies without bul; Predicted: Disturbed
(2) Actual: Merging; Predicted: Merging
(3) Actual: Cigar Shaped Smooth; Predicted: Cigar Shaped smooth
(4) Actual: Round Smooth; Predicted: Round smooth
(5) Actual: Disturbed; Predicted: Edge-on Galaxies with bulge
(6) Actual: Unbarred loose spiral; Predicted: Edge-on Galaxies without bulge
(7) Actual: Merging; Predicted: Merging
(8) Actual: Distributed; Predicted: Merging
(9) Actual: Distributed; Predicted: Unbarred
(10) Actual: Unbarred loose; Predicted: Edge-on Galaxies
(11) Actual: Cigar Shaped Smooth; Predicted: Cigar Shaped
(12) Actual: Unbarred tight spiral; Predicted: Unbarred tight spiral
(13) Actual: Merging; Predicted: Merging
(14) Actual: Merging; Predicted: Merging
(15) Actual: Round Smooth; Predicted: Round Smooth
(16) Actual: Distributed; Predicted: Distributed
(17) Actual: In-between Round Smooth; Predicted: In-between Round Smooth
(18) Actual: Round Smooth; Predicted: Round Smooth
(19) Actual: Merging; Predicted: Merging
(20) Actual: Distributed; Predicted: Distributed
(21) Actual: Merging; Predicted: Merging

5 Conclusion and Future Scope

Galaxy classification is a long-standing yet interesting topic. It has helped researchers, astronomers, and astrophysicists divide galaxies morphologically on the basis of their appearance. This paper investigated a deep convolutional neural network framework for the galaxy classification problem. The proposed model was trained on 17736 images and achieved 84.04% testing accuracy.

Future work will focus on classifying galaxy morphology in finer detail and with higher accuracy. The next step of the research is to train the model on larger and higher-quality datasets. Algorithms for galaxy morphological classification will be developed further with the help of more advanced deep learning techniques.