1 Introduction

Sunspots, the most prominent features on the solar photosphere, appear dark when observed in visible wavelengths. They also show periodic variations in their properties on an \({\approx}\,11\)-year time scale, generally referred to as the solar cycle (Hathaway, 2015). After the observations by Hale (1908), it became clear that sunspots are the locations of strong magnetic fields (\({\approx}\,4\) kG) which inhibit convection within them. Owing to this suppression of convective energy transport, they appear as dark structures (Solanki, 2003). A closer inspection of sunspot images reveals two different features within a spot: a darker (with respect to photospheric intensity) umbra surrounded by a lighter penumbra. This contrast in appearance is generally attributed to the different strengths and orientations of the magnetic fields present in these two regions (Mathew et al., 2003). Hence, area measurements of umbra and penumbra carry this magnetic field information too. These measurements are also important for calculating the Photometric Sunspot Index (PSI) values, which quantify the decrement of Total Solar Irradiance (TSI) due to the presence of a spot on the solar disc (Fröhlich, 1977; Hudson et al., 1982). Thus, knowledge of long-term variations in the umbra and penumbra area will enhance our understanding of solar variability.

One of the earliest measurements of umbra and penumbra area values was reported by Nicholson (1933), who studied almost one thousand sunspots (unipolar spots or preceding members of bipolar groups) from Royal Observatory, Greenwich (RGO) data between 1917 and 1920. The average ratio (\(q \)) of the penumbral area to the umbral area was quoted to be \({\approx}\,4.7\) and was found to be independent of sunspot size. However, examining the diameters of umbra and penumbra of 53 sunspots as photographed by Wolfer at Zürich, Waldmeier (1939) noted that the \(q \) value decreases from 6.8 to 3.4 as the sunspot area increases from 100 μhem to 1000 μhem. The first investigation of the long-term evolution of this ratio was reported by Jensen, Nordø, and Ringnes (1955, 1956), who analysed the RGO data from 1878 to 1945. Interestingly, they noted that the ratio is a decreasing function of the sunspot size during cycle maxima, but that the variation is much smaller than that reported by Waldmeier (1939). Several follow-up studies by Tandberg-Hanssen (1956), Antalová (1971), and Beck and Chapman (1993) confirmed these results by including more complex sunspots and larger samples.

Using the largest set of observations as recorded in RGO data (161839 sunspot groups between 1874 and 1976), Hathaway (2013) calculated the \(q \) value for each group and noted that it increases from 5 to 6 as sunspot group size increases from 100 μhem to 2000 μhem. However, the author did not find any dependence of \(q \) on the cycle phase or the locations of the spots. The most remarkable result was the behaviour of smaller sunspot groups (area \({<}\,100\) μhem), for which the author found a substantial change in the \(q \) values over a relatively short time scale. The ratio decreased significantly from 7 to 3 during Solar Cycles 14 – 16; however, it increased again to \({>}\,7\) in 1961 at the end of Cycle 19.

2 Data

In this study, we have used the newly digitised and calibrated white-light full disc images (Figure 1a) from Kodaikanal Solar Observatory. Details of this digitisation, including the various steps of the calibration process, were reported in Ravindra et al. (2013). Recently, Mandal et al. (2017) catalogued the whole spot area series (Footnote 1) between 1921 and 2011 by using a semi-automated sunspot detection algorithm on these data. We start our analysis with these detected binary images of sunspots as shown in Figure 1b. In order to isolate the spots, we multiply the binary mask with the limb-darkening corrected full disc images. The final results are displayed in Figure 1c – 1d.
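
The masking step described above can be sketched in a few lines. This is only an illustration in Python with NumPy, not the pipeline actually used on the Kodaikanal images; the function name `isolate_spots` and the toy arrays are hypothetical.

```python
import numpy as np

def isolate_spots(image, mask):
    """Keep image intensities inside the binary spot mask; set the rest to zero."""
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask).astype(bool)
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    return image * mask

# Toy 3x3 "limb-darkening corrected image" (invented values) and a mask
# that flags two pixels as belonging to a spot.
toy_image = np.array([[0.9, 0.8, 0.9],
                      [0.9, 0.3, 0.5],
                      [1.0, 0.9, 0.9]])
toy_mask = np.array([[0, 0, 0],
                     [0, 1, 1],
                     [0, 0, 0]])
isolated = isolate_spots(toy_image, toy_mask)
```

Only the two flagged pixels keep their grey-scale values; every other pixel becomes zero, which is the convention assumed for "outside the spot" in this sketch.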

Figure 1

Panel (a): A calibrated white-light image from Kodaikanal Observatory as recorded on 1955-01-07 08:15. Panel (b): Binary image of the extracted sunspots. Panel (c): Isolated spots in the original grey-scale image, produced by multiplying the image in Panel (a) by the binary image in Panel (b). A zoomed-in view is presented in Panel (d).

3 Method

Considering the volume of the data to be processed, we opted for an automatic boundary detection algorithm. A number of methods have already been used in the past to automatically detect umbrae of sunspots: Brandt, Schmidt, and Steinegger (1990) and Pucha, Hiremath, and Gurumath (2016) used a fixed intensity threshold; Pettauer and Brandt (1997) used a cumulative histogram method; and Steinegger, Bonet, and Vázquez (1997a) used the inflection method. Despite their successes on other datasets (mostly of shorter duration), we found that none of these methods produces a faithful result when applied to the entire set of Kodaikanal data. The main reasons for this include the varying image quality over time, poor contrast, and the presence of artefacts. Keeping these limitations in mind, we select an adaptive umbra detection method based on the Otsu thresholding technique (Otsu, 1979). This method finds the optimum threshold for an image which has a bimodal intensity distribution. In our case, the two different intensity levels of umbra and penumbra constitute such a distribution, which makes the method suitable for this application. Mathematically, the method calculates the threshold by maximising the between-class variance of the distribution. If \(t\) is the threshold that separates the \({L}\) bins of the histogram into a background class (\(C_{\mathrm{b}}\)) and a foreground class (\(C_{\mathrm{f}}\)), then the probabilities of occurrence of the background (\(\omega_{\mathrm{b}}\)) and foreground (\(\omega_{\mathrm{f}}\)) classes are

$$\begin{aligned} \omega_{\mathrm{b}} =&\sum_{i=1}^{{t}}P(i)= \omega(t), \end{aligned}$$
(1)
$$\begin{aligned} \omega_{\mathrm{f}} =&\sum_{{i=t+1}}^{{L}}P(i)=1- \omega(t) , \end{aligned}$$
(2)

where \(P(i)\) represents the probability of occurrence of the \(i\)th bin. The between-class variance (\(\sigma_{\mathrm{B}}^{2}\)) of the distribution for a particular \(t\) can be written as

$$ \sigma_{\mathrm{B}}^{2}(t)=\omega_{\mathrm{b}}( \mu_{\mathrm{b}}-\mu )^{2}+\omega_{\mathrm{f}}( \mu_{\mathrm{f}}-\mu)^{2}. $$
(3)

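In Equation 3, \(\mu\) (the mean of the distribution) and \(\mu _{\mathrm{b}}\) and \(\mu_{\mathrm{f}}\) (the means of the background and foreground classes) are defined as follows, where \(\mu(t)=\sum_{i=1}^{t}iP(i)\) is the first-order cumulative moment of the histogram up to bin \(t\):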

$$\begin{aligned} \mu =&\sum_{i=1}^{{L}}iP(i), \end{aligned}$$
(4)
$$\begin{aligned} \mu_{\mathrm{b}} =&\sum_{i=1}^{t}iP(i\mid C_{\mathrm{b}})= \frac{\mu (t)}{\omega(t)}, \end{aligned}$$
(5)
$$\begin{aligned} \mu_{\mathrm{f}} =&\sum_{i=t+1}^{{L}}iP(i\mid C_{\mathrm{f}})= \frac{\mu-\mu(t)}{1-\omega(t)}. \end{aligned}$$
(6)

In this work, we use the cgotsu_threshold.pro routine (Footnote 2), an IDL (Footnote 3) implementation of the above concept.
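
To make Equations 1 – 6 concrete, the following is a minimal sketch of Otsu thresholding in Python with NumPy. It is not the cgotsu_threshold.pro routine and makes no claim to reproduce its options or output; the function name `otsu_threshold`, the number of bins, the choice of returning the upper edge of bin \(t\), and the toy intensities are all assumptions made for illustration.

```python
import numpy as np

def otsu_threshold(values, n_bins=256):
    """Return the intensity threshold that maximises the between-class variance."""
    values = np.asarray(values, dtype=float).ravel()
    counts, edges = np.histogram(values, bins=n_bins)
    p = counts / counts.sum()              # P(i), probability of bin i
    i = np.arange(1, n_bins + 1)           # bin index i = 1..L
    omega = np.cumsum(p)                   # omega(t), Equation 1
    mu_t = np.cumsum(i * p)                # mu(t), cumulative first moment
    mu = mu_t[-1]                          # total mean, Equation 4
    # Substituting Equations 5 and 6 into Equation 3 gives the compact form
    # sigma_B^2(t) = (mu * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t))).
    denom = omega * (1.0 - omega)
    sigma_b2 = np.zeros(n_bins)
    valid = denom > 1e-12                  # skip thresholds with an empty class
    sigma_b2[valid] = (mu * omega[valid] - mu_t[valid]) ** 2 / denom[valid]
    t = int(np.argmax(sigma_b2))           # zero-based position of bin t
    return edges[t + 1]                    # upper intensity edge of bin t

# Toy spot (invented intensities): 60 dark "umbral" and 180 brighter
# "penumbral" pixels.
spot = np.concatenate([np.repeat([0.20, 0.22, 0.25], 20),
                       np.repeat([0.55, 0.60, 0.65], 60)])
thr = otsu_threshold(spot)
umbra_pixels = int((spot <= thr).sum())
```

In this sketch the darker class (bins 1 to \(t\)) plays the role of \(C_{\mathrm{b}}\), so pixels at or below the returned threshold would be counted as umbra.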

We demonstrate the application of this algorithm on our data with two representative examples as shown in Figure 2. The red contours on the spots represent the umbra–penumbra boundary as estimated by visual inspection. We expect an umbral boundary, as detected by the Otsu method, to more or less coincide with this contour. When applied to the original image, the detected umbra turns out to be significantly larger in size, as seen in Panels a1 and b1 of Figure 2. Upon investigation, we found that this over-estimation occurs due to the presence of a few brighter pixels on the edge of the detected spots. These bright pixels are originally a part of the quiet Sun region and were picked up during the sunspot detection procedure. Though the number of such pixels is very small compared to the total number of pixels in a typical sunspot, they have a significant influence on the derived threshold value. To get rid of these “rogue pixels”, a pre-processing technique is applied before feeding the spots into the Otsu method. We set up an intensity filter which is based on a threshold defined as

$$ I_{\mathrm{th}}=\bar{I}-k\sigma, $$
(7)

where \(\bar{I}\) and \(\sigma\) are the mean and standard deviation of the intensity in the spot region. With this criterion, a pixel with intensity (\(I_{\mathrm{n}}\)) greater than \(I_{\mathrm{th}}\) is removed from that specific spot, i.e. we set \(I_{\mathrm{n}}=0\). From Equation 7, we note that \(k\) is a free parameter which needs to be optimised. We determine it by taking a large subset of randomly chosen sunspots (of different contrasts and morphologies) and repeating the above procedure with multiple values of \(k\). After visual inspection of each of those results, we find that two values, \(k=0.3\) and \(k=0.5\), produce the most accurate results compared to the other \(k\) values. However, more often than not, the umbra is underestimated with \(k = 0.5\) (Figure 2a3, b3).
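
A literal reading of Equation 7 can be sketched as follows in Python with NumPy. The function name `remove_bright_pixels`, the convention that zero marks pixels outside the spot, the use of the population standard deviation, and the toy intensities are assumptions of this sketch, not details taken from the original processing.

```python
import numpy as np

def remove_bright_pixels(spot, k=0.3):
    """Zero out spot pixels brighter than I_th = mean - k * sigma (Equation 7)."""
    spot = np.asarray(spot, dtype=float)
    inside = spot > 0                      # assumed: zero means "not in the spot"
    if not inside.any():
        return spot.copy()
    mean = spot[inside].mean()             # I-bar over the spot region
    sigma = spot[inside].std()             # sigma over the spot region
    i_th = mean - k * sigma
    cleaned = spot.copy()
    cleaned[inside & (spot > i_th)] = 0.0  # set I_n = 0 for removed pixels
    return cleaned

# Toy isolated spot with invented intensities; zeros lie outside the spot.
toy_spot = np.array([[0.0, 0.0, 0.0, 0.0],
                     [0.0, 0.2, 0.3, 0.0],
                     [0.0, 0.6, 0.9, 0.0]])
cleaned_03 = remove_bright_pixels(toy_spot, k=0.3)
cleaned_10 = remove_bright_pixels(toy_spot, k=1.0)
```

Because a larger \(k\) lowers \(I_{\mathrm{th}}\), it removes more pixels before the Otsu step, which is consistent with the smaller umbrae reported for \(k=0.5\).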

Figure 2

Two representative examples of our umbra detection technique on Kodaikanal sunspot data. The red contours (in Panels (a1) – (b3)) highlight the umbra–penumbra boundaries as estimated by eye, whereas the detected umbrae for different threshold values are shown as grey regions. See text for more details.

To better visualise this effect, we compare our results with the umbra measurements from Debrecen Observatory (Footnote 4; Baranyi, Győri, and Ludmány, 2016; Győri, Ludmány, and Baranyi, 2017) as shown in Figure 3a. The plot highlights the fact that \(k=0.3\) is indeed a better choice for our Kodaikanal data. However, there is a large discrepancy between the Kodaikanal values and those from Debrecen near the Cycle 22 maximum. To eliminate the possibility of this being an artefact of our umbra detection technique, we also plot the whole spot area from the two observatories in Figure 3b. The presence of a similar difference in this case too indicates an underestimation of the total sunspot area during the original spot detection procedure, as reported in Mandal et al. (2017).

Figure 3

Panel (a): Comparison of yearly averaged umbral areas between Kodaikanal (\(k=0.3\) (red) and \(k=0.5\) (green)) and Debrecen data (black). Panel (b): Same as Panel (a) but for the whole spot area.

Finally, we compute the penumbra to umbra area ratio:

$$ {\mathrm{Ratio}}= q = \frac{A_{\mathrm{W}}}{A_{\mathrm{U}}}-1, $$
(8)

where \(A_{\mathrm{W}}\) and \(A_{\mathrm{U}}\) are the whole spot area and umbra area. This definition is the same as in Antalová (1971) and Hathaway (2013).
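
As a quick consistency check on Equation 8 (a worked rearrangement added for illustration, with \(A_{\mathrm{P}}\) introduced here as shorthand for the penumbral area \(A_{\mathrm{W}}-A_{\mathrm{U}}\)), the ratio can be rewritten and inverted to give the umbral fraction of the whole spot:

$$ q = \frac{A_{\mathrm{W}}-A_{\mathrm{U}}}{A_{\mathrm{U}}} = \frac{A_{\mathrm{P}}}{A_{\mathrm{U}}}, \qquad \frac{A_{\mathrm{U}}}{A_{\mathrm{W}}} = \frac{1}{1+q}. $$

For instance, \(q=4.5\) corresponds to an umbral coverage of \(1/5.5\approx18\%\) of the whole spot, and \(q=6\) to \(1/7\approx14\%\).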

4 Results

We calculate the ratio \(q \) for the whole period of the currently available Kodaikanal data which covers Cycle 16 to Cycle 23. Different aspects of this ratio are discussed in this section.

4.1 Individual Variations

To investigate the overall behaviour of \(q \), we group the sunspot areas into bins of width 20 μhem between 20 and 2000 μhem and calculate the average ratio (\(q _{\mathrm{avg}}\)) for all the sunspots falling in each bin. Figure 4a shows \(q _{\mathrm{avg}}\) as a function of total spot area. The shaded region represents the \(2\sigma\) uncertainty (standard error). The uncertainties for areas \({>}\,1500\) μhem are considerably larger due to the poor statistics in those bins. Initially, the ratio for smaller spots (area \({<}\,100\) μhem) increases rapidly from 3.4 to 5.2. As the area increases (\({>}\,100\) μhem), \(q _{\mathrm{avg}}\) tends to settle down to a value of \({\approx}\,6\) (Jha, Mandal, and Banerjee, 2018). These results are consistent with the findings by Antalová (1971) and Hathaway (2013). Physically this means that larger spots tend to have a proportionally larger penumbra (the observed slow upward trend); however, large uncertainties make this conclusion rather weak. In addition, we note a local minimum of \(q _{\mathrm{avg}}\) around 150 μhem, for which we do not have a convincing explanation and which needs further investigation. The behaviour of \(q \) for every detected sunspot is also analysed and presented as a histogram in Figure 4b. The distribution peaks at \({\approx}\,4.5\) and falls rapidly on both sides of the peak. Another interesting aspect is the coverage of the umbra with respect to the total area for any individual sunspot. Figure 4c shows the distribution of this quantity (expressed in %). The distribution peaks at 15%, although there are a significant number of cases between 15% and 25%. These properties are in good agreement with values previously measured by Watson, Fletcher, and Marshall (2011) and Carrasco et al. (2018).
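
The binning described above can be sketched as follows in Python with NumPy. The function name `binned_ratio` and the toy values are hypothetical, and reading the \(2\sigma\) band as twice the standard error of the mean in each bin is an assumption of this sketch.

```python
import numpy as np

def binned_ratio(areas, q, bin_width=20.0, a_min=20.0, a_max=2000.0):
    """Average q in fixed-width area bins; return centres, means and 2-sigma errors."""
    areas = np.asarray(areas, dtype=float)
    q = np.asarray(q, dtype=float)
    edges = np.arange(a_min, a_max + bin_width, bin_width)
    n_bins = len(edges) - 1
    centres = 0.5 * (edges[:-1] + edges[1:])
    # digitize gives 0 below a_min and len(edges) at or above a_max, so after
    # subtracting 1 only indices 0..n_bins-1 refer to real bins.
    idx = np.digitize(areas, edges) - 1
    q_avg = np.full(n_bins, np.nan)
    q_err = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = q[idx == b]
        if sel.size > 0:
            q_avg[b] = sel.mean()
        if sel.size > 1:
            # assumed error definition: twice the standard error of the mean
            q_err[b] = 2.0 * sel.std(ddof=1) / np.sqrt(sel.size)
    return centres, q_avg, q_err

# Invented example: three spots in the first two bins, one outside the range.
centres, q_avg, q_err = binned_ratio([25.0, 35.0, 45.0, 3000.0],
                                     [4.0, 6.0, 5.0, 9.0])
```

Bins with a single spot get a mean but no error estimate, and empty bins stay undefined, which mirrors the poor statistics noted for the largest areas.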

Figure 4

Panel (a): Penumbra to umbra area ratio as a function of total sunspot area binned over 20 μhem. The grey shaded region represents the \(2\sigma\) errors. Panel (b) shows the distribution of the individual ratio (\(q \)), whereas the distribution of the percentage coverage of umbral area over the whole spot area is shown in Panel (c).

4.2 Dependency on Cycle Strength and Its Phases

During the onset of a solar cycle, we see very few spots present on the disc (mostly of smaller sizes; Mandal et al., 2017). They are also located at higher latitudes and, as the cycle progresses, new spots emerge progressively closer to the equator, forming the popular “sunspot butterfly diagram”.

We look for any such dependency of \(q _{\mathrm{avg}}\) by dividing the solar disc into several latitudinal bands. We fold the two hemispheres together and the results are plotted in Figure 5a. As seen from the plot, we find that the ratio does not depend on the latitude of a spot (Antalová, 1971; Hathaway, 2013). In a slightly different representation of the same phenomenon, we separate the spots according to the phase of the solar cycle in which they appear. We are also motivated by some of the earlier studies by Jensen, Nordø, and Ringnes (1955), Tandberg-Hanssen (1956), and Antalová (1971), where these authors reported different values of \(q _{\mathrm{avg}}\) during a cycle maximum as opposed to a cycle minimum. To check this, a cycle is divided into four phases: minimum phase, rising phase, maximum phase and declining phase. The definition of each of these phases is the same as described in Hathaway (2013). Considering all the cycles together, we generate a plot as shown in Figure 5b. In this case, too, we do not notice any change for a given spot-area range in different phases of cycles. This is consistent with the RGO data as found by Hathaway (2013).

Figure 5

Panel (a): Variation of \(q _{\mathrm{avg}}\) as a function of total area in four different latitude bands as written on the panel; Panel (b): Same as Panel (a) but separated for four different activity phases of cycles.

The other factor that could potentially affect this ratio is the strength of a cycle. Similar spots in a weak cycle (Cycle 16) may have different \(q _{\mathrm{avg}}\) values from those in a strong cycle (Cycle 19). From Figure 6a – h we note that there is no appreciable variation of \(q _{\mathrm{avg}}\) between cycles of different strengths. A similar analysis by Hathaway (2013) of RGO data showed two different behaviours, specifically for the smaller spots (\(\mbox{area} <100\) μhem), between even- and odd-numbered cycles. However, we do not find any such relation in our data.

Figure 6

Panels (a) – (h) show the variations in \(q _{\mathrm{avg}}\) as recorded for each solar cycle (Cycles 16 – 23). The dashed red line is plotted just for reference.

4.3 Behaviour of Smaller and Larger Spots

Sunspots of different sizes tend to show different behaviour (Mandal and Banerjee, 2016). In this section, we look for the temporal evolution of \(q \) for two classes of sunspots: i) sunspots with \(\mbox{area} <100\) μhem (Figure 7a); ii) sunspots with \(\mbox{area} >100\) μhem (Figure 7b). The choice of this threshold at 100 μhem is primarily dictated by the fact that we see a jump in the \(q \) value at this area in Figure 4a. In order to compare our results with Hathaway (2013), we over-plot the \(q \) values for RGO data as shown in Figure 7. For spots \({>}\,100\) μhem, the ratio shows neither any significant time variation nor any tendency to follow the solar cycles. The over-plotted RGO data are in accordance with our values, except for some systematically lower values during Cycles 16 to 17. One of the highlights of the work by Hathaway (2013) was the large secular variation of the ratio for smaller spots, which showed a 300% increment with time. However, this property is not visible in the Kodaikanal data, which show that the ratio remains constant at \({\approx}\,4.5\) throughout the time interval. Analysing the Coimbra Astronomical Observatory (COI) data, Carrasco et al. (2018) also reported the absence of any type of secular variation in smaller spots.
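
The split into two size classes followed by yearly averaging can be sketched as below in Python with NumPy. The function name `yearly_mean_q` and the toy values are hypothetical, and assigning spots of exactly 100 μhem to the larger class is an assumption of this sketch.

```python
import numpy as np

def yearly_mean_q(years, areas, q, small=True, split=100.0):
    """Yearly mean of q for spots below (small=True) or at/above the area split."""
    years = np.asarray(years)
    areas = np.asarray(areas, dtype=float)
    q = np.asarray(q, dtype=float)
    # assumed convention: area == split goes to the larger class
    sel = areas < split if small else areas >= split
    means = {}
    for y in np.unique(years[sel]):
        means[int(y)] = float(q[sel & (years == y)].mean())
    return means

# Invented example values (not measurements).
toy_years = [1950, 1950, 1950, 1951]
toy_areas = [50.0, 80.0, 300.0, 60.0]
toy_q = [4.0, 5.0, 6.0, 4.5]
small_series = yearly_mean_q(toy_years, toy_areas, toy_q, small=True)
large_series = yearly_mean_q(toy_years, toy_areas, toy_q, small=False)
```

Years that contain no spot of the requested class are simply absent from the returned dictionary rather than filled with a placeholder.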

Figure 7

Yearly averaged values of \(q \) as obtained from Kodaikanal data (red points) for two sunspot classes; for area \({\geq}\,100\) μhem (Panel (a)) and for \(\mbox{area} <100\) μhem (Panel (b)). Similar values from RGO are also over-plotted (grey points) for comparison. Error bars in each case represent the \(2\sigma\) uncertainties.

As mentioned in the introduction, differences in the derived \(q \) values largely depend on the methods that have been used to detect the umbra–penumbra boundary (Steinegger, Bonet, and Vázquez, 1997b). Our method of Otsu thresholding has not been utilised in the literature before; thus, we need to check the robustness of this method on other independent datasets. The following section describes the application of this method to the space-based SOHO/MDI continuum images.

4.4 Application on SOHO/MDI

We analyse SOHO/MDI (Scherrer et al., 1995) continuum images from 1996 to 2010 at a cadence of one image per day. First, we detect the sunspots using the same Sunspot Tracking and Recognition Algorithm (STARA: Watson et al., 2009) as used on the Kodaikanal data. The detected spots are then fed to the Otsu algorithm for umbra detection. Figure 8 summarises the whole procedure.

Figure 8

Detection of umbra from SOHO/MDI data. Panel (a): A representative continuum image as captured on 1999-05-14 23:59. Panels (b) and (c): Detected sunspots and their zoomed-in view, respectively. Panel (d): Contours of the umbrae over-plotted onto the spot.

We first compare the whole spot area values between Kodaikanal and MDI and the result is shown in Figure 9a. The computed yearly averages of whole spot areas are very similar to each other (correlation coefficient \(\text{c.c.}=0.99\)). A similar behaviour is found for the umbral areas too (Figure 9b). Hence, the overall spot areas measured from these two observatories show similar trends. However, our prime interest in this case is to recover the behaviour of small (\({<}\,100\) μhem) spots as seen from Kodaikanal.

Figure 9

Comparison of yearly averaged whole sunspot area and umbra area as extracted from MDI and Kodaikanal sets.

In Figure 10 we plot the \(q \) values (black solid line) for spots with area \({<}\,100\) μhem as calculated from MDI. Kodaikanal values, for the overlapping period, are also over-plotted (in red) for ease of comparison. We see similar trends in the two curves; however, the MDI values need to be shifted up by adding a constant offset of 0.5 to match the absolute values of Kodaikanal. This underestimation in the MDI data arises primarily from the bright pixels present near the spot boundaries. During our analysis, we learnt that it is impossible to completely avoid these bright pixels while using STARA on large datasets. We can get around this problem by using a suitable \(k\) value as in the earlier case. However, such a treatment only changes the absolute values, not the trend. Hence we present the results as they are.
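
The statement that a constant shift changes the absolute level but not the trend can be illustrated with a short Python sketch. The function name `constant_offset` and all numbers below are invented for illustration; they are not the Kodaikanal or MDI measurements, and estimating the offset as a mean difference is an assumption of this sketch.

```python
import numpy as np

def constant_offset(reference, other):
    """Mean difference between two series sampled on the same years."""
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    return float(np.mean(reference - other))

# Invented yearly q values for an overlapping period.
kodaikanal_q = np.array([4.6, 4.4, 4.5, 4.7])
mdi_q = np.array([4.1, 3.9, 4.0, 4.2])

offset = constant_offset(kodaikanal_q, mdi_q)
shifted_mdi_q = mdi_q + offset

# Year-to-year changes are identical before and after the shift,
# so the trend of the series is untouched.
same_trend = bool(np.allclose(np.diff(shifted_mdi_q), np.diff(mdi_q)))
```

With these toy numbers the estimated offset is 0.5 and the shifted series coincides with the reference one, while the year-to-year differences remain the same.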

Figure 10

Ratio of areas of penumbra to umbra as a function of time for smaller sunspots.

5 Conclusion

In this paper, we investigated the long-term evolution of the sunspot penumbra to umbra area ratio, primarily using Kodaikanal white-light data. The main findings are summarised below:

  • Data covering eight solar cycles (Cycles 16 – 23) from the Kodaikanal digitised white-light archive (1923 – 2011) and 15 years of MDI data (1996 – 2010) have been analysed in this work. We have used an automated umbra detection technique based on the Otsu thresholding method and found that this method is efficient in isolating the umbra from a variety of spots with different intensity contrasts.

  • The penumbra to umbra ratio is found to be in the range of 5.5 to 6 for spot areas between 100 μhem and 2000 μhem. It is also found to be independent of cycle strength, latitude zone and cycle phase. These results are in agreement with the previous reports in the literature.

  • We segregated the spots according to their sizes and found that there is no signature of long-term secular variations for spots \({<}\,100\) μhem. This result contradicts the observations made by Hathaway (2013) using the RGO data. However, our results are in close agreement with a recent study by Carrasco et al. (2018).

  • To check the robustness of our umbra detection technique, we analysed SOHO/MDI continuum images. These results also confirmed our previous findings from Kodaikanal data including the absence of any trend for smaller spots. During this study, we realised that, although the Otsu technique is robust and adaptive in determining the umbral boundaries, it is also sensitive to any presence of artefacts within the spots.

In the future, we plan to continue our study using the Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI) (Schou et al., 2012) data. This will not only extend the time series but will also allow us to study the effect of a higher spatial resolution (i.e. more pixels within a spot) in determining the optimum threshold. We also plan to use the Debrecen sunspot images (which are available online) and repeat the measurements of this ratio using our method. Debrecen has more than 50 years of overlap with Kodaikanal, which makes these data suitable for cross-calibration too.