Abstract
Machine learning is becoming a critical tool for the interrogation of large, complex data. Labelling, defined as the process of adding meaningful annotations, is a crucial step of supervised machine learning. However, labelling datasets is time-consuming. Here we show that convolutional neural networks (CNNs) trained on crudely labelled astronomical videos can be leveraged to improve the quality of data labelling and reduce the need for human intervention. We use videos of the solar magnetic field that are divided into two classes, emergence or non-emergence of bipolar magnetic regions (BMRs), on the basis of their first detection on the solar disk. We train CNNs using crude labels, manually verify and correct the cases where the labels and the CNN predictions disagree, and repeat this process until convergence is reached. Traditionally, flux emergence labelling is done manually. We find that deriving a high-quality labelled dataset through this iterative process reduces the necessary manual verification by 50%. Furthermore, by gradually masking the videos and looking for maximum changes in CNN inference, we locate the BMR emergence time without retraining the CNN. This demonstrates the versatility of CNNs for simplifying the challenging task of labelling complex dynamic events.
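The emergence-time localization described above can be illustrated with a minimal sketch. It assumes that frames are revealed one at a time from the start of the video while every later frame is replaced by a constant fill value (zeros here), and that the trained CNN is available as a function returning the probability of the emergence class. The masking direction, the fill value and the names `mask_after`, `locate_emergence` and `toy_predict` are hypothetical choices for illustration, not the authors' exact procedure. The frame whose reveal produces the largest rise in predicted probability is taken as the emergence frame.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[List[float]]  # one magnetogram frame (hypothetical representation)


def mask_after(video: Sequence[Frame], k: int, fill: float = 0.0) -> List[Frame]:
    """Keep frames 0..k-1 unchanged and replace every later frame with a constant frame."""
    return [
        frame if i < k else [[fill for _ in row] for row in frame]
        for i, frame in enumerate(video)
    ]


def locate_emergence(
    video: Sequence[Frame],
    predict: Callable[[List[Frame]], float],
    fill: float = 0.0,
) -> Tuple[int, List[float]]:
    """Return the frame whose reveal most increases P(emergence), plus all scores."""
    # scores[k] = P(emergence) when only the first k frames are visible
    scores = [predict(mask_after(video, k, fill)) for k in range(len(video) + 1)]
    # revealing frame i moves the input from k = i to k = i + 1
    jumps = [scores[i + 1] - scores[i] for i in range(len(video))]
    best = max(range(len(jumps)), key=jumps.__getitem__)
    return best, scores


def toy_predict(video: List[Frame]) -> float:
    """Hypothetical stand-in for the trained CNN: 'sees' emergence if strong flux is visible."""
    peak = max(abs(x) for frame in video for row in frame for x in row)
    return 0.95 if peak > 1.0 else 0.1


quiet = [[0.1, -0.1], [0.0, 0.1]]
bmr = [[5.0, -5.0], [0.0, 0.1]]  # a bipolar region appears
video = [quiet, quiet, quiet, bmr, bmr, bmr]
frame, scores = locate_emergence(video, toy_predict)
print(frame)  # 3
```

Because only the input is altered, the same trained classifier is reused for every masked copy of the video, which is what allows the emergence time to be estimated without retraining.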
Data availability
The SOHO/MDI magnetograms used to create the flux emergence videos for this study are available from the Joint Science Operations Center (http://jsoc.stanford.edu/ajax/lookdata.html?ds=mdi.fd_M_96m_lev182). All the flux evolution videos, with their emergence or non-emergence labels, can be accessed through Harvard Dataverse at https://doi.org/10.7910/DVN/6F25MG. Source data are provided with this paper.
Code availability
The iterative relabelling algorithm is described in detail in the Methods. The code for data preparation and CNN training is available as a Python notebook on GitHub at https://github.com/subhamoysgit/flux_emergence/.
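The relabelling loop can be sketched in plain Python under several stated assumptions: the CNN is wrapped in a hypothetical `train(videos, labels)` function that returns a predictor; the human check is a hypothetical `verify(video_id, label, prediction)` callback; predictions come from held-out folds so that a network never judges a label it was trained on; and convergence is declared when a verification round changes no labels. These are illustrative choices. The authors' actual algorithm is the one given in the Methods and the GitHub notebook.

```python
from typing import Callable, Dict, Hashable, Tuple

Video = object  # whatever representation the model consumes (hypothetical)
Predictor = Callable[[Video], int]  # returns 1 for emergence, 0 for non-emergence
Trainer = Callable[[Dict[Hashable, Video], Dict[Hashable, int]], Predictor]


def out_of_fold_predictions(
    videos: Dict[Hashable, Video], labels: Dict[Hashable, int], train: Trainer, n_folds: int = 5
) -> Dict[Hashable, int]:
    """Predict every video with a model trained only on the other folds (assumed scheme)."""
    ids = list(videos)
    preds: Dict[Hashable, int] = {}
    for fold in range(n_folds):
        held_out = set(ids[fold::n_folds])
        train_ids = [v for v in ids if v not in held_out]
        model = train({v: videos[v] for v in train_ids}, {v: labels[v] for v in train_ids})
        for vid in held_out:
            preds[vid] = model(videos[vid])
    return preds


def iterative_relabel(
    videos: Dict[Hashable, Video],
    crude_labels: Dict[Hashable, int],
    train: Trainer,
    verify: Callable[[Hashable, int, int], int],
    n_folds: int = 5,
    max_rounds: int = 10,
) -> Tuple[Dict[Hashable, int], int, int]:
    """Return (final labels, number of manual checks, rounds used)."""
    labels = dict(crude_labels)
    n_checked = 0
    for round_no in range(1, max_rounds + 1):
        preds = out_of_fold_predictions(videos, labels, train, n_folds)
        disputed = [vid for vid in videos if preds[vid] != labels[vid]]
        n_changed = 0
        for vid in disputed:  # a human inspects only the disagreements
            n_checked += 1
            corrected = verify(vid, labels[vid], preds[vid])
            if corrected != labels[vid]:
                labels[vid] = corrected
                n_changed += 1
        if n_changed == 0:  # no label changed this round: converged
            return labels, n_checked, round_no
    return labels, n_checked, max_rounds


# Toy run: each "video" is a single number, and the stand-in model is a fixed rule.
truth = {i: int(i / 10 > 0.45) for i in range(10)}
crude = dict(truth)
crude[2], crude[7] = 1 - crude[2], 1 - crude[7]  # two crude labelling mistakes
videos = {i: i / 10 for i in range(10)}


def toy_train(vids, labs):
    return lambda x: int(x > 0.45)  # ignores labels; stands in for a trained CNN


def oracle(vid, label, pred):
    return truth[vid]  # plays the human verifier


final, checked, rounds = iterative_relabel(videos, crude, toy_train, oracle)
print(final == truth, checked, rounds)  # True 2 2
```

Only disputed videos reach the human verifier, which is where the saving in manual effort comes from; in this toy run just the two mislabelled examples are checked before the loop converges.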
Acknowledgements
This research was funded by NASA grant numbers 80NSSC19M0165 and 80NSSC18K0671.
Author information
Contributions
S.C. and A.M.-J. planned the experiments and wrote the paper. S.C. set up and ran the experiments. A.M.-J. provided the list of events that was analysed. D.A.L. assembled the video sequences used in this work and helped edit the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Astronomy thanks Andong Hu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source data
Source Data Fig. 1
Raw data for all video frames with header information in FITS format.
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data for all of the rows and columns.
Source Data Fig. 4
Statistical source data for creating the plots.
Source Data Fig. 5
Statistical source data for creating the plots.
Source Data Fig. 6
Statistical source data for colouring the video frames.
Source Data Fig. 7
Statistical source data for creating the plots.
About this article
Cite this article
Chatterjee, S., Muñoz-Jaramillo, A. & Lamb, D.A. Efficient labelling of solar flux evolution videos by a deep learning model. Nat Astron 6, 796–803 (2022). https://doi.org/10.1038/s41550-022-01701-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41550-022-01701-3
© Springer Nature Limited