Design and Application of Fault-Tolerant On-Board Computer System with High Reliability

Chen, Yukun; Zhang, Dezhi; Rong, Gang; Wang, Xu; Qiu, Feng

doi:10.1007/978-981-99-2653-4_37


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 872)

Included in the following conference series:

International Conference in Communications, Signal Processing, and Systems


Abstract

The on-board computer system plays a significant role in the spacecraft electronic system, and its reliability is essential to mission success. To keep the system working normally when an on-board computer fails, the system architecture, switching method, and determination criterion are introduced. On the basis of preserving the state signal between the main computer and the switching circuit and the state signal between the backup computer and the switching circuit, an additional state signal between the main computer and the backup computer is adopted, yielding a modified autonomous switching method. With this method, the autonomous switching function still works normally when the autonomous switching module fails. The comparator is implemented through software voting and software switching, which eliminates the hardware comparator as a key failure point. Results indicate that the redundancy technology can effectively improve the reliability of the space on-board computer system. The scheme has engineering application value for the design and application of highly reliable space on-board computer systems.




1 Introduction

With the development of aerospace technology, spacecraft reliability and safety must meet mission requirements. An on-board computer system in the space environment is affected by plasma, energetic charged particles, the Earth's magnetic field, solar electromagnetic radiation, meteoroids, and so on, all of which degrade its performance. Except on a space station, an on-board computer system in orbit cannot be maintained [1]. The aircraft's mission may fail when the on-board computer system fails. Fault tolerance technology has therefore become an urgent topic for increasing the reliability of on-board computer systems.

2 The Dual Redundant Hardware Architecture and Switching Strategy

Backup fault tolerance is a common approach to fault-tolerant system architecture. It has four types: cold standby, warm standby, hot standby, and duplex mode [2]. Redundancy incurs additional cost. Balancing performance and reliability, the on-board computer system adopts different schemes in different phases of the flight sequence.

2.1 The Fault Tolerance Architecture of Dual Cold Standby

When the aircraft is under station control or stable self-control, a scheme with one hot computer and one cold standby is appropriate to ensure service life and reduce power consumption [3]. The typical dual cold standby fault tolerance architecture includes two identical sets of processors and multiple I/O, as depicted in Fig. 1. The fault tolerance module manages switching between the two on-board computers. When the host fails, control right is transferred from the host to the backup. When the backup also fails and the host has not recovered, control right is transferred from the backup to the emergency module.

Three factors must be taken into account: failure detection, switching from host to backup, and state recovery. Failure detection is the indispensable basis of the dual cold standby fault tolerance architecture [4]. It can be performed by many methods, such as system self-test, repeated program execution, two-out-of-three selection over data sectors, and watchdog technology. Most faults can be discovered through system self-test, and the key point is that the self-test itself must work normally.
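As an illustration of the two-out-of-three selection mentioned above, the following is a minimal sketch, assuming that the same data is stored in three independent sectors and read back as byte strings. The function name `vote_two_of_three` and the byte-string representation are assumptions made for this example, not details given in the paper.

```python
from collections import Counter
from typing import Optional, Sequence


def vote_two_of_three(copies: Sequence[bytes]) -> Optional[bytes]:
    """Return the value held by at least two of three copies, else None.

    Hypothetical helper: the three copies stand for three data sectors
    that are supposed to hold identical content.
    """
    if len(copies) != 3:
        raise ValueError("expected exactly three copies")
    # most_common(1) yields the most frequent value and its count
    value, count = Counter(copies).most_common(1)[0]
    return value if count >= 2 else None


if __name__ == "__main__":
    # One corrupted sector is outvoted by the two that agree
    print(vote_two_of_three([b"\x10\x20", b"\x10\x20", b"\xff\x20"]))
    # No two sectors agree, so no value can be trusted
    print(vote_two_of_three([b"a", b"b", b"c"]))
```

Returning `None` when all three copies differ leaves the decision to the caller, for example to fall back on another detection method such as self-test.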

2.2 The Fault Tolerance Architecture of Dual Hot Standby

To ensure that the aircraft works normally and that faults are handled quickly, the on-board computer adopts the dual hot standby architecture in the initial attitude setting phase. When the host fails, control right is transferred from the host to the backup by command or by autonomous discrimination [5].

The host and the backup each have a separate power supply module. Only one of them can hold control right at any given time. In hot standby mode, both the host and the backup accept the system input signals. The processing results of the computer that holds control right are selected as the system output through the switching circuit. When the command centre discovers that a computer has a severe failure and cannot work, its power supply can be shut down through remote control or by the autonomous switching circuit. The fault isolation circuit removes the failed computer from the system. Shutting down the failed computer must not affect the other computer. Figure 2 shows the typical dual hot standby fault tolerance architecture.

The dual switching circuit monitors the host and the backup. It contains a timing monitor and the corresponding logic circuit, designed on the watchdog mechanism [6]. The dual switching circuit has two triggers. The host and the backup each reset their corresponding trigger, while the timing signal inside the dual switching circuit sets both triggers. If a reset signal does not arrive before the next timing signal, a switching signal is generated automatically, and the computer that is working normally goes on duty. If the host and the backup are both working normally, no switching signal is generated, and the host is on duty. If the host and the backup have both failed, the host is on duty, so that one computer always remains on duty and frequent switching between host and backup is avoided.
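The watchdog-based decision described above can be sketched in software as follows. This is only a behavioural model under stated assumptions, not the authors' circuit: the class name `DualWatchdogSwitch` and its methods are hypothetical, and the sketch follows the literal reading that the host is on duty whenever it is healthy or when both computers have failed.

```python
from dataclasses import dataclass

HOST = "host"
BACKUP = "backup"


@dataclass
class DualWatchdogSwitch:
    """Behavioural model: two triggers set by a timer, reset by each computer."""

    host_trigger: bool = True    # True: no reset seen since the last timing signal
    backup_trigger: bool = True
    on_duty: str = HOST

    def reset_from(self, computer: str) -> None:
        """A healthy computer resets its own trigger before the timer expires."""
        if computer == HOST:
            self.host_trigger = False
        elif computer == BACKUP:
            self.backup_trigger = False
        else:
            raise ValueError(f"unknown computer: {computer}")

    def timing_signal(self) -> str:
        """On timer expiry, evaluate both triggers, then set them again."""
        host_failed = self.host_trigger
        backup_failed = self.backup_trigger
        if host_failed and not backup_failed:
            self.on_duty = BACKUP
        else:
            # Both healthy, both failed, or only the backup failed
            self.on_duty = HOST
        self.host_trigger = True
        self.backup_trigger = True
        return self.on_duty


if __name__ == "__main__":
    switch = DualWatchdogSwitch()
    switch.reset_from(HOST)
    switch.reset_from(BACKUP)
    print(switch.timing_signal())  # both computers reset in time
    switch.reset_from(BACKUP)
    print(switch.timing_signal())  # only the backup reset in time
```

Whether a recovered host should automatically regain duty from the backup is not stated in the text; a real design might latch the backup on duty until commanded otherwise.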

2.3 The Switching Implementation Mode Between Dual Computer

When the aircraft is in orbit, control right can be switched between the two on-board computers either by remote control or autonomously [7]. When the flight control centre determines from telemetry data that the current computer has failed, control right can be switched between the two redundant computers by remote control command. When remote control mode takes effect, autonomous switching is disabled, and the output of the dual redundant system is determined only by remote command. To enable or disable autonomous switching, a time window in which it is permitted or forbidden is set by remote command. Autonomous switching is permitted for the on-board computer only when the aircraft is in the autonomous switching state. In that state, the backup takes over when the host fails. Autonomous switching is implemented through an integrating circuit so that a single command cannot trigger it. The switching command must be sent repeatedly until the integrating circuit reaches a certain level, which drives the relay to switch, and the backup computer then goes on duty.
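The role of the integrating circuit can be illustrated with a small discrete-time model. This is a minimal sketch under assumed values: the class name `IntegratingSwitchGate` and the `step`, `leak`, and `threshold` parameters are hypothetical and are not taken from the paper.

```python
class IntegratingSwitchGate:
    """Accumulates repeated switch commands; drives the relay only at threshold."""

    def __init__(self, step: float = 1.0, leak: float = 0.5, threshold: float = 3.0):
        self.step = step            # assumed charge added per received command
        self.leak = leak            # assumed charge lost per period without a command
        self.threshold = threshold  # assumed level needed to drive the relay
        self.level = 0.0

    def tick(self, command_present: bool) -> bool:
        """Advance one period; return True when the relay would be driven."""
        if command_present:
            self.level += self.step
        else:
            self.level = max(0.0, self.level - self.leak)
        if self.level >= self.threshold:
            self.level = 0.0  # discharge after driving the relay
            return True
        return False


if __name__ == "__main__":
    gate = IntegratingSwitchGate()
    # Three consecutive commands reach the threshold on the third period
    print([gate.tick(True) for _ in range(3)])
    # An isolated command decays away without driving the relay
    spurious = IntegratingSwitchGate()
    print([spurious.tick(c) for c in (True, False, False, False)])
```

The leak term models why a single spurious command, or commands separated by long gaps, cannot cause a switch.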

2.4 The Modified Autonomous Switching Strategy Between Dual Computer

To avoid incorrect logic determination between the normal computer and the faulty computer when the autonomous switching module has a hardware malfunction, the typical dual hot standby fault tolerance architecture is optimized. In addition to the host state signal and the backup state signal, a working state signal between the two computers is introduced. Autonomous switching can then still be achieved if the autonomous switching module has a hardware failure, which improves system redundancy. The principle is as follows: under normal conditions the host regularly sends its normal state signal to the backup, but the backup does not receive this signal when the host has failed. The backup can therefore determine whether the host is working normally through the communication port between the two computers. If the backup is working normally and finds that the host has failed, it transmits an on-duty pulse to obtain control right, and the system architecture is reconfigured. Autonomous switching can thus be achieved when the autonomous switching module inside the dual computer switching circuit has failed, so the system tolerates one fault in the autonomous switching module. Figure 3 illustrates the modified dual hot standby fault tolerance architecture.
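The backup-side logic of this strategy can be sketched as a small state machine. This is a minimal sketch, assuming the backup checks for the host's normal state signal once per period; the class name `BackupMonitor` and the `max_missed` parameter (how many consecutive missing signals count as a host failure) are assumptions for illustration, since the paper does not give a number.

```python
class BackupMonitor:
    """Backup-side check of the host's periodic normal-state signal."""

    def __init__(self, max_missed: int = 3):
        self.max_missed = max_missed  # assumed number of missed signals tolerated
        self.missed = 0
        self.on_duty = False

    def period(self, host_signal_received: bool, self_test_ok: bool) -> bool:
        """Run once per period; return True if an on-duty pulse is sent."""
        if self.on_duty:
            return False  # the backup already holds control right
        if host_signal_received:
            self.missed = 0
            return False
        self.missed += 1
        # Only a healthy backup may claim control right from a silent host
        if self.missed >= self.max_missed and self_test_ok:
            self.on_duty = True
            return True
        return False


if __name__ == "__main__":
    monitor = BackupMonitor(max_missed=3)
    print(monitor.period(True, True))                      # host is alive
    print([monitor.period(False, True) for _ in range(3)])  # host goes silent
```

Requiring several consecutive misses is a design choice of this sketch that trades detection latency against false switches caused by a single lost message.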

The host and the backup each have a separate cache for exchanging data with each other. The host sends its data to the cache of the backup, while the backup sends its data to the cache of the host. The host and the backup are built from identical components. Figure 4 shows the principle block diagram for dual computer communication.

Figure 5 displays the data flow diagram for dual computer communication. If M represents the host, then N represents the backup; conversely, if M represents the backup, then N represents the host.

3 The Design of Comparator in Dual Duplex Mode Architecture

When the influence of the success ratio and duration of failure judgement on system availability is taken into account, a duplex system has greater availability than hot standby and warm standby systems under certain conditions. In addition, a duplex system has no switching problem, so it is suitable for crucial real-time tasks, but it increases power consumption and requires an additional comparison circuit [8]. Dual duplex mode outputs the result of comparing the host and the backup, so the comparator is the crucial component of a dual duplex redundant system.

3.1 Hardware Design

In a common dual duplex redundant system, the comparator is implemented in hardware. The hardware comparison module consists of a comparison circuit and a detection circuit with its actuating mechanism [9]. The comparator uses a logic circuit for a low redundancy level and an independent processor system for a high redundancy level. Figure 6 illustrates the architecture for a single comparator. If the comparator fails, it cannot generate a detection signal or indicate a faulty output, so the comparator becomes a new single point of failure for the redundant system. This problem can be relieved by making the comparator itself redundant. Figure 7 depicts the architecture for dual comparators. However, multiple comparators require their own detection and switching circuits, which makes the hardware still more complex. Adding ever more redundancy eventually decreases the reliability of the whole system.

3.2 Software Design

To avoid the comparator's disadvantage and increase the reliability of a real-time embedded on-board computer system, a design that addresses comparator reliability is presented. Based on software voting and software switching, the hardware comparator unit is abandoned and the comparator is implemented in software. The comparator is analyzed and checked as part of the system resources, which solves the reliability problem caused by checking a standalone comparator and allows the system to be optimized. In principle, it is feasible to treat the comparator as a system resource. First, the comparator has a light processing load and simple logic, and its software reliability can be guaranteed through testing, so it has a low probability of causing a system failure. Second, according to reliability theory, hardware lifetime is limited, whereas software reliability is almost invariable once the software is put into use, so the reliability of the comparator can be guaranteed. Figure 8 illustrates the determination flow diagram of the comparator software.
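Since the flow in Fig. 8 is not reproduced here, the following is only a hypothetical sketch of what a software comparison with software switching could look like, not the authors' determination flow. It assumes that each computer has its own output and a self-test flag, and that on disagreement the output of the computer whose self-test passes is selected. All names (`Source`, `software_compare`) are illustrative.

```python
from enum import Enum
from typing import Optional, Tuple


class Source(Enum):
    HOST = "host"
    BACKUP = "backup"


def software_compare(
    host_out: bytes,
    backup_out: bytes,
    host_self_test_ok: bool,
    backup_self_test_ok: bool,
) -> Tuple[Optional[bytes], Optional[Source]]:
    """Pick the system output from two computed results (hypothetical flow).

    Returns (output, source); (None, None) means no output can be trusted.
    """
    if host_out == backup_out:
        # Results agree: either one may be used; the host is chosen by convention
        return host_out, Source.HOST
    # Results disagree: fall back on each computer's self-test result
    if host_self_test_ok and not backup_self_test_ok:
        return host_out, Source.HOST
    if backup_self_test_ok and not host_self_test_ok:
        return backup_out, Source.BACKUP
    # Both claim to be healthy, or both failed: the disagreement is unresolved
    return None, None


if __name__ == "__main__":
    print(software_compare(b"\x01", b"\x01", True, True))
    print(software_compare(b"\x01", b"\x02", False, True))
```

How an unresolved disagreement is handled (for example, holding the last good output or handing over to ground control) is a system-level decision that this sketch leaves open.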

4 Conclusions

The reliability and safety of the on-board computer are crucial for the aircraft. This paper introduced the architecture of a fault-tolerant on-board computer system with redundancy, analyzed the dual switching strategy and determination criterion, and presented a modified autonomous switching strategy together with a software implementation of the comparator. Practice indicates that the improved redundant design can effectively enhance the performance of the space on-board computer system. The approach has engineering application value for the design and implementation of highly reliable space on-board computer systems.

References

1. Yang, M., Hua, G., Feng, Y.: Fault Tolerance Techniques for Spacecraft Control Computer. National Defense Industry Press, Beijing (2014)
2. Sun, X., Chen, Z., Gu, Y.: Research on fault-tolerant flight control computer system based on dynamic reconfiguration. J. Syst. Simul. 30(10), 3957–3963 (2018)
3. Yu, Y., Wang, H.: Deep learning-based reentry predictor-corrector fault-tolerant guidance for hypersonic vehicles. Acta Armamentarii 41(4), 659–665 (2020)
4. Jiang, B., Zhang, K., Yang, H.: Fault-tolerant control of satellite attitude control systems. Acta Aeronautica et Astronautica Sinica 42(11), 524662 (2021)
5. Wang, Y., Wen, X.: Research status and progress of fault diagnosis technology for spacecraft. Aero Weaponry 23(5), 71–76 (2016)
6. Xu, A., Xia, D., Zheng, J.: The study of fault tolerance technical in civil aircrafts slat flap control computer. Microelectron. Comput. 32(6), 36–40 (2015)
7. Xiao, A., Hu, M.: Reliability analysis of the computer with quad-modular redundancy byzantine fault tolerant. Aerosp. Control Appl. 40(3), 41–46 (2014)
8. Lv, Y.: A fault-tolerant method for space computer memory with low-cost and high-reliability. Aerosp. Control Appl. 46(3), 66–70 (2020)
9. Wang, Z., Cheng, S.F., Ma, X.B.: Design and implementation of highly reliable fault-tolerant computer with integrated multi-task. Aeronaut. Comput. Tech. 50(4), 111–112 (2020)


Author information

Authors and Affiliations

China Academy of Launch Vehicle Technology, Beijing, 100076, China
Yukun Chen, Dezhi Zhang, Gang Rong, Xu Wang & Feng Qiu


Editor information

Editors and Affiliations

Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Qilian Liang
Tianjin Normal University, Tianjin, China
Wei Wang
Dalian University of Technology, Dalian, China
Xin Liu
School of Information Science and Technology, Dalian Maritime University, Dalian, China
Zhenyu Na
College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, China
Baoju Zhang


About this paper

Cite this paper

Chen, Y., Zhang, D., Rong, G., Wang, X., Qiu, F. (2023). Design and Application of Fault-Tolerant On-Board Computer System with High Reliability. In: Liang, Q., Wang, W., Liu, X., Na, Z., Zhang, B. (eds) Communications, Signal Processing, and Systems. CSPS 2022. Lecture Notes in Electrical Engineering, vol 872. Springer, Singapore. https://doi.org/10.1007/978-981-99-2653-4_37


DOI: https://doi.org/10.1007/978-981-99-2653-4_37
Published: 12 May 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2652-7
Online ISBN: 978-981-99-2653-4
eBook Packages: Engineering, Engineering (R0)
