1 Introduction

Reliability and resilience are now two crucial aspects of Bangladesh (BD) electric power systems, especially as the Government of BD aims to transform conventional power grids into a more efficient, highly automated and secure network from generation to distribution, ensuring an uninterrupted power supply across the country. It is important to note that resilience is not the same as robustness [1,2,3,4]. While these systems may handle frequent, common events such as "N-1" contingencies well, they can prove quite vulnerable to high-impact disruptions such as "N-k" contingencies, which occur less often but have far-reaching consequences [5,6,7,8,9]. Hence, relying solely on traditional methods of ensuring reliability does not necessarily make a system resilient.

One example of this grid vulnerability is the recent catastrophic national grid failure, in which the power system was unable to respond promptly to increased power demand and lacked coordinated communication between the national load dispatch centre (NLDC) and the power distribution companies [10]. This failure left nearly 130 million people without electricity for 4–8 h, a clear example of the absence of grid resilience in the system. Climate change adds another layer of complexity and urgency to the situation. With the expected increase in the frequency and intensity of severe weather events, power systems must be equipped to withstand and recover from such unplanned catastrophic events. This includes the ability to adapt to changing conditions, mitigate damage, and quickly restore power to affected areas.

2 Grid resiliency: a pathway to enhanced energy security

An electrical grid is an interconnected network used for delivering electricity from generating stations to consumers. Electrical power is produced at the generating stations and transmitted to demand centres through high-voltage (132/230/400 kV or higher) transmission lines. In the distribution stage, this high voltage is stepped down to medium and low voltages to meet the demand of individual customers (i.e., households, industries, offices etc.). Reliable operation of the national electrical grid is therefore crucial to maintaining socio-economic growth.

Currently, the BD Power Development Board reports an installed capacity of 24,143 MW, with a few thousand megawatts more in the pipeline [11]. The electricity sector in BD remains dependent on fossil fuels, with an energy mix of coal, gas, oil, hydro, solar, wind and imports from neighbouring India. A transition towards clean energy would reduce dependency on fossil fuels, avoid fuel supply instability and lower power costs.

As of March 2023, the Power Grid Company of BD (PGCB) operates 1,972 circuit km of 400 kV lines, 4,236 circuit km of 230 kV lines and 8,464 circuit km of 132 kV lines, together with one 400 kV station, five 400/230 kV substations, four 400/132 kV substations, twenty-eight 230/132 kV substations, one 230/33 kV substation and one hundred twenty-six 132/33 kV substations [12]. A summary of BD generation, transmission and distribution is given in Tables 1, 2 and 3, respectively.

A reliable power grid delivers electrical energy continuously without interruption. Major grid failures, which cause partial or complete blackouts, stem from factors such as a lack of real-time monitoring and situational awareness tools, poorly coordinated control actions, the absence of early security assessment and warning, inadequate communication systems between generating plants, grid indiscipline, human error and insufficient reactive power compensation [13].

The major recent blackout in the national power transmission grid on 4 October at 2:05 p.m., which was caused by human error, is a prominent example of grid failure in BD [14]. Following the grid failure, power plants tripped one after another and the electricity supply was lost in the Dhaka, Chittagong, Sylhet, Barisal and Mymensingh divisions.

Strengthening the grid has become a core necessity of electrical power systems. Electrical grid resilience is the system’s ability to recover quickly and continue functioning following a disruption [15]. A transition from a reliable to a resilient electrical grid requires the integration of advanced information and computing technologies.

Table 1 BD grid installed capacity [16]
Table 2 BD transmission particulars [16]
Table 3 Distribution particulars [16]

3 Framework for transition from reliability to resiliency

Reliability and resilience are related attributes, but they are not interchangeable. It is important to note that high reliability does not guarantee resilience in a power system. Reliability primarily addresses the prevention of power disruptions, while resilience focuses on the system’s ability to bounce back from disruptions and continue functioning [17,18,19,20]. Resilience is not a generic merit but a case-by-case property: reliability can be evaluated without specifying the threats, whereas resilience is always relative to a particular threat. Several indices are commonly used to measure the dependability of power systems, such as loss of load probability (LOLP), expected energy not supplied (EENS), system average interruption frequency index (SAIFI), system average interruption duration index (SAIDI) and customer average interruption duration index (CAIDI), alongside analytical methods such as Fault Tree Analysis (FTA) and Markov models [21, 22]. Both reliability and resilience are essential in the context of a smart grid, as they work together to ensure the continuous supply of power, regardless of the challenges faced by the system.
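To make the interruption indices concrete, the following minimal sketch computes SAIFI, SAIDI and CAIDI from a list of outage records using their usual customer-weighted definitions. The `Outage` record, the function name and the feeder data are hypothetical and chosen only for illustration; they are not taken from BD utility data.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    customers_affected: int  # customers interrupted by this event
    duration_hours: float    # time until supply was restored


def reliability_indices(outages, customers_served):
    """Return (SAIFI, SAIDI, CAIDI) for one reporting period."""
    interruptions = sum(o.customers_affected for o in outages)
    customer_hours = sum(o.customers_affected * o.duration_hours for o in outages)
    saifi = interruptions / customers_served     # interruptions per customer served
    saidi = customer_hours / customers_served    # outage hours per customer served
    caidi = saidi / saifi if saifi > 0 else 0.0  # average hours per interruption
    return saifi, saidi, caidi


# Hypothetical feeder serving 10,000 customers with three outages in a year.
events = [Outage(2000, 1.5), Outage(500, 4.0), Outage(10000, 0.5)]
saifi, saidi, caidi = reliability_indices(events, customers_served=10_000)
print(f"SAIFI={saifi:.2f}, SAIDI={saidi:.2f} h, CAIDI={caidi:.2f} h")
```

Note that none of these indices depends on what caused the outages, which is consistent with the point that reliability can be evaluated without specifying the threat.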

Fig. 1 Resilient and non-resilient system

The analysis of a resilient system involves the different stages in which the system operates when confronted with a disruptive event, as depicted in Fig. 1. The figure shows that a resilient system (represented by the green line) demonstrates greater resistance to disruption than a traditional system (represented by the red line). During the period from 0 to t1, prior to the event, advanced weather forecasting and the robustness of the system can be used to anticipate and prepare for the first strike. A better prevention strategy plays a major role in reducing the impact of the first strike. From t1 to t2, the system enters a post-disruptive state and starts absorbing shocks. Through the implementation of stronger security measures, system resourcefulness and infrastructure enhancements, the system resists further degradation. These infrastructural improvements are crucial in safeguarding against potential threats and minimizing the impact of disasters on both human lives and critical infrastructure. From t3 to t4, recovery actions are initiated; this is usually known as the restorative state. Emergency measures such as critical load restoration may be quickly put into action to return the system to a stable state in the minimum restoration time. Thus, resilience is a critical factor in designing a robust system that can withstand various disturbances and adapt quickly to return to a stable state.
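One common way to quantify the difference between the two curves is the area under the performance curve relative to the area under the target performance over the same window. The sketch below applies this idea to two piecewise-linear curves. The curve shapes, time points and the name `resilience_index` are assumptions made for illustration and are not read from Fig. 1.

```python
def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear curve given as (t, value) pairs."""
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for (t0, v0), (t1, v1) in zip(points, points[1:]))


def resilience_index(points, target=1.0):
    """Ratio of delivered performance to target performance over the window."""
    duration = points[-1][0] - points[0][0]
    return area_under_curve(points) / (target * duration)


# Hypothetical curves: (hours, fraction of load served); the event strikes at 2 h.
resilient = [(0, 1.0), (2, 1.0), (3, 0.8), (5, 0.8), (7, 1.0), (10, 1.0)]
traditional = [(0, 1.0), (2, 1.0), (3, 0.4), (6, 0.4), (10, 1.0)]

print(f"resilient system:   {resilience_index(resilient):.2f}")
print(f"traditional system: {resilience_index(traditional):.2f}")
```

In this illustrative case the resilient curve scores higher both because it degrades less and because it recovers sooner, which are the two effects the stages above describe.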

Important aspects of Power System Resiliency:

  i. Maintaining situational awareness at all times is crucial. Continuous monitoring of internal factors, such as operation and control strategies, and of the dynamic external environment, including weather conditions, provides a comprehensive understanding of the current situation and the ability to anticipate potential disruptions. This proactive approach enables effective management of extreme events, rather than a passive response.

  ii. Before an extreme event occurs, robustness and preparedness must be prioritized. This involves implementing both infrastructural and operational measures in advance to minimize the impact of potential disruptions. Strengthening the system infrastructure makes it more resilient to anticipated disruptions and less susceptible to unexpected ones. Additionally, the system’s operation should be planned to enhance flexibility, enabling it to absorb various manifestations of disruptions. These measures must be adaptable to changing conditions.

  iii. During an extreme event, responsiveness and survivability are paramount. The grid must respond promptly and effectively by detecting the event, assessing its impact on the system and taking immediate action to mitigate its effects. A responsive grid can minimize the duration and extent of power outages, prevent cascading failures and limit the disruption to the overall system, which helps preserve the functionality of critical infrastructure and the safety and well-being of the people relying on electricity. Survivability refers to the ability of the system to withstand the impact of such events while maintaining a certain level of performance. A resilient grid is designed with built-in redundancies, alternative pathways for power flow and robust infrastructure that can resist and absorb the effects of disruptions. Survivability thus prevents complete system failure and minimizes the loss of supply to critical services, allowing the grid to continue functioning under challenging conditions and reducing the overall impact on society and the economy.

  iv. After an extreme event, recoverability and speed are crucial because they determine how quickly the power system can return to its normal state. System performance should be restored swiftly to the level achieved prior to the event. This involves identifying and addressing the causes of the disruption, repairing or replacing damaged equipment and restoring power supply to affected areas. The faster the grid recovers, the shorter the outages, the smaller the social, economic and environmental impacts, and the less disruption and inconvenience experienced by customers and critical services.

4 Cloud computing: types and classification

Cloud computing refers to computing resources and services made available on demand over the internet. These include software tools and resources such as networking, data storage, databases and analytics, which enable rapid change, flexible resources and cost advantages through economies of scale. This reduces the need for organizations to invest in locally managed assets, since they pay only for what they actually use. Cloud computing can be classified according to either service or deployment [23].

4.1 Service model

Cloud computing is mainly of three types according to the services it can provide. These are Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Each can provide a different degree of supervision, adaptability and administration so that clients can optimize their chosen set of services.

4.1.1 IaaS

The IaaS model consists of storage servers and virtual machines (VMs). The distribution of cloud computing resources is performed using IaaS. Clients can use VMs for accessing assets as required where they can install necessary software solutions. IaaS provides a hardware platform to its users so that it can be accessed as necessary. Also, the IaaS VMs allow their clients to use any supported operating system [24].

4.1.2 PaaS

PaaS allows the deployment of programming models on top of IaaS. Clients can use these programming models through cloud computing to run their code, and the execution of client code is the responsibility of the PaaS. Through PaaS, clients can develop and deploy web-based applications and services without installing development tools locally [25].

4.1.3 SaaS

SaaS allows the use of software applications through cloud computing. These services are web-based, so clients only need an up-to-date web browser. The SaaS model enables clients to run applications and software services without local installation, although it is limited to services that are already available and does not support the addition of new applications [24].

4.2 Deployment model

Cloud computing can also be classified according to deployment model: private, public, community and hybrid [26].

4.2.1 Private cloud

The cloud resource is owned by a private entity, and the data is kept within that entity. This arrangement serves the entity’s own business interests.

4.2.2 Public cloud

The public cloud is the property of a service provider and is open to the general population for their use.

4.2.3 Community cloud

This is comparable to the private cloud model with some extra benefits to provide services to a select group of entities who have related demands.

4.2.4 Hybrid cloud

This model is a combination of the previous models. The private, public, and community clouds are combined in such a way as to meet the specifications of all related entities.

Figure 2 shows the classification of cloud computing models according to service and deployment.

Fig. 2 Cloud computing infrastructure [23]

5 Necessity of cloud computing for smart grid resiliency in BD

Smart grid resiliency and reliability in BD require cloud computing integration. This is explained using the following factors.

5.1 Enhancing grid frequency stability with cloud data centres

Electric grids require transmission operators to maintain balance, reliability and frequency within defined limits. This is not only a need but also an obligation to ensure the smooth functioning of the grid. In BD, the grid frequency is ideally controlled to be around 50 ± 0.05 Hz [27, 28]. Maintaining grid frequency is critical for reliability, and prompt restoration to nominal levels after a disruption is crucial for resilience. Even a slight frequency deviation can cause problems for industrial consumers whose equipment is frequency sensitive. Generators and other large machinery are usually efficient only within certain frequency ranges, and their operation can be impaired if the frequency goes beyond design limits. Automatic disconnection of grid elements such as generators, transmission lines and consumers can occur if the frequency goes out of bounds [29]. Certain clocks keep time using the grid frequency and can become unreliable if the frequency drifts too much [30].

The transition from conventional synchronous generators to intermittent non-synchronous sources such as wind and solar power poses a major challenge, because these new sources do not provide sufficient rotational inertia to the power system. As a result, without supplementary resources such as energy storage and demand response (DR), the power system may experience larger frequency deviations and a faster rate of change of frequency (RoCoF) during disturbances [31, 32]. This highlights the importance of implementing fast-response supplementary devices to maintain system stability and reliability in the absence of rotational inertia support.
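The link between inertia and RoCoF can be illustrated with the aggregate swing equation, under which the initial RoCoF after a sudden power imbalance ΔP is approximately f0·ΔP / (2·H·S), where H is the system inertia constant in seconds and S is the rated capacity of the synchronised machines. The following sketch evaluates this expression. The system size, inertia constants and size of the generation loss are hypothetical values, not BD grid data.

```python
def initial_rocof(delta_p_mw, inertia_h_s, system_mva, f0_hz=50.0):
    """Initial RoCoF (Hz/s) right after a power imbalance, from the swing equation.

    delta_p_mw  : generation minus load (negative for a loss of generation)
    inertia_h_s : aggregate inertia constant H in seconds
    system_mva  : rated capacity of the synchronised machines
    """
    return f0_hz * delta_p_mw / (2.0 * inertia_h_s * system_mva)


# Hypothetical 15,000 MVA system losing a 500 MW unit.
loss_mw = -500.0
rocof_high_inertia = initial_rocof(loss_mw, inertia_h_s=5.0, system_mva=15_000)
rocof_low_inertia = initial_rocof(loss_mw, inertia_h_s=2.5, system_mva=15_000)

print(f"H = 5.0 s: {rocof_high_inertia:.3f} Hz/s")
print(f"H = 2.5 s: {rocof_low_inertia:.3f} Hz/s")
```

Halving the inertia constant doubles the initial RoCoF for the same disturbance, which is why displacing synchronous machines shortens the time available for frequency response to act.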

When frequency metrics in the power system exceed prescribed tolerances, the resulting operational challenges could lead to involuntary shedding of customer loads, distributed generation issues and even a total system blackout. To address this, there is a growing need for new Fast Frequency Response (FFR) services that can balance generation and demand in real time, thereby supporting the adoption of renewable energy and mitigating the impact of reduced system inertia [33, 34]. Cloud-responsive data centres with intelligent controls are seen as a potential solution for meeting BD’s renewable generation objectives and for securely supporting future grid expansion, including its targets for diversified fossil power plants. A tentative model is given in Fig. 3.

Fig. 3 Frequency control in coordinated environment of data centre

Frequency control takes place through primary, secondary and tertiary control. Both primary and secondary control occur automatically, with response times of approximately 15–30 s and 200 s, respectively. The first and fastest is primary control, which usually operates through the Free Governor Mode of Operation (FGMO) [35, 36]. In FGMO the governors, which control the mechanical power input to the prime movers of the generators, are free to adjust their settings to match the change in load, keeping the frequency as stable as possible. Once stability is reached, secondary control, i.e., Load Frequency Control (LFC), redistributes the load so as to restore frequency and generator reserves to normal levels. Tertiary control is performed manually if primary and secondary control are not enough [36].
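A free governor typically follows a droop characteristic, in which the change in output is proportional to the per-unit frequency deviation divided by the droop setting. The sketch below shows this relationship as a rough illustration of primary response. The 5% droop, the 200 MW rating and the function name are assumptions for illustration and are not BD Grid Code values.

```python
def droop_response_mw(freq_hz, rated_mw, droop=0.05, nominal_hz=50.0):
    """Change in governor output (MW) for a frequency deviation under droop control.

    A droop of 0.05 means a 5% change in frequency moves the output by 100%
    of the unit rating. Unit limits and deadbands are ignored in this sketch.
    """
    delta_f_pu = (freq_hz - nominal_hz) / nominal_hz
    return -delta_f_pu / droop * rated_mw


# Hypothetical 200 MW unit with 5% droop while the frequency sags to 49.75 Hz.
extra_mw = droop_response_mw(49.75, rated_mw=200.0)
print(f"Governor raises output by {extra_mw:.1f} MW")
```

In this example a 0.25 Hz sag calls for roughly 20 MW of additional output from the unit. A real governor would also be bounded by the available headroom and ramp rate, which is where reserve capacity becomes the limiting factor.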

In practice, the BD grid frequency variation often exceeds 1 Hz, primarily due to insufficient generation capacity and inadequate enforcement of the National Load Dispatch Centre’s (NLDC) authority. Also, there is no automatic primary and secondary control in the BD grid. The infrastructure is inadequate for FGMO and reserve generation capacity is lacking for LFC [37].

This frequency deviation can be minimized if automatic frequency control is performed, proper load forecasting is done and the operating reserve is maintained. Cloud computing can support FGMO and comprehensive, efficient load forecasting, which in turn will reduce the need to maintain large reserves. A trial of FGMO in selected generating stations of BD demonstrated significant improvement in frequency control, which shows promise for a comprehensive implementation using cloud computing [35].

In the USA, grid frequency fluctuations are monitored continuously using a type of phasor measurement unit (PMU) under the FNET system. Emails are sent to the appropriate authorities when the frequency goes out of bounds. However, this system does not respond to slow changes in frequency. ISO New England, a regional transmission system operator (TSO) based in Massachusetts, USA, developed an Amazon Web Services (AWS) cloud computing-based big data investigator that can identify slow but significant changes in frequency from archived data [38, 39].
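The distinction between out-of-bounds alarms and slow drift can be illustrated with a simple windowed average over archived samples: each sample may stay inside the alarm band while the window mean still moves away from nominal. The sketch below is only an illustration of that idea and is not the method used by ISO New England. The window length, drift limit, 50 Hz nominal value and sample data are all assumed.

```python
from statistics import mean


def slow_drift_windows(samples_hz, window, nominal_hz=50.0, drift_limit_hz=0.02):
    """Return start indices of non-overlapping windows whose mean drifts from nominal."""
    flagged = []
    for start in range(0, len(samples_hz) - window + 1, window):
        avg = mean(samples_hz[start:start + window])
        if abs(avg - nominal_hz) > drift_limit_hz:
            flagged.append(start)
    return flagged


# Hypothetical archive: every sample is within +/-0.05 Hz of nominal, so no
# out-of-bounds alarm would fire, yet the second window sits persistently low.
archive = [50.01, 49.99, 50.00, 50.00,
           49.97, 49.98, 49.97, 49.96,
           50.00, 50.01, 49.99, 50.00]

drifting = slow_drift_windows(archive, window=4)
print("windows with slow drift start at indices:", drifting)
```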

5.2 Economic load dispatch

Economic load dispatch (ELD) is a nonlinear optimization process that selects and loads the available generating units to meet grid demand at minimum operating cost, subject to constraints on transmission losses and environmental emissions. This allocation of generation is necessary to minimize per-unit operating costs, increase efficiency and decrease pollution. It primarily consists of unit commitment and load scheduling. Unit commitment involves selecting which generating units to run and at what output level to minimize cost. Load scheduling refers to predicting the grid-connected load in the immediate future. Cost is minimized when these two tasks are done properly and the optimal load flow scenario is chosen [40]. ELD can be solved using conventional numerical methods such as lambda iteration and the Newton–Raphson method. However, due to the increasing complexity of modern grids, methods such as dynamic programming, genetic algorithms and other artificial intelligence techniques are increasingly used for ELD.
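As an illustration of the lambda-iteration approach, the sketch below dispatches units with quadratic cost curves a + b·P + c·P² by searching for the common incremental cost λ at which the unit outputs sum to the demand. It is a simplified example: transmission losses and emission constraints are neglected, bisection is used to update λ, and the unit coefficients and limits are hypothetical.

```python
def dispatch(units, demand_mw, tol=1e-6):
    """Lambda iteration by bisection for units with cost a + b*P + c*P**2.

    units: list of (b, c, p_min, p_max); the constant term a does not affect dispatch.
    Returns (lambda, [P_i]). Losses and emission limits are neglected.
    """
    def outputs(lam):
        # Each unit runs where its incremental cost b + 2cP equals lambda,
        # clamped to its operating limits.
        return [min(max((lam - b) / (2.0 * c), p_min), p_max)
                for b, c, p_min, p_max in units]

    lo = min(b + 2.0 * c * p_min for b, c, p_min, p_max in units)
    hi = max(b + 2.0 * c * p_max for b, c, p_min, p_max in units)
    if not sum(outputs(lo)) <= demand_mw <= sum(outputs(hi)):
        raise ValueError("demand is outside the combined unit limits")

    while hi - lo > tol:
        lam = (lo + hi) / 2.0
        if sum(outputs(lam)) < demand_mw:
            lo = lam  # too little generation: raise the incremental cost
        else:
            hi = lam
    lam = (lo + hi) / 2.0
    return lam, outputs(lam)


# Three hypothetical units: (b, c, p_min MW, p_max MW).
fleet = [(8.0, 0.002, 100, 500), (7.0, 0.004, 50, 400), (9.0, 0.001, 100, 600)]
lam, schedule = dispatch(fleet, demand_mw=900.0)
print(f"lambda = {lam:.3f}, schedule = {[round(p, 1) for p in schedule]}")
```

At the solution every unit that is not at a limit operates at the same incremental cost, which is the optimality condition that lambda iteration exploits.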

In the BD grid, NLDC is responsible for the economic load dispatch where calculations are centrally done and instructions are given over the phone instead of automatic supervisory control and data acquisition (SCADA) instructions [41].

ELD involves complex, real-time processing of large amounts of data, and the results must be distributed without delay. Expansion of the BD grid will require scaling the central processing capacity of NLDC, which may not be economically feasible. Cloud computing, by contrast, allows decentralized load dispatch calculation and instruction transmission to be incorporated easily, with built-in redundancy [42].

5.3 Up-gradation of NLDC

PGCB, which is run by the government, plans to revamp its NLDC to accommodate the transmission of a massive 60,000 MW of electricity load before 2041 [43, 44]. NLDC bears the overall responsibility for managing the frequency and voltage levels of the power system [45]. To accomplish this duty, NLDC must ensure the complete cooperation of generators and distribution utilities by enforcing grid regulations and using the authority granted to it by the Grid Codes. Some of the important functions under this coordination are:

  i. Analyze the operational state of the power system and issue coordinated commands to generators and distribution utilities to implement appropriate actions for regulating system frequency and voltages. This involves utilizing all available capabilities and facilities, such as speed governor systems and the ramping capacity of generating units (measured in MW/min), which determines the rate at which generation can change. If the system frequency rises to 52 Hz or falls to 49.5 Hz, generating units must alter their output by an amount equivalent to 2% of generator output for each deviation in frequency outside the standard operating range of 49–51 Hz, until normalcy is restored (ideally through the action of the generating unit governor) [46,47,48].

  ii. Generators need to notify NLDC quickly of any limitations that could impact their ability to perform and contribute to managing frequency and voltage.

To ensure optimal regulation and management of frequency and voltage for power systems, NLDC requires adequate empowerment and upgradation. This includes the necessary authority to secure a high level of cooperation from generators, distribution utilities, as well as all other entities with facilities that can influence regulation processes.

To this end, the government has taken various initiatives, under which the Free Governor Mode of Operation (FGMO) will be introduced in NLDC [35, 49, 50]. The government is increasingly concerned with ensuring the smooth operation of the NLDC, particularly as the gap between demand and supply continues to widen across peak and off-peak seasons. This gap can reach 6,000 MW to 7,000 MW and is expected to widen even further with the commissioning of mega power plants such as the Rooppur Nuclear Power Plant in Pabna within the next two to three years.

This issue is of utmost importance as it directly impacts the reliable operation of the NLDC. The government is actively exploring solutions to this challenge, including the implementation of new technologies and infrastructure upgrades at NLDC. In this context, cloud computing facilities can play a vital role for the following reasons:

  i. Improved Data Management: Cloud storage allows for efficient and secure storage of vast amounts of data generated by the NLDC. This enables the NLDC to effectively manage and analyze real-time data from various sources, such as power generation units, transmission lines, and consumer demand [51, 52]. By having access to comprehensive and up-to-date data, operators can make informed decisions to enhance the resiliency of the power grid.

  ii. Enhanced Monitoring and Control: A web-based network enables real-time monitoring and control of the power grid from anywhere with an internet connection. This capability allows NLDC operators to closely monitor critical infrastructure, identify potential issues or disruptions, and take prompt actions to maintain grid resiliency. Remote access and control enhance the agility and responsiveness of the NLDC in managing the power system [53].

  iii. Scalability and Flexibility: Cloud storage and web-based networks provide scalability and flexibility to accommodate the increasing complexity and growth of the power grid. As the grid expands, the NLDC can easily scale its storage capacity and network infrastructure to handle the growing volume of data and ensure reliable operations. This scalability and flexibility contribute to the resilience of the NLDC in adapting to changing grid dynamics.

  iv. Redundancy and Backup: Cloud storage offers redundancy and backup capabilities, ensuring the safety and availability of critical data even in the event of hardware failures or disasters. By storing data in multiple locations, the NLDC can minimize the risk of data loss and improve its disaster recovery capabilities [54]. This redundancy strengthens the resiliency of the NLDC by safeguarding crucial operational information.

  v. Collaboration and Information Sharing: A web-based network facilitates seamless collaboration and information sharing among various stakeholders involved in power grid operations. It allows for real-time communication between the NLDC, power generation units, transmission companies, and other relevant entities. This collaboration enhances situational awareness, coordination, and joint decision-making, leading to improved grid resiliency.

  vi. Cybersecurity and Data Protection: Modernizing the NLDC with cloud storage and a web-based network necessitates a robust cybersecurity framework [55, 56]. Implementing stringent security measures helps safeguard the NLDC’s data and infrastructure against cyber threats. By adopting industry best practices, encryption protocols, and continuous monitoring, the NLDC can bolster the resiliency of its systems and protect against potential cyber-attacks.

ISO New England performs automatic generation control, economic load dispatch and other important functions using a SCADA system. In the event of a failure of the SCADA network, the operator previously had to resort to secure phone calls for load balancing, generator control and other tasks, a process that was time-consuming and prone to human error. During the pandemic, the operator developed an AWS cloud-based backup solution for these time-critical tasks, which was found to be quite effective [38, 39].

5.4 Renewable energy integration

Renewable energy refers to natural clean energy sources such as solar, wind, tidal/wave, biomass/biogas and geothermal. In recent years, a growing number of power grids have incorporated renewable energy to reduce dependence on fossil fuels and energy imports. Reduced fossil fuel usage will in turn reduce the effects of climate change [57]. It will also improve efficiency and reliability, because the international fossil fuel market has become volatile in recent months [58]. However, integrating renewable energy into the grid poses some challenges. The most common forms of renewable energy, solar and wind, are intermittent in nature: the available energy varies greatly with weather, season, time of day and geographical location. Renewable energy plants are usually distributed and located away from load centres, which increases the cost of transmission, especially for off-shore installations. Because renewable output is variable, its inclusion reduces grid inertia and reliability with regard to frequency fluctuations [37, 57].

In BD the current share of renewable energy is 4.55%, of which approximately 80% is solar. Alongside an on-grid renewable capacity of 811 MW there is 370 MW of off-grid capacity. Since the off-grid capacity is not connected to the national grid, it does not affect grid reliability. Figure 4 shows the energy generation mix for BD, where the renewable contribution is 1183.71 MW (4.55%) out of approximately 26,000 MW, and Fig. 5 shows the renewable energy share [59].

The unpredictability of renewable energy can be mitigated using energy storage solutions, virtual power plants (VPPs) for the management of small distributed energy sources such as EVs, and weather forecasting [57]. Renewable power plant infrastructure is usually distributed and remote, and can be monitored and managed remotely using cloud computing. This supports proper maintenance and reduces the likelihood of unplanned downtime. Solar and wind energy output depends greatly on weather conditions. Real-time, site-specific weather forecasting for renewable power plants is practical only through cloud computing [60].

Renewable energy sources need distributed energy resource (DER) grouping and management, which is usually achieved through VPPs. These require cloud computing and Internet of Things devices. Centrica, a UK-based energy service provider, and AutoGrid, a California, USA-based DER service provider, use AWS cloud computing in their VPPs to combine various kinds of energy sources, loads and energy storage for forecasting and optimum resource management while remaining compliant with security standards [38, 39].

Fig. 4 Energy generation mix (MW)

Fig. 5 Renewable energy share (MW)

5.5 Computational intelligence (CI) based transmission operation and control

A robust transmission network ensures reliable and efficient delivery of power across different regions and supports the integration of renewable energy sources, allowing greater flexibility in managing electricity supply and demand. To enable advanced monitoring and control capabilities, real-time data on electricity flows, voltage levels and system conditions is a prerequisite for optimizing grid performance, identifying potential issues and taking proactive measures to ensure efficient operation.

Under the guideline of the "Schedule and Dispatch" Code, transmission system stability and reliability depend heavily on the full cooperation of generators with the instructions and directives of the NLDC (the System Operator, SO), without interruption or delay [61, 62]. During the fiscal year 2021–2022, a total of 1003.824 circuit km of transmission line was added to the system through different projects; a summary is given in Table 2. At the end of fiscal year 2021–2022, grid capacity had increased by 13% across different voltage levels [63, 64].

In essence, an integrated and comprehensive cloud-based database is needed to meet the various requirements of advanced protection systems, power scheduling, operation planning, enhanced predictive facilities, operators’ capabilities and resources, control elements, communication systems and decision support systems for the smooth operation of the grid network.

Midcontinent ISO, an independent system operator based in Indiana, USA, performs power system modelling once every three months. However, the operator expects more frequent modelling requests from its clients. Furthermore, the model data are updated periodically, creating the possibility of errors due to discrepancies between different versions of the same model. The proposed solution is for the operator to adopt cloud-based model management, delivered to its clients as SaaS, to reduce errors and speed up model updates [38, 39].

5.6 Dynamic tariff

Dynamic energy tariff or pricing is a form of demand-side management in which consumers are incentivized to shift their energy consumption to different times in order to reduce demand during peak periods. This kind of flexible tariff is only possible through the use of smart energy meters and cloud computing, where the consumption data is transmitted to grid operators at set intervals. Dynamic energy tariffs can be implemented in the form of time of use (TOU), critical peak pricing (CPP), real-time pricing (RTP) and peak load pricing (PLP). Dynamic tariffs can enable savings for consumers through incentives and for producers through peak shaving, so that the grid can operate with lower reserves. TOU is the simplest form of dynamic tariff, where electricity pricing is higher during peak times and lower during off-peak times [65].
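The effect of a TOU tariff on a consumer’s bill can be sketched with a simple two-rate calculation over hourly smart meter readings. The peak window, the rates and the consumption profiles below are hypothetical and do not represent any BD tariff.

```python
PEAK_HOURS = range(17, 23)  # assumed evening peak, 17:00 to 22:59


def tou_bill(hourly_kwh, peak_rate, offpeak_rate, peak_hours=PEAK_HOURS):
    """Bill for a list of hourly readings that starts at midnight."""
    return sum(
        kwh * (peak_rate if hour % 24 in peak_hours else offpeak_rate)
        for hour, kwh in enumerate(hourly_kwh)
    )


# Hypothetical household using 24 kWh per day in both cases.
flat_profile = [1.0] * 24
# Same daily energy, but half of the evening use is moved to 00:00-05:59.
shifted_profile = [1.5] * 6 + [1.0] * 11 + [0.5] * 6 + [1.0]

bill_flat = tou_bill(flat_profile, peak_rate=12.0, offpeak_rate=6.0)
bill_shifted = tou_bill(shifted_profile, peak_rate=12.0, offpeak_rate=6.0)
print(f"flat: {bill_flat:.2f}, shifted: {bill_shifted:.2f}")
```

The shifted profile consumes the same energy but pays less, which is the incentive a TOU tariff relies on. Computing it requires interval readings, which is why smart meters and a data platform are prerequisites.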

In BD there is currently no provision for a dynamic energy tariff, since most energy meters are not smart meters [65, 66]. A small segment of the customer base uses smart prepaid meters, but their data is not used for dynamic tariffs. In the future, when smart meters are used more widely, cloud computing will be a necessity for TOU or any other kind of dynamic tariff policy [65, 67].

5.7 Load forecasting

In the current context of the BD power system, with deregulated electric power industries and the support of free competitive markets, load forecasting has become more critical than ever. The BD power system involves several different generation and distribution companies, while the transmission system is overseen by the Power Grid Company of Bangladesh (PGCB). The National Load Dispatch Centre falls under the jurisdiction of PGCB and is responsible for managing load dispatch and power generation control across the entire country. Forecasting loads with the necessary precision plays a vital role in various operational choices, such as optimizing power usage by avoiding over- or under-generation, making economically feasible investment decisions based on future load demand, managing resources, planning infrastructure development, and scheduling maintenance. In competitive electricity markets, accurate load forecasts are essential for energy transactions.

With the integration of smart electrical systems, BD power grid operations now involve a significant deployment of intelligent terminal equipment at different stages. As a result, the environment, methods and goals associated with load forecasting have been transformed. However, the heterogeneity of the load, in terms of its diversity, complexity and intermittency, has emerged as a major challenge that existing forecasting models struggle to manage effectively. Power producers in the private sector have reported that negative electrical consumption during off-peak hours has affected their ability to generate power. Demand during these times is significantly lower than initially forecast, causing load management issues for the power grid due to sudden fluctuations in demand during both peak and off-peak hours.

Traditional load forecasting methods primarily focus on the technological aspects of forecasting, whereas smart grid load forecasting requires a more sophisticated approach at the service management level. An analysis of existing load forecasting methodology reveals several issues:

  1. Data storage and processing: The proliferation of terminal devices such as smart meters and embedded smart appliances, coupled with the advancement of "informatization, digitalization, automation, and interaction" processes, poses challenges for data acquisition, processing, storage, and calculation. The current methods employed in traditional load forecasting suffer from slow processing speed, long response times, and inadequacy in terms of data migration and disaster preparedness for information data platforms. Consequently, new approaches need to be adopted to address these limitations. Additionally, traditional load clustering analysis methods often result in local optima or even non-convergence, significantly hindering the progress of accurate load forecasting.

  2. Forecasting mathematical model: As the demand for accuracy and density in load forecasting increases hugely with the continuous expansion of grids, there is a dire need to explore real-time and self-adaptive mathematical models. These models should be capable of adapting to load-changing conditions with respect to stochastic generation and providing accurate forecasts in real time.

In summary, the complexities of load forecasting that arise when making grids more resilient require solutions that address data storage and processing as well as the development of advanced forecasting models with real-time and self-adaptive capabilities.
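One very simple way to picture a model that adapts as new readings arrive is simple exponential smoothing, sketched below in Python. It is used here only as a stand-in for the far richer self-adaptive models discussed above; the class name, the smoothing factor and the MW readings are hypothetical.

```python
from typing import Optional


class OnlineLoadForecaster:
    """One-step-ahead load forecaster based on simple exponential smoothing."""

    def __init__(self, alpha: float = 0.5) -> None:
        # alpha close to 1 reacts quickly to new data; close to 0 smooths heavily.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.level: Optional[float] = None

    def update(self, observed_mw: float) -> None:
        """Fold a newly observed load value (MW) into the running estimate."""
        if self.level is None:
            self.level = observed_mw
        else:
            self.level = self.alpha * observed_mw + (1.0 - self.alpha) * self.level

    def forecast(self) -> float:
        """Return the forecast for the next interval."""
        if self.level is None:
            raise RuntimeError("no observations yet")
        return self.level


forecaster = OnlineLoadForecaster(alpha=0.5)
for reading_mw in (100.0, 200.0):  # hypothetical readings
    forecaster.update(reading_mw)
next_interval_mw = forecaster.forecast()
```

Each call to update costs constant time and memory, which is the property that makes this kind of recursive model easy to run continuously on streaming meter data.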

Deploying a cloud-based decentralized AI load prediction system can effectively tackle the difficulties associated with sluggish data processing and poor utilization of shared resources, while adapting to the latest advancements in intelligent grid technology.

Fig. 6 Cloud based network

Figure 6 shows the cloud architecture of load forecasting for the smart grid. Load forecasting is no longer a single-ended function; rather, it evolves from heterogeneous system platforms in which diverse infrastructure (such as servers), development platforms, data standards, forecasting technologies and objects are integrated through the single platform of a cloud computing centre. The term "cloud" refers to two main elements, namely service and management. The service category covers Infrastructure as a Service, Platform as a Service and Software as a Service. Cloud management mainly emphasizes meeting the service requirements of users. User management involves responsibilities such as managing user accounts, implementing single sign-on functionality and configuring user settings. Meanwhile, service management guarantees compliance with SOA design standards, which improves the overall quality of operating cloud services. Accounting management keeps track of how clients use network resources and services by recording their utilization patterns. Lastly, security measures ensure that users can access these benefits lawfully while maintaining privacy within reasonable limits where applicable. Combining all of the aforementioned resources will promote the development of cloud computing for the smooth operation of power load forecasting.

ISO New England, a TSO, uses cloud computing for limited-duration load forecasting using machine learning. The operator's standard load forecasting was not very accurate. A chosen cloud provider's PaaS allowed cloud infrastructure to be used to produce a more accurate load forecasting model using several kinds of machine learning. The operator could focus on model training instead of platform and hardware fine-tuning, which saved time and resources [38, 39].

5.8 System loss

In a national grid, system loss in generation, distribution and transmission can be classified into three categories. These include technical losses, non-technical losses and administrative losses. Technical losses include losses due to generation efficiency, transmission line conductor loss, power transformer copper and core loss, losses due to incorrect energy meter readings and other theoretical and technological shortcomings. Non-technical losses include power theft, improper billing due to corruption, energy meter tampering and other losses where electricity is consumed but not billed. Administrative losses include power consumption by the grid itself for the proper functioning of substations and power plants [29]. In BD the system loss for the fiscal year 2021–2022 was 10.41%.
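A percentage figure of this kind can be computed, under one common definition, as the share of energy fed into the system that is never billed. The Python sketch below assumes that definition; the function name and the energy figures are illustrative and are not taken from BD grid data.

```python
def system_loss_percent(energy_input_gwh: float, energy_billed_gwh: float) -> float:
    """Return system loss as a percentage of the energy fed into the system.

    Assumed definition: loss % = (input - billed) / input * 100.
    """
    if energy_input_gwh <= 0:
        raise ValueError("energy input must be positive")
    if energy_billed_gwh < 0 or energy_billed_gwh > energy_input_gwh:
        raise ValueError("billed energy must lie between 0 and the energy input")
    return (energy_input_gwh - energy_billed_gwh) / energy_input_gwh * 100.0


# Hypothetical figures: 1000 GWh fed in, 900 GWh billed.
loss = system_loss_percent(1000.0, 900.0)
```

Under this definition the figure lumps technical, non-technical and administrative losses together; separating them requires the metering data discussed in the surrounding text.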

Non-technical losses can be eliminated completely using smart metering and cloud computing [67]. Technical losses related to efficiency can be reduced through better forecasting and scheduling enabled by cloud computing (Table 4).

Table 4 Summary of cloud computing necessity for grid resiliency

6 Cloud computing hurdles in BD

Even though cloud computing is necessary for many important scenarios as discussed in the previous section, it has some challenges as discussed below [68, 69].

6.1 Data security and privacy

This is one of the most important challenges that needs to be addressed properly because grid reliability and resiliency are a matter of national security.

6.1.1 Authorized access and privacy

Information regarding the national grid and its customers is sensitive and should only be accessible to authorized personnel. If this data is accessed by unauthorized parties, then the security of the grid may be compromised. Data breaches can occur if the cloud provider is hacked, if a phishing attack occurs at the grid operator level or if the base security of the publicly available cloud service is not up to the mark.

The selection of, and access to, customers' personal details should be handled carefully and transparently to achieve an acceptable level of privacy.

6.1.2 Credibility

Messages or instructions transmitted to and from the grid should be treated as credible only if they come from authorized parties. Authentication should be easy to verify but difficult to falsify.
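One standard way to make a message easy to verify but hard to forge is a keyed message authentication code (HMAC). The sketch below uses Python's standard hmac and hashlib modules; the shared key, the message format and the function names are assumptions for illustration, and a real deployment would also need key management and replay protection.

```python
import hashlib
import hmac


def sign_instruction(key: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the message under the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def is_authentic(key: bytes, message: bytes, tag: str) -> bool:
    """Check the tag in constant time to avoid leaking timing information."""
    expected = sign_instruction(key, message)
    return hmac.compare_digest(expected, tag)


# Hypothetical shared secret and dispatch instruction.
shared_key = b"example-shared-secret"
instruction = b"unit=G7;setpoint_mw=120"
tag = sign_instruction(shared_key, instruction)

accepted = is_authentic(shared_key, instruction, tag)
rejected = is_authentic(shared_key, b"unit=G7;setpoint_mw=999", tag)
```

Because the tag depends on both the key and the exact message bytes, the same check also detects tampering in transit, which relates to the integrity requirement as well.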

6.1.3 Integrity

If data or system operation integrity is not maintained in the cloud, the consequences for grid resiliency can be severe. Data integrity ensures that data can be modified only by legitimate entities and only in appropriate ways, whereas system integrity ensures that no unexpected or unusual operation is possible.

To minimize these security risks, a reputable cloud provider that uses industry-standard encryption and follows security best practices, including secure authentication methods, should be chosen. Grid operators should receive periodic reminders and training on defending against cyber-attacks.

In BD, a government-owned and operated Tier-3 certified National Data Center (NDC) was established in 2009. The data centre is ITILv2 compliant and ISO 27001:2013 and ISO 20000 certified. It has two levels of firewall and Next Generation Intrusion Prevention System (NGIPS) security. The data centre is N+1 fault tolerant and has disaster recovery capacity. Since this data centre is maintained by the government, policymakers will be more inclined to use it for the power grid, because doing so will ensure that sensitive data remains in the country [70].

6.2 Dependence on the internet

Cloud computing is intricately dependent on internet availability and having sufficient bandwidth. If there are sudden disruptions to communication between cloud servers and grid operators then the relevant tasks will be affected.

To prevent this kind of problem, sufficient bandwidth should be allocated and redundant internet connections should be utilized. Fail-safe contingencies should be devised in case of complete communication failure.
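The redundancy and fail-safe ideas above can be sketched as a loop that tries each link in priority order and reports when none is usable. The link names and sender callables below are hypothetical stand-ins for real network clients.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A link is a (name, send_function) pair; send_function raises
# ConnectionError when the link is unavailable.
Link = Tuple[str, Callable[[bytes], None]]


def send_with_failover(payload: bytes, links: Sequence[Link]) -> Optional[str]:
    """Try each link in order; return the name of the link that succeeded.

    Returns None if every link failed, so the caller can switch to a
    local fail-safe mode.
    """
    for name, send in links:
        try:
            send(payload)
            return name
        except ConnectionError:
            continue  # this link is down, try the next one
    return None


delivered: List[bytes] = []


def primary_link(payload: bytes) -> None:
    raise ConnectionError("primary link down")  # simulated outage


def backup_link(payload: bytes) -> None:
    delivered.append(payload)  # simulated successful delivery


used = send_with_failover(b"telemetry", [("primary", primary_link), ("backup", backup_link)])
```

Returning None rather than raising makes the complete-failure case an explicit branch for the caller, which is where a fail-safe contingency would be triggered.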

Even though the median BD internet speed is not among the fastest and is below the global average, the nominal latency, which is arguably more important for reliability, is slightly better than the global median. As of July 2023, BD ranked 106th out of 141 on the Speedtest Global Index [71]. Furthermore, according to Bangladesh Submarine Cable Company Ltd (BSCCL), the total internet bandwidth is 2800 Gbps, provided by two submarine fibre-optic cables, SEA-ME-WE 4 and 5. A third submarine cable is in the planning stage and will add a further 7200 Gbps of bandwidth [72].

6.3 Interoperability

If several cloud computing providers are used for cost saving, redundancy or any other relevant causes, then the cloud applications of different providers should be able to communicate with each other. Their operation, security, encryption and management should be similar and compatible. Otherwise, power grid management through cloud computing will be difficult, if not impossible, especially if it becomes necessary to migrate from one cloud provider to another in the future.

A viable solution is to use providers that follow a cloud standard where only common cloud APIs are used to facilitate interoperability.
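A minimal sketch of this idea in Python is a small provider-neutral interface that grid applications code against, so that moving between providers means writing one new adapter rather than changing the applications. The interface, the in-memory adapter and the key names are hypothetical; real adapters would wrap each provider's own SDK.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable


class GridDataStore(ABC):
    """Provider-neutral storage interface used by grid applications."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store data under the given key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the data stored under the given key."""


class InMemoryStore(GridDataStore):
    """Stand-in adapter; a real one would call a cloud provider's API."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def migrate(source: GridDataStore, target: GridDataStore, keys: Iterable[str]) -> None:
    """Copy the listed objects from one provider to another."""
    for key in keys:
        target.put(key, source.get(key))


provider_a = InMemoryStore()
provider_b = InMemoryStore()
provider_a.put("network-model/v1", b"model-bytes")
migrate(provider_a, provider_b, ["network-model/v1"])
```

The migrate function only touches the shared interface, so it works unchanged for any pair of adapters that implement it.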

Although the interoperability of NDC with other cloud service providers is unknown, most of the popular cloud providers offer their services in BD and are interoperable with each other [73].

6.4 Reliability

Cloud computing for national grid management is highly time sensitive, so the reliability and availability of the cloud services are of utmost importance.

To maintain reliability, sufficient cloud computing resources should be allocated to meet peak computing and data-access demands. Proper backups should be maintained and grid operators should be adequately trained.

Finding a certifiably reliable cloud computing service inside BD may be challenging at this moment. However, in recent years several local private cloud service providers have started operating with respectable reliability, at least anecdotally [73]. This includes the government-owned NDC, which has ISO certification regarding compliance and reliability [70]. Owing to the ongoing dollar crisis, local cloud providers are more affordable than foreign ones, albeit with lower-tier analytics. The national grid operator also has the option of using several cloud service providers from outside the country.

7 Grid communication techniques

7.1 Enhancing interoperability and upgrading of grid communication through OPGW systems

The inadequate communication infrastructure in the transmission domain, along with ageing infrastructure, has left power grids susceptible to regular disruptions [74]. Additionally, the integration of large-scale renewable energy resources into the distribution grid has altered the conventional direction of power flow. The integration of Optical Ground Wire (OPGW) with the transmission network of PGCB is playing an important role in the implementation of the Government's vision of "Digital Bangladesh". In addition to ensuring a fair circulation and supply of electricity to various grid substations and maintaining data transfer and constant communication between the electricity generating stations and the National Load Dispatch Centre (NLDC), OPGW is also used for PGCB's own communication system. OPGW, in which optical fibre is placed inside the ground wire, has gained massive popularity for protecting transmission lines from lightning surges. In 1996, PGCB first began using OPGW experimentally in place of conventional ground wire on transmission lines, and currently almost the entire transmission network uses OPGW.

Up until 2007, PGCB's transmission lines with OPGW had a total length of 2,200 km, which has now grown to approximately 7,200 km. With this, 60 districts and about 200 sub-districts of the country are now covered by the optical fibre network of PGCB. The use of OPGW as a strong telecom transmission backbone across the country has expedited the ongoing technological revolution, which in turn further strengthens the country's economic development. PGCB has subsequently started using OPGW commercially for the development of the country's information technology, rather than limiting its use to communication and the protection of transmission lines. In 2006, PGCB started its first commercial activity by leasing 246 km of dark optical fibre in the Dhaka-Chottogram region to Grameenphone Ltd., with the aim of developing the national communication infrastructure. Later on, several other mobile operators, Nationwide Telecommunication Networks (NTTN) and other institutions leased a total of 17,223 km of dark optical fibre all over the country. As a part of Corporate Social Responsibility (CSR) to elevate the quality of higher education, the University Grants Commission (UGC) has been provided with a lease of 3,284 km of optical fibre at a minimal price. For the fulfilment of the government's vision of "Digital Bangladesh", Fiber@home Ltd. and Summit Communication Ltd. have been provided with a lease of 3,600 km of optical fibre at a reduced price set by the Domestic Network Coordination Committee (DNCC) to ensure digital connectivity for the entire population of the country.

After obtaining a National Telecommunication Transmission Network (NTTN) license from BTRC in 2014, PGCB established an OPGW office in 2017 for the commercial expansion of its telecommunication business countrywide. Moreover, a project has been initiated for the deployment of High-Capacity Telecom Equipment for achieving Telecom Sector G Bandwidth Transmission using the NTTN license, taking advantage of the resources, infrastructure and other facilities of PGCB. Consequently, PGCB has completed all survey work at grid substations across the country and collected transmission bandwidth demand data from International Internet Gateway (IIG) operators and Internet Service Providers (ISPs). In light of this demand, a pilot project is underway along the Kuakata-Benapol-Bheramara-Dhaka route to assess business viability. Upon completion, high-speed uninterrupted data transmission can be offered to various telecom, IIG and ISP operators at a discount to the current market price. PGCB's income will increase significantly if it can lease transmission bandwidth at an affordable price once the leasing process is completed. This in turn will bring significant progress to the country's information technology sector. The income of the OPGW office has grown steadily since its establishment in 2017 and increased by 43% in the 2020–21 financial year compared to 2019–20. Above all, the role of PGCB's OPGW is indispensable in realizing the Honorable Prime Minister's dream of "Digital Bangladesh" [75].

7.2 Integration of PMU with cloud computing

Phasor Measurement Units (PMUs) are devices defined by a set of standards that serve two major goals, measurement and protection, on wide-area electricity grids; they provide an accurate and constantly updated view of the power grid. Cloud computing has been used as a novel approach to support PMU computation [76].

As more PMUs are deployed to monitor additional buses, the computational load required to solve state estimation becomes increasingly significant. The need for instant analysis, visualization, and smart management of the power grid has spurred advancements in PMU-based applications. For example, China is installing PMUs in all new substations by default. Eventually, in the context of BD, the entire transmission system could be fully monitored with these devices if proper steps are taken. An analysis of PMU data using an LSE solver revealed challenges in processing massive amounts of data within operational constraints, adapting to network disruptions, and scaling for larger networks [77].

Fig. 7 Cloud architecture of PMU

Additionally, aggregating continuously transmitted high-volume data from geographically distributed PMUs is necessary for accurate computation by the LSE solver. The cloud server hosts the code for the Cloud LSE application. Runtime data from multiple PMU devices is sent to this server for processing via a phasor data concentrator (PDC). The results from the cloud can be used for visual analysis and system monitoring, similar to what is done in the classical LSE. A Cloud LSE application can provide inputs for the management of the power grid based on real-time data obtained from PMUs. As shown in Fig. 7, PDCs are intermediate devices that concentrate and aggregate the data from multiple PMUs so that it can be sent as a package to the Cloud LSE application. This application can be configured for different test systems [78,79,80].
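The concentrating role of a PDC can be sketched as grouping samples by timestamp and forwarding only the frames in which every expected PMU has reported. This is a simplified Python illustration with hypothetical PMU names and values; real PDCs exchange data using synchrophasor protocols such as IEEE C37.118 and use wait timers for late data, which are omitted here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# One sample: (pmu_id, timestamp in microseconds, complex voltage phasor in p.u.)
Sample = Tuple[str, int, complex]


def concentrate(samples: Iterable[Sample], expected_pmus: Set[str]) -> Dict[int, Dict[str, complex]]:
    """Group samples by timestamp and keep only complete frames.

    A frame is complete when every expected PMU has a sample for that
    timestamp; incomplete frames are dropped in this simplified sketch.
    """
    by_time: Dict[int, Dict[str, complex]] = defaultdict(dict)
    for pmu_id, timestamp_us, phasor in samples:
        by_time[timestamp_us][pmu_id] = phasor
    return {
        ts: frame
        for ts, frame in sorted(by_time.items())
        if expected_pmus.issubset(frame)
    }


# Hypothetical stream: PMU "B" has not yet reported for the second timestamp.
stream = [
    ("A", 0, 1.00 + 0.00j),
    ("B", 0, 0.98 + 0.10j),
    ("A", 20000, 1.00 + 0.01j),
]
frames = concentrate(stream, {"A", "B"})
```

Each complete frame is a time-aligned snapshot of the monitored buses, which is the form of input a state estimation application running in the cloud would consume.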

The Romanian power system developed a cloud computing-based architecture called FIWARE for the collection and processing of PMU synchrophasor data. This kind of data allows better grid resiliency through the analysis of disruptions and an intuitive understanding of power flow and phase differences, including the recognition of transient behaviour, which is not possible in a standard SCADA system.

Retrieval of synchrophasor data is difficult for standard TSOs because their network security policy prohibits external connections to their SCADA network. A possible solution is to use a separate, single-function port dedicated to PMUs, apart from SCADA communication.

PMUs require cloud-based data management because of the sheer volume of data they generate, which must be processed in real time for proper predictions [81].

8 Conclusion and future work

Bangladesh has experienced a radical transformation in the power sector in the last few years. Despite having previously achieved a 100% electrification rate, the country now faces frequent and persistent power outages, which have adversely affected export industries, a vital source of income for the Bangladeshi economy. The struggling ready-made garment (RMG) sector poses a severe threat to the country's economic stability, as energy insufficiency may hinder factory operations. While unforeseeable challenges are expected to arise in the near future, Bangladesh is in dire need of a stable and sustainable solution to power outages through a resilient and reliable grid network. In this respect, the integration of cloud computing has emerged as a transformative step for empowering the power grid in Bangladesh, shifting the focus from mere reliability to enhanced grid resiliency. Through the utilization of cloud-hosted technologies, the power sector in Bangladesh can achieve significant advancements in efficiency, scalability, and adaptability, ultimately bolstering the resilience of the entire grid infrastructure. By adopting cloud computing, power grid operators can effectively manage and monitor critical systems, optimize energy distribution, and respond swiftly to any disruptions or contingencies. While challenges such as security and infrastructure readiness must be addressed, the integration of cloud computing presents a promising pathway towards a more robust and resilient power grid in Bangladesh, capable of meeting the nation's growing energy demands and ensuring a reliable and sustainable future.