1 Introduction

Societal prosperity of the latter half of the twentyfirst century has been underpinned by the Internet, formed by large-scale computing infrastructure composed of distributed systems which have accelerated economic, social and scientific advancement [1]. The complexity and scale of such systems have been driven by increased societal demand and dependence on such computing infrastructure, which in turn has resulted in the formation of new distributed system paradigms. In fact, these paradigms have evolved in response to technological changes and usage, resulting in alterations to the operational characteristics and assumptions of the underlying computing infrastructure. For example, early mainframe systems provided centralised computing and storage interfaced by teletype terminals. Clustering and packet switching advancement in microprocessor technology and GUIs transferred computing from large mainframes operated remotely to home PCs [2, 3]. Standardisation of network protocols enabled global networks-of-networks to exchange messages for global applications [1]. Organisations developed frameworks and protocols capable of offloading computation to remote machine pools of computing resources such as processing, storage and memory [4, 5], eventually incorporating sensing and actuator objectives with embedded network capabilities [6]. Thus, distributed systems paradigms have evolved to distribute and facilitate service from centralised clusters, extending infrastructure beyond the boundaries of central networks forming paradigms such as IoT and Fog computing [7, 8].

For the past 60 years distributed system paradigms have conceptually evolved to meet challenges introduced by an ever-changing computing infrastructure and society [9]. From mainframes to clusters, clusters to Cloud, and Cloud to distributed and decentralised infrastructures encompassing the IoT to Edge Infrastructure [10]. Yet paradigms still retain the same underlying characteristics and elements that define their operation [11]. Each is defined by persistent research activities and are often driven by the development of new capabilities, such as security [12], hardware accelerators [13], edge computing [14] and power efficiency [15]. Whilst application framework have evolved to meet challenges presented by integrating with wider eco-systems, ranging from distributed clouds to highly specialised application specific infrastructures [16,17,18]. As such distributed paradigms require constantly evolving middleware’s, communication protocols, and secure isolation mechanisms [19].

This work focuses on ascertaining the key characteristics and elements of distributed and networked systems, critically appraise the historical technologies and social behaviour that drove their paradigm formation, whilst identifying key trends across the paradigms including system architecture fragmentation, centralisation and decentralisation pivoting, and delays in paradigm conceptualisation to creation by tracing the impact of networked systems on society. From these findings we discuss how future distributed systems will support decentralisation of computation services through composition of decentralised computation platforms specialised to meet workload specific performance goals, forming exponentially larger systems capable of holistic operational requirements including capability and energy availability. Finally we summarise how a dynamic centralised/decentralised distributed paradigms may form and will shape the direction of future computer science research as well as their potential impact within greater society.

The rest of the article is structured as follows: Sect. 2 presents the background of distributed systems. Section 3 the evolution of the distributed system paradigms. Section 4 analyses trends and observations across all paradigms. Section 5 discusses future challenges facing distributed systems, and Sect. 6 presents our conclusions.

2 Background

Distributed systems describe a class of computing system in which hardware and software components are connected by means of a network, and coordinate their actions via message passing in order to meet a shared objective [20, 21]. Whilst paradigms exhibit differing operational behaviour and leverage various technologies, these systems are defined by their underlying core characteristics and elements that facilitate their operation.

2.1 Characteristics

2.1.1 Transparent concurrency

Distributed Systems are inherently concurrent, with any participating resource accessible via any number of local or remote processes. The capacity and availability of such a system can be increased by adding resources that require mechanisms for accounting and identification. Such a system is vulnerable to volatile inter-actor behaviours and must be resilient to node failure as well as lost and delayed messages [22]. The management and access of objects, hardware or data in a distributed networked environment is also of particular importance due to potential for physical resource contention [3, 4, 23, 24].

2.1.2 Lack of shared clock

Systems maintain their own independent time, interpreted from a variety of sources, and as such Operating Systems (OSs) are susceptible from clock skew and drift. Furthermore, detecting when a message was sent or received is important for ensuring correct system behaviour. Therefore, events are tracked by means of conceptual Logical and Vector clocks; by sequencing messages, processes distributed across a network are able to ensure total event ordering [25,26,27].

2.1.3 Dependable and secure operation

Components of a distributed system are autonomous, and service requests are dependent on correct transaction of operation between sub-systems. Failure of any subsystem may affect the result of service requests and may manifest in ways that are difficult to effectively mitigate. Fault tolerance and dependability are key characteristics towards ensuring the survivability of distributed systems and allow services to recover from faults and whilst maintaining correct service [22].

2.2 Elements

2.2.1 Physical system architecture

Physical system architecture identifies physical devices that exchange messages in a distributed system and what medium they communicate over. Early distributed systems such as mainframes were physically connected to clients. Later packet switching enabled long-haul multi-hop communication. Cellular networks incorporate mobile computing systems, whilst modern systems host services at specialised hardware between services providers and consumers. Initial designs of distributed systems aimed to provide service across local or campus wide networks of tens to hundreds of machines, and were focused on the development of operating systems and remote storage [1, 4]. Early efforts were designed to explore potential challenges and demonstrate their feasibility [8] and to enhance their functional and non-functional properties (performance, security, dependability, etc).

2.2.2 Entities

A logical perspective of a distributed system describes several process exchanging messages in order to achieve a common goal [28, 29]. Contemporary systems extend this definition by considering logical and aggregate entities, such as Objects and Components, used for abstracting resource and functionality [30]. Here systems are exposed as well-defined interfaces capable of describing natural decomposition of functional software requirements, and enabled exploring the loose-coupling between interchangeable components for domain specific problems found in distributed computing [31]. More recent systems leverage web services and micro-services, that consider their deployment to physical hardware as well as constraints including locality, utilization and stakeholders’ policies [32]. Grid and Cloud computing enable distributed computing by abstracting processing, memory and disk space aggregation [33] whereas Fog and Edge computing emphasize integrating mobile and embedded devices [34, 35].

2.2.3 Communication models

Several communication models support distributed systems [36,37,38] including (1) Inter-process Communication: Enabling two different processes to communicate with each other by means of operating system primitives such as pipes, streams, and datagrams in a client—server architecture; (2) Remote Invocation: Mechanisms and concepts enabling a process in one address space to affect execution of operations, procedures and methods in another address space; and (3) Indirect Communication: Mechanisms enabling message exchanges between one to many processes via an intermediary. In contrast with previous communication models, senders and receiving processes are decoupled, and responsible for facilitating message exchange is passed to the intermediary [39, 40].

2.2.4 Consensus and consistency

Distributed systems make decisions amongst groups of cooperating processes each possessing possibly inconsistent states. Consensus algorithms are a mechanism in which a majority subset of nodes or ‘quorum’ can fulfil a client request negotiate a truth and fulfil a client request. Replication and partitioning are common techniques used to improve system scalability, reliability and availability [22] when exposed to volatile environments. Consistency is a challenge to both replicated, partitioned storage and consensus algorithms [22, 25]. Consistency in distributed systems can be defined as strong consistency, where any update to a partition of a data set is immediately reflect in any subsequent accesses, or weak consistency in which updates may experience delay before they are propagated through the system and are reflected in subsequent access’s.

3 The evolution of distributed systems

Distributed systems have continued to evolve in response to various scientific, technological and societal factors. This has given rise to new forms of computer systems, as well as adaptation of paradigms from Client-Server through to IoT and Fog Computing [38]. However, the core characteristics and model elements discussed in Sect. 2 have remained relatively constant, with the precipitating paradigm augmenting (or re-engineering) technology from prior paradigms. Table 1 provides a detailed a timeline of key distributed paradigm formation, technologies that enabled their realisation, and a description of their respective elements. The formation of distributed systems does not occur in a vacuum, and is influenced by factors spanning other computer science disciplines (e.g. HCI, security), societal exposure, education, and business strategy [36, 37].

Table 1 Timeline of distributed system paradigms formation and key technological drivers

Due to the sheer volume of potential influences, we have focused our discussions pertaining to major technological advances and impact upon distributed system elements.

3.1 The mainframe (1960–1967)

Mainframes machines of the early 1960’s provided time sharing service to local clients that interacted with teletype terminals [41]. Such system conceptualised the client-server architecture, prevalent in present day distributed systems design [42]. The client process connects and requests server processes, enabling a single time-sharing system to multiplex resources amongst clients [43]. Mainframes remained prohibitively expensive and were the focus of supercomputing engineers that lead to the innovation of early disk-based storage and transistor memory [44].

3.2 Cluster networks (1967–1974)

The late 1960s and early 1970s saw the development of packet switching, and clusters of off-the-shelf computing components were identified as a cheaper alternative to more powerful yet more expensive supercomputer and mainframes [45]. New programming environments and resource abstractions were developed abstracting resource across local networks of machines [1, 4]. This time period also saw the creation of ARPANET and early networks that enabled global message exchange [5], allowing for services hostable on remote machines across geographic bounds decoupled from a fixed programming model. Cerf and Karn [5, 6] defined the TCP/IP protocol that facilitated datagram and stream orientated communication over a packet switched autonomous network of networks [46].

3.3 Internet and home PCs (1974–1985)

During this era, the Internet was created. Whilst early NCP-based ARPANET systems were characterised by powerful timesharing systems serving multiple clients over networks, new technologies such as TCP/IP had begun to transform the Internet into a network of several backbones, linking local networks to the wider Internet [5]. Thus, the number of hosts connected to the network began to grow rapidly, and centralised naming systems such as HOSTS.TXT could not scale sufficiently [2]. Domain Name Systems (DNSs) were formalised in 1985 and were able to transform hosts domain names to IP addresses; the Unix BIND system was the first public implementation of the DNS. Computers such as Xerox Star and Apple LISA utilizing early WIMP based GUIs demonstrated the feasibility of computing within the home, providing applications such as video games and web browsing to consumers.

3.4 World wide web (1985-1996)

During the late 1980s and early 1990s, the creation of HyperText Transport Protocol (HTTP) and HyperText Markup Language (HTML) [3] resulted in the first web browsers, website, and web-serverFootnote 1. Standardisation of TCP/IP provided infrastructure for interconnected network of networks known as the World Wide Web (WWW). This enables explosive growth of the number of hosts connected to the Internet, and was the public’s first large societal exposure to Information Technology [3, 5]. Mechanisms such as Remote Procedure Calls (RPCs) were invented, allowing for the first time applications interfaced with procedure, functions and method across address spaces and networks [23].

3.5 P2P, grids and web services (1994–2000)

Peer to Peer (P2P) applications such as Napster and Seti@Home demonstrated it was feasible for a global networks of decentralised cooperating processes to perform large-scale processing and storage. P2P enabled a division of workload amongst different peers/computing nodes whereby other peers could communicate with each other directly from the application layer [7]Footnote 2 without the requirement of central coordinator. The creation of Web Services enables further abstraction of the system interface from implementation in the Web [11]. Rather than facilitate direct communication between clients and servers, Web Services mediated communication via brokerage services [47]. Scientific communities identified that creating federations for large pools of computing resources from commodity hardware could achieve capability comparable to that of large supercomputing systems [48]. Beowulf enables resource sharing amongst process by means of software libraries and middle-wares, conceptualising clustered infrastructure as a single system [49]. Grid computing enabled open access to computing resources and storage by means of open-protocols and middleware. This time period also saw the creation of effective x86 virtualization [50], which became a driving force for subsequent paradigms.

3.6 Cloud, Mobile and IoT (2000–2010)

A convergence of cluster technology, virtualization, and middleware resulted in the formation of the Cloud computing that enabled creating service models for provision application and computing resource as a service [51]. Driven primarily by large technology organization whom constructed large-scale datacenter facilities, computation and storage began a transition from the client-side to the provider side more similar to that of mainframes in the 1960s and 1970s [32, 52]. Mobile computing enabled access to remote resources from resource constrained devices with limited network access [50, 53] IoT also began to emerge from the mobile computing and sensor network communities providing common objects with sensing, actuating and networking capabilities, contributing towards building a globally connected network of ‘things’ [54].

3.7 Fog and edge computing (2010-present)

Whilst data produced by IoT and Mobile computing platforms continued to increase rapidly, collecting and processing the data in real-time was, and still remains an unsolved issue [55]. This resulted in forming Edge computing whereby computing infrastructure such as power efficient processors, and workload specific accelerators are placed between consumer devices and datacenter providers [53]. Fog computing provides mechanisms that allow for provisioning applications upon edge devices [56, 57], capable of coordinating and executing dynamic workflows across decentralised computing systems. The composition of Fog and Edge computing paradigms further extended the Cloud computing model away from centralised stakeholders to decentralized multi-stakeholder systems [56] capable of providing ultra-low service response times, increased aggregate bandwidths and geo-aware provisioning [14, 55]. Such a system may comprise of one-off federations or clusters, realised to meet single application workflows or act as intermediate service brokers, and provide common abstractions such as utility and elastic computing across heterogeneous, decentralised networks of specialised embedded devices, contrasting with centralised networks found in clouds [34].

4 Trends and observations

By appraising the evolution of the past six decades of distributed system paradigms shown in Table 1, it is apparent that a variety of technological advancements within computer science have driven the formation of new distributed paradigms. It is thus now possible to observe longer-term trends and characteristics of particular interest within distributed systems research.

4.1 Diversification of paradigms

There appears to be an increased diversification of distributed system paradigms as the research area has matured as shown in Fig. 1. This is predominantly driven by two factors: first, it is observable that the acceleration of paradigm formation was precipitated by the invention of the WWW in 1999. This is intuitive as this event enabled distributed systems to transition away from specialized research focused activities into greater society, with each sector requiring specific requirements from entertainment to commercial use. The second reason is that the maturity of fundamental technologies (TCP/IP, HTTP, Unix) created a platform that heavily emphasised abstraction to interconnect heterogeneous platforms in an effective manner, hence future paradigms were able to build upon these concepts. Figure 2 also demonstrates how distributed system paradigms transitioned from a potentially ‘niche’ research with a development singular track within the computer science community towards an area spanning a wide variety of paradigms coinciding with the time the Internet and WWW gained traction.

Fig. 1
figure 1

Depiction of distributed system paradigm evolution

Fig. 2
figure 2

Time gap between paradigm conception and creation

4.2 Architecture pivoting from centralization to decentralization

The creation of a new technology appears to drive the next distributed system paradigm, and respectively alter its respective degree of centralization as shown in Fig. 1. The creation of a new paradigm results in researchers revisiting fundamental mechanisms (schedulers, fault tolerance, monitoring) to ensure that they are capable of effectively operating within the new set of system assumptions. This is exemplified when considering responsibilities frequently carried out by scientists; a principle purpose of peer-review within the research community is ascertaining whether proposed approaches exhibit suitable differences from previous paradigms to determine their novelty (or whether it is a ‘reinvention of the wheel’). This is apparent when considering a number of papers created that attempt to clearly distinguish between paradigms that leverage shared technologies [33]. We observe that the majority of paradigms predominantly are decentralized in nature, with the exception of Cloud computing which follows many similarities with the centralized mainframe in terms of the coordination of computational resources within a datacenter facility which users access via web APIs.

4.3 Time between system conception and creation

The delay between the description of a potential paradigm and actual successful implementation in recent years appears to be shorter in contrast to previous decades as shown in Figure 2. It is worth noting that ascertaining the precise publication fully credited in accurately describing the full realization of a paradigm due to a single individual or group is not necessarily feasible. Thus, we have attempted to seek papers which first define the appropriate terminology and paradigm description that were later adopted. As shown Fig 2, the formative years of distributed systems between 1960 and 1996 saw an average delay of 13 years and after the adoption of the WWW saw an average 8.8-years delay. It is observable that most paradigm are conceived and created sometime within 3–10 years, with the exception between 1960 and 1990 which is likely due to insufficient technologies when first envisioned, Later paradigms again appear to be relatively short in duration to create, and is likely a by-product of increased maturity of the research area, combined with its pervasiveness within society and growth of research activity within each respective paradigm (i.e. there are a sizable proportion of distributed researchers whom focus on a particular paradigm).

5 Future of large-scale computing infrastructure

5.1 Accelerated paradigm specialization

It is observable that specific distributed system paradigms have a particular affinity for tackling different objectives; whilst Cloud computing is capable of handling generalized application workload, paradigms such as edge computing and fog computing have been envisioned to be particularly effective for sensor actuation and increasingly important latency requirements. A growing number of microprocessors are being designed to accelerate specific tasks (such as graphics and machine learning using GPUs and NPUs, respectively). In tandem, the end of Moore’s law indicates that by 2025 chip density will reach a scale where heat dissipation and quantum uncertainty make transistors unreliable [58]. When combining all of these factors together, it is apparent computing systems are in the process of undergoing massive diversification. This diversification is not solely limited to hardware but can also be observed in software.

For example, the last decade has seen resource management undergo a transition from centralized monolithic scheduling to decentralized model architecture [9, 16, 53]. Centralized schedulers maintain a global view of cluster state and are therefore able to make high quality placement decisions at the cost of latency [5, 6, 59, 60]. However, decentralized schedulers maintain only partial state about the cluster, and so they are able to make low latency decision at the cost placement quality [61]. As a result, we envision that further diversification and fragmentation of the distributed paradigm will continue to accelerate and affect all of its respective elements. For example, it is not hard to envision that the system that enables an infrastructure autonomous vehicle operation being substantially different to that of remote sensor networks and smart phones; we are already seeing such diversification with making custom OSs and applications for these scenarios. In the case of cluster resource management there have been an increased research activity in hybrid schedulers, capable of multiplexing centralized and decentralized architectures [10, 19], and we expect that future distributed systems must be capable of architectural adaptivity in response to changes to operation.

5.2 Generalization against specialization

Related to paradigm specialization discussed in Section 5.1, the distributed systems research area appears to be at a particularly interesting cross-roads; ensure that system paradigms are designed to be generalizable to handle a wide variety of operational conditions and scenarios (at the cost of performance and efficiency), or alternatively focus on creating more specialized and bespoke distributed system more suitable to a particular task at the expense of generalization and portability. While the wide-spread adoption of the x86 architecture, middleware, and virtualization have reinforced that historically the community has championed generalizable and portability, continued diversification of paradigms and technological limitations have begun discussion whether the axis is pivoting in the other direction [62]. This is further reinforced by increasing customization of microprocessors, OSs, and power management techniques for particular use case scenarios. For example, the increased uptake of deep learning has resulted in further increase into research into GPU and NPUs inside and outside of the datacentre, as well as creation of cluster resource schedulers specifically for deep learning [63].

5.3 Complexity at scale and the role of academic research

An area of potential future research challenge moving forward is how to understand these future distributed systems at scale. For many years Computer scientists have leveraged well-structured system abstractions in order to reduce the complexity to understand component interactions and assumptions. However increasingly there have been difficulties in handling unseen emergent behaviour within massive-scale distributed systems [64] that require rethinking well-established assumptions for system mechanisms [65]. Moreover, with the rapid uptake of new technologies such as deep learning and reinforcement learning to conduct decision making of system operation [66], whilst introduction of temporal applications and mobile compute will likely lead to increased complexity of distributed system operation at scale. In relation to the academic research community, where there is a substantive reliance on simulation or small to medium-scale distributed systems, it will continue to become increasingly difficult to evaluate effectiveness of their approaches when exposed to emergent behaviour within systems at scale. Whilst production systems from industry can greatly support understanding of distributed systems at scale, it does not provide an avenue to conduct experiments within a controller environment to test hypothesis effectively.

5.4 The green agenda

Growing end-use demand for applications and subsequent data generation in the regions of Exabyte will usher in the first system at Exascale by 2020, and eventually Zettascale by 2035 [67]. Whilst an achievement in itself, it also brings a variety of associated challenges. One challenge which is particularly problematic is the enormous power requirements that will be necessary for operation. ICT presently consumes more than 10% of the global electricity annually [68]. The creation of ever larger systems through efficiency improvements is in fact detrimental due to the Rebound Effect [69] that causes even greater demand and consumption. At a time where climate change and a 1.5°C increase in global temperatures by 2100 due to Greenhouse Gas Emissions [70], we foresee energy and GHG emissions being increasingly important for future distributed system paradigms. This is not solely increasing energy-efficiency as we see today, but more fundamental concerns related to systems assuming operate constant stable power sources, integration with renewable energy sources, and alternative methods for reducing energy consumption but also computation itself. An area of particular interest is that of holistic coordination of energy management (asynchronous computing, voltage scaling, Wake-on-LAN, cooling, etc.) [15] towards studying and treating systems as living eco-systems, as opposed to individual components in isolation.

5.5 Shifting from centralised systems to decentralised edge

The evolution of centralised systems towards decentralised system transformed many industries and organisations which have resulted in significant contributions towards economic growth worldwide [71]. With the emergence of Big Data, centralised cloud systems have played an important role to process both structured and unstructured data in an efficient manner [72]. With the rapid adoption of IoT technology, these systems are able to process large amount data using various machine learning algorithms. It is difficult to process real-time jobs on centralised cloud systems due to increases in latency and response time, and incurs various complexities: New distributed applications (cryptocurrencies, the machine economy etc.) require computing models which are not compatible with existing centralised cloud systems [53]. As the adoption of edge computing is increasing, decentralised edge systems have been positioned to be particularly effective at processing user workloads immediately on powerful edge devices without the reliance upon large cloud datacenters [73], thus reducing round trip communication times at the cost of reduced computational performance. The evolution from centralised cloud systems to decentralised edge is growing among various industries while executing IoT based decentralised applications [74]. It is likely that given sufficient time and technological innovations, this pendulum will swing in the opposite direction, whereby computationally powerful decentralised systems will in turn form centralised architectures (and possibly an assortment of centralised systems coordinated via federation).

5.6 Distributed green computing

Rapid growth in large scale distributed application servicing paradigms ranging from Big Data and Machine Learning to the Internet of Things, are increasingly responsible for the world’s energy consumption and as such a major contributor to environmental pollution [75]. One such example includes distributed Machine Learning systems [76]—comprising of clusters of GPUs dedicated to Deep Learning applications; require effective energy management aware scheduling policies [77]. As such new orchestration mechanism capable of capturing GPU, CPU, and memory energy characteristics [78] informing new scheduling algorithms prioritising energy consumption in contrast with traditional performance and fairness scheduling objectives [13, 15, 79]. Such scheduler should holistically consider energy consumption and account for out of band costs including impact of workload consolidation on cooling systems [15, 79]. Furthermore, exergy and energy source can be utilised to further inform datacentre operators about the carbon impact of their infrastructure. Whilst, hybrid energy grids utilizing green intermittent decentralised energy sources including solar and wind can provide clean energy whilst brown energy source can be utilized at peak time, minimized reliance of fossil fuels energy sources, and achieve new sustainable computing standards [80].

6 Conclusions

In this paper, we have discussed and evaluated the evolution of the distributed paradigm over the past six decades by focussing on the development and decentralised pivoting of networked computing systems. We have identified core elements of distributed systems by describing their physical infrastructure, logical entities and communication models. We examine how cross cutting factors such conceptual and physical models influence centralisation and decentralisation across various paradigms. We observe long term trends in distributed systems research, by identifying influential links between system paradigms, and technological breakthroughs. Of particular interest, we have observed that distributed system paradigms have undergone a long history of decentralisation up until the inception of the World Wide Web. In the following years, pervasive computing paradigms—such as the Internet of Things—brought about by advancements and specialisation of microprocessor architecture, operating systems designs, and networking infrastructure further diversified both infrastructure and conceptual systems. Furthermore, it is apparent that the diversification of distributed systems paradigms that begun at conception of the World Wide Web is likely to further accelerate due to increased emphasis on decentralisation and prioritization of specialized hardware and software for particular problems within domains such as machine learning and robotics. This is somewhat removed from the past few decades which has emphasized generality and portability of distributed system operation and as such will be the focus of research efforts over the coming years. Moreover, there are potentially difficult challenges on the horizon related to the upfront cost of operating large systems testbeds out of reach for most academic laboratories, and the impact of climate change and how it shapes future system design.