1 Introduction and Motivation

The GENI [13] Mesoscale deployment was a first-of-its-kind infrastructure: small clouds (known collectively as "GENI Racks") spread across the United States, interconnected over a private, programmable layer-2 network, with OpenFlow [49, 68] networking native to each cloud. The GENI Mesoscale combined the essential elements of the two principal precursors to GENI [27, 34]: the wide-area distribution and scale of PlanetLab [1, 10, 23, 59] and Emulab's [40, 72] ability to do controlled, repeatable experimentation, and it added the entirely novel feature of programmable networking through the OpenFlow protocol. The Mesoscale was to offer the ability to allocate customizable virtual machines, containers, and physical machines at any of 50 or more sites across the United States, interconnected by deeply-programmable networking over a private layer-2 substrate, with programmable networking on each rack.

In this, the Mesoscale envisioned three features not available in any commercial or operational academic cloud: geographic distribution, highly-customizable computing elements, and deeply-programmable networking [2]. The primary challenge is that each node in the network is hosted by a separate individual donor institution, which offers bandwidth and maintenance services as a donation to the community. Under these circumstances, the node and each experiment running on it must behave as a polite guest; it should not unduly consume resources, it should require hands-on maintenance very infrequently, and it should not do harmful things to the institution, or to third parties who will interpret the damage as emanating from the institution. This means every node in the network must have two distinct administrators: a central authority representing GENI, and the host institution. It further means that the node must be heavily instrumented, and either the central authority or the local authority must be able to shut it down immediately in the event of abuse or excessive resource consumption.

All of this is well known from PlanetLab [60]; GENI merely deepens the requirement, since it introduces programmable networks and heterogeneous computing environments (VMs, containers, and bare-metal machines running a variety of operating systems; PlanetLab permitted only Linux containers under VServers [66], which simplified administration significantly).

The geographic distribution and need to reduce the burden of on-site maintenance implied a requirement for highly-autonomous operation, intensive in situ measurement, and an ability to shut down slices rapidly, automatically, and selectively.

Since no facility prior to the Mesoscale had combined these features, we had no way of knowing precisely what experiments and services would be run on the Mesoscale. We drew on both the PlanetLab and Emulab experiences, but since both omitted some essential features present in the Mesoscale, it was certain that entirely new experiments would be run on this facility. As a result, flexibility became a crucial design criterion: we needed to be able to customize the testbed rapidly to meet users' needs.

In InstaGENI, this implied a decision to go broad and small rather than narrow and heavy: given a choice between a lighter-weight rack that could be deployed to more sites and a heavier-weight rack with more features, we opted for the former once we'd achieved a critical mass of functionality within the rack. There were multiple rationales for this decision. First, it is far easier to add resources to an existing site than it is to bring up a new one, since the latter task involves identifying and training administrative personnel at the site, arranging network connections, acceptance and site tests, physical siting, installation, plumbing, and so on; adding resources to an existing site is usually "just" a matter of buying computers and line cards and plugging them in. Second, we didn't know the right size for a site until we had some experience. Experiments which required more resources than were present at a particular site could always simply add more sites to the experiment. The only real penalty for spreading an experiment across sites rather than adding resources at a single site was latency; the penalty for missing a site entirely, by contrast, was a gap in GENI's geographic coverage, which is problematic for applications and services that require wide distribution. Further, GENI already had sites with many concentrated resources, notably the various Emulab-based testbeds spread around the country; experiments requiring concentrated resources could go there. Third, we wanted the community and the sites to be able to expand the racks in ways that we couldn't anticipate. Leaving space in the racks and letting people install devices with interesting properties made for a variegated and rich testbed. For example, sites have installed electrical grid monitoring devices in GENI racks, offering virtual power grid labs on GENI [21]. Room for expansion offered the community the ability to offer such services very easily.

The next design decision dictated by these requirements was to use a proven software base (ProtoGENI [62]) rather than installing a new, experimental one. Building a distributed, flexible infrastructure is a large challenge; building one on a new software base would be extremely challenging, both for developers and users. Moreover, the fidelity and repeatability of experiments required that the infrastructure on which they ran be stable and, to the extent that we could make it so, artifact-free. Even subtle bugs and peculiarities in the infrastructure can lead to misleading experimental results. The ProtoGENI software stack is descended from Emulab, which had operated its eponymous Utah cluster 24/7 for a decade; we knew it was stable. Finally, GENI and the Mesoscale initiative represented a large bet for the systems community and the National Science Foundation. We needed to make the GENI Mesoscale a reliable experimental facility, suitable for constant use by a large community, within the lifetime of the GENI project. Spending a year debugging the infrastructure was contraindicated.

While we could not anticipate all the research that would be done on the Mesoscale, research and experiments in cloud management systems were certain to be a dominant theme. This is a topic of great interest in both the systems community and industry, and an area of extremely active development. Moreover, while it's relatively straightforward to do applications research on operational clouds, even at large scale, it's difficult to do cloud management research on them. For this reason, a key requirement was to ensure that InstaGENI would be a platform for research into the management and monitoring of both centralized and distributed clouds. Experimenters needed to be able to build and manipulate their own cloud platforms within the InstaGENI architecture. Thus, InstaGENI had to be a meta-cloud, the first of its kind: a cloud that permitted the nesting of other clouds within it. Indeed, it was anticipated from the first design of InstaGENI that at least one cloud would be instantiated inside the underlying ProtoGENI base code: the GENI PlanetLab infrastructure would be nested within the rack. This would serve as the prototype for our clouds-within-clouds strategy, simultaneously satisfying the need for an infrastructure that would seamlessly support long-running distributed experiments and services using lightweight, end-system resources. The architecture that we designed together with the GENI PlanetLab team remained unchanged as GENI PlanetLab evolved into the GENI Experiment Engine [11]. Indeed, the switch from GENI PlanetLab to the GENI Experiment Engine was entirely transparent to InstaGENI: the GEE team used the standard InstaGENI tools unchanged, just as any other experimenter would.

This combination of a distributed-cloud and meta-cloud strategy (small clouds everywhere, with embedded clouds-within-clouds) is also used by projects under the National Science Foundation's Future Cloud program, notably CloudLab [63]. InstaGENI served as a validation point and proof-of-concept for this design.

A final requirement was nothing exotic: components can and do break, and must be easily replaced. Further, interest in InstaGENI racks is evident around the world, from Japan, Korea, Germany, Taiwan, and Brazil: hence the racks must be easily delivered anywhere. Moreover, though our original design was based entirely on HP equipment, we wanted to ensure that both expansion in the current racks and designs for future racks could incorporate equipment from other manufacturers as circumstances warranted. These considerations led to a COTS design philosophy: we would use only commodity components and build as little hardware dependence into our design as possible.

Thus, InstaGENI: a network of 35 small clouds spread across the United States. The design requirements discussed above dictate the global architecture of InstaGENI: each cloud is small, expandable, and built from commodity components, with a high degree of remote management and monitoring built in. The network itself is designed to withstand partition: each rack is capable of acting autonomously. OpenFlow is native to the racks; the racks' control planes are interconnected across both the routable Internet and the private GENI network, and their data planes are interconnected at layer 2. Then-GENI Project Director Chip Elliott's vision was that the GENIRacks would be the successor to the router in the new network architecture implied by GENI; we believe that the InstaGENI design is a good start on that.

The remainder of this chapter is organized as follows: in Sect. 14.2 we consider the role that GENIRacks, and specifically InstaGENI, play in the Mesoscale and in the architecture of the future Internet. In Sect. 14.3 we describe the architecture of InstaGENI. In Sect. 14.4 we describe the InstaGENI network. In Sect. 14.5 we describe the hardware and software implementation of InstaGENI. In Sect. 14.6 we describe deployment considerations and concerns. In Sect. 14.7 we describe operating and maintaining an InstaGENI rack. In Sect. 14.8 we describe the current status of the InstaGENI deployment. In Sect. 14.9 we describe related work, particularly our cousin project ExoGENI [3], and in Sect. 14.10 we conclude and offer thoughts on further work.

2 InstaGENI’s Place in the Universe

Testbeds and experiments are all very well; however, the implications of InstaGENI’s design are much broader than experimental facilities for systems computer scientists. Though this was and remains GENI’s primary mission, it was always far more than that: put simply, GENI is a prototype of the next Internet—and the GENIRacks were always envisioned as the “software routers” of that next generation of the Internet. This is a sufficiently ambitious goal, and a sufficiently deep topic, to warrant some discussion here.

We should start with the obvious: why do we need a new Internet at all? The fundamental answer is that both the underlying technology of the Internet and the use cases which informed its design point have changed radically in the generation since its architecture was finalized. In the founding era of the Internet, memory and computation were expensive relative to data transmission, and the fundamental use case was bulk, asynchronous data transfer. Today, computation and memory are cheap relative to networking, and the bulk of Internet traffic is in latency-sensitive, high-bandwidth applications: video, real-time interactive simulation, high-bandwidth interactive communication, and the like.

Even the fundamental use case, bulk data transfer, has been significantly affected by the change in underlying technology. When computation and memory were expensive, moving data to computation—no matter how slow or painful—was a necessity. Now, however, cheap computation and memory are ubiquitous: it is feasible to move computation to data. And when it is feasible, it is almost always attractive. Programs are generally small relative to the data they process, and many programs reduce data. Some simple examples demonstrate the point. The CASA Nowcasting experiment [46] looks for significant severe weather events; local processing, sited at the weather radar, can find events of interest and propagate them to a cluster which can do detailed processing. Doing the reduction locally, at the weather radar, saves enormously on bandwidth and focuses the network on those events of interest.

This is a simple example, but many more in the same vein can be described; and as the Internet of Things becomes dominant, many more examples of this sort will emerge. The CASA radar is merely one example of a very large class of device: the high-bandwidth, high-capacity sensor. Choose virtually any Internet of Things use case that involves such sensors, from driverless cars to real-time crime detection. A straightforward, back-of-the-envelope calculation will demonstrate that the take from the various sensors will overwhelm the network; the IoT will require not just higher-capacity networks, but an entirely new architecture, with pervasive local computing.

Other examples include latency-sensitive computation. Real-time Interactive Simulation (RTIS) has long been a staple of computing entertainment and technical training; it is also now being used more generally in Science, Technology, Engineering, and Mathematics (STEM) education, educational assessment, and maintenance applications. "Gamification" is largely the deployment of RTIS for non-gaming applications. This has been spurred by the sophistication of the HTML5 platform, which has meant that the browser can now support significant, intensive 3D interactive applications.

Use of the browser as a rendering platform is preferred for a variety of reasons: ubiquity of access implies that demand on the client be minimized, and use of a standard browser platform is the best that we can do to minimize demands on the client. Further, for many use cases (educational assessment, for example) one wants to protect the application from client interference: the student shouldn’t be able to cheat on the test. These requirements imply the need for cloud-based hosting of at least some RTIS applications: in general, as much as one can get away with.

However, the Achilles’ Heel of cloud-based RTIS is latency; in general, the computing engine should be no more than a 50 ms round-trip from the user. Any latency more than that invites significant artifacts from a user’s perspective: jitter, jumpy displays, out-of-order event sequencing, and so on. The combination of the need for cloud-based hosting of the service with the application requirement of low latency to the end-user points at the need for a pervasive cloud [20].

An excellent example of a low-latency, high-bandwidth application delivered to the user through a thin-client web browser is the Ignite Distributed Collaborative Scientific Visualization System [15, 16], described in another chapter in this book [39]. The combination of a large data set (9 GB), a high required data rate between visualization client and data server (100 Mb/s–1 Gb/s), and a low required round-trip time (<20 ms) required the use of a distributed, pervasive cloud. It is a prime example of the kinds of applications that require the InstaGENI distributed cloud.

This pervasive cloud, driven by the twin needs for in-situ data reduction and low latency between application host and application consumer, is the next generation of the network. The fundamental architecture of the current Internet is centered around moving data between fixed computation sites; the architecture of the next generation may well be centered around sending programs to be executed near data sources or users. In the argot of networking, provision of in-network layer 7 services will be the dominant use case for the network in the coming decades.

Provision of layer 4–7 services in the network is nothing new, of course: this has been the province of middleboxes and proxies almost since the inception of the Internet. What is different now is degree rather than kind: rather than being an ad-hoc appendage to the Internet, the pervasive cloud will make proxies and middleboxes the central component of the emerging new Internet architecture. In this architecture, universal, programmable middleboxes will play the role that routers played in the first generation of the Internet architecture. Fundamentally, a GENIRack is a platform for the deployment of universal, highly-programmable middleboxes; in other words, the prototype of this new central component of the emerging Internet.

3 Architecture of InstaGENI

Above all, InstaGENI is designed to meet the primary goals of the GENI project, which are directed at creating a highly customizable environment for innovative research, without restrictions and pre-conditions and with complete direct control over all resource elements. Consequently, InstaGENI is a deployment platform for GENI control frameworks, which enable researchers to discover, integrate, and experiment with GENI resources. Fundamentally, GENI is a platform for the deployment of virtual networks interconnecting virtual computational resources. “Virtual” here is used in the classic sense: details of the physical attributes and specific implementations are, to the extent possible and appropriate for the use of the resource, hidden from the programmer; further, the programmer is given the abstraction of an isolated network of isolated resources, all of which have guaranteed properties. The computational resources are generally but not always virtual machines or their close cousins, OS containers. On occasion the virtual resources are physical machines, radio nodes, specialized instruments, etc. [14, 33, 57, 67, 69]. GENI has been designed to enable novel edge components to be integrated into the experimental environment. In sum, InstaGENI is a distributed systems cluster with sliceability at the end-host, distributed systems, and network level.

3.1 The InstaGENI Software Architecture

The InstaGENI software architecture is designed to provide deeply-configurable, deeply-programmable Infrastructure-as-a-Service and customizable OpenFlow networks as a service. A critical design consideration was user familiarity: an InstaGENI rack is essentially a small Emulab, with an embedded OpenFlow switch to permit the construction of virtual networks. Further, a collection of InstaGENI racks should behave as a distributed Emulab. While new capabilities and functions are provided within the InstaGENI rack (GENI Experiment Engine nodes hosted on InstaGENI racks, more virtualization options, a Network Aggregate Manager, and the ability to run long-term slices within networks of virtual machines), it was critical that a user unfamiliar with InstaGENI be able to use it just as he was used to using Emulab. Further, each rack must be independently manageable.

Management and control functions for nodes in InstaGENI racks are primarily provided by the ProtoGENI software stack. Each rack has its own installation of the control software, and is capable of operating as an independent unit.

Fig. 14.1 The InstaGENI software architecture

The software architecture of InstaGENI is shown in Fig. 14.1. The important thing to take away from this diagram is nested and distributed control. The key element is the ProtoGENI Base Manager (or ProtoGENI Controller, or Boss Node) on the rack, which plays essentially the same role for an element of a distributed cloud that a node manager does in a cloud: it orchestrates resources on the individual rack. Nested controllers, whether they be entities such as the central GENI Portal or other controllers such as the GENI Experiment Engine, use the ProtoGENI Base Manager as an agent to manipulate resources on the individual rack: allocating and freeing VMs and bare-metal nodes, loading images, and so on.

The Control Node in each rack runs Xen. This allows multiple pieces of control software to run side-by-side in different virtual machines, with potentially different operating systems and administrative control. This configuration also eases the deployment, backup, and update of the control software. At installation, there are four such virtual machines:

  1. An Emulab/ProtoGENI boss node: this is a database, web, and GENI API server, and it also manages boot services for the nodes.

  2. A local fileserver, which also gives users shells so that they can manage and manipulate the data on the fileserver even if they do not currently have a sliver. This VM can also act as a gateway for remote logins to sites that do not have sufficient Internet Protocol (IP) addresses to give every experiment node a publicly routable address.

  3. An OpenFlow Aggregate Manager (FOAM) node to control the OpenFlow resources on the in-rack switch.

  4. A FlowVisor controller to provide support for control-plane multi-tenancy on the OpenFlow network.

Node Control and Imaging

The experiment nodes in the InstaGENI rack are managed by the normal ProtoGENI/Emulab software stack, which provides boot services, account creation, experimental management, etc. Users have full control over physical hosts, including loading new operating system images and making changes to the kernel, in particular, to the network stack. The ProtoGENI/Emulab software uses a combination of network booting, locked down BIOS, and power cycling to ensure that nodes can be returned to the control of the facility and to a clean state, meaning that accidental or intentional changes that render a node’s operating system unbootable or cut off from the network can be corrected. Nodes are scrubbed between uses; after a sliver is terminated, the node is re-loaded with a clean image for the next user.

Images for OSes popular with network researchers, including at least two Linux distributions and FreeBSD, are provided. Users may customize these images and make their own snapshots. Installation of other operating systems is possible, but involves significant expertise on the part of the experimenter and manual intervention on the part of the rack administrators. Users making images in this fashion are strongly encouraged to do so on the InstaGENI installation at the University of Utah, where the most assistance is available. One use of this capability is to boot nodes into images that support other control frameworks: e.g., to create slivers that act as GEE nodes or OpenFlow controllers.

In addition to raw hardware nodes, ProtoGENI also provides the ability to create multiple virtual machines (VMs) on the experimental nodes. ProtoGENI supports this in two forms; in the first, an experimenter can allocate a dedicated physical machine, and then slice that into any number of virtual containers. All of the containers are part of the one slice that is being run by the user. In the second form, one or more of the physical nodes are placed into shared mode, which allows multiple users to allocate containers alongside other experimenters. Typically, nodes running in shared mode exhibit better utilization. Physical nodes may be dynamically moved in and out of the shared pool at any time. InstaGENI racks typically allocate three nodes per rack as shared hosts; more nodes may be moved into this pool as required.

The slicing technology used for ProtoGENI virtual hosts is the Xen [6] virtual machine monitor. Earlier in its history, InstaGENI used OpenVZ, a Linux container technology for slicing shared hosts. OpenVZ has the advantage of being very lightweight and booting quickly [31], but we found that it was too restrictive for the types of experiments that GENI users wanted: many wanted the ability to run different Linux kernels, to move images back and forth between physical and virtual hosts, etc. Using a single kernel, as is done in OpenVZ, also proved to be less stable when exposed to the types of workloads offered by systems and network researchers.
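
For concreteness, the short sketch below shows one way an experimenter's tooling might construct a GENI version-3 request RSpec asking an InstaGENI aggregate for one dedicated physical machine and one Xen VM placed on a shared host. It builds the XML directly with Python's standard library; the sliver-type names ("raw-pc", "emulab-xen") and the disk-image URN are illustrative assumptions and would in practice be taken from the aggregate's advertisement or generated by a tool such as Jacks.

```python
# Minimal sketch: build a GENI v3 request RSpec asking an InstaGENI aggregate
# for one bare-metal node and one Xen VM on a shared host.  The sliver-type
# names and the image URN below are illustrative assumptions only.
import xml.etree.ElementTree as ET

RSPEC_NS = "http://www.geni.net/resources/rspec/3"
ET.register_namespace("", RSPEC_NS)

def add_node(rspec, client_id, sliver_type, exclusive, image_urn=None):
    node = ET.SubElement(rspec, f"{{{RSPEC_NS}}}node",
                         client_id=client_id, exclusive=str(exclusive).lower())
    st = ET.SubElement(node, f"{{{RSPEC_NS}}}sliver_type", name=sliver_type)
    if image_urn:
        ET.SubElement(st, f"{{{RSPEC_NS}}}disk_image", name=image_urn)
    return node

rspec = ET.Element(f"{{{RSPEC_NS}}}rspec", type="request")

# A dedicated physical machine loaded with a stock Ubuntu image (URN assumed).
add_node(rspec, "pc1", "raw-pc", exclusive=True,
         image_urn="urn:publicid:IDN+emulab.net+image+emulab-ops//UBUNTU14-64-STD")

# A Xen VM that the aggregate may place on one of the rack's shared hosts.
add_node(rspec, "vm1", "emulab-xen", exclusive=False)

ET.ElementTree(rspec).write("request.xml", xml_declaration=True, encoding="utf-8")
```

The resulting request.xml would then be submitted to the rack's aggregate manager with a tool such as Omni or the GENI Portal; the allocation, imaging, and scrubbing machinery described above does the rest.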

Administration, Clearinghouse, and Local Control

The InstaGENI racks are registered as aggregate managers with the GENI clearinghouse, which provides for registration and resolution of metadata associated with users, slices, and component managers. The clearinghouse also serves as a "root of trust," exchanging root cryptographic materials (such as CA certificates) between all parties, so that they do not have to do so pairwise. This means that these entities are visible to, and usable from, existing tools that support the GENI APIs and clearinghouse; these tools include the GENI portal, ProtoGENI command line tools, Jacks (a graphical experiment design tool) [26], GENI Desktop [19], and Omni (a command line tool for reserving resources across control frameworks) [36]. Details on the GENI Desktop and the GENI Architecture are given in other chapters in this volume [17, 38]. Local administrators are given several policy knobs, which allow the administrator to make the following simple policy decisions:

  • Allow all GENI users access to the rack

  • Allow GENI users to access the rack, but limit how many nodes each user may allocate at a time

  • Block all external users (i.e., those who do not have accounts registered on the particular rack) from using the rack

  • Issue credentials to specific users that allow them to bypass the policies above

Other policies of these types (e.g. user and resource restrictions) can be added as required by sites.

Each rack is given its own Certification Authority (CA) certificate; to establish trust with the rest of the GENI and ProtoGENI federations, a bundle of these certificates is available from the ProtoGENI clearinghouse. ProtoGENI federates fetch this bundle nightly, so all current members of the ProtoGENI federation will, by default, accept the InstaGENI racks as members of the federation. An InstaGENI rack can participate in any number of federations by registering at more than one clearinghouse and adding CAs from other federates to its local set. This feature has been used to prototype federations that cross international boundaries [12, 18].

Nested Control Frameworks and PlanetLab/GENI Experiment Engine Integration

The ability to nest control frameworks was a major design goal of InstaGENI. There are two major drivers for this design goal: first, to enable researchers in cloud technologies to bring up their own clouds within InstaGENI; and, second, to offer customized, simplified clouds for specific purposes, utilizing the mechanisms of the underlying meta-cloud for various services (network configuration, image load, and so on). PlanetLab was always designed as our prototype nested cloud. It, and its successor, the GENI Experiment Engine, are described more fully in another chapter [9] in this volume. Here, we cover some simple basics.

InstaGENI provides a GEE node image. Fundamentally, this is simply an Ubuntu 14.04 LTS image with the Docker container management system installed. GEE nodes use a container-based virtualization technology that provides an isolated Linux environment, rather than a standard VM, to a sliver. Containers can offer better efficiency than VMs, particularly for I/O, because a hypervisor typically introduces an extra layer in the software stack relative to a container-based OS. In the PlanetLab model, all slivers on a physical host run on an underlying shared kernel that slices cannot change. However, it is possible to base the Linux environment offered to slivers on different Linux distributions.

The GEE uses Linux Containers (LXC) [48] running under Docker [25] as the core virtualization technology. LXC extends end-host networking with integrated network namespaces. Network namespaces provide each Linux container with its own view of the network. Within each container it is possible to customize many aspects of the network stack, including virtual device information such as IP and MAC address, IP forwarding rules, packet filtering rules, traffic shaping, Transmission Control Protocol (TCP) parameters, etc.
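
The commands below sketch, at the Linux level, the kind of isolation a per-sliver network namespace provides; they are illustrative only (the interface names and addresses are invented, and root privileges are required) and are not the GEE's actual provisioning code.

```python
# Illustrative sketch (not GEE code): create an isolated network namespace of
# the kind each Linux container receives, wire it to the host with a veth
# pair, and give it its own address and routes.  Run as root; names and
# addresses are made up for the example.
import subprocess

def sh(cmd):
    print("+", cmd)
    subprocess.run(cmd.split(), check=True)

sh("ip netns add sliver1")                        # per-sliver namespace
sh("ip link add veth-host type veth peer name veth-sliver")
sh("ip link set veth-sliver netns sliver1")       # move one end into the namespace
sh("ip addr add 10.10.0.1/24 dev veth-host")
sh("ip link set veth-host up")
sh("ip netns exec sliver1 ip addr add 10.10.0.2/24 dev veth-sliver")
sh("ip netns exec sliver1 ip link set veth-sliver up")
sh("ip netns exec sliver1 ip link set lo up")
# Inside the namespace, the sliver sees only its own devices, routes, and
# filtering rules; the host and other slivers are unaffected.
sh("ip netns exec sliver1 ip route add default via 10.10.0.1")
```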

GEE nodes are managed through the GENI Experiment Engine portal and head node at http://www.gee-project.org. A full description of the GEE and its administration may be found in the GEE Chapter in this volume.

The essential elements of the GEE form the recipe for future nested clouds: form a base image which is deployed by the InstaGENI underlying service; choose which nodes to allocate in each rack to the nested cloud; use the InstaGENI service to allocate them and deploy the boss images; and write a separate, standalone controller to allocate slices on the nodes. We believe that this recipe can be followed for a large number of future nested Clouds on the InstaGENI infrastructure. Notably, we believe that it would be not only possible but easy to instantiate a distributed OpenStack-administered Cloud on the InstaGENI racks. Good examples of such OpenStack-administered distributed Clouds are the Canadian SAVI Network [5, 43, 44, 47] and the OpenCloud/XOS [61] from Stanford’s ONLab, so this feature offers a potential area of expansion for both these infrastructures.

4 The InstaGENI Network

InstaGENI features two networks: a control network over the routable Internet and a private layer-2 data-plane network provided over the GENI Mesoscale [24], transitioning to Internet2's Advanced Layer 2 Service (AL2S). Experimenters have access to the raw network interfaces on nodes allocated to their slices.

Fig. 14.2 The InstaGENI rack network

A diagram of the rack network is shown in Fig. 14.2. Control-plane connections are through a dedicated, relatively low-bandwidth conventional switch. This handles boss/worker control communications, Integrated Lights-Out (iLO) connections, and external control connections. The external control interface is over the routable Internet and is a single 1 Gb/s connection.

Data-plane switching is over an OpenFlow switch. Each worker node has three 1 Gb/s connections to the data-plane switch. There must be at least one 1 Gb/s connection to the GENI Mesoscale network; with the minimum of five worker nodes in the rack (15 data-plane ports) and a single 20 × 1 Gb/s linecard, the switch can support up to five external data-plane connections. The additional connections can be to the GENI layer-2 network, to the routable Internet, or to another network. The switch can also be configured with optical connections at 10 Gb/s or above.

Virtual Local Area Networks (VLANs) are created on the rack’s switch to instantiate links requested in users’ Resource Specifications (RSpecs), and to provide isolation for each experiment’s traffic. A small number of the available 4096 VLAN numbers are reserved for control purposes, leaving most available to experiments. Using 802.1q tagging, each physical interface has the ability, if requested, to act as many virtual interfaces, making use of many VLANs. With the exception of stitching, user traffic within racks is segregated by VLAN. The InstaGENI rack’s switch is capable of providing full line-rate service to all ports simultaneously, avoiding artifacts due to interference between experiments.

OpenFlow is separately enabled or disabled for individual VLANs. VLANs requested by users default to having OpenFlow disabled. Users are able to request OpenFlow for particular VLANs; in this case, the OpenFlow controller for the VLAN is pointed to the address supplied by the user. Some shared OpenFlow VLANs are available (such as those with access to other shared resources such as Wide-Area Network (WAN) connectivity), and requests for slices of those VLANs are regulated by FOAM [7] and sliced via FlowVisor [65]. A single switch is shared for experiment traffic and control traffic, so experimenters are able to enable OpenFlow only on the VLANs that are part of their slices; OpenFlow is not enabled on VLANs used for control traffic or connections to campus or wide-area networks.
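
When OpenFlow is enabled on a VLAN, the switch (or, for shared VLANs, FlowVisor) forwards that VLAN's packet-in events to the controller address the experimenter supplied. The skeleton below, written against the Ryu framework as one plausible controller choice (InstaGENI does not mandate any particular controller), shows the minimal shape of such a controller: it floods every unmatched packet, which is where an experimenter's own forwarding logic would go.

```python
# Minimal OpenFlow 1.0 controller skeleton using the Ryu framework (one
# possible controller choice; InstaGENI does not mandate one).  It floods
# every packet it is asked about -- the place where an experimenter would
# insert real forwarding logic for the VLAN(s) in their slice.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_0


class FloodingController(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_0.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        msg = ev.msg
        datapath = msg.datapath
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        # Flood the packet out of every port except the one it arrived on.
        actions = [parser.OFPActionOutput(ofproto.OFPP_FLOOD)]
        data = None
        if msg.buffer_id == ofproto.OFP_NO_BUFFER:
            data = msg.data
        out = parser.OFPPacketOut(datapath=datapath, buffer_id=msg.buffer_id,
                                  in_port=msg.in_port, actions=actions,
                                  data=data)
        datapath.send_msg(out)
```

Run under ryu-manager, a controller like this listens on the standard OpenFlow port (6633 by default); that host and port are what the experimenter supplies when requesting OpenFlow for a VLAN.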

Network ports that are not currently in use for slices or control purposes are disabled in order to reduce the possibility for traffic to inadvertently enter or exit the network.

ProtoGENI virtual containers also permit the experimental network interfaces to be virtualized so that links and LANs may be formed with other containers or physical nodes in the local rack. This technique is accomplished via the use of tagged VLANs and virtual network interfaces inside the containers. Note that ProtoGENI does not permit a particular physical interface to be oversubscribed; users must specify how much bandwidth they intend to use; once all of the bandwidth is allocated, that physical interface is no longer available for new containers. Bandwidth limits are enforced through the use of traffic shaping rules in the outer host environment. In addition to VLANs between nodes, ProtoGENI also supports Generic Routing Encapsulation (GRE) tunnels [29] that can be used to form point-to-point links between nodes residing in different locations.
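
The sketch below expresses the mechanisms this paragraph describes as the underlying Linux commands: an 802.1q-tagged virtual interface, a GRE tunnel forming a point-to-point link to a node at another site, and a token-bucket shaping rule of the kind used to enforce a requested bandwidth. All interface names, addresses, VLAN tags, and rates are invented for illustration; the actual configuration is generated by the ProtoGENI stack, not by hand.

```python
# Illustrative only: the kind of Linux plumbing used for tagged VLAN links,
# GRE tunnels, and bandwidth shaping.  All interface names, addresses, tags,
# and rates here are invented for the example.  Run as root.
import subprocess

def sh(cmd):
    subprocess.run(cmd.split(), check=True)

# An 802.1q sub-interface: one physical port carrying VLAN 312 for a link.
sh("ip link add link eth1 name eth1.312 type vlan id 312")
sh("ip addr add 10.1.1.1/24 dev eth1.312")
sh("ip link set eth1.312 up")

# A GRE tunnel forming a point-to-point link to a node at another site.
sh("ip tunnel add gre1 mode gre local 192.0.2.10 remote 198.51.100.20 ttl 64")
sh("ip addr add 10.2.2.1/30 dev gre1")
sh("ip link set gre1 up")

# Token-bucket shaping approximating a 100 Mb/s bandwidth reservation.
sh("tc qdisc add dev eth1.312 root tbf rate 100mbit burst 32kbit latency 400ms")
```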

The initial nationwide InstaGENI network is shown in Fig. 14.3. The InstaGENI network architecture was driven by two principal considerations: the need to offer layer-2 services across the wide area and the need to permit deep programmability and end-to-end OpenFlow capability across the entire Mesoscale.

Fig. 14.3 The InstaGENI external network

The InstaGENI design required close consideration of three major classes of WAN connectivity. The first class consists of the core foundation infrastructure: the resources that support management planes, control planes, and data planes beyond the support provided by the local rack network, including support provided by the local site, campus, regional, national, and international networks. The second class consists of the actual management-plane, control-plane, and data-plane channels, which are supported by that core infrastructure.

A third class of connectivity consists of the networks that are created, managed and controlled by experimenters.

The InstaGENI design is based on the assumption that, in general, the WAN core foundation resources, spanning the local rack, site, campus, regional, national, and international networks, will be fairly similar and static, and that the rack-based interface to these capabilities will be fairly uniform. However, there are multiple options for the design and implementation of individual campus network resources, including those that enable resource segmentation, a critically important attribute for research experimentation, which requires reproducibility. Consequently, the basic connections to the InstaGENI racks are customized for individual sites. Considerations also vary depending on local ownership and operations procedures: for example, some university research groups and CS departments manage their own networks, while others rely on division-level or integrated campus-wide networks. In any case, the InstaGENI design is sufficiently flexible to accommodate all major potential options.

5 Implementation of InstaGENI

The InstaGENI hardware design was driven by three principal considerations. First, the goal was to support the software architecture described above; InstaGENI is fundamentally characterized by code, not boxes. Second, commodity off-the-shelf hardware was to be used, for reasons of maintenance and operations: when something broke, it had to be easy to fix or replace. Finally, a large collection of inexpensive racks was preferred to a smaller collection of more capable racks. It is relatively easy to add capacity (more worker nodes, more switch ports) to a modest rack, and somewhat more difficult to install a new rack. Therefore, our goal was to get a broad footprint of modest but usable racks early, and make them more capable later. This strategy turned out to have unexpected benefits: blank space in the racks made it possible for experimenters to install specialized hardware in individual racks, and to use GENI to make that hardware available to experimenters nationwide.

Fig. 14.4 Hardware diagram of the base InstaGENI rack

The base design of the InstaGENI rack is shown in Fig. 14.4. It consists of five experiment nodes, one control node, an OpenFlow switch for internal routing and data plane connectivity to the Mesoscale infrastructure and thence to the Internet, and a small control plane switch/router for control plane communication with the Internet.

Fig. 14.5 Software diagram of the base InstaGENI rack

Figure 14.5 shows how the software architecture shown in Fig. 14.1 maps onto the physical racks. The embedded rack controller and user storage are on the control node. Each worker node can use any ProtoGENI image, including (but not limited to) Xen VMs, the GENI Experiment Engine node image designed to host GEE slivers, or a physical-node image of the experimenter's choice. Data-plane connectivity is through GENI VLANs on the Internet2 Advanced Layer 2 Service (AL2S) or the Mesoscale; control connections to external embedded managers such as the GEE, to ProtoGENI Central, and to the GENI Meta-Operations Center (GMOC) for logging and monitoring are through the control connection.

The InstaGENI rack has been designed for expandability, while providing standalone functionality capable of running most ProtoGENI experiments or an exceptionally capable PlanetLab [10] site. As with all designs, the result is a compromise, yet one with much potential for revision and expansion.

The base computation node is the HP ProLiant DL360e Gen8, which is used for both experiment and control nodes. The control node features a six-core, 1.9 GHz processor. The experiment node has dual 2.10 GHz eight-core processors. InstaGENI therefore has 80 experiment cores per rack, plus six cores in the control node. Experiment nodes are configured for images and transient storage: hence disk (1 TB/node) is relatively light. Permanent user and image storage is on the control node, which features 4 TB of disk in a RAID array. Nodes in InstaGENI racks have local disk rather than a Storage-Area Network (SAN): this configuration enables isolation, when required, by allocating an entire physical node to a single slice, avoiding contention for disk or controller resources.

The experiment nodes and switch have been designed for highly flexible, rather than high-performance, networking. Each experiment node features four 1 Gb/s ports with a TCP/IP Offload Engine. The control node is configured with 12 GB of memory; the experiment node is specified at 48 GB of memory. The nodes may be extended by the use of two Peripheral Component Interconnect Express (PCIe) cards.

The primary network device shipped with the InstaGENI rack is the HP ProCurve (now E-Series) 5406 switch with v2 linecards. The 5406 offers rich OpenFlow matching capabilities.

The control connection for the wide area goes through the HP 2610-24 switch. The ProCurve 2610-24 provides 24-port 10/100Base-TX connectivity and includes two dual-personality (RJ-45 10/100/1000 or Small Form-factor Pluggable (SFP)) slots for Gigabit uplink connectivity. An optional redundant external power supply is also available to provide redundancy in the event of a power supply failure. The 2610 switch also carries the six iLO connections (one for the control node and one for each of the five experiment nodes).

Remote monitoring and management is an especially important InstaGENI consideration. Hence all nodes, experiment and control, ship with HP Integrated Lights-Out Advanced remote management, version 4 (iLO4). HP iLO is a separately-powered card in the server chassis with its own network connection; it can reboot and set up the server, perform power and thermal optimization, and provide embedded health monitoring.

iLO connections to both control and experiment nodes go through the small 2610 control switch, as does the control-plane connection from the external world into the boss node. The three ProtoGENI/FOAM control connections from the boss node are wired into the HP 5406 rack switch, as are the 15 experiment-node data connections (3 per node × 5 experiment nodes) and the 5 experiment-node control connections (1 per node × 5 nodes). Finally, the data-plane egress link to the wide area is hosted on the 5406 rack switch.

6 Deployment of InstaGENI

A critical design objective for the InstaGENI racks was that InstaGENI live up to its portmanteau: the InstaGENI racks had to be up and running out of the box, and instantly connected through appropriate communication services to GENI. This is the cornerstone of the InstaGENI design: the goal of InstaGENI was to have a working, at-scale GENI network up and running and ready for experiments, with each node up and on the network within a couple of weeks from hardware delivery. We felt that this was achievable: PlanetLab went from 0 to 300 nodes in its first 2 years [60].

Careful preparation of both the racks and the sites was required for this. We began in the proposal stage: before the proposal went into the GENI Project Office, each prospective site filled out an extensive survey and questionnaire, which determined both the physical and cyber characteristics of the site: the proposed physical location of the rack, needs regarding power supply, details of incoming connectivity including available VLANs, availability of routable IP addresses, details of boundary and firewall configuration, etc. Key personnel for both technical and administrative support were identified and briefed on the installation needs for the racks. These surveys were renewed as deployment approached.

7 Operations and Maintenance

Software installation for the ProtoGENI control nodes is accomplished through virtual machine images. The local administrator first configures the iLO on the control node (e.g., its IP address, default router, etc.). Generic control node images, to run inside the Xen VM, are provided by the ProtoGENI team; in particular, ProtoGENI has developed software that allows the local administrators to fill in a configuration file describing the local network environment (such as IP addresses, routers, DNS servers, etc.), and to generate from that a set of virtual machine images customized to the rack. This functionality can also be used to move an InstaGENI rack to another part of the hosting institution's network, if needed. A default Xen image running the FOAM software is also supplied.

Racks arrive at sites pre-wired. The ProtoGENI stack tests connections by enabling all switch ports, booting all nodes, and sending specially crafted packets on all interfaces. Learned MAC addresses are harvested from the switches and compared against the specified wiring list. This detects mis-wired ports and potentially failed interfaces, so that they can be corrected. The ongoing health of the network is monitored by running Emulab’s “linktest” program after each slice is created; this program tests the actual configured topology against the experimenter’s requests.
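
The wiring check amounts to comparing the MAC addresses the switch has learned on each port against the expected wiring list; a simplified sketch is shown below. The data structures and values are invented for illustration, and ProtoGENI's actual implementation differs.

```python
# Simplified sketch of the wiring check: compare the MAC addresses the switch
# learned on each port against the expected wiring list.  The data structures
# and values are invented; ProtoGENI's real check differs.

# port -> MAC expected from the rack's wiring specification
expected = {
    "A1": "9c:8e:99:00:00:01",   # pc1 eth1
    "A2": "9c:8e:99:00:00:02",   # pc1 eth2
    "A3": "9c:8e:99:00:00:03",   # pc2 eth1
}

# port -> MAC actually harvested from the switch's learned-address table
learned = {
    "A1": "9c:8e:99:00:00:01",
    "A2": "9c:8e:99:00:00:03",   # mis-wired: pc2's cable is in pc1's port
}

for port, want in sorted(expected.items()):
    got = learned.get(port)
    if got is None:
        print(f"{port}: no MAC learned (failed interface or unplugged cable?)")
    elif got != want:
        print(f"{port}: expected {want}, saw {got} (mis-wired port)")
    else:
        print(f"{port}: ok")
```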

InstaGENI racks’ control software is updated frequently and in accordance with an announced schedule to keep up to date on GENI functionality and security patches; the “frequent update” strategy has proved effective on the Utah ProtoGENI site, which rarely suffers downtime due to software updates. All updates are tested first on the InstaGENI rack at the University of Utah for a minimum of 1 week before being rolled out to other sites. All racks receive at least 1 week of warning before software updates, and updates may be postponed in the face of upcoming paper deadlines, course projects, and other high-priority events. Most updates involve no disruption of running slices; updates that do carry this risk are announced ahead of time to the GENI community and scheduled for specific (off-peak) times.

A snapshot of the control VM is taken before upgrades are undertaken, so that in the case of update problems, the control node can be returned to a working state quickly. Backwards compatibility with the two previous versions of the GENI APIs is preserved at all times to avoid the need for flag days.

Most administration of InstaGENI racks is undertaken through the Emulab/ProtoGENI web interface and via command line tools on the control node; physical access to the racks for administration is therefore not required.

All nodes in InstaGENI racks, including control nodes, include HP's iLO technology, which provides power control and console access. This allows both InstaGENI and local personnel to administer the nodes without requiring a physical presence. iLO console capabilities are used for diagnosing faulty nodes (iLO continues to function in the presence of many types of hardware failures) and during the upgrade of control software. Access to iLO on experiment nodes is accomplished through the control nodes so that public IP addresses are not required. iLO on the control node itself requires a public address; this enables remote administration and minimizes downtime in the case of software failures (and many types of hardware failures) on the control node.

Full logs of resource allocations, including information about the slices and users who requested them, are available to the local administrators via a web interface. The raw data used in this interface are stored in a database on the control node, should local administrators wish to process this information in their own way. Using existing ProtoGENI APIs, the GENI Meta-Operations Center (GMOC) is given credentials for each rack that allow it to poll the rack for slice and sliver allocation status. InstaGENI racks use the logging service that is provided by the GENI-wide clearinghouse.
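
As one illustration of what such polling looks like, the sketch below uses Python's standard XML-RPC client to call the GENI Aggregate Manager API (version 2) on a rack's boss node with the caller's GENI certificate. The URL, certificate paths, and slice URN are placeholders, and real monitoring code would also pass the appropriate GENI credentials to SliverStatus and would verify the server against the federation CA bundle rather than disabling verification.

```python
# Sketch of polling a rack over the GENI AM API (v2) via XML-RPC over SSL.
# URL, certificate paths, and slice URN are placeholders; real code would pass
# GENI credentials (omitted here) and verify against the federation CA bundle.
import ssl
import xmlrpc.client

ctx = ssl.create_default_context()
ctx.load_cert_chain("gmoc_cert.pem", "gmoc_key.pem")   # caller's GENI cert/key
ctx.check_hostname = False                             # placeholder only: a real
ctx.verify_mode = ssl.CERT_NONE                        # client loads the CA bundle

AM_URL = "https://boss.instageni.example.edu:12369/protogeni/xmlrpc/am/2.0"
am = xmlrpc.client.ServerProxy(AM_URL, context=ctx, allow_none=True)

print(am.GetVersion())                                 # API discovery / sanity check

slice_urn = "urn:publicid:IDN+ch.geni.net+slice+myexperiment"
credentials = []                                       # placeholder: real credentials required
status = am.SliverStatus(slice_urn, credentials, {})
print(status)
```

The same API exposes the Shutdown call used in the emergency-stop procedures described below.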

InstaGENI Racks follow the Emergency Stop procedures outlined by the GMOC in [35], and will follow newer versions as they become available.

Emergency stop of slices that are suspected of misbehavior is provided through three interfaces:

  • A web interface for rack administrators for cases in which they are made aware of misbehavior

  • A GENI API call for use by the owner of a slice or the leader of a project, for cases when the slice may be compromised and used for purposes not intended by the experimenters.

  • A GENI API call for use by the GMOC, for cases when misbehavior is GENI-wide, is reported through GMOC channels, or occurs when local administrators are not reachable

The GMOC is given a credential for each InstaGENI rack giving them full privileges to execute emergency shutdown on any slice. The GMOC is the primary point of contact for any detected problematic behavior that occurs after hours or on weekends or holidays. Three levels of emergency stop are provided:

  • Cutting off the experiment from the control plane, but not the data plane: this is appropriate for cases in which a slice is having unwanted interactions with the outside Internet, but there is believed to be state within the slice worth preserving. This particular level of emergency shutdown is for cases where the unwanted communication is on the control plane, e.g. scanning/attacking external networks.

  • Powering off the affected nodes and/or shutting down the affected virtual machine

  • Deletion of the slice and all associated slivers

When emergency stop is invoked on a slice, the owner of the slice is prevented from manipulating it further, and administrative action is required to complete the shutdown. This property can be used to preserve forensic evidence.

8 Experience and Status

The InstaGENI deployment is now essentially complete in the Mesoscale infrastructure, and a full map of the deployed racks can be seen in Fig. 14.6. The racks are in general the minimum configuration, with five worker nodes, though the Utah Downtown Data Center rack has over 30 worker nodes. Since the basic software stack is ProtoGENI, and since the Emulab stack from which it descended has managed clusters of up to 1000 nodes, we are confident that the basic architecture of the InstaGENI rack can scale to 1000 nodes and above.

Fig. 14.6 The InstaGENI deployment

The primary obstacle to installing InstaGENI racks turned out to be the varying types of infrastructure and policies at each site. Different sites had differing types and topologies of connectivity (both to the public Internet and to the Mesoscale), different types of firewalls, different policies regarding connectivity to outside and use of resources by users external to campus, different methods of assigning public IP addresses, etc. While these did not affect the rack itself, they did affect things such as how its external connectivity had to be configured, what domain name it could be under, etc. and often involved delays while network administrators had to approve firewall bypasses, configure campus and/or regional networks, etc. Once these issues were resolved, installation of the rack software itself typically took a few days.

The InstaGENI racks are in constant use by GENI experimenters. Typical usage will have approximately 500 Xen VMs, 300 OpenVZ containers, and 30 bare-metal nodes in constant operation. This still represents a somewhat light load on the overall system; our experiments indicate that we could accommodate 2000 VMs or OpenVZ containers simultaneously with 60 bare-metal nodes in the racks themselves. Currently, about 2500 individual GENI users are creating roughly 4000 slivers monthly on the InstaGENI racks, and using them in a wide variety of experiments.

The nesting strategy has proved to be successful as well, with PlanetLab on InstaGENI and its own nested Platform-as-a-Service offering, the GENI Experiment Engine, maintaining 24/7 service at www.gee-project.net.

9 Related Work

The most prominent related work is our sister project within GENI, ExoGENI, described in an adjacent chapter in this volume [4]. ExoGENI is aimed at a different design point from InstaGENI: ExoGENI supports slivers only as VMs and containers, rather than also supporting the allocation of bare-metal nodes. In addition, ExoGENI's basic rack is somewhat richer, offering ten worker nodes rather than five, 10 Gb/s uplinks in every rack, and a storage-area network.

Each ExoGENI rack thus more resembles a conventional OpenStack-based cloud rather than the meta-cloud that forms the primary design motivation of InstaGENI. This incorporates some tradeoffs: on the one hand, InstaGENI enables some services and experiments that would be more difficult to do on ExoGENI. Conversely, ExoGENI’s design permits it to easily and efficiently allocate resources for conventional cloud services and applications.

In addition to the GENI infrastructures, several other research clouds have adopted models similar to GENI and InstaGENI. BonFIRE [42], in the EU's FIRE project, offers a distributed cloud with six sites and on-request access to a substantial site at INRIA. Like InstaGENI, BonFIRE offers physical node access. Japan's V-Node [53] project, under the umbrella of JGN-X, uses a rack similar in many ways to the InstaGENI rack but with a different control framework. Canada's Smart Applications over Virtual Infrastructure (SAVI) [5, 28, 43, 44, 47] project operates a distributed cloud with similarities to both ExoGENI and InstaGENI. Like ExoGENI, SAVI is a VM-only infrastructure based on OpenStack. Like InstaGENI (and, we believe, like ExoGENI as well), each rack (or "site" in the SAVI terminology) can operate as a standalone cloud. SAVI is described in a subsequent chapter in this book [47]. Koren [45] is a Korean testbed primarily focused on multi-site OpenFlow experimentation, but a VM creation and orchestration capability exists at each of the six Koren sites. Ofelia [50, 56] is an EU testbed, similarly focused on multi-site OpenFlow, with VMs available at each site. The G-Lab architecture [52, 64] featured a distributed cloud architecture similar to InstaGENI's, relying more heavily on the central node in Kaiserslautern than InstaGENI does: in G-Lab, boot management and resource management were done at the central node, and the local boss (HeadNode in G-Lab terminology) was mostly focused on housekeeping and low-level node management activities. NorNet [37, 55] is a PlanetLab-like infrastructure which consists of two tiers of service. NorNet Core is a twenty-site testbed, primarily of sites in Norway, each multi-homed to several network providers. NorNet Edge consists of several hundred smaller nodes that are connected to all mobile broadband providers in Norway. FITS [30, 51] is a joint Brazilian-European project with more than 20 sites across the two continents.

Under the Federation API, all three of the FIRE, V-Node, and SAVI infrastructures should be fully interoperable with the GENI racks, creating the possibility of instantiating a worldwide slice across these infrastructures. Indeed, full integration of GENI and SAVI has already been demonstrated this year.

10 Conclusions, Extensions and Further Work

The initial goal of the InstaGENI project was to provide a workhorse cluster design for the GENI Mesoscale project. In that, it has succeeded, as demonstrated by its usage. Installation of the racks is complete, and ongoing maintenance and troubleshooting have proven to be smooth. The software stack is stable and largely trouble-free. Nesting control frameworks has been a successful experiment, with InstaGENI PlanetLab and the GENI Experiment Engine running seamlessly under the basic ProtoGENI infrastructure.

Difficulties have largely been site-related, and specifically related to site network policies. An ongoing issue is the paucity of public IPv4 addresses at the various InstaGENI sites. Our hosts, primarily universities, have often been reluctant to allocate IPv4 addresses. We require only the bare minimum number of addresses to give the InstaGENI maintainers, GPO staff, and GENI experimenters control-plane access to the boss node on the rack. In the ideal case, each sliver could have two routable v4 addresses: one for the control-plane interface, so an experimenter could directly ssh into his sliver and host public-facing services, and one for the data plane, so that localized services could be offered from each sliver. Various strategies have been employed to get around the lack of v4 addresses, primarily using application-level port-sharing and multiplexing. The GENI Experiment Engine is planning to do this with a shared HTTP reverse proxy, to permit GEE slivers to offer public services. At the moment, the GEE offers routable ports to individual slivers on a per-request basis.
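
A name-based HTTP reverse proxy of the kind contemplated for the GEE can be sketched in a few lines: a single routable address and port are shared, and the Host header selects which sliver's private address the request is forwarded to. The hostnames and backend addresses below are invented, and the GEE's eventual implementation may differ.

```python
# Sketch of a shared, name-based HTTP reverse proxy: one routable address,
# with the Host header selecting which sliver (on a private address) serves
# the request.  Hostnames and backend addresses are invented for the example.
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer

# hostname presented by the client -> (private sliver address, port)
BACKENDS = {
    "svc-a.gee.example.org": ("10.20.0.11", 8080),
    "svc-b.gee.example.org": ("10.20.0.12", 8080),
}

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        host = self.headers.get("Host", "").split(":")[0]
        backend = BACKENDS.get(host)
        if backend is None:
            self.send_error(502, "No such sliver")
            return
        # Forward the request to the sliver's private address and relay the reply.
        conn = http.client.HTTPConnection(*backend, timeout=10)
        conn.request("GET", self.path, headers=dict(self.headers))
        resp = conn.getresponse()
        body = resp.read()
        self.send_response(resp.status)
        for name, value in resp.getheaders():
            if name.lower() not in ("transfer-encoding", "connection"):
                self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # All slivers share this one routable address and port.
    HTTPServer(("0.0.0.0", 8080), ProxyHandler).serve_forever()
```

In production one would of course use a hardened proxy rather than this toy, but the sketch shows how a single routable address can be multiplexed across many slivers by name.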

The v4 address shortage is an area that needs significant attention over the coming months and years. While the primary networking needs of GENI and other experimental testbeds can largely be met with private networks (where the private /8, combined with the VLAN address space, is more than sufficient), services offered to end-users require access to the routable Internet, since users typically don't have access to the private GENI network. There are many examples of such services: end-system multicast trees [22], wide-area stores [54, 74], virtual shared worlds [70], Content Distribution Networks [32, 71], and collaborative visualization systems [15, 16], to name five. Use of centralized servers in places where v4 addresses are plentiful is not the answer: the whole point of putting these services on GENI, instead of, say, EC2, is to offer low-latency access to local end-users. Wholesale adoption of IPv6 would solve the problem, of course, as would the availability of more advanced network architectures such as content-centric networking [41, 58, 73]. However, it's important to note that the reason we have this problem is that we're trying to offer services to people over the routable Internet, which we don't control. The problem with v6 is not that we'd have any trouble implementing or enforcing it; it's that an end-user, transiting multiple academic and commercial networks to reach his local GENI node, must be able to do so reliably. Thus, we need v6 implemented by every network, and this presents some challenges.

A second strategy is to canonize the port-sharing work that PlanetLab pioneered, using a combination of OpenFlow switching and unused header bits to run realtime NAT transparently to the external Internet.

Use and accommodation of OpenFlow, and specifically restricted forms of OpenFlow, is an area of ongoing investigation. There are a large number of use cases where developers want to direct routing at a high level, but don’t need to access the full machinery of an OpenFlow controller. A restricted, high-level, easy-to-use northbound API to OpenFlow (and, more generally, the network allocation substrate) is under active investigation.

We are actively investigating expanding the capabilities of the current GENI rack, both by adding more worker nodes and by adding heterogeneous resources, in concert with related projects such as CloudLab. We are further working with partners such as US Ignite to investigate applications of GENI racks in the domain sciences, smart cities, and distributed education arenas. The fundamental purpose of the InstaGENI cluster is to permit people to create virtual machines anywhere: to reduce data, reduce latency to the end user, add application resiliency in the face of network or physical outage, or increase bandwidth to a sensor or application. In our view, the set of such applications is very large.