From 36b66f69e698b33c44bd27f5d909435e5be4d084 Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Tue, 25 Mar 2025 14:05:00 -0700 Subject: [PATCH 01/23] update onf info --- preface.rst | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/preface.rst b/preface.rst index a3d4f56..94ec07c 100644 --- a/preface.rst +++ b/preface.rst @@ -79,12 +79,11 @@ can be assembled to help manage cloud platforms and scalable applications built on those platforms. That's also the bad news. With several dozen cloud-related projects available at open source consortia like the Linux Foundation, Cloud Native Computing -Foundation, Apache Foundation, and Open Networking Foundation, -navigating the project space is one of the biggest challenges we faced -in putting together a cloud management platform. This is in large part -because these projects are competing for mindshare, with both -significant overlap in the functionality they offer and extraneous -dependencies on each other. +Foundation, and Apache Foundation, navigating the project space is one +of the biggest challenges we faced in putting together a cloud +management platform. This is in large part because these projects are +competing for mindshare, with both significant overlap in the +functionality they offer and extraneous dependencies on each other. One way to read this book is as a guided tour of the open source landscape for cloud control and management. And in that spirit, we do @@ -112,21 +111,22 @@ foundational. Acknowledgements ------------------ -The software described in this book is due to the hard work of the ONF -engineering team and the open source community that works with -them. We acknowledge their contributions, with a special thank-you to -Hyunsun Moon, Sean Condon, and HungWei Chiu for their significant +The software described in this book is due to the hard work of the +Open Networking Foundation (ONF) engineering team and the open source +community that worked with them to build the *Aether* edge cloud. We +acknowledge their contributions, with a special thank-you to Hyunsun +Moon, Sean Condon, and HungWei Chiu for their significant contributions to Aether's control and management platform, and to Oguz -Sunay for his influence on its overall design. Suchitra Vemuri's +Sunay for his influence on Aether's overall design. Suchitra Vemuri's insights into testing and quality assurance were also invaluable. -This book is still very much a work-in-progress, and we will happily -acknowledge everyone that provides feedback. Please send us your -comments using the `Issues Link -`__. Also see the -`Wiki `__ for the TODO -list we're currently working on. +The ONF is no longer active, but Aether continues as an open source +project of the Linux Foundation. Visit https://aetherproject.org to +learn about the ongoing project. We will also happily accept feedback +to this book. Please send us your comments using the `Issues Link +`__, or submit a Pull +Request with suggested changes. 
| Larry Peterson, Scott Baker, Andy Bavier, Zack Williams, and Bruce Davie -| June 2022 +| April 2025 From 2d83b67501cdc82ef4cfed25ec83d070bf4001b9 Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Wed, 2 Apr 2025 14:55:12 -0700 Subject: [PATCH 02/23] clean up role/playbook --- provision.rst | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/provision.rst b/provision.rst index 3993c7d..4dd438a 100644 --- a/provision.rst +++ b/provision.rst @@ -323,7 +323,7 @@ deployment currently include: * Configure the Management Server so it boots from a provided USB key. -* Run Ansible roles and playbooks needed to complete configuration +* Run Ansible playbooks needed to complete configuration onto the Management Server. * Configure the Compute Servers so they boot from the Management @@ -364,14 +364,11 @@ parameters that NetBox maintains. The general idea is as follows. For every network service (e.g., DNS, DHCP, iPXE, Nginx) and every per-device subsystem (e.g., network -interfaces, Docker) that needs to be configured, there is a corresponding -Ansible role and playbook.\ [#]_ These configurations are applied to the -Management Server during the manual configuration stage summarized above, once -the management network is online. - -.. [#] We gloss over the distinction between *roles* and *playbooks* - in Ansible, and focus on the general idea of there being a - *script* that runs with a set of input parameters. +interfaces, Docker) that needs to be configured, there is a +corresponding Ansible role (set of related playbooks). These +configurations are applied to the Management Server during the manual +configuration stage summarized above, once the management network is +online. The Ansible playbooks install and configure the network services on the Management Server. The role of DNS and DHCP are obvious. As for iPXE and Nginx, From 5c81039266c5002468f7d6dc897ec0ee9b8432c9 Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Wed, 2 Apr 2025 15:36:27 -0700 Subject: [PATCH 03/23] add DIY sidebar --- lifecycle.rst | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/lifecycle.rst b/lifecycle.rst index 4638f93..68d1dab 100644 --- a/lifecycle.rst +++ b/lifecycle.rst @@ -525,6 +525,34 @@ Section 4.5. These two concerns are at the heart of realizing a sound approach to Continuous Integration. The tooling—in our case Jenkins—is just a means to that end. + +.. sidebar:: Balancing DIY Tools with Cloud Services + + *We use Jenkins as our CI tool, but another popular option is + GitHub Actions. This is a relatively new feature of GitHub (the + cloud service, not the open source software package) that nicely + integrates the code repo with a set of workflows that can be + exectued every time a patch is submitted. In this setting, a + workflow is roughly analogous to a Groovy pipeline.* + + *GitHub actions are especially convenient for open source projects + because they include spinning up containers in which each workflow + runs (for free, but with limits). A mixed strategy would be to run + simple GitHub Actions for unit and smoke tests when code is + checked in, but then use Jenkins to manage complex integration + tests that require additional testing resources (e.g., a full QA + cluster).* + + *GitHub Actions are not unique. Many of the open source options + described in this book are paired with a cloud service + counterpart. 
The key consideration is how much you want to depend + on a service someone provides versus depend entirely on services + you stand up and manage yourself. The former can be easier, but + comes with the risk that the provider changes (or discontinues) + their service over time. The same can be said of open source + projects, but having access to source code does give you more + control.* + 4.4 Continuous Deployment ------------------------- From a399e52071e75f377e8e4308df30ecf8d6a03eec Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Thu, 3 Apr 2025 16:46:27 -0700 Subject: [PATCH 04/23] we use -> Aether uses --- control.rst | 2 +- dict.txt | 3 +++ intro.rst | 8 ++++---- lifecycle.rst | 48 ++++++++++++++++++++++++------------------------ provision.rst | 8 ++++---- 5 files changed, 36 insertions(+), 33 deletions(-) diff --git a/control.rst b/control.rst index 75821b6..a50ba49 100644 --- a/control.rst +++ b/control.rst @@ -115,7 +115,7 @@ Central to this role is the requirement that Runtime Control be able to represent a set of abstract objects, which is to say, it implements a *data model*. While there are several viable options for the specification language used to represent the data model, for Runtime -Control we use YANG. This is for three reasons. First, YANG is a rich +Control Aether uses YANG. This is for three reasons. First, YANG is a rich language for data modeling, with support for strong validation of the data stored in the models and the ability to define relations between objects. Second, it is agnostic as to how the data is stored (i.e., diff --git a/dict.txt b/dict.txt index f2704bc..a60ca9e 100644 --- a/dict.txt +++ b/dict.txt @@ -55,6 +55,7 @@ POD PODs PaaS Ph +Plugable Proxmox QoS RKE @@ -102,6 +103,7 @@ disaggregate disaggregated disaggregation downlink +eBPF eNB eNBs evolvable @@ -112,6 +114,7 @@ frontend gNMI gNOI gNodeB +gNodeBs gRPC heatmap hoc diff --git a/intro.rst b/intro.rst index 0e04589..503a2b8 100644 --- a/intro.rst +++ b/intro.rst @@ -482,10 +482,10 @@ software components, which we describe next. Collectively, all the hardware and software components shown in the figure form the *platform*. Where we draw the line between what's *in the platform* and what runs *on top of the platform*, and why it is important, will -become clear in later chapters, but the summary is that different -mechanisms will be responsible for (a) bringing up the platform and -prepping it to host workloads, and (b) managing the various workloads -that need to be deployed on that platform. +become clear in later chapters. The summary is that one mechanism is +responsible for bringing up the platform and prepping it to host +workloads, and a different mechanism is responsible for managing the +various workloads that are deployed on that platform. 1.3.2 Software Building Blocks diff --git a/lifecycle.rst b/lifecycle.rst index 68d1dab..7e77230 100644 --- a/lifecycle.rst +++ b/lifecycle.rst @@ -514,29 +514,18 @@ patch set. .. literalinclude:: code/trigger-event.yaml -The important takeaway from this discussion is that there is no -single or global CI job. There are many per-component jobs that -independently publish deployable artifacts when conditions dictate. -Those conditions include: (1) the component passes the required tests, -and (2) the component's version indicates whether or not a new -artifact is warranted. We have already talked about the testing -strategy in Section 4.2 and we describe the versioning strategy in -Section 4.5. 
These two concerns are at the heart of realizing a sound -approach to Continuous Integration. The tooling—in our case Jenkins—is -just a means to that end. - .. sidebar:: Balancing DIY Tools with Cloud Services - *We use Jenkins as our CI tool, but another popular option is + *Aether uses Jenkins as our CI tool, but another popular option is GitHub Actions. This is a relatively new feature of GitHub (the - cloud service, not the open source software package) that nicely - integrates the code repo with a set of workflows that can be - exectued every time a patch is submitted. In this setting, a - workflow is roughly analogous to a Groovy pipeline.* + cloud service, not the software package) that nicely integrates + the code repo with a set of workflows that can be exectued every + time a patch is submitted. In this setting, a workflow is roughly + analogous to a Groovy pipeline.* *GitHub actions are especially convenient for open source projects - because they include spinning up containers in which each workflow + because they include spinning up a container in which the workflow runs (for free, but with limits). A mixed strategy would be to run simple GitHub Actions for unit and smoke tests when code is checked in, but then use Jenkins to manage complex integration @@ -546,12 +535,23 @@ just a means to that end. *GitHub Actions are not unique. Many of the open source options described in this book are paired with a cloud service counterpart. The key consideration is how much you want to depend - on a service someone provides versus depend entirely on services - you stand up and manage yourself. The former can be easier, but - comes with the risk that the provider changes (or discontinues) - their service over time. The same can be said of open source - projects, but having access to source code does give you more - control.* + on a service someone else provides versus depending entirely on + services you install and manage yourself. The former can be + easier, but comes with the risk that the provider changes (or + discontinues) the service. The same can be said of open source + projects, but having access to source code gives you more + control over your fate.* + +The important takeaway from this discussion is that there is no +single or global CI job. There are many per-component jobs that +independently publish deployable artifacts when conditions dictate. +Those conditions include: (1) the component passes the required tests, +and (2) the component's version indicates whether or not a new +artifact is warranted. We have already talked about the testing +strategy in Section 4.2 and we describe the versioning strategy in +Section 4.5. These two concerns are at the heart of realizing a sound +approach to Continuous Integration. The tooling—in our case Jenkins—is +just a means to that end. 4.4 Continuous Deployment ------------------------- @@ -564,7 +564,7 @@ of microservices (sometimes called applications) that are to be deployed on that infrastructure. We already know about Terraform from Chapter 3: it's the agent that actually "acts on" the infrastructure-related forms. For its counterpart on the application -side we use an open source project called Fleet. +side Aether uses an open source project called Fleet. :numref:`Figure %s ` shows the big picture we are working towards. 
Notice that both Fleet and Terraform depend on the diff --git a/provision.rst b/provision.rst index 4dd438a..f10c120 100644 --- a/provision.rst +++ b/provision.rst @@ -28,7 +28,7 @@ infrastructure, which has inspired an approach known as *Configuration-as-Code* concept introduced in Chapter 2. The general idea is to document, in a declarative format that can be "executed", exactly what our infrastructure is to look like; how it is to be -configured. We use Terraform as our open source approach to +configured. Aether uses Terraform as its approach to Infrastructure-as-Code. When a cloud is built from a combination of virtual and physical @@ -37,7 +37,7 @@ seamless way to accommodate both. To this end, our approach is to first overlay a *logical structure* on top of hardware resources, making them roughly equivalent to the virtual resources we get from a commercial cloud provider. This results in a hybrid scenario similar -to the one shown in :numref:`Figure %s `. We use NetBox as +to the one shown in :numref:`Figure %s `. NetBox is our open source solution for layering this logical structure on top of physical hardware. NetBox also helps us address the requirement of tracking physical inventory. @@ -216,7 +216,7 @@ purposes: There are other edge prefixes used by Kubernetes, but they do not need to be created in NetBox. Note that ``qsfp0`` and ``qsfp1`` in this example denote transceiver ports connecting the switching fabric, -where *QSFP* stands for Quad (4-channel) Small Form-factor Pluggable. +where *QSFP* stands for Quad (4-channel) Small Form-factor Plugable. With this site-wide information recorded, the next step is to install and document each *Device*. This includes entering a ````, @@ -543,7 +543,7 @@ some running at the edges on bare-metal and some instantiated in GCP) are to be instantiated, and how each is to be configured—and then automate the task of making calls against the programmatic API to make it so. This is the essence of Infrastructure-as-Code, and as we've -already said, we use Terraform as our open source example. +already said, Terraform is our open source example. Since Terraform specifications are declarative, the best way to understand them is to walk through a specific example. In doing so, From fd92539f0884a833a355aa21fe6a38a67c91ea03 Mon Sep 17 00:00:00 2001 From: Bruce Davie <3101026+drbruced12@users.noreply.github.com> Date: Wed, 9 Apr 2025 17:30:19 +1000 Subject: [PATCH 05/23] more common spelling of Pluggable --- provision.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/provision.rst b/provision.rst index f10c120..059edde 100644 --- a/provision.rst +++ b/provision.rst @@ -216,7 +216,7 @@ purposes: There are other edge prefixes used by Kubernetes, but they do not need to be created in NetBox. Note that ``qsfp0`` and ``qsfp1`` in this example denote transceiver ports connecting the switching fabric, -where *QSFP* stands for Quad (4-channel) Small Form-factor Plugable. +where *QSFP* stands for Quad (4-channel) Small Form-factor Pluggable. With this site-wide information recorded, the next step is to install and document each *Device*. 
This includes entering a ````, From 43a71404c2cd7bc64fb9cd989be08af3da9b6eed Mon Sep 17 00:00:00 2001 From: Bruce Davie <3101026+drbruced12@users.noreply.github.com> Date: Wed, 9 Apr 2025 17:44:09 +1000 Subject: [PATCH 06/23] minor edits --- preface.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/preface.rst b/preface.rst index 94ec07c..ba83d9a 100644 --- a/preface.rst +++ b/preface.rst @@ -78,12 +78,12 @@ The good news is that there is a wealth of open source components that can be assembled to help manage cloud platforms and scalable applications built on those platforms. That's also the bad news. With several dozen cloud-related projects available at open source -consortia like the Linux Foundation, Cloud Native Computing +consortia such as the Linux Foundation, Cloud Native Computing Foundation, and Apache Foundation, navigating the project space is one of the biggest challenges we faced in putting together a cloud management platform. This is in large part because these projects are competing for mindshare, with both significant overlap in the -functionality they offer and extraneous dependencies on each other. +functionality they offer and dependencies on each other. One way to read this book is as a guided tour of the open source landscape for cloud control and management. And in that spirit, we do @@ -93,7 +93,7 @@ provide, but instead include links to project-specific documentation include snippets of code from those projects, but these examples are chosen to help solidify the main points we're trying to make about the management platform as a whole; they should not be interpreted as an -attempt to document the inner-working of the individual projects. Our +attempt to document the inner working of the individual projects. Our goal is to explain how the various puzzle pieces fit together to build an end-to-end management system, and in doing so, identify both various tools that help and the hard problems that no amount of From 95640f117f4859213684d7b5f021bb102dedffa6 Mon Sep 17 00:00:00 2001 From: Bruce Davie <3101026+drbruced12@users.noreply.github.com> Date: Tue, 15 Apr 2025 17:44:30 +1000 Subject: [PATCH 07/23] minor edits --- intro.rst | 25 +++++++++++++------------ preface.rst | 2 +- 2 files changed, 14 insertions(+), 13 deletions(-) diff --git a/intro.rst b/intro.rst index 503a2b8..9888b5f 100644 --- a/intro.rst +++ b/intro.rst @@ -111,7 +111,7 @@ because each of these three domains brings its own conventions and terminology to the table. But understanding how these three stakeholders approach operationalization gives us a broader perspective on the problem. We return to the confluence of enterprise, -cloud, access technologies later in this chapter, but we start by +cloud, and access technologies later in this chapter, but we start by addressing the terminology challenge. .. _reading_aether: @@ -232,8 +232,9 @@ terminology. process and Operational requirements silos, balancing feature velocity against system reliability. As a practice, it leverages CI/CD methods and is typically associated with container-based - (also known as *cloud native*) systems, as typified by *Site - Reliability Engineering (SRE)* practiced by cloud providers like + (also known as *cloud native*) systems. There is some overlap + between DevOps and *Site + Reliability Engineering (SRE)* as practiced by cloud providers such as Google. 
* **In-Service Software Upgrade (ISSU):** A requirement that a @@ -374,10 +375,10 @@ manageable: * Zero-Touch Provisioning is more tractable because the hardware is commodity, and hence, (nearly) identical. This also means the vast - majority of configuration involves initiating software parameters, + majority of configuration involves initializng software parameters, which is more readily automated. -* Cloud native implies a set of best-practices for addressing many of +* Cloud native implies a set of best practices for addressing many of the FCAPS requirements, especially as they relate to availability and performance, both of which are achieved through horizontal scaling. Secure communication is also typically built into cloud RPC @@ -386,7 +387,7 @@ manageable: Another way to say this is that by rearchitecting bundled appliances and devices as horizontally scalable microservices running on commodity hardware, what used to be a set of one-off O&M problems are -now solved by widely applied best-practices from distributed systems, +now solved by widely applied best practices from distributed systems, which have in turn been codified in state-of-the-art cloud management frameworks (like Kubernetes). This leaves us with the problem of (a) provisioning commodity hardware, (b) orchestrating the container @@ -483,7 +484,7 @@ hardware and software components shown in the figure form the *platform*. Where we draw the line between what's *in the platform* and what runs *on top of the platform*, and why it is important, will become clear in later chapters. The summary is that one mechanism is -responsible for bringing up the platform and prepping it to host +responsible for bringing up the platform and preparing it to host workloads, and a different mechanism is responsible for managing the various workloads that are deployed on that platform. @@ -504,7 +505,7 @@ commodity processors in the cluster: interconnected to build applications. These are all well known and ubiquitous, and so we only summarize them -here. Links to related information for anyone that is not familiar +here. Links to related information for anyone who is not familiar with them (including excellent hands-on tutorials for the three container-related building blocks) are given below. @@ -578,7 +579,7 @@ these open building blocks can be assembled into a comprehensive cloud management platform. We describe each tool in enough detail to appreciate how all the parts fit together—providing end-to-end coverage by connecting all the dots—plus links to full documentation -for those that want to dig deeper into the details. +for those who want to dig deeper into the details. .. List: NexBox, Ansible, Netplan, Terraform, Rancher, Fleet, @@ -743,7 +744,7 @@ Cloud providers, because of the scale of the systems they build, cannot survive with operational silos, and so they introduced increasingly sophisticated cloud orchestration technologies. Kubernetes and Helm are two high-impact examples. These -cloud best-practices are now available to enterprises as well, but +cloud best practices are now available to enterprises as well, but they are often bundled as a managed service, with the cloud provider playing an ever-greater role in operating the enterprise’s services. Outsourcing portions of the IT responsibility to a cloud provider is an @@ -756,9 +757,9 @@ within the enterprise, deployed as yet another cloud service. The approach this book takes is to explore a best-of-both-worlds opportunity. 
It does this by walking you through the collection of subsystems, and associated management processes, required to -operationalize an on-prem cloud, and then provide on-going support for +operationalize an on-premises cloud, and then provide on-going support for that cloud and the services it hosts (including 5G connectivity). Our hope is that understanding what’s under the covers of cloud-managed services will help enterprises better share responsibility for -managing their IT infrastructure with cloud providers, and potentially +managing their IT infrastructure with cloud providers, and potentially with MNOs. diff --git a/preface.rst b/preface.rst index ba83d9a..78e0edf 100644 --- a/preface.rst +++ b/preface.rst @@ -9,7 +9,7 @@ Microsoft, Amazon and the other cloud providers do for us, and they do a perfectly good job of it. The answer, we believe, is that the cloud is becoming ubiquitous in -another way, as distributed applications increasing run not just in +another way, as distributed applications increasingly run not just in large, central datacenters but at the edge. As applications are disaggregated, the cloud is expanding from hundreds of datacenters to tens of thousands of enterprises. And while it is clear that the commodity From b83e3161c4cd5de11c8fd14af62719a456d05abc Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Fri, 18 Apr 2025 14:49:11 -0700 Subject: [PATCH 08/23] cleaned up ONF references --- authors.rst | 12 ++++++------ dict.txt | 1 + intro.rst | 6 +++--- monitor.rst | 12 ++++++------ preface.rst | 12 ++++++------ 5 files changed, 22 insertions(+), 21 deletions(-) diff --git a/authors.rst b/authors.rst index 55859dc..822eff2 100644 --- a/authors.rst +++ b/authors.rst @@ -6,12 +6,12 @@ Science, Emeritus at Princeton University, where he served as Chair from 2003-2009. His research focuses on the design, implementation, and operation of Internet-scale distributed systems, including the widely used PlanetLab and MeasurementLab platforms. He is currently -contributing to the Aether access-edge cloud project at the Open -Networking Foundation (ONF), where he serves as Chief Scientist. -Peterson is a member of the National Academy of Engineering, a Fellow -of the ACM and the IEEE, the 2010 recipient of the IEEE Kobayashi -Computer and Communication Award, and the 2013 recipient of the ACM -SIGCOMM Award. He received his Ph.D. degree from Purdue University. +contributing to the Aether access-edge cloud project at the Linux +Foundation. Peterson is a member of the National Academy of +Engineering, a Fellow of the ACM and the IEEE, the 2010 recipient of +the IEEE Kobayashi Computer and Communication Award, and the 2013 +recipient of the ACM SIGCOMM Award. He received his Ph.D. degree from +Purdue University. **Scott Baker** is a Cloud Software Architect at Intel, which he joined as part of Intel's acquisition of the Open Networking diff --git a/dict.txt b/dict.txt index a60ca9e..50c055c 100644 --- a/dict.txt +++ b/dict.txt @@ -56,6 +56,7 @@ PODs PaaS Ph Plugable +Pluggable Proxmox QoS RKE diff --git a/intro.rst b/intro.rst index 9888b5f..9da64e5 100644 --- a/intro.rst +++ b/intro.rst @@ -71,8 +71,8 @@ like. Our approach is to focus on the fundamental problems that must be addressed—design issues that are common to all clouds—but then couple this conceptual discussion with specific engineering choices made while operationalizing a specific enterprise cloud. Our example -is Aether, an ONF project to support 5G-enabled edge clouds as a -managed service. 
Aether has the following properties that make it an +is Aether, an open source edge cloud that supports 5G connectivity as +a managed service. Aether has the following properties that make it an interesting use case to study: * Aether starts with bare-metal hardware (servers and switches) @@ -375,7 +375,7 @@ manageable: * Zero-Touch Provisioning is more tractable because the hardware is commodity, and hence, (nearly) identical. This also means the vast - majority of configuration involves initializng software parameters, + majority of configuration involves initializing software parameters, which is more readily automated. * Cloud native implies a set of best practices for addressing many of diff --git a/monitor.rst b/monitor.rst index 4eb27ac..257f019 100644 --- a/monitor.rst +++ b/monitor.rst @@ -447,12 +447,12 @@ foreseeable future. `__. With respect to mechanisms, Jaeger is a widely used open source -tracing tool originally developed by Uber. (Jaeger is not currently -included in Aether, but was utilized in a predecessor ONF edge cloud.) -Jaeger includes instrumentation of the runtime system for the -language(s) used to implement an application, a collector, storage, -and a query language that can be used to diagnose performance problems -and do root cause analysis. +tracing tool originally developed by Uber. (Jaeger is not included in +Aether, but was utilized in a predecessor edge cloud.) Jaeger +includes instrumentation of the runtime system for the language(s) +used to implement an application, a collector, storage, and a query +language that can be used to diagnose performance problems and do root +cause analysis. 6.4 Integrated Dashboards ------------------------- diff --git a/preface.rst b/preface.rst index 78e0edf..1697aa2 100644 --- a/preface.rst +++ b/preface.rst @@ -111,13 +111,13 @@ foundational. Acknowledgements ------------------ -The software described in this book is due to the hard work of the -Open Networking Foundation (ONF) engineering team and the open source -community that worked with them to build the *Aether* edge cloud. We -acknowledge their contributions, with a special thank-you to Hyunsun -Moon, Sean Condon, and HungWei Chiu for their significant +*Aether*, the example edge cloud this book uses to illustrate how to +operationalize a cloud, was built by the Open Networking Foundation +(ONF) engineering team and the open source community that worked with +them. We acknowledge their contributions, with a special thank-you to +Hyunsun Moon, Sean Condon, and HungWei Chiu for their significant contributions to Aether's control and management platform, and to Oguz -Sunay for his influence on Aether's overall design. Suchitra Vemuri's +Sunay for his influence on Aether's overall design. Suchitra Vemuri's insights into testing and quality assurance were also invaluable. The ONF is no longer active, but Aether continues as an open source From cf2c4540a1a1a467e12c66ddf9ed95da94f249e9 Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Fri, 18 Apr 2025 15:12:16 -0700 Subject: [PATCH 09/23] scrub 'current' qualifier --- arch.rst | 2 +- monitor.rst | 2 +- provision.rst | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch.rst b/arch.rst index a6fd6b0..0f93785 100644 --- a/arch.rst +++ b/arch.rst @@ -355,7 +355,7 @@ Internally, each of these subsystems is implemented as a highly available cloud service, running as a collection of microservices. 
The design is cloud-agnostic, so AMP can be deployed in a public cloud (e.g., Google Cloud, AWS, Azure), an operator-owned Telco cloud, (e.g, -AT&T’s AIC), or an enterprise-owned private cloud. For the current pilot +AT&T’s AIC), or an enterprise-owned private cloud. For the pilot deployment of Aether, AMP runs in the Google Cloud. The rest of this section introduces these four subsystems, with the diff --git a/monitor.rst b/monitor.rst index 257f019..d03615a 100644 --- a/monitor.rst +++ b/monitor.rst @@ -515,7 +515,7 @@ certainly possible.) Example control dashboard showing the set of Device Groups defined for a fictional set of Aether sites. -For example, :numref:`Figure %s ` shows the current set +For example, :numref:`Figure %s ` shows the set of device groups for a fictional set of Aether sites, where clicking on the "Edit" button pops up a web form that lets the enterprise admin modify the corresponding fields of the `Device-Group` model (not diff --git a/provision.rst b/provision.rst index 059edde..8b26dde 100644 --- a/provision.rst +++ b/provision.rst @@ -316,7 +316,7 @@ goal is to minimize manual configuration required to onboard physical infrastructure like that shown in :numref:`Figure %s `, but *zero-touch* is a high bar. To illustrate, the bootstrapping steps needed to complete provisioning for our example -deployment currently include: +deployment include: * Configure the Management Switch to know the set of VLANs being used. @@ -439,7 +439,7 @@ and using each Kubernetes cluster, and a way to manage independent projects that are to be deployed on a given cluster (i.e., manage namespaces for multiple applications). -As an example, Aether currently uses Rancher to manage Kubernetes on +As an example, Aether uses Rancher to manage Kubernetes on the bare-metal clusters, with one centralized instance of Rancher being responsible for managing all the edge sites. This results in the configuration shown in :numref:`Figure %s `, which to From 9832d3e5adca64abf341ecb9043b904cd6aaa5ed Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Wed, 23 Apr 2025 15:08:05 -0700 Subject: [PATCH 10/23] wordsmithing --- arch.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch.rst b/arch.rst index 0f93785..6f31db8 100644 --- a/arch.rst +++ b/arch.rst @@ -526,13 +526,13 @@ diagnostics and analytics. This overview of the management architecture could lead one to conclude that these four subsystems were architected, in a rigorous, -top-down fashion, to be completely independent. But that is not -the case. It is more accurate to say that the system evolved bottom -up, solving the next immediate problem one at a time, all the while +top-down fashion, to be completely independent. But that is not the +case. It is more accurate to say that the system evolved bottom up, +solving the next immediate problem one at a time, all the while creating a large ecosystem of open source components that can be used -in different combinations. What we are presenting in this book is a -retrospective description of an end result, organized into four -subsystems to help make sense of it all. +in different combinations. What this book presents is a retrospective +description of the end result, organized into four subsystems to help +make sense of it all. 
There are, in practice, many opportunities for interactions among the four components, and in some cases, there are overlapping concerns From 251f98ffd013bdc9ce38e58e818e838c8e66f186 Mon Sep 17 00:00:00 2001 From: Bruce Davie <3101026+drbruced12@users.noreply.github.com> Date: Wed, 7 May 2025 16:36:18 +1000 Subject: [PATCH 11/23] minor edits for clarity --- arch.rst | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch.rst b/arch.rst index 6f31db8..55ad965 100644 --- a/arch.rst +++ b/arch.rst @@ -187,10 +187,10 @@ cluster built out of bare-metal components, each of the SD-Core CP subsystems shown in :numref:`Figure %s ` is actually deployed in a logical Kubernetes cluster on a commodity cloud. The same is true for AMP. Aether’s centralized components are able to run -in Google Cloud Platform, Microsoft Azure, and Amazon’s AWS. They also +in Google Cloud Platform, Microsoft Azure, and Amazon’s AWS. They can also run as an emulated cluster implemented by a system like KIND—Kubernetes in Docker—making it possible for developers to run -these components on their laptop. +these components on their laptops. To be clear, Kubernetes adopts generic terminology, such as “cluster” and “service”, and gives it a very specific meaning. In @@ -239,8 +239,8 @@ There is a potential third stakeholder of note—third-party service providers—which points to the larger issue of how we deploy and manage additional edge applications. To keep the discussion tangible—but remaining in the open source arena—we use OpenVINO as an illustrative -example. OpenVINO is a framework for deploying AI inference models, -which is interesting in the context of Aether because one of its use +example. OpenVINO is a framework for deploying AI inference models. +It is interesting in the context of Aether because one of its use cases is processing video streams, for example to detect and count people who enter the field of view of a collection of 5G-connected cameras. @@ -274,11 +274,11 @@ but for completeness, we take note of two other possibilities. One is that we extend our hybrid architecture to support independent third-party service providers. Each new edge service acquires its own isolated Kubernetes cluster from the edge cloud, and then the -3rd-party provider subsumes all responsibility for managing the +3rd-party provider takes over all responsibility for managing the service running in that cluster. From the perspective of the cloud operator, though, the task just became significantly more difficult because the architecture would need to support Kubernetes as a managed -service, which is sometimes called *Container-as-a-Service (CaaS)*.\ [#]_ +service, which is sometimes called *Containers-as-a-Service (CaaS)*.\ [#]_ Creating isolated Kubernetes clusters on-demand is a step further than we take things in this book, in part because there is a second possible answer that seems more likely to happen. @@ -485,9 +485,9 @@ Given this mediation role, Runtime Control provides mechanisms to model (represent) the abstract services to be offered to users; store any configuration and control state associated with those models; apply that state to the underlying components, ensuring they remain in -sync with the operator’s intentions; and authorize the set API calls -users try to invoke on each service. These details are spelled out in -Chapter 5. +sync with the operator’s intentions; and authorize the set of API +calls that users try to invoke on each service. 
These details are +spelled out in Chapter 5. 2.4.4 Monitoring and Telemetry @@ -686,7 +686,7 @@ own. The Control and Management Platform now has its own DevOps team(s), who in addition to continually improving the platform, also field operational events, and when necessary, interact with other teams (e.g., the SD-RAN team in Aether) to resolve issues that come -up. They are sometimes called System Reliability Engineers (SREs), and +up. They are sometimes called Site Reliability Engineers (SREs), and in addition to being responsible for the Control and Management Platform, they enforce operational discipline—the third aspect of DevOps discussed next—on everyone else. From 782e2f0ecc9c889a8deebc8534812325f5b2998d Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Mon, 12 May 2025 15:23:18 -0700 Subject: [PATCH 12/23] Open Edge Platform --- intro.rst | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/intro.rst b/intro.rst index 9da64e5..d33d7bc 100644 --- a/intro.rst +++ b/intro.rst @@ -711,22 +711,26 @@ describe how to introduce VMs as an optional way to provision the underlying infrastructure for that PaaS. Finally, the Aether edge cloud we use as an example is similar to many -other edge cloud platforms now being promoted as an enabling -technology for Internet-of-Things. That Kubernetes-based on-prem/edge -clouds are becoming so popular is one reason they make for such a good -case study. For example, *Smart Edge Open* (formerly known as -OpenNESS) is another open source edge platform, unique in that it -includes several Intel-specific acceleration technologies (e.g., DPDK, -SR-IOV, OVS/OVN). For our purposes, however, the exact set of -components that make up the platform is less important than how the -platform, along with all the cloud services that run on top of it, are -managed as a whole. The Aether example allows us to be specific, but -hopefully not at the expense of general applicability. +other cloud platforms being promoted in support of on-prem +deployments. The dominant use case shifts over time—with Artificial +Intelligence (AI) recently overtaking Internet-of-Things (IoT) as the +most compelling justification for edge clouds—but the the operational +challenge remains the same. For example, *Open Edge Platform* recently +open sourced by Intel includes example AI applications and a +collection of AI libraries, but also an *Edge Management Framework* +that mirrors the one describe this book. It starts with a Kubernetes +foundation, and includes tools for provisioning edge clusters and +onboarding and lifecycle managing edge applications. Some of the +engineering choices are the same as in Aether and some are different, +but the important takeaway is that Kubernetes-based edge clouds are +quickly becoming commonplace. That's the reason they are such a good +case study. .. admonition:: Further Reading - `Smart Edge Open - `__. + `Open Edge Platform `__. + + `Edge Management Framework `__. 
1.4 Future of the Sysadmin -------------------------- From 6fab5c31bbcc7c39fdca29c9dd22d726d9afe877 Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Thu, 29 May 2025 14:09:29 -0700 Subject: [PATCH 13/23] updated bios --- authors.rst | 43 +++++++++++++++++++++---------------------- intro.rst | 21 +++++++++++---------- preface.rst | 3 ++- 3 files changed, 34 insertions(+), 33 deletions(-) diff --git a/authors.rst b/authors.rst index 822eff2..01ec8ee 100644 --- a/authors.rst +++ b/authors.rst @@ -13,30 +13,29 @@ the IEEE Kobayashi Computer and Communication Award, and the 2013 recipient of the ACM SIGCOMM Award. He received his Ph.D. degree from Purdue University. -**Scott Baker** is a Cloud Software Architect at Intel, which he -joined as part of Intel's acquisition of the Open Networking -Foundation (ONF) engineering team. While at ONF, he led the Aether -DevOps team. Prior to ONF, he worked on cloud-related research -projects at Princeton and the University of Arizona, including -PlanetLab, GENI, and VICCI. Baker received his Ph.D. in Computer -Science from the University of Arizona in 2005. +**Scott Baker** is a Cloud Software Architect at Intel, where he works +on the Open Edge Platform. Prior to joining Intel, he was on the Open +Networking Foundation (ONF) engineering team that built Aether, +leading the runtime control effort. Baker has also worked on +cloud-related research projects at Princeton and the University of +Arizona, including PlanetLab, GENI, and VICCI. He received his +Ph.D. in Computer Science from the University of Arizona in 2005. -**Andy Bavier** is a Cloud Software Engineer at Intel, which he joined -as part of Intel's acquisition of the Open Networking Foundation (ONF) -engineering team. While at ONF, he worked on the Aether project. Prior -to joining ONF, he was a Research Scientist at Princeton University, -where he worked on the PlanetLab project. Bavier received a BA in -Philosophy from William & Mary in 1990, and MS in Computer Science -from the University of Arizona in 1995, and a PhD in Computer Science -from Princeton University in 2004. +**Andy Bavier** is a Cloud Software Engineer at Intel, where he works +on the Open Edge Platform. Prior to joining Intel, he was on the Open +Networking Foundation (ONF) engineering team that built Aether, +leading the observability effort. Bavier has also been a Research +Scientist at Princeton University, where he worked on the PlanetLab +project. He received a BA in Philosophy from William & Mary in 1990, +and MS in Computer Science from the University of Arizona in 1995, and +a PhD in Computer Science from Princeton University in 2004. -**Zack Williams** is a Cloud Software Engineer at Intel, which he -joined as part of Intel's acquisition of the Open Networking -Foundation (ONF) engineering team. While at ONF, he worked on the -Aether project, and led the Infrastructure team. Prior to joining ONF, -he was a systems programmer at the University of Arizona. Williams -received his BS in Computer Science from the University of Arizona -in 2001. +**Zack Williams** is a Cloud Software Engineer at Intel, where he +works on the Open Edge Platform. Prior to joining Intel, he was on the +Open Networking Foundation (ONF) engineering team that built +Aether, leading the infrastructure provisioning effort. Williams has also +been a systems programmer at the University of Arizona. He received +his BS in Computer Science from the University of Arizona in 2001. 
**Bruce Davie** is a computer scientist noted for his contributions to the field of networking. He is a former VP and CTO for the Asia diff --git a/intro.rst b/intro.rst index d33d7bc..c299011 100644 --- a/intro.rst +++ b/intro.rst @@ -711,20 +711,21 @@ describe how to introduce VMs as an optional way to provision the underlying infrastructure for that PaaS. Finally, the Aether edge cloud we use as an example is similar to many -other cloud platforms being promoted in support of on-prem -deployments. The dominant use case shifts over time—with Artificial -Intelligence (AI) recently overtaking Internet-of-Things (IoT) as the -most compelling justification for edge clouds—but the the operational +other cloud platforms being built to support of on-prem deployments. +The dominant use case shifts over time—with Artificial Intelligence +(AI) recently overtaking Internet-of-Things (IoT) as the most +compelling justification for edge clouds—but the the operational challenge remains the same. For example, *Open Edge Platform* recently open sourced by Intel includes example AI applications and a collection of AI libraries, but also an *Edge Management Framework* that mirrors the one describe this book. It starts with a Kubernetes -foundation, and includes tools for provisioning edge clusters and -onboarding and lifecycle managing edge applications. Some of the -engineering choices are the same as in Aether and some are different, -but the important takeaway is that Kubernetes-based edge clouds are -quickly becoming commonplace. That's the reason they are such a good -case study. +foundation, and includes tools for provisioning edge servers, +orchestrating edge clusters using those servers, lifecycle managing +edge applications, and enabling observability. Many of the engineering +choices are the same as in Aether (some are different), but the +important takeaway is that Kubernetes-based edge clouds are quickly +becoming commonplace. That's the reason they are such a good case +study. .. admonition:: Further Reading diff --git a/preface.rst b/preface.rst index 1697aa2..5f5f6bf 100644 --- a/preface.rst +++ b/preface.rst @@ -15,7 +15,8 @@ disaggregated, the cloud is expanding from hundreds of datacenters to tens of thousands of enterprises. And while it is clear that the commodity cloud providers are eager to manage those edge clouds as a logical extension of their datacenters, they do not have a monopoly on the -know-how for making that happen. +know-how for making that happen. The increasing importance being +placed on *digital sovereignty* only only accentuates this point. This book lays out a roadmap that a small team of engineers followed over the course of a year to stand up and operationalize an edge cloud From 86fe4ffda390f1ef1bb4d38082fa8b362adbf2b3 Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Thu, 29 May 2025 14:33:10 -0700 Subject: [PATCH 14/23] fixed typo --- preface.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/preface.rst b/preface.rst index 5f5f6bf..3239642 100644 --- a/preface.rst +++ b/preface.rst @@ -16,7 +16,7 @@ thousands of enterprises. And while it is clear that the commodity cloud providers are eager to manage those edge clouds as a logical extension of their datacenters, they do not have a monopoly on the know-how for making that happen. The increasing importance being -placed on *digital sovereignty* only only accentuates this point. +placed on *digital sovereignty* only accentuates this point. 
This book lays out a roadmap that a small team of engineers followed over the course of a year to stand up and operationalize an edge cloud From 062576f7755f06991937bdf500a9be85cbee4d89 Mon Sep 17 00:00:00 2001 From: Bruce Davie <3101026+drbruced12@users.noreply.github.com> Date: Fri, 30 May 2025 08:45:39 +1000 Subject: [PATCH 15/23] typos --- intro.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/intro.rst b/intro.rst index c299011..16b27c1 100644 --- a/intro.rst +++ b/intro.rst @@ -711,10 +711,10 @@ describe how to introduce VMs as an optional way to provision the underlying infrastructure for that PaaS. Finally, the Aether edge cloud we use as an example is similar to many -other cloud platforms being built to support of on-prem deployments. +other cloud platforms being built to support on-prem deployments. The dominant use case shifts over time—with Artificial Intelligence (AI) recently overtaking Internet-of-Things (IoT) as the most -compelling justification for edge clouds—but the the operational +compelling justification for edge clouds—but the operational challenge remains the same. For example, *Open Edge Platform* recently open sourced by Intel includes example AI applications and a collection of AI libraries, but also an *Edge Management Framework* From 974d36ea85ac50b1aac41e5be39343acc129ba63 Mon Sep 17 00:00:00 2001 From: Bruce Davie Date: Sun, 1 Jun 2025 16:56:20 +1000 Subject: [PATCH 16/23] clarify introduction of CI/CD --- lifecycle.rst | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/lifecycle.rst b/lifecycle.rst index 7e77230..0d4bd4d 100644 --- a/lifecycle.rst +++ b/lifecycle.rst @@ -10,15 +10,14 @@ assume the base platform includes Linux running on each server and switch, plus Docker, Kubernetes, and Helm, with SD-Fabric controlling the network. -While we could take a narrow view of Lifecycle Management, and assume -the software we want to roll out has already gone through an off-line -integration-and-testing process (this is the traditional model of -vendors releasing a new version of their product), we take a more -expansive approach that starts with the development process—the creation -of new features and capabilities. Including the “innovation” step -closes the virtuous cycle depicted in :numref:`Figure %s`, -which the cloud industry has taught us leads to greater *feature -velocity*. +Traditionally, software would go through an offline integration and +testing process before any effort to roll it out in production could +begin. However, the approach taken in most modern cloud environments, +including ours, is more expansive: it starts with the development +process—the creation of new features and capabilities. Including the +“innovation” step closes the virtuous cycle depicted in +:numref:`Figure %s`, which the cloud industry has taught us +leads to greater *feature velocity*. .. _fig-cycle: .. figure:: figures/Slide9.png From 413ea609ca167c58456f107ab8745a0bd90baac4 Mon Sep 17 00:00:00 2001 From: Bruce Davie <3101026+drbruced12@users.noreply.github.com> Date: Mon, 2 Jun 2025 16:14:32 +1000 Subject: [PATCH 17/23] minor edits for clarity and typos --- lifecycle.rst | 40 +++++++++++++++++++--------------------- 1 file changed, 19 insertions(+), 21 deletions(-) diff --git a/lifecycle.rst b/lifecycle.rst index 0d4bd4d..7a3a210 100644 --- a/lifecycle.rst +++ b/lifecycle.rst @@ -184,7 +184,7 @@ effective use of automation. 
 This section introduces an approach to test automation, but we start
 by talking about the overall testing strategy.
 
-The best-practice for testing in the Cloud/DevOps environment is to
+The best practice for testing in the Cloud/DevOps environment is to
 adopt a *Shift Left* strategy, which introduces tests early in the
 development cycle, that is, on the left side of the pipeline shown in
 :numref:`Figure %s `. To apply this principle, you first
@@ -311,16 +311,14 @@ switches).
 
    Example Testing Frameworks used in Aether.
 
-Some of the frameworks shown in :numref:`Figure %s
-` were co-developed with the corresponding software
-component. This is true of TestVectors and TestON, which put
-customized workloads on Stratum (SwitchOS) and ONOS (NetworkOS),
-respectively. Both are open source, and hence available to pursue for
-insights into the challenges of building a testing framework. In
-contrast, NG40 is a proprietary framework for emulating 3GPP-compliant
-cellular network traffic, which due to the complexity and value in
-demonstrating adherence to the 3GPP standard, is a closed, commercial
-product.
+Some of the frameworks shown in :numref:`Figure %s ` were
+co-developed with the corresponding software component. This is true
+of TestVectors and TestON, which put customized workloads on Stratum
+(SwitchOS) and ONOS (NetworkOS), respectively. Both are open source,
+and hence available to be perused for insights into the challenges of
+building a testing framework. In contrast, NG40 is a
+closed-source, proprietary framework for emulating 3GPP-compliant
+cellular network traffic.
 
 Selenium and Robot are the most general of the five examples. Each is
 an open source project with an active developer community. Selenium is a
@@ -475,7 +473,7 @@ publish a new Docker image, triggered by a change to a ``VERSION``
 file stored in the code repo. (We'll see why in Section 4.5.)
 
 As an illustrative example, the following is from a Groovy script that
-defines the pipeline for testing the Aether API, which as we'll see in
+defines the pipeline for testing the Aether API, which, as we'll see in
 the next chapter, is auto-generated by the Runtime Control
 subsystem. We're interested in the general form of the pipeline, so
 omit most of the details, but it should be clear from the example what
@@ -518,8 +516,8 @@ patch set.
 
 .. sidebar:: Balancing DIY Tools with Cloud Services
 
    *Aether uses Jenkins as our CI tool, but another popular option is
    GitHub Actions. This is a relatively new feature of GitHub (the
-   cloud service, not the software package) that nicely integrates
-   the code repo with a set of workflows that can be exectued every
+   cloud service, not to be confused with the software tool Git). GitHub Actions augment
+   the code repo with a set of workflows that can be executed every
    time a patch is submitted. In this setting, a workflow is roughly
    analogous to a Groovy pipeline.*
 
    *GitHub actions are especially convenient for open source projects
@@ -560,7 +558,7 @@ Config Repo, which includes both the set of Terraform Templates that
 specify the underlying infrastructure (we've been calling this the
 cloud platform) and the set of Helm Charts that specify the collection
 of microservices (sometimes called applications) that are to be
-deployed on that infrastructure. We already know about Terraform from
+deployed on that infrastructure. We discussed Terraform in
 Chapter 3: it's the agent that actually "acts on" the
 infrastructure-related forms. For its counterpart on the application
 side Aether uses an open source project called Fleet.
@@ -704,8 +702,8 @@ Our starting point is to adopt the widely-accepted practice of version number *MAJOR.MINOR.PATCH* (e.g., ``3.2.4``), where the *MAJOR* version increments whenever you make an incompatible API change, the *MINOR* version increments when you add functionality in a -backward compatible way, and the *PATCH* corresponds to a backwards -compatible bug fix. +backward-compatible way, and the *PATCH* corresponds to a +backward-compatible bug fix. .. _reading_semver: .. admonition:: Further Reading @@ -731,7 +729,7 @@ the software lifecycle: * The commit that does correspond to a finalized patch is also tagged (in the repo) with the corresponding semantic version number. In - git, this tag is bound to a hash that unambiguously identifies the + Git, this tag is bound to a hash that unambiguously identifies the commit, making it the authoritative way of binding a version number to a particular instance of the source code. @@ -797,7 +795,7 @@ chapter. 4.6 Managing Secrets -------------------- -The discussion up this point has glossed over one important detail, +The discussion up to this point has glossed over one important detail, which is how secrets are managed. These include, for example, the credentials Terraform needs to access remote services like GCP, as well as the keys used to secure communication among microservices @@ -852,7 +850,7 @@ Controller to use its sealing key to help them unlock those secrets. While this approach is less general than the first (i.e., it is specific to protecting secrets within a Kubernetes cluster), it has -the advantage of taking humans completely out-of-the-loop, with the +the advantage of taking humans completely out of the loop, with the sealing key being programmatically generated at runtime. One complication, however, is that it is generally preferable for that secret to be written to persistent storage, to protect against having @@ -898,7 +896,7 @@ with a particular set of use cases in mind, but it is later integrated with other software to build entirely new cloud apps that have their own set of abstractions and features, and correspondingly, their own collection of configuration state. This is true for Aether, where the -SD-Core subsystem was originally implemented for use in global +SD-Core subsystem, for example, was originally implemented for use in global cellular networks, but is being repurposed to support private 4G/5G in enterprises. From 0bec646036e7704f9232fdd4bdef7fff7e7b00c1 Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Mon, 2 Jun 2025 13:14:31 -0700 Subject: [PATCH 18/23] fixed adaptor --- control.rst | 34 +++++++++++++++++----------------- dict.txt | 4 ++++ lifecycle.rst | 9 +++++---- provision.rst | 2 +- 4 files changed, 27 insertions(+), 22 deletions(-) diff --git a/control.rst b/control.rst index a50ba49..d64b855 100644 --- a/control.rst +++ b/control.rst @@ -155,7 +155,7 @@ that we can build upon. from (1) a GUI, which is itself typically built using another framework, such as AngularJS; (2) a CLI; or (3) a closed-loop control program. There are other differences—for example, - Adapters (a kind of Controller) use gNMI as a standard + Adaptors (a kind of Controller) use gNMI as a standard interface for controlling backend components, and persistent state is stored in a key-value store instead of a SQL DB—but the biggest difference is the use of a declarative rather than an @@ -168,11 +168,11 @@ x-config, in turn, uses Atomix (a key-value store microservice), to make configuration state persistent. 
Because x-config was originally designed to manage configuration state for devices, it uses gNMI as its southbound interface to communicate configuration changes to -devices (or in our case, software services). An Adapter has to be +devices (or in our case, software services). An Adaptor has to be written for any service/device that does not support gNMI -natively. These adapters are shown as part of Runtime Control in +natively. These adaptors are shown as part of Runtime Control in :numref:`Figure %s `, but it is equally correct to view each -adapter as part of the backend component, responsible for making that +adaptor as part of the backend component, responsible for making that component management-ready. Finally, Runtime Control includes a Workflow Engine that is responsible for executing multi-step operations on the data model. This happens, for example, when a change @@ -467,15 +467,15 @@ the case of Aether, Open Policy Agent (OPA) serves this role. `__. -5.2.4 Adapters +5.2.4 Adaptors ~~~~~~~~~~~~~~ Not every service or subsystem beneath Runtime Control supports gNMI, -and in the case where it is not supported, an adapter is written to +and in the case where it is not supported, an adaptor is written to translate between gNMI and the service’s native API. In Aether, for -example, a gNMI :math:`\rightarrow` REST adapter translates between +example, a gNMI :math:`\rightarrow` REST adaptor translates between the Runtime Control’s southbound gNMI calls and the SD-Core -subsystem’s RESTful northbound interface. The adapter is not +subsystem’s RESTful northbound interface. The adaptor is not necessarily just a syntactic translator, but may also include its own semantic layer. This supports a logical decoupling of the models stored in x-config and the interface used by the southbound @@ -484,15 +484,15 @@ Control to evolve independently. It also allows for southbound devices/services to be replaced without affecting the northbound interface. -An adapter does not necessarily support only a single service. An -adapter is one means of taking an abstraction that spans multiple +An adaptor does not necessarily support only a single service. An +adaptor is one means of taking an abstraction that spans multiple services and applying it to each of those services. An example in Aether is the *User Plane Function* (the main packet-forwarding module in the SD-Core User Plane) and *SD-Core*, which are jointly -responsible for enforcing *Quality of Service*, where the adapter +responsible for enforcing *Quality of Service*, where the adaptor applies a single set of models to both services. Some care is needed to deal with partial failure, in case one service accepts the change, -but the other does not. In this case, the adapter keeps trying the +but the other does not. In this case, the adaptor keeps trying the failed backend service until it succeeds. 5.2.5 Workflow Engine @@ -519,7 +519,7 @@ ongoing development. gNMI naturally lends itself to mutual TLS for authentication, and that is the recommended way to secure communications between components that speak gNMI. For example, communication between x-config and -its adapters uses gNMI, and therefore, uses mutual TLS. Distributing +its adaptors uses gNMI, and therefore, uses mutual TLS. Distributing certificates between components is a problem outside the scope of Runtime Control. It is assumed that another tool will be responsible for distributing, revoking, and renewing certificates. 
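Although certificate distribution is left to some other tool, it may
help to see what that looks like in practice. One widely used option
(our example, not a statement about what Aether deploys) is
cert-manager, where a ``Certificate`` resource asks an issuer for a
key pair, writes it to a Secret the adaptor can mount, and renews it
automatically. The issuer, names, and durations below are placeholders.

.. code-block:: yaml

   # Hypothetical sketch: issuer, namespace, and DNS name are placeholders.
   apiVersion: cert-manager.io/v1
   kind: Certificate
   metadata:
     name: sdcore-adaptor-tls
     namespace: runtime-control
   spec:
     secretName: sdcore-adaptor-tls     # Secret the adaptor mounts for mTLS
     duration: 2160h                    # 90-day lifetime
     renewBefore: 360h                  # renew 15 days before expiry
     dnsNames:
       - sdcore-adaptor.runtime-control.svc
     usages:
       - client auth
       - server auth
     issuerRef:
       name: internal-ca
       kind: ClusterIssuer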
@@ -738,7 +738,7 @@ that it supports the option of spinning up an entirely new copy of the SD-Core rather than sharing an existing UPF with another Slice. This is done to ensure isolation, and illustrates one possible touch-point between Runtime Control and the Lifecycle Management subsystem: -Runtime Control, via an Adapter, engages Lifecycle Management to +Runtime Control, via an Adaptor, engages Lifecycle Management to launch the necessary set of Kubernetes containers that implement an isolated slice. @@ -802,7 +802,7 @@ Giving enterprises the ability to set isolation and QoS parameters is an illustrative example in Aether. Auto-generating that API from a set of models is an attractive approach to realizing such a control interface, if for no other reason than it forces a decoupling of the -interface definition from the underlying implementation (with Adapters +interface definition from the underlying implementation (with Adaptors bridging the gap). .. sidebar:: UX Considerations @@ -839,7 +839,7 @@ configuration change requires a container restart, then there may be little choice. But ideally, microservices are implemented with their own well-defined management interfaces, which can be invoked from either a configuration-time Operator (to initialize the component at -boot time) or a control-time Adapter (to change the component at +boot time) or a control-time Adaptor (to change the component at runtime). For resource-related operations, such as spinning up additional @@ -847,7 +847,7 @@ containers in response to a user request to create a *Slice* or activate an edge service, a similar implementation strategy is feasible. The Kubernetes API can be called from either Helm (to initialize a microservice at boot time) or from a Runtime Control -Adapter (to add resources at runtime). The remaining challenge is +Adaptor (to add resources at runtime). The remaining challenge is deciding which subsystem maintains the authoritative copy of that state, and ensuring that decision is enforced as a system invariant.\ [#]_ Such decisions are often situation-dependent, but our experience is diff --git a/dict.txt b/dict.txt index 50c055c..bc1a8f3 100644 --- a/dict.txt +++ b/dict.txt @@ -1,4 +1,6 @@ Acknowledgements +Adaptor +Adaptors Aether Alertmanager Ansible @@ -85,6 +87,8 @@ VMware Vemuri Weaveworks absorber +adaptor +adaptors analytics architected auth diff --git a/lifecycle.rst b/lifecycle.rst index 7a3a210..9a5d6da 100644 --- a/lifecycle.rst +++ b/lifecycle.rst @@ -655,10 +655,11 @@ when. overloaded the repo. A "polling-frequency" parameter change improved the situation, but led people to wonder why Jenkins' trigger mechanism hadn't caused the same problem. The answer is - that Jenkins is better integrated with the repo (specifically, - Gerrit running on top of Git), with the repo pushing event - notifications to Jenkins when a file check-in actually occurs. - There is no polling.* + that Jenkins is better integrated with the repo, with a GitHub + webhook pushing event notifications to Jenkins when a file + check-in actually occurs. There is no polling. (Polling can also + be disabled in Fleet, in favor of webhooks, but polling is the + default.)* This focus on Fleet as the agent triggering the execution of Helm Charts should not distract from the central role of the charts diff --git a/provision.rst b/provision.rst index 8b26dde..8158778 100644 --- a/provision.rst +++ b/provision.rst @@ -432,7 +432,7 @@ Kubernetes cluster. 
For starters, the API needs to provide a means to install and configure Kubernetes on each physical cluster. This includes specifying which version of Kubernetes to run, selecting the right combination of Container Network Interface (CNI) plugins -(virtual network adapters), and connecting Kubernetes to the local +(virtual network adaptors), and connecting Kubernetes to the local network (and any VPNs it might need). This layer also needs to provide a means to set up accounts (and associated credentials) for accessing and using each Kubernetes cluster, and a way to manage From 64b2acac8a93387da2bb9a992d8fe19a3e229bc7 Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Mon, 2 Jun 2025 13:35:34 -0700 Subject: [PATCH 19/23] digital sovereignty --- preface.rst | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/preface.rst b/preface.rst index 3239642..37e8553 100644 --- a/preface.rst +++ b/preface.rst @@ -11,12 +11,18 @@ a perfectly good job of it. The answer, we believe, is that the cloud is becoming ubiquitous in another way, as distributed applications increasingly run not just in large, central datacenters but at the edge. As applications are -disaggregated, the cloud is expanding from hundreds of datacenters to tens of -thousands of enterprises. And while it is clear that the commodity -cloud providers are eager to manage those edge clouds as a logical -extension of their datacenters, they do not have a monopoly on the -know-how for making that happen. The increasing importance being -placed on *digital sovereignty* only accentuates this point. +disaggregated, the cloud is expanding from hundreds of datacenters to +tens of thousands of enterprises. And while it is clear that the +commodity cloud providers are eager to manage those edge clouds as a +logical extension of their datacenters, they do not have a monopoly on +the know-how for making that happen. + +At the same time edge applications are moving to the forefront, +increasing importance is also being placed on *digital sovereignty*, +the ability of nations and organizations to control their own destiny. +Cloud technology is important for running today's workloads, but +access to that technology does not necessarily have to be bundled with +outsourcing operational control. This book lays out a roadmap that a small team of engineers followed over the course of a year to stand up and operationalize an edge cloud From ba8c6ab08c8df23b6c6889a72a7d06836fd3115a Mon Sep 17 00:00:00 2001 From: Bruce Davie <3101026+drbruced12@users.noreply.github.com> Date: Tue, 3 Jun 2025 10:29:45 +1000 Subject: [PATCH 20/23] minor edits --- control.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/control.rst b/control.rst index d64b855..bd63be8 100644 --- a/control.rst +++ b/control.rst @@ -81,7 +81,7 @@ deployments of 5G, and to that end, defines a *user* to be a principal that accesses the API or GUI portal with some prescribed level of privilege. There is not necessarily a one-to-one relationship between users and Core-defined subscribers, and more importantly, not all -devices have subscribers, as would be the case with IoT devices that +devices have subscribers; a concrete example would be IoT devices that are not typically associated with a particular person. 
5.1 Design Overview @@ -428,8 +428,8 @@ models are changing due to volatility in the backend systems they control, then it is often the case that the models can be distinguished as "low-level" or "high-level", with only the latter directly visible to clients via the API. In semantic versioning terms, -a change to a low-level model would then effectively be a backwards -compatible PATCH. +a change to a low-level model would then effectively be a +backward-compatible PATCH. 5.2.3 Identity Management From ab7a3427a51a89bda1c37e1f8970dd7f8e649d64 Mon Sep 17 00:00:00 2001 From: Bruce Davie <3101026+drbruced12@users.noreply.github.com> Date: Tue, 3 Jun 2025 17:35:37 +1000 Subject: [PATCH 21/23] editorial pass, bio update --- authors.rst | 23 +++++++++++++---------- monitor.rst | 24 ++++++++++++------------ 2 files changed, 25 insertions(+), 22 deletions(-) diff --git a/authors.rst b/authors.rst index 01ec8ee..ec8d718 100644 --- a/authors.rst +++ b/authors.rst @@ -38,14 +38,17 @@ been a systems programmer at the University of Arizona. He received his BS in Computer Science from the University of Arizona in 2001. **Bruce Davie** is a computer scientist noted for his contributions to -the field of networking. He is a former VP and CTO for the Asia -Pacific region at VMware. He joined VMware during the acquisition of -Software Defined Networking (SDN) startup Nicira. Prior to that, he -was a Fellow at Cisco Systems, leading a team of architects -responsible for Multiprotocol Label Switching (MPLS). Davie has over -30 years of networking industry experience and has co-authored 17 -RFCs. He was recognized as an ACM Fellow in 2009 and chaired ACM -SIGCOMM from 2009 to 2013. He was also a visiting lecturer at the -Massachusetts Institute of Technology for five years. Davie is the -author of multiple books and the holder of more than 40 U.S. Patents. +the field of networking. He began his networking career at Bellcore +where he worked on the Aurora Gigabit testbed and collaborated with +Larry Peterson on high-speed host-network interfaces. He then went to +Cisco where he led a team of architects responsible for Multiprotocol +Label Switching (MPLS). He worked extensively at the IETF on +standardizing MPLS and various quality of service technologies. He +also spent five years as a visiting lecturer at the Massachusetts +Institute of Technology. In 2012 he joined Software Defined Networking +(SDN) startup Nicira and was then a principal engineer at VMware +following the acquisition of Nicira. In 2017 he took on the role of VP +and CTO for the Asia Pacific region at VMware. He is a Fellow of the +ACM and chaired ACM SIGCOMM from 2009 to 2013. Davie is the author of +multiple books and the holder of more than 40 U.S. patents. diff --git a/monitor.rst b/monitor.rst index d03615a..034558e 100644 --- a/monitor.rst +++ b/monitor.rst @@ -77,7 +77,7 @@ closed-loop control where the automated tool not only detects problems but is also able to issue corrective control directives. For the purpose of this chapter, we give examples of the first two (alerts and dashboards), and declare the latter two (analytics and close-loop -control) as out-of-scope (but likely running as applications that +control) as out of scope (but likely running as applications that consume the telemetry data outlined in the sections that follow). 
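To give a flavor of what the alerting half of that story looks like,
the following is a hand-written Prometheus rule; the metric name,
threshold, and labels are invented for illustration rather than taken
from Aether's deployment.

.. code-block:: yaml

   # Illustrative only: the metric, threshold, and labels are invented.
   groups:
     - name: edge-site-health
       rules:
         - alert: UserPlaneThroughputLow
           expr: rate(upf_bytes_total[5m]) < 1e6    # hypothetical metric
           for: 10m
           labels:
             severity: warning
           annotations:
             summary: "User-plane throughput below the expected floor at {{ $labels.site }}"

Rules like this are evaluated continuously by Prometheus, with any
that fire handed off to Alertmanager for routing.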
Third, when viewed from the perspective of lifecycle management, @@ -96,9 +96,9 @@ Finally, because the metrics, logs, and traces collected by the various subsystems are timestamped, it is possible to establish correlations among them, which is helpful when debugging a problem or deciding whether or not an alert is warranted. We give examples of how -such telemetry-wide functions are implemented in practice today, as -well as discuss the future future of generating and using telemetry -data, in the final two sections of this chapter. +such telemetry-wide functions are implemented in practice today, and +discuss the future of generating and using telemetry data, in the +final two sections of this chapter. 6.1 Metrics and Alerts ------------------------------- @@ -170,7 +170,7 @@ to the central location (e.g., to be displayed by Grafana as described in the next subsection). This is appropriate for metrics that are both high-volume and seldom viewed. One exception is the end-to-end tests described in the previous paragraph. These results are immediately -pushed to the central site (bypassing the local Prometheus), because +pushed to the central site (bypassing the local Prometheus instance), because they are low-volume and may require immediate attention. 6.1.2 Creating Dashboards @@ -179,7 +179,7 @@ they are low-volume and may require immediate attention. The metrics collected by Prometheus are visualized using Grafana dashboards. In Aether, this means the Grafana instance running as part of AMP in the central cloud sends queries to some combination of -the central Prometheus and a subset of the Prometheus instances +the central Prometheus instance and a subset of the Prometheus instances running on edge clusters. For example, :numref:`Figure %s ` shows the summary dashboard for a collection of Aether edge sites. @@ -497,9 +497,9 @@ SD-Core, which augments the UPF performance data shown in in a Grafana dashboard. Second, the runtime control interface described in Chapter 5 provides -a means to change various parameters of a running system, but having -access to the data needed to know what changes (if any) need to be -made is a prerequisite for making informed decisions. To this end, it +a means to change various parameters of a running system, but to make +informed decisions about what changes (if any) need to be +made, it is necessary to have access to the right data. To this end, it is ideal to have access to both the "knobs" and the "dials" on an integrated dashboard. This can be accomplished by incorporating Grafana frames in the Runtime Control GUI, which, in its simplest form, @@ -584,9 +584,9 @@ Chapter 1. A Service Mesh framework such as Istio provides a means to enforce fine-grained security policies and collect telemetry data in cloud native applications by injecting "observation/enforcement points" between microservices. These injection points, called -*sidecars*, are typically implemented by a container that "runs along -side" the containers that implement each microservice, with all RPC -calls from Service A to Service B passing through their associated +*sidecars*, are typically implemented by a container that "runs +alongside" the containers that implement each microservice, with all +RPC calls from Service A to Service B passing through their associated sidecars. 
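In Istio, for example, injecting these sidecars is typically switched
on per namespace, so the application's deployments need no code
changes. The following is a minimal sketch of that mechanism; the
namespace name is arbitrary.

.. code-block:: yaml

   # Minimal sketch: with this label set, Istio's admission webhook injects
   # an Envoy sidecar into every pod subsequently created in the namespace.
   apiVersion: v1
   kind: Namespace
   metadata:
     name: sdcore            # illustrative namespace name
     labels:
       istio-injection: enabled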
As shown in :numref:`Figure %s `, these sidecars then implement whatever policies the operator wants to impose on the application, sending telemetry data to a global collector and From 0a6c029f9218411c3b86cdac2d4587627966825f Mon Sep 17 00:00:00 2001 From: Bruce Davie <3101026+drbruced12@users.noreply.github.com> Date: Tue, 3 Jun 2025 17:38:56 +1000 Subject: [PATCH 22/23] control of data --- preface.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/preface.rst b/preface.rst index 37e8553..33eacaf 100644 --- a/preface.rst +++ b/preface.rst @@ -19,8 +19,8 @@ the know-how for making that happen. At the same time edge applications are moving to the forefront, increasing importance is also being placed on *digital sovereignty*, -the ability of nations and organizations to control their own destiny. -Cloud technology is important for running today's workloads, but +the ability of nations and organizations to control their own destiny +and their data. Cloud technology is important for running today's workloads, but access to that technology does not necessarily have to be bundled with outsourcing operational control. From 4d05b74ed9818369d95be0c59fcb7a1796217c76 Mon Sep 17 00:00:00 2001 From: Larry Peterson Date: Tue, 3 Jun 2025 12:58:20 -0700 Subject: [PATCH 23/23] destiny & data --- preface.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/preface.rst b/preface.rst index 37e8553..fee4128 100644 --- a/preface.rst +++ b/preface.rst @@ -19,10 +19,10 @@ the know-how for making that happen. At the same time edge applications are moving to the forefront, increasing importance is also being placed on *digital sovereignty*, -the ability of nations and organizations to control their own destiny. -Cloud technology is important for running today's workloads, but -access to that technology does not necessarily have to be bundled with -outsourcing operational control. +the ability of nations and organizations to control their destiny and +their data. Cloud technology is important for running today's +workloads, but access to that technology does not necessarily have to +be bundled with outsourcing operational control. This book lays out a roadmap that a small team of engineers followed over the course of a year to stand up and operationalize an edge cloud