OneAgent Operator

Rolling out DESK full-stack monitoring to a Kubernetes or OpenShift cluster using a DaemonSet is straightforward. Managing the full lifecycle of OneAgent deployments can, however, become cumbersome, because Kubernetes DaemonSets offer no out-of-the-box lifecycle management that allows for easy OneAgent updates. In a typical scenario, the team responsible for Day 2 operations has to keep track of new OneAgent versions as they are released, roll out the new versions, and restart all pods so that they pick up the updates.

In alignment with the automation mantra of the Kubernetes community, DESK introduces DESK OneAgent Operator.

What is an operator in Kubernetes?

Kubernetes version 1.7 introduced the concept of custom resources and controllers, which allow for extending the Kubernetes API. These extension capabilities enable the Kubernetes community to implement domain-specific applications as first-class Kubernetes objects in a cloud-native style. This means you can define the desired state of workloads in a declarative manner and create custom controller logic that takes continuous action to achieve and maintain the desired state.
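The idea of declaring a desired state and letting controller logic converge toward it can be illustrated with a minimal sketch. It uses plain in-memory dictionaries rather than the Kubernetes API; the names `plan_actions` and `reconcile`, and the workload names and versions, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def plan_actions(desired: Dict[str, str], actual: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare desired and actual state (workload name -> version) and list corrective actions."""
    actions = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))   # declared but not running
        elif actual[name] != version:
            actions.append(("update", name))   # running with the wrong version
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name))       # running but no longer declared
    return actions


def reconcile(desired: Dict[str, str], actual: Dict[str, str]) -> Dict[str, str]:
    """Apply the planned actions to a copy of the actual state and return the result."""
    result = dict(actual)
    for action, name in plan_actions(desired, actual):
        if action == "delete":
            del result[name]
        else:
            result[name] = desired[name]
    return result


desired = {"web": "v2", "db": "v1"}
actual = {"web": "v1", "cache": "v1"}
print(plan_actions(desired, actual))
print(reconcile(desired, actual))
```

A real controller runs this comparison repeatedly, so a state that drifts after one pass is corrected on the next.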

An operator builds on these capabilities: it extends the Kubernetes API with custom resources and corresponding resource controllers. The term "operator" was coined by CoreOS, which announced it as a means to manage the lifecycle of stateful applications more efficiently and reliably.

Red Hat and CoreOS refined this concept and evolved it into an Operator Framework (including an Operator SDK and Lifecycle Manager) to make implementing operators as easy as possible.

DESK OneAgent Operator for ‘Day 2’ operations

DESK makes use of the Operator concept by putting operational knowledge into software and automating the management, updates, and roll-outs of new DESK OneAgent versions.

Value-add of DESK OneAgent Operator at a glance:

OneAgent Operator offers fine-grained control of OneAgent roll-outs. You can select nodes based on node labels, which lets you monitor selected nodes using different DESK environments. OneAgent Operator also supports tolerations, so you can deploy OneAgent on tainted nodes.
DESK OneAgent updates are performed automatically as soon as they’re available. When updates are pending, OneAgent Operator recycles all pods that haven't yet picked up the latest version.
OneAgent Operator ensures that you always monitor your Kubernetes or OpenShift cluster with the latest OneAgent version.
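How node labels and tolerations narrow down the set of target nodes can be sketched as follows. This is a simplified model and not the Operator's implementation: taints and tolerations are reduced to `(key, effect)` pairs, and the `Node` class, the function names, and the labels are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Taint = Tuple[str, str]  # simplified: (key, effect)


@dataclass
class Node:
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    taints: List[Taint] = field(default_factory=list)


def matches_selector(node: Node, selector: Dict[str, str]) -> bool:
    """A node matches when it carries every label in the selector."""
    return all(node.labels.get(key) == value for key, value in selector.items())


def tolerates_taints(node: Node, tolerations: List[Taint]) -> bool:
    """A node is usable when each of its taints is covered by a toleration."""
    return all(taint in tolerations for taint in node.taints)


def eligible_nodes(nodes: List[Node], selector: Dict[str, str], tolerations: List[Taint]) -> List[str]:
    """Return the names of nodes that pass both the label and the taint check."""
    return [
        node.name
        for node in nodes
        if matches_selector(node, selector) and tolerates_taints(node, tolerations)
    ]


nodes = [
    Node("node-a", {"env": "prod"}),
    Node("node-b", {"env": "dev"}),
    Node("node-c", {"env": "prod"}, [("dedicated", "NoSchedule")]),
]
print(eligible_nodes(nodes, {"env": "prod"}, []))
print(eligible_nodes(nodes, {"env": "prod"}, [("dedicated", "NoSchedule")]))
```

Without a toleration, the tainted node is skipped even though its labels match; adding the matching toleration brings it back into the selection.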

How does OneAgent Operator work?

DESK OneAgent Operator registers itself as a controller that watches for resources of type OneAgent as defined by a CustomResourceDefinition. This allows you to define a configuration that describes your OneAgent deployment. When you load the configuration into Kubernetes or OpenShift, it is automatically passed to the custom controller, which ensures the rollout of OneAgent based on your specification.

The following diagram outlines the involved components and objects.

[Figure: OneAgent Operator Overview]

When you create the OneAgent custom resource in Kubernetes, the object is automatically passed to DESK OneAgent Operator. The Operator first checks whether a corresponding DaemonSet already exists. If not, it creates a new one. The DaemonSet is responsible for rolling out OneAgent to the selected nodes.
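The "create the DaemonSet only if it is missing" step can be sketched with an in-memory stand-in for the cluster. The dictionary layout, the field names (`name`, `nodeSelector`), and the function `ensure_daemonset` are hypothetical and do not reflect the actual OneAgent custom resource schema.

```python
from typing import Dict


def ensure_daemonset(cluster: Dict[str, dict], custom_resource: dict) -> bool:
    """Create a DaemonSet entry for the custom resource if none exists.

    Returns True when a DaemonSet was created, False when one was already present.
    """
    name = custom_resource["name"]
    if name in cluster:
        return False  # already exists, nothing to do
    cluster[name] = {
        "kind": "DaemonSet",
        "nodeSelector": dict(custom_resource.get("nodeSelector", {})),
    }
    return True


cluster: Dict[str, dict] = {}
oneagent = {"name": "oneagent", "nodeSelector": {"env": "prod"}}
print(ensure_daemonset(cluster, oneagent))  # first call creates the DaemonSet
print(ensure_daemonset(cluster, oneagent))  # second call finds it and does nothing
```

Because the check is idempotent, the controller can run it on every pass without creating duplicates.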

In addition, OneAgent Operator continually queries the DESK API to check whether a new version is available for a given deployment. When an update is pending, all Pods that belong to the custom resource and don’t yet run the updated version are recycled so that they pick up the latest version.
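The update check described above amounts to comparing each Pod's version with the latest available one and recycling the Pods that differ. The following sketch assumes the latest version has already been fetched; the call to the DESK API is not modeled, and the function names, Pod names, and version strings are illustrative only.

```python
from typing import Dict, List


def outdated_pods(pod_versions: Dict[str, str], latest_version: str) -> List[str]:
    """Return the names of Pods that do not run the latest version."""
    return sorted(pod for pod, version in pod_versions.items() if version != latest_version)


def recycle_outdated(pod_versions: Dict[str, str], latest_version: str) -> Dict[str, str]:
    """Model recycling: each outdated Pod is replaced by one running the latest version.

    In a real cluster the DaemonSet recreates a deleted Pod under a new name;
    this sketch keeps the old name for simplicity.
    """
    updated = dict(pod_versions)
    for pod in outdated_pods(pod_versions, latest_version):
        updated[pod] = latest_version
    return updated


pods = {"oneagent-a": "1.0", "oneagent-b": "1.1", "oneagent-c": "1.0"}
print(outdated_pods(pods, "1.1"))
print(recycle_outdated(pods, "1.1"))
```

Pods that already run the latest version are left untouched, so only the necessary restarts take place.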

This reconciliation loop continually compares the actual state with the desired state and takes corrective action when the two differ.