Process group availability monitoring and alerting

Alerting on process availability by default tends to generate a large amount of alert noise from processes that aren’t central to the success of your digital business. Individual process availability typically isn’t crucial within dynamic microservices-based clusters, so it doesn’t make sense to alert on the availability of individual processes unless DESK determines that real users are affected by the issue.

However, you may have a number of critical processes that you want to monitor for availability, or you may want to ensure that a cluster never has fewer than a specified minimum number of processes. For such scenarios, DESK allows you to select the availability alerting strategy that best meets your needs. You can configure DESK to proactively alert you if any process, or a certain number of processes, within a specific process group goes offline or crashes. Alerts include links to related DESK Problem pages (see example below), making it easy for you to access all the details related to an unavailable process so that you can quickly resolve the issue.

Problems page

Enable process-group availability alerting

Rather than enabling availability alerting globally across all process groups, you can enable it only for specific mission-critical process groups. To set up availability alerting for a process group:

  1. Select Technologies from the navigation menu.
  2. Select the process-group technology type.
  3. Scroll down and select the process group you’re interested in.
  4. Click the Process group details button to open the process group page.
  5. Click the Edit button to access the Process group settings page.
  6. Click Availability monitoring.
  7. From the drop-down list, select If any process becomes unavailable. Alternatively, select If service requests are impacted if you want a process-unavailable event to be generated when DESK detects active client requests hitting the selected process.
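
The difference between the two options in the last step can be illustrated with a minimal sketch. This is not DESK code or a DESK API: the `Strategy`, `ProcessState`, and `should_raise_event` names and the `has_active_client_requests` field are hypothetical, and the logic only mirrors the behavior described above.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    # Labels copied from the drop-down options described above.
    ANY_PROCESS_UNAVAILABLE = "If any process becomes unavailable"
    SERVICE_REQUESTS_IMPACTED = "If service requests are impacted"


@dataclass
class ProcessState:
    name: str
    available: bool
    # Hypothetical signal: client requests are hitting this process.
    has_active_client_requests: bool


def should_raise_event(strategy: Strategy, process: ProcessState) -> bool:
    """Return True if a process-unavailable event would be generated."""
    if process.available:
        return False
    if strategy is Strategy.ANY_PROCESS_UNAVAILABLE:
        # Any unavailable process in the group triggers an event.
        return True
    # Otherwise an event is generated only when requests are affected.
    return process.has_active_client_requests
```

Under these assumptions, an idle process that goes offline triggers an event with the first option but not with the second.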

Alternatively, you can enable availability alerting for an individual process on the same Process group settings page, which you can also reach from the host that runs the process:

  1. Select Hosts from the menu.
  2. Select the host that runs the process you’re interested in.
  3. Scroll down and click the Consuming processes button.
  4. From the Process list, select the process group (or individual process) you’re interested in.
  5. Click the Browse button and select Edit to access the Process group settings page.
  6. Click Availability monitoring.
  7. From the drop-down list, select If any process becomes unavailable.

Process group anomaly detection within clusters

A “process group” is a group of processes that each do the same thing (for example, a group of identical Apache Tomcat web server processes that run concurrently on separate hosts). Each process in a process group (also known as a “cluster”) serves the same type of request. Typically, a load balancer runs in front of such a cluster.

In such scenarios, it makes sense to define a minimum number of process instances for each process group—the minimum number of processes that comprise a healthy cluster.

To define a minimum number of instances for a process group:

  1. Select Technologies from the navigation menu.
  2. Select the tile of the process-group technology you want to configure (for example, Apache Tomcat). The related process groups are then displayed at the bottom of the page.
  3. Select the process group you want to configure. The process group view expands.
  4. Click the Process group details button to open the process group page.
  5. Click the Edit button to access the Process group settings page.
  6. Click Availability monitoring.
  7. From the drop-down list, select If minimum threshold is not met and then enter the minimum number of process instances required for a healthy system.

Once the minimum is defined, you are alerted whenever the number of processes in the process group falls below it. When the number of running instances again meets the defined minimum, the problem is closed automatically.
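
The open-and-close behavior described above can be sketched as a small state machine. This is an illustrative model only, not how DESK is implemented: the `MinimumThresholdMonitor` class, its `observe` method, and the returned status strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class MinimumThresholdMonitor:
    """Hypothetical model of the 'If minimum threshold is not met' option."""

    minimum_instances: int
    problem_open: bool = False

    def observe(self, running_instances: int) -> str:
        """Record the current instance count and report what changed."""
        below_minimum = running_instances < self.minimum_instances
        if below_minimum and not self.problem_open:
            # The count dropped below the configured minimum: alert.
            self.problem_open = True
            return "problem opened"
        if not below_minimum and self.problem_open:
            # The minimum is met again: the problem closes automatically.
            self.problem_open = False
            return "problem closed"
        return "no change"


monitor = MinimumThresholdMonitor(minimum_instances=3)
history = [monitor.observe(count) for count in (4, 2, 2, 3)]
# history == ["no change", "problem opened", "no change", "problem closed"]
```

In this sketch a problem is opened once when the count first drops below the minimum, stays open while the count remains too low, and closes as soon as the count equals or exceeds the minimum.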
