How to Set Up an Alerting System for Graphite Metrics

In this tutorial, we’ll explain how to set up an alerting system for Graphite metrics using graphite-beacon. Graphite is a monitoring tool that runs on a local system or on cloud infrastructure. It is a powerful tool for collecting and visualizing time-series data, but having data is not enough; you also need to respond to it. An alerting system automates that response by notifying you when your metrics change.

Graphite can monitor the performance of services, applications, websites, and networks, and it makes time-series data easy to store, retrieve, share, and visualize.

This tutorial is based on the klen/graphite-beacon project.

Features of graphite-beacon:

  • It is simple and easy to install
  • It has no external software dependencies, such as a database
  • It is asynchronous
  • It supports alerting through SMTP, HipChat, Slack, PagerDuty, and HTTP handlers
  • It is easy to configure rules based on historical values

Pre-requisites:

  • Python (2.7, 3.3, 3.4)
  • tornado
  • funcparserlib
  • pyyaml
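
Before installing, it can help to confirm that the prerequisites are present in the Python environment you plan to use. The following is a minimal sketch, not part of graphite-beacon: the helper name missing_modules is hypothetical, pyyaml is checked under its import name yaml, and the script assumes Python 3.4 or later because it relies on importlib.util.find_spec.

```python
import importlib.util
import sys

# Import names for the prerequisites listed above (pyyaml is imported as "yaml").
REQUIRED_MODULES = ["tornado", "funcparserlib", "yaml"]


def missing_modules(names):
    """Return the module names that cannot be found in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Installing with pip normally pulls in these dependencies, so this check is mostly useful when a package is installed by other means.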

Installing graphite-beacon

Install graphite-beacon using the pip command:

pip install graphite-beacon

Debian package

Run the following commands to add the package repository to your /etc/apt/sources.list system config file:

echo "deb http://dl.bintray.com/klen/deb /" | sudo tee -a /etc/apt/sources.list

echo "deb-src http://dl.bintray.com/klen/deb /" | sudo tee -a /etc/apt/sources.list

Install the graphite-beacon package using apt-get:

sudo apt-get update

sudo apt-get install graphite-beacon

You can set options in a configuration file.

Keep the config.json file in the directory from which you run the graphite-beacon command, or pass a different path with the --config option.

JSON example:

// Comments are allowed here
{
  "interval": "10minute",
  "logging": "info",

  "critical_handlers": ["log"],
  "warning_handlers": ["log"],
  "normal_handlers": ["log"],

  // "graphite_url": "http://<your-graphite-url>",

  "alerts": [
    // A graphite alert - be sure to set `graphite_url` appropriately.
    {
      "name": "Memory",
      "query": "aliasByNode(collectd.*.memory.memory-free, 1)",
      "interval": "10minute",
      "format": "bytes",
      "rules": ["warning: < 500MB", "critical: < 200MB"]
    },
    // A ping alert
    {
      "name": "Site",
      "source": "url",
      "query": "http://google.com",
      "interval": "20second",
      "rules": ["critical: != 200"]
    }
  ]
}
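
Because the example config contains // comments, a plain JSON parser will reject it. If you want to validate the file before starting graphite-beacon, the following minimal Python sketch strips the comments and parses the result. The function names strip_line_comments and load_config are hypothetical, and this is not graphite-beacon's own loader; it only illustrates one way to check the file.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside of JSON strings."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Keep the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def load_config(text):
    """Parse a commented JSON config and return it as a dict."""
    return json.loads(strip_line_comments(text))


EXAMPLE = '''
// Comments are allowed here
{
  "interval": "10minute",
  "alerts": [
    // A ping alert
    {"name": "Site", "source": "url", "query": "http://google.com",
     "rules": ["critical: != 200"]}
  ]
}
'''

config = load_config(EXAMPLE)
print(config["alerts"][0]["query"])
```

Tracking whether the scanner is inside a string matters here, because URLs such as http://google.com contain // and must not be treated as comments.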

How to Set Up Alerts in graphite-beacon:

Currently, graphite-beacon supports two types of alerts:

  1. Graphite alert (default) – checks Graphite metrics
  2. URL alert – loads a URL over HTTP and checks the response status
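
To make the rule syntax concrete, here is a minimal Python sketch of how a rule such as "critical: != 200" could be evaluated against a value. The names parse_rule and triggered_level are hypothetical, the sketch handles plain numbers only (no units such as MB and no historical keyword), and it is not graphite-beacon's own implementation.

```python
import operator
import re

OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

# Levels ordered from least to most severe.
SEVERITY = ["normal", "warning", "critical"]

RULE_RE = re.compile(
    r"^\s*(critical|warning|normal)\s*:\s*(<=|>=|==|!=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$"
)


def parse_rule(rule):
    """Split 'level: op number' into (level, comparison function, threshold)."""
    match = RULE_RE.match(rule)
    if match is None:
        raise ValueError("unsupported rule: %r" % rule)
    level, op, threshold = match.groups()
    return level, OPERATORS[op], float(threshold)


def triggered_level(value, rules):
    """Return the most severe level whose rule matches value, else 'normal'."""
    level = "normal"
    for rule in rules:
        rule_level, compare, threshold = parse_rule(rule)
        if compare(value, threshold) and SEVERITY.index(rule_level) > SEVERITY.index(level):
            level = rule_level
    return level


# A URL alert compares the HTTP status code with the rule.
print(triggered_level(200, ["critical: != 200"]))
print(triggered_level(503, ["critical: != 200"]))
```

Picking the most severe matching level is a design choice of this sketch, so that a value matching both a warning and a critical rule is reported as critical.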

Historical Values

graphite-beacon supports “historical” values for a rule.

For example, to get a warning when CPU usage is greater than 150% of normal usage, you can use the following rule:

"warning: > historical * 1.5"

To get a warning when memory drops below half of its usual value:

"warning: < historical / 2"

Historical values for each query are kept. A historical value represents the average of all values in history.
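
The arithmetic behind a rule like "warning: > historical * 1.5" can be sketched in a few lines of Python. The function names and the sample numbers below are made up for illustration; only the idea that the historical value is the average of the stored values comes from the description above.

```python
def historical_average(history):
    """Average of all stored values, as described above."""
    if not history:
        raise ValueError("no history collected yet")
    return sum(history) / float(len(history))


def exceeds_historical(current, history, factor):
    """True when current > historical * factor."""
    return current > historical_average(history) * factor


cpu_history = [40.0, 50.0, 60.0]  # made-up CPU usage samples
print(historical_average(cpu_history))             # 50.0
print(exceeds_historical(80.0, cpu_history, 1.5))  # True, because 80 > 75
print(exceeds_historical(70.0, cpu_history, 1.5))  # False, because 70 <= 75
```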

Note:

  1. Rules using a historical value only work after enough values have been collected (see history_size).
  2. History values are kept for 1 day by default. You can change this with the history_size option.

The following setting keeps a 10-day history, so that a rule such as "warning: < historical * 0.8" sends a warning when today’s new user count is less than 80% of the last 10-day average:

// Get average for last 10 days
"history_size": "10day",
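
To get a feel for how many samples a given history_size covers, the following Python sketch converts interval strings such as "10minute" or "10day" to seconds and divides one by the other. This is an assumption about how the two options relate, not a description of graphite-beacon's internals; the function names are hypothetical, and the "hour" unit is included only for convenience since it does not appear in the examples above.

```python
import math
import re

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}
INTERVAL_RE = re.compile(r"^(\d+)\s*(second|minute|hour|day)s?$")


def interval_seconds(text):
    """Convert a string such as '10minute' or '20second' to seconds."""
    match = INTERVAL_RE.match(text.strip())
    if match is None:
        raise ValueError("unsupported interval: %r" % text)
    count, unit = match.groups()
    return int(count) * UNIT_SECONDS[unit]


def samples_in_history(history_size, interval):
    """Number of checks that fit into the history window (rounded up)."""
    return int(math.ceil(interval_seconds(history_size) / float(interval_seconds(interval))))


print(interval_seconds("10minute"))            # 600
print(samples_in_history("10day", "1day"))     # 10
print(samples_in_history("1day", "10minute"))  # 144
```

Under this assumption, an alert that runs once per day with a 10-day history averages over 10 samples, while the default 1-day history with a 10-minute interval averages over 144.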

Handlers in graphite-beacon:

Handlers notify an external service or process when an alert fires.

Email Handler

Sends an email (enabled by default).

{
    // SMTP default options
    "smtp": {
        "from": "mail-id",
        "to": ["mention-mail-id"],       // List of email addresses to send to
        "host": "your-smtp-host",        // SMTP host
        "port": 25,                      // SMTP port
        "username": "your-mail-id",      // SMTP user (optional)
        "password": "mail-id-password",  // SMTP password (optional)
        "use_tls": false,                // Use TLS?
        "html": true,                    // Send HTML emails?

        // Graphite link for emails (By default is equal to main graphite_url)
        "graphite_url": null
    }
}

HipChat Handler

Sends a message to a HipChat room.

{
    "hipchat": {
        // (optional) Custom HipChat URL
        "url": "https://HIPCHAT-URL",

        "room": "myroom",
        "key": "mykey"
    }
}

Webhook Handler (HTTP)

Triggers a webhook.

{
    "http": {
        "url": "http://YOUR-WEBHook.com",
        "params": {},                 // (optional) Additional query(data) params
        "method": "GET"               // (optional) HTTP method
    }
}
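
To test the webhook handler locally, you can point its "url" at a small receiver that prints whatever query parameters arrive. The sketch below uses only the Python 3 standard library. The names query_params, AlertReceiver, and serve are hypothetical, the port 8080 is arbitrary, and the parameter names in the example call (level, name) are placeholders: the exact fields graphite-beacon sends are not listed in this article, so inspect the printed output to see them.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def query_params(path):
    """Return the query string of a request path as a flat dict."""
    parsed = parse_qs(urlparse(path).query)
    return {key: values[0] for key, values in parsed.items()}


class AlertReceiver(BaseHTTPRequestHandler):
    """Print the query parameters of every GET request and answer 200."""

    def do_GET(self):
        print("alert received:", query_params(self.path))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok\n")


def serve(port=8080):
    """Listen on localhost until interrupted (Ctrl+C)."""
    HTTPServer(("127.0.0.1", port), AlertReceiver).serve_forever()


# Placeholder parameter names, for illustration only.
print(query_params("/hook?level=critical&name=Memory"))
```

Call serve() to start the receiver, then set the handler "url" to http://127.0.0.1:8080/hook with "method": "GET".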

Slack Handler

Sends a message to a user or channel on Slack.

{
    "slack": {
        "webhook": "https://your-slack-url/...",
        "channel": "#yourchannel-name",          // #channel or @user (optional)
        "username": "graphite-beacon"
    }
}

Command Line Handler

Runs a command.

{
    "cli": {
        // Command to run (required)
        // Several variables that will be substituted by values are allowed:
        //  ${level} -- alert level
        //  ${name} -- alert name
        //  ${value} -- current metrics value
        //  ${limit_value} -- metrics limit value
        "command": "./myscript ${level} ${name} ${value} ...",

        // Whitelist of alerts that will trigger this handler (optional)
        // All alerts will trigger this handler if absent.
        "alerts_whitelist": ["..."]
    }
}
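
To preview what a command line looks like once the ${...} variables are filled in, the following Python sketch uses string.Template, which happens to use the same ${name} placeholder syntax. This only illustrates the substitution; it is not necessarily how graphite-beacon performs it, and the function name expand_command and the sample values are made up.

```python
from string import Template


def expand_command(command, level, name, value, limit_value):
    """Fill in the ${...} placeholders of a cli handler command string."""
    return Template(command).safe_substitute(
        level=level, name=name, value=value, limit_value=limit_value
    )


command = "./myscript ${level} ${name} ${value} ${limit_value}"
print(expand_command(command, "critical", "Memory", "150MB", "200MB"))
# ./myscript critical Memory 150MB 200MB
```

safe_substitute leaves unknown placeholders untouched instead of raising an error, which makes typos in variable names easy to spot in the output.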

PagerDuty Handler

Triggers a PagerDuty incident.

{
    "pagerduty": {
        "subdomain": "yoursubdomain",
        "apitoken": "apitoken",
        "service_key": "servicekey"
    }
}

Command Line Usage

$ graphite-beacon --help
  Usage: graphite-beacon [OPTIONS]

  Options:

    --config                         Path to an configuration file (JSON/YAML)
                                     (default config.json)
    --graphite_url                   Graphite URL (default http://localhost)
    --help                           show this help information
    --pidfile                        Set pid file

    --log_file_max_size              max size of log files before rollover
                                     (default 100000000)
    --log_file_num_backups           number of log files to keep (default 10)
    --log_file_prefix=PATH           Path prefix for log files. Note that if you
                                     are running multiple tornado processes,
                                     log_file_prefix must be different for each
                                     of them (e.g. include the port number)
    --log_to_stderr                  Send log output to stderr 
    --logging=debug|info|warning|error|none
                                     Set the Python log level. If 'none', tornado
                                     won't touch the logging configuration.
                                     (default info)

This tutorial covered how to set up an alerting system for Graphite metrics with graphite-beacon.