Operational Dashboards with Grafana#
We use Grafana to build dashboards from our operational metrics. Dashboards are useful for understanding the current status of the system and all its components at a glance. They are also very useful when debugging what went wrong during or after an outage.
What is it?#
A dashboard is a set of pre-defined graphs in a particular layout that provide an overview of a system. In our case, they provide an overview of the operational metrics of the components that make up mybinder.org.
Where is it?#
Our dashboards are at grafana.mybinder.org. They are public for everyone to view, but editing requires a private admin password. Open an issue in jupyterhub/mybinder.org-deploy if you want write access to the dashboards.
You can click the button to the right of the Grafana logo in the top left, and it will open a drop-down menu of dashboards for the mybinder.org deployment.
Modifying dashboards#
Each dashboard is edited directly from the user interface (if you have access to edit it). You can click on any graph and select the Edit option to see what queries make up the dashboard, and how you can edit it.
All the dashboard definitions are stored in an SQLite database on a disk attached to the running Grafana instance.
The Grafana documentation has more info on the various concepts in Grafana, and how you can use them. You can also create a new dashboard and play with it. Be careful before editing currently used dashboards!
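Since dashboard definitions live only in the Grafana database, it can help to save a copy of a dashboard's JSON before editing it. The following is a minimal sketch, assuming the instance exposes Grafana's standard HTTP API (`GET /api/dashboards/uid/<uid>`) and that you know the dashboard's UID from its URL. The helper names are hypothetical, and whether an API key is needed for read access on this deployment is an assumption to verify.

```python
import json
import urllib.request

GRAFANA_URL = "https://grafana.mybinder.org"  # host named in this document


def dashboard_export_request(uid, base_url=GRAFANA_URL, api_key=None):
    """Build a GET request for one dashboard's JSON definition.

    `uid` is the dashboard UID as it appears in the dashboard URL.
    `api_key` is optional; whether anonymous reads work is an assumption.
    """
    req = urllib.request.Request(f"{base_url}/api/dashboards/uid/{uid}")
    req.add_header("Accept", "application/json")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    return req


def save_dashboard(uid, path, **kwargs):
    """Fetch the dashboard and write its definition to `path` as a backup."""
    with urllib.request.urlopen(dashboard_export_request(uid, **kwargs)) as resp:
        data = json.load(resp)
    # The API wraps the definition as {"meta": ..., "dashboard": ...}.
    with open(path, "w") as f:
        json.dump(data["dashboard"], f, indent=2, sort_keys=True)
```

Calling `save_dashboard("<uid>", "backup.json")` before an edit gives you a file you can paste back through the dashboard's JSON import if the edit goes wrong.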
Installation & Configuration#
Grafana is installed with the Grafana helm chart. You can see the options for configuring it documented in its values.yaml file. You can also see the specific ways we have configured it in the grafana section of mybinder/values.yaml, config/prod.yaml and config/staging.yaml.
Annotations#
Annotations are a cool feature of Grafana that lets us add arbitrary markers to all graphs, each marking an event that has happened. For example, you can create an annotation each time a deployment happens. This puts a marker with info about the deployment on each graph, so you can easily tell if a particular deployment has caused changes in any metric. This is very useful for debugging!
We use the script in travis/post-grafana-annotation.py to create annotations just before each deployment. See the docstring in the script for more details.
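To illustrate what creating an annotation involves, here is a minimal sketch built on Grafana's standard `POST /api/annotations` endpoint. It is not the contents of travis/post-grafana-annotation.py; the function name, the example host, and the `deployment` tag are assumptions chosen for illustration.

```python
import json
import time
import urllib.request


def annotation_request(base_url, api_key, text, tags, when=None):
    """Build a POST request that creates a Grafana annotation.

    `when` is a Unix timestamp in seconds; it defaults to the current time.
    No dashboard or panel ID is set, so Grafana stores an organization-wide
    annotation that any dashboard can display by querying on its tags.
    """
    timestamp = time.time() if when is None else when
    payload = {
        "time": int(timestamp * 1000),  # Grafana expects epoch milliseconds
        "tags": list(tags),
        "text": text,
    }
    return urllib.request.Request(
        f"{base_url}/api/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is one more call, for example `urllib.request.urlopen(annotation_request("https://grafana.example.org", api_key, "Deployed new version", ["deployment"]))`, run from the deployment pipeline with an API key that has edit rights.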