# Metrics collection with Prometheus

We collect operational metrics about all the components of mybinder.org and create dashboards from them. This document details the components involved in collecting, storing and querying the metrics. This is only for operational metrics - **not** for analytics on repositories built or traffic.

## Metrics Storage + Querying

We use [Prometheus](https://prometheus.io/) to store and query our metrics.

### What is Prometheus?

Prometheus is a [Time Series Database](https://en.wikipedia.org/wiki/Time_series_database) optimized for storing operational metrics. It stores all data as time series: streams of timestamped values belonging to the same **metric** and the same set of **labels**.

The **metric name** specifies the general feature of a system that is measured (e.g. `http_requests_total` - the total number of HTTP requests received). A set of labels for the same metric name identifies a particular dimensional instantiation of that metric (for example: all HTTP requests that used the method `POST` to the `/api/tracks` handler would be represented as the time series `http_requests_total{method="POST", handler="/api/tracks"}`).

The Prometheus documentation has more information on its [data model](https://prometheus.io/docs/concepts/data_model/) and the different [kinds of time series](https://prometheus.io/docs/concepts/metric_types/) available. These two pages are fairly short and are highly recommended reading!

### Querying

Prometheus has its own query language called [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/), optimized for time series queries. The Prometheus project has fairly clear and thorough documentation on PromQL - [basics](https://prometheus.io/docs/prometheus/latest/querying/basics/), [operators](https://prometheus.io/docs/prometheus/latest/querying/operators/) and [functions](https://prometheus.io/docs/prometheus/latest/querying/functions/).
You do not need to become an expert, but a basic understanding is useful. There are also [examples](https://prometheus.io/docs/prometheus/latest/querying/examples/) to pick up and play with!

[prometheus.mybinder.org](https://prometheus.mybinder.org/graph) is our public Prometheus installation, and you can practice your queries there!

### Metrics Ingestion

Prometheus uses a **pull** model for metrics. It keeps a list of targets, polls each of them at a regular interval for its current state, and records what it gets back. The targets are expected to respond to these HTTP requests with data in the [prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/).

Our data is currently sourced from the following targets.

#### Node information

The [node_exporter](https://github.com/prometheus/node_exporter) exports information about each node we run - CPU usage, available memory, disk space, etc. It provides fairly detailed info, usually prefixed with `node_`. This is not Kubernetes specific.

#### Kubernetes information

[kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) exposes information about the Kubernetes cluster - such as the number of pods and the states they are in, the number of nodes, etc. These metrics are usually prefixed with `kube_`.

They only contain information from the Kubernetes API itself. For example, 'how much RAM are these containers using' is not recorded by `kube-state-metrics`, since that information is not available to the Kubernetes API. 'How much RAM are these pods requesting' is, however, available.

#### Container information

[cadvisor](https://github.com/google/cadvisor) provides detailed runtime information about all the containers running in the cluster. This is mostly information that is not available from `kube-state-metrics` - such as 'how much RAM are these containers using right now'. These metrics are usually prefixed with `container_`.
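
To make the pull model and the metric prefixes above more concrete, here is a minimal Python sketch of what a scrape response looks like and how its samples map onto metric names and labels. The sample text is hypothetical: it combines one line each in the style of `node_exporter`, `kube-state-metrics` and `cadvisor` for brevity (in practice these are separate targets), and the label values and numbers are made up. The parser only handles the simple cases shown here and is not a full implementation of the exposition format.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; label values and numbers are illustrative only.
SAMPLE_SCRAPE = """\
# HELP node_memory_MemAvailable_bytes Memory information field MemAvailable_bytes.
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 1.2e+09
kube_pod_status_phase{namespace="prod",pod="jupyter-abc",phase="Running"} 1
container_memory_working_set_bytes{namespace="prod",pod="jupyter-abc"} 2.5e+08
"""

# metric_name, optional {labels}, then the sample value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def group_by_prefix(samples):
    """Group metric names by their leading prefix, e.g. 'node_' or 'kube_'."""
    groups = defaultdict(list)
    for name, _labels, _value in samples:
        prefix = name.split("_", 1)[0] + "_"
        groups[prefix].append(name)
    return dict(groups)


if __name__ == "__main__":
    for prefix, names in group_by_prefix(parse_samples(SAMPLE_SCRAPE)).items():
        print(prefix, names)
```

Each distinct combination of metric name and label set in such a response becomes its own time series once Prometheus has recorded it.
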
#### HTTP request information

We use the [nginx-ingress helm chart](https://github.com/kubernetes/ingress-nginx/tree/main/charts/ingress-nginx) to let all HTTP traffic into our cluster. This allows us to use the [nginx VTS exporter](https://hnlq715.github.io/nginx-vts-exporter/) to collect information about requests and responses in Prometheus. These metrics are prefixed with `nginx_`.

#### BinderHub information

[BinderHub](https://github.com/jupyterhub/binderhub) itself exposes metrics about its operations in the Prometheus format, using the [python prometheus client library](https://github.com/prometheus/client_python). These are currently somewhat limited, and are prefixed with `binderhub_`.

### Configuration

Prometheus is installed using the [prometheus helm chart](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus). This installs the following components:

1. Prometheus server (storage + querying)
2. `node_exporter` on every node
3. A `kube-state-metrics` instance

`cadvisor` is already present on all nodes (it ships with the `kubelet` Kubernetes component), and the prometheus helm chart has configuration that adds those as targets.

You can see the available options for configuring the prometheus helm chart in its [values.yaml](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus/values.yaml) file. You can see our current configuration under the `prometheus` section of `mybinder/values.yaml`, `config/prod.yaml` and `config/staging.yaml`.
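
Because the `prometheus` section is spread over several files, it can help to picture how they combine. The Python sketch below assumes the usual Helm layering, in which a chart's `values.yaml` supplies defaults and a per-deployment file passed later overrides them key by key; whether our deployment applies the files in exactly this order is an assumption here. The dictionaries are hypothetical excerpts, not copies of the real files, and the keys and sizes are only illustrative. The merge function mimics Helm's general behaviour (mappings merge recursively, other values including lists are replaced) without modelling every detail.

```python
from copy import deepcopy


def merge_values(base, override):
    """Merge `override` into a copy of `base`.

    Nested mappings are merged key by key; any other value
    (including lists) in `override` replaces the one in `base`.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical excerpt standing in for mybinder/values.yaml (defaults).
chart_defaults = {
    "prometheus": {
        "server": {
            "retention": "30d",
            "persistentVolume": {"size": "50Gi"},
        }
    }
}

# Hypothetical excerpt standing in for config/prod.yaml (overrides).
prod_overrides = {
    "prometheus": {
        "server": {
            "persistentVolume": {"size": "2000Gi"},
        }
    }
}

# The effective `prometheus` section for the hypothetical prod deployment.
effective = merge_values(chart_defaults, prod_overrides)["prometheus"]

if __name__ == "__main__":
    print(effective)
```

Under these assumptions, a setting that appears only in the defaults (here `retention`) survives, while a setting that appears in both files takes the value from the per-deployment file.
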