Incident reporting

This page describes how the Binder team handles incidents and writes incident reports. Remember, incidents are opportunities to learn!

Principles and guidelines for incident reporting

  • Inspiration for our guidelines: Google SRE guide, Managing Incidents.

  • Team management and takeaways from incidents: Etsy Debriefing Facilitation Guide.

Example template for incident report

  • Example template for incident report

Incident history

(in reverse chronological order)

  • 2020-07-09, Simultaneous launches (aka, SciPy gives Binder a lot of hugs at the same time)
  • 2019-04-03, 30min outage during node pool upgrade
  • 2019-03-24, repo2docker upgrade and docker image cache wipe
  • 2019-02-20, kubectl logs unavailable
  • 2018-07-30, JupyterLab builds saturate BinderHub CPU
  • 2018-07-08, too many pods
  • 2018-04-18, Culler flood
  • 2018-03-31, Server launch failures
  • 2018-03-26, “no space left on device”
  • 2018-03-13, PVC for hub is locked
  • 2018-02-22, NGINX crash
  • 2018-02-20, JupyterLab Announcement swamps Binder
  • 2018-02-12, Hub Launch Fail
  • 2018-01-18, reddit hugs mybinder
  • 2018-01-17, Emergency Aardvark bump
  • 2018-01-11, Warning from letsencrypt about outdated SSL certificate
  • 2018-01-04, Failed deploy to staging
  • 2017-11-30 4:23PM PST, OOM (Out of Memory) Proxy
  • 2017-10-17, Cluster Full
  • 2017-09-29, 504
  • 2017-09-27, Hub 403

© Copyright 2017 - 2020, Binder Team.