# Deploy a new mybinder.org federation member on a bare VM with `k3s`

[k3s](https://k3s.io/) is a popular kubernetes distribution that we can use
to build _single node_ kubernetes installations that satisfy the needs of the
mybinder project. By focusing on the simplest possible kubernetes installation,
we can get all the benefits of kubernetes (simplified deployment, cloud agnosticism,
unified tooling, etc.) **except** autoscaling, and deploy **anywhere we can get a VM
with root access**. This is vastly simpler than managing an autoscaling kubernetes
cluster, and allows expansion of the mybinder federation in ways that would otherwise
be more difficult.

## VM requirements

The k3s project publishes [their requirements](https://docs.k3s.io/installation/requirements),
but we have a slightly more opinionated list.

1. Full `root` access.
2. The latest Ubuntu LTS (currently 24.04). Debian is acceptable.
3. Direct internet access, inbound (public IP) and outbound.
4. "As big as possible", as we will be using all the capacity of this one VM.
5. The ability to grant the same access to the VM to all the operators of the mybinder federation.

## VM configuration

1. Allow clock synchronization based on [Network Time Protocol (NTP)](https://en.wikipedia.org/wiki/Network_Time_Protocol).

   The VM provider might have its own NTP server and enforce the use of it.

### Node setup on OVH

We have [OpenTofu](http://opentofu.org) configuration for deploying a new registry on OVH.
The cheapest way to deploy a node on OVH is via [VPS](https://www.ovhcloud.com/en/vps/).
A VPS-6 (24 core, 92GB RAM) with backups and an extra disk costs $90/month, whereas a _smaller_ b3-64 (16 core, 64GB) costs over $300.

Because we deploy harbor ourselves in the helm chart, tofu needs to be split in steps.

Steps:

1. setup k3s on the VM (steps below)
2. create a secret file like `secrets/ovh-creds.sh` with credentials for the OVH API
3. create an s3 bucket for terraform state in the OVH project
4. create an s3 user with access to the bucket
5. create a `.tfvars` file like `bids-ovh.tfvars` with the variables for the deployment.
   `service_name` is the UUID of the cloud project.
6. set `TF_CLI_ARGS=-var-file=my-file.tfvars`

Now you're ready to start deploying to OVH.
It's a little tricky because we can't deploy everything at once. Instead, we have to:

1. deploy the s3 bucket for the registry:
   ```
   tofu apply -target=ovh_cloud_project_user_s3_policy.harbor
   ```
2. configure harbor s3 secrets in `secrets/config/${name}.yaml` from
   ```
   tofu output registry_s3
   ```
3. deploy via helm (`CI=1 python3 deploy.py ${name}`). (This is safe to do for `KUBECONFIG` clusters).
4. finally, complete the deployment by configuring harbor with tofu:
   ```
   tofu apply
   ```
5. Add registry account secrets into `secrets/config/${name}.yaml` from
   ```
   tofu output -show-sensitive
   ```

```{todo}
We should separate harbor into its own tofu deployment.
This is already done on Hetzner, where the s3 buckets are created manually rather than with tofu,
and _only_ harbor is deployed with tofu.
```

### Attaching a disk

If the VM has an additional disk for dind, it needs to be partitioned and mounted, [following this guide](https://help.ovhcloud.com/csm/en-gb-vps-config-additional-disk?id=kb_article_view&sysparm_article=KB0047555).
The only change we made was to use `mkfs.xfs` instead of `mkfs.ext4`.

This disk is where dind state should live, so set:

```yaml
binderhub:
  dind:
    hostLibDir: /mnt/disk/dind
```

to put dind state on the external disk.

## Create a new ssh key for mybinder team members

To give mybinder team members easy access to this node, we create an ssh key and check it in as
a secret.

1. Run `ssh-keygen -t ed25519 -f secrets/<cluster-name>.key` to create the ssh key. Leave the passphrase blank.
2. Set appropriate permissions with `chmod 0400 secrets/<cluster-name>.key`.
3. Copy `secrets/<cluster-name>.key.pub` (**NOTE THE .pub**) and paste it as a **new line** in `/root/.ssh/authorized_keys` on your server. Do not replace any existing lines in this file.

## Increase some fs limits

To avoid errors like

> failed to create fsnotify watcher: too many open files

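increase the `fs.inotify` limits. Note that values set with `sysctl -w` do not survive a reboot; add the same keys to a file under `/etc/sysctl.d/` to make them permanent.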

```bash
sudo sysctl -w fs.inotify.max_user_instances=8192
sudo sysctl -w fs.inotify.max_user_watches=524288
```

## Setup DNS entries

There's only one IP to set DNS entries for - the public IP of the VM. No loadbalancers or similar here.

mybinder.org's DNS is managed via Cloudflare. You should have access, or ask someone in the mybinder team who does!

Add the following entries:

- An `A` record for `X.mybinder.org` pointing to the public IP. `X` should be an organizational identifier that identifies and thanks whoever is donating this.
- Another `A` record for `*.X.mybinder.org` to the same public IP

Give this a few minutes because it may take a while to propagate.
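
One way to check that both records have propagated is to resolve them and compare the result against the VM's public IP. The sketch below is only illustrative: the function name `check_dns` and its `resolve` parameter are assumptions made for this example, not part of the mybinder tooling.

```python
import socket
from typing import Callable, Dict, Iterable


def check_dns(
    hostnames: Iterable[str],
    expected_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Map each hostname to True if it currently resolves to expected_ip."""
    results = {}
    for hostname in hostnames:
        try:
            results[hostname] = resolve(hostname) == expected_ip
        except OSError:
            # Not resolvable (yet): the record may still be propagating.
            results[hostname] = False
    return results
```

For example, `check_dns(["X.mybinder.org", "test.X.mybinder.org"], "<public-ip>")` covers both records, since any label under `X.mybinder.org` (here the hypothetical `test`) exercises the wildcard entry.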

## Installing `k3s`

We can use the [quickstart](https://docs.k3s.io/quick-start) on the `k3s` website, with one extra piece of
config: _disabling the traefik_ that comes built in. We deploy an ingress controller as part of our deployment,
so we do not need the bundled traefik.

1. Create a Kubelet Config file at `/var/lib/rancher/k3s/agent/etc/kubelet.conf.d/99-kubelet.conf` so we can
   tweak various kubelet options, including the maximum number of pods on a single node and when to clean up unused images:

   ```yaml
   apiVersion: kubelet.config.k8s.io/v1beta1
   kind: KubeletConfiguration
   maxPods: 300
   # Clean up images pulled by kubernetes anytime we are over
   # 40% disk usage until we hit 20%
   imageGCHighThresholdPercent: 40
   imageGCLowThresholdPercent: 20
   ```

   We will need to develop better intuition for how many pods to allow per node, but given we offer about
   450M of RAM per user, and RAM is the limiting factor (not CPU), let's roughly start with the
   following formula to determine this:

   `maxPods = 1.75 * (amount of RAM in GB)`

   This adds a good amount of margin. We can tweak this later.

2. Disable traefik (because we deploy the ingress controller as part of our chart):

   ```bash
   mkdir -p /var/lib/rancher/k3s/server/manifests
   touch /var/lib/rancher/k3s/server/manifests/traefik.yaml.skip
   ```

3. Install `k3s`!

   ```bash
   curl -sfL https://get.k3s.io | sh -s -
   ```

   This runs for a minute, but should set up the latest `k3s` on that node! You can verify that by running
   `kubectl get node` and `kubectl version`.
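
The `maxPods` rule of thumb from step 1 can be written out as a small calculation. This is a minimal sketch: the function and constant names are hypothetical, and the multiplier is just the starting value given above. At roughly 450M per user, a GB of RAM fits a little over two user pods, so a factor of 1.75 leaves some headroom.

```python
import math

# Multiplier from the rule of thumb in step 1; expected to be tuned over time.
PODS_PER_GB = 1.75


def max_pods(ram_gb: float, pods_per_gb: float = PODS_PER_GB) -> int:
    """Return a kubelet maxPods value for a node with ram_gb GB of RAM."""
    return math.floor(pods_per_gb * ram_gb)


for ram_gb in (64, 92, 128):
    print(f"{ram_gb} GB RAM -> maxPods: {max_pods(ram_gb)}")
# 64 GB RAM -> maxPods: 112
# 92 GB RAM -> maxPods: 161
# 128 GB RAM -> maxPods: 224
```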

## Extracting authentication information via a `KUBECONFIG` file

Next, we extract the `KUBECONFIG` file that the `mybinder.org-deploy` repo and team members can use to access
this cluster externally by following [upstream documentation](https://docs.k3s.io/cluster-access#accessing-the-cluster-from-outside-with-kubectl).

We have a script for this in `scripts/fetch_k3s_kubeconfig.py`.
If DNS is set up and we have a `config/{cluster_name}.yaml` with at least:

```yaml
binderhub:
  ingress:
    hosts:
      - some-hostname
```

you should be able to run the script:

```
python3 scripts/fetch_k3s_kubeconfig.py CLUSTER_NAME
```

What this script does:

1. Copy the `/etc/rancher/k3s/k3s.yaml` into the `secrets/` directory in this repo:

   ```bash
   scp root@<public-ip>:/etc/rancher/k3s/k3s.yaml secrets/<cluster-name>-kubeconfig.yaml
   ```

   Pick a `<cluster-name>` that describes what cluster this is - we will be consistently using it for other files too.

2. Change the `server` field under `clusters.0.cluster` from `https://127.0.0.1:6443` to `https://<public-ip>:6443`.

3. Find-replace `default` with the cluster name, and add `namespace: CLUSTERNAME` to the default context, e.g. changing

   ```yaml
   clusters:
   - cluster:
       # ...
     name: default
   contexts:
   - context:
       cluster: default
       user: default
     name: default
   current-context: default
   kind: Config
   users:
   - name: default
   ```

   to:

   ```yaml
   clusters:
   - cluster:
       # ...
     name: staging
   contexts:
   - context:
       cluster: staging
       namespace: staging
       user: staging
     name: staging
   current-context: staging
   kind: Config
   users:
   - name: staging
   ```

You should now be able to:

```
KUBECONFIG=$PWD/secrets/$name-kubeconfig.yaml kubectl get node
```
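
For readers who want to see the rewrite in steps 2 and 3 spelled out, here is a minimal sketch operating on an already-parsed kubeconfig. It is not the code of the script in `scripts/`; the function name `rewrite_kubeconfig` and its arguments are assumptions made for illustration.

```python
import copy


def rewrite_kubeconfig(config: dict, cluster_name: str, host: str) -> dict:
    """Return a copy of a k3s kubeconfig adapted for external access.

    config is the parsed content of /etc/rancher/k3s/k3s.yaml.
    """
    new = copy.deepcopy(config)

    # Step 2: point at the public address instead of 127.0.0.1.
    cluster = new["clusters"][0]
    cluster["cluster"]["server"] = f"https://{host}:6443"

    # Step 3: rename the cluster, user, and context from "default".
    cluster["name"] = cluster_name
    new["users"][0]["name"] = cluster_name
    context = new["contexts"][0]
    context["name"] = cluster_name
    context["context"]["cluster"] = cluster_name
    context["context"]["user"] = cluster_name
    # Also make the cluster's namespace the default for this context.
    context["context"]["namespace"] = cluster_name
    new["current-context"] = cluster_name
    return new
```

The input dictionary could be loaded with a YAML parser such as PyYAML's `yaml.safe_load` and written back with `yaml.safe_dump`; that dependency is left out here so the sketch only needs the standard library.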

## Enable k3s auto-upgrade

k3s supports automatic upgrades.
We follow [the documented auto-upgrade setup](https://docs.k3s.io/upgrades/automated).
First, enable the automatic upgrade components:

```bash
export KUBECONFIG=$PWD/secrets/$name-kubeconfig.yaml
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/crd.yaml -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
```

Next, apply our auto-upgrade configuration:

```
kubectl apply -f config/k3s/k3s-upgrade-plan.yaml
```

Now k3s should self-update every Sunday.
If there's a problem, we'll see it Monday.

## Prepare registry storage

We use [Harbor] to operate our registry, because it includes retention rules which let us use the registry as a _cache_, expiring unused images.

Our Harbor deployments use local S3-compatible storage.
We need a bucket and s3 credentials to access the bucket.
On OVH, this is handled in Tofu above.
On Hetzner, it is manual via the Console.

1. Create a bucket
2. Create S3 access credentials for the bucket.
   Ideally, these credentials should only have access to this particular bucket.
3. Configure multi-part upload expiration rules, using the lifecycle configuration in `config/k3s/s3-bucket-lifecycle.json`

For example, for the Hetzner Nuremberg datacenter:

```
# from credentials you created
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
# nuremberg s3 endpoint
export AWS_ENDPOINT_URL=https://nbg1.your-objectstorage.com
# create the bucket lifecycle configuration
aws s3api put-bucket-lifecycle-configuration --bucket bucket-name  --lifecycle-configuration file://config/k3s/s3-bucket-lifecycle.json
```
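
The contents of `config/k3s/s3-bucket-lifecycle.json` are not reproduced here. As a rough illustration of what a multi-part upload expiration rule looks like in the S3 lifecycle format, the sketch below builds one and prints it as JSON. The rule ID and the one-day default are assumptions, and the file in the repository may differ.

```python
import json


def multipart_expiry_lifecycle(days: int = 1) -> dict:
    """Build an S3 lifecycle configuration that aborts stale multipart uploads."""
    return {
        "Rules": [
            {
                # Hypothetical rule name, chosen for this example only.
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                # An empty prefix applies the rule to every object in the bucket.
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


print(json.dumps(multipart_expiry_lifecycle(), indent=2))
```

Without a rule of this kind, parts of interrupted image pushes can accumulate in the bucket and keep consuming storage.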

[Harbor]: https://goharbor.io

## Make a config + secret copy for this new member

Now we gotta start a config file and a secret config file for this new member. We can start off by copying an existing one!

Let's copy `config/hetzner-2i2c.yaml` to `config/<cluster-name>.yaml` and make changes!

1. Find all hostnames, and change them to point to the DNS entries you made earlier.
2. Adjust the following parameters based on the size of the server:
   - `binderhub.config.LaunchQuota.total_quota`
   - `dind.resources`
   - `imageCleaner`
3. Configure Harbor registry storage in `config/<cluster-name>.yaml`, under `harbor.persistence.imageChartStorage.s3`:
   - `bucket` (bucket name)
   - `regionendpoint` (provider endpoint)
   - `rootdirectory` (usually `/harbor`)
   - `region` (depends on provider, not usually necessary)

We also need a secrets file, so let's copy `secrets/config/hetzner-2i2c.yaml` to `secrets/config/<cluster-name>.yaml` and make changes!

1. Find all hostnames, and change them to point to the DNS entries you made earlier.
2. Add your s3 credentials to:
   - `harbor.persistence.imageChartStorage.s3.accesskey`
   - `harbor.persistence.imageChartStorage.s3.secretkey`
3. Generate fresh random secrets for:
   - `grafana.adminPassword`
   - `harbor.harborAdminPassword`
4. _Remove_ most of Harbor's secret config, other than the above (we'll populate this later).

## Deploy binder!

Let's tell the `deploy.py` script that we have a new cluster by adding `<cluster-name>` to the `KUBECONFIG_CLUSTERS` variable in `deploy.py`.

Once done, you can do a deployment with `./deploy.py <cluster-name>`! If it errors out, tweak and debug until it works.

## Configure harbor registry

Harbor requires some configuration _after_ it has been deployed for the first time.

### Stabilize Harbor secrets to avoid churn

Harbor has several configured secret values,
but generates these automatically on each deploy if left unspecified,
causing restart of Harbor pods on each deploy.
To avoid that, we can retrieve and store the values generated on the first deploy.

Retrieve them with the commands below and add them to `secrets/config/${name}.yaml`:

```bash
name=cluster_name
# gets core.secret, core.xsrfKey, core.tokenKey, core.tokenCert, registry.credentials.password
for key in secret CSRF_KEY tls.key tls.crt REGISTRY_CREDENTIAL_PASSWORD; do
  echo "$key"
  kubectl get secret "${name}-harbor-core" -o json | jq -r ".data[\"${key}\"]" | base64 --decode
  echo
done
# jobservice.secret
kubectl get secret "${name}-harbor-jobservice" -o json | jq -r .data.JOBSERVICE_SECRET | base64 --decode
# registry.secret
kubectl get secret "${name}-harbor-registry" -o json | jq -r .data.REGISTRY_HTTP_SECRET | base64 --decode
# registry.credentials.htpasswdString (from the registry secret)
kubectl get secret "${name}-harbor-registry" -o json | jq -r .data.REGISTRY_HTPASSWD | base64 --decode
# registry.credentials.htpasswdString (from the separate htpasswd secret)
kubectl get secret "${name}-harbor-registry-htpasswd" -o json | jq -r .data.REGISTRY_HTPASSWD | base64 --decode
```
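
To make the mapping between Secret keys and Helm values explicit, here is a small sketch that decodes the output of `kubectl get secret ... -o json` into a dictionary keyed by Helm value. The names `CORE_KEY_MAP` and `decode_secret` are hypothetical, and the mapping assumes the comment in the loop above lists the Helm values in the same order as the keys it iterates over.

```python
import base64

# Secret key -> Helm value, following the comment on the loop above.
# The pairing by position is an assumption.
CORE_KEY_MAP = {
    "secret": "core.secret",
    "CSRF_KEY": "core.xsrfKey",
    "tls.key": "core.tokenKey",
    "tls.crt": "core.tokenCert",
    "REGISTRY_CREDENTIAL_PASSWORD": "registry.credentials.password",
}


def decode_secret(secret: dict, key_map: dict) -> dict:
    """Decode selected keys of a Kubernetes Secret into {helm_value: text}.

    secret is the parsed output of `kubectl get secret NAME -o json`.
    """
    data = secret["data"]
    return {
        helm_value: base64.b64decode(data[key]).decode()
        for key, helm_value in key_map.items()
    }
```

The same function works for the jobservice and registry secrets by passing a different `key_map`, for example `{"JOBSERVICE_SECRET": "jobservice.secret"}`.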

### Configure Harbor project, quota, and accounts with Tofu

We have Harbor configuration in `terraform/modules/harbor`, which is configured from `terraform/hetzner`.

If, for example, deploying a new hetzner node:

1. `cd terraform/hetzner`
2. copy `hetzner-2i2c.tfvars` to `${cluster_name}.tfvars` and edit name, endpoint, and registry_users as appropriate.
3. `source secrets/creds.sh`
4. `export TF_CLI_ARGS="-var-file=${cluster_name}.tfvars"`
5. `tofu init`
6. `tofu apply` - check that it makes sense
7. `tofu output --show-sensitive` to see the registry credentials created
8. copy the `robot$mybinder-builds+{name}-builder` username and password to `binderhub.registry.username,password` in `secrets/config/${name}.yaml`
9. copy the `robot$mybinder-builds+{name}-user-puller` username and password to `jupyterhub.imagePullSecret.username,password` in `secrets/config/${name}.yaml`
10. if replicating, add appropriate credentials by editing the Registry entry at
    `https://registry.{host}.mybinder.org/harbor/registries` from the target registry configuration.

## Test and validate

## Add to the redirector