docs: glitchtip runbook + setup + RFC §3.1 dev amendment

Operational docs for the GlitchTip stack that landed in the previous two
commits.

- dev-docs/GLITCHTIP.md: new runbook covering local dev, project
  provisioning + DSN-to-vault flow, production deploy on
  monitoring.hausdesign.nl (DNS, DirectAdmin Let's Encrypt, Apache
  reverse proxy with WS upgrade), backup install + restore drill,
  smoke tests, troubleshooting.
- dev-docs/SETUP.md: services table now includes GlitchTip; new
  docker/glitchtip/.env subsection points at the runbook.
- dev-docs/RFC-WS-7-OBSERVABILITY.md §3.1: amended to record that the
  same compose file drives local dev (Mailpit at bm_mailpit:1025), so
  prod and dev cannot drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 08:15:27 +02:00
parent 5f6fc075ed
commit 932788c643
3 changed files with 300 additions and 1 deletions

dev-docs/GLITCHTIP.md Normal file

@@ -0,0 +1,283 @@
# GlitchTip — operations runbook
Self-hosted error tracking for Crewli. GlitchTip implements the Sentry
event protocol; the official Sentry SDKs (`sentry-laravel`, `@sentry/vue`,
`@sentry/cli`) work against it without modification.
Reference: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md).
This file documents how to run the stack — locally and on the production
monitoring host. PR-2 (backend SDK) and PR-3 (frontend SDK) consume DSNs
provisioned via the steps below.
---
## 1. Overview
| Service | Image | Role |
|---------|-------|------|
| `glitchtip-web` | `glitchtip/glitchtip:6.1.6` | Django web UI + ingest API |
| `glitchtip-worker` | `glitchtip/glitchtip:6.1.6` | Celery worker + beat (event processing, alerts, partition maintenance) |
| `glitchtip-postgres` | `postgres:16-alpine` | Primary datastore |
| `glitchtip-redis` | `valkey/valkey:7-alpine` | Celery broker + cache |
The same `docker-compose.glitchtip.yml` runs both locally (merged with
`docker-compose.yml`) and on the production host (standalone). Container
names are identical in both environments to avoid configuration drift.
---
## 2. Local development
```bash
# Once
cp docker/glitchtip/.env.example docker/glitchtip/.env
# Boot the full stack (MySQL, Redis, Mailpit, GlitchTip)
make services
# First boot takes ~60s while migrations run. Tail progress:
make services-glitchtip-status
```
Web UI: <http://localhost:8200>. Outbound mail goes to Mailpit
(`http://localhost:8025`).
Create the first admin user:
```bash
docker exec -it glitchtip-web ./manage.py createsuperuser
```
Stop the stack with `make services-stop`. Volumes (`glitchtip_postgres_data`,
`glitchtip_redis_data`, `glitchtip_uploads`) survive a stop. To wipe them, run
`docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml down -v`
(**never on production**).
---
## 3. Project provisioning
Once the web UI is reachable and the superuser exists:
1. Sign in at `/`.
2. Create an Organization called **Crewli**.
3. Create two projects:
   - **`crewli-api`**: platform PHP / Laravel, alert rules: default.
   - **`crewli-app`**: platform JavaScript / Vue, alert rules: default.
4. For each project, copy the auto-generated DSN from
*Settings → Client Keys (DSN)*.
5. Store both DSNs in 1Password under `Crewli / GlitchTip / DSNs`:
   - `SENTRY_DSN_BACKEND`: the `crewli-api` DSN
   - `SENTRY_DSN_FRONTEND`: the `crewli-app` DSN
PR-2 wires `SENTRY_DSN_BACKEND` into `api/.env.example`; PR-3 wires
`SENTRY_DSN_FRONTEND` into `apps/app/.env.example`. Empty DSN = SDK no-op
(verified for both `sentry-laravel` and `@sentry/vue`), so dev environments
without a DSN are silent.
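Because an empty DSN silently disables the SDK, a typo in a non-empty DSN is the case worth catching early. The following is a minimal sketch of a pre-flight check, assuming the usual Sentry-style DSN shape `https://<public_key>@<host>/<project_id>`; the function name `dsn_status` is hypothetical and not part of this repository, and DSNs with a legacy secret or a path prefix would need a looser pattern.

```bash
# Hypothetical helper (not part of the repo): classify a DSN value.
# Prints "disabled" for an empty value, "ok" for something shaped like
# <scheme>://<public_key>@<host>/<project_id>, and "malformed" otherwise.
dsn_status() {
  dsn=$1
  if [ -z "$dsn" ]; then
    echo "disabled"
  elif printf '%s\n' "$dsn" | grep -Eq '^https?://[^:@/]+@[^/@]+/[0-9]+$'; then
    echo "ok"
  else
    echo "malformed"
  fi
}

# Usage: dsn_status "$SENTRY_DSN_BACKEND"
```

The check only validates the shape of the string; it does not contact GlitchTip or confirm that the key exists.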
---
## 4. Production deployment
GlitchTip runs on a separate host (`monitoring.hausdesign.nl`) and is **not**
deployed via the Crewli `deploy.sh` pipeline.
### 4.1 Prerequisites
- Docker + Docker Compose v2 on the monitoring host.
- DirectAdmin with the Let's Encrypt module enabled.
- DNS A-record `monitoring.hausdesign.nl` pointing at the host IP.
### 4.2 Place the stack
```bash
sudo install -d -o crewli -g crewli /opt/glitchtip
sudo install -d -o crewli -g crewli /opt/glitchtip/docker/glitchtip
# Copy compose file + env example to the host (e.g. via scp or git checkout).
# /opt/glitchtip/docker-compose.glitchtip.yml
# /opt/glitchtip/docker/glitchtip/.env.example
```
### 4.3 Configure `.env`
```bash
cd /opt/glitchtip
cp docker/glitchtip/.env.example docker/glitchtip/.env
chmod 0600 docker/glitchtip/.env
```
Fill in the production values (header of `.env.example` lists the
checklist):
```env
SECRET_KEY=<python -c "import secrets; print(secrets.token_urlsafe(50))">
DATABASE_URL=postgres://postgres:<STRONG>@glitchtip-postgres:5432/glitchtip
POSTGRES_PASSWORD=<STRONG> # MUST match the password in DATABASE_URL
GLITCHTIP_DOMAIN=https://monitoring.hausdesign.nl
DEFAULT_FROM_EMAIL=glitchtip@hausdesign.nl
EMAIL_URL=smtp+tls://USER:PASSWORD@HOST:PORT
```
Source the `<STRONG>` password from the 1Password vault.
### 4.4 DNS + TLS
1. Create the A-record for `monitoring.hausdesign.nl` in DNS.
2. In DirectAdmin: add the subdomain, then enable Let's Encrypt
(Domain Setup → SSL Certificates → "Free & automatic certificate from
Let's Encrypt"). Wait for the cert to issue.
### 4.5 Apache reverse proxy
DirectAdmin generates the vhost. Add a custom config (DirectAdmin →
Custom HTTPD Configurations) for the `monitoring.hausdesign.nl` HTTPS
vhost:
```apache
ProxyPreserveHost On
ProxyRequests Off
ProxyPass / http://127.0.0.1:8200/
ProxyPassReverse / http://127.0.0.1:8200/
# WebSocket upgrade — GlitchTip uses WS for live event streaming.
RewriteEngine On
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule ^/?(.*) "ws://127.0.0.1:8200/$1" [P,L]
```
Reload Apache.
### 4.6 First boot
```bash
cd /opt/glitchtip
docker compose -f docker-compose.glitchtip.yml up -d
# Wait for healthchecks (~60s).
docker compose -f docker-compose.glitchtip.yml ps
# Create the admin user.
docker exec -it glitchtip-web ./manage.py createsuperuser
```
Open <https://monitoring.hausdesign.nl>, sign in, and **enable 2FA** on
the account immediately (acceptance criterion 1). Profile → Security →
Two-Factor Authentication.
Then provision the two projects (§3) and capture DSNs into 1Password.
---
## 5. Backup & restore
### 5.1 Daily backup
`scripts/glitchtip-backup.sh` runs `pg_dump --format=custom`, streams it
through gzip, writes to `./backups/glitchtip/glitchtip-<ts>.dump.gz` with
`0600` permissions, and prunes dumps older than 30 days.
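For readers who do not have the script open, the steps described above can be sketched roughly as follows. This is an illustration only and not the contents of `scripts/glitchtip-backup.sh`: the variable names `BACKUP_DIR` and `RETENTION_DAYS` and both function names are assumptions. Unlike the real script, which streams through gzip, the sketch writes the dump to a file first so that a `pg_dump` failure is detected without relying on `pipefail`.

```bash
# Sketch only: NOT the real scripts/glitchtip-backup.sh.
# BACKUP_DIR, RETENTION_DAYS and the function names are illustrative.
BACKUP_DIR=${BACKUP_DIR:-./backups/glitchtip}
RETENTION_DAYS=${RETENTION_DAYS:-30}

# Dump the database in custom format, then gzip the result.
backup_glitchtip() {
  ts=$(date +%Y%m%d-%H%M%S)
  out="$BACKUP_DIR/glitchtip-$ts.dump"
  mkdir -p "$BACKUP_DIR" || return 1
  (
    umask 077  # files created in this subshell get 0600 permissions
    docker exec glitchtip-postgres \
      pg_dump -U postgres --format=custom glitchtip > "$out" \
      && gzip "$out"
  ) || { rm -f "$out"; return 1; }
}

# Delete compressed dumps whose age exceeds RETENTION_DAYS days.
prune_old_dumps() {
  find "$BACKUP_DIR" -type f -name 'glitchtip-*.dump.gz' \
    -mtime +"$RETENTION_DAYS" -exec rm -f {} +
}

# Usage: backup_glitchtip && prune_old_dumps
```

Pruning only after a successful dump means a failing backup never deletes the last good copies.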
Install the cron entry on the production host:
```cron
# /etc/cron.d/glitchtip-backup
0 3 * * * crewli /opt/crewli/scripts/glitchtip-backup.sh >> /var/log/glitchtip-backup.log 2>&1
```
(Replace `/opt/crewli` with wherever the Crewli repo checkout lives on
the monitoring host. The script is portable — only the `docker exec`
target container needs to exist.)
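The script exits non-zero on dump failure. Note that cron's `MAILTO` mails
only what a job writes to stdout/stderr, and the entry above redirects both
into `/var/log/glitchtip-backup.log`, so a failed run shows up in that log
rather than in a mail unless the redirect is removed.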
### 5.2 Restore drill
```bash
# Pick the dump to restore from.
DUMP=./backups/glitchtip/glitchtip-20260506-030000.dump.gz
# Stream the restore into the postgres container.
gunzip < "$DUMP" \
| docker exec -i glitchtip-postgres pg_restore \
-U postgres -d glitchtip --clean --if-exists
```
`--clean --if-exists` drops each object in the dump before recreating it, so
the restored objects match their state at dump time. Stop the application
containers first (on production:
`docker compose -f docker-compose.glitchtip.yml stop glitchtip-web glitchtip-worker`)
to avoid concurrent writes during the restore, and start them again afterwards.
Bert should drill the restore at least once after the production stack
is live (acceptance criterion 11).
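Before a drill it can help to confirm that the chosen file really is a compressed custom-format dump. The sketch below is one way to do that, assuming the dump was produced by `pg_dump --format=custom` (whose output begins with the magic bytes `PGDMP`) and then gzipped; the function name `dump_looks_valid` is hypothetical.

```bash
# Hypothetical pre-flight check (not part of the repo).
# Returns 0 only if the file is readable, is an intact gzip stream, and
# its decompressed payload starts with "PGDMP" (pg_dump custom format).
dump_looks_valid() {
  dump=$1
  [ -r "$dump" ] || return 1
  gzip -t "$dump" 2>/dev/null || return 1
  magic=$(gunzip -c "$dump" 2>/dev/null | head -c 5)
  [ "$magic" = "PGDMP" ]
}

# Usage: dump_looks_valid "$DUMP" || echo "refusing to restore $DUMP" >&2
```

This catches truncated or mislabelled files but says nothing about whether the data inside is complete; only an actual restore proves that.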
---
## 6. Monitoring the monitor
Quick smoke tests:
```bash
# API responds with JSON (not 502).
curl -sS http://localhost:8200/api/0/
# Worker reporting in (look for "celery@... ready").
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml \
logs --tail=50 glitchtip-worker
# All services healthy.
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml ps
```
In production, replace `http://localhost:8200` with
`https://monitoring.hausdesign.nl` and pass only
`-f docker-compose.glitchtip.yml`, since the stack runs standalone there.
Email alerting is configured in PR-4; until then alerts surface only in
the GlitchTip web UI (Issues view).
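Since first boot takes about a minute, running the smoke tests too early produces false alarms. A small retry helper, sketched here under the assumption that a one-second polling interval is acceptable, can gate them; `wait_until` is a hypothetical name, not an existing Make target or script.

```bash
# Hypothetical helper (not part of the repo): retry a command once per
# second until it succeeds or the timeout in seconds elapses.
# Returns 0 on success and 1 on timeout.
wait_until() {
  timeout=$1
  shift
  elapsed=0
  until "$@"; do
    elapsed=$((elapsed + 1))
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 1
  done
  return 0
}

# Usage (local dev): wait up to two minutes for the web container to answer.
# wait_until 120 curl -sS -o /dev/null http://localhost:8200/api/0/
```

Plain `curl` succeeds on any HTTP response, so behind Apache a 502 would still count as "up". Adding `-f` makes HTTP errors fail, but that assumes the endpoint answers 2xx without authentication, which this runbook does not confirm.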
---
## 7. Troubleshooting
### Web container unhealthy on first boot
Migrations take ~60s on a fresh volume. The healthcheck `start_period`
is set accordingly. If the container is still unhealthy after two
minutes, tail logs:
```bash
docker logs glitchtip-web
```
Most common cause: `DATABASE_URL` password ≠ `POSTGRES_PASSWORD`. The
postgres container creates the user with the password it sees, GlitchTip
authenticates with the password embedded in the URL — they MUST match.
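The mismatch can be checked mechanically before booting. This is a minimal sketch that assumes plain `KEY=value` lines without quotes or trailing comments and a password that needs no URL-encoding; the function name `env_passwords_match` is hypothetical.

```bash
# Hypothetical check (not part of the repo): succeed only if the password
# embedded in DATABASE_URL equals POSTGRES_PASSWORD in the given env file.
# Assumes unquoted KEY=value lines with no trailing comments.
env_passwords_match() {
  env_file=$1
  url=$(grep -E '^DATABASE_URL=' "$env_file" | tail -n 1 | cut -d= -f2-)
  pg_pw=$(grep -E '^POSTGRES_PASSWORD=' "$env_file" | tail -n 1 | cut -d= -f2-)
  # Strip "scheme://user:" from the front, then "@host:port/db" from the back.
  url_pw=${url#*://*:}
  url_pw=${url_pw%@*}
  [ -n "$pg_pw" ] && [ "$url_pw" = "$pg_pw" ]
}

# Usage: env_passwords_match docker/glitchtip/.env || echo "password mismatch" >&2
```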
### Worker idle / events stuck in queue
Check that `REDIS_URL` resolves and the worker is connected:
```bash
docker logs glitchtip-worker | grep -E "ready|connected|error"
```
### Volume permission errors on Linux hosts
`postgres:16-alpine` runs as UID 70 internally. If `/var/lib/postgresql/data`
is bind-mounted from the host with mismatched ownership, postgres refuses
to start. The default named volume avoids this — only relevant if you
later switch to a host bind-mount.
### Right-to-erasure (Art. 17)
Currently manual. Locate events for a user ULID via the web UI search,
delete via the UI or directly on the postgres container. An automated
erasure script is on the BACKLOG (per RFC §4).
---
## 8. References
- RFC: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md)
- GlitchTip docs: <https://glitchtip.com/documentation>
- GlitchTip self-hosting: <https://glitchtip.com/documentation/install>

dev-docs/RFC-WS-7-OBSERVABILITY.md

@@ -30,6 +30,8 @@ Two deviations from charter §3 decision 8, both deliberate:
Self-hosted GlitchTip on the production VPS via Docker Compose (`glitchtip-web`, `glitchtip-worker`, `glitchtip-postgres`, `glitchtip-redis`). Reverse proxy via DirectAdmin Apache; SSL via DirectAdmin Let's Encrypt on `monitoring.hausdesign.nl` (consistent with the existing subdomain pattern).
**Local development:** the same `docker-compose.glitchtip.yml` runs locally via `make services` (combined with the existing `docker-compose.yml` via `-f`). Web UI at `http://localhost:8200`, email goes to Mailpit at `bm_mailpit:1025`. The dev and prod stacks share a single compose file, which rules out configuration drift.
### 3.2 Two projects / DSNs
- `crewli-api`: Laravel

dev-docs/SETUP.md

@@ -70,11 +70,18 @@ Three terminal tabs, plus an optional fourth for the queue worker:
| Terminal | Command | Where it runs | Port |
|----------|---------|---------------|------|
| 1. Services | `make services` (from repo root) | Docker | 3306 (MySQL), 6379 (Redis), 8025 (Mailpit), 8200 (GlitchTip) |
| 2. API | `make api` (from repo root) | Laravel dev server | 8000 |
| 3. SPA | `make app` (from repo root) | Vite dev server | 5174 |
| 4. Queue worker (optional) | `cd api && php artisan queue:listen redis --queue=emails` | Local PHP | n/a |
Web UIs available once `make services` is up:
| Service | URL |
|---------|-----|
| Mailpit | <http://localhost:8025> |
| GlitchTip | <http://localhost:8200> (admin UI; first boot ~60s while migrations run) |
The queue worker is only needed when you're triggering email flows (registration, password reset, email change, invitations). Routine UI work doesn't require it.
Stop services when done: `make services-stop`.
@@ -116,6 +123,13 @@ VITE_APP_NAME="Crewli"
For production: `VITE_API_URL=https://api.crewli.app`.
### `docker/glitchtip/.env`
Generated by copying `docker/glitchtip/.env.example`. Dev defaults are
functional out of the box — no edits needed for `make services`. See
[`GLITCHTIP.md`](./GLITCHTIP.md) for first-boot steps (creating the
superuser, creating the two projects, copying DSNs to 1Password).
## Common tasks
### Run tests