docs: glitchtip runbook + setup + RFC §3.1 dev amendment

Operational docs for the GlitchTip stack that landed in the previous two
commits.

- dev-docs/GLITCHTIP.md: new runbook covering local dev, project
  provisioning + DSN-to-vault flow, production deploy on
  monitoring.hausdesign.nl (DNS, DirectAdmin Let's Encrypt, Apache
  reverse proxy with WS upgrade), backup install + restore drill,
  smoke tests, troubleshooting.
- dev-docs/SETUP.md: services table now includes GlitchTip; new
  docker/glitchtip/.env subsection points at the runbook.
- dev-docs/RFC-WS-7-OBSERVABILITY.md §3.1: amended to record that the
  same compose file drives local dev (Mailpit at bm_mailpit:1025), so
  prod and dev cannot drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 08:15:27 +02:00
parent 5f6fc075ed
commit 932788c643
3 changed files with 300 additions and 1 deletions

dev-docs/GLITCHTIP.md Normal file

@@ -0,0 +1,283 @@
# GlitchTip — operations runbook
Self-hosted error tracking for Crewli. GlitchTip implements the Sentry
event protocol; the official Sentry SDKs (`sentry-laravel`, `@sentry/vue`,
`@sentry/cli`) work against it without modification.
Reference: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md).
This file documents how to run the stack — locally and on the production
monitoring host. PR-2 (backend SDK) and PR-3 (frontend SDK) consume DSNs
provisioned via the steps below.
---
## 1. Overview
| Service | Image | Role |
|---------|-------|------|
| `glitchtip-web` | `glitchtip/glitchtip:6.1.6` | Django web UI + ingest API |
| `glitchtip-worker` | `glitchtip/glitchtip:6.1.6` | Celery worker + beat (event processing, alerts, partition maintenance) |
| `glitchtip-postgres` | `postgres:16-alpine` | Primary datastore |
| `glitchtip-redis` | `valkey/valkey:7-alpine` | Celery broker + cache |
The same `docker-compose.glitchtip.yml` runs both locally (merged with
`docker-compose.yml`) and on the production host (standalone). Container
names are identical in both environments to avoid configuration drift.
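The only difference between the two environments is which compose files are passed with `-f`. A minimal sketch of that difference, assuming a hypothetical helper named `glitchtip_compose_flags` (it is not part of the repo):

```bash
# Hypothetical helper: print the compose file flags for an environment.
# "local" merges GlitchTip into the main dev stack; "production" runs it standalone.
glitchtip_compose_flags() {
  case "$1" in
    local)      printf '%s\n' '-f docker-compose.yml -f docker-compose.glitchtip.yml' ;;
    production) printf '%s\n' '-f docker-compose.glitchtip.yml' ;;
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Usage (word splitting of the flags is intentional here):
#   docker compose $(glitchtip_compose_flags local) ps
```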
---
## 2. Local development
```bash
# Once
cp docker/glitchtip/.env.example docker/glitchtip/.env
# Boot the full stack (MySQL, Redis, Mailpit, GlitchTip)
make services
# First boot takes ~60s while migrations run. Tail progress:
make services-glitchtip-status
```
Web UI: <http://localhost:8200>. Outbound mail goes to Mailpit
(`http://localhost:8025`).
Create the first admin user:
```bash
docker exec -it glitchtip-web ./manage.py createsuperuser
```
Stop the stack with `make services-stop`. Volumes (`glitchtip_postgres_data`,
`glitchtip_redis_data`, `glitchtip_uploads`) survive a stop. To wipe them, run
`docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml down -v`
(**never on production**).
---
## 3. Project provisioning
Once the web UI is reachable and the superuser exists:
1. Sign in at `/`.
2. Create an Organization called **Crewli**.
3. Create two projects:
   - **`crewli-api`**: platform PHP / Laravel, alert rules: default.
   - **`crewli-app`**: platform JavaScript / Vue, alert rules: default.
4. For each project, copy the auto-generated DSN from
   *Settings → Client Keys (DSN)*.
5. Store both DSNs in 1Password under `Crewli / GlitchTip / DSNs`:
   - `SENTRY_DSN_BACKEND`: the `crewli-api` DSN
   - `SENTRY_DSN_FRONTEND`: the `crewli-app` DSN
PR-2 wires `SENTRY_DSN_BACKEND` into `api/.env.example`; PR-3 wires
`SENTRY_DSN_FRONTEND` into `apps/app/.env.example`. Empty DSN = SDK no-op
(verified for both `sentry-laravel` and `@sentry/vue`), so dev environments
without a DSN are silent.
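Before pasting a DSN into 1Password or an `.env` file, a quick shape check can catch copy-paste damage. The sketch below assumes the common Sentry-style layout `<scheme>://<public_key>@<host>[:port]/<project_id>` with GlitchTip served at the domain root; `dsn_looks_valid` is a hypothetical name, and the check says nothing about whether the key is actually accepted by the server.

```bash
# Returns 0 when the argument has the shape
#   <scheme>://<public_key>@<host>[:port]/<project_id>
# An empty string fails the check, matching "empty DSN = SDK no-op".
dsn_looks_valid() {
  printf '%s\n' "$1" \
    | grep -Eq '^https?://[A-Za-z0-9]+@[A-Za-z0-9.-]+(:[0-9]+)?/[0-9]+$'
}

# Usage:
#   dsn_looks_valid "$SENTRY_DSN_BACKEND" || echo "DSN missing or malformed" >&2
```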
---
## 4. Production deployment
GlitchTip runs on a separate host (`monitoring.hausdesign.nl`) and is **not**
deployed via the Crewli `deploy.sh` pipeline.
### 4.1 Prerequisites
- Docker + Docker Compose v2 on the monitoring host.
- DirectAdmin with the Let's Encrypt module enabled.
- DNS A-record `monitoring.hausdesign.nl` pointing at the host IP.
### 4.2 Place the stack
```bash
sudo install -d -o crewli -g crewli /opt/glitchtip
sudo install -d -o crewli -g crewli /opt/glitchtip/docker/glitchtip
# Copy compose file + env example to the host (e.g. via scp or git checkout).
# /opt/glitchtip/docker-compose.glitchtip.yml
# /opt/glitchtip/docker/glitchtip/.env.example
```
### 4.3 Configure `.env`
```bash
cd /opt/glitchtip
cp docker/glitchtip/.env.example docker/glitchtip/.env
chmod 0600 docker/glitchtip/.env
```
Fill in the production values (header of `.env.example` lists the
checklist):
```env
SECRET_KEY=<python -c "import secrets; print(secrets.token_urlsafe(50))">
DATABASE_URL=postgres://postgres:<STRONG>@glitchtip-postgres:5432/glitchtip
POSTGRES_PASSWORD=<STRONG> # MUST match the password in DATABASE_URL
GLITCHTIP_DOMAIN=https://monitoring.hausdesign.nl
DEFAULT_FROM_EMAIL=glitchtip@hausdesign.nl
EMAIL_URL=smtp+tls://USER:PASSWORD@HOST:PORT
```
Source the `<STRONG>` password from the 1Password vault.
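Because the same password appears in two variables, a mismatch is easy to introduce. A minimal sketch of a pre-boot check, assuming the password is stored un-encoded in `DATABASE_URL` (a URL-encoded password would need decoding first); both function names are hypothetical:

```bash
# Extract the password component from a postgres:// URL (user:password@host...).
db_url_password() {
  _rest=${1#*://}                 # strip scheme   -> user:password@host:port/db
  _creds=${_rest%@*}              # strip "@host..." (from the last "@") -> user:password
  printf '%s\n' "${_creds#*:}"    # strip "user:"  -> password
}

# $1 = DATABASE_URL, $2 = POSTGRES_PASSWORD
env_passwords_match() {
  [ "$(db_url_password "$1")" = "$2" ]
}

# Usage, with both variables already loaded into the shell:
#   env_passwords_match "$DATABASE_URL" "$POSTGRES_PASSWORD" \
#     || echo "DATABASE_URL and POSTGRES_PASSWORD disagree" >&2
```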
### 4.4 DNS + TLS
1. Create the A-record for `monitoring.hausdesign.nl` in DNS.
2. In DirectAdmin: add the subdomain, then enable Let's Encrypt
(Domain Setup → SSL Certificates → "Free & automatic certificate from
Let's Encrypt"). Wait for the cert to issue.
### 4.5 Apache reverse proxy
DirectAdmin generates the vhost. Add a custom config (DirectAdmin →
Custom HTTPD Configurations) for the `monitoring.hausdesign.nl` HTTPS
vhost:
```apache
ProxyPreserveHost On
ProxyRequests Off
ProxyPass / http://127.0.0.1:8200/
ProxyPassReverse / http://127.0.0.1:8200/
# WebSocket upgrade — GlitchTip uses WS for live event streaming.
RewriteEngine On
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule ^/?(.*) "ws://127.0.0.1:8200/$1" [P,L]
```
Reload Apache.
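The directives above depend on `mod_proxy`, `mod_proxy_http`, `mod_proxy_wstunnel` (for the `ws://` target) and `mod_rewrite` being loaded. A small sketch for spotting a missing module, assuming `apachectl -M` is available on the DirectAdmin host and prints one module per line; `missing_proxy_modules` is a hypothetical helper:

```bash
# Read `apachectl -M` style output on stdin and print every required
# module that does not appear in it. No output means nothing is missing.
missing_proxy_modules() {
  _loaded=$(cat)
  for _mod in proxy_module proxy_http_module proxy_wstunnel_module rewrite_module; do
    printf '%s\n' "$_loaded" | grep -qw "$_mod" || printf '%s\n' "$_mod"
  done
}

# Usage on the monitoring host:
#   apachectl -M 2>/dev/null | missing_proxy_modules
```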
### 4.6 First boot
```bash
cd /opt/glitchtip
docker compose -f docker-compose.glitchtip.yml up -d
# Wait for healthchecks (~60s).
docker compose -f docker-compose.glitchtip.yml ps
# Create the admin user.
docker exec -it glitchtip-web ./manage.py createsuperuser
```
Open <https://monitoring.hausdesign.nl>, sign in, and **enable 2FA** on
the account immediately under Profile → Security → Two-Factor
Authentication (acceptance criterion 1).
Then provision the two projects (§3) and capture DSNs into 1Password.
---
## 5. Backup & restore
### 5.1 Daily backup
`scripts/glitchtip-backup.sh` runs `pg_dump --format=custom`, streams it
through gzip, writes to `./backups/glitchtip/glitchtip-<ts>.dump.gz` with
`0600` permissions, and prunes dumps older than 30 days.
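For orientation, the behaviour described above can be sketched roughly as follows. This is not the contents of `scripts/glitchtip-backup.sh`; it is an illustration that assumes bash, the `postgres` user and `glitchtip` database used in the restore command, and hypothetical function and variable names.

```bash
# Sketch only: illustrates dump -> gzip -> 0600 file -> 30-day prune.
BACKUP_DIR=${BACKUP_DIR:-./backups/glitchtip}
RETENTION_DAYS=${RETENTION_DAYS:-30}

# Timestamped target path, e.g. .../glitchtip-20260506-030000.dump.gz
backup_path() {
  printf '%s/glitchtip-%s.dump.gz\n' "$BACKUP_DIR" "$(date +%Y%m%d-%H%M%S)"
}

# Delete dumps whose modification time is more than RETENTION_DAYS days ago.
prune_old_dumps() {
  find "$BACKUP_DIR" -name 'glitchtip-*.dump.gz' -type f \
    -mtime +"$RETENTION_DAYS" -delete
}

run_backup() {
  out=$(backup_path)
  mkdir -p "$BACKUP_DIR"
  (
    set -o pipefail   # a pg_dump failure must fail the whole pipeline (bash)
    umask 077         # files created by the redirect get mode 0600
    docker exec glitchtip-postgres pg_dump -U postgres --format=custom glitchtip \
      | gzip > "$out"
  ) || { rm -f "$out"; echo "backup failed: $out" >&2; return 1; }
  prune_old_dumps
}
```

Removing the partial file on failure is a design choice of this sketch, so that a truncated dump is never mistaken for a usable one.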
Install the cron entry on the production host:
```cron
# /etc/cron.d/glitchtip-backup
0 3 * * * crewli /opt/crewli/scripts/glitchtip-backup.sh >> /var/log/glitchtip-backup.log 2>&1
```
(Replace `/opt/crewli` with wherever the Crewli repo checkout lives on
the monitoring host. The script is portable — only the `docker exec`
target container needs to exist.)
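The script exits non-zero on dump failure. Cron's `MAILTO` only mails
output that a job writes, though, and the entry above redirects stdout and
stderr to the log file, so a failed run is recorded in
`/var/log/glitchtip-backup.log` rather than mailed.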
### 5.2 Restore drill
```bash
# Pick the dump to restore from.
DUMP=./backups/glitchtip/glitchtip-20260506-030000.dump.gz
# Stream the restore into the postgres container.
gunzip < "$DUMP" \
| docker exec -i glitchtip-postgres pg_restore \
-U postgres -d glitchtip --clean --if-exists
```
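`--clean --if-exists` drops each object in the dump before recreating it,
so everything the dump contains is returned to its state at dump time;
objects created after the dump and absent from it are left in place. Stop
the web and worker first to avoid concurrent writes during the restore,
e.g. `docker compose -f docker-compose.glitchtip.yml stop glitchtip-web glitchtip-worker`
on the production host.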
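Since `--clean` drops objects before the data is back, it is worth confirming that the chosen dump is readable before starting. A minimal sketch, assuming a hypothetical helper `dump_is_intact`; it only verifies the gzip layer, not the PostgreSQL archive inside it:

```bash
# Returns 0 when the file exists, is non-empty, and passes gzip's integrity test.
dump_is_intact() {
  [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# Usage before a restore:
#   dump_is_intact "$DUMP" || { echo "refusing to restore: $DUMP" >&2; exit 1; }
#
# Optional deeper check: list the archive's table of contents without restoring.
#   gunzip < "$DUMP" | docker exec -i glitchtip-postgres pg_restore --list | head
```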
Bert should drill the restore at least once after the production stack
is live (acceptance criterion 11).
---
## 6. Monitoring the monitor
Quick smoke tests:
```bash
# API responds with JSON (not 502).
curl -sS http://localhost:8200/api/0/
# Worker reporting in (look for "celery@... ready").
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml \
logs --tail=50 glitchtip-worker
# All services healthy.
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml ps
```
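In production, replace `http://localhost:8200` with `https://monitoring.hausdesign.nl`
and pass only `-f docker-compose.glitchtip.yml` to `docker compose` (run from `/opt/glitchtip`).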
Email-alerting is configured in PR-4; until then alerts surface only in
the GlitchTip web UI (Issues view).
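The `curl` check can be made scriptable by looking at the status code instead of reading the body. The sketch below assumes that any response other than a 5xx or a connection failure means the web container is reachable; which exact code `/api/0/` returns to an anonymous request is not stated here, so the classification is deliberately loose. Both function names are hypothetical.

```bash
# Classify an HTTP status code: "000" (no response) and 5xx count as unhealthy.
status_is_healthy() {
  case "$1" in
    000|5??) return 1 ;;   # no response, or a gateway/server error such as 502
    [1-4]??) return 0 ;;   # the application answered
    *)       return 1 ;;   # anything unexpected
  esac
}

# $1 = base URL, e.g. http://localhost:8200
smoke_check() {
  _code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$1/api/0/") \
    || _code=000
  status_is_healthy "$_code"
}

# Usage:
#   smoke_check http://localhost:8200 && echo "GlitchTip reachable"
```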
---
## 7. Troubleshooting
### Web container unhealthy on first boot
Migrations take ~60s on a fresh volume. The healthcheck `start_period`
is set accordingly. If the container is still unhealthy after two
minutes, tail logs:
```bash
docker logs glitchtip-web
```
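Most common cause: `DATABASE_URL` password ≠ `POSTGRES_PASSWORD`. The
postgres container creates the user with the password it sees when the
data volume is first initialised, and GlitchTip authenticates with the
password embedded in the URL, so they MUST match. Changing
`POSTGRES_PASSWORD` later does not update an existing volume.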
### Worker idle / events stuck in queue
Check that `REDIS_URL` resolves and the worker is connected:
```bash
docker logs glitchtip-worker | grep -E "ready|connected|error"
```
### Volume permission errors on Linux hosts
`postgres:16-alpine` runs as UID 70 internally. If `/var/lib/postgresql/data`
is bind-mounted from the host with mismatched ownership, postgres refuses
to start. The default named volume avoids this — only relevant if you
later switch to a host bind-mount.
### Right-to-erasure (Art. 17)
Currently manual. Locate events for a user ULID via the web UI search,
delete via the UI or directly on the postgres container. An automated
erasure script is on the BACKLOG (per RFC §4).
---
## 8. References
- RFC: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md)
- GlitchTip docs: <https://glitchtip.com/documentation>
- GlitchTip self-hosting: <https://glitchtip.com/documentation/install>

dev-docs/RFC-WS-7-OBSERVABILITY.md

@@ -30,6 +30,8 @@ Two deviations from charter §3 decision 8, both deliberate:
Self-hosted GlitchTip on the production VPS via Docker Compose (`glitchtip-web`, `glitchtip-worker`, `glitchtip-postgres`, `glitchtip-redis`). Reverse proxy via DirectAdmin Apache; SSL via DirectAdmin Let's Encrypt on `monitoring.hausdesign.nl` (consistent with the existing subdomain pattern).
**Local development:** the same `docker-compose.glitchtip.yml` runs locally via `make services` (combined with the existing `docker-compose.yml` via `-f`). Web UI at `http://localhost:8200`, email to Mailpit at `bm_mailpit:1025`. The dev and prod stacks share a single compose file, so configuration drift is ruled out.
### 3.2 Two projects / DSNs
- `crewli-api` — Laravel

dev-docs/SETUP.md

@@ -70,11 +70,18 @@ Three terminal tabs, plus an optional fourth for the queue worker:
| Terminal | Command | Where it runs | Port |
|----------|---------|---------------|------|
| 1. Services | `make services` (from repo root) | Docker | 3306 (MySQL), 6379 (Redis), 8025 (Mailpit) |
| 1. Services | `make services` (from repo root) | Docker | 3306 (MySQL), 6379 (Redis), 8025 (Mailpit), 8200 (GlitchTip) |
| 2. API | `make api` (from repo root) | Laravel dev server | 8000 |
| 3. SPA | `make app` (from repo root) | Vite dev server | 5174 |
| 4. Queue worker (optional) | `cd api && php artisan queue:listen redis --queue=emails` | Local PHP | n/a |
Web UIs available once `make services` is up:
| Service | URL |
|---------|-----|
| Mailpit | <http://localhost:8025> |
| GlitchTip | <http://localhost:8200> (admin UI; first boot ~60s while migrations run) |
The queue worker is only needed when you're triggering email flows (registration, password reset, email change, invitations). Routine UI work doesn't require it.
Stop services when done: `make services-stop`.
@@ -116,6 +123,13 @@ VITE_APP_NAME="Crewli"
For production: `VITE_API_URL=https://api.crewli.app`.
### `docker/glitchtip/.env`
Generated by copying `docker/glitchtip/.env.example`. Dev defaults are
functional out of the box — no edits needed for `make services`. See
[`GLITCHTIP.md`](./GLITCHTIP.md) for first-boot steps (creating the
superuser, creating the two projects, copying DSNs to 1Password).
## Common tasks
### Run tests