diff --git a/dev-docs/GLITCHTIP.md b/dev-docs/GLITCHTIP.md
new file mode 100644
index 00000000..246e97fc
--- /dev/null
+++ b/dev-docs/GLITCHTIP.md
@@ -0,0 +1,283 @@
+# GlitchTip: operations runbook
+
+Self-hosted error tracking for Crewli. GlitchTip implements the Sentry
+event protocol; the official Sentry SDKs (`sentry-laravel`, `@sentry/vue`,
+`@sentry/cli`) work against it without modification.
+
+Reference: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md).
+
+This file documents how to run the stack, both locally and on the production
+monitoring host. PR-2 (backend SDK) and PR-3 (frontend SDK) consume DSNs
+provisioned via the steps below.
+
+---
+
+## 1. Overview
+
+| Service | Image | Role |
+|---------|-------|------|
+| `glitchtip-web` | `glitchtip/glitchtip:6.1.6` | Django web UI + ingest API |
+| `glitchtip-worker` | `glitchtip/glitchtip:6.1.6` | Celery worker + beat (event processing, alerts, partition maintenance) |
+| `glitchtip-postgres` | `postgres:16-alpine` | Primary datastore |
+| `glitchtip-redis` | `valkey/valkey:7-alpine` | Celery broker + cache |
+
+The same `docker-compose.glitchtip.yml` runs both locally (merged with
+`docker-compose.yml`) and on the production host (standalone). Container
+names are identical in both environments to avoid configuration drift.
+
+---
+
+## 2. Local development
+
+```bash
+# Once
+cp docker/glitchtip/.env.example docker/glitchtip/.env
+
+# Boot the full stack (MySQL, Redis, Mailpit, GlitchTip)
+make services
+
+# First boot takes ~60s while migrations run. Tail progress:
+make services-glitchtip-status
+```
+
+Web UI: <http://localhost:8200>. Outbound mail goes to Mailpit
+(`http://localhost:8025`).
+
+Create the first admin user:
+
+```bash
+docker exec -it glitchtip-web ./manage.py createsuperuser
+```
+
+Stop the stack with `make services-stop`. Volumes (`glitchtip_postgres_data`,
+`glitchtip_redis_data`, `glitchtip_uploads`) survive a stop.
+Wipe with
+`docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml down -v`,
+but **never on production**.
+
+---
+
+## 3. Project provisioning
+
+Once the web UI is reachable and the superuser exists:
+
+1. Sign in at `/`.
+2. Create an Organization called **Crewli**.
+3. Create two projects:
+   - **`crewli-api`**, platform: PHP / Laravel, alert rules: default.
+   - **`crewli-app`**, platform: JavaScript / Vue, alert rules: default.
+4. For each project, copy the auto-generated DSN from
+   *Settings → Client Keys (DSN)*.
+5. Store both DSNs in 1Password under `Crewli / GlitchTip / DSNs`:
+   - `SENTRY_DSN_BACKEND` ← `crewli-api` DSN
+   - `SENTRY_DSN_FRONTEND` ← `crewli-app` DSN
+
+PR-2 wires `SENTRY_DSN_BACKEND` into `api/.env.example`; PR-3 wires
+`SENTRY_DSN_FRONTEND` into `apps/app/.env.example`. Empty DSN = SDK no-op
+(verified for both `sentry-laravel` and `@sentry/vue`), so dev environments
+without a DSN are silent.
+
+---
+
+## 4. Production deployment
+
+GlitchTip runs on a separate host (`monitoring.hausdesign.nl`) and is **not**
+deployed via the Crewli `deploy.sh` pipeline.
+
+### 4.1 Prerequisites
+
+- Docker + Docker Compose v2 on the monitoring host.
+- DirectAdmin with the Let's Encrypt module enabled.
+- DNS A-record `monitoring.hausdesign.nl` pointing at the host IP.
+
+### 4.2 Place the stack
+
+```bash
+sudo install -d -o crewli -g crewli /opt/glitchtip
+sudo install -d -o crewli -g crewli /opt/glitchtip/docker/glitchtip
+
+# Copy compose file + env example to the host (e.g. via scp or git checkout).
+# /opt/glitchtip/docker-compose.glitchtip.yml
+# /opt/glitchtip/docker/glitchtip/.env.example
+```
+
+### 4.3 Configure `.env`
+
+```bash
+cd /opt/glitchtip
+cp docker/glitchtip/.env.example docker/glitchtip/.env
+chmod 0600 docker/glitchtip/.env
+```
+
+Fill in the production values (header of `.env.example` lists the
+checklist):
+
+```env
+SECRET_KEY=
+DATABASE_URL=postgres://postgres:@glitchtip-postgres:5432/glitchtip
+POSTGRES_PASSWORD= # MUST match the password in DATABASE_URL
+GLITCHTIP_DOMAIN=https://monitoring.hausdesign.nl
+DEFAULT_FROM_EMAIL=glitchtip@hausdesign.nl
+EMAIL_URL=smtp+tls://USER:PASSWORD@HOST:PORT
+```
+
+Source the database password from the 1Password vault.
+
+### 4.4 DNS + TLS
+
+1. Create the A-record for `monitoring.hausdesign.nl` in DNS.
+2. In DirectAdmin: add the subdomain, then enable Let's Encrypt
+   (Domain Setup → SSL Certificates → "Free & automatic certificate from
+   Let's Encrypt"). Wait for the cert to issue.
+
+### 4.5 Apache reverse proxy
+
+DirectAdmin generates the vhost. Add a custom config (DirectAdmin →
+Custom HTTPD Configurations) for the `monitoring.hausdesign.nl` HTTPS
+vhost:
+
+```apache
+ProxyPreserveHost On
+ProxyRequests Off
+ProxyPass / http://127.0.0.1:8200/
+ProxyPassReverse / http://127.0.0.1:8200/
+
+# WebSocket upgrade: GlitchTip uses WS for live event streaming.
+RewriteEngine On
+RewriteCond %{HTTP:Upgrade} websocket [NC]
+RewriteCond %{HTTP:Connection} upgrade [NC]
+RewriteRule ^/?(.*) "ws://127.0.0.1:8200/$1" [P,L]
+```
+
+Reload Apache.
+
+### 4.6 First boot
+
+```bash
+cd /opt/glitchtip
+docker compose -f docker-compose.glitchtip.yml up -d
+
+# Wait for healthchecks (~60s).
+docker compose -f docker-compose.glitchtip.yml ps
+
+# Create the admin user.
+docker exec -it glitchtip-web ./manage.py createsuperuser
+```
+
+Open <https://monitoring.hausdesign.nl>, sign in, and **enable 2FA** on
+the account immediately (acceptance criterion 1). Profile → Security →
+Two-Factor Authentication.
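The password match that §4.3 requires between `POSTGRES_PASSWORD` and `DATABASE_URL` can be checked mechanically before or after the first boot. The following is a minimal sketch and not part of the repo: `check_db_password_match` is a hypothetical helper name, and it assumes the `.env` uses plain `KEY=value` lines and that the password inside `DATABASE_URL` is not percent-encoded.

```shell
# Hypothetical pre-flight helper (an assumption, not shipped with the stack).
# Returns 0 when POSTGRES_PASSWORD is non-empty and equals the password
# embedded in DATABASE_URL; returns non-zero otherwise.
check_db_password_match() {
  local env_file="$1" url pg_pass url_pass

  # Take everything after the first "=" on each line.
  url=$(grep -E '^DATABASE_URL=' "$env_file" | head -n 1 | cut -d= -f2-)
  pg_pass=$(grep -E '^POSTGRES_PASSWORD=' "$env_file" | head -n 1 \
    | cut -d= -f2- | sed -E 's/[[:space:]]+#.*$//')  # drop a trailing " # comment"

  # postgres://USER:PASSWORD@HOST:PORT/DB -> PASSWORD
  url_pass="${url#*://}"     # strip the scheme
  url_pass="${url_pass%@*}"  # strip "@HOST:PORT/DB" (from the last "@")
  url_pass="${url_pass#*:}"  # strip "USER:" (up to the first ":")

  [ -n "$pg_pass" ] && [ "$url_pass" = "$pg_pass" ]
}
```

Run it from `/opt/glitchtip` as `check_db_password_match docker/glitchtip/.env && echo ok`; a non-zero exit means the two passwords differ or `POSTGRES_PASSWORD` is empty.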
+
+Then provision the two projects (§3) and capture DSNs into 1Password.
+
+---
+
+## 5. Backup & restore
+
+### 5.1 Daily backup
+
+`scripts/glitchtip-backup.sh` runs `pg_dump --format=custom`, streams it
+through gzip, writes to `./backups/glitchtip/glitchtip-<YYYYMMDD-HHMMSS>.dump.gz` with
+`0600` permissions, and prunes dumps older than 30 days.
+
+Install the cron entry on the production host:
+
+```cron
+# /etc/cron.d/glitchtip-backup
+0 3 * * * crewli /opt/crewli/scripts/glitchtip-backup.sh >> /var/log/glitchtip-backup.log 2>&1
+```
+
+(Replace `/opt/crewli` with wherever the Crewli repo checkout lives on
+the monitoring host. The script is portable: only the `docker exec`
+target container needs to exist.)
+
+The script exits non-zero on dump failure. Cron's `MAILTO` fires on job
+output, not on exit status, and the entry above redirects all output to
+the log file, so failures surface in `/var/log/glitchtip-backup.log`
+rather than by mail unless the redirect is adjusted.
+
+### 5.2 Restore drill
+
+```bash
+# Pick the dump to restore from.
+DUMP=./backups/glitchtip/glitchtip-20260506-030000.dump.gz
+
+# Stream the restore into the postgres container.
+gunzip < "$DUMP" \
+  | docker exec -i glitchtip-postgres pg_restore \
+      -U postgres -d glitchtip --clean --if-exists
+```
+
+`--clean --if-exists` drops each dumped object before recreating it, so
+those objects end up exactly as they were at dump time (objects created
+after the dump are left in place). Stop the writers first with
+`docker compose stop glitchtip-web glitchtip-worker` to avoid concurrent
+writes during the restore.
+
+Bert should drill the restore at least once after the production stack
+is live (acceptance criterion 11).
+
+---
+
+## 6. Monitoring the monitor
+
+Quick smoke tests:
+
+```bash
+# API responds with JSON (not 502).
+curl -sS http://localhost:8200/api/0/
+
+# Worker reporting in (look for "celery@... ready").
+docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml \
+  logs --tail=50 glitchtip-worker
+
+# All services healthy.
+docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml ps
+```
+
+In production, replace `http://localhost:8200` with
+`https://monitoring.hausdesign.nl` and drop `-f docker-compose.yml` from the
+compose commands, since the stack runs standalone there.
+Email-alerting is configured in PR-4; until then alerts surface only in
+the GlitchTip web UI (Issues view).
+
+---
+
+## 7. Troubleshooting
+
+### Web container unhealthy on first boot
+
+Migrations take ~60s on a fresh volume. The healthcheck `start_period`
+is set accordingly. If the container is still unhealthy after two
+minutes, tail logs:
+
+```bash
+docker logs glitchtip-web
+```
+
+Most common cause: `DATABASE_URL` password ≠ `POSTGRES_PASSWORD`. The
+postgres container creates the user with the password it sees, and GlitchTip
+authenticates with the password embedded in the URL, so they MUST match.
+
+### Worker idle / events stuck in queue
+
+Check that `REDIS_URL` resolves and the worker is connected:
+
+```bash
+# docker logs replays container stderr on stderr; merge it so grep sees it.
+docker logs glitchtip-worker 2>&1 | grep -E "ready|connected|error"
+```
+
+### Volume permission errors on Linux hosts
+
+`postgres:16-alpine` runs as UID 70 internally. If `/var/lib/postgresql/data`
+is bind-mounted from the host with mismatched ownership, postgres refuses
+to start. The default named volume avoids this, so it is only relevant if you
+later switch to a host bind-mount.
+
+### Right-to-erasure (GDPR Art. 17)
+
+Currently manual. Locate events for a user ULID via the web UI search, then
+delete via the UI or directly on the postgres container. An automated
+erasure script is on the BACKLOG (per RFC §4).
+
+---
+
+## 8. References
+
+- RFC: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md)
+- GlitchTip docs:
+- GlitchTip self-hosting:
diff --git a/dev-docs/RFC-WS-7-OBSERVABILITY.md b/dev-docs/RFC-WS-7-OBSERVABILITY.md
index d0467987..1ed75cdf 100644
--- a/dev-docs/RFC-WS-7-OBSERVABILITY.md
+++ b/dev-docs/RFC-WS-7-OBSERVABILITY.md
@@ -30,6 +30,8 @@ Two deviations from charter §3 decision 8, both deliberate:
 Self-hosted GlitchTip on the production VPS via Docker Compose (`glitchtip-web`, `glitchtip-worker`, `glitchtip-postgres`, `glitchtip-redis`).
 Reverse proxy via DirectAdmin Apache; SSL via DirectAdmin Let's Encrypt on `monitoring.hausdesign.nl` (consistent with the existing subdomain pattern).

+**Local development:** the same `docker-compose.glitchtip.yml` runs locally via `make services` (combined with the existing `docker-compose.yml` via `-f`). Web UI at `http://localhost:8200`, email to Mailpit at `bm_mailpit:1025`. The dev and prod stacks share one compose file, which rules out configuration drift.
+
 ### 3.2 Two projects / DSNs

 - `crewli-api`: Laravel
diff --git a/dev-docs/SETUP.md b/dev-docs/SETUP.md
index 114cf420..12d75f03 100644
--- a/dev-docs/SETUP.md
+++ b/dev-docs/SETUP.md
@@ -70,11 +70,18 @@ Three terminal tabs, plus an optional fourth for the queue worker:

 | Terminal | Command | Where it runs | Port |
 |----------|---------|---------------|------|
-| 1. Services | `make services` (from repo root) | Docker | 3306 (MySQL), 6379 (Redis), 8025 (Mailpit) |
+| 1. Services | `make services` (from repo root) | Docker | 3306 (MySQL), 6379 (Redis), 8025 (Mailpit), 8200 (GlitchTip) |
 | 2. API | `make api` (from repo root) | Laravel dev server | 8000 |
 | 3. SPA | `make app` (from repo root) | Vite dev server | 5174 |
 | 4. Queue worker (optional) | `cd api && php artisan queue:listen redis --queue=emails` | Local PHP | n/a |

+Web UIs available once `make services` is up:
+
+| Service | URL |
+|---------|-----|
+| Mailpit | <http://localhost:8025> |
+| GlitchTip | <http://localhost:8200> (admin UI; first boot ~60s while migrations run) |
+
 The queue worker is only needed when you're triggering email flows (registration, password reset, email change, invitations). Routine UI work doesn't require it.

 Stop services when done: `make services-stop`.
@@ -116,6 +123,13 @@ VITE_APP_NAME="Crewli"

 For production: `VITE_API_URL=https://api.crewli.app`.

+### `docker/glitchtip/.env`
+
+Generated by copying `docker/glitchtip/.env.example`. Dev defaults are
+functional out of the box, so no edits are needed for `make services`.
+See [`GLITCHTIP.md`](./GLITCHTIP.md) for first-boot steps (creating the
+superuser, creating the two projects, copying DSNs to 1Password).
+
 ## Common tasks

 ### Run tests