# GlitchTip: operations runbook

Self-hosted error tracking for Crewli. GlitchTip implements the Sentry event protocol; the official Sentry SDKs (`sentry-laravel`, `@sentry/vue`, `@sentry/cli`) work against it without modification.

Reference: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md). This file documents how to run the stack, both locally and on the production monitoring host. PR-2 (backend SDK) and PR-3 (frontend SDK) consume DSNs provisioned via the steps below.

---

## 1. Overview

| Service | Image | Role |
|---------|-------|------|
| `glitchtip-web` | `glitchtip/glitchtip:6.1.6` | Django web UI + ingest API |
| `glitchtip-worker` | `glitchtip/glitchtip:6.1.6` | Celery worker + beat (event processing, alerts, partition maintenance) |
| `glitchtip-postgres` | `postgres:16-alpine` | Primary datastore |
| `glitchtip-redis` | `valkey/valkey:7-alpine` | Celery broker + cache |

The same `docker-compose.glitchtip.yml` runs both locally (merged with `docker-compose.yml`) and on the production host (standalone). Container names are identical in both environments to avoid configuration drift.

---

## 2. Local development

```bash
# Once
cp docker/glitchtip/.env.example docker/glitchtip/.env

# Boot the full stack (MySQL, Redis, Mailpit, GlitchTip)
make services

# First boot takes ~60s while migrations run. Tail progress:
make services-glitchtip-status
```

Web UI: `http://localhost:8200`. Outbound mail goes to Mailpit (`http://localhost:8025`).

Create the first admin user:

```bash
docker exec -it glitchtip-web ./manage.py createsuperuser
```

Stop the stack with `make services-stop`. Volumes (`glitchtip_postgres_data`, `glitchtip_redis_data`, `glitchtip_uploads`) survive a stop. Wipe them with `docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml down -v` (**never on production**).

---

## 3. Project provisioning

Once the web UI is reachable and the superuser exists:

1. Sign in at `/`.
2. Create an Organization called **Crewli**.
3.
   Create two projects:
   - **`crewli-api`** (platform: Python / Django; alert rules: default).
   - **`crewli-app`** (platform: JavaScript / Vue; alert rules: default).
4. For each project, copy the auto-generated DSN from *Settings → Client Keys (DSN)*.
5. Store both DSNs in 1Password under `Crewli / GlitchTip / DSNs`:
   - `SENTRY_DSN_BACKEND` ← `crewli-api` DSN
   - `SENTRY_DSN_FRONTEND` ← `crewli-app` DSN

PR-2 wires `SENTRY_DSN_BACKEND` into `api/.env.example`; PR-3 wires `SENTRY_DSN_FRONTEND` into `apps/app/.env.example`. An empty DSN makes the SDK a no-op (verified for both `sentry-laravel` and `@sentry/vue`), so dev environments without a DSN are silent.

---

## 4. Production deployment

GlitchTip runs on a separate host (`monitoring.hausdesign.nl`) and is **not** deployed via the Crewli `deploy.sh` pipeline.

### 4.1 Prerequisites

- Docker + Docker Compose v2 on the monitoring host.
- DirectAdmin with the Let's Encrypt module enabled.
- DNS A-record `monitoring.hausdesign.nl` pointing at the host IP.

### 4.2 Place the stack

```bash
sudo install -d -o crewli -g crewli /opt/glitchtip
sudo install -d -o crewli -g crewli /opt/glitchtip/docker/glitchtip

# Copy compose file + env example to the host (e.g. via scp or git checkout).
#   /opt/glitchtip/docker-compose.glitchtip.yml
#   /opt/glitchtip/docker/glitchtip/.env.example
```

### 4.3 Configure `.env`

```bash
cd /opt/glitchtip
cp docker/glitchtip/.env.example docker/glitchtip/.env
chmod 0600 docker/glitchtip/.env
```

Fill in the production values (header of `.env.example` lists the checklist):

```env
SECRET_KEY=
DATABASE_URL=postgres://postgres:@glitchtip-postgres:5432/glitchtip
POSTGRES_PASSWORD=   # MUST match the password in DATABASE_URL
GLITCHTIP_DOMAIN=https://monitoring.hausdesign.nl
DEFAULT_FROM_EMAIL=glitchtip@hausdesign.nl
EMAIL_URL=smtp+tls://USER:PASSWORD@HOST:PORT
```

Source the password from the 1Password vault.

### 4.4 DNS + TLS

1. Create the A-record for `monitoring.hausdesign.nl` in DNS.
2.
   In DirectAdmin: add the subdomain, then enable Let's Encrypt (Domain Setup → SSL Certificates → "Free & automatic certificate from Let's Encrypt"). Wait for the cert to issue.

### 4.5 Apache reverse proxy

DirectAdmin generates the vhost. Add a custom config (DirectAdmin → Custom HTTPD Configurations) for the `monitoring.hausdesign.nl` HTTPS vhost:

```apache
ProxyPreserveHost On
ProxyRequests Off
ProxyPass / http://127.0.0.1:8200/
ProxyPassReverse / http://127.0.0.1:8200/

# WebSocket upgrade: GlitchTip uses WS for live event streaming.
RewriteEngine On
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule ^/?(.*) "ws://127.0.0.1:8200/$1" [P,L]
```

Reload Apache.

### 4.6 First boot

```bash
cd /opt/glitchtip
docker compose -f docker-compose.glitchtip.yml up -d

# Wait for healthchecks (~60s).
docker compose -f docker-compose.glitchtip.yml ps

# Create the admin user.
docker exec -it glitchtip-web ./manage.py createsuperuser
```

Open `https://monitoring.hausdesign.nl`, sign in, and **enable 2FA** on the account immediately (acceptance criterion 1) under Profile → Security → Two-Factor Authentication.

Then provision the two projects (§3) and capture DSNs into 1Password.

---

## 5. Backup & restore

### 5.1 Daily backup

`scripts/glitchtip-backup.sh` runs `pg_dump --format=custom`, streams it through gzip, writes to `./backups/glitchtip/glitchtip-.dump.gz` with `0600` permissions, and prunes dumps older than 30 days.

Install the cron entry on the production host:

```cron
# /etc/cron.d/glitchtip-backup
0 3 * * * crewli /opt/crewli/scripts/glitchtip-backup.sh >> /var/log/glitchtip-backup.log 2>&1
```

(Replace `/opt/crewli` with wherever the Crewli repo checkout lives on the monitoring host. The script is portable; only the `docker exec` target container needs to exist.)

The script exits non-zero on dump failure so that failures are detectable. Cron's `MAILTO` only mails output that a job produces, though, and the entry above redirects both stdout and stderr into the log file; drop the redirect (or watch `/var/log/glitchtip-backup.log`) if failures should arrive by mail.

### 5.2 Restore drill

```bash
# Pick the dump to restore from.
DUMP=./backups/glitchtip/glitchtip-20260506-030000.dump.gz

# Stream the restore into the postgres container.
gunzip < "$DUMP" \
  | docker exec -i glitchtip-postgres pg_restore \
      -U postgres -d glitchtip --clean --if-exists
```

`--clean --if-exists` drops each object contained in the dump before recreating it, so the dumped objects end up exactly as they were at dump time. Run the restore after `docker compose stop glitchtip-web glitchtip-worker` to avoid concurrent writes.

Bert should drill the restore at least once after the production stack is live (acceptance criterion 11).

---

## 6. Monitoring the monitor

Quick smoke tests:

```bash
# API responds with JSON (not 502).
curl -sS http://localhost:8200/api/0/

# Worker reporting in (look for "celery@... ready").
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml \
  logs --tail=50 glitchtip-worker

# All services healthy.
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml ps
```

In production, replace `http://localhost:8200` with `https://monitoring.hausdesign.nl`.

Email alerting is configured in PR-4; until then alerts surface only in the GlitchTip web UI (Issues view).

---

## 7. Troubleshooting

### Web container unhealthy on first boot

Migrations take ~60s on a fresh volume. The healthcheck `start_period` is set accordingly. If the container is still unhealthy after two minutes, tail logs:

```bash
docker logs glitchtip-web
```

Most common cause: `DATABASE_URL` password ≠ `POSTGRES_PASSWORD`. The postgres container creates the user with the password it sees, while GlitchTip authenticates with the password embedded in the URL, so they MUST match.

### Worker idle / events stuck in queue

Check that `REDIS_URL` resolves and the worker is connected:

```bash
docker logs glitchtip-worker | grep -E "ready|connected|error"
```

### Volume permission errors on Linux hosts

`postgres:16-alpine` runs as UID 70 internally.
If `/var/lib/postgresql/data` is bind-mounted from the host with mismatched ownership, postgres refuses to start. The default named volume avoids this, so it only matters if you later switch to a host bind-mount.

### Right-to-erasure (Art. 17)

Currently manual. Locate events for a user ULID via the web UI search, then delete them via the UI or directly in the postgres container. An automated erasure script is on the BACKLOG (per RFC §4).

---

## 8. References

- RFC: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md)
- GlitchTip docs:
- GlitchTip self-hosting: