GlitchTip — operations runbook
Self-hosted error tracking for Crewli. GlitchTip implements the Sentry
event protocol; the official Sentry SDKs (sentry-laravel, @sentry/vue,
@sentry/cli) work against it without modification.
Reference: RFC-WS-7-OBSERVABILITY.md.
This file documents how to run the stack — locally and on the production monitoring host. PR-2 (backend SDK) and PR-3 (frontend SDK) consume DSNs provisioned via the steps below.
1. Overview
| Service | Image | Role |
|---|---|---|
| glitchtip-web | glitchtip/glitchtip:6.1.6 | Django web UI + ingest API |
| glitchtip-worker | glitchtip/glitchtip:6.1.6 | Celery worker + beat (event processing, alerts, partition maintenance) |
| glitchtip-postgres | postgres:16-alpine | Primary datastore |
| glitchtip-redis | valkey/valkey:7-alpine | Celery broker + cache |
The same docker-compose.glitchtip.yml runs both locally (merged with
docker-compose.yml) and on the production host (standalone). Container
names are identical in both environments to avoid configuration drift.
2. Local development
# Once
cp docker/glitchtip/.env.example docker/glitchtip/.env
# Boot the full stack (MySQL, Redis, Mailpit, GlitchTip)
make services
# First boot takes ~60s while migrations run. Tail progress:
make services-glitchtip-status
Web UI: http://localhost:8200. Outbound mail goes to Mailpit
(http://localhost:8025).
Create the first admin user:
docker exec -it glitchtip-web ./manage.py createsuperuser
Stop the stack with make services-stop. Volumes (glitchtip_postgres_data,
glitchtip_redis_data, glitchtip_uploads) survive a stop. Wipe with
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml down -v
— never on production.
3. Project provisioning
Once the web UI is reachable and the superuser exists:
- Sign in at /.
- Create an Organization called Crewli.
- Create two projects:
  - crewli-api: platform Python / Django, alert rules: default.
  - crewli-app: platform JavaScript / Vue, alert rules: default.
- For each project, copy the auto-generated DSN from Settings → Client Keys (DSN).
- Store both DSNs in 1Password under Crewli / GlitchTip / DSNs:
  - SENTRY_DSN_BACKEND ← crewli-api DSN
  - SENTRY_DSN_FRONTEND ← crewli-app DSN
PR-2 wires SENTRY_DSN_BACKEND into api/.env.example; PR-3 wires
SENTRY_DSN_FRONTEND into apps/app/.env.example. Empty DSN = SDK no-op
(verified for both sentry-laravel and @sentry/vue), so dev environments
without a DSN are silent.
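Before a DSN goes into the vault or an .env file, a quick shape check can catch copy-paste mistakes. The sketch below assumes the usual Sentry-protocol layout (scheme, public key, host, numeric project ID) and treats an empty value as the intentional "SDK disabled" case described above. dsn_status is a hypothetical helper name, and the pattern is deliberately loose rather than a full validator.

```shell
# Classify a DSN string: "disabled" (empty), "ok" (looks like a DSN), "invalid".
# Assumed layout: <scheme>://<public_key>@<host>[:port][/<path>]/<project_id>
# dsn_status is a hypothetical helper, not part of any Sentry SDK.
dsn_status() {
  if [ -z "$1" ]; then
    # An empty DSN is valid configuration: the SDK becomes a no-op.
    echo "disabled"
  elif printf '%s\n' "$1" |
    grep -Eq '^https?://[A-Za-z0-9]+@[A-Za-z0-9.-]+(:[0-9]+)?(/[^/@]+)*/[0-9]+$'; then
    echo "ok"
  else
    echo "invalid"
    return 1
  fi
}

# Usage (the key shown is a placeholder, not a real client key):
#   dsn_status "https://abc123@monitoring.hausdesign.nl/1"
```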
4. Production deployment
GlitchTip runs on a separate host (monitoring.hausdesign.nl) and is not
deployed via the Crewli deploy.sh pipeline.
4.1 Prerequisites
- Docker + Docker Compose v2 on the monitoring host.
- DirectAdmin with the Let's Encrypt module enabled.
- DNS A-record monitoring.hausdesign.nl pointing at the host IP.
4.2 Place the stack
sudo install -d -o crewli -g crewli /opt/glitchtip
sudo install -d -o crewli -g crewli /opt/glitchtip/docker/glitchtip
# Copy compose file + env example to the host (e.g. via scp or git checkout).
# /opt/glitchtip/docker-compose.glitchtip.yml
# /opt/glitchtip/docker/glitchtip/.env.example
4.3 Configure .env
cd /opt/glitchtip
cp docker/glitchtip/.env.example docker/glitchtip/.env
chmod 0600 docker/glitchtip/.env
Fill in the production values (header of .env.example lists the
checklist):
SECRET_KEY=<python -c "import secrets; print(secrets.token_urlsafe(50))">
DATABASE_URL=postgres://postgres:<STRONG>@glitchtip-postgres:5432/glitchtip
POSTGRES_PASSWORD=<STRONG> # MUST match the password in DATABASE_URL
GLITCHTIP_DOMAIN=https://monitoring.hausdesign.nl
DEFAULT_FROM_EMAIL=glitchtip@hausdesign.nl
EMAIL_URL=smtp+tls://USER:PASSWORD@HOST:PORT
Source the <STRONG> password from the 1Password vault.
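Because the two passwords must agree, a small pre-flight check on the finished .env can save a failed first boot. This is a minimal sketch: env_value and env_passwords_match are hypothetical helper names, the file is assumed to hold plain KEY=value lines without inline comments or quoting, and the password is assumed not to need percent-encoding inside the URL.

```shell
# Print the value of KEY from a dotenv-style file (last assignment wins).
# Usage: env_value <file> <KEY>
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Succeed only when the password embedded in DATABASE_URL equals
# POSTGRES_PASSWORD. The body runs in a subshell so no variables leak.
env_passwords_match() (
  url=$(env_value "$1" DATABASE_URL)
  pg_pass=$(env_value "$1" POSTGRES_PASSWORD)
  # Strip "scheme://user:" from the front, then "@host..." from the back.
  url_pass=${url#*://*:}
  url_pass=${url_pass%@*}
  [ -n "$pg_pass" ] && [ "$url_pass" = "$pg_pass" ]
)

# Usage:
#   env_passwords_match docker/glitchtip/.env || echo "password mismatch" >&2
```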
4.4 DNS + TLS
- Create the A-record for monitoring.hausdesign.nl in DNS.
- In DirectAdmin: add the subdomain, then enable Let's Encrypt (Domain Setup → SSL Certificates → "Free & automatic certificate from Let's Encrypt"). Wait for the cert to issue.
4.5 Apache reverse proxy
DirectAdmin generates the vhost. Add a custom config (DirectAdmin →
Custom HTTPD Configurations) for the monitoring.hausdesign.nl HTTPS
vhost:
ProxyPreserveHost On
ProxyRequests Off
ProxyPass / http://127.0.0.1:8200/
ProxyPassReverse / http://127.0.0.1:8200/
# WebSocket upgrade — GlitchTip uses WS for live event streaming.
RewriteEngine On
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule ^/?(.*) "ws://127.0.0.1:8200/$1" [P,L]
Reload Apache.
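The directives above depend on several Apache modules being loaded: mod_proxy and mod_proxy_http for ProxyPass, mod_rewrite for the rewrite rules, and mod_proxy_wstunnel for the ws:// target. The sketch below reads a module listing on stdin and reports what is missing. missing_proxy_modules is a hypothetical helper name, and whether the binary is called apachectl or httpd on a DirectAdmin host is an assumption to verify locally.

```shell
# Read `apachectl -M` (or `httpd -M`) output on stdin and print every
# required module that is absent. Exit status is 0 only when none is missing.
# missing_proxy_modules is a hypothetical helper name.
missing_proxy_modules() (
  loaded=$(cat)
  missing=0
  for mod in proxy_module proxy_http_module proxy_wstunnel_module rewrite_module; do
    if ! printf '%s\n' "$loaded" | grep -qw "$mod"; then
      echo "$mod"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
)

# Usage:
#   apachectl -M 2>/dev/null | missing_proxy_modules
```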
4.6 First boot
cd /opt/glitchtip
docker compose -f docker-compose.glitchtip.yml up -d
# Wait for healthchecks (~60s).
docker compose -f docker-compose.glitchtip.yml ps
# Create the admin user.
docker exec -it glitchtip-web ./manage.py createsuperuser
Open https://monitoring.hausdesign.nl, sign in, and enable 2FA on the account immediately (acceptance criterion 1). Profile → Security → Two-Factor Authentication.
Then provision the two projects (§3) and capture DSNs into 1Password.
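The "wait for healthchecks" step in the first boot can be scripted rather than eyeballed. The following is a minimal polling sketch: wait_until is a hypothetical helper name, and the probe command in the usage comment assumes the API root answers with a 2xx status once migrations have finished.

```shell
# Retry a command until it succeeds or the timeout elapses.
# Usage: wait_until <timeout_s> <interval_s> <command> [args...]
# Both durations must be whole seconds. wait_until is a hypothetical helper.
wait_until() (
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      exit 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  exit 0
)

# Usage (assumes the endpoint returns 2xx when the web container is ready):
#   wait_until 120 5 curl -fsS http://127.0.0.1:8200/api/0/ \
#     || docker logs --tail=50 glitchtip-web
```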
5. Backup & restore
5.1 Daily backup
scripts/glitchtip-backup.sh runs pg_dump --format=custom, streams it
through gzip, writes to ./backups/glitchtip/glitchtip-<ts>.dump.gz with
0600 permissions, and prunes dumps older than 30 days.
Install the cron entry on the production host:
# /etc/cron.d/glitchtip-backup
0 3 * * * crewli /opt/crewli/scripts/glitchtip-backup.sh >> /var/log/glitchtip-backup.log 2>&1
(Replace /opt/crewli with wherever the Crewli repo checkout lives on
the monitoring host. The script is portable — only the docker exec
target container needs to exist.)
The script exits non-zero on dump failure so cron's MAILTO catches
silent regressions.
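To make the naming and retention rules concrete, here is a minimal sketch of the two pieces of bookkeeping described above. It is not the contents of scripts/glitchtip-backup.sh: dump_path and prune_old_dumps are hypothetical helper names, and the timestamp format is inferred from the example file name used in the restore drill.

```shell
# Build a timestamped dump path such as
# <dir>/glitchtip-20260506-030000.dump.gz (format assumed from the docs).
dump_path() {
  printf '%s/glitchtip-%s.dump.gz\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# Delete dumps in DIR whose modification time is more than DAYS days old.
# Usage: prune_old_dumps <dir> <days>
prune_old_dumps() {
  find "$1" -type f -name 'glitchtip-*.dump.gz' -mtime +"$2" -exec rm -f {} +
}

# Usage:
#   umask 077                      # new files are created owner-only (0600)
#   out=$(dump_path ./backups/glitchtip)
#   prune_old_dumps ./backups/glitchtip 30
```

Restricting the find pattern to glitchtip-*.dump.gz keeps the prune from touching unrelated files that happen to share the backup directory.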
5.2 Restore drill
# Pick the dump to restore from.
DUMP=./backups/glitchtip/glitchtip-20260506-030000.dump.gz
# Stream the restore into the postgres container.
gunzip < "$DUMP" \
| docker exec -i glitchtip-postgres pg_restore \
-U postgres -d glitchtip --clean --if-exists
--clean --if-exists drops each object contained in the dump before
recreating it, so the restored objects match their state at dump time.
Objects created after the dump and absent from it are left in place. Run
the restore after a
docker compose stop glitchtip-web glitchtip-worker to avoid concurrent
writes.
Bert should drill the restore at least once after the production stack is live (acceptance criterion 11).
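Before a drill it can help to confirm that the chosen file is worth restoring from. The sketch below checks two things: that the gzip stream is intact, and that the payload starts with the PGDMP magic bytes found at the start of a pg_dump custom-format archive. dump_looks_restorable is a hypothetical helper name, and passing this check does not prove the dump restores cleanly.

```shell
# Succeed when FILE is an intact gzip stream whose payload begins with the
# "PGDMP" magic of a pg_dump custom-format archive.
# dump_looks_restorable is a hypothetical helper name.
dump_looks_restorable() (
  # gzip -t verifies the compressed stream without writing any output.
  gzip -t "$1" 2>/dev/null || exit 1
  magic=$(gzip -dc "$1" 2>/dev/null | head -c 5)
  [ "$magic" = "PGDMP" ]
)

# Usage:
#   dump_looks_restorable "$DUMP" || echo "refusing to restore $DUMP" >&2
```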
6. Monitoring the monitor
Quick smoke tests:
# API responds with JSON (not 502).
curl -sS http://localhost:8200/api/0/
# Worker reporting in (look for "celery@... ready").
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml \
logs --tail=50 glitchtip-worker
# All services healthy.
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml ps
In production, replace http://localhost:8200 with https://monitoring.hausdesign.nl
and pass only -f docker-compose.glitchtip.yml, since the stack runs standalone there.
Email alerting is configured in PR-4; until then alerts surface only in
the GlitchTip web UI (Issues view).
7. Troubleshooting
Web container unhealthy on first boot
Migrations take ~60s on a fresh volume. The healthcheck start_period
is set accordingly. If the container is still unhealthy after two
minutes, tail logs:
docker logs glitchtip-web
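Most common cause: DATABASE_URL password ≠ POSTGRES_PASSWORD. The
postgres container creates the user with POSTGRES_PASSWORD when it first
initializes the data volume, and GlitchTip authenticates with the password
embedded in the URL, so the two MUST match. Changing POSTGRES_PASSWORD
later has no effect on an already-initialized volume.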
Worker idle / events stuck in queue
Check that REDIS_URL resolves and the worker is connected:
# docker logs replays the container's stderr on stderr; merge it so grep sees it.
docker logs glitchtip-worker 2>&1 | grep -E "ready|connected|error"
Volume permission errors on Linux hosts
postgres:16-alpine runs as UID 70 internally. If /var/lib/postgresql/data
is bind-mounted from the host with mismatched ownership, postgres refuses
to start. The default named volume avoids this — only relevant if you
later switch to a host bind-mount.
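If a bind-mount is ever introduced, the ownership can be checked before starting the container. This is a small sketch under stated assumptions: data_dir_owner_ok is a hypothetical helper name, the expected UID defaults to the 70 mentioned above, and stat -c is the GNU coreutils form, so it applies to Linux hosts only.

```shell
# Succeed when DIR is owned by the expected numeric UID (default 70).
# Usage: data_dir_owner_ok <dir> [uid]
# Relies on GNU stat's -c flag; data_dir_owner_ok is a hypothetical helper.
data_dir_owner_ok() {
  [ "$(stat -c %u "$1")" = "${2:-70}" ]
}

# Usage (the path is an example, not a location used by the stack):
#   data_dir_owner_ok /srv/glitchtip/pgdata || echo "fix ownership first" >&2
```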
Right-to-erasure (Art. 17)
Currently manual. Locate events for a user ULID via the web UI search, delete via the UI or directly on the postgres container. An automated erasure script is on the BACKLOG (per RFC §4).
8. References
- RFC: RFC-WS-7-OBSERVABILITY.md
- GlitchTip docs: https://glitchtip.com/documentation
- GlitchTip self-hosting: https://glitchtip.com/documentation/install