Operational docs for the GlitchTip stack landed in the previous two commits.

- dev-docs/GLITCHTIP.md: new runbook covering local dev, project provisioning + DSN-to-vault flow, production deploy on monitoring.hausdesign.nl (DNS, DirectAdmin Let's Encrypt, Apache reverse proxy with WS upgrade), backup install + restore drill, smoke tests, troubleshooting.
- dev-docs/SETUP.md: services table now includes GlitchTip; new docker/glitchtip/.env subsection points at the runbook.
- dev-docs/RFC-WS-7-OBSERVABILITY.md §3.1: amended to record that the same compose file drives local dev (Mailpit at bm_mailpit:1025), so prod and dev cannot drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# GlitchTip: operations runbook

Self-hosted error tracking for Crewli. GlitchTip implements the Sentry
event protocol; the official Sentry SDKs (`sentry-laravel`, `@sentry/vue`,
`@sentry/cli`) work against it without modification.

Reference: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md).

This file documents how to run the stack, both locally and on the production
monitoring host. PR-2 (backend SDK) and PR-3 (frontend SDK) consume DSNs
provisioned via the steps below.

---

## 1. Overview

| Service | Image | Role |
|---------|-------|------|
| `glitchtip-web` | `glitchtip/glitchtip:6.1.6` | Django web UI + ingest API |
| `glitchtip-worker` | `glitchtip/glitchtip:6.1.6` | Celery worker + beat (event processing, alerts, partition maintenance) |
| `glitchtip-postgres` | `postgres:16-alpine` | Primary datastore |
| `glitchtip-redis` | `valkey/valkey:7-alpine` | Celery broker + cache |

The same `docker-compose.glitchtip.yml` runs both locally (merged with
`docker-compose.yml`) and on the production host (standalone). Container
names are identical in both environments to avoid configuration drift.

---

## 2. Local development

```bash
# Once
cp docker/glitchtip/.env.example docker/glitchtip/.env

# Boot the full stack (MySQL, Redis, Mailpit, GlitchTip)
make services

# First boot takes ~60s while migrations run. Tail progress:
make services-glitchtip-status
```

Web UI: <http://localhost:8200>. Outbound mail goes to Mailpit
(`http://localhost:8025`).

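Because the first boot takes about a minute, a small polling helper can stand in for re-running the status command by hand. This is only a sketch and not part of the repository: `wait_for` and `glitchtip_web_healthy` are hypothetical names, and the probe assumes the `glitchtip-web` container defines a Docker healthcheck.

```bash
# wait_for: retry a probe command until it succeeds or the timeout expires.
# Usage: wait_for <timeout_seconds> <interval_seconds> <command> [args...]
wait_for() {
  local timeout=$1 interval=$2
  shift 2
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Hypothetical probe: succeeds once Docker reports the container as healthy.
glitchtip_web_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' glitchtip-web 2>/dev/null)" = "healthy" ]
}

# Example (requires a running stack):
#   wait_for 120 5 glitchtip_web_healthy && echo "GlitchTip is up"
```

Keeping the probe separate from the retry loop means the same loop could wrap a `curl` check against the web UI instead.
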
Create the first admin user:

```bash
docker exec -it glitchtip-web ./manage.py createsuperuser
```

Stop the stack with `make services-stop`. Volumes (`glitchtip_postgres_data`,
`glitchtip_redis_data`, `glitchtip_uploads`) survive a stop. Wipe them with
`docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml down -v`;
**never on production**.

---

## 3. Project provisioning

Once the web UI is reachable and the superuser exists:

1. Sign in at `/`.
2. Create an Organization called **Crewli**.
3. Create two projects:
   - **`crewli-api`**: platform PHP / Laravel (matching the `sentry-laravel` SDK), alert rules: default.
   - **`crewli-app`**: platform JavaScript / Vue, alert rules: default.
4. For each project, copy the auto-generated DSN from
   *Settings → Client Keys (DSN)*.
5. Store both DSNs in 1Password under `Crewli / GlitchTip / DSNs`:
   - `SENTRY_DSN_BACKEND` ← `crewli-api` DSN
   - `SENTRY_DSN_FRONTEND` ← `crewli-app` DSN

PR-2 wires `SENTRY_DSN_BACKEND` into `api/.env.example`; PR-3 wires
`SENTRY_DSN_FRONTEND` into `apps/app/.env.example`. An empty DSN turns the
SDK into a no-op (verified for both `sentry-laravel` and `@sentry/vue`), so
dev environments without a DSN are silent.

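Before a copied DSN goes into the vault, its shape can be sanity-checked. The sketch below assumes the usual Sentry-style layout `<scheme>://<public_key>@<host>/<project_id>` with a numeric project id; `dsn_check` is a hypothetical helper, not something shipped with GlitchTip or this repository. It prints the host and project id for a well-formed value and returns non-zero otherwise, which also catches a DSN that was truncated while copying.

```bash
# dsn_check: verify that a value looks like <scheme>://<key>@<host>/<project_id>.
dsn_check() {
  local dsn=$1 rest key hostpath host project
  case $dsn in
    http://*@*/*|https://*@*/*) ;;
    *) return 1 ;;
  esac
  rest=${dsn#*://}          # key@host/project
  key=${rest%%@*}           # public key
  hostpath=${rest#*@}       # host/project
  host=${hostpath%%/*}      # host, optionally with :port
  project=${hostpath#*/}    # project id
  [ -n "$key" ] && [ -n "$host" ] || return 1
  case $project in
    ''|*[!0-9]*) return 1 ;;  # empty or non-numeric project id
  esac
  echo "host=$host project=$project"
}

# Example with a made-up key:
#   dsn_check "https://0123456789abcdef@monitoring.hausdesign.nl/1"
```
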

---

## 4. Production deployment

GlitchTip runs on a separate host (`monitoring.hausdesign.nl`) and is **not**
deployed via the Crewli `deploy.sh` pipeline.

### 4.1 Prerequisites

- Docker + Docker Compose v2 on the monitoring host.
- DirectAdmin with the Let's Encrypt module enabled.
- DNS A-record `monitoring.hausdesign.nl` pointing at the host IP.

### 4.2 Place the stack

```bash
sudo install -d -o crewli -g crewli /opt/glitchtip
sudo install -d -o crewli -g crewli /opt/glitchtip/docker/glitchtip

# Copy compose file + env example to the host (e.g. via scp or git checkout).
#   /opt/glitchtip/docker-compose.glitchtip.yml
#   /opt/glitchtip/docker/glitchtip/.env.example
```

### 4.3 Configure `.env`

```bash
cd /opt/glitchtip
cp docker/glitchtip/.env.example docker/glitchtip/.env
chmod 0600 docker/glitchtip/.env
```

Fill in the production values (the header of `.env.example` lists the
checklist):

```env
SECRET_KEY=<python -c "import secrets; print(secrets.token_urlsafe(50))">
DATABASE_URL=postgres://postgres:<STRONG>@glitchtip-postgres:5432/glitchtip
# POSTGRES_PASSWORD MUST match the password in DATABASE_URL.
POSTGRES_PASSWORD=<STRONG>
GLITCHTIP_DOMAIN=https://monitoring.hausdesign.nl
DEFAULT_FROM_EMAIL=glitchtip@hausdesign.nl
EMAIL_URL=smtp+tls://USER:PASSWORD@HOST:PORT
```

Source the `<STRONG>` password from the 1Password vault.

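Since a mismatch between the two password values is easy to introduce when editing by hand, a quick check before first boot may help. The helper below is a sketch under stated assumptions: `env_passwords_match` is a hypothetical name, and the file is assumed to be a plain `KEY=value` dotenv without quoting, without percent-encoded characters in the URL, and without `#` inside the password.

```bash
# env_passwords_match: compare POSTGRES_PASSWORD with the password embedded
# in DATABASE_URL inside a dotenv-style file. Returns 0 when they are equal.
env_passwords_match() {
  local file=$1 url pw url_pw
  # Take the last assignment of each key; strip an optional trailing comment.
  url=$(sed -n 's/^DATABASE_URL=//p' "$file" | tail -n 1)
  pw=$(sed -n 's/^POSTGRES_PASSWORD=//p' "$file" | tail -n 1 | sed 's/[[:space:]]*#.*$//')
  # postgres://user:PASSWORD@host:port/db -> PASSWORD
  url_pw=${url#*://}      # user:PASSWORD@host:port/db
  url_pw=${url_pw%@*}     # user:PASSWORD (cut at the last "@")
  url_pw=${url_pw#*:}     # PASSWORD
  [ -n "$pw" ] && [ "$pw" = "$url_pw" ]
}

# Example:
#   env_passwords_match docker/glitchtip/.env || echo "password mismatch" >&2
```

The function prints nothing, so the secret never ends up in terminal scrollback or logs.
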
### 4.4 DNS + TLS

1. Create the A-record for `monitoring.hausdesign.nl` in DNS.
2. In DirectAdmin: add the subdomain, then enable Let's Encrypt
   (Domain Setup → SSL Certificates → "Free & automatic certificate from
   Let's Encrypt"). Wait for the cert to issue.

### 4.5 Apache reverse proxy

DirectAdmin generates the vhost. Add a custom config (DirectAdmin →
Custom HTTPD Configurations) for the `monitoring.hausdesign.nl` HTTPS
vhost:

```apache
ProxyPreserveHost On
ProxyRequests Off
ProxyPass / http://127.0.0.1:8200/
ProxyPassReverse / http://127.0.0.1:8200/

# WebSocket upgrade: GlitchTip uses WS for live event streaming.
RewriteEngine On
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule ^/?(.*) "ws://127.0.0.1:8200/$1" [P,L]
```

Reload Apache.

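The directives above rely on Apache's proxy and rewrite modules being loaded. Which ones are needed is an assumption here rather than something this runbook states: the sketch checks for `proxy_module`, `proxy_http_module`, `proxy_wstunnel_module` (typically needed for a `ws://` proxy target) and `rewrite_module`. `missing_apache_modules` is a hypothetical helper that reads `apachectl -M` style output on stdin.

```bash
# missing_apache_modules: print each assumed-required module that is absent
# from the module list on stdin. Returns 1 if anything is missing.
missing_apache_modules() {
  local loaded missing=0 mod
  loaded=$(cat)
  for mod in proxy_module proxy_http_module proxy_wstunnel_module rewrite_module; do
    if ! printf '%s\n' "$loaded" | grep -qw "$mod"; then
      echo "missing: $mod"
      missing=1
    fi
  done
  return "$missing"
}

# Example (on the monitoring host):
#   apachectl -M | missing_apache_modules
```

Reading from stdin keeps the check independent of how the module list is obtained on a DirectAdmin-managed host.
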
### 4.6 First boot

```bash
cd /opt/glitchtip
docker compose -f docker-compose.glitchtip.yml up -d

# Wait for healthchecks (~60s).
docker compose -f docker-compose.glitchtip.yml ps

# Create the admin user.
docker exec -it glitchtip-web ./manage.py createsuperuser
```

Open <https://monitoring.hausdesign.nl>, sign in, and **enable 2FA** on
the account immediately (acceptance criterion 1). Profile → Security →
Two-Factor Authentication.

Then provision the two projects (§3) and capture DSNs into 1Password.

---

## 5. Backup & restore

### 5.1 Daily backup

`scripts/glitchtip-backup.sh` runs `pg_dump --format=custom`, streams it
through gzip, writes to `./backups/glitchtip/glitchtip-<ts>.dump.gz` with
`0600` permissions, and prunes dumps older than 30 days.

Install the cron entry on the production host:

```cron
# /etc/cron.d/glitchtip-backup
0 3 * * * crewli /opt/crewli/scripts/glitchtip-backup.sh >> /var/log/glitchtip-backup.log 2>&1
```

(Replace `/opt/crewli` with wherever the Crewli repo checkout lives on
the monitoring host. The script is portable: only the `docker exec`
target container needs to exist.)

The script exits non-zero on dump failure. Note that cron's `MAILTO` only
mails a job's output, and the entry above redirects both stdout and stderr
to the log file, so a failed run shows up in `/var/log/glitchtip-backup.log`
rather than in mail. Watch the log, or drop the redirect if failures should
reach `MAILTO`.

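The 30-day retention step described above can be sketched as follows. This is not the actual `scripts/glitchtip-backup.sh`, only an illustration of one common way to prune by age. `prune_old_dumps` is a hypothetical name, and `-maxdepth` and `-delete` are extensions found in GNU and BSD `find`.

```bash
# prune_old_dumps: delete glitchtip-*.dump.gz files in a directory whose
# modification time is more than <days> days old. Prints what it removes.
prune_old_dumps() {
  local dir=$1 days=${2:-30}
  find "$dir" -maxdepth 1 -type f -name 'glitchtip-*.dump.gz' \
    -mtime +"$days" -print -delete
}

# Example:
#   prune_old_dumps ./backups/glitchtip 30
```

Restricting the match to the `glitchtip-*.dump.gz` pattern keeps unrelated files in the backup directory out of reach of the delete.
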
### 5.2 Restore drill

```bash
# Pick the dump to restore from.
DUMP=./backups/glitchtip/glitchtip-20260506-030000.dump.gz

# Stream the restore into the postgres container.
gunzip < "$DUMP" \
  | docker exec -i glitchtip-postgres pg_restore \
      -U postgres -d glitchtip --clean --if-exists
```

`--clean --if-exists` drops each object contained in the dump before
recreating it, so those objects end up as they were at dump time. Objects
created after the dump was taken are not removed. Stop the application
containers first with `docker compose stop glitchtip-web glitchtip-worker`
to avoid concurrent writes during the restore.

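Two small helpers can make the drill less error-prone: one picks the newest dump, the other checks that the gzip layer is intact before anything is dropped. Both names are hypothetical, and the sketch assumes the `glitchtip-<YYYYMMDD-HHMMSS>.dump.gz` naming seen in the example above, so that file names sort chronologically.

```bash
# latest_dump: print the newest glitchtip-*.dump.gz in a directory, relying on
# the timestamp in the file name sorting lexicographically.
latest_dump() {
  local dir=$1 newest="" f
  for f in "$dir"/glitchtip-*.dump.gz; do
    [ -e "$f" ] || continue   # glob matched nothing
    newest=$f                 # globs expand in sorted order; last one wins
  done
  [ -n "$newest" ] && echo "$newest"
}

# dump_is_intact: verify the gzip container without extracting it.
dump_is_intact() {
  gzip -t "$1" 2>/dev/null
}

# Example:
#   DUMP=$(latest_dump ./backups/glitchtip) && dump_is_intact "$DUMP"
```

`gzip -t` only validates the compression layer. Listing the archive contents with `pg_restore --list` would be a stronger check, at the cost of needing the postgres tooling.
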
Bert should drill the restore at least once after the production stack
is live (acceptance criterion 11).

---

## 6. Monitoring the monitor

Quick smoke tests:

```bash
# API responds with JSON (not 502).
curl -sS http://localhost:8200/api/0/

# Worker reporting in (look for "celery@... ready").
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml \
  logs --tail=50 glitchtip-worker

# All services healthy.
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml ps
```

In production, replace `http://localhost:8200` with
`https://monitoring.hausdesign.nl` and pass only
`-f docker-compose.glitchtip.yml`, since the stack runs standalone there.
Email alerting is configured in PR-4; until then alerts surface only in
the GlitchTip web UI (Issues view).

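For a scripted version of the first smoke test, the HTTP status code alone is often enough to tell a healthy API from a dead backend behind the proxy. The mapping below is an illustrative assumption, not GlitchTip behaviour documented here. `classify_status` and `probe` are hypothetical helpers, and `probe` relies on curl printing `000` when it cannot connect.

```bash
# classify_status: map an HTTP status code to a coarse smoke-test verdict.
classify_status() {
  case $1 in
    2??|3??)      echo "up" ;;
    401|403)      echo "up (auth required)" ;;
    502|503|504)  echo "proxy up, backend down" ;;
    000)          echo "unreachable" ;;
    *)            echo "unexpected: $1" ;;
  esac
}

# probe: fetch only the status code for a URL.
probe() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Example:
#   classify_status "$(probe https://monitoring.hausdesign.nl/api/0/)"
```
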

---

## 7. Troubleshooting

### Web container unhealthy on first boot

Migrations take ~60s on a fresh volume. The healthcheck `start_period`
is set accordingly. If the container is still unhealthy after two
minutes, check the logs:

```bash
docker logs glitchtip-web
```

Most common cause: `DATABASE_URL` password ≠ `POSTGRES_PASSWORD`. The
postgres container sets the user's password from `POSTGRES_PASSWORD` when
it first initializes the data volume, while GlitchTip authenticates with
the password embedded in the URL, so the two MUST match.

### Worker idle / events stuck in queue

Check that `REDIS_URL` resolves and the worker is connected:

```bash
# docker logs replays the container's stderr on stderr, so merge it into the pipe.
docker logs glitchtip-worker 2>&1 | grep -E "ready|connected|error"
```

### Volume permission errors on Linux hosts

`postgres:16-alpine` runs as UID 70 internally. If `/var/lib/postgresql/data`
is bind-mounted from the host with mismatched ownership, postgres refuses
to start. The default named volume avoids this, so it only matters if you
later switch to a host bind-mount.

### Right-to-erasure (GDPR Art. 17)

Currently manual. Locate events for a user ULID via the web UI search,
then delete them via the UI or directly in the database on the postgres
container. An automated erasure script is on the BACKLOG (per RFC §4).

---

## 8. References

- RFC: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md)
- GlitchTip docs: <https://glitchtip.com/documentation>
- GlitchTip self-hosting: <https://glitchtip.com/documentation/install>