docs: glitchtip runbook + setup + RFC §3.1 dev amendment

Operational docs for the GlitchTip stack landed in the previous two commits.

- dev-docs/GLITCHTIP.md: new runbook covering local dev, project provisioning + DSN-to-vault flow, production deploy on monitoring.hausdesign.nl (DNS, DirectAdmin Let's Encrypt, Apache reverse proxy with WS upgrade), backup install + restore drill, smoke tests, troubleshooting.
- dev-docs/SETUP.md: services table now includes GlitchTip; new docker/glitchtip/.env subsection points at the runbook.
- dev-docs/RFC-WS-7-OBSERVABILITY.md §3.1: amended to record that the same compose file drives local dev (Mailpit at bm_mailpit:1025), so prod and dev cannot drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
283
dev-docs/GLITCHTIP.md
Normal file
@@ -0,0 +1,283 @@
|
|||||||
|
# GlitchTip — operations runbook
|
||||||
|
|
||||||
|
Self-hosted error tracking for Crewli. GlitchTip implements the Sentry
|
||||||
|
event protocol; the official Sentry SDKs (`sentry-laravel`, `@sentry/vue`,
|
||||||
|
`@sentry/cli`) work against it without modification.
|
||||||
|
|
||||||
|
Reference: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md).
|
||||||
|
|
||||||
|
This file documents how to run the stack — locally and on the production
|
||||||
|
monitoring host. PR-2 (backend SDK) and PR-3 (frontend SDK) consume DSNs
|
||||||
|
provisioned via the steps below.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Overview
|
||||||
|
|
||||||
|
| Service | Image | Role |
|---------|-------|------|
| `glitchtip-web` | `glitchtip/glitchtip:6.1.6` | Django web UI + ingest API |
| `glitchtip-worker` | `glitchtip/glitchtip:6.1.6` | Celery worker + beat (event processing, alerts, partition maintenance) |
| `glitchtip-postgres` | `postgres:16-alpine` | Primary datastore |
| `glitchtip-redis` | `valkey/valkey:7-alpine` | Celery broker + cache |
|
||||||
|
|
||||||
|
The same `docker-compose.glitchtip.yml` runs both locally (merged with
|
||||||
|
`docker-compose.yml`) and on the production host (standalone). Container
|
||||||
|
names are identical in both environments to avoid configuration drift.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Local development
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Once
|
||||||
|
cp docker/glitchtip/.env.example docker/glitchtip/.env
|
||||||
|
|
||||||
|
# Boot the full stack (MySQL, Redis, Mailpit, GlitchTip)
|
||||||
|
make services
|
||||||
|
|
||||||
|
# First boot takes ~60s while migrations run. Tail progress:
|
||||||
|
make services-glitchtip-status
|
||||||
|
```
|
||||||
|
|
||||||
|
Web UI: <http://localhost:8200>. Outbound mail goes to Mailpit
|
||||||
|
(`http://localhost:8025`).
|
||||||
|
|
||||||
|
Create the first admin user:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker exec -it glitchtip-web ./manage.py createsuperuser
|
||||||
|
```
|
||||||
|
|
||||||
|
Stop the stack with `make services-stop`. Volumes (`glitchtip_postgres_data`,
|
||||||
|
`glitchtip_redis_data`, `glitchtip_uploads`) survive a stop. Wipe with
|
||||||
|
`docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml down -v`
|
||||||
|
— **never on production**.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. Project provisioning
|
||||||
|
|
||||||
|
Once the web UI is reachable and the superuser exists:
|
||||||
|
|
||||||
|
1. Sign in at `/`.
2. Create an Organization called **Crewli**.
3. Create two projects:
   - **`crewli-api`**: platform PHP / Laravel (the backend reports through `sentry-laravel`), alert rules: default.
   - **`crewli-app`**: platform JavaScript / Vue, alert rules: default.
4. For each project, copy the auto-generated DSN from
   *Settings → Client Keys (DSN)*.
5. Store both DSNs in 1Password under `Crewli / GlitchTip / DSNs`:
   - `SENTRY_DSN_BACKEND` ← `crewli-api` DSN
   - `SENTRY_DSN_FRONTEND` ← `crewli-app` DSN
|
||||||
|
|
||||||
|
PR-2 wires `SENTRY_DSN_BACKEND` into `api/.env.example`; PR-3 wires
`SENTRY_DSN_FRONTEND` into `apps/app/.env.example`. An empty DSN turns the
SDK into a no-op (verified for both `sentry-laravel` and `@sentry/vue`), so
dev environments without a DSN stay silent.
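Before a DSN goes into the vault, its shape can be sanity-checked. The sketch below is illustrative only: `dsn_status` is a hypothetical helper, and it assumes the common Sentry-style layout `scheme://PUBLIC_KEY@HOST/PROJECT_ID` with a hexadecimal key and a numeric project id. It treats an empty value as valid-but-silent, matching the no-op behaviour described above.

```bash
# Hypothetical helper: classify a DSN string as empty, valid, or malformed.
# Assumed shape: http(s)://HEXKEY[:HEXSECRET]@HOST[/PATH]/NUMERIC_PROJECT_ID
dsn_status() {
  local dsn="$1"
  if [ -z "$dsn" ]; then
    # An empty DSN is allowed: the SDK simply sends nothing.
    echo "empty (SDK stays silent)"
  elif printf '%s\n' "$dsn" \
      | grep -Eq '^https?://[0-9a-f]+(:[0-9a-f]+)?@[^/@]+(/[^/]+)*/[0-9]+$'; then
    echo "valid"
  else
    echo "malformed"
    return 1
  fi
}
```

For example, `dsn_status "$SENTRY_DSN_BACKEND"` could run as a pre-flight step before copying the value into an `.env` file.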
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. Production deployment
|
||||||
|
|
||||||
|
GlitchTip runs on a separate host (`monitoring.hausdesign.nl`) and is **not**
|
||||||
|
deployed via the Crewli `deploy.sh` pipeline.
|
||||||
|
|
||||||
|
### 4.1 Prerequisites
|
||||||
|
|
||||||
|
- Docker + Docker Compose v2 on the monitoring host.
|
||||||
|
- DirectAdmin with the Let's Encrypt module enabled.
|
||||||
|
- DNS A-record `monitoring.hausdesign.nl` pointing at the host IP.
|
||||||
|
|
||||||
|
### 4.2 Place the stack
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo install -d -o crewli -g crewli /opt/glitchtip
|
||||||
|
sudo install -d -o crewli -g crewli /opt/glitchtip/docker/glitchtip
|
||||||
|
|
||||||
|
# Copy compose file + env example to the host (e.g. via scp or git checkout).
|
||||||
|
# /opt/glitchtip/docker-compose.glitchtip.yml
|
||||||
|
# /opt/glitchtip/docker/glitchtip/.env.example
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4.3 Configure `.env`
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd /opt/glitchtip
|
||||||
|
cp docker/glitchtip/.env.example docker/glitchtip/.env
|
||||||
|
chmod 0600 docker/glitchtip/.env
|
||||||
|
```
|
||||||
|
|
||||||
|
Fill in the production values (header of `.env.example` lists the
|
||||||
|
checklist):
|
||||||
|
|
||||||
|
```env
|
||||||
|
SECRET_KEY=<python -c "import secrets; print(secrets.token_urlsafe(50))">
|
||||||
|
DATABASE_URL=postgres://postgres:<STRONG>@glitchtip-postgres:5432/glitchtip
|
||||||
|
POSTGRES_PASSWORD=<STRONG> # MUST match the password in DATABASE_URL
|
||||||
|
GLITCHTIP_DOMAIN=https://monitoring.hausdesign.nl
|
||||||
|
DEFAULT_FROM_EMAIL=glitchtip@hausdesign.nl
|
||||||
|
EMAIL_URL=smtp+tls://USER:PASSWORD@HOST:PORT
|
||||||
|
```
|
||||||
|
|
||||||
|
Source the `<STRONG>` password from the 1Password vault.
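Because the two password fields must agree, a quick check before first boot can catch a mismatch. This is a minimal sketch, not part of the repository: `check_db_password` is a hypothetical helper, and it assumes plain `KEY=value` lines without quoting, a password that contains no `#` or whitespace, and no URL-encoding inside `DATABASE_URL`.

```bash
# Hypothetical helper: compare the password embedded in DATABASE_URL with
# POSTGRES_PASSWORD in the given env file.
check_db_password() {
  local env_file="$1" url_pw pg_pw
  # The password sits between "user:" and the last "@" of the URL.
  url_pw=$(sed -n 's|^DATABASE_URL=[a-z]*://[^:]*:\(.*\)@[^@]*$|\1|p' "$env_file")
  # Take the value up to the first whitespace or inline "#" comment.
  pg_pw=$(sed -n 's|^POSTGRES_PASSWORD=\([^[:space:]#]*\).*$|\1|p' "$env_file")
  if [ -n "$url_pw" ] && [ "$url_pw" = "$pg_pw" ]; then
    echo "ok: passwords match"
  else
    echo "mismatch: DATABASE_URL password differs from POSTGRES_PASSWORD" >&2
    return 1
  fi
}
```

Usage on the production host would be `check_db_password docker/glitchtip/.env`; a non-zero exit status signals the mismatch.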
|
||||||
|
|
||||||
|
### 4.4 DNS + TLS
|
||||||
|
|
||||||
|
1. Create the A-record for `monitoring.hausdesign.nl` in DNS.
|
||||||
|
2. In DirectAdmin: add the subdomain, then enable Let's Encrypt
|
||||||
|
(Domain Setup → SSL Certificates → "Free & automatic certificate from
|
||||||
|
Let's Encrypt"). Wait for the cert to issue.
|
||||||
|
|
||||||
|
### 4.5 Apache reverse proxy
|
||||||
|
|
||||||
|
DirectAdmin generates the vhost. Add a custom config (DirectAdmin →
|
||||||
|
Custom HTTPD Configurations) for the `monitoring.hausdesign.nl` HTTPS
|
||||||
|
vhost:
|
||||||
|
|
||||||
|
```apache
|
||||||
|
ProxyPreserveHost On
|
||||||
|
ProxyRequests Off
|
||||||
|
ProxyPass / http://127.0.0.1:8200/
|
||||||
|
ProxyPassReverse / http://127.0.0.1:8200/
|
||||||
|
|
||||||
|
# WebSocket upgrade — GlitchTip uses WS for live event streaming.
|
||||||
|
RewriteEngine On
|
||||||
|
RewriteCond %{HTTP:Upgrade} websocket [NC]
|
||||||
|
RewriteCond %{HTTP:Connection} upgrade [NC]
|
||||||
|
RewriteRule ^/?(.*) "ws://127.0.0.1:8200/$1" [P,L]
|
||||||
|
```
|
||||||
|
|
||||||
|
Reload Apache.
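The directives above rely on several Apache modules: `mod_proxy` and `mod_proxy_http` for `ProxyPass`, `mod_rewrite` for the rewrite rule, and `mod_proxy_wstunnel` for the `ws://` target. The sketch below checks a module listing for them. `missing_proxy_modules` is a hypothetical helper, and the exact command that prints loaded modules (`apachectl -M` or `httpd -M`) depends on how DirectAdmin installed Apache, which is an assumption here.

```bash
# Hypothetical helper: read a loaded-module listing on stdin and report
# which of the modules needed by the proxy config are absent.
missing_proxy_modules() {
  local loaded missing="" mod
  loaded=$(cat)
  for mod in proxy_module proxy_http_module proxy_wstunnel_module rewrite_module; do
    printf '%s\n' "$loaded" | grep -qw "$mod" || missing="$missing $mod"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all required modules loaded"
}

# Usage (not executed here):
#   apachectl -M 2>/dev/null | missing_proxy_modules
```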
|
||||||
|
|
||||||
|
### 4.6 First boot
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd /opt/glitchtip
|
||||||
|
docker compose -f docker-compose.glitchtip.yml up -d
|
||||||
|
|
||||||
|
# Wait for healthchecks (~60s).
|
||||||
|
docker compose -f docker-compose.glitchtip.yml ps
|
||||||
|
|
||||||
|
# Create the admin user.
|
||||||
|
docker exec -it glitchtip-web ./manage.py createsuperuser
|
||||||
|
```
|
||||||
|
|
||||||
|
Open <https://monitoring.hausdesign.nl>, sign in, and **enable 2FA** on
the account immediately via Profile → Security → Two-Factor
Authentication (acceptance criterion 1).
|
||||||
|
|
||||||
|
Then provision the two projects (§3) and capture DSNs into 1Password.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Backup & restore
|
||||||
|
|
||||||
|
### 5.1 Daily backup
|
||||||
|
|
||||||
|
`scripts/glitchtip-backup.sh` runs `pg_dump --format=custom`, streams it
|
||||||
|
through gzip, writes to `./backups/glitchtip/glitchtip-<ts>.dump.gz` with
|
||||||
|
`0600` permissions, and prunes dumps older than 30 days.
|
||||||
|
|
||||||
|
Install the cron entry on the production host:
|
||||||
|
|
||||||
|
```cron
|
||||||
|
# /etc/cron.d/glitchtip-backup
|
||||||
|
0 3 * * * crewli /opt/crewli/scripts/glitchtip-backup.sh >> /var/log/glitchtip-backup.log 2>&1
|
||||||
|
```
|
||||||
|
|
||||||
|
(Replace `/opt/crewli` with wherever the Crewli repo checkout lives on
|
||||||
|
the monitoring host. The script is portable — only the `docker exec`
|
||||||
|
target container needs to exist.)
|
||||||
|
|
||||||
|
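The script exits non-zero on dump failure. Note that cron's `MAILTO` only
mails output a job writes to stdout or stderr; because the entry above
redirects both into `/var/log/glitchtip-backup.log`, a failed run shows up
in that log rather than in mail unless the redirect is dropped or narrowed.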
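The behaviour described in this section can be sketched roughly as follows. This is not the contents of `scripts/glitchtip-backup.sh`; the function names are hypothetical, and the container name, database user, and database name are taken from the other commands in this runbook.

```bash
# Sketch only: NOT the repository's scripts/glitchtip-backup.sh.
# Assumed names: container glitchtip-postgres, DB user postgres, DB glitchtip.

# Dump once into "$1" (default ./backups/glitchtip). The body runs in a
# subshell so pipefail and umask do not leak into the calling shell.
backup_once() (
  set -o pipefail
  umask 077   # new files are created with mode 0600
  dir="${1:-./backups/glitchtip}"
  out="$dir/glitchtip-$(date +%Y%m%d-%H%M%S).dump.gz"
  mkdir -p "$dir"
  if ! docker exec glitchtip-postgres pg_dump -U postgres --format=custom glitchtip \
      | gzip > "$out"; then
    rm -f "$out"   # do not leave a truncated dump behind
    echo "backup failed" >&2
    exit 1
  fi
  echo "$out"
)

# Delete dumps last modified more than "$2" days ago (default 30).
prune_old_dumps() {
  find "${1:-./backups/glitchtip}" -name 'glitchtip-*.dump.gz' -type f \
    -mtime +"${2:-30}" -delete
}
```

A wrapper would call `backup_once` and then `prune_old_dumps`, pruning only after a successful dump so that a failing backup never shrinks the set of usable archives.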
|
||||||
|
|
||||||
|
### 5.2 Restore drill
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Pick the dump to restore from.
|
||||||
|
DUMP=./backups/glitchtip/glitchtip-20260506-030000.dump.gz
|
||||||
|
|
||||||
|
# Stream the restore into the postgres container.
|
||||||
|
gunzip < "$DUMP" \
|
||||||
|
| docker exec -i glitchtip-postgres pg_restore \
|
||||||
|
-U postgres -d glitchtip --clean --if-exists
|
||||||
|
```
|
||||||
|
|
||||||
|
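`--clean --if-exists` drops each object contained in the dump before
recreating it, so those objects end up as they were at dump time. Objects
created after the dump was taken are not in the archive and are left in
place. Stop the writers first with
`docker compose -f docker-compose.glitchtip.yml stop glitchtip-web glitchtip-worker`
(locally, add `-f docker-compose.yml` as well) to avoid concurrent writes
during the restore.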
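Since `--clean` starts dropping objects as soon as the restore begins, it can help to verify the archive first. The sketch below is an illustration under stated assumptions: `verify_dump` is a hypothetical helper, and it relies on custom-format `pg_dump` archives beginning with the magic bytes `PGDMP`.

```bash
# Hypothetical helper: check that a backup is a readable gzip stream that
# wraps a custom-format pg_dump archive.
verify_dump() {
  local dump="$1" magic
  [ -s "$dump" ] || { echo "missing or empty: $dump" >&2; return 1; }
  gzip -t "$dump" 2>/dev/null || { echo "corrupt gzip: $dump" >&2; return 1; }
  # Custom-format archives start with the 5-byte magic "PGDMP".
  magic=$(gunzip -c "$dump" 2>/dev/null | head -c 5)
  [ "$magic" = "PGDMP" ] || { echo "not a custom-format dump: $dump" >&2; return 1; }
  echo "ok: $dump"
}
```

In the drill this would run as `verify_dump "$DUMP"` before the `gunzip | pg_restore` pipeline; it checks only the container format, not whether the data inside is complete.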
|
||||||
|
|
||||||
|
Bert should drill the restore at least once after the production stack
|
||||||
|
is live (acceptance criterion 11).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Monitoring the monitor
|
||||||
|
|
||||||
|
Quick smoke tests:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# API responds with JSON (not 502).
|
||||||
|
curl -sS http://localhost:8200/api/0/
|
||||||
|
|
||||||
|
# Worker reporting in (look for "celery@... ready").
|
||||||
|
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml \
|
||||||
|
logs --tail=50 glitchtip-worker
|
||||||
|
|
||||||
|
# All services healthy.
|
||||||
|
docker compose -f docker-compose.yml -f docker-compose.glitchtip.yml ps
|
||||||
|
```
|
||||||
|
|
||||||
|
In production, replace `http://localhost:8200` with
`https://monitoring.hausdesign.nl` and pass only
`-f docker-compose.glitchtip.yml`, since the stack runs standalone there.
Email alerting is configured in PR-4; until then alerts surface only in
the GlitchTip web UI (Issues view).
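Because first boot can take around a minute, the API smoke test may need to be retried. The following is a minimal polling sketch; `wait_for_glitchtip` is a hypothetical helper, and treating any response other than a 5xx or a connection failure as "up" is an assumption, since the exact status code `/api/0/` returns to unauthenticated requests is not stated here.

```bash
# Hypothetical helper: poll a URL until it stops returning 5xx responses
# or connection errors, or until the timeout (in seconds) is reached.
wait_for_glitchtip() {
  local url="${1:-http://localhost:8200/api/0/}"
  local timeout="${2:-120}"
  local waited=0 code
  while :; do
    # %{http_code} prints 000 when curl could not connect at all.
    code=$(curl --silent --output /dev/null --write-out '%{http_code}' "$url" || true)
    case "$code" in
      000*|5??|"") ;;   # not up yet, or the proxy answered with an error: retry
      *) echo "up after ~${waited}s (HTTP $code)"; return 0 ;;
    esac
    if [ "$waited" -ge "$timeout" ]; then
      echo "timeout after ${timeout}s waiting for $url (last HTTP $code)" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

For production the call would be `wait_for_glitchtip https://monitoring.hausdesign.nl/api/0/ 180`, which also catches a 502 from the Apache proxy while the web container is still migrating.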
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7. Troubleshooting
|
||||||
|
|
||||||
|
### Web container unhealthy on first boot
|
||||||
|
|
||||||
|
Migrations take ~60s on a fresh volume. The healthcheck `start_period`
|
||||||
|
is set accordingly. If the container is still unhealthy after two
|
||||||
|
minutes, tail logs:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker logs glitchtip-web
|
||||||
|
```
|
||||||
|
|
||||||
|
Most common cause: `DATABASE_URL` password ≠ `POSTGRES_PASSWORD`. The
|
||||||
|
postgres container creates the user with the password it sees, GlitchTip
|
||||||
|
authenticates with the password embedded in the URL — they MUST match.
|
||||||
|
|
||||||
|
### Worker idle / events stuck in queue
|
||||||
|
|
||||||
|
Check that `REDIS_URL` resolves and the worker is connected:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker logs glitchtip-worker | grep -E "ready|connected|error"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Volume permission errors on Linux hosts
|
||||||
|
|
||||||
|
`postgres:16-alpine` runs as UID 70 internally. If `/var/lib/postgresql/data`
|
||||||
|
is bind-mounted from the host with mismatched ownership, postgres refuses
|
||||||
|
to start. The default named volume avoids this — only relevant if you
|
||||||
|
later switch to a host bind-mount.
|
||||||
|
|
||||||
|
### Right-to-erasure (Art. 17)
|
||||||
|
|
||||||
|
Erasure is currently manual: locate the events for a user ULID via the web
UI search, then delete them via the UI or directly in the postgres
container. An automated erasure script is on the BACKLOG (per RFC §4).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 8. References
|
||||||
|
|
||||||
|
- RFC: [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md)
|
||||||
|
- GlitchTip docs: <https://glitchtip.com/documentation>
|
||||||
|
- GlitchTip self-hosting: <https://glitchtip.com/documentation/install>
|
||||||
@@ -30,6 +30,8 @@ Two deviations from charter §3 decision 8, both deliberate:
|
|||||||
|
|
||||||
Self-hosted GlitchTip on the production VPS via Docker Compose (`glitchtip-web`, `glitchtip-worker`, `glitchtip-postgres`, `glitchtip-redis`). Reverse proxy via DirectAdmin Apache; SSL via DirectAdmin Let's Encrypt on `monitoring.hausdesign.nl` (consistent with the existing subdomain pattern).
|
||||||
|
|
||||||
|
**Local development:** the same `docker-compose.glitchtip.yml` runs locally under `make services` (combined with the existing `docker-compose.yml` via `-f`). Web UI at `http://localhost:8200`, email goes to Mailpit at `bm_mailpit:1025`. The dev and prod stacks share one compose file, which rules out configuration drift.
|
||||||
|
|
||||||
### 3.2 Two projects / DSNs
|
||||||
|
|
||||||
- `crewli-api` — Laravel
|
- `crewli-api` — Laravel
|
||||||
|
|||||||
@@ -70,11 +70,18 @@ Three terminal tabs, plus an optional fourth for the queue worker:
|
|||||||
|
|
||||||
| Terminal | Command | Where it runs | Port |
|
| Terminal | Command | Where it runs | Port |
|
||||||
|----------|---------|---------------|------|
|
|----------|---------|---------------|------|
|
||||||
| 1. Services | `make services` (from repo root) | Docker | 3306 (MySQL), 6379 (Redis), 8025 (Mailpit) |
|
| 1. Services | `make services` (from repo root) | Docker | 3306 (MySQL), 6379 (Redis), 8025 (Mailpit), 8200 (GlitchTip) |
|
||||||
| 2. API | `make api` (from repo root) | Laravel dev server | 8000 |
|
| 2. API | `make api` (from repo root) | Laravel dev server | 8000 |
|
||||||
| 3. SPA | `make app` (from repo root) | Vite dev server | 5174 |
|
| 3. SPA | `make app` (from repo root) | Vite dev server | 5174 |
|
||||||
| 4. Queue worker (optional) | `cd api && php artisan queue:listen redis --queue=emails` | Local PHP | n/a |
|
| 4. Queue worker (optional) | `cd api && php artisan queue:listen redis --queue=emails` | Local PHP | n/a |
|
||||||
|
|
||||||
|
Web UIs available once `make services` is up:
|
||||||
|
|
||||||
|
| Service | URL |
|
||||||
|
|---------|-----|
|
||||||
|
| Mailpit | <http://localhost:8025> |
|
||||||
|
| GlitchTip | <http://localhost:8200> (admin UI; first boot ~60s while migrations run) |
|
||||||
|
|
||||||
The queue worker is only needed when you're triggering email flows (registration, password reset, email change, invitations). Routine UI work doesn't require it.
|
The queue worker is only needed when you're triggering email flows (registration, password reset, email change, invitations). Routine UI work doesn't require it.
|
||||||
|
|
||||||
Stop services when done: `make services-stop`.
|
Stop services when done: `make services-stop`.
|
||||||
@@ -116,6 +123,13 @@ VITE_APP_NAME="Crewli"
|
|||||||
|
|
||||||
For production: `VITE_API_URL=https://api.crewli.app`.
|
For production: `VITE_API_URL=https://api.crewli.app`.
|
||||||
|
|
||||||
|
### `docker/glitchtip/.env`
|
||||||
|
|
||||||
|
Generated by copying `docker/glitchtip/.env.example`. Dev defaults are
|
||||||
|
functional out of the box — no edits needed for `make services`. See
|
||||||
|
[`GLITCHTIP.md`](./GLITCHTIP.md) for first-boot steps (creating the
|
||||||
|
superuser, creating the two projects, copying DSNs to 1Password).
|
||||||
|
|
||||||
## Common tasks
|
## Common tasks
|
||||||
|
|
||||||
### Run tests
|
### Run tests
|
||||||
|
|||||||