diff --git a/dev-docs/ARCH-OBSERVABILITY.md b/dev-docs/ARCH-OBSERVABILITY.md index 24b92ac7..7f109480 100644 --- a/dev-docs/ARCH-OBSERVABILITY.md +++ b/dev-docs/ARCH-OBSERVABILITY.md @@ -1,56 +1,405 @@ -# ARCH-OBSERVABILITY +# ARCH — Observability (v1.0) -> Crewli's observability architecture — logging, monitoring, alerting, -> metrics. +> **Source of truth** for Crewli's observability implementation +> (sentry-laravel + @sentry/vue + GlitchTip). This document supersedes +> [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md) for tag +> taxonomy, binding semantics, and operational patterns. The RFC +> remains the historical implementation-spec; ARCH is the +> post-implementation reference for developers maintaining or extending +> the stack. > -> Status: SKELETON. Section §3 (`$dontReport`) is concrete; other -> sections are structured placeholders for WS-7 sessie 1 decisions. +> **Status:** WS-7 implementation complete. Code criteria 3, 4, 5, 6, +> 11, 12, 13 satisfied; documentation criteria 8, 14 satisfied via this +> ARCH plus the runbooks under [`runbooks/`](./runbooks/). Manual +> closure criteria (1, 2, 7, 9, 10) remain on Bert's checklist. +> +> **Version:** 1.0 (initial post-implementation reference, mei 2026, +> after PR-1 → PR-4 landed in `feat/ws-7-observability`). +> +> **Pre-WS-7 skeleton:** earlier versions of this document (v0.1, +> april 2026) carried placeholder sections for log levels, metrics, +> alerting and dashboards. Those decisions were taken during WS-7 and +> implemented as code; this document captures the as-built outcome. +> The historical skeleton sections about metrics (§5), alerting (§6), +> and dashboards (§7) are intentionally not carried forward — Crewli +> has settled on errors-only observability via GlitchTip; no Statsd / +> Prometheus / Grafana stack is planned (RFC §2 amendment B). -## Document history +--- -- 2026-04-28 — v0.1 — Initial skeleton (WS-6 sessie 3b). Only §3 - concrete; remainder placeholdered for WS-7. 
+## §1 Goal & scope -## §1 — Logging strategy +Observability in Crewli provides automated error detection and +service-availability monitoring. Stack traces, tags, breadcrumbs and +release correlation are collected via [GlitchTip](https://glitchtip.com/), +self-hosted at `monitoring.hausdesign.nl`. GlitchTip is binary-compatible +with the Sentry event protocol; we use `sentry/sentry-laravel` on the +backend and `@sentry/vue` on the frontend unmodified. -[WS-7: define log levels with explicit criteria. Example questions to -answer in WS-7 sessie 1: +**In scope:** -- When does code use `Log::error` vs `Log::warning` vs `Log::info`? -- Are unhandled exceptions automatically `error`? -- Is `Log::debug` allowed in production, or stripped in deploy? -- How do structured payload conventions tie to log keys (see §4)? -] +- Programmer errors from Laravel controllers and queue jobs (`Throwable`, + `RuntimeException`, `TypeError`, `QueryException` etc.) +- Infrastructure failures (database connection drop, redis unavailable, + external HTTP timeouts in Crewli's own client code) +- Frontend runtime errors from Vue components and composables +- Unhandled promise rejections in the SPA +- Vue Router navigation context as breadcrumbs -## §2 — Sentry decisions +**Out of scope** (per [RFC §3.10](./RFC-WS-7-OBSERVABILITY.md)): -[WS-7: Sentry SDK install + configuration decisions. Skeleton: +- **Performance / tracing / profiling.** Hard-pinned to a 0.0 sample rate + in both `config/sentry.php` and `apps/app/src/observability/sentry.ts`. +- **Expected business outcomes.** ValidationException, + AuthenticationException, AuthorizationException and sub-500 + HttpExceptions are deliberately not captured. They have their own + audit paths (`form_submission_action_failures`, `activity_log`). +- **Audit trails.** `activity_log`, `impersonation_audit_logs` and + `form_webhook_deliveries` remain authoritative for security and + compliance audit. GlitchTip is for defect detection. 
+- **Replay / Web Vitals / User Feedback.** GlitchTip does not support + these, and they are not on our roadmap either. +- **Metrics / dashboards / alerting beyond GlitchTip's email.** No + Statsd / Prometheus / Grafana. Alerting is initially email to Bert via + GlitchTip's own rule engine; Slack integration is on the BACKLOG. -- Which environments report to Sentry? (dev / staging / production) -- Sample rate per environment? -- Source map upload to Sentry CI? -- User context injection (auth user ID + organisation ID, opt-in - redaction for PII)? -- Breadcrumbs strategy (which events generate breadcrumbs)? -- Release tagging convention (commit SHA? semver? both?)? -] +**Boundary with existing systems:** -## §3 — `$dontReport` exceptions (concrete) +| System | What it does | What GlitchTip does NOT take over | +|---|---|---| +| Telescope (`/telescope`) | Dev-only debugging dashboard. Local + testing. | GlitchTip is for production incidents; Telescope stays for local debugging. | +| `activity_log` (Spatie) | Audit trail of user actions on tenant data. Authoritative for "who did what when." | GlitchTip never captures business events such as `form.submitted`. | +| `form_webhook_deliveries` | Webhook delivery audit with retry / dead-letter. | Dead-letters do NOT go through GlitchTip; only when the dispatcher itself throws a programmer error. | +| `form_submission_action_failures` | Apply-pipeline failures per submission, action, and organisation. Org-admin operational handling via WS-6 admin UI. | GlitchTip sees runtime apply-pipeline exceptions (see §5.4) in parallel: engineering visibility, not an operational fix UI. | +| `impersonation_audit_logs` | Who impersonated whom and when (security audit). | GlitchTip tags active impersonation as context on captured events; it does not replace the audit. | +| Laravel default log channel | Operational runtime logs (info/warning/error). | Both systems receive the same events; correlation via `request_id`. 
| -The following exception classes are **expected business outcomes**, -not bugs. They are caught and handled in the application; reporting -them to Sentry would generate noise that drowns the signal. +--- -When the Sentry SDK lands (WS-7), add the following classes to -Laravel's `app/Exceptions/Handler.php` `$dontReport` array: +## §2 Component overview -| Class | Reason | +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ Crewli production / dev │ +│ │ +│ ┌─────────────────────┐ ┌──────────────────────────┐ │ +│ │ Laravel API │ │ apps/app SPA (Vue 3) │ │ +│ │ (api.crewli.app) │ │ (crewli.app) │ │ +│ │ │ │ │ │ +│ │ ┌───────────────┐ │ │ ┌────────────────────┐ │ │ +│ │ │sentry-laravel │ │ │ │@sentry/vue 10.x │ │ │ +│ │ │4.25 SDK │ │ │ │ │ │ │ +│ │ └───────┬───────┘ │ │ └─────────┬──────────┘ │ │ +│ │ │ │ │ │ │ │ +│ │ ┌───────▼───────┐ │ │ ┌─────────▼──────────┐ │ │ +│ │ │SentryEvent │ │ │ │scrubEvent │ │ │ +│ │ │Scrubber (PHP) │ │ │ │(TypeScript) │ │ │ +│ │ └───────┬───────┘ │ │ └─────────┬──────────┘ │ │ +│ │ │ │ │ │ │ │ +│ └──────────┼──────────┘ └────────────┼─────────────┘ │ +│ │ │ │ +│ │ HTTPS POST /api//envelope/ │ │ +│ └────────────┬────────────────────┘ │ +│ ▼ │ +│ ┌───────────────────────────────────────┐ │ +│ │ GlitchTip │ │ +│ │ monitoring.hausdesign.nl (prod) │ │ +│ │ localhost:8200 (dev) │ │ +│ │ │ │ +│ │ ┌─────────┐ ┌─────────┐ ┌───────┐ │ │ +│ │ │ web │ │ worker │ │ pg/ │ │ │ +│ │ │ (django)│ │ (celery)│ │ redis │ │ │ +│ │ └─────────┘ └─────────┘ └───────┘ │ │ +│ └───────────────────────────────────────┘ │ +│ │ +└──────────────────────────────────────────────────────────────────────────┘ +``` + +**Two projects in one GlitchTip instance:** + +- `crewli-api`: Laravel events (`app=api` tag) +- `crewli-app`: SPA events (`app=app` tag) + +One DSN per project; both keys live in the 1Password vault under +`Crewli / GlitchTip / DSNs`. The backend reads `SENTRY_DSN_BACKEND`, +the frontend `VITE_SENTRY_DSN_FRONTEND`. 
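
As a rough illustration of how the frontend side of this could look, the sketch below turns the build-time variables into an options object. The function name `buildSentryOptions` and the `EnvLike` and `SentryOptionsSketch` shapes are hypothetical; `dsn`, `release`, `environment`, `enabled`, `tracesSampleRate` and `initialScope` are option names from the Sentry JavaScript SDK, but the real `sentry.ts` may be organised differently.

```typescript
// Hypothetical shape of the Vite env values mentioned in this document.
interface EnvLike {
  VITE_SENTRY_DSN_FRONTEND?: string
  VITE_SENTRY_RELEASE?: string
  MODE: string
}

// Subset of Sentry init options used in this sketch (assumed, not exhaustive).
interface SentryOptionsSketch {
  dsn: string | undefined
  release: string | undefined
  environment: string
  enabled: boolean
  tracesSampleRate: number
  initialScope: { tags: Record<string, string> }
}

function buildSentryOptions(env: EnvLike): SentryOptionsSketch {
  // Treat an empty string the same as a missing DSN.
  const dsn = env.VITE_SENTRY_DSN_FRONTEND || undefined
  return {
    dsn,
    release: env.VITE_SENTRY_RELEASE || undefined,
    environment: env.MODE,
    // Without a DSN there is nowhere to send events, so the SDK stays off.
    enabled: dsn !== undefined,
    // Tracing is out of scope (§1): the sample rate is pinned to 0.0.
    tracesSampleRate: 0.0,
    // Constant tag that separates SPA events from API events.
    initialScope: { tags: { app: 'app' } },
  }
}
```

In a real entry point the returned object would be spread into `Sentry.init({ app, ...options })`; that call is left out here so the sketch stays free of SDK imports.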
+ +A CSP `connect-src` whitelist entry for the ingest host is mandatory: without +it the browser silently blocks all `@sentry/vue` egress. +See [§7](#7-csp-whitelist-kritisch). + +--- + +## §3 Tag taxonomy + +This table replaces the table in [`RFC-WS-7-OBSERVABILITY.md §3.6`](./RFC-WS-7-OBSERVABILITY.md) +as source of truth. When a tag is added or its source location +changes, this table is updated; the RFC remains a historical +document. + +### 3.1 Backend tags (sentry-laravel) + +| Tag | Location | Always / conditional | Source | +|---|---|---|---| +| `app` | initial scope (`config/sentry.php`) or `BindSentryRouteContext` | always | constant `'api'` | +| `release` | sentry-laravel built-in | when `SENTRY_RELEASE` env set | `crewli-api@` injected by `deploy.sh` | +| `environment` | sentry-laravel built-in | always | `APP_ENV` | +| `route_name` | `BindSentryRouteContext` middleware | conditional (named routes only) | `$request->route()->getName()` | +| `http.method` | `BindSentryRouteContext` middleware | always (HTTP requests) | `$request->method()` | +| `actor_scope` | `AuthScopeContextListener` | always (authenticated events) | resolution chain, see §3.3 | +| `actor_type` | `AuthScopeContextListener` | always (authenticated events) | `ActorType::resolve()`, see §3.4 | +| `user_id` | `AuthScopeContextListener` (overridden by `HandleImpersonation`) | always (authenticated) | ULID | +| `username` | `AuthScopeContextListener` user object | always (authenticated) | ULID (RFC §3.8: never email) | +| `organisation_id` | `AuthScopeContextListener` | conditional: only when `actor_scope=organisation` | route param / portal-token resolution, see §3.3 | +| `event_id` | `AuthScopeContextListener` (via `{event}` route param) | conditional | route binding | +| `impersonation.active` | `AuthScopeContextListener` baseline + `HandleImpersonation` override | **always** (authenticated) | binary `'true'`/`'false'` (RFC §3.6 invariant) | +| 
`impersonation.impersonator_user_id` | `HandleImpersonation` middleware | conditional: only when impersonating | ULID | +| `impersonation.session_id` | `HandleImpersonation` middleware | conditional: only when impersonating | ULID | +| `queue.attempt` | `TagJobAttemptOnSentry` listener | conditional: within queue jobs | `$event->job->attempts()` | + +### 3.2 Frontend tags (@sentry/vue) + +| Tag | Location | Always / conditional | Source | +|---|---|---|---| +| `app` | initial scope (`sentry.ts`) | always | constant `'app'` | +| `release` | sentry-vue built-in | when `VITE_SENTRY_RELEASE` set | `crewli-app@` injected by `deploy.sh` build-time | +| `environment` | sentry-vue built-in | always | `import.meta.env.MODE` | +| `route_name` | `installContextBinding` Vue Router guard | always | `route.name ?? 'unnamed'` | +| `actor_scope` | `installContextBinding` | always | one of `organisation` / `platform` / `user` / `portal` / `anonymous` | +| `actor_type` | `installContextBinding` | always | `super_admin` / `organizer_admin` / `org_member` / `portal_token` / `unauthenticated` | +| `user_id` | `installContextBinding` | conditional: **never** when `actor_scope=portal` | `useAuthStore().user.id` | +| `organisation_id` | `installContextBinding` | conditional: only when `actor_scope=organisation` | `useOrganisationStore().activeOrganisationId` | + +`http.method` is absent from the frontend table; Vue Router routes are +page-level navigation events, not HTTP requests. The backend table +has `http.method` per request; the frontend leaves that to the +fetch breadcrumbs that `@sentry/vue` attaches automatically. + +### 3.3 `actor_scope` resolution + +Both implementations follow the same priority chain. The backend does +the resolution in `AuthScopeContextListener::resolveTenantContext()`, +the frontend in `contextBinding.ts::bindScope()`. 
+ +| Priority | Backend signal | Frontend signal | Resulting `actor_scope` | +|---|---|---|---| +| 1 | `{organisation}` route-param | route binding on `/organisations/:id` | `organisation` | +| 2 | `{event}` route-param | `{event}` route-param | `organisation` (via event.organisation_id) | +| 3 | `portal_event` request attribute (set by `PortalTokenMiddleware`) | `route.meta.public === true && route.meta.context === 'portal'` | backend: `organisation`, frontend: `portal` | +| 4 | `super_admin` role + route name starts with `admin.` | `super_admin` role + `route.path` starts with `/platform` | `platform` (no `organisation_id`) | +| 5 | Authenticated, no org context | Authenticated, no `activeOrganisationId` | `user` (no `organisation_id`) | +| 6 | Unauthenticated | Unauthenticated | (backend: not bound; frontend: `anonymous`) | + +**Important note for the frontend portal zone:** `actor_scope=portal` +carries **no** `user_id` or `username`. RFC §3.7 frontend block, point 5, +is explicit: token-based flows (artist advance, public form fill) +get no user context because the identifier (ULID token) is itself +sensitive and the visitor is not permanently associated with Crewli. + +**Multi-tenant invariant:** when `actor_scope=organisation`, +`organisation_id` MUST be present as a valid ULID. When +`actor_scope=platform`, `user`, or `anonymous`, `organisation_id` IS +absent. Not "always present" but "always correctly related to +`actor_scope`." Verified in +`AuthScopeContextListenerTest::test_organisation_id_present_when_actor_scope_is_organisation`. + +### 3.4 `actor_type` enum + +Backend: `App\Enums\Observability\ActorType` (PHP enum). Frontend: +inline string mapping in `contextBinding.ts::resolveActorType()`. Both +yield the same values: + +| Value | When | |---|---| -| `\App\Exceptions\FormBuilder\PublishGuardViolationException` | Publish-time validation: schema fails a guard. Returned as 422 with field-level errors. Not a system bug. 
| -| `\App\Exceptions\FormBuilder\PurposeRequirementsNotMetException` | Schema lacks required bindings for its purpose. Returned as 422. Not a system bug. | -| `\App\Exceptions\FormBuilder\IdempotencyConflictException` | Duplicate idempotency key on submission. Returned as 409. Not a system bug. | +| `super_admin` | User has Spatie role `super_admin` | +| `organizer_admin` | User has Spatie role `org_admin` | +| `org_member` | Authenticated user, no admin role (covers volunteers; Crewli has no dedicated `volunteer` role today, see [BACKLOG OBS-1](./BACKLOG.md)) | +| `portal_token` | Token-based portal request (`portal_event` attribute / `route.meta.public` + `context=portal`) | +| `unauthenticated` | No auth (e.g. login page, public form fill) | -**Out of scope for `$dontReport` (these DO go to Sentry):** +--- + +## §4 Tag-binding architecture + +Three patterns that we deliberately chose during the WS-7 implementation, +documented here so that future extensions stay consistent. + +### 4.1 Backend split: middleware × event-listener + +Route-scope tags bind **per HTTP request** via middleware. Auth-scope +tags bind **per authentication event** via a listener. Reason: +route context exists only during HTTP handling, whereas auth context is +emitted by every authenticator (Laravel's `SessionGuard`, Sanctum's +bearer-token Guard, future authenticators). 
+ +| Concern | Implementation | +|---|---| +| Route-scope (`app`, `route_name`, `http.method`) | `App\Http\Middleware\BindSentryRouteContext`, registered globally on the api group via `$middleware->api(prepend: [...])` in `bootstrap/app.php` | +| Auth-scope (`user_id`, `actor_type`, `actor_scope`, `organisation_id`) | `App\Listeners\Observability\AuthScopeContextListener`, which listens to BOTH `Illuminate\Auth\Events\Authenticated` (SessionGuard) AND `Laravel\Sanctum\Events\TokenAuthenticated` (Sanctum) | +| Impersonation override + escalation | `App\Http\Middleware\HandleImpersonation`, which re-binds Sentry scope after the user-swap and sets `impersonation.active='true'` plus impersonator/session ids | +| Queue context (`queue.attempt`) | `App\Listeners\Observability\TagJobAttemptOnSentry`, which listens to `Illuminate\Queue\Events\JobProcessing` | + +**Why a dual-event listener?** Crewli's HTTP flow is bearer-token via +`CookieBearerToken` middleware → `auth:sanctum` → Sanctum's `Guard`, which +fires only `TokenAuthenticated`, NOT `Authenticated`. Listening only +to the Authenticated event would silently miss every authenticated +HTTP request. Discovered by the live smoke test and fixed in the PR-3 +follow-up (commit `adab3be`). + +### 4.2 Frontend split: Vue Router guard + Pinia store reads + +Vue has no Sentry equivalent of Laravel's Authenticated event; the +natural tag-binding moments are **route transitions**. A +`router.beforeEach` guard in `apps/app/src/observability/contextBinding.ts`: + +1. Calls `Sentry.getCurrentScope().clear()` on every navigation. + This prevents cross-zone leakage (e.g. the user logs out in the portal + zone but Sentry still holds the user_id from the organizer context). +2. Reads `useAuthStore()` and `useOrganisationStore()` for identity and + tenant context. +3. Applies the same resolution chain as the backend (§3.3). 
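
The three steps above can be sketched as a pure function that starts from an empty tag set (standing in for the cleared scope) and derives the tags from route and store state. This is a simplified illustration, not the contents of `contextBinding.ts`: the names `resolveScopeTags`, `RouteLike` and `IdentityLike` are hypothetical, the organisation branch relies only on the active organisation from the store (not on route params), and `actor_type` is filled in only for the portal and anonymous zones.

```typescript
// Hypothetical minimal view of a Vue Router location.
interface RouteLike {
  name?: string | null
  path: string
  meta: { public?: boolean; context?: string }
}

// Hypothetical snapshot of what the auth and organisation stores expose.
interface IdentityLike {
  userId: string | null
  roles: string[]
  activeOrganisationId: string | null
}

type Tags = Record<string, string>

function resolveScopeTags(route: RouteLike, identity: IdentityLike): Tags {
  // Starting from a fresh object models the scope clear on every navigation.
  const base: Tags = { route_name: route.name ?? 'unnamed' }

  // Portal zone: token-based, so user_id is never attached.
  if (route.meta.public === true && route.meta.context === 'portal') {
    return { ...base, actor_scope: 'portal', actor_type: 'portal_token' }
  }
  if (identity.userId === null) {
    return { ...base, actor_scope: 'anonymous', actor_type: 'unauthenticated' }
  }

  const withUser: Tags = { ...base, user_id: identity.userId }
  // Platform mode: no organisation_id, events span all organisations.
  if (identity.roles.includes('super_admin') && route.path.startsWith('/platform')) {
    return { ...withUser, actor_scope: 'platform' }
  }
  // Organisation scope always travels together with organisation_id.
  if (identity.activeOrganisationId !== null) {
    return {
      ...withUser,
      actor_scope: 'organisation',
      organisation_id: identity.activeOrganisationId,
    }
  }
  return { ...withUser, actor_scope: 'user' }
}
```

Keeping the resolution pure makes the multi-tenant invariant easy to unit-test; the guard itself would then only need to clear the scope and copy the returned tags onto it.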
+ +### 4.3 Default-in-listener / override-in-middleware pattern + +For binary tags that must always be present but are escalated by +specific middleware steps, we use a two-phase +pattern: + +``` +AuthScopeContextListener::bindForUser() → scope.setTag('impersonation.active', 'false') + ↓ +HandleImpersonation::handle() → scope.setTag('impersonation.active', 'true') + scope.setTag('impersonation.impersonator_user_id', $admin->id) + scope.setTag('impersonation.session_id', $session->id) +``` + +The listener **always** seeds a baseline (`'false'`). When +impersonation is active, `HandleImpersonation` runs after auth and +overwrites the scope with the target user and the escalation tags. If +future refactors add per-`actor_scope` branch shortcuts that skip the +baseline, +`AuthScopeContextListenerTest::test_impersonation_active_default_false_across_every_actor_scope_branch` +catches the regression. + +This pattern is reusable for other binary signals; so far it is +applied only to `impersonation.active`. + +### 4.4 Listener registration discipline + +Laravel 12's listener auto-discovery is disabled in +`bootstrap/app.php` via `->withEvents(discover: false)`. Reason: +auto-discovery combined with explicit `Event::listen()` causes silent +double registration (idempotent today thanks to scope-tag overwrite +semantics, but no longer once a listener performs additive +operations). Guarded by +`tests/Feature/Observability/EventListenerRegistrationTest`. + +**For every new observability listener:** + +1. Create the listener class in `app/Listeners/Observability/`. +2. Register it **explicitly** in `AppServiceProvider::boot()` using the + array-callable form `[Class::class, 'method']`. The class-string form + hides the method binding in `php artisan event:list`. +3. Add a case to + `EventListenerRegistrationTest::test_*_listener_registered_exactly_once` + with the correct event class + method name. 
+ +--- + +## §5 Scrubbing semantics + +### 5.1 Backend: `App\Services\Observability\SentryEventScrubber` + +Registered as the `before_send` hook in `config/sentry.php` via +array-callable static-method notation. Stateless; no +container resolution per event. + +**What gets scrubbed:** + +1. **Request body keys** (recursive, key-name match, depth-limited): + `password`, `password_confirmation`, `current_password`, `token`, + `api_key`, `secret`, `webhook_secret`, `dsn`, `signature`, + `authorization`, `cookie`, `bearer`, `iban`, `bic`, + `passport_number`, `bsn`. The value is replaced with `[scrubbed]`. + +2. **Request headers** (case-insensitive): `authorization`, `cookie`, + `set-cookie`, `x-api-key`, `x-impersonation-token`. Replaced with + `[scrubbed]`. + +3. **Form submissions:** every payload key `form_values` is + replaced wholesale with `[scrubbed_form_values]`. Reason: Crewli's + form builder generates dynamic form values where any key can be + PII (email, phone, dietary, medical). Matching selectively on key + names is not safe. + +4. **URL query string:** `token=` and `api_key=` are scrubbed. + +5. **Cookies wholesale:** `event.request.cookies` is replaced with + `[scrubbed]`. + +6. **Max-depth guard** on recursion: after 10 levels the subtree is + replaced with `['[max_depth]']` to bound malicious deeply nested + payloads. + +**Sub-500 HttpException filter:** when +`$hint?->exception instanceof HttpException && $hint->exception->getStatusCode() < 500`, +the scrubber returns `null` → the event is not sent. Reason: 404, +403, 422 etc. are expected business outcomes (RFC §3.10), not +programmer errors. `ignore_exceptions` in `config/sentry.php` does +class-only filtering; status-based filtering has to happen here. + +### 5.2 Frontend: `apps/app/src/observability/scrubber.ts` + +TypeScript port of the backend scrubber with identical semantics. Plus: + +7. **Storage context strip:** `event.contexts.storage` is stripped. 
+ Sentry doesn't add this by default, but we strip it defensively. RFC §3.7 + frontend point 2: localStorage / sessionStorage must **never** appear in event + context (Crewli's portal state in sessionStorage MUST NOT leak). + +8. **`event.user.cookies` strip:** if Sentry's BrowserSession + integration injects `document.cookie` exposure via the user context, + it is removed. + +9. **Cookies wholesale (typed shape):** `event.request.cookies` is + typed `Record` in `@sentry/vue`. It is replaced with + `{ scrubbed: '[scrubbed]' }` instead of a string, which preserves + the typed shape. + +### 5.3 Boundary: business outcomes vs programmer/infra errors + +| Exception class | Backend behaviour | Reason | +|---|---|---| +| `Throwable`, `RuntimeException`, `TypeError` | Captured | Programmer error | +| `QueryException`, `PDOException` | Captured | Infra error | +| `ValidationException` | NOT captured (`ignore_exceptions`) | Expected user-input error | +| `AuthenticationException` | NOT captured (`ignore_exceptions`) | Expected user-state error | +| `AuthorizationException` | NOT captured (`ignore_exceptions`) | Expected user-permission error | +| `HttpException` status `< 500` | NOT captured (scrubber returns `null`) | Expected 4xx outcome | +| `HttpException` status `>= 500` | Captured | Genuine server error | + +`Integration::handles($exceptions)` in `bootstrap/app.php` is **not +auto-registered** by sentry-laravel 4.x. Without this line, +`report($e)` runs only through Laravel's default reporter (logs to channel) +and never reaches Sentry. Covered by +`tests/Feature/Observability/ExceptionReportingTest`. See also +[BACKLOG OBS-6](./BACKLOG.md). + +**For every new `$exceptions->render(...)` handler in +`bootstrap/app.php`:** Laravel's flow is `report()` → `render()`. If +the handler consumes a Throwable and returns a Response, the +framework flow takes care of `report()` automatically. 
**Render handlers MUST +NOT** hand-roll `report()` or short-circuit early; see +[BACKLOG OBS-7](./BACKLOG.md) for the expansion plan. + +### 5.4 Form Builder runtime exceptions (concrete classification) + +Form Builder is Crewli's largest runtime domain and has its own +exception hierarchy (see [`ARCH-FORM-BUILDER.md`](./ARCH-FORM-BUILDER.md)). +The classification between "expected business outcome" and "programmer / +infra error" for these classes is laid down concretely: + +**Sent to GlitchTip (programmer/infra errors):** - `App\Exceptions\FormBuilder\PersonProvisioningException` — runtime failure during the apply pipeline. Caught by @@ -59,63 +408,323 @@ Laravel's `app/Exceptions/Handler.php` `$dontReport` array: visibility into recurring patterns across orgs. - `App\Exceptions\FormBuilder\PurposeSubjectResolutionException` — runtime resolution failure (no portal token, no auth user, etc.). - Same dual-handling rationale: action-failures table for - org-admin operational handling; Sentry for engineering visibility. + Same dual-handling rationale: action-failures table for org-admin + operational handling; GlitchTip for engineering visibility. - `App\Exceptions\FormBuilder\FormBindingApplicatorException` — - runtime applicator failure (no_transaction, no_schema, - unknown_purpose). These should never happen in production; if they - do, they're systemic bugs — Sentry is the correct destination. + runtime applicator failure (`no_transaction`, `no_schema`, + `unknown_purpose`). These should never happen in production; if + they do, they're systemic bugs, so GlitchTip is the correct + destination. -The dual recording (Sentry + `form_submission_action_failures` table) -is intentional: org admins fix specific failures via the WS-6 admin -UI; engineering identifies systemic issues across all orgs via -Sentry's aggregation. 
+**Not sent to GlitchTip (expected business outcomes):** -## §4 — Structured logging conventions +- `App\Exceptions\FormBuilder\PublishGuardViolationException`: + publish-time validation where the schema fails a guard. Returned as 422 + with field-level errors. Not a system bug. +- `App\Exceptions\FormBuilder\PurposeRequirementsNotMetException`: + schema lacks required bindings for its purpose. Returned as 422. + Not a system bug. +- `App\Exceptions\FormBuilder\IdempotencyConflictException`: + duplicate idempotency key on submission. Returned as 409. Not a + system bug. -[WS-7: log key naming convention. Skeleton: +Dual handling for the first group is intentional: org admins fix +specific failures via the WS-6 admin UI; engineering identifies +systemic issues across all orgs via GlitchTip's aggregation. The +"not sent to GlitchTip" group is covered by `ignore_exceptions` (for +`PublishGuardViolationException` etc., which implement +`HttpExceptionInterface` via a 422 response); when a new +expected-outcome class is added, it must be explicitly excluded in +`config/sentry.php`. -- Hierarchical dot-separated namespace tree -- Existing examples to align with: - - `form-builder.apply.transaction_rolled_back` - - `form-builder.identity-match.no_person_subject_post_apply` - - `form-webhook.delivery.exception` +--- -Define the tree formally so future code discovers the right namespace -deterministically.] +## §6 Runtime context-split (frontend) -## §5 — Metrics +Five zones, decided per `route.path` and `route.meta`: -[WS-7: which counters / histograms / gauges? Namespace? -Statsd / Prometheus / OTel flavour? 
At minimum, candidate metrics: +### 6.1 `actor_scope=organisation` -- `form_submissions_total` (counter, tagged by purpose) -- `form_submission_apply_status` (counter, tagged by status) -- `form_failures_open` (gauge per org) -- `retry_attempts_total` (counter, tagged by outcome) -- `apply_pipeline_duration_seconds` (histogram) -] +- Organizer routes with active org context (`useOrganisationStore().activeOrganisationId !== null`) +- Tags: `actor_scope=organisation`, `organisation_id=`, plus user context +- Examples: `/organisations/:id/dashboard`, `/events/:id`, `/dashboard` -## §6 — Alerting rules +### 6.2 `actor_scope=platform` -[WS-7: which thresholds trigger alerts? Where (Slack? PagerDuty? -Email?). At minimum, candidate alerts: +- super_admin on `/platform/*` paths +- Tags: `actor_scope=platform`, NO `organisation_id` +- **Forced org attribution would be misleading.** Platform-mode + events implicitly span all organisations. -- "Open failures > X for > Y hours" -- "Apply pipeline error rate > X% in 1h window" -- "no_transaction guard fired" (immediate alert; should never happen - in production) -- "Webhook dead-letter rate > X%" -] +### 6.3 `actor_scope=user` -## §7 — Dashboards +- Authenticated user on routes without org scope (`/account-settings`, + `/portal/profiel`) +- Tags: `actor_scope=user`, NO `organisation_id` +- Reason: Crewli's User↔Organisation relation is many-to-many; there is no + reliable single-org hint without route context. -[WS-7: Grafana / Cloudwatch / similar. Panel layout, widget types, -default time ranges. Skeleton later.] +### 6.4 `actor_scope=portal` -## Related docs +- Token-based portal flows: `route.meta.public === true && route.meta.context === 'portal'` +- Concrete routes: `/portal/advance/:token` (artist advance), + `/register/:public_token` (public form fill) +- Tags: `actor_scope=portal`, `actor_type=portal_token` +- **No `user_id`, no `username`** (RFC §3.7 frontend point 5). 
+ The ULID token itself is sensitive; the visitor is not permanently + associated with Crewli. +- The backend portal-token request resolves the organisation via the + matching artist/event row; frontend events correlate via + `request_id` back to the backend event that does carry + `organisation_id`. -- `RFC-WS-6.md` — WS-6 binding pipeline design (the failures observed - and recorded by §3's classes originate here) -- `ARCH-BINDINGS.md` — apply pipeline architecture -- `ARCH-FORM-BUILDER.md` — form-builder runtime including webhooks +### 6.5 `actor_scope=anonymous` + +- Public routes without auth: `/login`, `/forgot-password`, `/register`, + `/invitations/:token` (acceptance flow) +- Tags: `actor_scope=anonymous`, `actor_type=unauthenticated` + +### 6.6 Cross-zone leakage prevention + +`Sentry.getCurrentScope().clear()` is called on every +route transition in `installContextBinding`. Example: a user logs out +in the organizer context and navigates to `/login`. Without the clear, the +next anonymous error event would still carry the `user_id` of the logged-out +user. With the clear, the Sentry scope is reset; the +unauthenticated event gets only the freshly bound anonymous tags. + +Test: `contextBinding.spec.ts::test_cross-zone_leak_guard`. + +--- + +## §7 CSP whitelist (kritisch) + +Crewli's strict CSP `connect-src` directive must explicitly whitelist the +GlitchTip ingest host. Without this entry the +browser silently blocks every `@sentry/vue` POST with *"Refused to connect +because it violates the following Content Security Policy directive"* +in the DevTools Console: the SDK believes it is working, but no events +reach GlitchTip. 
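
A regression guard for this requirement could be built around a small helper like the one below. It is a minimal sketch under stated assumptions: the function names are hypothetical, it only does exact source matching (no wildcards, schemes or `'self'` resolution), and it is not the logic of the existing CSP test. The fallback from `connect-src` to `default-src` follows the CSP specification.

```typescript
// Returns the source list of one directive from a CSP string, or null if absent.
function directiveSources(csp: string, directive: string): string[] | null {
  for (const part of csp.split(';')) {
    const [name, ...sources] = part.trim().split(/\s+/).filter(t => t.length > 0)
    if (name !== undefined && name.toLowerCase() === directive) {
      return sources
    }
  }
  return null
}

// True when the policy lists the given origin verbatim for fetch/XHR egress.
function allowsConnectTo(csp: string, origin: string): boolean {
  // connect-src falls back to default-src when it is not declared.
  const sources =
    directiveSources(csp, 'connect-src') ?? directiveSources(csp, 'default-src') ?? []
  return sources.includes(origin)
}
```

Usage would be along the lines of `allowsConnectTo(policyString, 'https://monitoring.hausdesign.nl')` for production and the `http://localhost:8200` origin for the dev meta tag.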
+ +| Environment | CSP location | `connect-src` entry | +|---|---|---| +| Dev | `apps/app/index.html` meta tag | `http://localhost:8200` | +| Prod organizer SPA | `deploy/nginx/csp-spa.conf` (both Report-Only and Enforce rules) | `https://monitoring.hausdesign.nl` | +| API JSON responses | `api/config/security.php` (no update) | `default-src 'none'`; no `connect-src` because the JSON context has no fetch origin | + +**When a new environment is introduced** (e.g. staging, see +[BACKLOG OBS-9](./BACKLOG.md)), the following is REQUIRED: + +1. The corresponding GlitchTip ingest host is added to the + right CSP location. +2. `tests/Feature/Security/CspConnectsToObservabilityTest` is + extended with a staging assertion so that the regression guard covers + the new environment. + +--- + +## §8 Sourcemap upload (frontend) + +Vite produces sourcemaps for every chunk (`build.sourcemap=true` in +`vite.config.ts`). `deploy.sh` uploads them to GlitchTip and removes +them from `dist/` before nginx serves them. RFC §3.5: **never** +publicly mapped sources in production. + +``` +vite build → apps/app/dist/assets/*.js + *.js.map + │ + ▼ + sentry-cli sourcemaps upload --org $SENTRY_ORG \ + --project crewli-app \ + --release $VITE_SENTRY_RELEASE \ + --url-prefix "~/assets/" \ + apps/app/dist/assets + │ + ▼ + find apps/app/dist -name '*.map' -type f -delete + │ + ▼ + nginx serves dist/ +``` + +**Required env vars** (deploy host only, not committed): + +| Var | Description | +|---|---| +| `SENTRY_AUTH_TOKEN` | Per-project upload-only token in GlitchTip. Bert provisions this manually in the `crewli-app` project settings. | +| `SENTRY_ORG` | GlitchTip organisation slug. Default in `deploy.sh`: `crewli`. | +| `VITE_SENTRY_DSN_FRONTEND` | Presence is conditional: if it is missing, `deploy.sh` skips the upload (soft fail) but still strips the `*.map` files. | +| `VITE_SENTRY_RELEASE` | Injected at build time by `deploy.sh`: `crewli-app@$(git rev-parse --short HEAD)`. 
| + +**Soft-fail:** if the upload fails (GlitchTip unreachable, expired token), +the deploy continues and logs a warning. The `find … -delete` step +**always** runs. Unmapped stack traces in GlitchTip are better than a +blocked deploy. + +--- + +## §9 GDPR & privacy + +### 9.1 Processing register + +Crewli is the **controller** for GlitchTip data (self-hosted on +Crewli infra). No processor relationship, no DPA extension needed. +Processing register entry: see +[`SECURITY_AUDIT.md`](./SECURITY_AUDIT.md), "WS-7 Observability — +finale audit". + +### 9.2 Data after scrubbing + +What a GlitchTip event can still contain: + +- ULIDs (user_id, organisation_id, event_id, request_id, session_id) +- Stack traces (without locals, `send_default_pii=false`) +- Route names and HTTP methods +- Curated tags (see §3) +- Breadcrumbs (input text masked, console integration off in prod) + +What it does **not** contain: emails, phone numbers, names, IP addresses, raw +form_values, raw cookies, raw headers (Authorization etc.). + +### 9.3 Retention + +90 days, after which events are purged by GlitchTip's own partition-maintenance +loop (see the monitoring section of [`GLITCHTIP.md`](./GLITCHTIP.md)). +Configurable via the GlitchTip admin UI (settings → environment-config). + +### 9.4 Right to erasure (Art. 17) + +Manual initially. Procedure: see +[`runbooks/observability-erasure.md`](./runbooks/observability-erasure.md). +An automated erasure script remains on the BACKLOG (referenced in the +RFC; not yet a concrete entry in BACKLOG.md). + +--- + +## §10 Maintenance & extension + +### 10.1 Adding a new tag + +First determine the **source** of the tag. 
+Four patterns:
+
+| Source | Pattern | Example |
+|---|---|---|
+| HTTP request context (route, method, headers) | Middleware | `BindSentryRouteContext` |
+| Auth context (user, role, org) | Listener on `Authenticated` + `TokenAuthenticated` | `AuthScopeContextListener` |
+| Domain event (job processing, custom event) | Listener on the domain event | `TagJobAttemptOnSentry` |
+| Static / build-time | `config/sentry.php` initial scope | `app=api` |
+
+For every new tag:
+
+1. Add it to the §3 table above.
+2. Implement it in the chosen location.
+3. For listeners: register explicitly in `AppServiceProvider::boot()`
+   using the array-callable form, and add a case to
+   `EventListenerRegistrationTest`.
+4. Write a feature test that asserts the tag on a live HTTP flow
+   (follow the pattern of `AuthScopeBindingHttpFlowTest`).
+5. Frontend mirror: add it to
+   `apps/app/src/observability/contextBinding.ts` and to
+   `contextBinding.spec.ts`.
+
+### 10.2 Adding a new scrubbing rule
+
+1. Backend: add the key to `SENSITIVE_BODY_KEYS` or
+   `SENSITIVE_HEADERS` in
+   `app/Services/Observability/SentryEventScrubber.php`.
+2. Frontend: make the identical change in
+   `apps/app/src/observability/scrubber.ts`.
+3. Add a test case to both:
+   `tests/Feature/Observability/PiiScrubbingTest.php` (PHP) and
+   `apps/app/src/observability/__tests__/scrubber.spec.ts` (TypeScript).
+4. Both test files must cover the new key: backend and frontend are
+   semantically equivalent and must stay that way.
+
+### 10.3 A new `$exceptions->render(...)` handler
+
+Per [BACKLOG OBS-7](./BACKLOG.md): new render handlers MUST NOT
+short-circuit without `report($e)`. Laravel runs `report()` before
+`render()` automatically, so a render handler that returns a Response
+has already had its exception reported.
+
+If the new handler consumes a Throwable that would not go through
+`Integration::handles()` (e.g. a custom `$exception->report()` method
+on a custom exception), add a case to `ExceptionReportingTest` that
+proves the event is still captured.
+
+### 10.4 A new environment (staging, demo, …)
+
+See [BACKLOG OBS-9](./BACKLOG.md). Required:
+
+1. GlitchTip project provisioning, plus the DSN stored in 1Password.
+2. CSP whitelist update (`apps/app/index.html` for a dev-style env, or
+   a new nginx config for a prod-style env).
+3. Extend `tests/Feature/Security/CspConnectsToObservabilityTest`
+   with an assertion for the new environment.
+4. Adjust `deploy.sh` if the release tag format changes (default:
+   `crewli-app@`).
+
+### 10.5 A new Form Builder exception class
+
+See §5.4. When adding a new FormBuilder exception:
+
+- If it is an **expected business outcome**: add it to
+  `ignore_exceptions` in `config/sentry.php`, unless the class is
+  already ignored through `HttpException` or `ValidationException`
+  handling. Document it in §5.4.
+- If it is a **programmer/infra error**: add nothing; the class flows
+  automatically through `Integration::handles($exceptions)`.
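+
+As an illustration of the first case, the fragment below sketches what
+such an `ignore_exceptions` entry could look like. The class
+`SubmissionWindowClosedException` and its namespace are hypothetical
+and used only for this example; the other options in
+`config/sentry.php` are left out.
+
+```php
+<?php
+// config/sentry.php (fragment, not the full file)
+
+return [
+    // ... existing options (DSN, release, sample rates) stay unchanged ...
+
+    'ignore_exceptions' => [
+        // Hypothetical class name, for illustration only: an expected
+        // business outcome that is not already ignored through
+        // HttpException or ValidationException handling.
+        \App\Exceptions\FormBuilder\SubmissionWindowClosedException::class,
+    ],
+];
+```
+
+A programmer/infra exception gets no entry here, so it keeps flowing
+to GlitchTip by default.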
+
+---
+
+## §11 References
+
+**Implementation:**
+
+- [`api/app/Services/Observability/SentryEventScrubber.php`](../api/app/Services/Observability/SentryEventScrubber.php)
+- [`api/app/Listeners/Observability/AuthScopeContextListener.php`](../api/app/Listeners/Observability/AuthScopeContextListener.php)
+- [`api/app/Listeners/Observability/TagJobAttemptOnSentry.php`](../api/app/Listeners/Observability/TagJobAttemptOnSentry.php)
+- [`api/app/Http/Middleware/BindSentryRouteContext.php`](../api/app/Http/Middleware/BindSentryRouteContext.php)
+- [`api/app/Http/Middleware/HandleImpersonation.php`](../api/app/Http/Middleware/HandleImpersonation.php)
+- [`api/config/sentry.php`](../api/config/sentry.php)
+- [`api/bootstrap/app.php`](../api/bootstrap/app.php)
+- [`apps/app/src/observability/sentry.ts`](../apps/app/src/observability/sentry.ts)
+- [`apps/app/src/observability/scrubber.ts`](../apps/app/src/observability/scrubber.ts)
+- [`apps/app/src/observability/contextBinding.ts`](../apps/app/src/observability/contextBinding.ts)
+- [`apps/app/index.html`](../apps/app/index.html)
+- [`deploy/nginx/csp-spa.conf`](../deploy/nginx/csp-spa.conf)
+- [`deploy.sh`](../deploy.sh)
+
+**Tests (regression guards):**
+
+- `tests/Feature/Observability/PiiScrubbingTest.php`
+- `tests/Feature/Observability/AuthScopeContextListenerTest.php`
+- `tests/Feature/Observability/AuthScopeBindingHttpFlowTest.php`
+- `tests/Feature/Observability/BindSentryRouteContextTest.php`
+- `tests/Feature/Observability/ExceptionReportingTest.php`
+- `tests/Feature/Observability/RequestIdRoundTripTest.php`
+- `tests/Feature/Observability/EventListenerRegistrationTest.php`
+- `tests/Feature/Database/ActivityLogIndexesTest.php`
+- `tests/Feature/Security/CspHeaderTest.php`
+- `tests/Feature/Security/CspConnectsToObservabilityTest.php`
+- `apps/app/src/observability/__tests__/scrubber.spec.ts`
+- `apps/app/src/observability/__tests__/contextBinding.spec.ts`
+
+**Documents:**
+
+- [`RFC-WS-7-OBSERVABILITY.md`](./RFC-WS-7-OBSERVABILITY.md):
+  historical implementation spec
+- [`GLITCHTIP.md`](./GLITCHTIP.md): operational runbook
+- [`runbooks/observability-triage.md`](./runbooks/observability-triage.md):
+  incoming-issue triage procedure
+- [`runbooks/observability-erasure.md`](./runbooks/observability-erasure.md):
+  GDPR Art. 17 procedure
+- [`SECURITY_AUDIT.md`](./SECURITY_AUDIT.md): A13-9 (CSP) + WS-7
+  final entry (processing register, security controls)
+- [`BACKLOG.md`](./BACKLOG.md): OBS-* entries (active + resolved)
+- [`ARCH-FORM-BUILDER.md`](./ARCH-FORM-BUILDER.md): Form Builder runtime
+  (consumer of the §5.4 exception classification)
+- [`ARCH-BINDINGS.md`](./ARCH-BINDINGS.md): apply pipeline (origin of
+  the runtime exceptions captured in §5.4)
+- [`RFC-WS-6.md`](./RFC-WS-6.md): WS-6 binding pipeline design