WS-7 Observability — closure #8

Merged
bert.hausmans merged 30 commits from feat/ws-7-observability into main 2026-05-07 22:49:31 +02:00

WS-7 Observability is afgerond als implementation-werk. 4 PRs gemerged op feat/ws-7-observability:

  • PR-1 Infra: GlitchTip Docker stack (5f6fc07)
  • PR-2 Backend SDK: sentry-laravel + scrubber + listener-discipline (bdb89a2..0379016)
  • PR-3 Frontend SDK: @sentry/vue + CSP whitelist (bc47783..5c42f27)
  • PR-4 Docs + WS-8b: ARCH-OBSERVABILITY.md + 2 runbooks (754222f..e9da01f)
  • BACKLOG closure (d4a450d)

Tests: 1551 backend + 252 frontend, alle groen.

Acceptance criteria 1-14 voldaan:

  • Implementation criteria 3, 4, 5, 6, 8, 11, 12, 13, 14 via PRs
  • Operationele criteria 1, 2, 7, 9, 10 via deploy-checklist (DNS monitoring.hausdesign.nl, TLS, superuser+2FA, prod DSNs, email-alerting, retention 90d, cron backup)

Architecturale patronen vastgelegd:

  • Explicit > implicit listener registration (auto-discovery uit, expliciete Event::listen calls)
  • Default-in-listener / override-in-middleware voor binary tags (impersonation.active)
  • Tenant resolution chain (route-param → portal-token → super_admin platform → user fallback)
  • Runtime context-split via actor_scope tag (organisation / platform / user / anonymous / portal)

Observability live op monitoring.hausdesign.nl met twee GlitchTip projecten (crewli-api + crewli-app), 90-dagen retention, email-alerting naar support@hausdesign.nl, dagelijks backup-cron.

Refs:

  • dev-docs/ARCH-OBSERVABILITY.md (730 regels — post-implementation reference)
  • dev-docs/runbooks/observability-triage.md (P0-P3 classificatie, request_id correlation)
  • dev-docs/runbooks/observability-erasure.md (GDPR Art. 17 procedure)
  • dev-docs/SECURITY_AUDIT.md (WS-7 Observability finale audit sectie, 7 controls)
  • dev-docs/BACKLOG.md (closure entry + 5 follow-up items: OBS-1, OBS-4, OBS-6, OBS-7, OBS-9)

Merge style: Merge commit (--no-ff equivalent in Gitea UI = "Create merge commit").

WS-7 Observability is afgerond als implementation-werk. 4 PRs gemerged op feat/ws-7-observability: - PR-1 Infra: GlitchTip Docker stack (5f6fc07) - PR-2 Backend SDK: sentry-laravel + scrubber + listener-discipline (bdb89a2..0379016) - PR-3 Frontend SDK: @sentry/vue + CSP whitelist (bc47783..5c42f27) - PR-4 Docs + WS-8b: ARCH-OBSERVABILITY.md + 2 runbooks (754222f..e9da01f) - BACKLOG closure (d4a450d) **Tests:** 1551 backend + 252 frontend, alle groen. **Acceptance criteria 1-14 voldaan:** - Implementation criteria 3, 4, 5, 6, 8, 11, 12, 13, 14 via PRs - Operationele criteria 1, 2, 7, 9, 10 via deploy-checklist (DNS monitoring.hausdesign.nl, TLS, superuser+2FA, prod DSNs, email-alerting, retention 90d, cron backup) **Architecturale patronen vastgelegd:** - Explicit > implicit listener registration (auto-discovery uit, expliciete Event::listen calls) - Default-in-listener / override-in-middleware voor binary tags (impersonation.active) - Tenant resolution chain (route-param → portal-token → super_admin platform → user fallback) - Runtime context-split via actor_scope tag (organisation / platform / user / anonymous / portal) **Observability live op monitoring.hausdesign.nl** met twee GlitchTip projecten (crewli-api + crewli-app), 90-dagen retention, email-alerting naar support@hausdesign.nl, dagelijks backup-cron. **Refs:** - dev-docs/ARCH-OBSERVABILITY.md (730 regels — post-implementation reference) - dev-docs/runbooks/observability-triage.md (P0-P3 classificatie, request_id correlation) - dev-docs/runbooks/observability-erasure.md (GDPR Art. 17 procedure) - dev-docs/SECURITY_AUDIT.md (WS-7 Observability finale audit sectie, 7 controls) - dev-docs/BACKLOG.md (closure entry + 5 follow-up items: OBS-1, OBS-4, OBS-6, OBS-7, OBS-9) **Merge style:** Merge commit (--no-ff equivalent in Gitea UI = "Create merge commit").
bert.hausmans added 30 commits 2026-05-07 22:46:42 +02:00
Two charter amendments from the original WS-7 brief:

- Sentry -> GlitchTip (self-hosted, protocol-compatible). Same
  Sentry SDKs on backend (sentry-laravel) and frontend
  (@sentry/vue), pointed at a self-hosted GlitchTip DSN. Avoids
  Sentry SaaS pricing and keeps event data on infrastructure
  Bert controls.
- Performance monitoring out of scope (errors-only). WS-7
  delivers exception capture + alerting + scrubbing + RBAC
  only. APM/tracing/spans deferred to a later workstream if
  ever needed; pre-launch with no users, the cost/benefit
  doesn't justify it now.

RFC-as-first-commit pattern (per WS-6) so the scope-alignment
document is in main before any infra/code changes land.
WS-7 PR-1 — bring up self-hosted GlitchTip alongside the existing
dev stack. One compose file is portable to the production monitoring
host (RFC-WS-7 §3.1).

- docker-compose.glitchtip.yml: web/worker/postgres/redis pinned, web
  bound to 127.0.0.1:8200, internal network for postgres + valkey.
- docker/glitchtip/.env.example: documented dev defaults + production
  checklist; .env itself ignored.
- Makefile: services / services-stop merge both compose files; new
  services-glitchtip-status tail target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Daily pg_dump → gzip → retention pipe for the GlitchTip database.
Configurable via env vars (defaults: ./backups/glitchtip, 30-day
retention, glitchtip-postgres container). Streams directly through
gzip so no plaintext dump touches disk; output 0600.

Cron example in the script header. RFC-WS-7-OBSERVABILITY §5
acceptance criterion 11.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operational docs for the GlitchTip stack landed in the previous two
commits.

- dev-docs/GLITCHTIP.md: new runbook covering local dev, project
  provisioning + DSN-to-vault flow, production deploy on
  monitoring.hausdesign.nl (DNS, DirectAdmin Let's Encrypt, Apache
  reverse proxy with WS upgrade), backup install + restore drill,
  smoke tests, troubleshooting.
- dev-docs/SETUP.md: services table now includes GlitchTip; new
  docker/glitchtip/.env subsection points at the runbook.
- dev-docs/RFC-WS-7-OBSERVABILITY.md §3.1: amended to record that the
  same compose file drives local dev (Mailpit at bm_mailpit:1025), so
  prod and dev cannot drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WS-7 PR-2 commit 1. Wires sentry-laravel into the app behind a
config-only no-op when SENTRY_DSN_BACKEND is empty (RFC §3.3).

- composer require sentry/sentry-laravel ^4.15 (resolved 4.25.1)
- config/sentry.php: DSN env mapped to SENTRY_DSN_BACKEND, environment
  falls back to APP_ENV, traces/profiles forced to 0.0 (RFC §2
  amendment B), send_default_pii hard-pinned false, before_send to
  SentryEventScrubber, ignore_exceptions covers ValidationException /
  AuthenticationException / AuthorizationException.
- app/Services/Observability/SentryEventScrubber.php: recursive body /
  header / query-string scrubber + form_values wholesale replacement +
  HttpException sub-500 drop (status filter that ignore_exceptions
  cannot do class-only). Max-depth guard against malicious payloads.
- app/Enums/Observability/ActorType.php: enum + resolver for §3.6
  actor_type tag (consumed by BindSentryContext in commit 2).
- tests/Feature/Observability/PiiScrubbingTest.php: 20 cases.
- api/.env.example: SENTRY_DSN_BACKEND + SENTRY_RELEASE entries.

Larastan: clean. Test count: 1487 to 1507.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WS-7 PR-2 commit 2.

- app/Http/Middleware/BindSentryContext.php: sets RFC §3.6 tags on the
  active Sentry scope (app, http.method, route_name, actor_type,
  user_id, organisation_id, event_id, impersonation). Multi-tenant
  invariant: throws RuntimeException in local/testing when an auth
  request to a tenant-scoped route lacks organisation_id; logs a
  warning in production so the user flow still completes.
- app/Listeners/Observability/TagJobAttemptOnSentry.php: tags
  queue.attempt on the scope from the JobProcessing event. Default
  stack-trace grouping preserved per §3.11.
- ActorType: VOLUNTEER case reserved for a future role split. Current
  resolver maps non-admin authenticated users to ORG_MEMBER.
- bootstrap/app.php: registers sentry.context alias. Applied inside
  auth:sanctum groups in routes/api.php so it runs after auth.
- AppServiceProvider::boot registers the queue listener.

Test count: 1507 to 1523. Larastan clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WS-7 PR-2 commit 3. RFC §3.13.

- app/Http/Middleware/BindRequestLogContext.php: tags every Laravel log
  line written during the request with request_id, organisation_id,
  user_id, and route name. Sets X-Request-Id on the response so the
  SPA can correlate to backend log lines via one click.
- Client-supplied X-Request-Id is honoured only if it parses as a ULID
  via Str::isUlid. Junk input (empty, non-ULID) is rejected and a
  fresh ULID is generated server-side.
- Registered as a global api-group middleware via the prepend list so
  it runs before authentication. Unauthenticated 4xx responses still
  carry the X-Request-Id header.
- Test count: 1523 to 1532. Larastan clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-2 follow-up. The PR-2 backend SDK install passed unit tests because
they exercised the scrubber and the BindSentryContext scope writer in
isolation, but live exceptions from controllers never reached
GlitchTip — they were correctly logged to laravel.log but the report()
call had no Sentry-aware reporter to invoke.

Root cause: sentry-laravel 4.x does NOT auto-register an exception
reporter. The host application is required to wire Integration::handles
inside withExceptions in bootstrap/app.php (per the package README and
Sentry docs). Without it, report and Laravels automatic
report-before-render flow only hit the default log channel.

Fix: add Integration::handles at the top of withExceptions so
sentry-laravel registers a reportable callback that calls
captureUnhandledException for every reported throwable. Filtering
remains downstream:
  - ignore_exceptions in config/sentry.php drops Validation,
    Authentication, Authorization (RFC §3.10).
  - SentryEventScrubber::scrub returns null for sub-500 HttpException
    via the before_send hook (RFC §3.7).

Regression coverage: tests/Feature/Observability/ExceptionReportingTest
installs a real Sentry client with a recording before_send and exercises
the full request to capture pipeline through the auth and sentry.context
middleware. Five cases: RuntimeException IS captured (with §3.6 tags
attached), ValidationException is not, NotFoundHttpException 404 is
not, AuthorizationException 403 is not, request-context tags ride along
on the captured event.

Test count: 1532 to 1537. Larastan clean. Pint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The scrubber is fully stateless. Container-resolution per event was
overhead without value, the closure indirection polluted the config
layer with executable logic, and stack traces showed an anonymous
closure frame instead of the class name.

- SentryEventScrubber::scrub() and its private helpers all become
  static methods. No instance fields, so the change is mechanical.
- config/sentry.php before_send switches from a closure that calls
  app() to PHP array-callable notation [Class, method]. Symfony
  OptionsResolver accepts array-callables for static methods.
- PiiScrubbingTest swaps (new SentryEventScrubber)->scrub(...) for
  SentryEventScrubber::scrub(...). Semantics unchanged.

Tests 1537 unchanged. Larastan and Pint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VOLUNTEER was reserved-but-unused. Resolver mapped non-admin
authenticated users to ORG_MEMBER because Crewli has no dedicated
volunteer Spatie role; volunteer-ness is behaviour (shift assignments),
not identity.

Dead enum cases are YAGNI violations under zero-compromise: a future
developer could use the case without realising no resolution path
leads to it, producing a silent no-op. Re-introduce alongside a real
volunteer role split when that lands (BACKLOG OBS-1).

ActorType keeps ORGANIZER_ADMIN, SUPER_ADMIN, PORTAL_TOKEN, ORG_MEMBER,
UNAUTHENTICATED. Tests at 1537, Larastan clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sentry-context binding split into two responsibilities:

- Route-scope (app, http.method, route_name) stays in middleware on
  the api group as BindSentryRouteContext — works on every request,
  no auth required.
- Auth-scope (user_id, actor_type) moves to AuthScopeContextListener
  on Illuminate\Auth\Events\Authenticated — works on every
  authentication mechanism (Sanctum, portal-tokens, future
  authenticators) without per-route middleware-attachment. Listener
  also augments Log::withContext with user_id (closes OBS-2).

Architecturally fault-preventing rather than fault-detecting: new
authenticated route groups need no separate sentry.context aliasing,
so silent observability gaps are no longer possible (closes OBS-3).

Impersonation tagging is co-located with HandleImpersonation: after
the user-swap, the middleware re-tags Sentry scope with the target
user_id/actor_type and adds impersonation.active /
impersonation.impersonator_user_id / impersonation.session_id. The
Authenticated event fires for the admin (Sanctum's natural flow),
the listener tags the admin, then HandleImpersonation overwrites
post-swap.

Files renamed:
- BindSentryContext -> BindSentryRouteContext (route-scope only)
- BindSentryContextTest -> BindSentryRouteContextTest (4 cases)

Files added:
- AuthScopeContextListener
- AuthScopeContextListenerTest (6 cases)

bootstrap/app.php drops the sentry.context alias and prepends
BindSentryRouteContext to the api group. routes/api.php drops every
sentry.context middleware string from auth:sanctum groups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-2 live smoke test surfaced that super_admin platform-route
exceptions arrived without organisation_id, and the original RFC §3.6
invariant (always-present organisation_id on authenticated events)
would force misleading attribution if it tried to fill that gap.

Refined invariant: every authenticated event carries actor_scope
(organisation/platform/user/anonymous), AND when actor_scope is
organisation, organisation_id MUST be a valid ULID. Platform-mode
correctly omits organisation_id rather than fabricate one.

Resolution chain in AuthScopeContextListener:
  1. {organisation} or {event} URI parameter -> actor_scope=organisation
  2. portal_event request attribute -> actor_scope=organisation
  3. super_admin on admin.* named route -> actor_scope=platform
     (Crewli's platform-admin routes use the admin. name prefix)
  4. Default authenticated -> actor_scope=user, no org tag
     (User<->Organisation is many-to-many; no reliable single-org hint)

Eight new test cases in AuthScopeContextListenerTest cover each branch
and the conditional invariant, including ULID validity via
Symfony\Component\Uid\Ulid::isValid.

Test count 1531 to 1539. Larastan clean. Pint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-2 verified that Spatie's activitylog default migration creates the
composite indexes RFC-WS-7 §3.14 / addendum D-06 require — via
nullableMorphs('subject') and nullableMorphs('causer'), which emit
indexes named `subject` on (subject_type, subject_id) and `causer` on
(causer_type, causer_id).

This test queries information_schema.STATISTICS and fails if either
composite is missing, regardless of the index name. It guards against
silent regression when:
  - A future Spatie major release changes nullableMorphs semantics.
  - A developer rewrites the activity_log migration without preserving
    the morph indexes.
  - A schema-dump regeneration drops them.

Test count 1539 to 1541. Larastan clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RFC §3.6 — context tagging tabel volledig vervangen na de PR-2 follow-up
architecturale fixes. Belangrijkste wijzigingen:
- Tag-binding gesplitst in route-scope (BindSentryRouteContext middleware)
  en auth-scope (AuthScopeContextListener op Authenticated event).
- Nieuwe actor_scope tag (organisation/platform/user/anonymous).
- Multi-tenant invariant verfijnd: organisation_id is altijd correct
  gerelateerd aan actor_scope in plaats van "altijd aanwezig". Platform-
  routes zonder org-context worden niet meer gefabriceerd; default
  authenticated user-scope omitt organisation_id (Crewli's User<->Organisation
  is many-to-many, geen reliable single-org hint).
- impersonation.* tags expliciet gedocumenteerd als afkomstig uit
  HandleImpersonation middleware (post-swap), niet uit auth-listener.
- ActorType waarden bijgewerkt na verwijdering van VOLUNTEER case.

RFC §3.14 — status-note toegevoegd dat D-06 indexes al via Spatie's
nullableMorphs default-migratie zijn aangemaakt, met regression-guard
verwijzing.

§6 acceptance criterium 12 markeert D-06 als al voldaan.

BACKLOG.md krijgt vier nieuwe OBS-entries:
- OBS-1: VOLUNTEER actor_type promotion wanneer rol komt
- OBS-4: PHPUnit metadata deprecation cleanup pre-PHPUnit-12
- OBS-6: sentry-laravel install gap awareness + bootstrap test
- OBS-7: custom render handlers report() invariant + coverage

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live HTTP smoke test on the post-architectural-fixes branch surfaced
that captured Sentry events carried only route-scope tags (app,
route_name, http.method) — auth-scope tags (user_id, actor_type,
actor_scope) were absent on every request.

Root cause: Sanctum's Guard fires Laravel\Sanctum\Events\TokenAuthenticated
(vendor/laravel/sanctum/src/Guard.php:77) on bearer-token resolution,
NOT Illuminate\Auth\Events\Authenticated. The Authenticated event only
fires from SessionGuard
(vendor/laravel/framework/src/Illuminate/Auth/SessionGuard.php:833),
which Crewli does not use — CookieBearerToken middleware injects the
httpOnly cookie as Authorization: Bearer, then auth:sanctum invokes
Sanctum's Guard. So the listener never ran on Crewli's HTTP path.

Offline tests in AuthScopeContextListenerTest passed because they
dispatch event(new Authenticated(...)) directly, bypassing the Guard
layer. Sanctum::actingAs() in tests has the same blind spot — it
short-circuits the Guard via guard('sanctum')->setUser() and fires
neither event.

Fix:
- New handleTokenAuthenticated(TokenAuthenticated $event) method on
  AuthScopeContextListener extracts the user via $event->token->tokenable
  and delegates to a private bindForUser() shared with handle().
- AppServiceProvider registers the listener for both Authenticated
  (covers SessionGuard / login flow / future authenticators) and
  TokenAuthenticated (covers Crewli's bearer-token Sanctum flow).

Regression coverage: AuthScopeBindingHttpFlowTest exercises the real
Sanctum Guard via $user->createToken() + Authorization: Bearer header.
Three cases:
  - super_admin on a user-scope route: actor_scope=user, all auth tags
    present.
  - super_admin on an admin.* route: actor_scope=platform, no
    organisation_id (correct platform-mode behaviour).
  - org_admin on a route with {organisation} param: actor_scope=
    organisation, organisation_id valid ULID.

Test count 1541 to 1544. Larastan clean. Pint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auto-discovery + explicit Event::listen() runt observability listeners
twee keer per event (verified via php artisan event:list duplicate
entries). Vandaag idempotent vanwege scope-tag overwrite semantics, maar
architecturaal onacceptabel — toekomstige additive listeners zouden
onmiddellijk breken zonder waarschuwing.

Optie A (Bert bevestigd, RFC-WS-7 OBS-8): expliciete registraties
behouden in AppServiceProvider::boot(), auto-discovery globaal uit via
->withEvents(discover: false) in bootstrap/app.php. Reden: explicit >
implicit voor observability-kritische bindings — grep-baar, IDE-
navigeerbaar, direct zichtbaar bij code review.

TagJobAttemptOnSentry registratie ook van class-string naar array-
callable vorm gebracht zodat event:list de gebonden methode toont
(consistent met AuthScopeContextListener-registraties).

Test count ongewijzigd op 1544. Larastan + Pint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RFC §3.6 vereist impersonation.active als always-present binary signal
op authenticated events. Originele PR-2 architectural-fixes verplaatste
impersonation-tagging naar HandleImpersonation middleware, die alleen
draait bij actieve impersonation. Resultaat: non-impersonation events
hadden GEEN tag, niet 'false' tag — wat filtering op "alle impersonation
events" in GlitchTip stilletjes onmogelijk maakte.

Fix: AuthScopeContextListener::bindForUser() zet baseline 'false';
HandleImpersonation overschrijft naar 'true' + impersonator_user_id
wanneer actief. Default-in-listener, override-in-middleware pattern.
HandleImpersonation deed de override-set al correct sinds commit
9414d09; alleen de baseline ontbrak.

Bert's live verification toonde de gap: super_admin event zonder
impersonation actief, GlitchTip event zonder impersonation.active tag.

Tests:
- test_impersonation_active_default_false_for_non_impersonation_authenticated_event
  (was test_authenticated_event_does_not_set_impersonation_tags;
  hernoemd + assertion gewijzigd)
- test_impersonation_active_default_false_across_every_actor_scope_branch
  walks elke actor_scope branch (user/organisation/platform) en bewijst
  baseline geldt uniform — vangt toekomstige refactors die per branch
  vroegtijdig returnen.

Test count 1544 to 1545. Larastan + Pint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drie regression-tests die de klasse fouten uit PR-2 nazorg empirisch
voorkomen:

1. test_authenticated_listener_registered_exactly_once
2. test_token_authenticated_listener_registered_exactly_once
3. test_job_processing_tag_listener_registered_exactly_once
   — vangen OBS-8 patroon (auto-discovery + explicit listen samen) plus
   accidentally-removed registrations door toekomstige refactors. Walk
   Event::getRawListeners() en faalt met count != 1 met een duidelijke
   message ("auto-discovery re-enabled? OR explicit Event::listen
   missing?"). Empirisch geverifieerd: zowel duplicate als missing
   registratie wordt gevangen.

4. test_impersonation_active_tag_invariant_on_captured_events
   — RFC §3.6 binary signal invariant op een echte HTTP request flow.
   Vangt regressie waar de baseline-tag-binding verdwijnt.

BACKLOG.md OBS-8 entry toegevoegd en gemarkeerd als Resolved met
verwijzing naar de drie commits van deze sessie + architecturaal
pattern (explicit > implicit voor observability-kritische bindings).

Test count 1545 to 1549. Larastan + Pint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WS-7 PR-3 commit 1. Frontend mirror of the backend SDK install
(commits bdb89a2..adab3be), wired against the existing apps/app SPA.

- pnpm add @sentry/vue@10.52.0 (pinned).
- src/observability/sentry.ts: initSentry() — empty DSN no-op (RFC §3.3),
  errors-only (tracesSampleRate=0, profilesSampleRate=0; RFC §2 amend.B),
  sendDefaultPii=false, Console integration off, beforeSend wired to the
  scrubber, initial scope tag app=app for GlitchTip filtering.
- src/observability/scrubber.ts: TypeScript port of backend
  SentryEventScrubber. RFC §3.7 frontend block — body / header / query
  scrubbing, form_values wholesale replacement, cookies wholesale,
  defensive strip of contexts.storage and user.cookies, max-depth guard.
- src/observability/contextBinding.ts: Vue Router beforeEach guard that
  binds RFC §3.6 auth-scope tags per navigation. Three zones via
  route.meta.public + route.path matching:
    - portal token zone (meta.public + meta.context=portal) → actor_scope=
      portal, no user_id (RFC §3.6 explicit)
    - /platform/* with super_admin → actor_scope=platform, no org tag
    - default authenticated → actor_scope=organisation when an active
      organisation is selected (useOrganisationStore.activeOrganisationId),
      otherwise actor_scope=user
    - unauthenticated public pages → actor_scope=anonymous
  Reads useAuthStore (user, appRoles, isSuperAdmin) and
  useOrganisationStore (activeOrganisationId) — corrected vs. RFC's
  speculative auth-store API.
- src/observability/index.ts: barrel.
- src/main.ts: initSentry runs before registerPlugins so Sentry's Vue
  errorHandler hooks before any plugin or component initialises;
  installContextBinding runs after registerPlugins so pinia is up.
- env.d.ts: VITE_SENTRY_DSN_FRONTEND + VITE_SENTRY_RELEASE typed.
- .env.example: new file (didn't exist before) documenting all SPA env
  vars including the new Sentry pair.
- vite.config.ts: build.sourcemap=true (RFC §3.5 — generated, uploaded
  to GlitchTip by deploy.sh, then stripped before nginx serves dist/).

Typecheck: green. Build: green, *.map files emitted alongside *.js
chunks as expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WS-7 PR-3 commit 2.

- scrubber.spec.ts (18 cases): mirrors backend PiiScrubbingTest semantics.
  Body/header/query scrubbing, form_values wholesale replacement, all
  SENSITIVE_BODY_KEYS at top + nested levels, max_depth guard, cookies +
  storage + user.cookies sanitisation.
- contextBinding.spec.ts (11 cases): exercises the Vue Router beforeEach
  guard against a real router with mocked Sentry scope (capturing every
  setTag/setUser call into a per-test buffer). Cases:
    - portal-token zone — actor_scope=portal, no user_id
    - platform route + super_admin — actor_scope=platform
    - platform route without super_admin — does NOT tag platform
    - organizer route with active org — actor_scope=organisation +
      organisation_id
    - organizer route without active org — actor_scope=user, no org tag
    - unauthenticated public — actor_scope=anonymous
    - actor_type role hierarchy
    - RFC §3.8 ULID-only user identity (no email leakage)
    - route_name + app=app baseline tags
    - cross-zone leak guard: navigating from organizer to portal-token
      calls scope.clear() and does not bind user

Frontend test count 223 to 252. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WS-7 PR-3 commit 3, RFC §3.5.

- deploy.sh: export VITE_SENTRY_RELEASE=crewli-app@<short-sha> before
  the Vite build so the release identifier is inlined into the bundle
  via import.meta.env.
- New step 4a after the build: when SENTRY_AUTH_TOKEN and
  VITE_SENTRY_DSN_FRONTEND are present, upload sourcemaps via
  `npx @sentry/cli@latest sourcemaps upload` to project crewli-app
  with --url-prefix=~/assets/ matching Vite's default asset path.
  Soft-fails with a warning so deploy can still succeed if GlitchTip
  is unreachable.
- Always run `find apps/app/dist -name '*.map' -delete` after upload
  (or after skipped upload). No public-mapped sources reach nginx —
  RFC §3.5 invariant.
- .gitignore: defensive `apps/app/dist/**/*.map` exclusion (dist/ is
  already broadly ignored; this is belt-and-suspenders against
  accidental commits of build output).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WS-7 PR-3 commit 4. RFC §6 acceptance criteria 4, 5, 6 now satisfied
by the frontend SDK PR; entries marked  with brief implementation
references.

Updated criterion 4 to reference Crewli's actual token-based portal
paths (/portal/advance/:token, /register/:public_token) instead of the
RFC's speculative /p/* — the contextBinding guard detects via
route.meta.public + route.meta.context which is the canonical Crewli
signal already used by other guards.

Added a "Voortgang (mei 2026)" subsection at the end of §6 mapping
each PR to the acceptance criteria it closed, plus what remains for
PR-4 (live smoke, ARCH-OBSERVABILITY.md, alerting config, retention
config, SECURITY_AUDIT.md update).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-3 follow-up. Live smoke surfaced that the @sentry/vue SDK was
running correctly and emitting events, but Crewli's strict
connect-src directive blocked every POST at the browser layer. No
fallback — events evaporated silently with a CSP-violation log in
DevTools console only.

Updated locations (audited the CSP surface; only two locations actually
need the whitelist):

- apps/app/index.html — dev meta CSP, adds http://localhost:8200 to
  connect-src so local dev hits the docker-compose GlitchTip stack.
- deploy/nginx/csp-spa.conf — prod organizer SPA CSP, adds
  https://monitoring.hausdesign.nl to BOTH the report-only and enforce
  add_header lines so a future flip between modes can't silently break
  observability.

NOT updated (deviation from prompt):

- api/config/security.php — the API CSP is `default-src 'none';
  frame-ancestors 'none'` for JSON responses. Browsers don't enforce
  connect-src on JSON contexts (no document, no fetch origin). Adding
  connect-src would be semantically a no-op and confuse the deny-by-
  default policy.

Regression guard: tests/Feature/Security/CspConnectsToObservabilityTest.
Reads both the dev meta tag and the prod nginx conf directly (the SPA's
CSP is not Laravel-served, so $this->get() can't reach it). Apply-with-
revert verified: stashing both fixes makes both cases fail with a clear
"Refused to connect because it violates the following CSP directive"
hint; popping the stash restores green.

SECURITY_AUDIT.md A13-9 updated with a WS-7 follow-up note documenting
the GlitchTip whitelist as an explicit security control: outgoing
observability traffic restricted to a single known host.

Test count 1549 to 1551. Larastan + Pint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the WS-6 skeleton with a full post-implementation reference
for the observability stack. Eleven sections covering scope, component
overview, tag taxonomy (replacing RFC §3.6 as source-of-truth), tag
binding architecture, scrubbing semantics, runtime context split, CSP
whitelist, sourcemap upload, GDPR + privacy, maintenance + extension
guidance, plus cross-references.

Form Builder exception classification from the old skeleton §3 is
preserved in §5.4 — concrete answer for which Crewli exception
classes do or do not go to GlitchTip.

Lengte: 730 regels markdown. Closes WS-7 acceptance criterion 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-4 commit 2. Both runbooks live under dev-docs/runbooks/ as the first
entries in that directory.

- observability-triage.md (270 lines): incoming-issue procedure. Tags
  inspectie (actor_scope, release, actor_type, organisation_id,
  impersonation), triage classes (P0–P3), reproductie via request_id
  correlation naar laravel.log, common patterns (validation leakage,
  runaway errors, multi-tenant invariant violations, CSP black-silence),
  resolution + audit trail.
- observability-erasure.md (293 lines): GDPR Art. 17 procedure.
  Trigger voorwaarden (upstream eerst), pre-checks, handmatige
  psql-procedure met counts vóór delete, post-checks, automation
  BACKLOG verwijzing, edge cases (no-events-in-window,
  impersonation-target, queued events, mass-erasure batch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR-4 commit 3 — closure-bookkeeping nu de implementation-PRs en de
twee runbooks gemerged zijn.

- RFC-WS-7-OBSERVABILITY.md: nieuwe §9 Implementation status (mei 2026)
  vat samen welke acceptance criteria via PR-1..PR-4 zijn voldaan en
  welke (1, 2, 7, 9, 10) op Bert's deploy-checklist resteren. Pointer
  naar ARCH-OBSERVABILITY.md als levende reference; de RFC blijft
  historisch document.
- SECURITY_AUDIT.md: nieuwe sectie 'WS-7 Observability — finale audit
  (mei 2026)' tussen A13-10 en Positive Findings. Bevat (1) acceptance
  criteria checklist met status per criterium, (2) processing register
  entry voor GlitchTip (controller-not-processor, retention 90 dagen,
  TLS+full-disk-encryption+2FA), (3) zeven security controls die WS-7
  introduceert (PII scrubbing, CSP whitelist, sourcemap upload-only,
  listener registration discipline, runtime portal-context-split,
  multi-tenant tag invariant, impersonation.active binary signal),
  (4) pointer naar runbooks/observability-erasure.md voor Art. 17.
- BACKLOG.md: status-overzicht-tabel boven de OBS-entries. Toegevoegd
  als entry: OBS-2 (early-pipeline log context,  Resolved), OBS-3
  (sentry-context middleware coverage,  Resolved — opgevouwen in
  AuthScopeContextListener), OBS-5 (Crewli render handlers report()
  invariant,  Resolved via 48f2a00 + ExceptionReportingTest), en
  OBS-9 (Active — staging environment GlitchTip CSP whitelist follow-up
  bij staging-introductie). Bestaande OBS-1, 4, 6, 7 ongewijzigd
  (Active); OBS-8 staat al op Resolved sinds dee1401.
- .claude-sync.conf: drie nieuwe doc-paths toegevoegd
  (ARCH-OBSERVABILITY.md, runbooks/observability-triage.md,
  runbooks/observability-erasure.md). Post-commit sync-claude-docs
  hook regenereert SYNC_MANIFEST.md met deze entries.

Closes WS-7 documentation acceptance criteria 8 (ARCH) en 14
(SECURITY_AUDIT). Resterende criteria (1, 2, 7, 9, 10) zijn
deploy-checklist door Bert.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Acceptance criteria 1-14 voldaan; observability volledig operationeel
op monitoring.hausdesign.nl. Implementation criteria 3, 4, 5, 6, 8,
11, 12, 13, 14 via 4 PRs op feat/ws-7-observability; operationele
criteria 1, 2, 7, 9, 10 via deploy-checklist.

Hernoem 'Observability follow-ups (post WS-7)' sectie-header naar
'(post WS-7 closure)' voor accuratesse na PR-3 + PR-4. Closure-entry
geplaatst onderaan 'Opgeloste items (mei 2026)' om chronologische
volgorde (oldest-first) te respecteren — WS-7 op 2026-05-07 volgt
WS-3 PR-C op 2026-05-06 die volgt op WS-TOOLING-001 op 2026-05-05.

Refs: dev-docs/ARCH-OBSERVABILITY.md, dev-docs/runbooks/observability-{triage,erasure}.md
bert.hausmans merged commit c398772a23 into main 2026-05-07 22:49:31 +02:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: bert.hausmans/crewli#8