docs(testing): add ARCH-TESTING.md — test pyramid, scope per tier, anti-patterns

B5 of TEST-INFRA-001 (RFC-WS-FRONTEND-PRIMEVUE Amendment A-1).

- Add dev-docs/ARCH-TESTING.md (~13 KB):
  §1 Five-tier pyramid (Unit / Component / Integration / Visual / E2E)
     with environment, cost, and purpose per tier
  §2 Decision tree: pick by what is being verified, not by speed
  §3 Mock-vs-real-backend rules + the self-confirming-bias
     anti-pattern that motivated TEST-CONTRACT-001
  §4 Visual baseline workflow including the composite-over-isolated
     strategy used in B3
  §5 CI strategy stub, deferred to TEST-INFRA-002
  §6 Conventions + 5 anti-patterns
  §7 Vuetify-during-PrimeVue-migration: explicit doc that the Vuetify
     plugin in playwright/index.ts is INTENTIONAL TEMPORARY STATE
     replaced in F3 by PrimeVue. Forbids the "abstract the UI
     framework provider" deferred-cost trap.
  §8 Host setup: Node, pnpm, Chromium, Git LFS, MySQL 8, PHP, .env;
     known risks (unpkg.com flakiness, shared crewli_test DB)
  §9 Deferred work cross-references to BACKLOG entries
- Update CLAUDE.md ### Testing section to reference ARCH-TESTING.md
- Add ARCH-TESTING.md to .claude-sync.conf so the dev-docs sync
  pipeline picks it up; sync script run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 15:29:18 +02:00
parent 2dfb1e8bae
commit 7e21c6a633
3 changed files with 346 additions and 0 deletions
--- a/dev-docs/ARCH-TESTING.md
+++ b/dev-docs/ARCH-TESTING.md
@@ -0,0 +1,341 @@
+# Crewli — Test Architecture
+
+> Authoritative reference for test-tier choices in the SPA. Read this
+> before adding a new test. Linked from `CLAUDE.md`.
+
+This document describes:
+
+1. The test pyramid Crewli uses, and what each tier is for
+2. When to use which tier (decision tree)
+3. Mock-vs-real-backend rules
+4. Visual baseline workflow
+5. CI integration status
+6. Conventions and anti-patterns
+7. Vuetify-during-PrimeVue-migration: the temporary state in test infra
+8. Host setup requirements
+9. Deferred work (BACKLOG references)
+
+---
+
+## 1. Test pyramid and scope per layer
+
+Crewli runs five test tiers in the SPA. Each has a narrow purpose;
+overlap is wasted work, gaps are silent risk. Pick the tier whose
+purpose matches what you're actually verifying.
+
+### Tier 1 — Unit (Vitest + happy-dom)
+
+**Run via:** `pnpm test` (filtered by `tests/unit/**`)
+**Environment:** Node + happy-dom, single module graph
+**Cost:** ~20 ms per test
+**For:** Pure logic, schema parsing, store reducers, isolated composable
+behaviour. No DOM assertions. Fastest tier; safe for a pre-commit hook
+if we ever add one.
+
+### Tier 2 — Component (Playwright Component Testing)
+
+**Run via:** `pnpm test:component`
+**Environment:** Real Chromium via `@playwright/experimental-ct-vue`
+**Cost:** ~300 ms per test (incl. Chromium reuse)
+**For:** Single-component verification. DOM rendering, click/keyboard,
+prop propagation, slot rendering, CSS resolution. Mocks API at axios
+layer. Provider stack (Vuetify [TEMP], Pinia, TanStack Query, Router) is
+wired in `apps/app/playwright/index.ts`'s `beforeMount` hook.
+
+### Tier 3 — Integration (Playwright CT, multi-component)
+
+**Run via:** Same `pnpm test:component` runner; placement convention
+distinguishes integration from single-component.
+**Cost:** ~500 ms per test
+**For:** Page-level mounting with mocked API responses. Tests
+cross-component coordination (drag from Wachtrij → canvas, popover
+→ mutation flow). Same provider stack as Tier 2.
+
+### Tier 4 — Visual regression (Playwright CT, `@visual` tag)
+
+**Run via:** `pnpm test:visual` (verify), `pnpm test:visual:update`
+(regenerate baselines)
+**Environment:** Real Chromium driving the canonical prototype HTML
+served by a tiny static-server fixture
+(`tests/playwright-ct/visual/static-server.mjs`).
+**Cost:** ~1.2 s per test
+**For:** Pixel baselines against canonical visual sources. The
+prototype HTML at
+`resources/Crewli - Artist  Timetable Management/crewli-timetable.html`
+is the source of truth for Artist Management surfaces. F4 (component
+migration) extends visual coverage to live SPA components against the
+prototype.
+
+### Tier 5 — E2E (Playwright)
+
+**Run via:** `pnpm test:e2e`
+**Environment:** Real Laravel test server
+(`php artisan serve --port=8001`, DB `crewli_test`) + real Chromium
+browser context.
+**Cost:** ~5 s for the suite (includes migrate:fresh + seed)
+**For:** Contract verification end-to-end. Real network, real auth,
+real DB transactions. Currently only the 409-conflict
+optimistic-locking contract test (TEST-CONTRACT-001). Add tests
+sparingly: this is the most expensive tier.
+
+---
+
+## 2. When to use what — decision tree
+
+```
+Is the thing under test pure logic with no DOM?
+  └─ YES → Unit (Vitest + happy-dom)
+
+Is it a single component? (props, events, slots, CSS, keyboard)
+  └─ YES → Component (Playwright CT)
+
+Is it cross-component coordination, but no real backend?
+  └─ YES → Integration (Playwright CT)
+
+Is it a contract between SPA and backend (request/response shape)?
+  └─ YES → E2E (Playwright + Laravel)
+
+Is it visual fidelity to a canonical baseline?
+  └─ YES → Visual (Playwright CT, @visual tag)
+```
+
+**Don't pick by speed.** Pick by what you're verifying. A unit test
+that mocks the backend cannot catch a contract-drift bug; an e2e test
+for pure logic is wasted CI time.
+
+---
+
+## 3. Mock-vs-real-backend choice rules
+
+### Mock when
+
+- The test verifies SPA behaviour given a known response shape
+- Depending on a live backend would push the test past the relevant
+  tier's cost budget
+- The path under test is independent of transactional / auth
+  semantics
+
+### Real backend when
+
+- The test verifies the contract between frontend and backend (Zod
+  schema vs. PHP Resource shape)
+- Authentication or authorisation flows are involved
+- Optimistic-locking, idempotency, or other multi-request semantics
+  matter
+
+**Anti-pattern: matching mocks to schemas.** Don't treat a mock built
+from the same shape your Zod schema validates as contract coverage.
+That creates self-confirming bias: both sides agree, but neither is
+checked against what the backend actually sends. This is the exact
+failure mode TEST-CONTRACT-001 was created to catch
+(timetable-stabilization B5).
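+
+A minimal sketch of the bias, using a hand-rolled validator as a
+stand-in for the Zod schema so the snippet has no dependencies. The
+field names (`startsAt`, `starts_at`, `version`) and both payloads are
+hypothetical, not taken from the Crewli API:
+
+```typescript
+// Hypothetical SPA-side shape; stands in for a Zod schema.
+interface Performance {
+  id: number
+  startsAt: string
+  version: number
+}
+
+// Stand-in for schema.safeParse(): true only if the payload has the expected shape.
+function isPerformance(payload: unknown): payload is Performance {
+  if (typeof payload !== 'object' || payload === null) return false
+  const p = payload as Record<string, unknown>
+  return (
+    typeof p.id === 'number' &&
+    typeof p.startsAt === 'string' &&
+    typeof p.version === 'number'
+  )
+}
+
+// A mock written by copying the schema: it can only ever agree with the schema.
+const mockFromSchema: unknown = { id: 1, startsAt: '2026-05-10T20:00:00Z', version: 3 }
+
+// What a backend might really send (assumed drift: snake_case key).
+const realBackendPayload: unknown = { id: 1, starts_at: '2026-05-10T20:00:00Z', version: 3 }
+
+const mockPasses = isPerformance(mockFromSchema)
+const realPasses = isPerformance(realBackendPayload)
+
+console.log({ mockPasses, realPasses })
+```
+
+The mocked test stays green while the real payload is rejected, which
+is why this class of drift only shows up against a real backend.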
+
+---
+
+## 4. Visual baseline workflow
+
+### Capturing baselines
+
+```bash
+pnpm test:visual:update
+```
+
+Review the regenerated PNGs in the PR diff. Baselines live at:
+```
+apps/app/tests/playwright-ct/__screenshots__/visual/<spec-path>/<name>.png
+```
+Tracked via Git LFS (see `.gitattributes`). Pixel tolerance:
+`maxDiffPixelRatio: 0.001` (0.1%) per `playwright-ct.config.ts`.
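+
+To make the tolerance concrete, here is a worked calculation of roughly
+how many pixels may differ before a comparison fails. It assumes a
+full-viewport screenshot at the 1440×900 desktop size used for v1;
+cropped baselines (a single stage row, for instance) have a
+proportionally smaller budget. The helper name is illustrative and not
+part of the repo:
+
+```typescript
+// Illustrative helper: approximate number of pixels that may differ
+// under a given maxDiffPixelRatio.
+function allowedDiffPixels(width: number, height: number, maxDiffPixelRatio: number): number {
+  return Math.floor(width * height * maxDiffPixelRatio)
+}
+
+// 1440 * 900 = 1,296,000 pixels; 0.1% of that is about 1,296 pixels.
+const budget = allowedDiffPixels(1440, 900, 0.001)
+console.log(budget)
+```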
+
+### Updating baselines (intentional UX change)
+
+1. Make the UX change (component edit, token edit, …)
+2. Run `pnpm test:visual:update` locally
+3. Review the diff PNG manually — does the new baseline match the
+   intended UX?
+4. Commit baseline + UX change in the **same PR**. Reviewer can
+   compare baseline change against the UX intent.
+5. Never update baselines to "make tests pass" without a UX-justified
+   reason in the PR description.
+
+### Unintentional diff (local run or, once it exists, CI)
+
+1. Determine if the diff is environmental (font hinting, OS rendering,
+   timezone-based date formatting) or a real regression.
+2. Environmental → consider tightening determinism (lock fonts, fake
+   timers, fixed locale) before tweaking tolerance.
+3. Real regression → fix the regression, not the baseline.
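+
+A sketch of what tightening determinism could look like in the CT
+config. `locale` and `timezoneId` are standard Playwright `use`
+options; the specific values below are assumptions, not the repo's
+actual settings:
+
+```typescript
+// playwright-ct.config.ts (fragment; values are assumptions)
+import { defineConfig } from '@playwright/experimental-ct-vue'
+
+export default defineConfig({
+  use: {
+    // Pin locale and timezone so date/number formatting does not depend on the host.
+    locale: 'nl-NL',
+    timezoneId: 'Europe/Amsterdam',
+  },
+  expect: {
+    // Keep the documented default; fix the environment before loosening this.
+    toHaveScreenshot: { maxDiffPixelRatio: 0.001 },
+  },
+})
+```
+
+Fake timers and font pinning need their own setup (for example
+Playwright's `page.clock` API for time) and are not shown here.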
+
+### Composite-over-isolated strategy (B3 baselines)
+
+Some surfaces enumerated in RFC §A.3's baseline list are captured as
+composite views rather than individual block-state baselines. Reason:
+the prototype's DOM exposes status only via inline `style.background`,
+with no `data-*` attributes. Isolated locators would therefore have to
+key on data values (e.g. artist name), which locks the test to
+specific seed data and silently rots if the data changes.
+
+The current 5 baselines cover the visual vocabulary:
+
+| File                          | Captures                                                |
+| ----------------------------- | ------------------------------------------------------- |
+| `canvas-friday.png`           | Status colors, b2b indicators, multi-lane stacking      |
+| `canvas-saturday.png`         | Conflict ring, capacity warning                         |
+| `stage-row-multilane.png`     | First row in isolation                                  |
+| `wachtrij-populated.png`      | Sidebar list rendering, status badges, counts           |
+| `popover.png`                 | Block-click popover layout                              |
+
+9 additional surfaces are documented as `test.skip()` in
+`tests/playwright-ct/visual/prototype.spec.ts` with the gap reason.
+F4 component migration adds isolated baselines using stable
+`data-test-id` attributes on Vue components.
+
+---
+
+## 5. CI integration
+
+**Status: deferred.** The repo currently has no CI runner configured.
+Local development workflow:
+
+- Vitest (`pnpm test`) — tier 1, runs on demand
+- Playwright Component (`pnpm test:component`) — tiers 2–4, runs on
+  demand
+- Playwright E2E (`pnpm test:e2e`) — tier 5, runs on demand against a
+  developer-managed Laravel test server
+
+CI design (Gitea Actions vs. GitHub Actions decision, Linux runner
+image with PHP+MySQL+Node+pnpm, screenshot-diff artifact upload,
+label-gated nightly e2e) is captured as `TEST-INFRA-002` in
+`dev-docs/BACKLOG.md`.
+
+When CI lands:
+
+- Pre-commit (lefthook): Vitest unit only. Fast, no Playwright launch.
+- PR-CI: Vitest unit + Playwright component + visual. Slower but full
+  coverage.
+- Nightly / label-gated: Playwright e2e against real Laravel + MySQL.
+  Most expensive tier.
+
+---
+
+## 6. Conventions
+
+- **Test file naming:** `*.spec.ts` for Playwright (CT + e2e),
+  `*.test.ts` for Vitest. The runner config glob keeps them apart.
+- **`@visual` tag:** required on all visual-regression tests so
+  `--grep @visual` filters them.
+- **Provider stack for CT:** wired in `apps/app/playwright/index.ts`'s
+  `beforeMount` hook, not at mount call time. Tests forward
+  per-test overrides via `hooksConfig` (see
+  `tests/playwright-ct/utils/mountWithProviders.ts`).
+- **E2E test isolation:** `globalSetup` runs `migrate:fresh + seed`
+  once per `pnpm test:e2e` invocation. Tests within one run share DB
+  state. Re-run = fresh DB.
+- **Pixel tolerance:** `maxDiffPixelRatio: 0.001` default
+  (`playwright-ct.config.ts`). Per-test exceptions allowed if
+  documented inline.
+- **Auth in e2e tests:** Bearer-via-cookie (`api/.../SetAuthCookie.php`).
+  POST `/api/v1/auth/login` returns `crewli_app_token` httpOnly cookie.
+  No CSRF dance, no Sanctum stateful flow. baseURL must be
+  `localhost:8001` (matching the cookie's `domain=localhost`),
+  **not** `127.0.0.1:8001`.
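+
+Why the hostname matters for the auth cookie: a cookie set with
+`domain=localhost` is only sent to hosts that domain-match `localhost`,
+and an IP literal never matches a different name. A simplified sketch
+of the RFC 6265 domain-match rule follows; the function name is
+illustrative, and real browsers apply further checks:
+
+```typescript
+// Simplified RFC 6265 domain-match: exact match, or a dot-separated suffix
+// match when the host is a name rather than an IPv4 address.
+function domainMatches(host: string, cookieDomain: string): boolean {
+  const h = host.toLowerCase()
+  const d = cookieDomain.toLowerCase().replace(/^\./, '')
+  if (h === d) return true
+  const isIpv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(h)
+  return !isIpv4 && h.endsWith('.' + d)
+}
+
+const viaLocalhost = domainMatches('localhost', 'localhost')
+const viaLoopbackIp = domainMatches('127.0.0.1', 'localhost')
+
+console.log({ viaLocalhost, viaLoopbackIp })
+```
+
+With `127.0.0.1` as the host the cookie is never attached, so every
+authenticated request in the e2e run would fail.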
+
+### Anti-patterns to avoid
+
+1. **Mocking the same data shape that the schema validates** —
+   creates self-confirming bias. Use real backend for contract tests
+   (TEST-CONTRACT-001 catches this class of bug).
+2. **Updating baselines silently** without diff review or a
+   UX-justified PR description.
+3. **Adding Playwright tests for pure logic** that Vitest can cover
+   in 20 ms. Reserve Playwright for tests that need the browser.
+4. **Treating "small" UX changes as not needing visual updates** —
+   there is no small visual change in an enterprise product; the
+   user notices.
+5. **Brittle locators** by data values (artist names, stage names)
+   instead of stable test IDs. F4 will add `data-test-id` to Vue
+   components for this reason.
+
+---
+
+## 7. Vuetify in test infrastructure during the PrimeVue migration
+
+`apps/app/playwright/index.ts`'s `beforeMount` hook registers Vuetify
+as a Vue plugin. This is **intentional temporary state**.
+
+### Why
+
+The current SPA still ships Vuetify. Component-level Playwright CT
+tests must mount components against the same UI framework the live
+app uses, otherwise they would test a non-existent surface. Stripping
+Vuetify from test infra now would make CT tests un-runnable until
+F3 lands PrimeVue.
+
+### When it ends
+
+F3 (PrimeVue foundation, RFC-WS-FRONTEND-PRIMEVUE §6) replaces the
+Vuetify plugin line in `playwright/index.ts` with PrimeVue and
+updates `tests/playwright-ct/components/sanity-vuetify.spec.ts` to
+its PrimeVue equivalent. Estimated effort: ~2 hours (mechanical
+swap, no architecture change).
+
+### Why not abstract
+
+The instinct of "abstract the UI framework provider so we can swap
+without touching test code" is a **deferred-cost trap** here:
+
+1. We are NOT retaining Vuetify post-F3. The abstraction would itself
+   need to be removed in F4 alongside the framework swap.
+2. The swap is mechanical (~2 hours). An abstraction layer would take
+   longer to design well than the swap itself takes.
+3. Reviewers seeing "Vuetify in test infra in a PrimeVue migration
+   sprint" should read this section + the JSDoc on
+   `mountWithProviders.ts` for context.
+
+The forbidden pattern: do not propose "let's make a `UIFrameworkPlugin`
+interface and dependency-inject the provider per test" during F2/F3.
+That's exactly the abstraction this section forbids.
+
+---
+
+## 8. Host setup requirements
+
+For Playwright tests to run, the host must have:
+
+- **Node v22+** with **pnpm 10+** (matching `apps/app/`'s expectations)
+- **Chromium** installed via `pnpm exec playwright install chromium`
+  (downloads to `~/Library/Caches/ms-playwright` on macOS)
+- **Git LFS** installed (`brew install git-lfs` on macOS) and active
+  (`git lfs install --skip-repo` to avoid hook conflict with lefthook;
+  the LFS pre-push step is delegated through `lefthook.yml`)
+- **MySQL 8** running locally via `make services` for e2e tests, with
+  the `crewli_test` database created via `make test-db-create`
+- **PHP 8.2+ + composer** for the Laravel test server in e2e tests
+- **`api/.env`** present with valid `APP_KEY` (e2e `globalSetup`
+  inherits this; only `DB_DATABASE` is overridden to `crewli_test` on
+  the command line)
+
+### Known risks
+
+- **`unpkg.com` dependency** — the prototype HTML loads React + Babel
+  from unpkg.com via `<script src="https://unpkg.com/...">`. Local
+  network outage or unpkg CDN issues will flake B3 baselines. Mitigation
+  if it bites: vendor `react.umd.js` + `babel.min.js` into the
+  prototype directory. Defer until it actually breaks.
+- **Test DB shared with PHPUnit** — `crewli_test` is used by both the
+  PHPUnit suite (transaction-rollback per test) and the e2e fixture
+  (migrate:fresh + seed once). Running them concurrently would
+  collide. Lifecycle assumes serial execution, which is the realistic
+  local-dev flow.
+
+---
+
+## 9. Deferred to BACKLOG
+
+- **TEST-INFRA-002** — CI runner selection (Gitea Actions vs. GitHub
+  Actions decision), runner image with PHP+MySQL+Node+pnpm, caching
+  strategy, screenshot-diff artifact upload, label-gated nightly e2e.
+- F4 isolated component-level visual baselines (replacing the
+  composite baselines in B3 with per-state baselines using stable
+  `data-test-id` attributes).
+- F4 multi-context concurrent-edit e2e tests (currently the 409
+  contract test uses single-context replay).
+- Multi-browser (Firefox, WebKit) baselines — Linux+Chromium only
+  for v1 per RFC §A.5.
+- Mobile viewport baselines — desktop 1440×900 only for v1.
+- Soketi / WebSocket testing infrastructure when ART-15 lands.