docs(testing): add ARCH-TESTING.md — test pyramid, scope per tier, anti-patterns

B5 of TEST-INFRA-001 (RFC-WS-FRONTEND-PRIMEVUE Amendment A-1).

- Add dev-docs/ARCH-TESTING.md (~13 KB):
  §1 Five-tier pyramid (Unit / Component / Integration / Visual /
     E2E) with environment, cost, and purpose per tier
  §2 Decision tree — pick by what is being verified, not by speed
  §3 Mock-vs-real-backend rules + the self-confirming-bias anti-
     pattern that motivated TEST-CONTRACT-001
  §4 Visual baseline workflow including the composite-over-isolated
     strategy used in B3
  §5 CI strategy stub — deferred to TEST-INFRA-002
  §6 Conventions + 5 anti-patterns
  §7 Vuetify-during-PrimeVue-migration: explicit doc that the
     Vuetify plugin in playwright/index.ts is INTENTIONAL TEMPORARY
     STATE replaced in F3 by PrimeVue. Forbids the "abstract the UI
     framework provider" deferred-cost trap.
  §8 Host setup — Node, pnpm, Chromium, Git LFS, MySQL 8, PHP, .env;
     known risks (unpkg.com flakiness, shared crewli_test DB)
  §9 Deferred work cross-references to BACKLOG entries
- Update CLAUDE.md ### Testing section to reference ARCH-TESTING.md
- Add ARCH-TESTING.md to .claude-sync.conf so the dev-docs sync
  pipeline picks it up; sync script run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 15:29:18 +02:00
parent 2dfb1e8bae
commit 7e21c6a633
3 changed files with 346 additions and 0 deletions

dev-docs/ARCH-TESTING.md Normal file

@@ -0,0 +1,341 @@
# Crewli — Test Architecture
> Authoritative reference for test-tier choices in the SPA. Read this
> before adding a new test. Linked from `CLAUDE.md`.
This document describes:
1. The test pyramid Crewli uses, and what each tier is for
2. When to use which tier (decision tree)
3. Mock-vs-real-backend rules
4. Visual baseline workflow
5. CI integration status
6. Conventions and anti-patterns
7. Vuetify-during-PrimeVue-migration: the temporary state in test infra
8. Host setup requirements
9. Deferred work (BACKLOG references)
---
## 1. Test pyramid and scope per layer
Crewli runs five test tiers in the SPA. Each has a narrow purpose;
overlap is wasted work, gaps are silent risk. Pick the tier whose
purpose matches what you're actually verifying.
### Tier 1 — Unit (Vitest + happy-dom)
**Run via:** `pnpm test` (filtered by `tests/unit/**`)
**Environment:** Node + happy-dom, single module graph
**Cost:** ~20 ms per test
**For:** Pure logic, schema parsing, store reducers, isolated composable
behaviour. No DOM. Fastest tier; safe for pre-commit if we ever add it.
### Tier 2 — Component (Playwright Component Testing)
**Run via:** `pnpm test:component`
**Environment:** Real Chromium via `@playwright/experimental-ct-vue`
**Cost:** ~300 ms per test (incl. Chromium reuse)
**For:** Single-component verification. DOM rendering, click/keyboard,
prop propagation, slot rendering, CSS resolution. Mocks API at axios
layer. Provider stack (Vuetify [TEMP], Pinia, TanStack Query, Router) is
wired in `apps/app/playwright/index.ts`'s `beforeMount` hook.
### Tier 3 — Integration (Playwright CT, multi-component)
**Run via:** Same `pnpm test:component` runner; placement convention
distinguishes integration from single-component.
**Cost:** ~500 ms per test
**For:** Page-level mounting with mocked API responses. Tests
cross-component coordination (drag from Wachtrij → canvas, popover
→ mutation flow). Same provider stack as Tier 2.
### Tier 4 — Visual regression (Playwright CT, `@visual` tag)
**Run via:** `pnpm test:visual` (verify), `pnpm test:visual:update`
(regenerate baselines)
**Environment:** Real Chromium driving the canonical prototype HTML
served by a tiny static-server fixture
(`tests/playwright-ct/visual/static-server.mjs`).
**Cost:** ~1.2 s per test
**For:** Pixel baselines against canonical visual sources. The
prototype HTML at
`resources/Crewli - Artist Timetable Management/crewli-timetable.html`
is the source of truth for Artist Management surfaces. F4 (component
migration) extends visual coverage to live SPA components against the
prototype.
### Tier 5 — E2E (Playwright)
**Run via:** `pnpm test:e2e`
**Environment:** Real Laravel test server
(`php artisan serve --port=8001`, DB `crewli_test`) + real Chromium
browser context.
**Cost:** ~5 s for the suite (includes migrate:fresh + seed)
**For:** Contract verification end-to-end. Real network, real auth,
real DB transactions. Currently only the 409-conflict optimistic-
locking contract test (TEST-CONTRACT-001). Add tests sparingly — this
is the most expensive tier.
---
## 2. When to use what — decision tree
```
Is the thing under test pure logic with no DOM?
└─ YES → Unit (Vitest + happy-dom)
Is it a single component? (props, events, slots, CSS, keyboard)
└─ YES → Component (Playwright CT)
Is it cross-component coordination, but no real backend?
└─ YES → Integration (Playwright CT)
Is it a contract between SPA and backend (request/response shape)?
└─ YES → E2E (Playwright + Laravel)
Is it visual fidelity to a canonical baseline?
└─ YES → Visual (Playwright CT, @visual tag)
```
**Don't pick by speed.** Pick by what you're verifying. A unit test
that mocks the backend cannot catch a contract-drift bug; an e2e test
for pure logic is wasted CI time.
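
As a rough illustration, the tree above can be written as a first-match function. This is a minimal sketch, not code from the repo: `Tier`, `TestSubject`, and `pickTier` are hypothetical names. Note that no cost or speed input exists; only the thing being verified drives the result.

```typescript
// Hypothetical encoding of the decision tree; all names are illustrative.
type Tier = "unit" | "component" | "integration" | "e2e" | "visual";

interface TestSubject {
  pureLogicNoDom?: boolean;          // pure logic with no DOM
  singleComponent?: boolean;         // props, events, slots, CSS, keyboard
  crossComponentNoBackend?: boolean; // coordination, mocked API
  spaBackendContract?: boolean;      // request/response shape
  visualFidelity?: boolean;          // pixel match against a canonical baseline
}

// Questions are asked in the same order as the tree; the first YES wins.
function pickTier(subject: TestSubject): Tier | undefined {
  if (subject.pureLogicNoDom) return "unit";
  if (subject.singleComponent) return "component";
  if (subject.crossComponentNoBackend) return "integration";
  if (subject.spaBackendContract) return "e2e";
  if (subject.visualFidelity) return "visual";
  return undefined; // no question matched: restate what is being verified
}
```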
---
## 3. Mock-vs-real-backend choice rules
### Mock when
- The test verifies SPA behaviour given a known response shape
- Backend availability would slow the test below the relevant tier's
cost budget
- The path under test is independent of transactional / auth
semantics
### Real backend when
- The test verifies the contract between frontend and backend (Zod
schema vs. PHP Resource shape)
- Authentication or authorisation flows are involved
- Optimistic-locking, idempotency, or other multi-request semantics
matter
**Anti-pattern: matching mocks to schemas.** Don't mock with the same
shape your Zod schema validates — that creates self-confirming bias
where both sides agree but neither matches reality. This is the
exact failure mode TEST-CONTRACT-001 was created to catch (timetable-
stabilization B5).
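
A minimal sketch of how the bias arises, under stated assumptions: a hand-written guard stands in for the Zod schema so the example has no dependencies, and the field names (`startsAt`, `starts_at`, `version`) are invented for illustration rather than taken from the Crewli API.

```typescript
// Stand-in for a Zod schema: a hand-written guard (assumption, not repo code).
interface PerformanceBlock {
  id: number;
  startsAt: string;
  version: number;
}

function isPerformanceBlock(value: unknown): value is PerformanceBlock {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "number" &&
    typeof v.startsAt === "string" &&
    typeof v.version === "number"
  );
}

// Mock written by copying the schema's shape: it passes by construction.
const mockedResponse: unknown = { id: 1, startsAt: "2026-05-10T20:00:00Z", version: 3 };

// Hypothetical payload a backend could send instead (snake_case key, string version).
const backendLikeResponse: unknown = { id: 1, starts_at: "2026-05-10T20:00:00Z", version: "3" };

const mockPasses = isPerformanceBlock(mockedResponse);         // true
const backendPasses = isPerformanceBlock(backendLikeResponse); // false
```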
---
## 4. Visual baseline workflow
### Capturing baselines
```bash
pnpm test:visual:update
```
Review the resulting PNG diffs in the PR. Baselines live at:
```
apps/app/tests/playwright-ct/__screenshots__/visual/<spec-path>/<name>.png
```
Tracked via Git LFS (see `.gitattributes`). Pixel tolerance:
`maxDiffPixelRatio: 0.001` (0.1%) per `playwright-ct.config.ts`.
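
To get a feel for the tolerance, the arithmetic can be sketched as below. It assumes a full-viewport screenshot at the 1440×900 desktop size used for v1; element-scoped screenshots are smaller and get a proportionally smaller budget. `maxDifferingPixels` is a hypothetical helper, and the rounding is an assumption about the arithmetic, not a statement about Playwright internals.

```typescript
// Hypothetical helper: how many pixels may differ before the ratio is exceeded.
function maxDifferingPixels(width: number, height: number, maxDiffPixelRatio: number): number {
  return Math.floor(width * height * maxDiffPixelRatio);
}

// 1440 * 900 = 1,296,000 pixels; 0.1% of that is 1,296 pixels.
const budget = maxDifferingPixels(1440, 900, 0.001);
```

A diff of that size is roughly a 36×36 px square, so small icon or badge regressions can still fit inside the default tolerance.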
### Updating baselines (intentional UX change)
1. Make the UX change (component edit, token edit, …)
2. Run `pnpm test:visual:update` locally
3. Review the diff PNG manually — does the new baseline match the
intended UX?
4. Commit baseline + UX change in the **same PR**. Reviewer can
compare baseline change against the UX intent.
5. Never update baselines to "make tests pass" without a UX-justified
reason in the PR description.
### Updating baselines (unintentional diff in CI)
1. Determine if the diff is environmental (font hinting, OS rendering,
timezone-based date formatting) or a real regression.
2. Environmental → consider tightening determinism (lock fonts, fake
timers, fixed locale) before tweaking tolerance.
3. Real regression → fix the regression, not the baseline.
### Composite-over-isolated strategy (B3 baselines)
Some surfaces enumerated in RFC §A.3's baseline list are captured as
composite views rather than individual block-state baselines. Reason:
the prototype's DOM exposes status only via inline `style.background`,
no `data-*` attributes. Isolated locators (e.g. by artist name) lock
the test to specific seed data and silently rot if data changes.
The current 5 baselines cover the visual vocabulary:
| File | Captures |
| ----------------------------- | ------------------------------------------------------- |
| `canvas-friday.png` | Status colors, b2b indicators, multi-lane stacking |
| `canvas-saturday.png` | Conflict ring, capacity warning |
| `stage-row-multilane.png` | First row in isolation |
| `wachtrij-populated.png` | Sidebar list rendering, status badges, counts |
| `popover.png` | Block-click popover layout |
9 additional surfaces are documented as `test.skip()` in
`tests/playwright-ct/visual/prototype.spec.ts` with the gap reason.
F4 component migration adds isolated baselines using stable
`data-test-id` attributes on Vue components.
---
## 5. CI integration
**Status: deferred.** The repo currently has no CI runner configured.
Local development workflow:
- Vitest (`pnpm test`) — tier 1, runs on demand
- Playwright Component (`pnpm test:component`) — tiers 2-4, runs on
demand
- Playwright E2E (`pnpm test:e2e`) — tier 5, runs on demand against a
developer-managed Laravel test server
CI design (Gitea Actions vs. GitHub Actions decision, Linux runner
image with PHP+MySQL+Node+pnpm, screenshot-diff artifact upload,
label-gated nightly e2e) is captured as `TEST-INFRA-002` in
`dev-docs/BACKLOG.md`.
When CI lands:
- Pre-commit (lefthook): Vitest unit only. Fast, no Playwright launch.
- PR-CI: Vitest unit + Playwright component + visual. Slower but full
coverage.
- Nightly / label-gated: Playwright e2e against real Laravel + MySQL.
Most expensive tier.
---
## 6. Conventions
- **Test file naming:** `*.spec.ts` for Playwright (CT + e2e),
`*.test.ts` for Vitest. The runner config glob keeps them apart.
- **`@visual` tag:** required on all visual-regression tests so
`--grep @visual` filters them.
- **Provider stack for CT:** wired in `apps/app/playwright/index.ts`'s
`beforeMount` hook, not at mount call time. Tests forward
per-test overrides via `hooksConfig` (see
`tests/playwright-ct/utils/mountWithProviders.ts`).
- **E2E test isolation:** `globalSetup` runs `migrate:fresh + seed`
once per `pnpm test:e2e` invocation. Tests within one run share DB
state. Re-run = fresh DB.
- **Pixel tolerance:** `maxDiffPixelRatio: 0.001` default
(`playwright-ct.config.ts`). Per-test exceptions allowed if
documented inline.
- **Auth in e2e tests:** Bearer-via-cookie (`api/.../SetAuthCookie.php`).
POST `/api/v1/auth/login` returns `crewli_app_token` httpOnly cookie.
No CSRF dance, no Sanctum stateful flow. baseURL must be
`localhost:8001` (matching the cookie's `domain=localhost`),
**not** `127.0.0.1:8001`.
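
The `localhost` vs. `127.0.0.1` rule follows from cookie domain matching. The sketch below is a simplified illustration, not a full RFC 6265 implementation and not code from the repo; `cookieAppliesToHost` is a hypothetical name.

```typescript
// Simplified cookie domain matching (illustrative only).
function cookieAppliesToHost(cookieDomain: string, requestHost: string): boolean {
  const domain = cookieDomain.replace(/^\./, "").toLowerCase();
  const host = requestHost.toLowerCase();
  if (host === domain) return true;
  // IP literals only match by exact equality, never by suffix.
  const isIpv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
  return !isIpv4 && host.endsWith("." + domain);
}

const sentToLocalhost = cookieAppliesToHost("localhost", "localhost"); // true
const sentToLoopbackIp = cookieAppliesToHost("localhost", "127.0.0.1"); // false
```

With a `127.0.0.1` baseURL the login response would still set the cookie header, but the browser context would not attach it to later requests, so every authenticated call would fail.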
### Anti-patterns to avoid
1. **Mocking the same data shape that the schema validates**
creates self-confirming bias. Use real backend for contract tests
(TEST-CONTRACT-001 catches this class of bug).
2. **Updating baselines silently** without diff review or a UX-
justified PR description.
3. **Adding Playwright tests for pure logic** that Vitest can cover
in 20 ms. Reserve Playwright for tests that need the browser.
4. **Treating "small" UX changes as not needing visual updates.**
There is no small visual change in an enterprise product; the
user notices.
5. **Brittle locators** by data values (artist names, stage names)
instead of stable test IDs. F4 will add `data-test-id` to Vue
components for this reason.
---
## 7. Vuetify in test infrastructure during the PrimeVue migration
`apps/app/playwright/index.ts`'s `beforeMount` hook registers Vuetify
as a Vue plugin. This is **intentional temporary state**.
### Why
The current SPA still ships Vuetify. Component-level Playwright CT
tests must mount components against the same UI framework the live
app uses, otherwise they would test a non-existent surface. Stripping
Vuetify from test infra now would make CT tests un-runnable until
F3 lands PrimeVue.
### When it ends
F3 (PrimeVue foundation, RFC-WS-FRONTEND-PRIMEVUE §6) replaces the
Vuetify plugin line in `playwright/index.ts` with PrimeVue and
updates `tests/playwright-ct/components/sanity-vuetify.spec.ts` to
its PrimeVue equivalent. Estimated effort: ~2 hours (mechanical
swap, no architecture change).
### Why not abstract
The instinct of "abstract the UI framework provider so we can swap
without touching test code" is a **deferred-cost trap** here:
1. We are NOT retaining Vuetify post-F3. The abstraction would itself
need to be removed in F4 alongside the framework swap.
2. The swap is mechanical (~2 hours). An abstraction layer would take
longer to design well than the swap itself takes.
3. Reviewers seeing "Vuetify in test infra in a PrimeVue migration
sprint" should read this section + the JSDoc on
`mountWithProviders.ts` for context.
The forbidden pattern: do not propose "let's make a `UIFrameworkPlugin`
interface and dependency-inject the provider per test" during F2/F3.
That's exactly the abstraction this section forbids.
---
## 8. Host setup requirements
For Playwright tests to run, the host must have:
- **Node v22+** with **pnpm 10+** (matching `apps/app/`'s expectations)
- **Chromium** installed via `pnpm exec playwright install chromium`
(downloads to `~/Library/Caches/ms-playwright` on macOS)
- **Git LFS** installed (`brew install git-lfs` on macOS) and active
(`git lfs install --skip-repo` to avoid hook conflict with lefthook;
the LFS pre-push step is delegated through `lefthook.yml`)
- **MySQL 8** running locally via `make services` for e2e tests, with
the `crewli_test` database created via `make test-db-create`
- **PHP 8.2+ + composer** for the Laravel test server in e2e tests
- **`api/.env`** present with valid `APP_KEY` (e2e `globalSetup`
inherits this; only `DB_DATABASE` is overridden to `crewli_test` on
the command line)
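
The version floors above (Node v22+, pnpm 10+, PHP 8.2+) could be checked by a small preflight helper. This is a hypothetical sketch: `meetsMinimum` is not part of the repo, and how the version strings are obtained (for example from `node --version`) is left out.

```typescript
// Compare dotted versions numerically, e.g. "8.2.12" against a minimum of "8.2".
function meetsMinimum(actual: string, minimum: string): boolean {
  const parse = (v: string): number[] =>
    v.replace(/^v/, "").split(".").map((part) => parseInt(part, 10) || 0);
  const a = parse(actual);
  const m = parse(minimum);
  for (let i = 0; i < Math.max(a.length, m.length); i++) {
    const diff = (a[i] ?? 0) - (m[i] ?? 0); // missing segments count as 0
    if (diff !== 0) return diff > 0;
  }
  return true; // equal versions satisfy the minimum
}

const nodeOk = meetsMinimum("v22.3.0", "22"); // true
const phpOk = meetsMinimum("8.1.27", "8.2");  // false
```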
### Known risks
- **`unpkg.com` dependency** — the prototype HTML loads React + Babel
from unpkg.com via `<script src="https://unpkg.com/...">`. Local
network outage or unpkg CDN issues will flake B3 baselines. Mitigation
if it bites: vendor `react.umd.js` + `babel.min.js` into the
prototype directory. Defer until it actually breaks.
- **Test DB shared with PHPUnit** — `crewli_test` is used by both the
PHPUnit suite (transaction-rollback per test) and the e2e fixture
(migrate:fresh + seed once). Running them concurrently would
collide. Lifecycle assumes serial execution, which is the realistic
local-dev flow.
---
## 9. Deferred to BACKLOG
- **TEST-INFRA-002** — CI runner selection (Gitea Actions vs. GitHub
Actions decision), runner image with PHP+MySQL+Node+pnpm, caching
strategy, screenshot-diff artifact upload, label-gated nightly e2e.
- F4 isolated component-level visual baselines (replacing the
composite baselines in B3 with per-state baselines using stable
`data-test-id` attributes).
- F4 multi-context concurrent-edit e2e tests (currently the 409
contract test uses single-context replay).
- Multi-browser (Firefox, WebKit) baselines — Linux+Chromium only
for v1 per RFC §A.5.
- Mobile viewport baselines — desktop 1440×900 only for v1.
- Soketi / WebSocket testing infrastructure when ART-15 lands.