crewli/dev-docs/ARCH-TESTING.md
bert.hausmans 637d77b327 docs(plan-3): close out Plan 3 — BACKLOG entries, RFC status, primitives registry, tooling conventions
- BACKLOG: add 3 spawned follow-ups (EnergyDots NaN, DraggableBlock pointercancel, AD-3 Menubar a11y)
- RFC-WS-GUI-REDESIGN-CREWLI-STARTER: mark Plan 3 complete with commit refs + DoD ledger
- PRIMEVUE_COMPONENTS: v2 primitives registry (8 components), statusSeverity SoT, Menubar-wrap pattern
- ARCH-TESTING: mount-helper type convention (Plan 3 codified, Plan 4 carry-over)
- FRONTEND-TOOLING: scoped lint invocation note (DoD #13 root cause)
- AppDialog.stories.ts: rename title to 'Shared/AppDialog' for sibling consistency
2026-05-19 01:41:19 +02:00

# Crewli — Test Architecture
> Authoritative reference for test-tier choices in the SPA. Read this
> before adding a new test. Linked from `CLAUDE.md`.
This document describes:
1. The test pyramid Crewli uses, and what each tier is for
2. When to use which tier (decision tree)
3. Mock-vs-real-backend rules
4. Visual baseline workflow
5. CI integration status
6. Conventions and anti-patterns
7. Vuetify-during-PrimeVue-migration: the temporary state in test infra
8. Host setup requirements
9. Deferred work (BACKLOG references)
---
## 1. Test pyramid and scope per layer
Crewli runs five test tiers in the SPA. Each has a narrow purpose;
overlap is wasted work, gaps are silent risk. Pick the tier whose
purpose matches what you're actually verifying.
### Tier 1 — Unit (Vitest + happy-dom)
**Run via:** `pnpm test` (filtered by `tests/unit/**`)
**Environment:** Node + happy-dom, single module graph
**Cost:** ~20 ms per test
**For:** Pure logic, schema parsing, store reducers, isolated composable
behaviour. No DOM. Fastest tier; safe for pre-commit if we ever add it.
### Tier 2 — Component (Playwright Component Testing)
**Run via:** `pnpm test:component`
**Environment:** Real Chromium via `@playwright/experimental-ct-vue`
**Cost:** ~300 ms per test (incl. Chromium reuse)
**For:** Single-component verification. DOM rendering, click/keyboard,
prop propagation, slot rendering, CSS resolution. Mocks API at axios
layer. Provider stack (Vuetify [TEMP], Pinia, TanStack Query, Router) is
wired in `apps/app/playwright/index.ts`'s `beforeMount` hook.
### Tier 3 — Integration (Playwright CT, multi-component)
**Run via:** Same `pnpm test:component` runner; placement convention
distinguishes integration from single-component.
**Cost:** ~500 ms per test
**For:** Page-level mounting with mocked API responses. Tests
cross-component coordination (drag from Wachtrij → canvas, popover
→ mutation flow). Same provider stack as Tier 2.
### Tier 4 — Visual regression (Playwright CT, `@visual` tag)
**Run via:** `pnpm test:visual` (verify), `pnpm test:visual:update`
(regenerate baselines)
**Environment:** Real Chromium driving the canonical prototype HTML
served by a tiny static-server fixture
(`tests/playwright-ct/visual/static-server.mjs`).
**Cost:** ~1.2 s per test
**For:** Pixel baselines against canonical visual sources. The
prototype HTML at
`resources/Crewli - Artist Timetable Management/crewli-timetable.html`
is the source of truth for Artist Management surfaces. F4 (component
migration) extends visual coverage to live SPA components against the
prototype.
### Tier 5 — E2E (Playwright)
**Run via:** `pnpm test:e2e`
**Environment:** Real Laravel test server
(`php artisan serve --port=8001`, DB `crewli_test`) + real Chromium
browser context.
**Cost:** ~5 s for the suite (includes migrate:fresh + seed)
**For:** Contract verification end-to-end. Real network, real auth,
real DB transactions. Currently only the 409-conflict
optimistic-locking contract test (TEST-CONTRACT-001). Add tests
sparingly; this is the most expensive tier.
---
## 2. When to use what — decision tree
*First match wins — stop at the first YES.*
```
Is the thing under test pure logic with no DOM?
└─ YES → Unit (Vitest + happy-dom)
Is it a single component? (props, events, slots, CSS, keyboard)
└─ YES → Component (Playwright CT)
Is it cross-component coordination, but no real backend?
└─ YES → Integration (Playwright CT)
Is it a contract between SPA and backend (request/response shape)?
└─ YES → E2E (Playwright + Laravel)
Is it visual fidelity to a canonical baseline?
└─ YES → Visual (Playwright CT, @visual tag)
```
**Don't pick by speed.** Pick by what you're verifying. A unit test
that mocks the backend cannot catch a contract-drift bug; an e2e test
for pure logic is wasted CI time.
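
As a rough illustration of the first-match-wins rule, the tree above can be written as an ordered chain of checks. This is a minimal sketch only: the `TestSubject` flags and the `pickTier` function are hypothetical names invented for this example and do not exist in the codebase.

```typescript
type Tier = 'unit' | 'component' | 'integration' | 'e2e' | 'visual'

// Hypothetical answers to the five questions in the decision tree.
interface TestSubject {
  pureLogicNoDom: boolean
  singleComponent: boolean
  crossComponentNoBackend: boolean
  spaBackendContract: boolean
  visualFidelity: boolean
}

// Checks run in the same order as the tree; the first YES decides the tier.
function pickTier(subject: TestSubject): Tier | undefined {
  if (subject.pureLogicNoDom) return 'unit'
  if (subject.singleComponent) return 'component'
  if (subject.crossComponentNoBackend) return 'integration'
  if (subject.spaBackendContract) return 'e2e'
  if (subject.visualFidelity) return 'visual'
  return undefined // no question matched; revisit what is being verified
}

const popoverLayout: TestSubject = {
  pureLogicNoDom: false,
  singleComponent: true,
  crossComponentNoBackend: false,
  spaBackendContract: false,
  visualFidelity: true,
}

console.log(pickTier(popoverLayout)) // component
```

A subject that answers YES to both the single-component and the visual-fidelity question resolves to `component`, because the earlier question matches first.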
---
## 3. Mock-vs-real-backend choice rules
### Mock when
- The test verifies SPA behaviour given a known response shape
- Backend availability would slow the test below the relevant tier's
cost budget
- The path under test is independent of transactional / auth
semantics
### Real backend when
- The test verifies the contract between frontend and backend (Zod
schema vs. PHP Resource shape)
- Authentication or authorisation flows are involved
- Optimistic-locking, idempotency, or other multi-request semantics
matter
**Anti-pattern: matching mocks to schemas.** Don't mock with the same
shape your Zod schema validates: that creates self-confirming bias,
where both sides agree but neither matches reality. This is the exact
failure mode TEST-CONTRACT-001 was created to catch
(timetable-stabilization B5).
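
The self-confirming bias can be shown with a small sketch. Everything here is an assumption made for illustration: the `Slot` shape, its field names, and the "drifted" payload are invented, and a hand-written parser stands in for the project's real Zod schemas.

```typescript
// Hypothetical response shape; field names are illustrative only.
interface Slot {
  id: number
  version: number
}

// Stand-in for a schema parser: throws when the input does not match Slot.
function parseSlot(input: unknown): Slot {
  if (typeof input !== 'object' || input === null) {
    throw new Error('not an object')
  }
  const record = input as Record<string, unknown>
  if (typeof record.id !== 'number' || typeof record.version !== 'number') {
    throw new Error('shape mismatch')
  }
  return { id: record.id, version: record.version }
}

function passesSchema(input: unknown): boolean {
  try {
    parseSlot(input)
    return true
  } catch {
    return false
  }
}

// A mock written by copying the schema can never fail the schema.
const mockCopiedFromSchema: unknown = { id: 1, version: 3 }

// Assumed drift: the backend renamed the field, which only a real response reveals.
const driftedBackendPayload: unknown = { id: 1, lock_version: 3 }

console.log(passesSchema(mockCopiedFromSchema)) // true
console.log(passesSchema(driftedBackendPayload)) // false
```

The mocked test stays green in both worlds; only a test that receives the backend's actual payload can turn red when the two sides drift apart.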
---
## 4. Visual baseline workflow
### Capturing baselines
```bash
pnpm test:visual:update
```
Diffs are reviewed in PRs. Baselines live at:
```
apps/app/tests/playwright-ct/__screenshots__/visual/<spec-path>/<name>.png
```
Tracked via Git LFS (see `.gitattributes`). Pixel tolerance:
`maxDiffPixelRatio: 0.001` (0.1%) per `playwright-ct.config.ts`.
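
To make the tolerance concrete, the sketch below converts the ratio into an absolute pixel budget. It assumes a full-viewport screenshot at 1440×900 (the desktop viewport named later in this document); real baselines may be captured at other sizes, and `pixelBudget` is a hypothetical helper, not part of Playwright or the codebase.

```typescript
// Hypothetical helper: roughly how many pixels may differ before a comparison fails.
function pixelBudget(width: number, height: number, maxDiffPixelRatio: number): number {
  // Math.round guards against floating-point noise in the multiplication.
  return Math.round(width * height * maxDiffPixelRatio)
}

// Assumed screenshot size: the 1440×900 desktop viewport.
const budget = pixelBudget(1440, 900, 0.001)

console.log(budget) // 1296
```

Under that assumption, about 1,296 of 1,296,000 pixels may differ, which is the area of a 36×36 px square. A change of that size or smaller can pass unnoticed, which is one reason per-test exceptions should stay rare.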
### Updating baselines (intentional UX change)
1. Make the UX change (component edit, token edit, …)
2. Run `pnpm test:visual:update` locally
3. Review the diff PNG manually — does the new baseline match the
intended UX?
4. Commit baseline + UX change in the **same PR**. Reviewer can
compare baseline change against the UX intent.
5. Never update baselines to "make tests pass" without a UX-justified
reason in the PR description.
### Updating baselines (unintentional diff in CI)
1. Determine if the diff is environmental (font hinting, OS rendering,
timezone-based date formatting) or a real regression.
2. Environmental → consider tightening determinism (lock fonts, fake
timers, fixed locale) before tweaking tolerance.
3. Real regression → fix the regression, not the baseline.
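
As one example of tightening determinism, the sketch below shows why timezone-based date formatting is environmental: the same instant renders as a different calendar day depending on the zone. Pinning both locale and time zone explicitly removes that dependency on the host. The `dayOfMonth` helper, the `en-GB` locale, and the `Europe/Amsterdam` zone are arbitrary choices for illustration, not project settings.

```typescript
// Format only the day of month, with locale and time zone pinned explicitly
// instead of inherited from the host machine.
function dayOfMonth(instant: Date, timeZone: string): string {
  return new Intl.DateTimeFormat('en-GB', { timeZone, day: '2-digit' }).format(instant)
}

// 23:30 UTC on 19 May 2026 (months are zero-based in Date.UTC).
const lateEvening = new Date(Date.UTC(2026, 4, 19, 23, 30))

const dayInUtc = dayOfMonth(lateEvening, 'UTC')
const dayInAmsterdam = dayOfMonth(lateEvening, 'Europe/Amsterdam')

// The same instant falls on different calendar days in the two zones.
console.log(dayInUtc, dayInAmsterdam)
```

A baseline captured on a machine in one zone and verified on a runner in another would differ in exactly this way, even though no component changed.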
### Composite-over-isolated strategy (B3 baselines)
Some surfaces enumerated in RFC §A.3's baseline list are captured as
composite views rather than individual block-state baselines. Reason:
the prototype's DOM exposes status only via inline `style.background`,
no `data-*` attributes. Isolated locators (e.g. by artist name) lock
the test to specific seed data and silently rot if data changes.
The current 5 baselines cover the visual vocabulary:
| File | Captures |
| ----------------------------- | ------------------------------------------------------- |
| `canvas-friday.png` | Status colors, b2b indicators, multi-lane stacking |
| `canvas-saturday.png` | Conflict ring, capacity warning |
| `stage-row-multilane.png` | First row in isolation |
| `wachtrij-populated.png` | Sidebar list rendering, status badges, counts |
| `popover.png` | Block-click popover layout |
9 additional surfaces are documented as `test.skip()` in
`tests/playwright-ct/visual/prototype.spec.ts` with the gap reason.
F4 component migration adds isolated baselines using stable
`data-test-id` attributes on Vue components.
---
## 5. CI integration
**Status: deferred.** The repo currently has no CI runner configured.
Local development workflow:
- Vitest (`pnpm test`): tier 1, runs on demand
- Playwright Component (`pnpm test:component`): tiers 2 to 4, runs on
demand
- Playwright E2E (`pnpm test:e2e`): tier 5, runs on demand against a
developer-managed Laravel test server
CI design (Gitea Actions vs. GitHub Actions decision, Linux runner
image with PHP+MySQL+Node+pnpm, screenshot-diff artifact upload,
label-gated nightly e2e) is captured as `TEST-INFRA-002` in
`dev-docs/BACKLOG.md`.
When CI lands:
- Pre-commit (lefthook): Vitest unit only. Fast, no Playwright launch.
- PR-CI: Vitest unit + Playwright component + visual. Slower but full
coverage.
- Nightly / label-gated: Playwright e2e against real Laravel + MySQL.
Most expensive tier.
---
## 6. Conventions
- **Test file naming:** `*.spec.ts` for Playwright (CT + e2e),
`*.test.ts` for Vitest. The runner config glob keeps them apart.
- **`@visual` tag:** required on all visual-regression tests so
`--grep @visual` filters them.
- **Provider stack for CT:** wired in `apps/app/playwright/index.ts`'s
`beforeMount` hook, not at mount call time. Tests forward
per-test overrides via `hooksConfig` (see
`tests/playwright-ct/utils/mountWithProviders.ts`).
- **E2E test isolation:** `globalSetup` runs `migrate:fresh + seed`
once per `pnpm test:e2e` invocation. Tests within one run share DB
state. Re-run = fresh DB.
- **Pixel tolerance:** `maxDiffPixelRatio: 0.001` default
(`playwright-ct.config.ts`). Per-test exceptions allowed if
documented inline.
- **Auth in e2e tests:** Bearer-via-cookie (`api/.../SetAuthCookie.php`).
POST `/api/v1/auth/login` returns `crewli_app_token` httpOnly cookie.
No CSRF dance, no Sanctum stateful flow. baseURL must be
`localhost:8001` (matching the cookie's `domain=localhost`),
**not** `127.0.0.1:8001`.
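
The `localhost` versus `127.0.0.1` rule is easy to get wrong in a config file, so a guard along the following lines could fail fast before any test logs in. This is a sketch under assumptions: `assertCookieCompatibleBaseURL` is a hypothetical helper that does not exist in the codebase, and the `http` scheme is assumed for the local test server.

```typescript
// Host name the auth cookie is scoped to (domain=localhost).
const COOKIE_HOST = 'localhost'

// Hypothetical guard: rejects a baseURL whose host the cookie would not match.
function assertCookieCompatibleBaseURL(baseURL: string): URL {
  const url = new URL(baseURL)
  if (url.hostname !== COOKIE_HOST) {
    throw new Error(
      `baseURL host "${url.hostname}" does not match cookie domain "${COOKIE_HOST}"`,
    )
  }
  return url
}

const checked = assertCookieCompatibleBaseURL('http://localhost:8001')

console.log(checked.port) // 8001
```

Calling the guard with `http://127.0.0.1:8001` throws, which surfaces the misconfiguration as a clear error instead of a silently missing cookie.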
### Anti-patterns to avoid
1. **Mocking the same data shape that the schema validates**
creates self-confirming bias. Use real backend for contract tests
(TEST-CONTRACT-001 catches this class of bug).
2. **Updating baselines silently** without diff review or a
UX-justified PR description.
3. **Adding Playwright tests for pure logic** that Vitest can cover
in 20 ms. Reserve Playwright for tests that need the browser.
4. **Treating "small" UX changes as not needing visual updates.**
There is no small visual change in an enterprise product; the
user notices.
5. **Brittle locators** by data values (artist names, stage names)
instead of stable test IDs. F4 will add `data-test-id` to Vue
components for this reason.
---
## 7. Vuetify in test infrastructure during the PrimeVue migration
`apps/app/playwright/index.ts`'s `beforeMount` hook registers Vuetify
as a Vue plugin. This is **intentional temporary state**.
### Why
The current SPA still ships Vuetify. Component-level Playwright CT
tests must mount components against the same UI framework the live
app uses, otherwise they would test a non-existent surface. Stripping
Vuetify from test infra now would make CT tests un-runnable until
F3 lands PrimeVue.
### When it ends
F3 (PrimeVue foundation, RFC-WS-FRONTEND-PRIMEVUE §6) replaces the
Vuetify plugin line in `playwright/index.ts` with PrimeVue and
updates `tests/playwright-ct/components/sanity-vuetify.spec.ts` to
its PrimeVue equivalent. Estimated effort: ~2 hours (mechanical
swap, no architecture change).
### Why not abstract
The instinct of "abstract the UI framework provider so we can swap
without touching test code" is a **deferred-cost trap** here:
1. We are NOT retaining Vuetify post-F3. The abstraction would itself
need to be removed in F4 alongside the framework swap.
2. The swap is mechanical (~2 hours). An abstraction layer would take
longer to design well than the swap itself takes.
3. Reviewers seeing "Vuetify in test infra in a PrimeVue migration
sprint" should read this section + the JSDoc on
`mountWithProviders.ts` for context.
The forbidden pattern: do not propose "let's make a `UIFrameworkPlugin`
interface and dependency-inject the provider per test" during F2/F3.
That's exactly the abstraction this section forbids.
---
## 8. Host setup requirements
For Playwright tests to run, the host must have:
- **Node v22+** with **pnpm 10+** (matching `apps/app/`'s expectations)
- **Chromium** installed via `pnpm exec playwright install chromium`
(downloads to `~/Library/Caches/ms-playwright` on macOS)
- **Git LFS** installed (`brew install git-lfs` on macOS) and active
(`git lfs install --skip-repo` to avoid hook conflict with lefthook;
the LFS pre-push step is delegated through `lefthook.yml`)
- **MySQL 8** running locally via `make services` for e2e tests, with
the `crewli_test` database created via `make test-db-create`
- **PHP 8.2+ + composer** for the Laravel test server in e2e tests
- **`api/.env`** present with valid `APP_KEY` (e2e `globalSetup`
inherits this; only `DB_DATABASE` is overridden to `crewli_test` on
the command line)
### Known risks
- **`unpkg.com` dependency** — the prototype HTML loads React + Babel
from unpkg.com via `<script src="https://unpkg.com/...">`. Local
network outage or unpkg CDN issues will flake B3 baselines. Mitigation
if it bites: vendor `react.umd.js` + `babel.min.js` into the
prototype directory. Defer until it actually breaks.
- **Test DB shared with PHPUnit** — `crewli_test` is used by both the
PHPUnit suite (transaction-rollback per test) and the e2e fixture
(migrate:fresh + seed once). Running them concurrently would
collide. Lifecycle assumes serial execution, which is the realistic
local-dev flow.
---
## 9. Deferred to BACKLOG
- **TEST-INFRA-002** — CI runner selection (Gitea Actions vs. GitHub
Actions decision), runner image with PHP+MySQL+Node+pnpm, caching
strategy, screenshot-diff artifact upload, label-gated nightly e2e.
- F4 isolated component-level visual baselines (replacing the
composite baselines in B3 with per-state baselines using stable
`data-test-id` attributes).
- Multi-context concurrent-edit e2e patterns — see TEST-INFRA-002 in BACKLOG.md
- Multi-browser (Firefox, WebKit) baselines — Linux+Chromium only
for v1 per RFC §A.5.
- Mobile viewport baselines — desktop 1440×900 only for v1.
- Soketi / WebSocket testing infrastructure when ART-15 lands.
---
## Mount-helper type convention (Plan 3 codified)
Plan 3 hit this in 6+ tasks: a plan-doc test spec typed the `mount`
helper's props parameter as `Record<string, unknown>`, which `vue-tsc`
strict mode rejects when the object is passed to
`mount(Component, { props })` — the component's generated prop type is
narrower than `Record<string, unknown>`, so the assignment is a type
error (and widening it with `any` would violate the project zero-`any`
rule).
**Convention:** type the helper parameter as `Partial<<Component>Props>`,
never `Record<string, unknown>`:
```ts
const mountX = (props: Partial<XProps> = {}) =>
mount(X, { props })
```
Rationale: it satisfies `vue-tsc` strict mode, does not change test
behaviour, and introduces no `any`. Plan 3 instead wrote out the
equivalent props shape inline in each task (a sanctioned,
behaviour-neutral deviation from the verbatim plan-doc).
**Standardise on `Partial<<Component>Props>` from Plan 4 onward** so the
template-layer tests (List / Form / Detail / Dashboard / StateBlock)
share one idiom rather than re-deriving the shape each time.
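
A self-contained sketch of the idiom follows. It does not use the real mount function: `ComponentStub`, the stand-in `mount`, and the `StatusBadge` component with its props are all hypothetical, typed only enough to show how `Partial<...Props>` keeps overrides checked against the component's own prop type.

```typescript
// Hypothetical props for an illustrative component; not from the codebase.
interface StatusBadgeProps {
  label: string
  severity: 'info' | 'warn' | 'danger'
}

// Stand-in for a component definition that carries default props.
interface ComponentStub<P> {
  name: string
  defaults: P
}

// Stand-in for mount: merges per-test overrides over the defaults.
function mount<P>(component: ComponentStub<P>, options: { props: Partial<P> }): { rendered: P } {
  return { rendered: { ...component.defaults, ...options.props } }
}

const StatusBadge: ComponentStub<StatusBadgeProps> = {
  name: 'StatusBadge',
  defaults: { label: 'Draft', severity: 'info' },
}

// The convention: the helper accepts Partial<StatusBadgeProps>, so each test
// passes only the props it cares about and typos are caught at compile time.
const mountStatusBadge = (props: Partial<StatusBadgeProps> = {}) =>
  mount(StatusBadge, { props })

const wrapper = mountStatusBadge({ severity: 'warn' })

console.log(wrapper.rendered.label, wrapper.rendered.severity) // Draft warn
```

With this shape, a misspelled prop name or an out-of-range `severity` value fails type checking at the call site, which a `Record<string, unknown>` parameter would not provide.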