Crewli — Test Architecture
Authoritative reference for test-tier choices in the SPA. Read this before adding a new test. Linked from
CLAUDE.md.
This document describes:
- The test pyramid Crewli uses, and what each tier is for
- When to use which tier (decision tree)
- Mock-vs-real-backend rules
- Visual baseline workflow
- CI integration status
- Conventions and anti-patterns
- Vuetify-during-PrimeVue-migration: the temporary state in test infra
- Host setup requirements
- Deferred work (BACKLOG references)
1. Test pyramid and scope per layer
Crewli runs five test tiers in the SPA. Each has a narrow purpose; overlap is wasted work, gaps are silent risk. Pick the tier whose purpose matches what you're actually verifying.
Tier 1 — Unit (Vitest + happy-dom)
Run via: pnpm test (filtered by tests/unit/**)
Environment: Node + happy-dom, single module graph
Cost: ~20 ms per test
For: Pure logic, schema parsing, store reducers, isolated composable
behaviour. No DOM. Fastest tier; safe for pre-commit if we ever add it.
Tier 2 — Component (Playwright Component Testing)
Run via: pnpm test:component
Environment: Real Chromium via @playwright/experimental-ct-vue
Cost: ~300 ms per test (incl. Chromium reuse)
For: Single-component verification. DOM rendering, click/keyboard,
prop propagation, slot rendering, CSS resolution. Mocks API at axios
layer. Provider stack (Vuetify [TEMP], Pinia, TanStack Query, Router) is
wired in apps/app/playwright/index.ts's beforeMount hook.
Tier 3 — Integration (Playwright CT, multi-component)
Run via: Same pnpm test:component runner; placement convention
distinguishes integration from single-component.
Cost: ~500 ms per test
For: Page-level mounting with mocked API responses. Tests
cross-component coordination (drag from Wachtrij → canvas, popover
→ mutation flow). Same provider stack as Tier 2.
Tier 4 — Visual regression (Playwright CT, @visual tag)
Run via: pnpm test:visual (verify), pnpm test:visual:update
(regenerate baselines)
Environment: Real Chromium driving the canonical prototype HTML
served by a tiny static-server fixture (tests/playwright-ct/visual/static-server.mjs).
Cost: ~1.2 s per test
For: Pixel baselines against canonical visual sources. The
prototype HTML at resources/Crewli - Artist Timetable Management/crewli-timetable.html is the source of truth for Artist Management
surfaces. F4 (component migration) extends visual coverage to live
SPA components against the prototype.
Tier 5 — E2E (Playwright)
Run via: pnpm test:e2e
Environment: Real Laravel test server (php artisan serve --port=8001, DB crewli_test) + real Chromium browser context.
Cost: ~5 s for the suite (includes migrate:fresh + seed)
For: Contract verification end-to-end. Real network, real auth,
real DB transactions. Currently only the 409-conflict optimistic-locking
contract test (TEST-CONTRACT-001). Add tests sparingly; this
is the most expensive tier.
2. When to use what — decision tree
First match wins — stop at the first YES.
Is the thing under test pure logic with no DOM?
└─ YES → Unit (Vitest + happy-dom)
Is it a single component? (props, events, slots, CSS, keyboard)
└─ YES → Component (Playwright CT)
Is it cross-component coordination, but no real backend?
└─ YES → Integration (Playwright CT)
Is it a contract between SPA and backend (request/response shape)?
└─ YES → E2E (Playwright + Laravel)
Is it visual fidelity to a canonical baseline?
└─ YES → Visual (Playwright CT, @visual tag)
Don't pick by speed. Pick by what you're verifying. A unit test that mocks the backend cannot catch a contract-drift bug; an e2e test for pure logic is wasted CI time.
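The first-match-wins rule can be written down as a small function. This is a minimal sketch for illustration only: the TestSubject shape and the pickTier name are hypothetical and do not exist in the Crewli codebase.

```typescript
// Hypothetical names: TestSubject and pickTier are illustrative only.
type Tier = "unit" | "component" | "integration" | "e2e" | "visual";

interface TestSubject {
  pureLogicNoDom: boolean;
  singleComponent: boolean;
  crossComponentNoBackend: boolean;
  spaBackendContract: boolean;
  visualFidelity: boolean;
}

// Questions are asked in the same order as the decision tree;
// the first YES decides the tier and later questions are not consulted.
function pickTier(subject: TestSubject): Tier | undefined {
  if (subject.pureLogicNoDom) return "unit";
  if (subject.singleComponent) return "component";
  if (subject.crossComponentNoBackend) return "integration";
  if (subject.spaBackendContract) return "e2e";
  if (subject.visualFidelity) return "visual";
  return undefined; // none of the questions matched
}

// A single component that also has a visual aspect still lands in the
// component tier, because that question comes first.
const tier = pickTier({
  pureLogicNoDom: false,
  singleComponent: true,
  crossComponentNoBackend: false,
  spaBackendContract: false,
  visualFidelity: true,
});
```

Order matters here: reordering the checks would change the answer for any subject that matches more than one question.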
3. Mock-vs-real-backend choice rules
Mock when
- The test verifies SPA behaviour given a known response shape
- Backend availability would slow the test below the relevant tier's cost budget
- The path under test is independent of transactional / auth semantics
Real backend when
- The test verifies the contract between frontend and backend (Zod schema vs. PHP Resource shape)
- Authentication or authorisation flows are involved
- Optimistic-locking, idempotency, or other multi-request semantics matter
Anti-pattern: matching mocks to schemas. Don't mock with the same shape your Zod schema validates: that creates a self-confirming bias in which both sides agree but neither matches reality. This is the exact failure mode TEST-CONTRACT-001 was created to catch (timetable-stabilization B5).
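The self-confirming bias is easier to see in code. The sketch below uses a hand-rolled parser as a stand-in for a Zod schema so that it runs without dependencies. The Block shape, the field names, and the drifted payload are all invented for this example; they are not the real Crewli schema or API response.

```typescript
// Hypothetical shapes for illustration only; not the real Crewli schema or payload.
interface Block {
  id: number;
  version: number;
}

// Stand-in for a Zod schema: encodes what the frontend believes the API returns.
function parseBlock(input: unknown): Block {
  if (typeof input !== "object" || input === null) {
    throw new Error("expected an object");
  }
  const record = input as Record<string, unknown>;
  const id = record["id"];
  const version = record["version"];
  if (typeof id !== "number" || typeof version !== "number") {
    throw new Error("payload does not match the frontend schema");
  }
  return { id, version };
}

function accepts(input: unknown): boolean {
  try {
    parseBlock(input);
    return true;
  } catch {
    return false;
  }
}

// A mock written by copying the schema can only ever agree with the schema.
const mockCopiedFromSchema = { id: 1, version: 3 };

// Assumed drift, invented for this example: the backend names the field differently.
const payloadFromBackend = { id: 1, lock_version: 3 };

const mockedTestPasses = accepts(mockCopiedFromSchema); // true
const realPayloadPasses = accepts(payloadFromBackend); // false
```

The mocked test stays green no matter what the backend does, which is why contract checks belong in the tier that talks to the real server.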
4. Visual baseline workflow
Capturing baselines
pnpm test:visual:update
Diffs are reviewed in PRs. Baselines live at:
apps/app/tests/playwright-ct/__screenshots__/visual/<spec-path>/<name>.png
Tracked via Git LFS (see .gitattributes). Pixel tolerance:
maxDiffPixelRatio: 0.001 (0.1%) per playwright-ct.config.ts.
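To get a feel for what 0.1% means in pixels, here is a small worked example. It assumes a full-frame 1440×900 screenshot, which is an assumption made for the arithmetic only; real baselines may be clipped to an element and therefore have a smaller budget. The withinTolerance helper is hypothetical and merely mirrors the ratio comparison (differing pixels divided by total pixels).

```typescript
// Hypothetical helper mirroring a ratio-based tolerance check.
function withinTolerance(
  diffPixels: number,
  width: number,
  height: number,
  maxDiffPixelRatio: number,
): boolean {
  const totalPixels = width * height;
  return diffPixels / totalPixels <= maxDiffPixelRatio;
}

// Assumed screenshot size: 1440 x 900 = 1,296,000 pixels.
const fullFramePixels = 1440 * 900;

// 0.1% of 1,296,000 is 1,296 pixels, so 1,296 differing pixels still pass
// and 1,297 do not.
const atBudget = withinTolerance(1296, 1440, 900, 0.001);
const overBudget = withinTolerance(1297, 1440, 900, 0.001);
```

At this size the budget is roughly a 36×36 pixel square, so a ratio that looks strict can still let a small icon change through on a large composite baseline.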
Updating baselines (intentional UX change)
- Make the UX change (component edit, token edit, …)
- Run pnpm test:visual:update locally
- Review the diff PNG manually: does the new baseline match the intended UX?
- Commit baseline + UX change in the same PR. Reviewer can compare baseline change against the UX intent.
- Never update baselines to "make tests pass" without a UX-justified reason in the PR description.
Updating baselines (unintentional diff in CI)
- Determine if the diff is environmental (font hinting, OS rendering, timezone-based date formatting) or a real regression.
- Environmental → consider tightening determinism (lock fonts, fake timers, fixed locale) before tweaking tolerance.
- Real regression → fix the regression, not the baseline.
Composite-over-isolated strategy (B3 baselines)
Some surfaces enumerated in RFC §A.3's baseline list are captured as
composite views rather than individual block-state baselines. Reason:
the prototype's DOM exposes status only via inline style.background,
no data-* attributes. Isolated locators (e.g. by artist name) lock
the test to specific seed data and silently rot if data changes.
The current 5 baselines cover the visual vocabulary:
| File | Captures |
|---|---|
| canvas-friday.png | Status colors, b2b indicators, multi-lane stacking |
| canvas-saturday.png | Conflict ring, capacity warning |
| stage-row-multilane.png | First row in isolation |
| wachtrij-populated.png | Sidebar list rendering, status badges, counts |
| popover.png | Block-click popover layout |
9 additional surfaces are documented as test.skip() in
tests/playwright-ct/visual/prototype.spec.ts with the gap reason.
F4 component migration adds isolated baselines using stable
data-test-id attributes on Vue components.
5. CI integration
Status: deferred. The repo currently has no CI runner configured. Local development workflow:
- Vitest (pnpm test): tier 1, runs on demand
- Playwright Component (pnpm test:component): tiers 2–4, runs on demand
- Playwright E2E (pnpm test:e2e): tier 5, runs on demand against a developer-managed Laravel test server
CI design (Gitea Actions vs. GitHub Actions decision, Linux runner
image with PHP+MySQL+Node+pnpm, screenshot-diff artifact upload,
label-gated nightly e2e) is captured as TEST-INFRA-002 in
dev-docs/BACKLOG.md.
When CI lands:
- Pre-commit (lefthook): Vitest unit only. Fast, no Playwright launch.
- PR-CI: Vitest unit + Playwright component + visual. Slower but full coverage.
- Nightly / label-gated: Playwright e2e against real Laravel + MySQL. Most expensive tier.
6. Conventions
- Test file naming: *.spec.ts for Playwright (CT + e2e), *.test.ts for Vitest. The runner config glob keeps them apart.
- @visual tag: required on all visual-regression tests so --grep @visual filters them.
- Provider stack for CT: wired in apps/app/playwright/index.ts's beforeMount hook, not at mount call time. Tests forward per-test overrides via hooksConfig (see tests/playwright-ct/utils/mountWithProviders.ts).
- E2E test isolation: globalSetup runs migrate:fresh + seed once per pnpm test:e2e invocation. Tests within one run share DB state. Re-run = fresh DB.
- Pixel tolerance: maxDiffPixelRatio: 0.001 default (playwright-ct.config.ts). Per-test exceptions allowed if documented inline.
- Auth in e2e tests: Bearer-via-cookie (api/.../SetAuthCookie.php). POST /api/v1/auth/login returns crewli_app_token httpOnly cookie. No CSRF dance, no Sanctum stateful flow. baseURL must be localhost:8001 (matching the cookie's domain=localhost), not 127.0.0.1:8001.
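The file-naming convention above amounts to a suffix check. The sketch below is illustrative only: runnerFor is a hypothetical helper, not the actual runner glob configuration, and the example paths are made up.

```typescript
// Hypothetical helper; the real separation is done by the runner config globs.
type Runner = "playwright" | "vitest" | "none";

function runnerFor(filePath: string): Runner {
  if (filePath.endsWith(".spec.ts")) return "playwright"; // CT + e2e
  if (filePath.endsWith(".test.ts")) return "vitest"; // unit
  return "none"; // not picked up by either runner
}

// Example paths are invented for illustration.
const ctRunner = runnerFor("tests/playwright-ct/components/example.spec.ts");
const unitRunner = runnerFor("tests/unit/example.test.ts");
const helperRunner = runnerFor("tests/unit/helpers.ts");
```

A file named with the wrong suffix is silently skipped by the runner that should own it, so the suffix is worth checking in review.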
Anti-patterns to avoid
- Mocking the same data shape that the schema validates — creates self-confirming bias. Use real backend for contract tests (TEST-CONTRACT-001 catches this class of bug).
- Updating baselines silently without diff review or a UX-justified PR description.
- Adding Playwright tests for pure logic that Vitest can cover in 20 ms. Reserve Playwright for tests that need the browser.
- Treating "small" UX changes as not needing visual updates — there is no small visual change in an enterprise product; the user notices.
- Brittle locators by data values (artist names, stage names)
instead of stable test IDs. F4 will add
data-test-id to Vue components for this reason.
7. Vuetify in test infrastructure during the PrimeVue migration
apps/app/playwright/index.ts's beforeMount hook registers Vuetify
as a Vue plugin. This is intentional temporary state.
Why
The current SPA still ships Vuetify. Component-level Playwright CT tests must mount components against the same UI framework the live app uses, otherwise they would test a non-existent surface. Stripping Vuetify from test infra now would make CT tests un-runnable until F3 lands PrimeVue.
When it ends
F3 (PrimeVue foundation, RFC-WS-FRONTEND-PRIMEVUE §6) replaces the
Vuetify plugin line in playwright/index.ts with PrimeVue and
updates tests/playwright-ct/components/sanity-vuetify.spec.ts to
its PrimeVue equivalent. Estimated effort: ~2 hours (mechanical
swap, no architecture change).
Why not abstract
The instinct of "abstract the UI framework provider so we can swap without touching test code" is a deferred-cost trap here:
- We are NOT retaining Vuetify post-F3. The abstraction would itself need to be removed in F4 alongside the framework swap.
- The swap is mechanical (~2 hours). An abstraction layer would take longer to design well than the swap itself takes.
- Reviewers seeing "Vuetify in test infra in a PrimeVue migration
sprint" should read this section + the JSDoc on
mountWithProviders.ts for context.
The forbidden pattern: do not propose "let's make a UIFrameworkPlugin
interface and dependency-inject the provider per test" during F2/F3.
That's exactly the abstraction this section forbids.
8. Host setup requirements
For Playwright tests to run, the host must have:
- Node v22+ with pnpm 10+ (matching apps/app/'s expectations)
- Chromium installed via pnpm exec playwright install chromium (downloads to ~/Library/Caches/ms-playwright on macOS)
- Git LFS installed (brew install git-lfs on macOS) and active (git lfs install --skip-repo to avoid hook conflict with lefthook; the LFS pre-push step is delegated through lefthook.yml)
- MySQL 8 running locally via make services for e2e tests, with the crewli_test database created via make test-db-create
- PHP 8.2+ + composer for the Laravel test server in e2e tests
- api/.env present with valid APP_KEY (e2e globalSetup inherits this; only DB_DATABASE is overridden to crewli_test on the command line)
Known risks
- unpkg.com dependency: the prototype HTML loads React + Babel from unpkg.com via <script src="https://unpkg.com/...">. Local network outage or unpkg CDN issues will flake B3 baselines. Mitigation if it bites: vendor react.umd.js + babel.min.js into the prototype directory. Defer until it actually breaks.
- Test DB shared with PHPUnit: crewli_test is used by both the PHPUnit suite (transaction-rollback per test) and the e2e fixture (migrate:fresh + seed once). Running them concurrently would collide. Lifecycle assumes serial execution, which is the realistic local-dev flow.
9. Deferred to BACKLOG
- TEST-INFRA-002 — CI runner selection (Gitea Actions vs. GitHub Actions decision), runner image with PHP+MySQL+Node+pnpm, caching strategy, screenshot-diff artifact upload, label-gated nightly e2e.
- F4 isolated component-level visual baselines (replacing the composite baselines in B3 with per-state baselines using stable data-test-id attributes).
- Multi-context concurrent-edit e2e patterns (see TEST-INFRA-002 in BACKLOG.md).
- Multi-browser (Firefox, WebKit) baselines — Linux+Chromium only for v1 per RFC §A.5.
- Mobile viewport baselines — desktop 1440×900 only for v1.
- Soketi / WebSocket testing infrastructure when ART-15 lands.
Mount-helper type convention (Plan 3 codified)
Plan 3 hit this in 6+ tasks: a plan-doc test spec typed the mount
helper's props parameter as Record<string, unknown>, which vue-tsc
strict mode rejects when the object is passed to
mount(Component, { props }) — the component's generated prop type is
narrower than Record<string, unknown>, so the assignment is a type
error (and widening it with any would violate the project zero-any
rule).
Convention: type the helper parameter as Partial<<Component>Props>,
never Record<string, unknown>:
const mountX = (props: Partial<XProps> = {}) =>
mount(X, { props })
Rationale: satisfies vue-tsc strict; behaviour-neutral; introduces no
any. Plan 3 used the equivalent explicit inline props shape per task
(the behaviour-neutral sanctioned deviation from the verbatim plan-doc);
standardise on Partial<<Component>Props> from Plan 4 onward so the
template-layer tests (List / Form / Detail / Dashboard / StateBlock)
share one idiom rather than re-deriving the shape each time.
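A dependency-free sketch of the convention follows. XProps, ComponentStub, and this mount function are hypothetical stand-ins that only imitate the shape of a test mount call so the snippet compiles without Vue; they do not reproduce the vue-tsc error described above, only the recommended helper signature.

```typescript
// Hypothetical stand-ins so the snippet compiles without Vue.
interface XProps {
  label: string;
  count: number;
}

interface ComponentStub<P> {
  name: string;
  defaults?: P; // carries the prop type so it can be inferred
}

// Imitates only the call shape of a test mount: component first, options second.
function mount<P extends object>(
  component: ComponentStub<P>,
  options: { props: Partial<P> },
): { props: Partial<P> } {
  return { props: options.props };
}

const X: ComponentStub<XProps> = { name: "X" };

// The convention: the helper parameter is Partial<XProps>, never Record<string, unknown>.
const mountX = (props: Partial<XProps> = {}) => mount(X, { props });

const withDefaults = mountX();
const withOverride = mountX({ count: 2 });
// mountX({ count: "2" }) would be rejected at compile time, because count is a number.
```

Because the parameter is tied to the component's own prop type, a renamed or retyped prop surfaces as a compile error in every test that passes it.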