crewli/dev-docs/ARCH-TESTING.md
bert.hausmans 637d77b327 docs(plan-3): close out Plan 3 — BACKLOG entries, RFC status, primitives registry, tooling conventions
- BACKLOG: add 3 spawned follow-ups (EnergyDots NaN, DraggableBlock pointercancel, AD-3 Menubar a11y)
- RFC-WS-GUI-REDESIGN-CREWLI-STARTER: mark Plan 3 complete with commit refs + DoD ledger
- PRIMEVUE_COMPONENTS: v2 primitives registry (8 components), statusSeverity SoT, Menubar-wrap pattern
- ARCH-TESTING: mount-helper type convention (Plan 3 codified, Plan 4 carry-over)
- FRONTEND-TOOLING: scoped lint invocation note (DoD #13 root cause)
- AppDialog.stories.ts: rename title to 'Shared/AppDialog' for sibling consistency
2026-05-19 01:41:19 +02:00


Crewli — Test Architecture

Authoritative reference for test-tier choices in the SPA. Read this before adding a new test. Linked from CLAUDE.md.

This document describes:

  1. The test pyramid Crewli uses, and what each tier is for
  2. When to use which tier (decision tree)
  3. Mock-vs-real-backend rules
  4. Visual baseline workflow
  5. CI integration status
  6. Conventions and anti-patterns
  7. Vuetify-during-PrimeVue-migration: the temporary state in test infra
  8. Host setup requirements
  9. Deferred work (BACKLOG references)

1. Test pyramid and scope per layer

Crewli runs five test tiers in the SPA. Each has a narrow purpose; overlap is wasted work, gaps are silent risk. Pick the tier whose purpose matches what you're actually verifying.

Tier 1 — Unit (Vitest + happy-dom)

Run via: pnpm test (filtered by tests/unit/**)
Environment: Node + happy-dom, single module graph
Cost: ~20 ms per test
For: Pure logic, schema parsing, store reducers, isolated composable behaviour. No DOM. Fastest tier; safe for pre-commit if we ever add it.

Tier 2 — Component (Playwright Component Testing)

Run via: pnpm test:component
Environment: Real Chromium via @playwright/experimental-ct-vue
Cost: ~300 ms per test (incl. Chromium reuse)
For: Single-component verification: DOM rendering, click/keyboard, prop propagation, slot rendering, CSS resolution. Mocks the API at the axios layer. The provider stack (Vuetify [TEMP], Pinia, TanStack Query, Router) is wired in apps/app/playwright/index.ts's beforeMount hook.

Tier 3 — Integration (Playwright CT, multi-component)

Run via: The same pnpm test:component runner; placement convention distinguishes integration tests from single-component tests.
Cost: ~500 ms per test
For: Page-level mounting with mocked API responses. Tests cross-component coordination (drag from Wachtrij → canvas, popover → mutation flow). Same provider stack as Tier 2.

Tier 4 — Visual regression (Playwright CT, @visual tag)

Run via: pnpm test:visual (verify), pnpm test:visual:update (regenerate baselines)
Environment: Real Chromium driving the canonical prototype HTML, served by a tiny static-server fixture (tests/playwright-ct/visual/static-server.mjs)
Cost: ~1.2 s per test
For: Pixel baselines against canonical visual sources. The prototype HTML at resources/Crewli - Artist Timetable Management/crewli-timetable.html is the source of truth for Artist Management surfaces. F4 (component migration) extends visual coverage to live SPA components against the prototype.

Tier 5 — E2E (Playwright)

Run via: pnpm test:e2e
Environment: Real Laravel test server (php artisan serve --port=8001, DB crewli_test) + real Chromium browser context
Cost: ~5 s for the suite (includes migrate:fresh + seed)
For: Contract verification end-to-end. Real network, real auth, real DB transactions. Currently only the 409-conflict optimistic-locking contract test (TEST-CONTRACT-001). Add tests sparingly; this is the most expensive tier.


2. When to use what — decision tree

First match wins — stop at the first YES.

Is the thing under test pure logic with no DOM?
  └─ YES → Unit (Vitest + happy-dom)

Is it a single component? (props, events, slots, CSS, keyboard)
  └─ YES → Component (Playwright CT)

Is it cross-component coordination, but no real backend?
  └─ YES → Integration (Playwright CT)

Is it a contract between SPA and backend (request/response shape)?
  └─ YES → E2E (Playwright + Laravel)

Is it visual fidelity to a canonical baseline?
  └─ YES → Visual (Playwright CT, @visual tag)

Don't pick by speed. Pick by what you're verifying. A unit test that mocks the backend cannot catch a contract-drift bug; an e2e test for pure logic is wasted CI time.
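The decision tree can be sketched as an ordered list of questions where evaluation stops at the first YES. This is a plain-TypeScript illustration only: the Subject flags and Tier names are hypothetical labels, not types from the Crewli codebase.

```typescript
// Hypothetical encoding of the section-2 decision tree; Subject flags and
// Tier names are illustrative labels, not types from the Crewli codebase.
type Tier = "unit" | "component" | "integration" | "e2e" | "visual";

interface Subject {
  pureLogicNoDom: boolean;
  singleComponent: boolean;
  crossComponentNoBackend: boolean;
  backendContract: boolean;
  visualFidelity: boolean;
}

// Questions in the same order as the tree; the order is what makes "first match wins".
const rules: ReadonlyArray<readonly [(s: Subject) => boolean, Tier]> = [
  [(s) => s.pureLogicNoDom, "unit"],
  [(s) => s.singleComponent, "component"],
  [(s) => s.crossComponentNoBackend, "integration"],
  [(s) => s.backendContract, "e2e"],
  [(s) => s.visualFidelity, "visual"],
];

function pickTier(subject: Subject): Tier | undefined {
  for (const [matches, tier] of rules) {
    if (matches(subject)) return tier; // stop at the first YES
  }
  return undefined; // nothing matched: re-examine what is being verified
}

// Baseline with every question answered NO; tests spread overrides onto it.
const none: Subject = {
  pureLogicNoDom: false,
  singleComponent: false,
  crossComponentNoBackend: false,
  backendContract: false,
  visualFidelity: false,
};
```

Because evaluation stops at the first YES, a subject that is both a single component and a visual-fidelity check resolves to Component; the ordering is part of the rule, not an implementation detail.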


3. Mock-vs-real-backend choice rules

Mock when

  • The test verifies SPA behaviour given a known response shape
  • Requiring a live backend would push the test past the relevant tier's cost budget
  • The path under test is independent of transactional / auth semantics

Real backend when

  • The test verifies the contract between frontend and backend (Zod schema vs. PHP Resource shape)
  • Authentication or authorisation flows are involved
  • Optimistic-locking, idempotency, or other multi-request semantics matter

Anti-pattern: matching mocks to schemas. Don't mock with the same shape your Zod schema validates — that creates self-confirming bias where both sides agree but neither matches reality. This is the exact failure mode TEST-CONTRACT-001 was created to catch (timetable-stabilization B5).
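A minimal illustration of the bias in plain TypeScript. A hand-written type guard stands in for the Zod schema; the TimetableBlock shape, its field names, and the snake_case payload are all invented for the example. The mock passes because it was written from the same shape the guard checks. The second payload differs only in a key name, the kind of drift a hand-written mock never produces and a real backend call exposes.

```typescript
// Hypothetical shape; stands in for whatever a Zod schema would describe.
interface TimetableBlock {
  id: number;
  startsAt: string;
}

// Type guard playing the role of schema validation.
function isTimetableBlock(value: unknown): value is TimetableBlock {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.id === "number" && typeof v.startsAt === "string";
}

// Mock written from the same shape the guard checks: passes by construction.
const mockBlock: TimetableBlock = { id: 1, startsAt: "2026-05-19T20:00:00Z" };

// Payload shaped the way a backend resource might actually serialise it
// (snake_case here, purely as an example): the guard rejects it.
const realishPayload: unknown = { id: 1, starts_at: "2026-05-19T20:00:00Z" };

console.log(isTimetableBlock(mockBlock)); // true
console.log(isTimetableBlock(realishPayload)); // false
```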


4. Visual baseline workflow

Capturing baselines

pnpm test:visual:update

Diffs are reviewed in PRs. Baselines live at:

apps/app/tests/playwright-ct/__screenshots__/visual/<spec-path>/<name>.png

Tracked via Git LFS (see .gitattributes). Pixel tolerance: maxDiffPixelRatio: 0.001 (0.1%) per playwright-ct.config.ts.
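What 0.1% amounts to in pixels: Playwright describes maxDiffPixelRatio as the acceptable ratio of differing pixels to the total pixel count of the compared image. For a full 1440×900 desktop capture (the v1 viewport named in section 9) that is 1,296 pixels; an element screenshot such as a single stage row gets proportionally fewer. The helper below is only that arithmetic, and the element size is a hypothetical example.

```typescript
// Rough pixel budget implied by a maxDiffPixelRatio: ratio × total pixels.
function allowedDiffPixels(width: number, height: number, maxDiffPixelRatio: number): number {
  return Math.floor(width * height * maxDiffPixelRatio);
}

const fullViewport = allowedDiffPixels(1440, 900, 0.001); // 1,296,000 px in total
const elementShot = allowedDiffPixels(400, 120, 0.001); // hypothetical element size

console.log(fullViewport); // 1296
console.log(elementShot); // 48
```

The small element budget is why a tolerance bump that looks harmless on a full-page capture can still hide a real regression in an isolated baseline.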

Updating baselines (intentional UX change)

  1. Make the UX change (component edit, token edit, …)
  2. Run pnpm test:visual:update locally
  3. Review the diff PNG manually — does the new baseline match the intended UX?
  4. Commit baseline + UX change in the same PR. Reviewer can compare baseline change against the UX intent.
  5. Never update baselines to "make tests pass" without a UX-justified reason in the PR description.

Updating baselines (unintentional diff in CI)

  1. Determine if the diff is environmental (font hinting, OS rendering, timezone-based date formatting) or a real regression.
  2. Environmental → consider tightening determinism (lock fonts, fake timers, fixed locale) before tweaking tolerance.
  3. Real regression → fix the regression, not the baseline.
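For the timezone case, a minimal sketch of what pinning means in code: format with an explicit locale and timeZone instead of inheriting the host's. The en-GB locale and Europe/Amsterdam zone are assumptions for illustration. On the browser side, Playwright's locale and timezoneId context options (set under use in the Playwright config) pin the same values for the page under test.

```typescript
// Formatters with an explicit locale and time zone: output no longer depends
// on the TZ or LANG of the machine running the test.
const amsterdam = new Intl.DateTimeFormat("en-GB", {
  timeZone: "Europe/Amsterdam", // assumption: the product's reference zone
  hour: "2-digit",
  minute: "2-digit",
});
const tokyo = new Intl.DateTimeFormat("en-GB", {
  timeZone: "Asia/Tokyo", // a host zone that would shift the rendered time
  hour: "2-digit",
  minute: "2-digit",
});

const slotStart = new Date("2026-05-19T18:00:00Z"); // hypothetical slot start

console.log(amsterdam.format(slotStart)); // 20:00 (CEST, UTC+2)
console.log(tokyo.format(slotStart)); // 03:00 (next day, UTC+9)
```

The same instant renders as two different strings, which is exactly the kind of environmental diff a baseline would flag if the zone were left to the host.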

Composite-over-isolated strategy (B3 baselines)

Some surfaces enumerated in RFC §A.3's baseline list are captured as composite views rather than individual block-state baselines. Reason: the prototype's DOM exposes block status only via inline style.background and carries no data-* attributes. Isolated locators (e.g. by artist name) would lock the test to specific seed data and silently rot when that data changes.

The current 5 baselines cover the visual vocabulary:

| File | Captures |
|---|---|
| canvas-friday.png | Status colors, b2b indicators, multi-lane stacking |
| canvas-saturday.png | Conflict ring, capacity warning |
| stage-row-multilane.png | First row in isolation |
| wachtrij-populated.png | Sidebar list rendering, status badges, counts |
| popover.png | Block-click popover layout |

Nine additional surfaces are documented as test.skip() in tests/playwright-ct/visual/prototype.spec.ts, each with its gap reason. F4 component migration adds isolated baselines using stable data-test-id attributes on Vue components.


5. CI integration

Status: deferred. The repo currently has no CI runner configured. Local development workflow:

  • Vitest (pnpm test) — tier 1, runs on demand
  • Playwright Component (pnpm test:component) — tiers 2–4, runs on demand
  • Playwright E2E (pnpm test:e2e) — tier 5, runs on demand against a developer-managed Laravel test server

CI design (Gitea Actions vs. GitHub Actions decision, Linux runner image with PHP+MySQL+Node+pnpm, screenshot-diff artifact upload, label-gated nightly e2e) is captured as TEST-INFRA-002 in dev-docs/BACKLOG.md.

When CI lands:

  • Pre-commit (lefthook): Vitest unit only. Fast, no Playwright launch.
  • PR-CI: Vitest unit + Playwright component + visual. Slower but full coverage.
  • Nightly / label-gated: Playwright e2e against real Laravel + MySQL. Most expensive tier.

6. Conventions

  • Test file naming: *.spec.ts for Playwright (CT + e2e), *.test.ts for Vitest. The runner config glob keeps them apart.
  • @visual tag: required on all visual-regression tests so --grep @visual filters them.
  • Provider stack for CT: wired in apps/app/playwright/index.ts's beforeMount hook, not at mount call time. Tests forward per-test overrides via hooksConfig (see tests/playwright-ct/utils/mountWithProviders.ts).
  • E2E test isolation: globalSetup runs migrate:fresh + seed once per pnpm test:e2e invocation. Tests within one run share DB state. Re-run = fresh DB.
  • Pixel tolerance: maxDiffPixelRatio: 0.001 default (playwright-ct.config.ts). Per-test exceptions allowed if documented inline.
  • Auth in e2e tests: Bearer-via-cookie (api/.../SetAuthCookie.php). POST /api/v1/auth/login returns a crewli_app_token httpOnly cookie. No CSRF dance, no Sanctum stateful flow. baseURL must be localhost:8001 (matching the cookie's domain=localhost), not 127.0.0.1:8001.
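Why the host name matters: browsers match cookies against the request's host name, never its resolved address, and cookies provide no isolation by port. A simplified sketch of the RFC 6265 §5.1.3 domain-match rule (no public-suffix handling; the IP check is a rough heuristic) shows that a cookie scoped to localhost is not sent to 127.0.0.1, even though both reach the same server. If the cookie is host-only (no Domain attribute), the rule is stricter still: exact host equality.

```typescript
// Rough heuristic: dotted-quad IPv4, or anything containing ":" (IPv6).
function isIpAddress(host: string): boolean {
  return /^\d{1,3}(\.\d{1,3}){3}$/.test(host) || host.includes(":");
}

// Simplified RFC 6265 §5.1.3 domain-match: identical strings, or the request
// host ends with "." + domain and is a host name rather than an IP address.
function domainMatches(requestHost: string, cookieDomain: string): boolean {
  const host = requestHost.toLowerCase();
  const domain = cookieDomain.toLowerCase().replace(/^\./, "");
  if (host === domain) return true;
  return host.endsWith("." + domain) && !isIpAddress(host);
}

console.log(domainMatches("localhost", "localhost")); // true
console.log(domainMatches("127.0.0.1", "localhost")); // false
```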

Anti-patterns to avoid

  1. Mocking the same data shape that the schema validates — creates self-confirming bias. Use real backend for contract tests (TEST-CONTRACT-001 catches this class of bug).
  2. Updating baselines silently without diff review or a UX-justified PR description.
  3. Adding Playwright tests for pure logic that Vitest can cover in 20 ms. Reserve Playwright for tests that need the browser.
  4. Treating "small" UX changes as not needing visual updates — there is no small visual change in an enterprise product; the user notices.
  5. Brittle locators by data values (artist names, stage names) instead of stable test IDs. F4 will add data-test-id to Vue components for this reason.

7. Vuetify in test infrastructure during the PrimeVue migration

apps/app/playwright/index.ts's beforeMount hook registers Vuetify as a Vue plugin. This is intentional temporary state.

Why

The current SPA still ships Vuetify. Component-level Playwright CT tests must mount components against the same UI framework the live app uses, otherwise they would test a non-existent surface. Stripping Vuetify from test infra now would make CT tests un-runnable until F3 lands PrimeVue.

When it ends

F3 (PrimeVue foundation, RFC-WS-FRONTEND-PRIMEVUE §6) replaces the Vuetify plugin line in playwright/index.ts with PrimeVue and updates tests/playwright-ct/components/sanity-vuetify.spec.ts to its PrimeVue equivalent. Estimated effort: ~2 hours (mechanical swap, no architecture change).

Why not abstract

The instinct of "abstract the UI framework provider so we can swap without touching test code" is a deferred-cost trap here:

  1. We are NOT retaining Vuetify post-F3. The abstraction would itself need to be removed in F4 alongside the framework swap.
  2. The swap is mechanical (~2 hours). An abstraction layer would take longer to design well than the swap itself takes.
  3. Reviewers seeing "Vuetify in test infra in a PrimeVue migration sprint" should read this section + the JSDoc on mountWithProviders.ts for context.

The forbidden pattern: do not propose "let's make a UIFrameworkPlugin interface and dependency-inject the provider per test" during F2/F3. That's exactly the abstraction this section forbids.


8. Host setup requirements

For Playwright tests to run, the host must have:

  • Node v22+ with pnpm 10+ (matching apps/app/'s expectations)
  • Chromium installed via pnpm exec playwright install chromium (downloads to ~/Library/Caches/ms-playwright on macOS)
  • Git LFS installed (brew install git-lfs on macOS) and active (git lfs install --skip-repo to avoid hook conflict with lefthook; the LFS pre-push step is delegated through lefthook.yml)
  • MySQL 8 running locally via make services for e2e tests, with the crewli_test database created via make test-db-create
  • PHP 8.2+ + composer for the Laravel test server in e2e tests
  • api/.env present with valid APP_KEY (e2e globalSetup inherits this; only DB_DATABASE is overridden to crewli_test on the command line)
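The e2e lifecycle described here and in section 6 boils down to two artisan calls that share one environment. A minimal sketch, assuming the globalSetup shells out to php from the api/ directory (for example via Node's child_process.spawnSync) and that seeding uses migrate:fresh --seed; the function and type names are hypothetical. Laravel still reads everything else, APP_KEY included, from api/.env: with its default dotenv setup, variables already present in the process environment are not overwritten, so the DB_DATABASE override wins.

```typescript
// Environment map passed to each child process.
type Env = Record<string, string | undefined>;

// Hypothetical description of one `php artisan …` invocation.
interface ArtisanCall {
  args: string[]; // arguments after `php`
  env: Env;
}

const TEST_DB = "crewli_test";

// Builds the calls without running them; a globalSetup would hand each one to
// spawnSync("php", call.args, { cwd: "api", env: call.env }).
function e2eArtisanCalls(baseEnv: Env): ArtisanCall[] {
  // Copy, then override only DB_DATABASE; the caller's env stays untouched.
  const env: Env = { ...baseEnv, DB_DATABASE: TEST_DB };
  return [
    // Once per `pnpm test:e2e` invocation: fresh schema plus seed data.
    { args: ["artisan", "migrate:fresh", "--seed"], env },
    // Test server the SPA talks to; reach it as localhost:8001, not 127.0.0.1.
    { args: ["artisan", "serve", "--port=8001"], env },
  ];
}
```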

Known risks

  • unpkg.com dependency — the prototype HTML loads React + Babel from unpkg.com via <script src="https://unpkg.com/...">. Local network outage or unpkg CDN issues will flake B3 baselines. Mitigation if it bites: vendor react.umd.js + babel.min.js into the prototype directory. Defer until it actually breaks.
  • Test DB shared with PHPUnit: crewli_test is used by both the PHPUnit suite (transaction-rollback per test) and the e2e fixture (migrate:fresh + seed once). Running them concurrently would collide. Lifecycle assumes serial execution, which is the realistic local-dev flow.

9. Deferred to BACKLOG

  • TEST-INFRA-002 — CI runner selection (Gitea Actions vs. GitHub Actions decision), runner image with PHP+MySQL+Node+pnpm, caching strategy, screenshot-diff artifact upload, label-gated nightly e2e.
  • F4 isolated component-level visual baselines (replacing the composite baselines in B3 with per-state baselines using stable data-test-id attributes).
  • Multi-context concurrent-edit e2e patterns — see TEST-INFRA-002 in BACKLOG.md.
  • Multi-browser (Firefox, WebKit) baselines — Linux+Chromium only for v1 per RFC §A.5.
  • Mobile viewport baselines — desktop 1440×900 only for v1.
  • Soketi / WebSocket testing infrastructure when ART-15 lands.

Mount-helper type convention (Plan 3 codified)

Plan 3 hit this in 6+ tasks: a plan-doc test spec typed the mount helper's props parameter as Record<string, unknown>, which vue-tsc strict mode rejects when the object is passed to mount(Component, { props }). The component's generated prop type is narrower than Record<string, unknown>, so the assignment is a type error, and widening it with any would violate the project's zero-any rule.

Convention: type the helper parameter as Partial<<Component>Props>, never Record<string, unknown>:

const mountX = (props: Partial<XProps> = {}) =>
  mount(X, { props })

Rationale: satisfies vue-tsc strict; behaviour-neutral; introduces no any. Plan 3 used the equivalent explicit inline props shape per task (the behaviour-neutral sanctioned deviation from the verbatim plan-doc); standardise on Partial<<Component>Props> from Plan 4 onward so the template-layer tests (List / Form / Detail / Dashboard / StateBlock) share one idiom rather than re-deriving the shape each time.
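A plain-TypeScript sketch of why the convention is comfortable to use, with no Vue involved. XProps stands in for the props type vue-tsc derives for a component, and mountStub is a hypothetical placeholder for the real mount, so the example only illustrates the parameter typing. Partial<XProps> lets each test omit props it does not care about, yet a wrong value type is still rejected at compile time, which a Record<string, unknown> parameter would not catch.

```typescript
// Hypothetical props type standing in for the one vue-tsc derives for component X.
interface XProps {
  label: string;
  count?: number;
}

// Hypothetical stand-in for mount(Component, { props }); it only records its
// input so the typing can be shown in isolation.
function mountStub<P extends object>(name: string, options: { props: P }) {
  return { name, props: options.props };
}

// Convention: Partial<XProps>, never Record<string, unknown>.
const mountX = (props: Partial<XProps> = {}) => mountStub("X", { props });

mountX(); // no overrides
mountX({ count: 3 }); // label omitted: allowed by Partial
// mountX({ count: "3" }); // rejected: string is not assignable to number
```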