
# Crewli — Test Architecture

> Authoritative reference for test-tier choices in the SPA. Read this
> before adding a new test. Linked from `CLAUDE.md`.

This document describes:

1. The test pyramid Crewli uses, and what each tier is for
2. When to use which tier (decision tree)
3. Mock-vs-real-backend rules
4. Visual baseline workflow
5. CI integration status
6. Conventions and anti-patterns
7. Vuetify-during-PrimeVue-migration: the temporary state in test infra
8. Host setup requirements
9. Deferred work (BACKLOG references)

---

## 1. Test pyramid and scope per layer

Crewli runs five test tiers in the SPA. Each has a narrow purpose;
overlap is wasted work, gaps are silent risk. Pick the tier whose
purpose matches what you're actually verifying.

### Tier 1 — Unit (Vitest + happy-dom)

**Run via:** `pnpm test` (filtered by `tests/unit/**`)
**Environment:** Node + happy-dom, single module graph
**Cost:** ~20 ms per test
**For:** Pure logic, schema parsing, store reducers, isolated composable
behaviour. No DOM. Fastest tier; safe for pre-commit if we ever add it.

### Tier 2 — Component (Playwright Component Testing)

**Run via:** `pnpm test:component`
**Environment:** Real Chromium via `@playwright/experimental-ct-vue`
**Cost:** ~300 ms per test (incl. Chromium reuse)
**For:** Single-component verification: DOM rendering, click/keyboard,
prop propagation, slot rendering, CSS resolution. The API is mocked at
the axios layer. The provider stack (Vuetify [TEMP], Pinia, TanStack
Query, Router) is wired in the `beforeMount` hook of
`apps/app/playwright/index.ts`.

### Tier 3 — Integration (Playwright CT, multi-component)

**Run via:** Same `pnpm test:component` runner; placement convention
distinguishes integration from single-component.
**Cost:** ~500 ms per test
**For:** Page-level mounting with mocked API responses. Tests
cross-component coordination (drag from Wachtrij → canvas, popover
→ mutation flow). Same provider stack as Tier 2.

### Tier 4 — Visual regression (Playwright CT, `@visual` tag)

**Run via:** `pnpm test:visual` (verify), `pnpm test:visual:update`
(regenerate baselines)
**Environment:** Real Chromium driving the canonical prototype HTML,
served by a tiny static-server fixture
(`tests/playwright-ct/visual/static-server.mjs`).
**Cost:** ~1.2 s per test
**For:** Pixel baselines against canonical visual sources. The
prototype HTML at
`resources/Crewli - Artist  Timetable Management/crewli-timetable.html`
is the source of truth for Artist Management surfaces. F4 (component
migration) extends visual coverage to live SPA components against the
prototype.

### Tier 5 — E2E (Playwright)

**Run via:** `pnpm test:e2e`
**Environment:** Real Laravel test server
(`php artisan serve --port=8001`, DB `crewli_test`) + real Chromium
browser context.
**Cost:** ~5 s for the suite (includes migrate:fresh + seed)
**For:** Contract verification end-to-end. Real network, real auth,
real DB transactions. This tier currently holds only the 409-conflict
optimistic-locking contract test (TEST-CONTRACT-001). Add tests
sparingly: this is the most expensive tier.

---

## 2. When to use what — decision tree

*First match wins — stop at the first YES.*

```
Is the thing under test pure logic with no DOM?
  └─ YES → Unit (Vitest + happy-dom)

Is it a single component? (props, events, slots, CSS, keyboard)
  └─ YES → Component (Playwright CT)

Is it cross-component coordination, but no real backend?
  └─ YES → Integration (Playwright CT)

Is it a contract between SPA and backend (request/response shape)?
  └─ YES → E2E (Playwright + Laravel)

Is it visual fidelity to a canonical baseline?
  └─ YES → Visual (Playwright CT, @visual tag)
```

**Don't pick by speed.** Pick by what you're verifying. A unit test
that mocks the backend cannot catch a contract-drift bug; an e2e test
for pure logic is wasted CI time.
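
The tree above can also be read as an ordered list of predicates. The sketch below encodes that ordering so the "first match wins" rule is explicit. The `Subject` fields and the `pickTier` function are hypothetical names chosen for illustration; nothing like them exists in the repo.

```ts
// Hypothetical description of the thing under test. The field names are
// illustrative and not part of the Crewli codebase.
interface Subject {
  pureLogic: boolean // no DOM involved
  singleComponent: boolean // props, events, slots, CSS, keyboard
  crossComponent: boolean // coordination between components, mocked API
  backendContract: boolean // request/response shape between SPA and API
  visualFidelity: boolean // pixel comparison against a canonical baseline
}

type Tier = 'unit' | 'component' | 'integration' | 'e2e' | 'visual'

// Questions in the same order as the decision tree.
const questions: ReadonlyArray<[keyof Subject, Tier]> = [
  ['pureLogic', 'unit'],
  ['singleComponent', 'component'],
  ['crossComponent', 'integration'],
  ['backendContract', 'e2e'],
  ['visualFidelity', 'visual'],
]

// Returns the tier of the first question answered YES, or undefined if
// none applies.
function pickTier(subject: Subject): Tier | undefined {
  for (const [key, tier] of questions) {
    if (subject[key]) return tier
  }
  return undefined
}
```

Because evaluation stops at the first YES, a subject that is both pure logic and contract-related resolves to the unit tier. In that case, consider splitting the test so that each concern is verified in its own tier.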

---

## 3. Mock-vs-real-backend choice rules

### Mock when

- The test verifies SPA behaviour given a known response shape
- Backend availability would slow the test below the relevant tier's
  cost budget
- The path under test is independent of transactional / auth
  semantics

### Real backend when

- The test verifies the contract between frontend and backend (Zod
  schema vs. PHP Resource shape)
- Authentication or authorisation flows are involved
- Optimistic-locking, idempotency, or other multi-request semantics
  matter

**Anti-pattern: matching mocks to schemas.** Don't mock with the same
shape your Zod schema validates. The mock and the schema then confirm
each other, while neither is checked against what the backend actually
sends. This is the exact failure mode TEST-CONTRACT-001 was created to
catch (timetable-stabilization B5).
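
A minimal sketch of the self-confirming effect follows. To stay dependency-free, a hand-rolled validator stands in for a Zod schema. The `Block` shape, its field names, and the drifted payload are all invented for illustration and do not describe the actual B5 bug.

```ts
// Stand-in for a Zod schema; the shape is hypothetical.
interface Block {
  id: number
  version: number
}

function parseBlock(input: unknown): Block {
  if (typeof input !== 'object' || input === null) {
    throw new Error('expected an object')
  }
  const { id, version } = input as Record<string, unknown>
  if (typeof id !== 'number') throw new Error('id must be a number')
  if (typeof version !== 'number') throw new Error('version must be a number')
  return { id, version }
}

// Small helper so both payloads can be checked the same way.
function accepts(input: unknown): boolean {
  try {
    parseBlock(input)
    return true
  } catch {
    return false
  }
}

// A mock written by reading the schema always passes the schema.
const mockResponse = { id: 1, version: 3 }

// Hypothetical drift: the backend serialises `version` as a string.
const backendResponse = { id: 1, version: '3' }

const mockPasses = accepts(mockResponse)
const backendPasses = accepts(backendResponse)
```

Here `mockPasses` is `true` and `backendPasses` is `false`: a mocked test stays green while the real payload would be rejected. Only a test that exercises the real backend can surface this kind of drift.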

---

## 4. Visual baseline workflow

### Capturing baselines

```bash
pnpm test:visual:update
```

Diffs are reviewed in PRs. Baselines live at:
```
apps/app/tests/playwright-ct/__screenshots__/visual/<spec-path>/<name>.png
```
Tracked via Git LFS (see `.gitattributes`). Pixel tolerance:
`maxDiffPixelRatio: 0.001` (0.1%) per `playwright-ct.config.ts`.
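
For a sense of scale, the sketch below models the ratio check for a full-viewport screenshot at the 1440×900 desktop size. It is a simplified model, not Playwright's implementation; the `withinTolerance` helper is a hypothetical name, and its treatment of the exact boundary is an assumption.

```ts
// Simplified model of a pixel-ratio tolerance check.
function withinTolerance(
  diffPixels: number,
  width: number,
  height: number,
  maxDiffPixelRatio = 0.001,
): boolean {
  const totalPixels = width * height
  return diffPixels / totalPixels <= maxDiffPixelRatio
}

// 1440 * 900 = 1,296,000 pixels, so 0.1% is roughly 1,296 pixels.
const viewportPixels = 1440 * 900
const approxBudget = Math.round(viewportPixels * 0.001)
```

The budget scales with the screenshot area. An element-level screenshot has far fewer pixels, so the same ratio allows far fewer differing pixels.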

### Updating baselines (intentional UX change)

1. Make the UX change (component edit, token edit, …)
2. Run `pnpm test:visual:update` locally
3. Review the diff PNG manually — does the new baseline match the
   intended UX?
4. Commit baseline + UX change in the **same PR**. Reviewer can
   compare baseline change against the UX intent.
5. Never update baselines to "make tests pass" without a UX-justified
   reason in the PR description.

### Updating baselines (unintentional diff in CI)

1. Determine if the diff is environmental (font hinting, OS rendering,
   timezone-based date formatting) or a real regression.
2. Environmental → consider tightening determinism (lock fonts, fake
   timers, fixed locale) before tweaking tolerance.
3. Real regression → fix the regression, not the baseline.
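
Timezone-based formatting is a typical environmental cause. The sketch below shows the same instant rendering as different clock times depending on the time zone, which is why pinning locale and time zone comes before loosening tolerance. The instant and the `formatSlot` helper are hypothetical and only serve the illustration.

```ts
// Arbitrary instant, chosen for illustration only.
const setStart = new Date('2025-07-04T22:30:00Z')

// Formats a clock time with locale and time zone pinned explicitly, so
// the result does not depend on the machine running the test.
function formatSlot(instant: Date, timeZone: string): string {
  return new Intl.DateTimeFormat('en-GB', {
    hour: '2-digit',
    minute: '2-digit',
    timeZone,
  }).format(instant)
}

const inUtc = formatSlot(setStart, 'UTC')
const inAmsterdam = formatSlot(setStart, 'Europe/Amsterdam')

// The two strings differ, so a screenshot containing this label would
// differ between a UTC runner and a developer machine in Amsterdam.
const rendersDifferently = inUtc !== inAmsterdam
```

If the code under test formats dates with the ambient time zone, the rendered text changes with the host. Fixing the zone in the test environment removes that source of diff.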

### Composite-over-isolated strategy (B3 baselines)

Some surfaces enumerated in RFC §A.3's baseline list are captured as
composite views rather than individual block-state baselines. Reason:
the prototype's DOM exposes status only via inline `style.background`,
no `data-*` attributes. Isolated locators (e.g. by artist name) lock
the test to specific seed data and silently rot if data changes.

The current 5 baselines cover the visual vocabulary:

| File                          | Captures                                                |
| ----------------------------- | ------------------------------------------------------- |
| `canvas-friday.png`           | Status colors, b2b indicators, multi-lane stacking      |
| `canvas-saturday.png`         | Conflict ring, capacity warning                         |
| `stage-row-multilane.png`     | First row in isolation                                  |
| `wachtrij-populated.png`      | Sidebar list rendering, status badges, counts           |
| `popover.png`                 | Block-click popover layout                              |

9 additional surfaces are documented as `test.skip()` in
`tests/playwright-ct/visual/prototype.spec.ts` with the gap reason.
F4 component migration adds isolated baselines using stable
`data-test-id` attributes on Vue components.

---

## 5. CI integration

**Status: deferred.** The repo currently has no CI runner configured.
Local development workflow:

- Vitest (`pnpm test`): tier 1, runs on demand
- Playwright Component (`pnpm test:component`, or `pnpm test:visual`
  for the `@visual` subset): tiers 2–4, runs on demand
- Playwright E2E (`pnpm test:e2e`): tier 5, runs on demand against a
  developer-managed Laravel test server

CI design (Gitea Actions vs. GitHub Actions decision, Linux runner
image with PHP+MySQL+Node+pnpm, screenshot-diff artifact upload,
label-gated nightly e2e) is captured as `TEST-INFRA-002` in
`dev-docs/BACKLOG.md`.

When CI lands:

- Pre-commit (lefthook): Vitest unit only. Fast, no Playwright launch.
- PR-CI: Vitest unit + Playwright component + visual. Slower but full
  coverage.
- Nightly / label-gated: Playwright e2e against real Laravel + MySQL.
  Most expensive tier.

---

## 6. Conventions

- **Test file naming:** `*.spec.ts` for Playwright (CT + e2e),
  `*.test.ts` for Vitest. The runner config glob keeps them apart.
- **`@visual` tag:** required on all visual-regression tests so
  `--grep @visual` filters them.
- **Provider stack for CT:** wired in `apps/app/playwright/index.ts`'s
  `beforeMount` hook, not at mount call time. Tests forward
  per-test overrides via `hooksConfig` (see
  `tests/playwright-ct/utils/mountWithProviders.ts`).
- **E2E test isolation:** `globalSetup` runs `migrate:fresh + seed`
  once per `pnpm test:e2e` invocation. Tests within one run share DB
  state. Re-run = fresh DB.
- **Pixel tolerance:** `maxDiffPixelRatio: 0.001` default
  (`playwright-ct.config.ts`). Per-test exceptions allowed if
  documented inline.
- **Auth in e2e tests:** Bearer-via-cookie (`api/.../SetAuthCookie.php`).
  POST `/api/v1/auth/login` returns `crewli_app_token` httpOnly cookie.
  No CSRF dance, no Sanctum stateful flow. baseURL must be
  `localhost:8001` (matching the cookie's `domain=localhost`),
  **not** `127.0.0.1:8001`.
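
The `localhost` versus `127.0.0.1` rule in the last bullet follows from cookie domain matching: a cookie scoped to `domain=localhost` is not sent to an IP-address host. The sketch below is a simplified version of the domain-match rule from RFC 6265 (IPv4 only, no public-suffix handling). The `domainMatches` helper is a hypothetical name, not code from the repo.

```ts
// Simplified RFC 6265 domain-match: the host matches when it equals the
// cookie domain, or when it is a subdomain of it and not an IP address.
function domainMatches(host: string, cookieDomain: string): boolean {
  const h = host.toLowerCase()
  const d = cookieDomain.toLowerCase().replace(/^\./, '')
  if (h === d) return true
  const isIpv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(h)
  return !isIpv4 && h.endsWith('.' + d)
}

const sentToLocalhost = domainMatches('localhost', 'localhost')
const sentToLoopbackIp = domainMatches('127.0.0.1', 'localhost')
```

With a `127.0.0.1` baseURL the login request can still succeed, but the browser would not attach the cookie to later requests, so authenticated calls would fail.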

### Anti-patterns to avoid

1. **Mocking the same data shape that the schema validates** —
   creates self-confirming bias. Use real backend for contract tests
   (TEST-CONTRACT-001 catches this class of bug).
2. **Updating baselines silently** without diff review or a
   UX-justified PR description.
3. **Adding Playwright tests for pure logic** that Vitest can cover
   in 20 ms. Reserve Playwright for tests that need the browser.
4. **Treating "small" UX changes as not needing visual updates** —
   there is no small visual change in an enterprise product; the
   user notices.
5. **Brittle locators** by data values (artist names, stage names)
   instead of stable test IDs. F4 will add `data-test-id` to Vue
   components for this reason.

---

## 7. Vuetify in test infrastructure during the PrimeVue migration

`apps/app/playwright/index.ts`'s `beforeMount` hook registers Vuetify
as a Vue plugin. This is **intentional temporary state**.

### Why

The current SPA still ships Vuetify. Component-level Playwright CT
tests must mount components against the same UI framework the live
app uses, otherwise they would test a non-existent surface. Stripping
Vuetify from test infra now would make CT tests un-runnable until
F3 lands PrimeVue.

### When it ends

F3 (PrimeVue foundation, RFC-WS-FRONTEND-PRIMEVUE §6) replaces the
Vuetify plugin line in `playwright/index.ts` with PrimeVue and
updates `tests/playwright-ct/components/sanity-vuetify.spec.ts` to
its PrimeVue equivalent. Estimated effort: ~2 hours (mechanical
swap, no architecture change).

### Why not abstract

The instinct of "abstract the UI framework provider so we can swap
without touching test code" is a **deferred-cost trap** here:

1. We are NOT retaining Vuetify post-F3. The abstraction would itself
   need to be removed in F4 alongside the framework swap.
2. The swap is mechanical (~2 hours). An abstraction layer would take
   longer to design well than the swap itself takes.
3. Reviewers seeing "Vuetify in test infra in a PrimeVue migration
   sprint" should read this section + the JSDoc on
   `mountWithProviders.ts` for context.

Concretely: do not propose a `UIFrameworkPlugin` interface with the
provider dependency-injected per test during F2/F3. That is exactly
the abstraction this section rules out.

---

## 8. Host setup requirements

For Playwright tests to run, the host must have:

- **Node v22+** with **pnpm 10+** (matching `apps/app/`'s expectations)
- **Chromium** installed via `pnpm exec playwright install chromium`
  (downloads to `~/Library/Caches/ms-playwright` on macOS)
- **Git LFS** installed (`brew install git-lfs` on macOS) and active
  (`git lfs install --skip-repo` to avoid hook conflict with lefthook;
  the LFS pre-push step is delegated through `lefthook.yml`)
- **MySQL 8** running locally via `make services` for e2e tests, with
  the `crewli_test` database created via `make test-db-create`
- **PHP 8.2+ + composer** for the Laravel test server in e2e tests
- **`api/.env`** present with valid `APP_KEY` (e2e `globalSetup`
  inherits this; only `DB_DATABASE` is overridden to `crewli_test` on
  the command line)

### Known risks

- **`unpkg.com` dependency** — the prototype HTML loads React + Babel
  from unpkg.com via `<script src="https://unpkg.com/...">`. Local
  network outage or unpkg CDN issues will flake B3 baselines. Mitigation
  if it bites: vendor `react.umd.js` + `babel.min.js` into the
  prototype directory. Defer until it actually breaks.
- **Test DB shared with PHPUnit** — `crewli_test` is used by both the
  PHPUnit suite (transaction-rollback per test) and the e2e fixture
  (migrate:fresh + seed once). Running them concurrently would
  collide. Lifecycle assumes serial execution, which is the realistic
  local-dev flow.

---

## 9. Deferred to BACKLOG

- **TEST-INFRA-002** — CI runner selection (Gitea Actions vs. GitHub
  Actions decision), runner image with PHP+MySQL+Node+pnpm, caching
  strategy, screenshot-diff artifact upload, label-gated nightly e2e.
- F4 isolated component-level visual baselines (replacing the
  composite baselines in B3 with per-state baselines using stable
  `data-test-id` attributes).
- Multi-context concurrent-edit e2e patterns (see `TEST-INFRA-002` in
  `dev-docs/BACKLOG.md`).
- Multi-browser (Firefox, WebKit) baselines — Linux+Chromium only
  for v1 per RFC §A.5.
- Mobile viewport baselines — desktop 1440×900 only for v1.
- Soketi / WebSocket testing infrastructure when ART-15 lands.

---

## Mount-helper type convention (Plan 3 codified)

Plan 3 hit this in 6+ tasks. A plan-doc test spec typed the `mount`
helper's props parameter as `Record<string, unknown>`, which `vue-tsc`
strict mode rejects when the object is passed to
`mount(Component, { props })`: the component's generated prop type is
narrower than `Record<string, unknown>`, so the assignment is a type
error. Widening it with `any` would violate the project's zero-`any`
rule.

**Convention:** type the helper parameter as `Partial<<Component>Props>`,
never `Record<string, unknown>`:

```ts
const mountX = (props: Partial<XProps> = {}) =>
  mount(X, { props })
```

Rationale: it satisfies `vue-tsc` strict mode, does not change test
behaviour, and introduces no `any`. Plan 3 worked around the problem
with an explicit inline props shape in each task, a sanctioned,
behaviour-neutral deviation from the verbatim plan-doc.
**Standardise on `Partial<<Component>Props>` from Plan 4 onward** so the
template-layer tests (List / Form / Detail / Dashboard / StateBlock)
share one idiom rather than re-deriving the shape each time.
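
The sketch below models the convention without depending on Vue. `XProps`, `X`, `ComponentStub`, and this `mount` stub are hypothetical stand-ins that only imitate the typing, not the real mount API. It also shows a side benefit: with `Partial<XProps>`, a misspelled prop name is a compile-time error, whereas `Record<string, unknown>` would accept any key.

```ts
// Hypothetical props and component stand-ins, for illustration only.
interface XProps {
  title: string
  count: number
}

interface ComponentStub<P extends object> {
  name: string
  defaults: P
}

// Stub that merges overrides into defaults and returns the resolved props.
function mount<P extends object>(
  component: ComponentStub<P>,
  options: { props?: Partial<P> },
): P {
  return { ...component.defaults, ...options.props }
}

const X: ComponentStub<XProps> = {
  name: 'X',
  defaults: { title: '', count: 0 },
}

// The convention: the helper parameter is Partial<XProps>.
const mountX = (props: Partial<XProps> = {}) => mount(X, { props })

const resolved = mountX({ title: 'Line-up' })

// @ts-expect-error 'titel' is not a key of XProps, so the typo is rejected.
mountX({ titel: 'Line-up' })
```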