docs(testing): add ARCH-TESTING.md — test pyramid, scope per tier, anti-patterns

B5 of TEST-INFRA-001 (RFC-WS-FRONTEND-PRIMEVUE Amendment A-1).

- Add dev-docs/ARCH-TESTING.md (~13 KB):
    §1 Five-tier pyramid (Unit / Component / Integration / Visual /
       E2E) with environment, cost, and purpose per tier
    §2 Decision tree: pick by what is being verified, not by speed
    §3 Mock-vs-real-backend rules + the self-confirming-bias
       anti-pattern that motivated TEST-CONTRACT-001
    §4 Visual baseline workflow including the composite-over-isolated
       strategy used in B3
    §5 CI strategy stub, deferred to TEST-INFRA-002
    §6 Conventions + 5 anti-patterns
    §7 Vuetify-during-PrimeVue-migration: explicit doc that the
       Vuetify plugin in playwright/index.ts is INTENTIONAL TEMPORARY
       STATE replaced in F3 by PrimeVue. Forbids the "abstract the UI
       framework provider" deferred-cost trap.
    §8 Host setup: Node, pnpm, Chromium, Git LFS, MySQL 8, PHP, .env;
       known risks (unpkg.com flakiness, shared crewli_test DB)
    §9 Deferred work cross-references to BACKLOG entries
- Update CLAUDE.md ### Testing section to reference ARCH-TESTING.md
- Add ARCH-TESTING.md to .claude-sync.conf so the dev-docs sync
  pipeline picks it up; sync script run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
341
dev-docs/ARCH-TESTING.md
Normal file
@@ -0,0 +1,341 @@
# Crewli — Test Architecture

> Authoritative reference for test-tier choices in the SPA. Read this
> before adding a new test. Linked from `CLAUDE.md`.

This document describes:

1. The test pyramid Crewli uses, and what each tier is for
2. When to use which tier (decision tree)
3. Mock-vs-real-backend rules
4. Visual baseline workflow
5. CI integration status
6. Conventions and anti-patterns
7. Vuetify-during-PrimeVue-migration: the temporary state in test infra
8. Host setup requirements
9. Deferred work (BACKLOG references)

---

## 1. Test pyramid and scope per tier

Crewli runs five test tiers in the SPA. Each has a narrow purpose;
overlap is wasted work, gaps are silent risk. Pick the tier whose
purpose matches what you're actually verifying.

### Tier 1 — Unit (Vitest + happy-dom)

**Run via:** `pnpm test` (filtered by `tests/unit/**`)
**Environment:** Node + happy-dom, single module graph
**Cost:** ~20 ms per test
**For:** Pure logic, schema parsing, store reducers, isolated composable
behaviour. No DOM. Fastest tier; safe for pre-commit if we ever add it.

### Tier 2 — Component (Playwright Component Testing)

**Run via:** `pnpm test:component`
**Environment:** Real Chromium via `@playwright/experimental-ct-vue`
**Cost:** ~300 ms per test (incl. Chromium reuse)
**For:** Single-component verification. DOM rendering, click/keyboard,
prop propagation, slot rendering, CSS resolution. Mocks the API at the
axios layer. The provider stack (Vuetify [TEMP], Pinia, TanStack Query,
Router) is wired in the `beforeMount` hook of
`apps/app/playwright/index.ts`.

### Tier 3 — Integration (Playwright CT, multi-component)

**Run via:** Same `pnpm test:component` runner; placement convention
distinguishes integration from single-component.
**Cost:** ~500 ms per test
**For:** Page-level mounting with mocked API responses. Tests
cross-component coordination (drag from Wachtrij → canvas, popover
→ mutation flow). Same provider stack as Tier 2.

### Tier 4 — Visual regression (Playwright CT, `@visual` tag)

**Run via:** `pnpm test:visual` (verify), `pnpm test:visual:update`
(regenerate baselines)
**Environment:** Real Chromium driving the canonical prototype HTML
served by a tiny static-server fixture
(`tests/playwright-ct/visual/static-server.mjs`).
**Cost:** ~1.2 s per test
**For:** Pixel baselines against canonical visual sources. The
prototype HTML at
`resources/Crewli - Artist Timetable Management/crewli-timetable.html`
is the source of truth for Artist Management surfaces. F4 (component
migration) will extend visual coverage to live SPA components against
the prototype.

### Tier 5 — E2E (Playwright)

**Run via:** `pnpm test:e2e`
**Environment:** Real Laravel test server
(`php artisan serve --port=8001`, DB `crewli_test`) + real Chromium
browser context.
**Cost:** ~5 s for the suite (includes migrate:fresh + seed)
**For:** Contract verification end-to-end. Real network, real auth,
real DB transactions. Currently only the 409-conflict
optimistic-locking contract test (TEST-CONTRACT-001). Add tests
sparingly — this is the most expensive tier.

---

## 2. When to use what — decision tree

```
Is the thing under test pure logic with no DOM?
  └─ YES → Unit (Vitest + happy-dom)

Is it a single component? (props, events, slots, CSS, keyboard)
  └─ YES → Component (Playwright CT)

Is it cross-component coordination, but no real backend?
  └─ YES → Integration (Playwright CT)

Is it a contract between SPA and backend (request/response shape)?
  └─ YES → E2E (Playwright + Laravel)

Is it visual fidelity to a canonical baseline?
  └─ YES → Visual (Playwright CT, @visual tag)
```

**Don't pick by speed.** Pick by what you're verifying. A unit test
that mocks the backend cannot catch a contract-drift bug; an e2e test
for pure logic is wasted CI time.
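
The decision tree can also be read as an ordered series of checks where the first match wins. The sketch below is one hedged way to encode that reading; `Subject`, `pickTier`, and the field names are hypothetical and do not exist in the repo, and real tests may need coverage in more than one tier.

```typescript
// Hypothetical sketch of the decision tree; none of these names exist in the repo.
type Tier = "unit" | "component" | "integration" | "e2e" | "visual";

interface Subject {
  pureLogic: boolean;        // pure logic with no DOM
  singleComponent: boolean;  // props, events, slots, CSS, keyboard
  crossComponent: boolean;   // coordination, but no real backend
  backendContract: boolean;  // request/response shape between SPA and backend
  visualFidelity: boolean;   // fidelity to a canonical baseline
}

// Questions are asked in the same order as the tree; speed is never an input.
function pickTier(s: Subject): Tier | undefined {
  if (s.pureLogic) return "unit";
  if (s.singleComponent) return "component";
  if (s.crossComponent) return "integration";
  if (s.backendContract) return "e2e";
  if (s.visualFidelity) return "visual";
  return undefined; // nothing matched: clarify what is being verified first
}

const nothing: Subject = {
  pureLogic: false,
  singleComponent: false,
  crossComponent: false,
  backendContract: false,
  visualFidelity: false,
};

// A 409-conflict contract check is about the SPA/backend contract.
console.log(pickTier({ ...nothing, backendContract: true }));
```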

---

## 3. Mock-vs-real-backend choice rules

### Mock when

- The test verifies SPA behaviour given a known response shape
- Depending on a live backend would push the test past the relevant
  tier's cost budget
- The path under test is independent of transactional / auth
  semantics

### Real backend when

- The test verifies the contract between frontend and backend (Zod
  schema vs. PHP Resource shape)
- Authentication or authorisation flows are involved
- Optimistic-locking, idempotency, or other multi-request semantics
  matter

**Anti-pattern: matching mocks to schemas.** Don't mock with the same
shape your Zod schema validates — that creates self-confirming bias
where both sides agree but neither matches reality. This is the
exact failure mode TEST-CONTRACT-001 was created to catch
(timetable-stabilization B5).
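
To make the self-confirming bias concrete, here is a minimal sketch. It uses a hand-written type guard as a stand-in for a Zod schema so that it runs without dependencies. The `Performance` shape, both payloads, and the kind of drift shown (a snake_case key and a string id) are assumptions made up for illustration, not the actual contract or the actual bug.

```typescript
// Hypothetical shapes and payloads; the real schemas in the SPA use Zod.
interface Performance {
  id: number;
  startsAt: string;
  version: number;
}

// Stand-in for a schema: accepts exactly what the SPA expects.
function isPerformance(value: unknown): value is Performance {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "number" &&
    typeof v.startsAt === "string" &&
    typeof v.version === "number"
  );
}

// A mock written by copying the schema can only ever agree with the schema.
const mockResponse: unknown = {
  id: 1,
  startsAt: "2025-07-04T20:00:00Z",
  version: 3,
};

// Assumed example of what a backend could send instead.
const assumedBackendResponse: unknown = {
  id: "1",
  starts_at: "2025-07-04T20:00:00Z",
  version: 3,
};

const mockPasses = isPerformance(mockResponse);
const backendPasses = isPerformance(assumedBackendResponse);

// The mocked test stays green while the payload from the backend is rejected.
console.log({ mockPasses, backendPasses });
```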

---

## 4. Visual baseline workflow

### Capturing baselines

```bash
pnpm test:visual:update
```

Review the resulting PNG diffs in PRs. Baselines live at:

```
apps/app/tests/playwright-ct/__screenshots__/visual/<spec-path>/<name>.png
```

Baselines are tracked via Git LFS (see `.gitattributes`). Pixel tolerance:
`maxDiffPixelRatio: 0.001` (0.1%) per `playwright-ct.config.ts`.
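
As a rough feel for what 0.1% means in practice, the sketch below converts the ratio into a pixel budget. It assumes a full 1440×900 screenshot (the desktop viewport named under deferred work); element-level screenshots are smaller and get a proportionally smaller budget. The helper name and the rounding are illustrative assumptions, not Playwright's internal arithmetic.

```typescript
// Hypothetical helper: pixels that may differ before the ratio is exceeded.
function allowedDiffPixels(
  width: number,
  height: number,
  maxDiffPixelRatio: number,
): number {
  return Math.floor(width * height * maxDiffPixelRatio);
}

const fullViewportBudget = allowedDiffPixels(1440, 900, 0.001);
console.log(fullViewportBudget); // 1296
```

Roughly 1,300 pixels is about the area of a 36×36 px square, so a small icon changing can stay under the threshold on a full-viewport capture.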

### Updating baselines (intentional UX change)

1. Make the UX change (component edit, token edit, …)
2. Run `pnpm test:visual:update` locally
3. Review the diff PNG manually — does the new baseline match the
   intended UX?
4. Commit baseline + UX change in the **same PR**. Reviewer can
   compare baseline change against the UX intent.
5. Never update baselines to "make tests pass" without a UX-justified
   reason in the PR description.

### Updating baselines (unintentional diff in CI)

1. Determine if the diff is environmental (font hinting, OS rendering,
   timezone-based date formatting) or a real regression.
2. Environmental → consider tightening determinism (lock fonts, fake
   timers, fixed locale) before tweaking tolerance.
3. Real regression → fix the regression, not the baseline.
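
One of the environmental causes listed above, timezone-based date formatting, is easy to reproduce outside the browser. The sketch below formats the same instant under two time zones; the instant and the `formatSlot` helper are hypothetical, chosen only to show why a pinned locale and time zone keep rendered text stable across machines.

```typescript
// Hypothetical set time; any fixed instant works for the demonstration.
const instant = new Date("2025-07-04T22:30:00Z");

// Locale and time zone are passed explicitly instead of read from the host.
function formatSlot(date: Date, timeZone: string): string {
  return new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour: "2-digit",
    minute: "2-digit",
  }).format(date);
}

const utcLabel = formatSlot(instant, "UTC");
const amsterdamLabel = formatSlot(instant, "Europe/Amsterdam");

// Same instant, different pixels: a baseline captured under one zone
// will not match a run under the other.
console.log(utcLabel, amsterdamLabel);
```

Playwright exposes `locale` and `timezoneId` as context options for this purpose; whether the CT config in this repo sets them is not stated here.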

### Composite-over-isolated strategy (B3 baselines)

Some surfaces enumerated in RFC §A.3's baseline list are captured as
composite views rather than individual block-state baselines. Reason:
the prototype's DOM exposes status only via inline `style.background`,
with no `data-*` attributes. Isolated locators (e.g. by artist name)
lock the test to specific seed data and silently rot if data changes.

The current five baselines cover the visual vocabulary:

| File                      | Captures                                           |
| ------------------------- | -------------------------------------------------- |
| `canvas-friday.png`       | Status colors, b2b indicators, multi-lane stacking |
| `canvas-saturday.png`     | Conflict ring, capacity warning                    |
| `stage-row-multilane.png` | First row in isolation                             |
| `wachtrij-populated.png`  | Sidebar list rendering, status badges, counts      |
| `popover.png`             | Block-click popover layout                         |

Nine additional surfaces are documented as `test.skip()` in
`tests/playwright-ct/visual/prototype.spec.ts` with the gap reason.
F4 component migration adds isolated baselines using stable
`data-test-id` attributes on Vue components.

---

## 5. CI integration

**Status: deferred.** The repo currently has no CI runner configured.
Local development workflow:

- Vitest (`pnpm test`) — tier 1, runs on demand
- Playwright Component (`pnpm test:component`) — tiers 2–4, runs on
  demand
- Playwright E2E (`pnpm test:e2e`) — tier 5, runs on demand against a
  developer-managed Laravel test server

CI design (Gitea Actions vs. GitHub Actions decision, Linux runner
image with PHP+MySQL+Node+pnpm, screenshot-diff artifact upload,
label-gated nightly e2e) is captured as `TEST-INFRA-002` in
`dev-docs/BACKLOG.md`.

When CI lands:

- Pre-commit (lefthook): Vitest unit only. Fast, no Playwright launch.
- PR-CI: Vitest unit + Playwright component + visual. Slower but full
  coverage.
- Nightly / label-gated: Playwright e2e against real Laravel + MySQL.
  Most expensive tier.

---

## 6. Conventions

- **Test file naming:** `*.spec.ts` for Playwright (CT + e2e),
  `*.test.ts` for Vitest. The runner config glob keeps them apart.
- **`@visual` tag:** required on all visual-regression tests so
  `--grep @visual` filters them.
- **Provider stack for CT:** wired in `apps/app/playwright/index.ts`'s
  `beforeMount` hook, not at mount call time. Tests forward
  per-test overrides via `hooksConfig` (see
  `tests/playwright-ct/utils/mountWithProviders.ts`).
- **E2E test isolation:** `globalSetup` runs `migrate:fresh + seed`
  once per `pnpm test:e2e` invocation. Tests within one run share DB
  state. Re-run = fresh DB.
- **Pixel tolerance:** `maxDiffPixelRatio: 0.001` default
  (`playwright-ct.config.ts`). Per-test exceptions allowed if
  documented inline.
- **Auth in e2e tests:** Bearer-via-cookie (`api/.../SetAuthCookie.php`).
  POST `/api/v1/auth/login` returns `crewli_app_token` httpOnly cookie.
  No CSRF dance, no Sanctum stateful flow. `baseURL` must be
  `localhost:8001` (matching the cookie's `domain=localhost`),
  **not** `127.0.0.1:8001`.
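
The `baseURL` rule in the last bullet follows from how cookie domains are matched against the request host. The sketch below is a simplified version of the domain-matching rule in RFC 6265 (section 5.1.3), written for illustration only; `domainMatches` and `isIpv4` are hypothetical helpers, and real browsers apply further checks not shown here.

```typescript
// Simplified sketch of cookie domain matching; not a full RFC 6265 implementation.
function isIpv4(host: string): boolean {
  return /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
}

function domainMatches(requestHost: string, cookieDomain: string): boolean {
  const host = requestHost.toLowerCase();
  const domain = cookieDomain.toLowerCase().replace(/^\./, "");
  if (host === domain) return true;
  // Suffix matching applies to host names only, never to IP addresses.
  return !isIpv4(host) && host.endsWith("." + domain);
}

// The auth cookie is set with domain=localhost.
console.log(domainMatches("localhost", "localhost")); // true
console.log(domainMatches("127.0.0.1", "localhost")); // false
```

Both hosts reach the same server, but only the first one gets the `crewli_app_token` cookie attached on later requests.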

### Anti-patterns to avoid

1. **Mocking the same data shape that the schema validates** —
   creates self-confirming bias. Use real backend for contract tests
   (TEST-CONTRACT-001 catches this class of bug).
2. **Updating baselines silently** without diff review or a
   UX-justified PR description.
3. **Adding Playwright tests for pure logic** that Vitest can cover
   in 20 ms. Reserve Playwright for tests that need the browser.
4. **Treating "small" UX changes as not needing visual updates** —
   there is no small visual change in an enterprise product; the
   user notices.
5. **Brittle locators** by data values (artist names, stage names)
   instead of stable test IDs. F4 will add `data-test-id` to Vue
   components for this reason.

---

## 7. Vuetify in test infrastructure during the PrimeVue migration

`apps/app/playwright/index.ts`'s `beforeMount` hook registers Vuetify
as a Vue plugin. This is **intentional temporary state**.

### Why

The current SPA still ships Vuetify. Component-level Playwright CT
tests must mount components against the same UI framework the live
app uses, otherwise they would test a non-existent surface. Stripping
Vuetify from test infra now would make CT tests un-runnable until
F3 lands PrimeVue.

### When it ends

F3 (PrimeVue foundation, RFC-WS-FRONTEND-PRIMEVUE §6) replaces the
Vuetify plugin line in `playwright/index.ts` with PrimeVue and
updates `tests/playwright-ct/components/sanity-vuetify.spec.ts` to
its PrimeVue equivalent. Estimated effort: ~2 hours (mechanical
swap, no architecture change).

### Why not abstract

The instinct of "abstract the UI framework provider so we can swap
without touching test code" is a **deferred-cost trap** here:

1. We are NOT retaining Vuetify post-F3. The abstraction would itself
   need to be removed in F4 alongside the framework swap.
2. The swap is mechanical (~2 hours). An abstraction layer would take
   longer to design well than the swap itself takes.
3. Reviewers seeing "Vuetify in test infra in a PrimeVue migration
   sprint" should read this section + the JSDoc on
   `mountWithProviders.ts` for context.

The forbidden pattern: do not propose "let's make a `UIFrameworkPlugin`
interface and dependency-inject the provider per test" during F2/F3.
That's exactly the abstraction this section forbids.

---

## 8. Host setup requirements

For Playwright tests to run, the host must have:

- **Node v22+** with **pnpm 10+** (matching `apps/app/`'s expectations)
- **Chromium** installed via `pnpm exec playwright install chromium`
  (downloads to `~/Library/Caches/ms-playwright` on macOS)
- **Git LFS** installed (`brew install git-lfs` on macOS) and active
  (`git lfs install --skip-repo` to avoid hook conflict with lefthook;
  the LFS pre-push step is delegated through `lefthook.yml`)
- **MySQL 8** running locally via `make services` for e2e tests, with
  the `crewli_test` database created via `make test-db-create`
- **PHP 8.2+** and **Composer** for the Laravel test server in e2e
  tests
- **`api/.env`** present with valid `APP_KEY` (e2e `globalSetup`
  inherits this; only `DB_DATABASE` is overridden to `crewli_test` on
  the command line)
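
The version floors above (Node 22, pnpm 10, PHP 8.2) lend themselves to a small preflight check. The sketch below is a hypothetical helper, not something that exists in the repo; it only compares major and minor numbers and leaves out how each tool's version string is obtained.

```typescript
// Hypothetical preflight helper; not part of the repo.
function parseVersion(version: string): [number, number] {
  // Accepts "v22.11.0", "10.4.1", "8.2" and similar.
  const match = /^v?(\d+)(?:\.(\d+))?/.exec(version.trim());
  if (!match) throw new Error(`Unparseable version: ${version}`);
  return [Number(match[1]), Number(match[2] ?? 0)];
}

function meetsMinimum(version: string, minMajor: number, minMinor = 0): boolean {
  const [major, minor] = parseVersion(version);
  return major > minMajor || (major === minMajor && minor >= minMinor);
}

console.log(meetsMinimum("v22.11.0", 22)); // true  (Node v22+)
console.log(meetsMinimum("9.15.4", 10));   // false (pnpm 10+)
console.log(meetsMinimum("8.1.27", 8, 2)); // false (PHP 8.2+)
```

In a Node script, `process.version` could supply the first argument for the Node check.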

### Known risks

- **`unpkg.com` dependency** — the prototype HTML loads React + Babel
  from unpkg.com via `<script src="https://unpkg.com/...">`. Local
  network outage or unpkg CDN issues will flake B3 baselines. Mitigation
  if it bites: vendor `react.umd.js` + `babel.min.js` into the
  prototype directory. Defer until it actually breaks.
- **Test DB shared with PHPUnit** — `crewli_test` is used by both the
  PHPUnit suite (transaction-rollback per test) and the e2e fixture
  (migrate:fresh + seed once). Running them concurrently would
  collide. Lifecycle assumes serial execution, which is the realistic
  local-dev flow.

---

## 9. Deferred to BACKLOG

- **TEST-INFRA-002** — CI runner selection (Gitea Actions vs. GitHub
  Actions decision), runner image with PHP+MySQL+Node+pnpm, caching
  strategy, screenshot-diff artifact upload, label-gated nightly e2e.
- F4 isolated component-level visual baselines (replacing the
  composite baselines in B3 with per-state baselines using stable
  `data-test-id` attributes).
- F4 multi-context concurrent-edit e2e tests (currently the 409
  contract test uses single-context replay).
- Multi-browser (Firefox, WebKit) baselines — Linux+Chromium only
  for v1 per RFC §A.5.
- Mobile viewport baselines — desktop 1440×900 only for v1.
- Soketi / WebSocket testing infrastructure when ART-15 lands.