diff --git a/.claude-sync.conf b/.claude-sync.conf index ea6629e3..1c895c79 100644 --- a/.claude-sync.conf +++ b/.claude-sync.conf @@ -25,6 +25,7 @@ dev-docs/RFC-WS-FRONTEND-PRIMEVUE.md dev-docs/MIGRATION-AUDIT-PRIMEVUE.md dev-docs/PRIMEVUE_COMPONENTS.md dev-docs/CLAUDE_CODE_TOOLING.md +dev-docs/ADR-LEFTHOOK-LFS-INTEGRATION.md dev-docs/FRONTEND-TOOLING.md dev-docs/LARASTAN.md dev-docs/RECTOR.md diff --git a/dev-docs/ADR-LEFTHOOK-LFS-INTEGRATION.md b/dev-docs/ADR-LEFTHOOK-LFS-INTEGRATION.md new file mode 100644 index 00000000..4764bdc0 --- /dev/null +++ b/dev-docs/ADR-LEFTHOOK-LFS-INTEGRATION.md @@ -0,0 +1,311 @@ +--- +title: "ADR — Lefthook + Git-LFS integration: own the hook layer, don't duplicate it" +status: Accepted +date: 2026-05-11 +owner: infra +supersedes: commit 1b06804 (the `piped: true` placebo fix) +--- + +# ADR — Lefthook + Git-LFS integration + +## Status + +**Accepted, 2026-05-11.** + +## Context + +Crewli's `pre-push` git hook is managed by **lefthook v2** (introduced +in WS-3, April 2026, for `.claude-sync/` post-commit propagation and +non-blocking sync-staleness warnings on push). **Git LFS** was added +in TEST-INFRA-001 (May 2026, commit `b8d18e6`) to store visual +regression baseline PNGs without bloating the git pack. + +The interaction between the two tools is therefore **two days old**. +For the first weeks of the project, only one of the two was active at +a time — the bug below did not exist because the trigger was not yet +present. + +### Symptom: `git push` hangs in pre-push hook + +Two incidents are on record. Both manifest as `git push` blocking +indefinitely inside the `pre-push` hook with no path to recover other +than `--no-verify`: + +| Incident | Commit pushed | LFS objects in push | Symptom | +|---|---|---|---| +| 1 | `b8d18e6` (TEST-INFRA-001) | 5 PNGs | `Uploading LFS objects: 100% (5/5), 265 KB, done.` → hang | +| 2 | `99eedb6` (F2 docs) | 0 | Spinner stuck at `waiting: git-lfs` | + +The previous fix attempt — commit `1b06804`, which added +`piped: true` to the pre-push block — appeared to succeed on its own +push but failed on the next non-LFS push (incident 2). It was based +on a wrong understanding of `piped`'s semantics (see "Two misconceptions" +below). + +## Decision + +**Lefthook owns the `pre-push` hook for both sync-check AND LFS.** LFS +upload is delegated to lefthook's **built-in** pre-push LFS handler. +The `lefthook.yml` `pre-push` block contains **one** user-defined +command (`sync-check`) and nothing else. + +Concretely: + +```yaml +# lefthook.yml +pre-push: + commands: + sync-check: + run: bash .githooks/pre-push +``` + +No `git-lfs:` command. No `piped: true`. No `use_stdin: true`. No +`skip_lfs: true`. + +## Rationale + +### Lefthook already runs git-lfs internally for pre-push + +`docs/usage/features/git-lfs.md` (verbatim): + +> Lefthook runs LFS hooks internally for the following hooks: +> post-checkout, post-commit, post-merge, **pre-push**. +> Errors are suppressed if git LFS is not required for the project. + +Source confirmation in `internal/run/controller/lfs.go`: + +```go +err = c.cmd.RunWithContext( + ctx, + append([]string{"git", "lfs", hookName}, args...), + "", + c.cachedStdin, // ← buffered cached stdin with proper EOF + out, errOut, +) +``` + +The internal handler reads git's pre-push stdin **once**, caches it in +a `bytes.Buffer`, then invokes `git lfs pre-push ` with +that buffer as stdin — which delivers EOF cleanly when consumed. + +### The manual command was a literal duplicate, directly visible in logs + +Sandbox reproduction with the pre-fix `lefthook.yml` (filesystem +remote, `LEFTHOOK_VERBOSE=1`) shows both invocations in the same +push: + +``` +│ [git-lfs] executing hook: git lfs pre-push origin /tmp/... ← internal +│ [git-lfs] stdout: Uploading LFS objects: 100% (1/1), ..., done. +│ [lefthook] job: git lfs pre-push origin /tmp/... ← duplicate +│ [lefthook] run: git lfs pre-push origin /tmp/... +┃ git-lfs ❯ +``` + +After the fix (same sandbox, same data), the second invocation is +gone and the LFS upload still happens via the internal call: + +``` +│ [git-lfs] executing hook: git lfs pre-push origin /tmp/... +│ [git-lfs] stdout: Uploading LFS objects: 100% (1/1), ..., done. +│ [lefthook] run: bash .githooks/pre-push +┃ sync-check ❯ +``` + +### Sandbox falsifiability test: result and limitation + +A `/tmp/lefthook-lfs-test/` sandbox with a bare-repo file remote was +built per the Phase B spec. Both the pre-fix LFS-object push and the +pre-fix docs-only push **completed in ~0.3 seconds; the hang did not +reproduce**, despite the duplicate execution being clearly visible in +the logs. + +This means the duplicate execution is **necessary but possibly not +sufficient** for the production hang. A network remote (Gitea over +SSH/HTTPS) appears to be a required additional ingredient — the +sandbox's filesystem remote requires no LFS server handshake, so the +manual second invocation exits immediately with nothing to verify. + +Two mechanisms are plausible; the same fix addresses both: + +- **H1 (Phase A):** PTY-stdin without EOF in the manual command's + `while read` loop (`docs/configuration/use_stdin.md` verbatim: + *"without this option lefthook will hang: lefthook uses pseudo TTY + by default, and it doesn't close stdin when all data is read"*). +- **H4 (Phase B):** Server-side LFS interaction on the duplicate + call — locking, exhausted SSH session, or a stalled HTTPS response + on the verify path — that only manifests against a real Gitea LFS + endpoint. + +Because removal of the manual command is **zero-cost** (it is +redundant by construction; the internal call already does the work), +the fix is correct on architectural grounds regardless of which +mechanism is the proximate cause. + +## Two misconceptions to prevent regression + +### 1. The `--skip-repo` comment was technically correct but conceptually misleading + +The pre-fix `lefthook.yml` carried this comment on the manual command: + +```yaml +git-lfs: + # `git lfs install --skip-repo` only sets up global clean/smudge + # filters — it does NOT install the per-repo pre-push hook + # (which would conflict with lefthook). We delegate the LFS + # upload step here so screenshot baselines tracked via LFS get + # pushed alongside their commits. + run: git lfs pre-push {1} {2} +``` + +**The first two lines are accurate.** `git lfs install --skip-repo` +does set the global clean/smudge filters without writing +`.git/hooks/pre-push`. That part of the reasoning is fine and remains +the right setup for a lefthook-managed repo. + +**The third line — "we delegate the LFS upload step here" — is +wrong.** Lefthook delegates the LFS upload step **internally**, +without needing either the per-repo hook OR the manual command. +A future reviewer who reads only the first two lines may find them +compelling and revert to the manual command — that would re-introduce +the bug. + +This ADR is the authoritative correction: **no manual command. No +per-repo hook. Lefthook's built-in handles it.** + +### 2. `piped: true` does NOT control sequencing or stdin routing + +The pre-fix `lefthook.yml` justified `piped: true` as follows +(commit `1b06804`): + +> piped: true forces serial execution. Both sync-check and git-lfs +> read from stdin (git pipes the push refspec to pre-push hooks); +> default parallel execution deadlocks with two stdin readers. + +This is **wrong on two counts**: + +- Default lefthook execution is **already sequential**. + `docs/configuration/parallel.md` verbatim: *"Lefthook runs commands + and scripts sequentially by default."* `piped: true` does not + change ordering vs. the default. +- `piped: true` has **no documented relationship to stdin handling**. + `docs/configuration/piped.md` verbatim: *"Stop running commands and + scripts if one of them fail."* That is the entire semantic. + +The fix in `1b06804` was a placebo. It worked on its own push by +coincidence and failed on the next non-LFS push (incident 2). A +future reviewer should not assume `piped: true` solves any +stdin-related concern. + +## Consequences + +- **LFS uploads continue to happen on `git push`** — automatically, + via lefthook's internal pre-push LFS handler. No configuration + required. +- **If you ever need to disable lefthook's LFS handling** (e.g., + in a sub-repo or CI context that handles LFS separately), set + `skip_lfs: true` at the root of `lefthook.yml` per + `docs/usage/features/git-lfs.md`. Do **not** remove the internal + handler by adding a manual command — that re-introduces the + duplicate. +- **Adding new `pre-push` commands that read stdin** requires + `use_stdin: true` on **exactly one** command per + `docs/configuration/use_stdin.md`: + *"With many commands or scripts having `use_stdin: true`, only one + will receive the data. The others will have nothing."* + Lefthook's internal LFS handler is already that one consumer for + its cached stdin buffer, so any new stdin-reading command would + need to coordinate (most likely: rework so only one consumer + exists, or accept that the new command does not see the refspec). +- **The non-blocking pre-push sync-check** (`.githooks/pre-push`) + remains the only user-defined command. It does not read stdin, so + this constraint does not affect it. + +## Alternative scenarios (if Phase E regresses) + +If the real `git push` of `fix/lefthook-lfs-deadlock` hangs after +this fix is applied, the architectural change is correct but a +**different proximate cause** is in play. In that case: + +1. Do **not** revert this ADR. The duplicate execution is wrong + regardless. +2. Capture `LEFTHOOK_VERBOSE=1 GIT_TRACE=1 GIT_CURL_VERBOSE=1` output + from the hanging push. +3. Build a **network-y sandbox** (local Gitea container or an HTTP + stub server that responds to LFS verify-objects requests with + controlled timing) to reproduce against. The Phase B sandbox + (filesystem remote) is insufficient for this class of bug. +4. The investigation surface widens to git-lfs's interaction with + Gitea's LFS endpoint, SSH session lifetimes, and the lefthook + process tree around the `cachedStdin` buffer. Hypothesis H4 above + is the entry point. + +## How to verify + +### Sandbox procedure (Phase B reproduction; proves duplicate execution is gone) + +```bash +# Disposable sandbox +rm -rf /tmp/lefthook-lfs-test +mkdir -p /tmp/lefthook-lfs-test +git init --bare /tmp/lefthook-lfs-test/remote.git -q +git init -q /tmp/lefthook-lfs-test/work +cd /tmp/lefthook-lfs-test/work +git config user.email "t@t" && git config user.name "t" +git remote add origin /tmp/lefthook-lfs-test/remote.git + +# Mirror Crewli's config +cp /lefthook.yml . +mkdir .githooks +cp /.githooks/{pre-push,post-commit} .githooks/ +chmod +x .githooks/* + +# Install hooks; LFS filter-only +LEFTHOOK_BIN=/lefthook lefthook install +git lfs install --skip-repo +git lfs track "*.png" + +# Add an LFS-tracked file, commit, push +# (any small PNG works) +git add .gitattributes lefthook.yml .githooks .png +LEFTHOOK=0 git commit -q -m "init" +LEFTHOOK_VERBOSE=1 git push -u origin master +``` + +**Pass criteria:** +- Push completes within a few seconds (target: <5s on local FS). +- Logs show exactly **one** `[git-lfs] executing hook: git lfs pre-push` + line. +- Logs show **no** `[lefthook] run: git lfs pre-push` line (only + `[lefthook] run: bash .githooks/pre-push`). +- `Uploading LFS objects: 100% (...)` appears, proving the internal + handler did the work. + +### Production procedure (Phase E; proves the real hang is gone) + +```bash +# In the crewli repo, on any feature branch with at least one new commit: +git push -u origin # NO --no-verify +``` + +**Pass criteria:** +- Push completes within ~30 seconds (well under the 30s timeout + observed in the incidents). +- No spinner gets stuck at `waiting: git-lfs`. +- If the branch contains LFS-tracked changes: + `Uploading LFS objects: 100% (N/N), ...` appears and the push + completes. + +**If verification fails:** see "Alternative scenarios" above. Do not +revert this ADR; investigate via H4 with a network-y sandbox. + +## References + +- `docs/usage/features/git-lfs.md` (lefthook) — internal LFS hook list +- `docs/configuration/use_stdin.md` (lefthook) — PTY-stdin EOF behavior +- `docs/configuration/piped.md` (lefthook) — actual semantics of `piped` +- `docs/configuration/parallel.md` (lefthook) — default sequential execution +- `internal/run/controller/lfs.go` (lefthook) — `cachedStdin` source +- `internal/git/lfs.go` (lefthook) — pre-push in `lfsHooks` map +- `git-scm.com/docs/githooks` — pre-push stdin spec +- Crewli commits: `b8d18e6` (LFS introduced), `1b06804` (failed first-attempt fix), `99eedb6` (incident 2) diff --git a/lefthook.yml b/lefthook.yml index 4b9f2fc9..ca6c86a0 100644 --- a/lefthook.yml +++ b/lefthook.yml @@ -17,10 +17,11 @@ post-commit: run: bash .githooks/post-commit pre-push: - # piped: true forces serial execution. Both sync-check and git-lfs - # read from stdin (git pipes the push refspec to pre-push hooks); - # default parallel execution deadlocks with two stdin readers. - piped: true + # LFS is handled by lefthook's built-in pre-push LFS hook (see + # docs/usage/features/git-lfs.md). Do NOT add a manual `git lfs + # pre-push` command here — it duplicates the internal call and + # historically caused the pre-push to hang. Background and + # mechanism: dev-docs/ADR-LEFTHOOK-LFS-INTEGRATION.md. commands: sync-check: run: bash .githooks/pre-push @@ -33,10 +34,3 @@ pre-push: # behaviour. (Pushing with zero new commits would be skipped # under lefthook but is a no-op for the sync-staleness warning # anyway, so behaviour stays effectively 1:1.) - git-lfs: - # `git lfs install --skip-repo` only sets up global clean/smudge - # filters — it does NOT install the per-repo pre-push hook - # (which would conflict with lefthook). We delegate the LFS - # upload step here so screenshot baselines tracked via LFS get - # pushed alongside their commits. - run: git lfs pre-push {1} {2} diff --git a/scripts/README.md b/scripts/README.md index 14454f92..853f83d3 100644 --- a/scripts/README.md +++ b/scripts/README.md @@ -38,3 +38,16 @@ bash scripts/sync-claude-docs.sh help Upload everything in `.claude-sync/` (including `SYNC_MANIFEST.md`) to the Crewli Claude Project Knowledge, replacing existing versions. The manifest is what Claude Chat uses for drift detection against the current HEAD. + +## Lefthook + Git-LFS pre-push smoke test + +```bash +bash scripts/test-lefthook-pre-push.sh +``` + +Builds a disposable sandbox, mirrors the repo's `lefthook.yml` + `.githooks/`, +and runs a push to verify (a) push completes, (b) exactly one internal LFS +invocation, (c) zero manual `git lfs pre-push` commands, (d) LFS upload +happens. Run this after touching `lefthook.yml` or the LFS configuration. +Background and what each failure mode signals: +`dev-docs/ADR-LEFTHOOK-LFS-INTEGRATION.md`. diff --git a/scripts/test-lefthook-pre-push.sh b/scripts/test-lefthook-pre-push.sh new file mode 100755 index 00000000..6fed8fcf --- /dev/null +++ b/scripts/test-lefthook-pre-push.sh @@ -0,0 +1,145 @@ +#!/usr/bin/env bash +# Smoke test for the lefthook + git-lfs pre-push integration. +# +# Builds a disposable sandbox at SANDBOX_DIR, copies the repo's +# current lefthook.yml + .githooks/ into it, and runs a push that +# exercises both: lefthook's internal LFS handler and the sync-check +# user command. Passes when: +# +# 1. Push completes within the timeout. +# 2. Exactly one `[git-lfs] executing hook` line is present in the +# verbose log (proves no duplicate manual command). +# 3. `Uploading LFS objects: 100%` is present (proves the internal +# handler did the upload). +# +# Fails loudly otherwise. See dev-docs/ADR-LEFTHOOK-LFS-INTEGRATION.md +# for what each failure mode signals. + +set -euo pipefail + +REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)" +SANDBOX_DIR="${SANDBOX_DIR:-/tmp/lefthook-lfs-smoke-$$}" +TIMEOUT_SECS="${TIMEOUT_SECS:-30}" + +# Resolve the lefthook binary the repo already has installed. +LEFTHOOK_BIN="${LEFTHOOK_BIN:-}" +if [ -z "${LEFTHOOK_BIN}" ]; then + candidate=$(find "${REPO_ROOT}/node_modules" -type f -name lefthook \ + -path '*/lefthook-darwin-arm64/bin/*' 2>/dev/null | head -n 1) + if [ -z "${candidate}" ]; then + candidate=$(find "${REPO_ROOT}/node_modules" -type f -name lefthook \ + 2>/dev/null | head -n 1) + fi + LEFTHOOK_BIN="${candidate}" +fi + +if [ -z "${LEFTHOOK_BIN}" ] || [ ! -x "${LEFTHOOK_BIN}" ]; then + echo "FAIL: could not locate lefthook binary (set LEFTHOOK_BIN explicitly)" >&2 + exit 2 +fi + +if ! command -v git-lfs >/dev/null 2>&1; then + echo "FAIL: git-lfs not installed on this host" >&2 + exit 2 +fi + +cleanup() { rm -rf "${SANDBOX_DIR}"; } +trap cleanup EXIT + +echo "[smoke] sandbox: ${SANDBOX_DIR}" +echo "[smoke] lefthook: ${LEFTHOOK_BIN}" + +mkdir -p "${SANDBOX_DIR}" +git init --bare -q "${SANDBOX_DIR}/remote.git" +git init -q "${SANDBOX_DIR}/work" +cd "${SANDBOX_DIR}/work" +git config user.email "smoke@crewli.local" +git config user.name "smoke" +git remote add origin "${SANDBOX_DIR}/remote.git" + +# Mirror the repo's hook layer +cp "${REPO_ROOT}/lefthook.yml" . +mkdir .githooks +cp "${REPO_ROOT}/.githooks/pre-push" .githooks/ +cp "${REPO_ROOT}/.githooks/post-commit" .githooks/ +chmod +x .githooks/* + +LEFTHOOK_BIN="${LEFTHOOK_BIN}" "${LEFTHOOK_BIN}" install >/dev/null +git lfs install --skip-repo >/dev/null +git lfs track "*.png" >/dev/null + +# Tiny valid PNG so LFS has something to push +python3 - <<'PY' +import struct, zlib +hdr = b'\x89PNG\r\n\x1a\n' +def chunk(t, d): + return struct.pack('>I', len(d)) + t + d + struct.pack('>I', zlib.crc32(t + d)) +ihdr = chunk(b'IHDR', struct.pack('>IIBBBBB', 1, 1, 8, 2, 0, 0, 0)) +idat = chunk(b'IDAT', zlib.compress(b'\x00\xff\xff\xff', 9)) +iend = chunk(b'IEND', b'') +open('smoke.png', 'wb').write(hdr + ihdr + idat + iend) +PY + +git add .gitattributes lefthook.yml .githooks smoke.png +LEFTHOOK=0 git commit -q -m "smoke test" + +LOG_FILE="${SANDBOX_DIR}/push.log" + +# Background push; kill if it exceeds the timeout +( + cd "${SANDBOX_DIR}/work" + LEFTHOOK_BIN="${LEFTHOOK_BIN}" \ + LEFTHOOK_VERBOSE=1 \ + git push -u origin master >"${LOG_FILE}" 2>&1 + echo "EXIT=$?" >>"${LOG_FILE}" +) & +PUSH_PID=$! + +elapsed=0 +while kill -0 "${PUSH_PID}" 2>/dev/null; do + if [ "${elapsed}" -ge "${TIMEOUT_SECS}" ]; then + kill -KILL "${PUSH_PID}" 2>/dev/null || true + pkill -KILL -P "${PUSH_PID}" 2>/dev/null || true + echo "[smoke] FAIL: push exceeded ${TIMEOUT_SECS}s timeout" >&2 + echo "[smoke] partial log:" >&2 + cat "${LOG_FILE}" >&2 || true + exit 1 + fi + sleep 1 + elapsed=$((elapsed + 1)) +done + +wait "${PUSH_PID}" 2>/dev/null || true + +if ! grep -q '^EXIT=0$' "${LOG_FILE}"; then + echo "[smoke] FAIL: push exited non-zero" >&2 + cat "${LOG_FILE}" >&2 + exit 1 +fi + +# Exactly one internal LFS invocation; zero manual ones +internal_count=$(grep -c '\[git-lfs\] executing hook' "${LOG_FILE}" || true) +manual_count=$(grep -c '\[lefthook\] run: git lfs pre-push' "${LOG_FILE}" || true) +upload_marker=$(grep -c 'Uploading LFS objects: 100%' "${LOG_FILE}" || true) + +if [ "${internal_count}" != "1" ]; then + echo "[smoke] FAIL: expected 1 internal LFS invocation, found ${internal_count}" >&2 + cat "${LOG_FILE}" >&2 + exit 1 +fi + +if [ "${manual_count}" != "0" ]; then + echo "[smoke] FAIL: a manual 'git lfs pre-push' command is present" \ + "(${manual_count} occurrences). The duplicate-execution regression has returned." >&2 + echo "[smoke] See dev-docs/ADR-LEFTHOOK-LFS-INTEGRATION.md for why this fails." >&2 + cat "${LOG_FILE}" >&2 + exit 1 +fi + +if [ "${upload_marker}" = "0" ]; then + echo "[smoke] FAIL: no 'Uploading LFS objects' marker in log" >&2 + cat "${LOG_FILE}" >&2 + exit 1 +fi + +echo "[smoke] PASS: push completed; 1 internal LFS call, 0 manual, LFS upload confirmed."