Files
crewli/dev-docs/ADR-LEFTHOOK-LFS-INTEGRATION.md
bert.hausmans 37af961b3e fix(lefthook): remove duplicate git-lfs pre-push command
Lefthook v2 runs `git lfs pre-push` internally for pre-push hooks (per
docs/usage/features/git-lfs.md; confirmed in internal/run/controller/
lfs.go where the internal handler invokes `git lfs pre-push <remote>
<url>` with a buffered `cachedStdin`). Our manual `git-lfs:` command
in lefthook.yml was a second invocation against the same remote; the
duplicate is directly visible in `LEFTHOOK_VERBOSE=1` output as
`[git-lfs] executing hook` (internal) followed by `[lefthook] run:
git lfs pre-push` (manual).

The previous fix attempt (piped: true, commit 1b06804) was based on a
wrong understanding of `piped`'s semantics — `piped` controls
fail-fast behavior, not stdin routing or sequencing. Default lefthook
behavior is already sequential per docs/configuration/parallel.md.
That "fix" was placebo; incident 2 (F2 push, zero LFS objects, commit
99eedb6) proved it.

Phase A investigation: documentary + source confirmation that lefthook
owns the LFS pre-push call. Phase B sandbox test against a filesystem
remote confirmed the duplicate execution in logs but did NOT reproduce
the production hang — likely because the duplicate manual call against
a local remote has no LFS server to interact with. A network-y remote
(Gitea over SSH/HTTPS) appears to be part of the trigger. Two
mechanisms remain plausible (H1: PTY-stdin without EOF in
`while read` loop per docs/configuration/use_stdin.md; H4: server-side
LFS interaction on the duplicate call). Both are eliminated by the
same fix: remove the manual command. LFS uploads continue to work via
lefthook's internal handler (verified in sandbox post-fix).

Regression coverage: scripts/test-lefthook-pre-push.sh asserts exactly
one internal LFS invocation, zero manual ones, and `Uploading LFS
objects: 100%` present, against a disposable sandbox.

See dev-docs/ADR-LEFTHOOK-LFS-INTEGRATION.md for full context, both
misconceptions to prevent regression, and the alternative-scenarios
playbook if Phase E ever regresses.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-05-11 00:18:56 +02:00

312 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: "ADR — Lefthook + Git-LFS integration: own the hook layer, don't duplicate it"
status: Accepted
date: 2026-05-11
owner: infra
supersedes: commit 1b06804 (the `piped: true` placebo fix)
---
# ADR — Lefthook + Git-LFS integration
## Status
**Accepted, 2026-05-11.**
## Context
Crewli's `pre-push` git hook is managed by **lefthook v2** (introduced
in WS-3, April 2026, for `.claude-sync/` post-commit propagation and
non-blocking sync-staleness warnings on push). **Git LFS** was added
in TEST-INFRA-001 (May 2026, commit `b8d18e6`) to store visual
regression baseline PNGs without bloating the git pack.
The interaction between the two tools is therefore **two days old**.
For the first weeks of the project, only one of the two was active at
a time — the bug below did not exist because the trigger was not yet
present.
### Symptom: `git push` hangs in pre-push hook
Two incidents are on record. Both manifest as `git push` blocking
indefinitely inside the `pre-push` hook with no path to recover other
than `--no-verify`:
| Incident | Commit pushed | LFS objects in push | Symptom |
|---|---|---|---|
| 1 | `b8d18e6` (TEST-INFRA-001) | 5 PNGs | `Uploading LFS objects: 100% (5/5), 265 KB, done.` → hang |
| 2 | `99eedb6` (F2 docs) | 0 | Spinner stuck at `waiting: git-lfs` |
The previous fix attempt — commit `1b06804`, which added
`piped: true` to the pre-push block — appeared to succeed on its own
push but failed on the next non-LFS push (incident 2). It was based
on a wrong understanding of `piped`'s semantics (see "Two misconceptions"
below).
## Decision
**Lefthook owns the `pre-push` hook for both sync-check AND LFS.** LFS
upload is delegated to lefthook's **built-in** pre-push LFS handler.
The `lefthook.yml` `pre-push` block contains **one** user-defined
command (`sync-check`) and nothing else.
Concretely:
```yaml
# lefthook.yml
pre-push:
commands:
sync-check:
run: bash .githooks/pre-push
```
No `git-lfs:` command. No `piped: true`. No `use_stdin: true`. No
`skip_lfs: true`.
## Rationale
### Lefthook already runs git-lfs internally for pre-push
`docs/usage/features/git-lfs.md` (verbatim):
> Lefthook runs LFS hooks internally for the following hooks:
> post-checkout, post-commit, post-merge, **pre-push**.
> Errors are suppressed if git LFS is not required for the project.
Source confirmation in `internal/run/controller/lfs.go`:
```go
err = c.cmd.RunWithContext(
ctx,
append([]string{"git", "lfs", hookName}, args...),
"",
c.cachedStdin, // ← buffered cached stdin with proper EOF
out, errOut,
)
```
The internal handler reads git's pre-push stdin **once**, caches it in
a `bytes.Buffer`, then invokes `git lfs pre-push <remote> <url>` with
that buffer as stdin — which delivers EOF cleanly when consumed.
### The manual command was a literal duplicate, directly visible in logs
Sandbox reproduction with the pre-fix `lefthook.yml` (filesystem
remote, `LEFTHOOK_VERBOSE=1`) shows both invocations in the same
push:
```
│ [git-lfs] executing hook: git lfs pre-push origin /tmp/... ← internal
│ [git-lfs] stdout: Uploading LFS objects: 100% (1/1), ..., done.
│ [lefthook] job: git lfs pre-push origin /tmp/... ← duplicate
│ [lefthook] run: git lfs pre-push origin /tmp/...
┃ git-lfs
```
After the fix (same sandbox, same data), the second invocation is
gone and the LFS upload still happens via the internal call:
```
│ [git-lfs] executing hook: git lfs pre-push origin /tmp/...
│ [git-lfs] stdout: Uploading LFS objects: 100% (1/1), ..., done.
│ [lefthook] run: bash .githooks/pre-push
┃ sync-check
```
### Sandbox falsifiability test: result and limitation
A `/tmp/lefthook-lfs-test/` sandbox with a bare-repo file remote was
built per the Phase B spec. Both the pre-fix LFS-object push and the
pre-fix docs-only push **completed in ~0.3 seconds; the hang did not
reproduce**, despite the duplicate execution being clearly visible in
the logs.
This means the duplicate execution is **necessary but possibly not
sufficient** for the production hang. A network remote (Gitea over
SSH/HTTPS) appears to be a required additional ingredient — the
sandbox's filesystem remote requires no LFS server handshake, so the
manual second invocation exits immediately with nothing to verify.
Two mechanisms are plausible; the same fix addresses both:
- **H1 (Phase A):** PTY-stdin without EOF in the manual command's
`while read` loop (`docs/configuration/use_stdin.md` verbatim:
*"without this option lefthook will hang: lefthook uses pseudo TTY
by default, and it doesn't close stdin when all data is read"*).
- **H4 (Phase B):** Server-side LFS interaction on the duplicate
call — locking, exhausted SSH session, or a stalled HTTPS response
on the verify path — that only manifests against a real Gitea LFS
endpoint.
Because removal of the manual command is **zero-cost** (it is
redundant by construction; the internal call already does the work),
the fix is correct on architectural grounds regardless of which
mechanism is the proximate cause.
## Two misconceptions to prevent regression
### 1. The `--skip-repo` comment was technically correct but conceptually misleading
The pre-fix `lefthook.yml` carried this comment on the manual command:
```yaml
git-lfs:
# `git lfs install --skip-repo` only sets up global clean/smudge
# filters — it does NOT install the per-repo pre-push hook
# (which would conflict with lefthook). We delegate the LFS
# upload step here so screenshot baselines tracked via LFS get
# pushed alongside their commits.
run: git lfs pre-push {1} {2}
```
**The first two lines are accurate.** `git lfs install --skip-repo`
does set the global clean/smudge filters without writing
`.git/hooks/pre-push`. That part of the reasoning is fine and remains
the right setup for a lefthook-managed repo.
**The third line — "we delegate the LFS upload step here" — is
wrong.** Lefthook delegates the LFS upload step **internally**,
without needing either the per-repo hook OR the manual command.
A future reviewer who reads only the first two lines may find them
compelling and revert to the manual command — that would re-introduce
the bug.
This ADR is the authoritative correction: **no manual command. No
per-repo hook. Lefthook's built-in handles it.**
### 2. `piped: true` does NOT control sequencing or stdin routing
The pre-fix `lefthook.yml` justified `piped: true` as follows
(commit `1b06804`):
> piped: true forces serial execution. Both sync-check and git-lfs
> read from stdin (git pipes the push refspec to pre-push hooks);
> default parallel execution deadlocks with two stdin readers.
This is **wrong on two counts**:
- Default lefthook execution is **already sequential**.
`docs/configuration/parallel.md` verbatim: *"Lefthook runs commands
and scripts sequentially by default."* `piped: true` does not
change ordering vs. the default.
- `piped: true` has **no documented relationship to stdin handling**.
`docs/configuration/piped.md` verbatim: *"Stop running commands and
scripts if one of them fail."* That is the entire semantic.
The fix in `1b06804` was a placebo. It worked on its own push by
coincidence and failed on the next non-LFS push (incident 2). A
future reviewer should not assume `piped: true` solves any
stdin-related concern.
## Consequences
- **LFS uploads continue to happen on `git push`** — automatically,
via lefthook's internal pre-push LFS handler. No configuration
required.
- **If you ever need to disable lefthook's LFS handling** (e.g.,
in a sub-repo or CI context that handles LFS separately), set
`skip_lfs: true` at the root of `lefthook.yml` per
`docs/usage/features/git-lfs.md`. Do **not** remove the internal
handler by adding a manual command — that re-introduces the
duplicate.
- **Adding new `pre-push` commands that read stdin** requires
`use_stdin: true` on **exactly one** command per
`docs/configuration/use_stdin.md`:
*"With many commands or scripts having `use_stdin: true`, only one
will receive the data. The others will have nothing."*
Lefthook's internal LFS handler is already that one consumer for
its cached stdin buffer, so any new stdin-reading command would
need to coordinate (most likely: rework so only one consumer
exists, or accept that the new command does not see the refspec).
- **The non-blocking pre-push sync-check** (`.githooks/pre-push`)
remains the only user-defined command. It does not read stdin, so
this constraint does not affect it.
## Alternative scenarios (if Phase E regresses)
If the real `git push` of `fix/lefthook-lfs-deadlock` hangs after
this fix is applied, the architectural change is correct but a
**different proximate cause** is in play. In that case:
1. Do **not** revert this ADR. The duplicate execution is wrong
regardless.
2. Capture `LEFTHOOK_VERBOSE=1 GIT_TRACE=1 GIT_CURL_VERBOSE=1` output
from the hanging push.
3. Build a **network-y sandbox** (local Gitea container or an HTTP
stub server that responds to LFS verify-objects requests with
controlled timing) to reproduce against. The Phase B sandbox
(filesystem remote) is insufficient for this class of bug.
4. The investigation surface widens to git-lfs's interaction with
Gitea's LFS endpoint, SSH session lifetimes, and the lefthook
process tree around the `cachedStdin` buffer. Hypothesis H4 above
is the entry point.
## How to verify
### Sandbox procedure (Phase B reproduction; proves duplicate execution is gone)
```bash
# Disposable sandbox
rm -rf /tmp/lefthook-lfs-test
mkdir -p /tmp/lefthook-lfs-test
git init --bare /tmp/lefthook-lfs-test/remote.git -q
git init -q /tmp/lefthook-lfs-test/work
cd /tmp/lefthook-lfs-test/work
git config user.email "t@t" && git config user.name "t"
git remote add origin /tmp/lefthook-lfs-test/remote.git
# Mirror Crewli's config
cp <crewli-root>/lefthook.yml .
mkdir .githooks
cp <crewli-root>/.githooks/{pre-push,post-commit} .githooks/
chmod +x .githooks/*
# Install hooks; LFS filter-only
LEFTHOOK_BIN=<path-to>/lefthook lefthook install
git lfs install --skip-repo
git lfs track "*.png"
# Add an LFS-tracked file, commit, push
# (any small PNG works)
git add .gitattributes lefthook.yml .githooks <some>.png
LEFTHOOK=0 git commit -q -m "init"
LEFTHOOK_VERBOSE=1 git push -u origin master
```
**Pass criteria:**
- Push completes within a few seconds (target: <5s on local FS).
- Logs show exactly **one** `[git-lfs] executing hook: git lfs pre-push`
line.
- Logs show **no** `[lefthook] run: git lfs pre-push` line (only
`[lefthook] run: bash .githooks/pre-push`).
- `Uploading LFS objects: 100% (...)` appears, proving the internal
handler did the work.
### Production procedure (Phase E; proves the real hang is gone)
```bash
# In the crewli repo, on any feature branch with at least one new commit:
git push -u origin <branch> # NO --no-verify
```
**Pass criteria:**
- Push completes within ~30 seconds (well under the 30s timeout
observed in the incidents).
- No spinner gets stuck at `waiting: git-lfs`.
- If the branch contains LFS-tracked changes:
`Uploading LFS objects: 100% (N/N), ...` appears and the push
completes.
**If verification fails:** see "Alternative scenarios" above. Do not
revert this ADR; investigate via H4 with a network-y sandbox.
## References
- `docs/usage/features/git-lfs.md` (lefthook) — internal LFS hook list
- `docs/configuration/use_stdin.md` (lefthook) — PTY-stdin EOF behavior
- `docs/configuration/piped.md` (lefthook) — actual semantics of `piped`
- `docs/configuration/parallel.md` (lefthook) — default sequential execution
- `internal/run/controller/lfs.go` (lefthook) — `cachedStdin` source
- `internal/git/lfs.go` (lefthook) — pre-push in `lfsHooks` map
- `git-scm.com/docs/githooks` — pre-push stdin spec
- Crewli commits: `b8d18e6` (LFS introduced), `1b06804` (failed first-attempt fix), `99eedb6` (incident 2)