[M5+] add resolver::nixpkgs_git_resolve fallback

2026-05-10 12:19:25 +00:00
parent df2c25b559
commit cb82e918d8
7 changed files with 622 additions and 0 deletions

docs/version-resolution.md

@@ -0,0 +1,339 @@
# Version-resolution algorithm
Status: in progress (Phases 1-2 of 6 done). This doc fixes the contract
for **`(package, version) → nixpkgs commit_hash`** discovery and the
flake-codegen pipeline that consumes it. It overrides `SPEC.md` §10's
single-shared-rev model with a per-dep-rev model (user-directed; SPEC
amendment is Phase 6).
## Overview
```
            cargoxx add <pkg>@<ver>
                       │
                       ▼
            ┌──────────────────────┐
            │ resolve_version(name,│
            │             version) │
            └──────────────────────┘
                 │            │
    primary HTTP │            │ offline fallback
                 ▼            ▼
┌──────────────────┐    ┌──────────────────────┐
│ devbox_resolve   │    │ nixpkgs_git_resolve  │
│ search.devbox.sh │    │ ~/.cache/cargoxx/    │
│ /v1/resolve      │    │ nixpkgs/ (lazy)      │
└──────────────────┘    └──────────────────────┘
                 │            │
                 └─────┬──────┘
                       ▼
      Result<std::string /*commit_hash*/>
                       │
                       ▼
 cmd_add writes nixpkgs_rev into Cargoxx.lock
                       │
                       ▼ (later)
                 cargoxx build
                       │
                       ▼
       codegen::flake_nix reads lockfile
       emits per-pinned-dep nixpkgs input
```
## When does resolution run?
| Trigger | What gets resolved |
| --- | --- |
| `cargoxx add <pkg>@<ver>` | `(pkg, ver)` is resolved exactly once. The resulting commit is written to `Cargoxx.lock` next to the dep entry. |
| `cargoxx add <pkg>` (no `@<ver>`) | **Not** resolved. Lockfile entry's `nixpkgs_rev` stays `nullopt`. The generated flake.nix uses only the shared `nixpkgs.url = github:NixOS/nixpkgs/nixos-unstable`. |
| `cargoxx build` (lockfile already has rev) | **Not re-resolved.** `cargoxx build` reads existing lockfile entries and preserves `nixpkgs_rev`. Re-resolution would require an explicit `cargoxx update` (deferred to v0.3). |
| `cargoxx build` (lockfile missing the rev for a dep) | Synthesized as null — same as the wildcard path. (Future: also call `resolve_version` here when manifest spec is concrete.) |
`cargoxx build` is **idempotent with respect to the lockfile**:
running it twice produces byte-identical `flake.nix` + `Cargoxx.lock`
provided the manifest hasn't changed. This is the property the
"lockfile merge" change in Phase 4 enforces.
## resolve_version
```
auto resolve_version(name: string, version: string) -> Result<string /*sha40*/>:
    if r := devbox_resolve(name, version); r.has_value():
        return r->commit_hash
    if r := nixpkgs_git_resolve(name, version); r.has_value():
        return *r
    return std::unexpected(ResolutionVersionNotFound)
```
Implementation point: this orchestrator lives in
`src/resolver/resolver.cppm` (declaration) +
`src/resolver/version_resolve.cpp` (definition). Both probes are
already implemented — Phase 3 just wires them into the orchestrator
and into `cmd_add`.
### Probe A — devbox_resolve (primary, HTTP)
**File:** `src/resolver/search_devbox.cpp` (committed `df2c25b`)
**URL pattern:**
```
GET https://search.devbox.sh/v1/resolve?name=<urlencoded-name>&version=<urlencoded-version>
```
This is the same endpoint devbox itself uses
(`devbox/internal/searcher/client.go` `Resolve()`). Behind the URL is
the same Jetify backend that powers nixhub.io.
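
A minimal sketch of how the request URL above could be assembled. The helper names `url_encode` and `devbox_resolve_url` are hypothetical and are not taken from `search_devbox.cpp`; the encoding rule (percent-encode everything outside the RFC 3986 unreserved set) is an assumption about what `<urlencoded-…>` means here.

```cpp
#include <string>
#include <string_view>

// Percent-encode every byte outside the RFC 3986 "unreserved" set
// (ASCII letters, digits, '-', '_', '.', '~').
std::string url_encode(std::string_view raw) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(raw.size());
    for (unsigned char c : raw) {
        const bool unreserved =
            (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '_' ||
            c == '.' || c == '~';
        if (unreserved) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 0x0F]);
        }
    }
    return out;
}

// Hypothetical helper: builds the Probe A request URL.
std::string devbox_resolve_url(std::string_view name, std::string_view version) {
    return "https://search.devbox.sh/v1/resolve?name=" + url_encode(name) +
           "&version=" + url_encode(version);
}
```

Encoding matters mostly for names such as `c++`-style packages, where a raw `+` would otherwise be read as a space by the query parser.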
**Response shape (real, abbreviated for `fmt 10.2.1`):**
```json
{
  "commit_hash": "f4b140d5b253f5e2a1ff4e5506edbf8267724bde",
  "version": "10.2.1",
  "name": "fmt",
  "attr_paths": ["fmt"],
  "systems": {
    "x86_64-linux": {
      "commit_hash": "f4b140d5b253f5e2a1ff4e5506edbf8267724bde",
      "attr_paths": ["fmt"], ...
    }, ...
  }
}
```
**Parser contract** (`parse_devbox_resolve`):
- `commit_hash` is mandatory. If the top-level field is missing, fall
back to the first non-empty `systems.<plat>.commit_hash`.
- `name`, `version`, `attr_paths` are best-effort; absence leaves them
blank.
- 404 / curl exit 22 → `ResolutionUnknownPackage`.
- Empty `commit_hash` after fallback → `ResolutionVersionNotFound`.
- Other curl exits, JSON parse errors → `ResolutionNetworkError`.
**Timeout:** 10 s via curl's `--max-time`, plus a 15 s outer limit from
`ExecOptions.timeout` on the wrapping process.
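
The parser contract above can be sketched as two small pure functions. This is an illustration only: `DevboxFields`, `ResolveError`, `effective_commit_hash`, and `classify` are hypothetical stand-ins, and JSON decoding is assumed to have happened already (its success is passed in as `json_ok`).

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the resolver's real error and result types.
enum class ResolveError { UnknownPackage, VersionNotFound, NetworkError };

struct DevboxFields {
    std::string commit_hash;  // top-level field, empty if absent
    // (platform, commit_hash) pairs from "systems", in document order
    std::vector<std::pair<std::string, std::string>> systems;
};

// The top-level hash wins; otherwise take the first non-empty per-system hash.
std::optional<std::string> effective_commit_hash(const DevboxFields& f) {
    if (!f.commit_hash.empty()) return f.commit_hash;
    for (const auto& entry : f.systems) {
        if (!entry.second.empty()) return entry.second;
    }
    return std::nullopt;
}

// Maps one probe outcome to the error named in the contract.
// Returns nullopt when the probe succeeded.
std::optional<ResolveError> classify(int curl_exit, bool json_ok,
                                     const DevboxFields& f) {
    if (curl_exit == 22) return ResolveError::UnknownPackage;  // 404 per the contract
    if (curl_exit != 0 || !json_ok) return ResolveError::NetworkError;
    if (!effective_commit_hash(f)) return ResolveError::VersionNotFound;
    return std::nullopt;
}
```

Checking exit 22 before anything else keeps a 404 from being reported as a generic network error.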
### Probe B — nixpkgs_git_resolve (offline fallback)
**File:** `src/resolver/nixpkgs_git.cpp` (committed in Phase 2 series)
**Setup:** lazy clone of
`https://github.com/NixOS/nixpkgs.git` into
`$XDG_CACHE_HOME/cargoxx/nixpkgs/` (or `$HOME/.cache/...`) on first
use. The clone is ~9 GB and slow (5-15 min); subsequent calls are fast and offline.
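
The cache-location rule could look roughly like the following. The function name `nixpkgs_cache_dir` is hypothetical, and passing the environment values in as parameters (instead of calling `std::getenv` inside) is a choice made here to keep the sketch testable, not something the source prescribes.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: picks the clone directory from the two environment
// values. An empty or missing XDG_CACHE_HOME falls back to $HOME/.cache.
std::optional<std::string> nixpkgs_cache_dir(
    const std::optional<std::string>& xdg_cache_home,
    const std::optional<std::string>& home) {
    if (xdg_cache_home && !xdg_cache_home->empty()) {
        return *xdg_cache_home + "/cargoxx/nixpkgs";
    }
    if (home && !home->empty()) {
        return *home + "/.cache/cargoxx/nixpkgs";
    }
    return std::nullopt;  // neither variable usable; the caller reports an error
}
```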
**Search:**
```
git -C <repo> log --all \
    -S 'version = "<version>"' \
    --pretty='%H %ct' \
    -- pkgs/
```
`-S '<term>'` returns commits that *introduced or removed* the literal
string. `--pretty='%H %ct'` emits `<sha40> <committer-time>` per
line. We restrict to `pkgs/` to keep noise down (out-of-tree match
sites in `lib/`, `nixos/`, etc. don't matter).
**Pick:** youngest committer-time (`%ct` highest) wins. The pure
helper `pick_youngest_commit(text)` does this; it tolerates malformed
lines (skips them).
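
A sketch of what `pick_youngest_commit` might look like, assuming the `<sha40> <committer-time>` line format described above. The signature (taking a `std::string_view`, returning `std::optional<std::string>`) and the exact validation rules are assumptions; the real helper in `nixpkgs_git.cpp` may differ.

```cpp
#include <cctype>
#include <charconv>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <string_view>
#include <system_error>

// Returns the sha of the line with the highest committer time, or nullopt
// when no line parses. Malformed lines are skipped, not treated as errors.
std::optional<std::string> pick_youngest_commit(std::string_view text) {
    std::optional<std::string> best_sha;
    long long best_time = 0;

    std::istringstream lines{std::string(text)};
    std::string line;
    while (std::getline(lines, line)) {
        // Expect exactly: 40 hex chars, one space, then decimal digits.
        if (line.size() < 42 || line[40] != ' ') continue;
        bool hex_ok = true;
        for (std::size_t i = 0; i < 40; ++i) {
            if (!std::isxdigit(static_cast<unsigned char>(line[i]))) hex_ok = false;
        }
        if (!hex_ok) continue;

        long long t = 0;
        const char* first = line.data() + 41;
        const char* last = line.data() + line.size();
        auto [ptr, ec] = std::from_chars(first, last, t);
        if (ec != std::errc{} || ptr != last) continue;  // trailing junk

        if (!best_sha || t > best_time) {
            best_sha = line.substr(0, 40);
            best_time = t;
        }
    }
    return best_sha;
}
```

When two lines share the same timestamp, this version keeps the first one seen; the source does not say how ties should be broken.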
**Errors:**
- `pick_youngest_commit` returns `nullopt` → `ResolutionVersionNotFound`.
- Clone failure → `ResolutionNetworkError`.
- Subsequent `git log` failure → `ResolutionNetworkError`.
**Test fixture trick:** instead of cloning real nixpkgs in tests, the
unit test builds a tiny throwaway repo with
`pkgs/development/libraries/<pkg>/default.nix` files at two versions
and asserts introducing-commit detection works.
### Heuristic limits
`-S 'version = "<v>"'` is fuzzy — it matches **any** file in `pkgs/`
that has that literal. Two real-world failure modes:
1. **Unrelated package match.** `version = "1.0.0"` appears in many
nix derivations. The youngest-commit tiebreaker biases toward
"the most recent thing that touched this string", which usually
*is* the package's bump commit, but this is not guaranteed.
2. **Non-string-formed versions.** Some derivations build the version
via `lib.removeSuffix`, interpolation, or an inherited
`pname`/`finalAttrs.version`. `-S` won't see those. For those
packages, only the devbox HTTP path can answer.
Both are accepted as known limits — the HTTP path is primary and fast
when reachable; the git fallback exists only for offline determinism.
## Lockfile interaction
`Cargoxx.lock` already carries `LockfilePackage.nixpkgs_rev`
(`std::optional<std::string>`). No schema change.
### Add path
`cmd_add fmt@10.2.1`:
1. existing manifest validation, duplicate check, linkdb resolve /
discover (separate auto-resolution feature, already shipped).
2. **NEW:** call `resolve_version("fmt", "10.2.1")`. On success,
capture `commit_hash`.
3. existing manifest write of `[dependencies] fmt = "10.2.1"`.
4. **NEW:** load lockfile (or initialize empty), find/insert the
`LockfilePackage{ name="fmt", version="10.2.1" }` entry, set
`nixpkgs_rev = "<commit_hash>"`, write lockfile back.
`cmd_add fmt` (wildcard) skips step 2 and step 4's `nixpkgs_rev`
assignment.
### Build path (Phase 4 fix)
Today, `synthesize_lockfile` overwrites the lockfile every time. With
per-dep revs in scope this would erase pinned revs on every build.
The fix:
```
build_lockfile(manifest, recipes):
    let prior = parse(project_root / "Cargoxx.lock") or empty
    for each dep in manifest.dependencies:
        let prior_entry = prior.find(dep.name, dep.version_spec)
        new_entry = LockfilePackage{ name, version=dep.version_spec, ... }
        if prior_entry: new_entry.nixpkgs_rev = prior_entry.nixpkgs_rev
        emit new_entry
```
The lookup key is `(name, version)`. If the user changes the version,
the prior rev is dropped (correct, since the rev was for the old version).
If the user has neither edited the version nor run `cargoxx update`,
the rev survives.
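
The merge rule can be sketched in C++ as follows. `LockfilePackage` is reduced here to the three fields this doc mentions, and `ManifestDep` and `merge_lockfile` are hypothetical names; parsing and writing the lockfile are left out.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Reduced to the fields discussed in this doc; the real struct has more.
struct LockfilePackage {
    std::string name;
    std::string version;
    std::optional<std::string> nixpkgs_rev;
};

// Hypothetical view of one manifest dependency.
struct ManifestDep {
    std::string name;
    std::string version_spec;
};

// Rebuilds the lockfile from the manifest, carrying over nixpkgs_rev only
// when both name and version match a prior entry.
std::vector<LockfilePackage> merge_lockfile(
    const std::vector<ManifestDep>& deps,
    const std::vector<LockfilePackage>& prior) {
    std::vector<LockfilePackage> out;
    out.reserve(deps.size());
    for (const auto& dep : deps) {
        LockfilePackage entry{dep.name, dep.version_spec, std::nullopt};
        for (const auto& old : prior) {
            if (old.name == dep.name && old.version == dep.version_spec) {
                entry.nixpkgs_rev = old.nixpkgs_rev;  // preserve the pinned rev
                break;
            }
        }
        out.push_back(std::move(entry));
    }
    return out;
}
```

Because the output depends only on the manifest and the prior lockfile, running the merge on its own output yields the same entries, which is the idempotence property the build path relies on.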
### Update path (deferred to v0.3)
`cargoxx update <pkg>` would call `resolve_version` again with the
existing manifest version_spec, possibly upgrading the rev to a
newer one, even when the user-visible version string is unchanged.
Out of scope for this milestone.
## Flake codegen — per-dep inputs
**Phase 5.** Today's `flake.nix` template has a single
`@@NIXPKGS_REV@@` placeholder. The new template emits:
### Inputs block
```nix
inputs = {
  nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
  # one line per dep with non-null nixpkgs_rev:
  nixpkgs-fmt-10_2_1.url = "github:NixOS/nixpkgs/f4b140d5b...";
  nixpkgs-spdlog-1_13_0.url = "github:NixOS/nixpkgs/abcdef0...";
  flake-utils.url = "github:numtide/flake-utils";
};
```
### Outputs lambda
```nix
outputs = { self, nixpkgs, nixpkgs-fmt-10_2_1, nixpkgs-spdlog-1_13_0,
flake-utils }: ...
```
### Let bindings
```nix
let
  pkgs = import nixpkgs { inherit system; };
  pkgs_fmt_10_2_1 = import nixpkgs-fmt-10_2_1 { inherit system; };
  pkgs_spdlog_1_13_0 = import nixpkgs-spdlog-1_13_0 { inherit system; };
  llvmPkgs = pkgs.llvmPackages;
in {...}
```
### buildInputs
```nix
buildInputs = [
  pkgs_fmt_10_2_1.fmt        # pinned dep
  pkgs_spdlog_1_13_0.spdlog  # pinned dep
  pkgs.zlib                  # unpinned: uses default nixpkgs
];
```
Unpinned deps (where `nixpkgs_rev` is null) reference the shared
`pkgs` set as today.
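
As an illustration of the generator side, the inputs block could be emitted like this. `LockedDep` and `emit_inputs_block` are hypothetical names, and the input attr is assumed to be already sanitized by the time it reaches this function.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical codegen view of one lockfile entry.
struct LockedDep {
    std::string input_attr;                  // e.g. "nixpkgs-fmt-10_2_1"
    std::optional<std::string> nixpkgs_rev;  // nullopt for unpinned deps
};

// Emits the shared nixpkgs input, one extra input per pinned dep, and
// flake-utils, in that order.
std::string emit_inputs_block(const std::vector<LockedDep>& deps) {
    std::string out = "inputs = {\n";
    out += "  nixpkgs.url = \"github:NixOS/nixpkgs/nixos-unstable\";\n";
    for (const auto& dep : deps) {
        if (!dep.nixpkgs_rev) continue;  // unpinned deps use the shared input
        out += "  " + dep.input_attr + ".url = \"github:NixOS/nixpkgs/" +
               *dep.nixpkgs_rev + "\";\n";
    }
    out += "  flake-utils.url = \"github:numtide/flake-utils\";\n";
    out += "};\n";
    return out;
}
```

Iterating in lockfile order keeps the output deterministic, which the byte-identical rebuild check depends on.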
### Sanitization
Helper in `src/codegen/flake.cpp`:
```cpp
auto sanitize_input_attr(std::string_view name, std::string_view version)
-> std::string;
```
Steps:
1. Concatenate `nixpkgs-<name>-<version>`.
2. Replace every char outside `[a-zA-Z0-9_-]` with `_`. In practice
this mostly converts dots in versions: `10.2.1` → `10_2_1`.
3. Use the sanitized form in **all three** places: `inputs.<attr>`,
the `outputs = { …, <attr>, … }` parameter list, and the `let`
binding, which drops the `nixpkgs-` prefix and takes the form
`pkgs_<name-version-with-dashes-as-underscores>`.
Examples:
- `fmt` + `10.2.1` → input attr `nixpkgs-fmt-10_2_1`,
`let` binding `pkgs_fmt_10_2_1`
- `range-v3` + `0.12.0` → `nixpkgs-range-v3-0_12_0`,
`pkgs_range_v3_0_12_0`
- `boost_filesystem` + `1.84.0` → `nixpkgs-boost_filesystem-1_84_0`
The `let`-binding name has **all** non-alphanumerics replaced with `_`
(hyphens included), so it is a plain underscore-only identifier. Nix
itself accepts hyphens in identifiers, which is why the **input** attr
can keep them both under `inputs` and in the `outputs` parameter list.
Hence two derived forms.
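
A possible implementation of the two derived forms, checked against the examples listed above. `sanitize_input_attr` matches the declared signature; `sanitize_let_binding` is a hypothetical companion name introduced here for the `pkgs_…` form.

```cpp
#include <string>
#include <string_view>

namespace {
// ASCII-only check so the result does not depend on the C locale.
bool is_ascii_alnum(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}
}  // namespace

// Input attr: "nixpkgs-<name>-<version>", keeping '-' and '_' and mapping
// every other non-alphanumeric character to '_'.
std::string sanitize_input_attr(std::string_view name, std::string_view version) {
    std::string attr = "nixpkgs-";
    attr += name;
    attr += '-';
    attr += version;
    for (char& c : attr) {
        if (!is_ascii_alnum(c) && c != '_' && c != '-') c = '_';
    }
    return attr;
}

// Let binding (hypothetical helper name): "pkgs_<name>_<version>" with
// every non-alphanumeric character, hyphens included, mapped to '_'.
std::string sanitize_let_binding(std::string_view name, std::string_view version) {
    std::string binding = "pkgs_";
    binding += name;
    binding += '_';
    binding += version;
    for (char& c : binding) {
        if (!is_ascii_alnum(c)) c = '_';
    }
    return binding;
}
```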
### Collision detection
Two pinned deps with the same `(sanitized_name, sanitized_version)`
collide. With the version stored fully (e.g. `10.2.1`, never the
manifest spec `10.2`) and dep names being unique within a manifest,
collisions are pathologically rare. If a real one is ever reported,
mitigation is to append `-<short-sha>` to the input attr.
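
If the mitigation is ever needed, it might look like the sketch below. The names `PinnedInput` and `disambiguate_input_attrs` are hypothetical, and the short-sha length of 7 is an assumption; the source only says `-<short-sha>`.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical pairing of a sanitized attr with its full commit hash.
struct PinnedInput {
    std::string input_attr;
    std::string nixpkgs_rev;
};

// Returns one attr per input, in order. The first holder of a name keeps it;
// later duplicates get "-<short-sha>" appended.
std::vector<std::string> disambiguate_input_attrs(
    const std::vector<PinnedInput>& inputs, std::size_t short_sha_len = 7) {
    std::set<std::string> seen;
    std::vector<std::string> result;
    result.reserve(inputs.size());
    for (const auto& in : inputs) {
        std::string attr = in.input_attr;
        if (!seen.insert(attr).second) {  // already taken: collision
            attr += '-';
            attr += in.nixpkgs_rev.substr(0, short_sha_len);
            seen.insert(attr);
        }
        result.push_back(attr);
    }
    return result;
}
```

A collision needs two names that differ only in characters the sanitizer folds together, for example `a.b` and `a_b` at the same version.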
## Phase status
| Phase | Status | Commit |
| --- | --- | --- |
| 1. devbox_resolve + parser | ✅ | `df2c25b` |
| 2. nixpkgs_git_resolve fallback | ✅ | (this commit) |
| 3. resolve_version + cmd_add wire-up | pending | — |
| 4. cmd_build lockfile merge | pending | — |
| 5. flake codegen for per-dep inputs | pending | — |
| 6. SPEC §7/§10 amendment + smoke | pending | — |
## End-to-end verification (Phase 6)
```sh
cd /tmp && rm -rf demo && mkdir demo && cd demo
cargoxx new app && cd app
cargoxx add fmt@10.2.1
grep "nixpkgs-fmt-10_2_1" flake.nix       # input present
grep "f4b140d5" flake.nix                  # commit_hash substituted
cargoxx build && ./build/debug/app         # binary builds + runs
cp flake.nix /tmp/prev-flake.nix           # snapshot before the second run
cargoxx build                              # second run is no-op
diff /tmp/prev-flake.nix flake.nix         # byte-identical
```
A second `cargoxx build` regenerates byte-identical
`Cargoxx.lock` + `flake.nix`, which shows that the merge path preserves
the rev rather than re-resolving it.
## ABI note
Mixing nixpkgs revisions across pinned deps trades the single-rev
ABI guarantee (SPEC §10) for flexibility. Two pinned deps may have
been compiled against different glibc / libc++ majors and fail to
link cleanly. v0.2 silently accepts the risk; surfacing a
compatibility warning is a future polish item.