[M5+] add resolver::nix_cmake_scan

2026-05-10 10:08:55 +00:00
parent 1c7ff39f64
commit e63ac69239
8 changed files with 783 additions and 0 deletions

docs/auto-resolution.md Normal file

@@ -0,0 +1,273 @@
# Auto-resolution for non-curated packages
Status: in progress. Tracks the implementation of `cargoxx add <pkg>` for
packages that are not in `data/linkdb.json`. See `SPEC.md` §9 step 46 for
the contract this implements.
## Goal
Today `cargoxx add` only succeeds for the 25 packages baked into
`data/linkdb.json`. This work extends `cargoxx add <pkg>` to fall through
to the user's local machine and, on success, persist the discovered
recipe to the SQLite overlay so subsequent runs are instant.
The user-stated steps:
1. confirm the package exists in `nixpkgs` (`nixos-unstable`),
2. discover its CMake `find_package` / target rules via Conan, then vcpkg,
then by scanning `lib/cmake/**/*Config.cmake` under the package's nix
store path,
3. verify the candidate by building an empty program that links the dep,
4. record the version (already in hand from step 1's `nix eval`),
5. write the recipe to the overlay so it sticks.
## Design decisions
| Decision | Choice | Why |
| --- | --- | --- |
| Verify depth | full `cargoxx build` of a tmp project | catches link / ABI errors that configure-only would miss (e.g. abseil-cpp's libstdc++ vs libc++ mismatch already exposed by `verify-curated-db.sh`) |
| Probe order | Conan → vcpkg → nix-cmake-scan; first that *passes verification* wins; failed candidates fall through | maximizes hit rate without polluting overlay |
| Discovery side-effects | `Database::resolve()` stays pure (overlay+curated only); a separate `Database::discover()` does network + verify + persist | preserves the existing test surface; `cmd_add` orchestrates the chain |
| Failure caching | populate `resolution_failures` (already in schema) when *all* probes fail; subsequent retries within 24 h short-circuit | prevents repeated minute-long retries |
| Verification result handling | scaffold tmp project, write provisional overlay row with `verified_at = 0`, build; on success rewrite `verified_at = now`; on failure delete the row | overlay only ever holds verified recipes |
## Resolution chain
```
db.resolve(name, version, components)
├─ overlay rows (existing)
├─ curated JSON (existing)
└─ on LinkdbUnknownPackage → cmd_add calls db.discover(name, project_root)
├─ nixpkgs probe: nix eval nixpkgs#<name> for { version, path }
│ fail → resolution_failures, return error
├─ Conan probe: GET conan-center-index/recipes/<name>/all/conanfile.py
│ regex out cmake_target_name + cmake_file_name
├─ vcpkg probe: GET microsoft/vcpkg/ports/<name>/usage
│ parse the literal CMake snippet
├─ nix-cmake-scan: walk <path>/lib/cmake/**/*Config.cmake
│ regex add_library(<name> ... IMPORTED) for targets
│ derive find_package name from the *Config.cmake filename stem
├─ for each candidate (in order above):
│ verify_link(candidate, name, version, components, overlay_path)
│ — scaffold tmp project (cmd_new),
│ — provisional overlay row pointing at the candidate,
│ — write empty src/main.cpp,
│ — call cmd_build(no_build = false) to run nix develop -c
│ cmake configure + build,
│ — succeeds → rewrite overlay row with verified_at = now;
│ return Recipe to caller
│ — fails → delete provisional row, try next probe
└─ all candidates failed → record to resolution_failures;
return ResolutionUnsatisfiable
```
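The "first candidate that passes verification wins" rule in the chain above can be sketched as a loop over probes. This is a minimal sketch, not the project's code: `Recipe`, `Probe`, `Verifier`, and `discover_first_verified` are hypothetical stand-ins, and the overlay writes and failure caching are left out.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the real Recipe type (assumption).
struct Recipe {
    std::string find_package;          // e.g. "fmt"
    std::vector<std::string> targets;  // e.g. {"fmt::fmt"}
    std::string source;                // "conan", "vcpkg", or "nix-probe"
};

// A probe either proposes a candidate or finds nothing.
using Probe = std::function<std::optional<Recipe>(const std::string& name)>;
// A verifier returns true when the candidate configures, builds, and links.
using Verifier = std::function<bool(const Recipe&)>;

// Try probes in order. The first candidate that passes verification wins.
// A candidate that fails verification is dropped and the next probe runs.
// std::nullopt means every probe failed; the caller would then record the
// failure and report ResolutionUnsatisfiable.
inline std::optional<Recipe> discover_first_verified(
    const std::string& name,
    const std::vector<Probe>& probes,
    const Verifier& verify) {
    for (const auto& probe : probes) {
        std::optional<Recipe> candidate = probe(name);
        if (!candidate) continue;                  // probe found nothing
        if (verify(*candidate)) return candidate;  // verified: stop here
        // verification failed: fall through to the next probe
    }
    return std::nullopt;
}
```

Keeping the verifier as a parameter lets unit tests exercise the ordering logic without running a real build.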
## File layout
```
src/resolver/
├── resolver.cppm # public API surface for all resolver helpers
├── nixpkgs_probe.cpp # ✅ Phase 1 (committed: 1c7ff39)
├── nix_cmake_scan.cpp # Phase 2
├── conan_probe.cpp # Phase 3
├── vcpkg_probe.cpp # Phase 4
└── verify_link.cpp # Phase 5
```
`Database::discover` and the `cmd_add` wire-up land in Phase 6 by editing
`src/linkdb/curated.cpp`, `src/linkdb/overlay.cpp`, and
`src/cli/cmd_add.cpp`.
The deferred files in `TECH_SPEC.md` §1 (`nixhub.cpp`, `lazamar.cpp`,
`nixpkgs_git.cpp`) belong to a separate feature — the *version* resolver
that picks a concrete version from a range. Out of scope here.
## Critical files (re-)used
| File | Why |
| --- | --- |
| `src/linkdb/linkdb.cppm` | extend with `Database::discover()` declaration |
| `src/linkdb/curated.cpp:158` | `Database::resolve` already does overlay → curated; discovery is *not* folded in here, kept side-effect free |
| `src/linkdb/overlay.cpp` | split `overlay_insert_manual` → `overlay_insert_recipe(row, source)` so non-`manual` sources are persistable; add `overlay_delete_recipe`; add `overlay_record_failure` for `resolution_failures` |
| `src/cli/cmd_add.cpp:48` | after `db->resolve(...)` returns `LinkdbUnknownPackage`, call `db->discover(name, project_root)` and use the returned recipe |
| `src/exec/exec.cppm`, `src/exec/subprocess.cpp` | reuse `exec::run` for `nix eval` and `curl` — no new tooling, just new call sites |
| `src/util/util.cppm` | reuse `ResolutionUnknownPackage` (E40), `ResolutionNetworkError` (E41), `ResolutionUnsatisfiable` (E42); no new error codes |
| `src/cli/cmd_build.cpp` | called by `verify_link.cpp`; takes `overlay_path` and `project_root`; no signature change needed |
| `scripts/verify-curated-db.sh` | conceptual template for the `verify_link` flow — same pattern as that script, in code form |
## Probe specs
### A. nixpkgs_probe (✅ done — Phase 1, 1c7ff39)
```
nix eval nixpkgs#<pkg> --json --apply 'p: { version = p.version or ""; path = p.outPath; }'
```
- `--extra-experimental-features 'nix-command flakes'` baked into the call
so it works without user-side `nix.conf` flags.
- 60 s `ExecOptions.timeout`.
- Failure modes: missing attribute (`stderr` has `does not provide attribute`)
→ `ResolutionUnknownPackage`; otherwise `ResolutionNetworkError`.
- Returned: `NixpkgsInfo { attr, version, out_path }`.
- Field name **must** be `path`, not `outPath`. nix's `--json` mode coerces
any attrset containing `outPath` to a bare-string derivation reference,
which would lose the `version` field.
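As an illustration of the call shape and the failure split described above, here is a hedged sketch. `nix_eval_argv`, `classify_nix_eval_failure`, and the `ProbeError` enum are hypothetical names, not the committed Phase 1 code. It assumes the subprocess layer takes an argv vector rather than a shell string.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical mirror of the two error kinds this probe can return.
enum class ProbeError { ResolutionUnknownPackage, ResolutionNetworkError };

// Build the argv for the probe. Passing argv directly (no shell) means the
// --apply expression needs no extra quoting.
inline std::vector<std::string> nix_eval_argv(const std::string& pkg) {
    return {
        "nix", "--extra-experimental-features", "nix-command flakes",
        "eval", "nixpkgs#" + pkg, "--json", "--apply",
        "p: { version = p.version or \"\"; path = p.outPath; }",
    };
}

// Classify a failed `nix eval` from its stderr text: a missing attribute
// means the package is unknown; anything else is treated as a network error.
inline ProbeError classify_nix_eval_failure(std::string_view stderr_text) {
    constexpr std::string_view kMissingAttr = "does not provide attribute";
    if (stderr_text.find(kMissingAttr) != std::string_view::npos)
        return ProbeError::ResolutionUnknownPackage;
    return ProbeError::ResolutionNetworkError;
}
```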
### B. nix_cmake_scan (Phase 2, next)
- Walk `<out_path>/lib/cmake/` recursively.
- For each `<X>Config.cmake` or `<X>-config.cmake`:
- `find_package` name = stem `<X>`.
- Read file. Regex
`add_library\(([^ ]+)\s+(STATIC|SHARED|INTERFACE|UNKNOWN)\s+IMPORTED\)`
to extract IMPORTED targets.
- Also pick up `add_library(<alias> ALIAS <real>)` so the canonical
`<alias>::<sub>` form gets detected.
- Pick best candidate:
1. case-insensitive equality between stem and `package_name`,
2. prefix match,
3. first config with non-empty target list.
- Returns `NixCmakeCandidate { find_package, targets, config_file }` or
`ResolutionUnknownPackage`.
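The scan and ranking rules above could look roughly like the following sketch. Several points are assumptions. The regex tolerates extra whitespace that the pattern quoted above does not. "Prefix match" is read as either name being a prefix of the other. Configs with no targets are dropped before ranking. `config_stem`, `match_rank`, and `pick_best` are hypothetical helper names. ALIAS handling is omitted.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <regex>
#include <string>
#include <vector>

struct NixCmakeCandidate {
    std::string find_package;          // stem of the config filename
    std::vector<std::string> targets;  // IMPORTED targets found in the file
    std::string config_file;           // path of the config file
};

// Extract IMPORTED target names from the text of a CMake config file.
inline std::vector<std::string> scan_imported_targets(const std::string& text) {
    static const std::regex re(
        R"re(add_library\(\s*([^\s)]+)\s+(STATIC|SHARED|INTERFACE|UNKNOWN)\s+IMPORTED\s*\))re");
    std::vector<std::string> targets;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        targets.push_back((*it)[1].str());
    return targets;
}

// "fmtConfig.cmake" and "fmt-config.cmake" both yield "fmt"; other names yield nullopt.
inline std::optional<std::string> config_stem(const std::string& filename) {
    for (const std::string suffix : {"Config.cmake", "-config.cmake"}) {
        if (filename.size() > suffix.size() &&
            filename.compare(filename.size() - suffix.size(), suffix.size(), suffix) == 0)
            return filename.substr(0, filename.size() - suffix.size());
    }
    return std::nullopt;
}

inline std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// 0 = case-insensitive equality, 1 = prefix match (either direction), 2 = other.
inline int match_rank(const NixCmakeCandidate& c, const std::string& package_name) {
    const std::string stem = lower(c.find_package);
    const std::string pkg = lower(package_name);
    if (stem == pkg) return 0;
    if (stem.rfind(pkg, 0) == 0 || pkg.rfind(stem, 0) == 0) return 1;
    return 2;
}

// Pick the best-ranked candidate that has at least one target.
// std::min_element returns the first minimum, so ties keep scan order.
inline std::optional<NixCmakeCandidate> pick_best(
    std::vector<NixCmakeCandidate> candidates, const std::string& package_name) {
    candidates.erase(
        std::remove_if(candidates.begin(), candidates.end(),
                       [](const NixCmakeCandidate& c) { return c.targets.empty(); }),
        candidates.end());
    if (candidates.empty()) return std::nullopt;
    return *std::min_element(
        candidates.begin(), candidates.end(),
        [&](const NixCmakeCandidate& a, const NixCmakeCandidate& b) {
            return match_rank(a, package_name) < match_rank(b, package_name);
        });
}
```

Note that `fmtConfigVersion.cmake` does not end in either suffix, so version files are skipped without a special case.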
### C. Conan probe (Phase 3)
- Text-only — never executes Python. SPEC §14 mandates this.
- `curl -fsSL https://raw.githubusercontent.com/conan-io/conan-center-index/master/recipes/<pkg>/all/conanfile.py`.
- Regex `cmake_target_name\s*=\s*['"]([^'"]+)['"]` and same for
`cmake_file_name`. Handle both `cpp_info.set_property("cmake_target_name", ...)`
and the legacy `self.cpp_info.names["cmake"] = "..."` forms.
- Pure parser exposed as `parse_conanfile(text)`; the network adapter
wraps `curl` via `exec::run`.
- 404 → `ResolutionUnknownPackage`; transport errors → `ResolutionNetworkError`.
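A text-only parser along these lines might be sketched as below. The struct layout and regexes are assumptions rather than the planned `parse_conanfile` implementation. The `set_property(...)` form needs its own pattern because it separates key and value with a comma, not `=`. The legacy pattern accepts any `names["cmake..."]` key.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type: each field is set only if a literal was found.
struct ConanCmakeNames {
    std::optional<std::string> target_name;  // from set_property("cmake_target_name", ...)
    std::optional<std::string> file_name;    // from set_property("cmake_file_name", ...)
    std::optional<std::string> legacy_name;  // from names["cmake..."] = "..."
};

inline std::optional<std::string> first_match(const std::string& text, const std::regex& re) {
    std::smatch m;
    if (std::regex_search(text, m, re)) return m[1].str();
    return std::nullopt;
}

// Pure text scan: never executes Python. Only string literals are recognised,
// so values computed by Python code (f-strings, variables) are not matched.
inline ConanCmakeNames parse_conanfile(const std::string& text) {
    static const std::regex target_re(
        R"re(set_property\(\s*["']cmake_target_name["']\s*,\s*["']([^"']+)["'])re");
    static const std::regex file_re(
        R"re(set_property\(\s*["']cmake_file_name["']\s*,\s*["']([^"']+)["'])re");
    static const std::regex legacy_re(
        R"re(names\[\s*["']cmake[^"']*["']\s*\]\s*=\s*["']([^"']+)["'])re");
    ConanCmakeNames out;
    out.target_name = first_match(text, target_re);
    out.file_name = first_match(text, file_re);
    out.legacy_name = first_match(text, legacy_re);
    return out;
}
```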
### D. vcpkg probe (Phase 4)
- `curl -fsSL https://raw.githubusercontent.com/microsoft/vcpkg/master/ports/<pkg>/usage`.
- The file is plain text that embeds a literal CMake snippet. Extract the first `find_package(<name> ...)` line and
any `target_link_libraries(... <pkg>::...)` lines.
- Pure parser exposed as `parse_vcpkg_usage(text)`.
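A possible shape for the pure parser is sketched below. The `VcpkgUsage` struct and the regexes are assumptions, not the planned implementation. It assumes namespaced targets of the form `<ns>::<name>` and ignores anything outside `find_package(...)` and `target_link_libraries(...)` calls.

```cpp
#include <algorithm>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type for the usage-file parser.
struct VcpkgUsage {
    std::string find_package;          // first find_package(<name> ...) argument
    std::vector<std::string> targets;  // unique <ns>::<name> targets, in file order
};

inline std::optional<VcpkgUsage> parse_vcpkg_usage(const std::string& text) {
    static const std::regex find_re(R"re(find_package\(\s*([A-Za-z0-9_.+-]+))re");
    static const std::regex link_re(R"re(target_link_libraries\(([^)]*)\))re");
    static const std::regex target_re(R"re([A-Za-z0-9_.+-]+::[A-Za-z0-9_.+-]+)re");

    std::smatch m;
    if (!std::regex_search(text, m, find_re)) return std::nullopt;  // no snippet found

    VcpkgUsage out;
    out.find_package = m[1].str();
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), link_re); it != end; ++it) {
        const std::string args = (*it)[1].str();  // text between the parentheses
        for (std::sregex_iterator t(args.begin(), args.end(), target_re); t != end; ++t) {
            std::string name = t->str();
            if (std::find(out.targets.begin(), out.targets.end(), name) == out.targets.end())
                out.targets.push_back(std::move(name));
        }
    }
    return out;
}
```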
### E. verify_link (Phase 5)
```cpp
auto verify_link(const Recipe& candidate,
const std::string& name,
const std::string& version_spec,
const std::vector<std::string>& components,
const std::filesystem::path& cargoxx_overlay_path)
-> util::Result<void>;
```
- Create `<tmp>/cargoxx-verify-<name>` (mktemp).
- `cmd_new(name, /*lib_only=*/false, tmp_parent)`.
- Insert `candidate` into `cargoxx_overlay_path` with the right `source`
and `verified_at = 0` (provisional).
- Mutate the scaffolded manifest to declare `name` with `version_spec`
and `components`.
- Overwrite `src/main.cpp` with `int main() {}` — empty body. The point
is to exercise find_package + target_link_libraries + linker, *not* to
call any specific API (which would require per-package knowledge).
- Call `cmd_build(tmp_proj, no_build=false, release=false,
target=nullopt, overlay_path=cargoxx_overlay_path)`.
- On success: rewrite the overlay row with `verified_at = now()`,
return `{}`.
- On failure: delete the provisional row, return the build error.
- Always: `std::filesystem::remove_all(tmp_dir)` (RAII helper).
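The "always clean up" step is the part most easily expressed as RAII. The class below is a hypothetical sketch: `ScopedTempDir` is not an existing helper, and the random suffix stands in for whatever `mktemp`-style naming the real code uses.

```cpp
#include <filesystem>
#include <random>
#include <stdexcept>
#include <string>
#include <system_error>

// Creates a unique directory under the system temp dir and removes it
// (recursively) when the object goes out of scope, on success or failure.
class ScopedTempDir {
public:
    explicit ScopedTempDir(const std::string& package_name) {
        std::random_device rd;
        const auto base = std::filesystem::temp_directory_path();
        // create_directory returns true only when it made a new directory,
        // so a name collision simply triggers another attempt.
        for (int attempt = 0; attempt < 16; ++attempt) {
            auto candidate =
                base / ("cargoxx-verify-" + package_name + "-" + std::to_string(rd()));
            std::error_code ec;
            if (std::filesystem::create_directory(candidate, ec)) {
                path_ = candidate;
                return;
            }
        }
        throw std::runtime_error("could not create temporary directory");
    }

    ~ScopedTempDir() {
        std::error_code ec;
        std::filesystem::remove_all(path_, ec);  // error_code overload: never throws
    }

    ScopedTempDir(const ScopedTempDir&) = delete;
    ScopedTempDir& operator=(const ScopedTempDir&) = delete;

    const std::filesystem::path& path() const { return path_; }

private:
    std::filesystem::path path_;
};
```

Using the `std::error_code` overload of `remove_all` in the destructor keeps cleanup from throwing while a build error is already propagating.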
## Persistence semantics
| Probe path | `source` column | `verified_at` | TTL (existing `overlay_is_fresh`) |
| --- | --- | --- | --- |
| Conan probe verified | `conan` | now | 30 days |
| vcpkg probe verified | `vcpkg` | now | 30 days |
| nix-cmake-scan verified | `nix-probe` | now | 30 days |
| Manual via `linkdb add` | `manual` | now | never expires |
`resolution_failures` populated only when **all** probes fail. Subsequent
`cargoxx add` calls within 24 h skip probing and return the cached error.
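The TTL rules in the table can be summarised in a few lines. This sketch assumes timestamps are Unix seconds and that an entry expires exactly at the TTL boundary. `recipe_is_fresh` and `failure_is_cached` are hypothetical names, not the existing `overlay_is_fresh`. How provisional rows (`verified_at = 0`) are treated during verification is deliberately left out.

```cpp
#include <cstdint>
#include <string>

using Seconds = std::int64_t;  // assumption: Unix timestamps in seconds

constexpr Seconds kDay = 24 * 60 * 60;
constexpr Seconds kRecipeTtl = 30 * kDay;  // conan, vcpkg, nix-probe rows
constexpr Seconds kFailureTtl = kDay;      // resolution_failures rows

// Manual rows never expire; probed rows are fresh for 30 days.
inline bool recipe_is_fresh(const std::string& source, Seconds verified_at, Seconds now) {
    if (source == "manual") return true;
    return now - verified_at < kRecipeTtl;
}

// A recorded failure short-circuits probing for 24 hours.
inline bool failure_is_cached(Seconds failed_at, Seconds now) {
    return now - failed_at < kFailureTtl;
}
```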
## Phasing (one commit per phase)
| Phase | Status | Commit |
| --- | --- | --- |
| 1. nixpkgs_probe + JSON parser | ✅ | `1c7ff39` |
| 2. nix_cmake_scan | pending | — |
| 3. conan_probe + parse_conanfile | pending | — |
| 4. vcpkg_probe + parse_vcpkg_usage | pending | — |
| 5. verify_link (tmp project + cmd_build) | pending | — |
| 6. Database::discover + cmd_add wire-up + failure caching | pending | — |
## Testing strategy
| Test | Mechanism |
| --- | --- |
| `parse_nix_eval_json(text)` | ✅ Catch2 unit (`tests/nixpkgs_probe_parse.cpp`) |
| `nixpkgs_probe(name)` | ✅ network-gated (`tests/nixpkgs_probe_live.cpp`); requires `CARGOXX_NETWORK_TESTS=1` |
| `scan_imported_targets(text)` | Catch2 unit |
| `nix_cmake_scan(tmp)` | Catch2 unit using a fixture tree |
| `parse_conanfile(text)` | Catch2 unit; embedded conanfile.py snippets covering both old and new forms |
| `parse_vcpkg_usage(text)` | Catch2 unit |
| `conan_probe(name)` | network-gated; against `fmt` |
| `vcpkg_probe(name)` | network-gated; against `fmt` |
| `verify_link` end-to-end | network-gated; uses `simdjson` (small, present in nixpkgs, not in our curated DB) |
| `cmd_add` end-to-end on uncurated package | network-gated; full flow on `simdjson` |
Failure-mode coverage:
- Conan/vcpkg 404 → `ResolutionUnknownPackage`
- `nix eval` missing attribute → `ResolutionUnknownPackage`; any other `nix eval` error → `ResolutionNetworkError`
- All probes return candidates that fail to verify-link → record failure,
return `ResolutionUnsatisfiable`
- `resolution_failures` cache hit → returns the recorded error without
re-probing
## Definition of done
After Phase 6:
```sh
nix develop -c cmake --build build && \
nix develop -c ctest --test-dir build --output-on-failure # all unit tests green
CARGOXX_NETWORK_TESTS=1 nix develop -c ctest --test-dir build # live tests too
```
Manual smoke (matches the user's requested steps 1–5):
```sh
cd /tmp && rm -rf simd-smoke && mkdir simd-smoke && cd simd-smoke
~/cargoxx/build/cargoxx new app && cd app
~/cargoxx/build/cargoxx add simdjson # not in curated; triggers discover
# Expected output:
# probing nixpkgs#simdjson ... ok (3.x.y)
# probing conan-center-index ... ok (cmake_target_name = simdjson::simdjson)
# verifying ... ok
# Added simdjson 3.x.y (linkdb: conan)
~/cargoxx/build/cargoxx build # ordinary build path now
# picks up the freshly cached
# overlay row
```
A second `cargoxx add simdjson` in another fresh project hits the overlay
directly and returns instantly — proves persistence step (5).
## Risks / known limits
- **Network**: Conan + vcpkg probes need outbound HTTPS. The
network-gated test layer covers this; the unit tests on pure parsers
don't need network.
- **Conan recipe shape variation**: ~10 % of recipes use Python
conditionals to set `cmake_target_name` per option — text parsing
will miss these. Falls through to vcpkg / nix-scan, which is the
point of the chain.
- **nix-cmake-scan heuristics**: packages without standard
`lib/cmake/<X>/<X>Config.cmake` layout won't be picked up. Acceptable
for v0.2; the manual escape hatch (`cargoxx linkdb add`) covers
edge cases.
- **Overlay growth**: long-tail packages will accumulate in the user's
overlay sqlite. No cleanup in v0.2 — not a concern at human-scale
package counts.
- **Verify-link slowness**: full `cargoxx build` per candidate. First
probe usually wins, so it's typically one build. Worst case: three
builds (Conan fail, vcpkg fail, nix-scan ok). Document as expected
behavior in the CLI output (`verifying...` progress message).