# Auto-resolution for non-curated packages
Status: in progress. Tracks the implementation of `cargoxx add <pkg>` for
packages that are not in `data/linkdb.json`. See `SPEC.md` §9 step 46 for
the contract this implements.
## Goal
Today `cargoxx add` only succeeds for the 25 packages baked into
`data/linkdb.json`. This work extends `cargoxx add <pkg>` to fall through
to discovery on the user's local machine and, on success, persist the
discovered recipe to the SQLite overlay so subsequent runs are instant.
The user-stated steps:
1. confirm the package exists in `nixpkgs` (`nixos-unstable`),
2. discover its CMake `find_package` / target rules via Conan, then vcpkg,
then by scanning `lib/cmake/**/*Config.cmake` under the package's nix
store path,
3. verify the candidate by building an empty program that links the dep,
4. record the version (already in hand from step 1's `nix eval`),
5. write the recipe to the overlay so it sticks.
## Design decisions
| Decision | Choice | Why |
| --- | --- | --- |
| Verify depth | full `cargoxx build` of a tmp project | catches link / ABI errors that configure-only would miss (e.g. abseil-cpp's libstdc++ vs libc++ mismatch already exposed by `verify-curated-db.sh`) |
| Probe order | Conan → vcpkg → nix-cmake-scan; first that *passes verification* wins; failed candidates fall through | maximizes hit rate without polluting overlay |
| Discovery side-effects | `Database::resolve()` stays pure (overlay+curated only); a separate `Database::discover()` does network + verify + persist | preserves the existing test surface; `cmd_add` orchestrates the chain |
| Failure caching | populate `resolution_failures` (already in schema) when *all* probes fail; subsequent retries within 24 h short-circuit | prevents repeated minute-long retries |
| Verification result handling | scaffold tmp project, write provisional overlay row with `verified_at = 0`, build; on success rewrite `verified_at = now`; on failure delete the row | overlay only ever holds verified recipes |
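The provisional-row lifecycle from the last table row can be sketched against an in-memory map. This is only an illustration: `OverlayRow`, `Overlay`, and `verify_with_provisional_row` are hypothetical names, and the real overlay is a SQLite table, not a `std::map`.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for one overlay row; verified_at == 0 means provisional.
struct OverlayRow {
    std::string source;
    std::int64_t verified_at = 0;
};
using Overlay = std::map<std::string, OverlayRow>;  // keyed by package name

// Insert a provisional row, run the build, then either promote or delete it.
// `build` is any callable returning true when the tmp project builds and links.
template <typename Build>
bool verify_with_provisional_row(Overlay& overlay, const std::string& name,
                                 const std::string& source, std::int64_t now,
                                 Build&& build) {
    overlay[name] = OverlayRow{source, 0};  // provisional: the build needs to see it
    if (!build()) {
        overlay.erase(name);                // failed candidates leave no row behind
        return false;
    }
    overlay[name].verified_at = now;        // promote to a verified recipe
    return true;
}
```

The sketch does not handle a `build` that throws; a real implementation would want a guard so the provisional row is removed on that path too.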
## Resolution chain
```
db.resolve(name, version, components)
├─ overlay rows (existing)
├─ curated JSON (existing)
└─ on LinkdbUnknownPackage → cmd_add calls db.discover(name, project_root)
   ├─ nixpkgs probe: nix eval nixpkgs#<name> for { version, path }
   │     fail → resolution_failures, return error
   ├─ Conan probe: GET conan-center-index/recipes/<name>/all/conanfile.py
   │     regex out cmake_target_name + cmake_file_name
   ├─ vcpkg probe: GET microsoft/vcpkg/ports/<name>/usage
   │     parse the literal CMake snippet
   ├─ nix-cmake-scan: walk <path>/lib/cmake/**/*Config.cmake
   │     regex add_library(<name> ... IMPORTED) for targets
   │     derive find_package name from the *Config.cmake filename stem
   ├─ for each candidate (in order above):
   │     verify_link(candidate, name, version, components, overlay_path)
   │       - scaffold tmp project (cmd_new),
   │       - provisional overlay row pointing at the candidate,
   │       - write empty src/main.cpp,
   │       - call cmd_build(no_build = false) to run nix develop -c
   │         cmake configure + build,
   │       - succeeds → rewrite overlay row with verified_at = now;
   │         return Recipe to caller
   │       - fails → delete provisional row, try next probe
   └─ all candidates failed → record to resolution_failures;
         return ResolutionUnsatisfiable
```
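The candidate loop at the bottom of the chain can be sketched as below. It covers only the "first candidate that passes verification wins" part and assumes the nixpkgs probe has already succeeded. `Recipe` here is a simplified stand-in, and `Probe`, `Verifier`, and `discover_first_verified` are hypothetical names, not the planned `Database::discover` signature.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-in for the real recipe type.
struct Recipe {
    std::string source;        // "conan", "vcpkg", or "nix-probe"
    std::string find_package;  // name passed to find_package()
    std::string target;        // imported target to link against
};

// A probe yields a candidate, or nothing (404, unparsable recipe, ...).
using Probe = std::function<std::optional<Recipe>(const std::string& name)>;
// The verifier returns true when the tmp project configures, builds, and links.
using Verifier = std::function<bool(const Recipe&)>;

// Try probes in order; the first candidate that passes verification wins.
// Candidates that fail verification are skipped and the next probe is tried.
std::optional<Recipe> discover_first_verified(const std::string& name,
                                              const std::vector<Probe>& probes,
                                              const Verifier& verify) {
    for (const auto& probe : probes) {
        std::optional<Recipe> candidate = probe(name);
        if (!candidate) continue;                  // this probe found nothing
        if (verify(*candidate)) return candidate;  // verified: stop here
    }
    return std::nullopt;  // caller records the failure and reports the error
}
```

Passing the probes as an ordered vector keeps the Conan → vcpkg → nix-cmake-scan order a matter of data, so a test can substitute fake probes without touching the network.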
## File layout
```
src/resolver/
├── resolver.cppm # public API surface for all resolver helpers
├── nixpkgs_probe.cpp # ✅ Phase 1 (committed: 1c7ff39)
├── nix_cmake_scan.cpp # Phase 2
├── conan_probe.cpp # Phase 3
├── vcpkg_probe.cpp # Phase 4
└── verify_link.cpp # Phase 5
```
`Database::discover` and the `cmd_add` wire-up land in Phase 6 by editing
`src/linkdb/curated.cpp`, `src/linkdb/overlay.cpp`, and
`src/cli/cmd_add.cpp`.
The deferred files in `TECH_SPEC.md` §1 (`nixhub.cpp`, `lazamar.cpp`,
`nixpkgs_git.cpp`) belong to a separate feature — the *version* resolver
that picks a concrete version from a range. Out of scope here.
## Critical files (re-)used
| File | Why |
| --- | --- |
| `src/linkdb/linkdb.cppm` | extend with `Database::discover()` declaration |
| `src/linkdb/curated.cpp:158` | `Database::resolve` already does overlay → curated; discovery is *not* folded in here, kept side-effect free |
| `src/linkdb/overlay.cpp` | split `overlay_insert_manual` → `overlay_insert_recipe(row, source)` so non-`manual` sources are persistable; add `overlay_delete_recipe`; add `overlay_record_failure` for `resolution_failures` |
| `src/cli/cmd_add.cpp:48` | after `db->resolve(...)` returns `LinkdbUnknownPackage`, call `db->discover(name, project_root)` and use the returned recipe |
| `src/exec/exec.cppm`, `src/exec/subprocess.cpp` | reuse `exec::run` for `nix eval` and `curl` — no new tooling, just new call sites |
| `src/util/util.cppm` | reuse `ResolutionUnknownPackage` (E40), `ResolutionNetworkError` (E41), `ResolutionUnsatisfiable` (E42); no new error codes |
| `src/cli/cmd_build.cpp` | called by `verify_link.cpp`; takes `overlay_path` and `project_root`; no signature change needed |
| `scripts/verify-curated-db.sh` | conceptual template for the `verify_link` flow — same pattern as that script, in code form |
## Probe specs
### A. nixpkgs_probe (✅ done — Phase 1, 1c7ff39)
```
nix eval nixpkgs#<pkg> --json --apply 'p: { version = p.version or ""; path = p.outPath; }'
```
- `--extra-experimental-features 'nix-command flakes'` baked into the call
so it works without user-side `nix.conf` flags.
- 60 s `ExecOptions.timeout`.
- Failure modes: missing attribute (`stderr` has `does not provide attribute`)
→ `ResolutionUnknownPackage`; otherwise `ResolutionNetworkError`.
- Returned: `NixpkgsInfo { attr, version, out_path }`.
- Field name **must** be `path`, not `outPath`. nix's `--json` mode coerces
any attrset containing `outPath` to a bare-string derivation reference,
which would lose the `version` field.
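The failure classification above amounts to a substring check on `stderr`. A minimal sketch, assuming a hypothetical `ProbeError` enum in place of the real error codes (E40 / E41) and a hypothetical function name:

```cpp
#include <string_view>

// Hypothetical mirror of ResolutionUnknownPackage / ResolutionNetworkError.
enum class ProbeError { UnknownPackage, NetworkError };

// Classify a failed `nix eval` from the text it wrote to stderr.
ProbeError classify_nix_eval_failure(std::string_view stderr_text) {
    // Phrase named in the failure-mode bullet above.
    constexpr std::string_view kMissing = "does not provide attribute";
    if (stderr_text.find(kMissing) != std::string_view::npos) {
        return ProbeError::UnknownPackage;
    }
    return ProbeError::NetworkError;  // everything else, timeouts included
}
```

Matching on message text is brittle across nix releases, so the live test is what would catch a wording change.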
### B. nix_cmake_scan (Phase 2, next)
- Walk `<out_path>/lib/cmake/` recursively.
- For each `<X>Config.cmake` or `<X>-config.cmake`:
  - `find_package` name = stem `<X>`.
  - Read file. Regex
    `add_library\(([^ ]+)\s+(STATIC|SHARED|INTERFACE|UNKNOWN)\s+IMPORTED\)`
    to extract IMPORTED targets.
  - Also pick up `add_library(<alias> ALIAS <real>)` so the canonical
    `<alias>::<sub>` form gets detected.
- Pick best candidate:
  1. case-insensitive equality between stem and `package_name`,
  2. prefix match,
  3. first config with non-empty target list.
- Returns `NixCmakeCandidate { find_package, targets, config_file }` or
`ResolutionUnknownPackage`.
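The two pure pieces of the scan, stem derivation and target extraction, might look like this. The regex is the one quoted above, used verbatim; `config_stem` is a hypothetical helper name, and ALIAS handling and candidate ranking are left out.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Derive the find_package name from "<X>Config.cmake" or "<X>-config.cmake".
// Returns nullopt for anything else (e.g. "<X>ConfigVersion.cmake").
std::optional<std::string> config_stem(const std::string& filename) {
    for (const std::string suffix : {"Config.cmake", "-config.cmake"}) {
        if (filename.size() > suffix.size() &&
            filename.compare(filename.size() - suffix.size(), suffix.size(), suffix) == 0) {
            return filename.substr(0, filename.size() - suffix.size());
        }
    }
    return std::nullopt;
}

// Collect the names of IMPORTED targets declared in a CMake config file.
std::vector<std::string> scan_imported_targets(const std::string& text) {
    static const std::regex kImported(
        R"re(add_library\(([^ ]+)\s+(STATIC|SHARED|INTERFACE|UNKNOWN)\s+IMPORTED\))re");
    std::vector<std::string> targets;
    for (std::sregex_iterator it(text.begin(), text.end(), kImported), end; it != end; ++it) {
        targets.push_back((*it)[1].str());  // capture 1 is the target name
    }
    return targets;
}
```

As written, the pattern does not match `MODULE` or `OBJECT` libraries, or declarations with a trailing `GLOBAL` keyword; whether that matters depends on what the packages in nixpkgs actually emit.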
### C. Conan probe (Phase 3)
- Text-only — never executes Python. SPEC §14 mandates this.
- `curl -fsSL https://raw.githubusercontent.com/conan-io/conan-center-index/master/recipes/<pkg>/all/conanfile.py`.
- Regex `cmake_target_name\s*=\s*['"]([^'"]+)['"]` and same for
`cmake_file_name`. Handle both `cpp_info.set_property("cmake_target_name", ...)`
and the legacy `self.cpp_info.names["cmake"] = "..."` forms.
- Pure parser exposed as `parse_conanfile(text)`; the network adapter
wraps `curl` via `exec::run`.
- 404 → `ResolutionUnknownPackage`; transport errors → `ResolutionNetworkError`.
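A text-only `parse_conanfile` could be sketched as follows. The `ConanNames` struct and the exact patterns are assumptions: the modern patterns match `set_property("cmake_target_name", ...)` calls, and the legacy pattern accepts any `names[...]` key that starts with `cmake`. The legacy form only fills in the file name here.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type: either field may be missing from a recipe.
struct ConanNames {
    std::optional<std::string> target_name;  // e.g. "fmt::fmt"
    std::optional<std::string> file_name;    // e.g. "fmt"
};

// Return capture group 1 of the first match, if any.
static std::optional<std::string> first_match(const std::string& text, const std::regex& re) {
    std::smatch m;
    if (std::regex_search(text, m, re)) return m[1].str();
    return std::nullopt;
}

// Extract CMake names from conanfile.py text without executing any Python.
ConanNames parse_conanfile(const std::string& text) {
    // Modern form: set_property("cmake_target_name", "fmt::fmt")
    static const std::regex kTarget(
        R"re(set_property\(\s*['"]cmake_target_name['"]\s*,\s*['"]([^'"]+)['"])re");
    static const std::regex kFile(
        R"re(set_property\(\s*['"]cmake_file_name['"]\s*,\s*['"]([^'"]+)['"])re");
    // Legacy form: self.cpp_info.names["cmake"] = "fmt"
    static const std::regex kLegacy(
        R"re(\.names\[\s*['"]cmake[a-z_]*['"]\s*\]\s*=\s*['"]([^'"]+)['"])re");

    ConanNames out;
    out.target_name = first_match(text, kTarget);
    out.file_name = first_match(text, kFile);
    if (!out.file_name) out.file_name = first_match(text, kLegacy);  // legacy fallback
    return out;
}
```

Only the first occurrence of each property is taken, so recipes that set a different name per component or per option would need more than this.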
### D. vcpkg probe (Phase 4)
- `curl -fsSL https://raw.githubusercontent.com/microsoft/vcpkg/master/ports/<pkg>/usage`.
- The file is free-form text that embeds a literal CMake snippet. Extract
the first `find_package(<name> ...)` line and any
`target_link_libraries(... <pkg>::...)` lines.
- Pure parser exposed as `parse_vcpkg_usage(text)`.
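A possible shape for `parse_vcpkg_usage`, assuming a hypothetical `VcpkgUsage` result struct. It takes the first `find_package` argument and every namespaced target that appears inside a `target_link_libraries(...)` call, de-duplicated in order of appearance.

```cpp
#include <algorithm>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type for the usage parser.
struct VcpkgUsage {
    std::optional<std::string> find_package;  // first find_package(<name> ...) argument
    std::vector<std::string> targets;         // namespaced targets, in order seen
};

VcpkgUsage parse_vcpkg_usage(const std::string& text) {
    static const std::regex kFind(R"re(find_package\(\s*([A-Za-z0-9_.+-]+))re");
    static const std::regex kLink(R"re(target_link_libraries\(([^)]*)\))re");
    static const std::regex kTarget(R"re([A-Za-z0-9_.+-]+::[A-Za-z0-9_.+:-]+)re");

    VcpkgUsage out;
    std::smatch m;
    if (std::regex_search(text, m, kFind)) out.find_package = m[1].str();

    const std::sregex_iterator end;
    for (std::sregex_iterator call(text.begin(), text.end(), kLink); call != end; ++call) {
        const std::string args = (*call)[1].str();  // text between the parentheses
        for (std::sregex_iterator t(args.begin(), args.end(), kTarget); t != end; ++t) {
            const std::string name = t->str();
            if (std::find(out.targets.begin(), out.targets.end(), name) == out.targets.end()) {
                out.targets.push_back(name);  // keep the first occurrence only
            }
        }
    }
    return out;
}
```

Targets without a `::` namespace are ignored on purpose, since a bare word in the argument list cannot be told apart from the consumer's own target name.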
### E. verify_link (Phase 5)
```cpp
auto verify_link(const Recipe& candidate,
                 const std::string& name,
                 const std::string& version_spec,
                 const std::vector<std::string>& components,
                 const std::filesystem::path& cargoxx_overlay_path)
    -> util::Result<void>;
```
- Create `<tmp>/cargoxx-verify-<name>` (mktemp).
- `cmd_new(name, /*lib_only=*/false, tmp_parent)`.
- Insert `candidate` into `cargoxx_overlay_path` with the right `source`
and `verified_at = 0` (provisional).
- Mutate the scaffolded manifest to declare `name` with `version_spec`
and `components`.
- Overwrite `src/main.cpp` with `int main() {}` — empty body. The point
is to exercise find_package + target_link_libraries + linker, *not* to
call any specific API (which would require per-package knowledge).
- Call `cmd_build(tmp_proj, no_build=false, release=false,
target=nullopt, overlay_path=cargoxx_overlay_path)`.
- On success: rewrite the overlay row with `verified_at = now()`,
return `{}`.
- On failure: delete the provisional row, return the build error.
- Always: `std::filesystem::remove_all(tmp_dir)` (RAII helper).
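The RAII helper in the last bullet could be as small as the sketch below. `ScopedTempDir` is a hypothetical name; the sketch uses POSIX `mkdtemp` for the unique directory, which appends a random suffix to the `cargoxx-verify-<name>` prefix.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>

#include <stdlib.h>  // mkdtemp (POSIX)
#include <unistd.h>  // mkdtemp on some platforms

namespace fs = std::filesystem;

// Owns a scratch directory and removes it on scope exit, on success or failure.
class ScopedTempDir {
public:
    explicit ScopedTempDir(const std::string& name) {
        std::string tmpl =
            (fs::temp_directory_path() / ("cargoxx-verify-" + name + "-XXXXXX")).string();
        // mkdtemp replaces the XXXXXX suffix in place and creates the directory.
        if (::mkdtemp(tmpl.data()) == nullptr) {
            throw std::runtime_error("mkdtemp failed for " + tmpl);
        }
        path_ = tmpl;
    }
    ~ScopedTempDir() {
        std::error_code ec;         // destructors must not throw
        fs::remove_all(path_, ec);  // best effort: errors are ignored
    }
    ScopedTempDir(const ScopedTempDir&) = delete;
    ScopedTempDir& operator=(const ScopedTempDir&) = delete;

    const fs::path& path() const { return path_; }

private:
    fs::path path_;
};
```

Declaring the guard before any early return in `verify_link` is what makes the "Always" bullet hold without repeating the cleanup on each exit path.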
## Persistence semantics
| Probe path | `source` column | `verified_at` | TTL (existing `overlay_is_fresh`) |
| --- | --- | --- | --- |
| Conan probe verified | `conan` | now | 30 days |
| vcpkg probe verified | `vcpkg` | now | 30 days |
| nix-cmake-scan verified | `nix-probe` | now | 30 days |
| Manual via `linkdb add` | `manual` | now | never expires |
`resolution_failures` is populated only when **all** probes fail. Subsequent
`cargoxx add` calls within 24 h skip probing and return the cached error.
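The TTL rules in the table and the 24 h failure window reduce to two comparisons on Unix timestamps. The function names and boundary choices (`<=` vs `<`) below are assumptions for illustration; the existing `overlay_is_fresh` may differ.

```cpp
#include <cstdint>
#include <string_view>

using Seconds = std::int64_t;  // Unix timestamps, as stored in verified_at

constexpr Seconds kDay = 24 * 60 * 60;
constexpr Seconds kRecipeTtl = 30 * kDay;  // conan / vcpkg / nix-probe rows
constexpr Seconds kFailureTtl = kDay;      // resolution_failures rows

// A recipe row is usable if it was verified and has not aged out.
bool recipe_is_fresh(std::string_view source, Seconds verified_at, Seconds now) {
    if (verified_at == 0) return false;   // provisional row, never verified
    if (source == "manual") return true;  // manual rows never expire
    return now - verified_at <= kRecipeTtl;
}

// A cached failure short-circuits probing for 24 h after it was recorded.
bool failure_is_cached(Seconds recorded_at, Seconds now) {
    return now - recorded_at < kFailureTtl;
}
```

Treating `verified_at == 0` as stale means a provisional row left behind by a crashed verification would never be served as a real recipe.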
## Phasing (one commit per phase)
| Phase | Status | Commit |
| --- | --- | --- |
| 1. nixpkgs_probe + JSON parser | ✅ | `1c7ff39` |
| 2. nix_cmake_scan | pending | — |
| 3. conan_probe + parse_conanfile | pending | — |
| 4. vcpkg_probe + parse_vcpkg_usage | pending | — |
| 5. verify_link (tmp project + cmd_build) | pending | — |
| 6. Database::discover + cmd_add wire-up + failure caching | pending | — |
## Testing strategy
| Test | Mechanism |
| --- | --- |
| `parse_nix_eval_json(text)` | ✅ Catch2 unit (`tests/nixpkgs_probe_parse.cpp`) |
| `nixpkgs_probe(name)` | ✅ network-gated (`tests/nixpkgs_probe_live.cpp`); requires `CARGOXX_NETWORK_TESTS=1` |
| `scan_imported_targets(text)` | Catch2 unit |
| `nix_cmake_scan(tmp)` | Catch2 unit using a fixture tree |
| `parse_conanfile(text)` | Catch2 unit; embedded conanfile.py snippets covering both old and new forms |
| `parse_vcpkg_usage(text)` | Catch2 unit |
| `conan_probe(name)` | network-gated; against `fmt` |
| `vcpkg_probe(name)` | network-gated; against `fmt` |
| `verify_link` end-to-end | network-gated; uses `simdjson` (small, present in nixpkgs, not in our curated DB) |
| `cmd_add` end-to-end on uncurated package | network-gated; full flow on `simdjson` |
Failure-mode coverage:
- Conan/vcpkg 404 → `ResolutionUnknownPackage`
- `nix eval` reports a missing attribute → `ResolutionUnknownPackage`; any
other `nix eval` error → `ResolutionNetworkError`
- All probes return candidates that fail to verify-link → record failure,
return `ResolutionUnsatisfiable`
- `resolution_failures` cache hit → returns the recorded error without
re-probing
## Definition of done
After Phase 6:
```sh
nix develop -c cmake --build build && \
ctest --test-dir build --output-on-failure # all unit tests green
CARGOXX_NETWORK_TESTS=1 nix develop -c ctest --test-dir build # live tests too
```
Manual smoke (matches the user-stated steps 1 to 5):
```sh
cd /tmp && rm -rf simd-smoke && mkdir simd-smoke && cd simd-smoke
~/cargoxx/build/cargoxx new app && cd app
~/cargoxx/build/cargoxx add simdjson # not in curated; triggers discover
# Expected output:
# probing nixpkgs#simdjson ... ok (3.x.y)
# probing conan-center-index ... ok (cmake_target_name = simdjson::simdjson)
# verifying ... ok
# Added simdjson 3.x.y (linkdb: conan)
~/cargoxx/build/cargoxx build # ordinary build path now
# picks up the freshly cached
# overlay row
```
A second `cargoxx add simdjson` in another fresh project hits the overlay
directly and returns instantly — proves persistence step (5).
## Risks / known limits
- **Network**: Conan + vcpkg probes need outbound HTTPS. The
network-gated test layer covers this; the unit tests on pure parsers
don't need network.
- **Conan recipe shape variation**: ~10 % of recipes use Python
conditionals to set `cmake_target_name` per option — text parsing
will miss these. Falls through to vcpkg / nix-scan, which is the
point of the chain.
- **nix-cmake-scan heuristics**: packages without standard
`lib/cmake/<X>/<X>Config.cmake` layout won't be picked up. Acceptable
for v0.2; the manual escape hatch (`cargoxx linkdb add`) covers
edge cases.
- **Overlay growth**: long-tail packages will accumulate in the user's
overlay sqlite. No cleanup in v0.2 — not a concern at human-scale
package counts.
- **Verify-link slowness**: full `cargoxx build` per candidate. First
probe usually wins, so it's typically one build. Worst case: three
builds (Conan fail, vcpkg fail, nix-scan ok). Document as expected
behavior in the CLI output (`verifying...` progress message).