# cargoxx: technical specification

Companion to `SPEC.md`. Where `SPEC.md` defines what cargoxx does, this document defines how it is built. It is the contract between the project and the implementation.

This is the v0.1 design. It is intentionally conservative and rigid: the goal is a working, debuggable tool, not an extensible platform.

---

## 1. Source tree

The cargoxx repository itself follows the layout cargoxx will eventually generate. This is deliberate: the moment cargoxx can build a non-trivial C++ project, we switch its own build to use itself (see §15, bootstrap).

```
cargoxx/
├── Cargoxx.toml                # populated once we self-host (§15)
├── CMakeLists.txt              # hand-written until self-hosted; lives at root for now
├── flake.nix                   # hand-written until self-hosted
├── flake.lock
├── README.md
├── SPEC.md
├── TECH_SPEC.md
├── AGENTS.md
├── LICENSE
├── data/
│   └── linkdb.json             # curated link database (§9 in SPEC.md)
├── src/
│   ├── main.cpp                # CLI entry point
│   ├── lib.cppm                # primary module, re-exports submodules
│   ├── manifest/
│   │   ├── manifest.cppm
│   │   ├── parser.cpp
│   │   └── writer.cpp
│   ├── lockfile/
│   │   ├── lockfile.cppm
│   │   └── lockfile.cpp
│   ├── layout/
│   │   ├── layout.cppm         # source-tree discovery, target inference
│   │   └── layout.cpp
│   ├── codegen/
│   │   ├── codegen.cppm
│   │   ├── flake.cpp           # flake.nix generator
│   │   └── cmake.cpp           # CMakeLists.txt generator
│   ├── linkdb/
│   │   ├── linkdb.cppm
│   │   ├── curated.cpp         # loads embedded JSON
│   │   ├── overlay.cpp         # user SQLite cache
│   │   └── recipe.cpp
│   ├── resolver/
│   │   ├── resolver.cppm
│   │   ├── nixhub.cpp
│   │   ├── lazamar.cpp
│   │   └── nixpkgs_git.cpp     # local git fallback
│   ├── exec/
│   │   ├── exec.cppm
│   │   └── subprocess.cpp      # wrapping reproc
│   ├── cli/
│   │   ├── cli.cppm
│   │   ├── cmd_new.cpp
│   │   ├── cmd_add.cpp
│   │   ├── cmd_remove.cpp
│   │   ├── cmd_build.cpp
│   │   ├── cmd_run.cpp
│   │   ├── cmd_test.cpp
│   │   └── cmd_clean.cpp
│   └── util/
│       ├── util.cppm
│       ├── error.cpp           # error type and formatting
│       ├── log.cpp             # spdlog wrapper
│       └── semver.cpp          # version range matching
├── tests/
│   ├── manifest_parse.cpp
│   ├── layout_discovery.cpp
│   ├── linkdb_lookup.cpp
│   ├── codegen_flake.cpp
│   ├── codegen_cmake.cpp
│   ├── semver.cpp
│   └── e2e/                    # golden-file tests, see §13
│       ├── projects/
│       │   ├── hello/
│       │   ├── lib_only/
│       │   ├── multi_bin/
│       │   └── with_fmt/
│       └── runner.cpp
├── third_party/                # vendored single-header libs
│   ├── toml.hpp                # toml++
│   ├── json.hpp                # nlohmann/json
│   ├── httplib.h               # cpp-httplib
│   ├── CLI11.hpp
│   └── spdlog/                 # spdlog (header-only build)
└── scripts/
    ├── bootstrap-build.sh      # one-shot build from a clean tree
    └── verify-curated-db.sh    # checks every entry in data/linkdb.json
```

`third_party/` is vendored on purpose. In the early bootstrap phases (§15) we avoid Nix for cargoxx's own dependencies, because cargoxx is the thing being bootstrapped and we want a short, debuggable path from clean clone to working binary.

`reproc` and `sqlite3` are NOT vendored; they come from Nix in the bootstrap `flake.nix`. They have C sources or build systems and aren't drop-in headers.

---

## 2. Module layout

All cargoxx C++ sources are modules. The dependency graph between modules is:

```
cargoxx (lib.cppm, root module)
├── cargoxx.util
├── cargoxx.exec        depends on: util
├── cargoxx.manifest    depends on: util
├── cargoxx.lockfile    depends on: util, manifest
├── cargoxx.linkdb      depends on: util
├── cargoxx.resolver    depends on: util, exec, linkdb
├── cargoxx.layout      depends on: util
├── cargoxx.codegen     depends on: util, manifest, linkdb, layout, lockfile
└── cargoxx.cli         depends on: everything above
```

`main.cpp` imports `cargoxx.cli` and dispatches on argv. No business logic in `main.cpp`.

Each `.cppm` declares one module: `export module cargoxx.manifest;` etc. The root `lib.cppm` is `export module cargoxx;` and re-exports submodules selectively.

---

## 3. Core types

Definitions below are normative for the public interface. Implementation details (constructors, helpers) are at the agent's discretion.
```cpp
// in cargoxx.manifest
export module cargoxx.manifest;
import std;
import cargoxx.util;

export namespace cargoxx::manifest {

struct Dependency {
    std::string name;
    std::string version_spec;             // e.g. "10.2", "^1.0", "*"
    std::vector<std::string> components;  // empty if not a componentized package
};

struct BuildSettings {
    bool warnings_as_errors = false;
    std::vector<std::string> sanitizers;
};

enum class Edition { Cpp20, Cpp23, Cpp26 };

struct Package {
    std::string name;
    std::string version;
    Edition edition = Edition::Cpp23;
    std::vector<std::string> authors;
    std::optional<std::string> license;
};

struct Manifest {
    Package package;
    std::vector<Dependency> dependencies;
    BuildSettings build;
};

auto parse(const std::filesystem::path& path) -> util::Result<Manifest>;
auto write(const Manifest& m, const std::filesystem::path& path) -> util::Result<void>;

}
```

```cpp
// in cargoxx.layout
export module cargoxx.layout;
import std;
import cargoxx.util;  // for util::Result

export namespace cargoxx::layout {

enum class TargetKind { Library, Binary, Test, Example };

struct Target {
    TargetKind kind;
    std::string name;
    std::filesystem::path entry;                            // primary source file
    std::vector<std::filesystem::path> additional_sources;
    std::vector<std::filesystem::path> module_units;        // .cppm files
};

struct DiscoveredLayout {
    std::optional<Target> library;  // exactly 0 or 1
    std::vector<Target> binaries;   // 0..N
    std::vector<Target> tests;      // 0..N
    std::vector<Target> examples;   // 0..N
};

auto discover(const std::filesystem::path& project_root,
              const std::string& package_name) -> util::Result<DiscoveredLayout>;

}
```

```cpp
// in cargoxx.linkdb
export module cargoxx.linkdb;
import std;
import cargoxx.util;

export namespace cargoxx::linkdb {

struct Recipe {
    std::string nixpkgs_attr;
    std::string find_package;          // raw CMake snippet, post-substitution
    std::vector<std::string> targets;  // post-substitution
    std::string source;                // 'curated' | 'manual' | etc
};

struct Database {
    static auto open() -> util::Result<Database>;

    auto resolve(const std::string& package,
                 const std::string& version,
                 const std::vector<std::string>& components) -> util::Result<Recipe>;

    auto add_manual(const std::string& package,
                    const std::string& version_range,
                    const Recipe& r) -> util::Result<void>;

    // private: holds sqlite handle + parsed curated JSON
};

}
```

```cpp
// in cargoxx.codegen
export module cargoxx.codegen;
import std;
import cargoxx.manifest;
import cargoxx.layout;
import cargoxx.linkdb;
import cargoxx.lockfile;

export namespace cargoxx::codegen {

struct GenerateInputs {
    const manifest::Manifest& manifest;
    const layout::DiscoveredLayout& layout;
    const lockfile::Lockfile& lock;
    std::vector<linkdb::Recipe> recipes;  // one per dependency, same order
    std::filesystem::path project_root;
};

auto flake_nix(const GenerateInputs& in) -> std::string;
auto cmake_lists(const GenerateInputs& in) -> std::string;

}
```

The two generator functions are pure: input → string. They do no I/O. The caller writes the result.

---

## 4. Error model: implementation

```cpp
// in cargoxx.util
export namespace cargoxx::util {

enum class ErrorCode {
    // Manifest (E0001-E0019)
    ManifestNotFound = 1,
    ManifestParseError,
    ManifestInvalidField,
    ManifestUnknownField,  // strict-parse mode only
    ManifestVersionInvalid,

    // Layout (E0020-E0039)
    LayoutNoTarget = 20,
    LayoutAmbiguousLib,
    LayoutInvalidName,

    // Resolution (E0040-E0059)
    ResolutionUnknownPackage = 40,
    ResolutionNetworkError,
    ResolutionUnsatisfiable,
    ResolutionVersionNotFound,

    // Linkdb (E0060-E0079)
    LinkdbUnknownPackage = 60,
    LinkdbCorrupt,
    LinkdbComponentNotSupported,

    // Build / exec (E0080-E0099)
    ExecCommandFailed = 80,
    ExecToolNotFound,
    BuildCmakeFailed,
    BuildNixFailed,

    // Internal (E0100+)
    Internal = 100,
    NotImplemented,
};

struct Error {
    ErrorCode code;
    std::string message;
    std::string hint;
    std::optional<std::filesystem::path> location;
    std::optional> line_col;
};

template <typename T>
class Result {
    // std::expected<T, Error> when available; otherwise tl::expected<T, Error>.
    // Public surface: has_value(), value(), error().
};

auto format(const Error& e) -> std::string;  // produces SPEC.md §12 output

}
```

We do not throw exceptions across module boundaries. `Result` is the only way to propagate failure.
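To make the propagation rule concrete, here is a minimal sketch of a caller forwarding an error by value instead of throwing. It is an illustration under stated assumptions, not the cargoxx implementation: `Result` is a small `std::variant`-backed stand-in exposing only the surface named above (`has_value()`, `value()`, `error()`), `Error` is trimmed to two fields, and `require_name` and `name_length` are hypothetical helpers, as is the choice of `ManifestInvalidField` for an empty name.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <variant>

// Trimmed-down stand-ins for the cargoxx.util types; the spec's Error has more fields.
enum class ErrorCode { ManifestNotFound = 1, ManifestParseError, ManifestInvalidField };

struct Error {
    ErrorCode code;
    std::string message;
};

// Hypothetical minimal Result<T>: same public surface as the spec, backed by
// std::variant rather than std::expected so it builds as plain C++17.
template <typename T>
class Result {
public:
    Result(T value) : storage_(std::move(value)) {}
    Result(Error error) : storage_(std::move(error)) {}

    bool has_value() const { return std::holds_alternative<T>(storage_); }
    const T& value() const { return std::get<T>(storage_); }
    const Error& error() const { return std::get<Error>(storage_); }

private:
    std::variant<T, Error> storage_;
};

// Hypothetical leaf check: an empty package name is reported as an invalid field
// (the error code chosen here is an assumption for illustration).
Result<std::string> require_name(const std::string& name) {
    if (name.empty()) {
        return Error{ErrorCode::ManifestInvalidField, "package name must not be empty"};
    }
    return name;
}

// The caller inspects the Result and forwards the Error unchanged; nothing is thrown.
Result<std::size_t> name_length(const std::string& name) {
    const auto checked = require_name(name);
    if (!checked.has_value()) {
        return checked.error();
    }
    return checked.value().size();
}
```

With these stand-ins, `name_length("fmt")` holds a value while `name_length("")` holds the `Error` produced by the leaf check, so every failure stays visible in the function signature.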
`throw` is permitted only inside a single `.cpp` file when the catch site is in the same file.

---

## 5. Subprocess discipline

All external commands go through `cargoxx::exec::run`:

```cpp
struct ExecResult {
    int exit_code;
    std::string stdout_text;
    std::string stderr_text;
};

struct ExecOptions {
    std::filesystem::path cwd;
    std::vector<std::pair<std::string, std::string>> env_overrides;
    std::optional timeout;
    bool inherit_stdio = false;  // for `cargoxx run`
};

auto run(const std::string& program,
         const std::vector<std::string>& args,
         const ExecOptions& opts = {}) -> Result<ExecResult>;
```

Backed by `reproc`. Never use `system()`, `popen()`, or shell strings: argv only, no shell expansion.

Every external invocation is logged at `debug` level with the full argv and the cwd.

---

## 6. Generators: testability

Generators are pure functions over plain-data inputs. Tests assert exact string equality against golden files in `tests/e2e/projects//expected/`.

To regenerate goldens during development:

```
CARGOXX_TEST_REGENERATE=1 ctest -R codegen
```

The test runner detects the env var, writes new goldens, and reports the run as a notice (not a pass). CI never sets this var.

Whitespace and trailing newline are part of the contract. Generators emit `\n` line endings unconditionally.

---

## 7. Manifest parser: edge cases

- Comments: `toml++` preserves them on round-trip if we round-trip via `toml::table`. `cargoxx add` MUST NOT strip the user's comments. Implementation: parse to `toml::table`, mutate, serialize.
- Unknown top-level keys: warn but accept. Forward-compat (see SPEC.md §4 reserved fields).
- Unknown keys inside `[package]`, `[build]`: error.
- Dependency value is neither string nor table: error E0003.
- Empty `[dependencies]`: valid.
- `name` containing characters outside `[a-zA-Z0-9_-]`: error E0022.
- `name` starting with digit: error.

---

## 8. Layout discovery: algorithm

```
discover(project_root, package_name):
    let lib = project_root / "src" / "lib.cppm"
    let main = project_root / "src" / "main.cpp"
    let bin_dir = project_root / "src" / "bin"
    let tests_dir = project_root / "tests"
    let examples_dir = project_root / "examples"

    # Collect library sources if lib.cppm exists
    library = None
    if exists(lib):
        all_cppm = [lib]
        all_cpp = []
        for entry in walk(project_root / "src"):
            if entry == lib: continue
            if entry.parent == bin_dir: continue
            if entry == main: continue
            if entry.ext == ".cppm": all_cppm.push(entry)
            elif entry.ext == ".cpp": all_cpp.push(entry)
        library = Target {
            kind: Library,
            name: package_name,
            entry: lib,
            module_units: all_cppm,
            additional_sources: all_cpp,
        }

    binaries = []
    if exists(main):
        binaries.push(Target { kind: Binary, name: package_name, entry: main })
    if exists(bin_dir):
        for f in list_dir(bin_dir):
            if f.ext == ".cpp":
                binaries.push(Target { kind: Binary, name: f.stem, entry: f })

    # tests/ and examples/ are optional, so guard them the same way as src/bin/
    tests = []
    if exists(tests_dir):
        tests = [Target { kind: Test, name: f.stem, entry: f }
                 for f in list_dir(tests_dir) if f.ext == ".cpp"]
    examples = []
    if exists(examples_dir):
        examples = [Target { kind: Example, name: f.stem, entry: f }
                    for f in list_dir(examples_dir) if f.ext == ".cpp"]

    if library is None and binaries.empty():
        return Err(LayoutNoTarget)

    return Ok(DiscoveredLayout { library, binaries, tests, examples })
```

`walk` is non-recursive into `bin/`, `tests/`, `examples/`; those are flat folders. It IS recursive into other subdirectories of `src/` (e.g. `src/internal/foo.cppm` is part of the library).

---

## 9. CMake generator: algorithm

Pseudocode:

```
cmake_lists(in):
    out = StringBuilder()
    out += header(in.manifest)  # cmake_minimum_required, project, CXX flags

    # find_package per dependency, in manifest order
    for dep, recipe in zip(in.manifest.dependencies, in.recipes):
        out += emit_find_package(dep, recipe)

    # Library target if discovered
    if in.layout.library:
        out += emit_library(in.layout.library, in.recipes)

    # Primary binary (src/main.cpp); links library if present
    primary_bin = first(in.layout.binaries, .entry endsWith "src/main.cpp")
    if primary_bin:
        out += emit_primary_binary(in.layout, primary_bin, in.recipes)

    # Additional binaries from src/bin/
    for b in in.layout.binaries where b is not primary_bin:
        out += emit_extra_binary(b, in.layout, in.recipes)

    # Tests
    if in.layout.tests:
        out += "enable_testing()\n"
        for t in in.layout.tests:
            out += emit_test(t, in.layout, in.recipes)

    # Examples
    for e in in.layout.examples:
        out += emit_example(e, in.layout, in.recipes)

    # Build flags
    out += emit_build_flags(in.manifest.build, all_target_names(in.layout))

    return out.str()
```

Each `emit_*` returns a string with a trailing blank line. The output is deterministic given identical inputs: no timestamps, no nondeterministic ordering, no machine-dependent paths.

### find_package emission

For a recipe with no components:

```
find_package(<>)
```

For a recipe with components and the dep specifies them:

```
find_package(<>)
```

`{{components}}` expands to a space-separated list. `{{component}}` inside `targets` expands to one entry per requested component.

Example for boost with `["filesystem", "system"]`:

```
find_package(Boost REQUIRED COMPONENTS filesystem system)
```

And targets become `Boost::filesystem` and `Boost::system`.

---

## 10. flake.nix generator: algorithm

```
flake_nix(in):
    nixpkgs_rev = in.lock.nixpkgs_rev  # all deps share one rev
    deps_attrs = [recipe.nixpkgs_attr for recipe in in.recipes]
    deduped = stable_dedup(deps_attrs)

    return template_substitute(FLAKE_TEMPLATE, {
        description: in.manifest.package.name,
        nixpkgs_rev: nixpkgs_rev,
        dep_attrs: deduped,
    })
```

`FLAKE_TEMPLATE` is a string constant. Substitution is plain text replacement of `<<...>>` markers, not a Nix-aware transform.

---

## 11. Version resolution: implementation

```
class Resolver {
    auto resolve(deps: vector<Dependency>) -> Result<ResolutionPlan>:
        # 1. For each dep, query candidate versions + revisions from nixhub
        candidates = {}  # dep name -> list of candidates, each with .version and .rev
        for dep in deps:
            candidates[dep.name] = query(dep)

        # 2. Aggregate revisions
        all_revs = union of revs across candidates values
        sorted_revs = sort_descending_by_commit_date(all_revs)

        # 3. Try each rev (newest first), find one where every dep has a matching version
        for rev in sorted_revs[:50]:  # cap
            plan = []
            for dep in deps:
                m = candidates[dep.name].filter(_.rev == rev)
                                        .filter(satisfies(_.version, dep.version_spec))
                                        .max_by(.version)
                if m is None: break
                plan.push((dep.name, m.version, rev))
            if plan.size == deps.size:
                return Ok(ResolutionPlan { rev, entries: plan })

        return Err(ResolutionUnsatisfiable)
};
```

Network calls are wrapped in a 10-second timeout each. nixhub.io failures fall through to lazamar; both failing falls through to `nixpkgs_git`. The local nixpkgs git clone is created lazily on first use.

---

## 12. Lockfile semantics

The lockfile is rewritten in full on every successful `add` / `remove` / `build`. We do not attempt incremental edits. This simplifies the writer and avoids drift between manifest and lock.

`Cargoxx.lock` is read at the start of `build`:

- If absent → run resolution, then write.
- If present → check that every manifest dep has a satisfying entry. If yes → use as-is. If no → run resolution.

`cargoxx update` (deferred to v0.2) will force re-resolution.

---

## 13. Testing strategy

Three layers, plus a verification pass for the curated database.

### Unit tests

One `tests/.cpp` per module. Test pure functions: parser, semver, codegen helpers. Catch2 with `TEST_CASE`. No I/O outside of `tmp_path`.

### Golden-file tests

Each subdirectory in `tests/e2e/projects/` contains:

```
/
├── input/
│   ├── Cargoxx.toml
│   └── src/...
├── expected/
│   ├── flake.nix
│   ├── CMakeLists.txt
│   └── Cargoxx.lock
└── meta.toml    # describes the test (e.g. "fixed_rev = abc123" to make output deterministic)
```

The runner copies `input/` into a temp dir, runs `cargoxx build --no-build` with a stubbed resolver (returning `meta.toml`'s pinned rev), and diffs every generated file against `expected/`.

### End-to-end build tests

A small set of projects that actually compile via Nix. Marked `slow`, skipped on dev machines without `nix` in PATH. Run in CI with cached `/nix/store`.

### Curated DB verification

`scripts/verify-curated-db.sh` constructs a tiny project per package, runs `cargoxx build`, and verifies the binary links. Run on every PR that touches `data/linkdb.json`. This is how we catch upstream Nixpkgs changes that rename attrs.

---

## 14. Logging

`spdlog` is initialized at `info` level by default. `--verbose` switches to `debug`, `--quiet` to `warn`.

Format:

```
[] :
```

Components are module names (`manifest`, `resolver`, `codegen`, …). Every external command (subprocess, HTTP) is logged at `debug` with the full argv / URL.

User-facing errors are formatted via `util::format(Error)` and printed to stderr without log decoration. Diagnostic logs go through spdlog.

---

## 15. Bootstrap and self-hosting

Three phases.

**Phase 0: hand-written CMake (commits before milestone M3).** `CMakeLists.txt` and `flake.nix` at the repo root are written by humans. `cmake` builds `cargoxx`.

**Phase 1: generated CMake, hand-written flake.** At milestone M3 (codegen complete), commit a populated `Cargoxx.toml`. Generate `build/CMakeLists.txt` from it. Delete the root `CMakeLists.txt`.
The root `flake.nix` stays hand-written because cargoxx doesn't know about its own host-language deps yet.

**Phase 2: fully self-hosted.** At milestone M5, vendor all third-party headers into `third_party/` and have cargoxx generate the flake too. The bootstrap path becomes: pre-built cargoxx binary → run `cargoxx build` → produce next cargoxx.

For continuity, ship a known-good cargoxx binary as a release artifact. Anyone bootstrapping from source clones the repo, downloads the latest release binary, and runs `./bootstrap-cargoxx build`. If we ever want to bootstrap from absolute zero, `scripts/bootstrap-build.sh` does it with a hand-written CMake invocation.

---

## 16. Milestones

Each milestone is one mergeable PR series. No milestone is "done" until its tests pass in CI.

**M0: repo skeleton.** Empty modules, CMakeLists.txt that builds an empty `cargoxx` binary, flake.nix with toolchain. CI green.

**M1: manifest + layout.** `manifest::parse`, `manifest::write`, `layout::discover`. Unit tests. `cargoxx new` works (writes `Cargoxx.toml` and source skeleton, no codegen yet).

**M2: linkdb + curated.** `linkdb.json` with all 25 packages. `linkdb::Database::resolve` works for curated entries. SQLite overlay schema created on first run.

**M3: codegen.** `codegen::flake_nix` and `codegen::cmake_lists`. Golden tests for 4-6 representative projects. `cargoxx build --no-build` produces correct files.

**M4: exec + build.** `exec::run`. `cargoxx build` invokes nix and cmake end-to-end. `cargoxx run`, `cargoxx test`, `cargoxx clean`.

**M5: resolver + add/remove.** `resolver::Resolver` against nixhub.io. `cargoxx add fmt` works. Lockfile updates correctly.

**M6: polish.** Error message overhaul to match SPEC.md §12. `--verbose` / `--quiet`. Self-hosting (Phase 1).

Post-v0.1 (out of scope for this spec): automatic linkdb resolution, workspaces, `cargoxx publish`, Windows.

---

## 17. Coding conventions

- C++23 modules, no headers in `src/` (third_party/ excepted).
- Names: `snake_case` for functions and variables, `PascalCase` for types and enum values (`ErrorCode::ManifestNotFound`), `SCREAMING_SNAKE_CASE` for constants.
- One module per directory. `foo/foo.cppm` exports `cargoxx.foo`.
- No raw `new`/`delete`. Smart pointers or value types.
- `std::filesystem::path` for paths everywhere. Strings only at the very edge (CLI parsing, JSON).
- No global mutable state. Configuration is passed explicitly.
- Format with `clang-format` using the config in `.clang-format` (LLVM style, 100-column).
- Lints: `-Wall -Wextra -Wpedantic -Wconversion`. CI fails on warnings.

---

## 18. Performance budget

cargoxx is interactive. Targets:

| Operation | Budget |
| --- | --- |
| `cargoxx new` | < 100 ms |
| `cargoxx add fmt` (cached resolution) | < 200 ms |
| `cargoxx add fmt` (network resolution) | < 5 s |
| `cargoxx build` (no codegen change) | < 50 ms before invoking cmake |
| Codegen for a 50-source project | < 100 ms |

Profile if budgets are exceeded. SQLite I/O is the most likely culprit: open the connection once per process and reuse prepared statements.

---

## 19. Open questions for the implementation phase

These are decisions that should be made by the implementer with the user's input. Don't silently pick one; surface the choice when you reach the relevant milestone.

1. Should `cargoxx run` preserve the user's PATH or only inject Nix's? (Recommend: only Nix's, for reproducibility.)
2. Should generated `flake.nix` pin `flake-utils` or inline the function? (Recommend: pin, smaller diffs on regeneration.)
3. When the layout has both `lib.cppm` and `main.cpp`, should the binary always link the library even if it doesn't `import` it? (Recommend: yes, it's harmless and matches Cargo.)
4. Should `Cargoxx.lock` include a hash of `Cargoxx.toml` to detect tampering? (Recommend: no for v0.1, complicates the format.)
5. macOS uses `clang_18` from Nix; Linux too. Is there any reason to prefer GCC on Linux? (Recommend: no, Clang has the most mature module support.)
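
As a closing illustration of the flake generator design in §10, the two helpers it leans on are small enough to sketch. This is a rough sketch under assumptions, not the cargoxx code: `stable_dedup` is named in the §10 pseudocode but its behavior here (keep the first occurrence, preserve order) is inferred from the name, `substitute_markers` is a hypothetical stand-in for `template_substitute`, and the template text in the usage note is invented for the example.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Keep the first occurrence of each attribute and preserve input order,
// so regenerated output does not reshuffle dependencies.
std::vector<std::string> stable_dedup(const std::vector<std::string>& items) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> out;
    for (const auto& item : items) {
        if (seen.insert(item).second) {
            out.push_back(item);
        }
    }
    return out;
}

// Hypothetical helper: replace every <<key>> marker with its value as plain text.
// No Nix parsing or escaping is attempted, matching the "not a Nix-aware
// transform" rule in §10.
std::string substitute_markers(std::string text,
                               const std::map<std::string, std::string>& values) {
    for (const auto& [key, value] : values) {
        const std::string marker = "<<" + key + ">>";
        std::size_t pos = 0;
        while ((pos = text.find(marker, pos)) != std::string::npos) {
            text.replace(pos, marker.size(), value);
            pos += value.size();  // continue after the text just inserted
        }
    }
    return text;
}
```

For instance, `stable_dedup({"fmt", "boost", "fmt"})` keeps `fmt` before `boost`, and `substitute_markers("rev = <<nixpkgs_rev>>;", {{"nixpkgs_rev", "abc123"}})` fills in the pinned revision. Both functions are pure, which is what lets the golden-file tests compare generator output byte for byte.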