cargoxx — technical specification
Companion to SPEC.md. Where SPEC.md defines what cargoxx does, this document defines how it is built. It is the contract between the project and the implementation.
This is the v0.1 design. It is intentionally conservative and rigid — the goal is a working, debuggable tool, not an extensible platform.
1. Source tree
The cargoxx repository itself follows the layout cargoxx will eventually generate. This is deliberate: the moment cargoxx can build a non-trivial C++ project, we switch its own build to use itself (see §15, bootstrap).
cargoxx/
├── Cargoxx.toml # populated once we self-host (§15)
├── CMakeLists.txt # hand-written until self-hosted; lives at root for now
├── flake.nix # hand-written until self-hosted
├── flake.lock
├── README.md
├── SPEC.md
├── TECH_SPEC.md
├── AGENTS.md
├── LICENSE
├── data/
│ └── linkdb.json # curated link database (§9 in SPEC.md)
├── src/
│ ├── main.cpp # CLI entry point
│ ├── lib.cppm # primary module, re-exports submodules
│ ├── manifest/
│ │ ├── manifest.cppm
│ │ ├── parser.cpp
│ │ └── writer.cpp
│ ├── lockfile/
│ │ ├── lockfile.cppm
│ │ └── lockfile.cpp
│ ├── layout/
│ │ ├── layout.cppm # source-tree discovery, target inference
│ │ └── layout.cpp
│ ├── codegen/
│ │ ├── codegen.cppm
│ │ ├── flake.cpp # flake.nix generator
│ │ └── cmake.cpp # CMakeLists.txt generator
│ ├── linkdb/
│ │ ├── linkdb.cppm
│ │ ├── curated.cpp # loads embedded JSON
│ │ ├── overlay.cpp # user SQLite cache
│ │ └── recipe.cpp
│ ├── resolver/
│ │ ├── resolver.cppm
│ │ ├── nixhub.cpp
│ │ ├── lazamar.cpp
│ │ └── nixpkgs_git.cpp # local git fallback
│ ├── exec/
│ │ ├── exec.cppm
│ │ └── subprocess.cpp # wrapping reproc
│ ├── cli/
│ │ ├── cli.cppm
│ │ ├── cmd_new.cpp
│ │ ├── cmd_add.cpp
│ │ ├── cmd_remove.cpp
│ │ ├── cmd_build.cpp
│ │ ├── cmd_run.cpp
│ │ ├── cmd_test.cpp
│ │ └── cmd_clean.cpp
│ └── util/
│ ├── util.cppm
│ ├── error.cpp # error type and formatting
│ ├── log.cpp # spdlog wrapper
│ └── semver.cpp # version range matching
├── tests/
│ ├── manifest_parse.cpp
│ ├── layout_discovery.cpp
│ ├── linkdb_lookup.cpp
│ ├── codegen_flake.cpp
│ ├── codegen_cmake.cpp
│ ├── semver.cpp
│ └── e2e/ # golden-file tests, see §13
│ ├── projects/
│ │ ├── hello/
│ │ ├── lib_only/
│ │ ├── multi_bin/
│ │ └── with_fmt/
│ └── runner.cpp
├── third_party/ # vendored single-header libs
│ ├── toml.hpp # toml++
│ ├── json.hpp # nlohmann/json
│ ├── httplib.h # cpp-httplib
│ ├── CLI11.hpp
│ └── spdlog/ # spdlog (header-only build)
└── scripts/
├── bootstrap-build.sh # one-shot build from a clean tree
└── verify-curated-db.sh # checks every entry in data/linkdb.json
third_party/ is vendored on purpose. From Phase 1 onward we keep cargoxx's own header-only dependencies out of Nix, because cargoxx is the thing being bootstrapped and we want a short, debuggable path from a clean clone to a working binary.
reproc and sqlite3 are NOT vendored — they come from Nix in the bootstrap flake.nix. They have C sources or build systems and aren't drop-in headers.
2. Module layout
All cargoxx C++ sources are modules. The dependency graph between modules is:
cargoxx (lib.cppm, root module)
├── cargoxx.util
├── cargoxx.exec depends on: util
├── cargoxx.manifest depends on: util
├── cargoxx.lockfile depends on: util, manifest
├── cargoxx.linkdb depends on: util
├── cargoxx.resolver depends on: util, exec, linkdb
├── cargoxx.layout depends on: util
├── cargoxx.codegen depends on: util, manifest, linkdb, layout, lockfile
└── cargoxx.cli depends on: everything above
main.cpp imports cargoxx.cli and dispatches on argv. No business logic in main.cpp.
Each .cppm declares one module: export module cargoxx.manifest; etc. The root lib.cppm is export module cargoxx; and re-exports submodules selectively.
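One way to sanity-check the graph above is to derive a compile order from it. The sketch below is illustrative only: MODULE_DEPS and build_order are hypothetical names, not part of cargoxx. It runs a simple repeated-scan topological sort over the edges transcribed from the list. In a real build, the build system's own module scanning determines the order.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical table: direct module dependencies transcribed from §2.
// "cli" depends on every other submodule.
const std::map<std::string, std::vector<std::string>> MODULE_DEPS = {
    {"util", {}},
    {"exec", {"util"}},
    {"manifest", {"util"}},
    {"lockfile", {"util", "manifest"}},
    {"linkdb", {"util"}},
    {"resolver", {"util", "exec", "linkdb"}},
    {"layout", {"util"}},
    {"codegen", {"util", "manifest", "linkdb", "layout", "lockfile"}},
    {"cli", {"util", "exec", "manifest", "lockfile", "linkdb", "resolver", "layout", "codegen"}},
};

// Hypothetical helper: repeatedly emits every module whose dependencies have
// all been emitted already. std::map iteration makes the order deterministic.
// Returns an empty vector if a pass makes no progress (i.e. there is a cycle).
std::vector<std::string> build_order(const std::map<std::string, std::vector<std::string>>& deps) {
    std::vector<std::string> order;
    std::set<std::string> done;
    while (order.size() < deps.size()) {
        bool progressed = false;
        for (const auto& [name, reqs] : deps) {
            if (done.count(name)) continue;
            bool ready = true;
            for (const auto& r : reqs) {
                if (!done.count(r)) { ready = false; break; }
            }
            if (ready) {
                order.push_back(name);
                done.insert(name);
                progressed = true;
            }
        }
        if (!progressed) return {};  // cycle: nothing became ready this pass
    }
    return order;
}
```

An empty result signals a cycle in the table. For the graph as listed, util comes first and cli comes last.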
3. Core types
Definitions below are normative for the public interface. Implementation details (constructors, helpers) are at the agent's discretion.
// in cargoxx.manifest
export module cargoxx.manifest;

import std;
import cargoxx.util;

export namespace cargoxx::manifest {

struct Dependency {
    std::string name;
    std::string version_spec;             // e.g. "10.2", "^1.0", "*"
    std::vector<std::string> components;  // empty if not a componentized package
};

struct BuildSettings {
    bool warnings_as_errors = false;
    std::vector<std::string> sanitizers;
};

enum class Edition { Cpp20, Cpp23, Cpp26 };

struct Package {
    std::string name;
    std::string version;
    Edition edition = Edition::Cpp23;
    std::vector<std::string> authors;
    std::optional<std::string> license;
};

struct Manifest {
    Package package;
    std::vector<Dependency> dependencies;
    BuildSettings build;
};

auto parse(const std::filesystem::path& path) -> util::Result<Manifest>;
auto write(const Manifest& m, const std::filesystem::path& path) -> util::Result<void>;

}  // namespace cargoxx::manifest
// in cargoxx.layout
export module cargoxx.layout;

import std;
import cargoxx.util;  // for util::Result

export namespace cargoxx::layout {

enum class TargetKind { Library, Binary, Test, Example };

struct Target {
    TargetKind kind;
    std::string name;
    std::filesystem::path entry;  // primary source file
    std::vector<std::filesystem::path> additional_sources;
    std::vector<std::filesystem::path> module_units;  // .cppm files
};

struct DiscoveredLayout {
    std::optional<Target> library;  // exactly 0 or 1
    std::vector<Target> binaries;   // 0..N
    std::vector<Target> tests;      // 0..N
    std::vector<Target> examples;   // 0..N
};

auto discover(const std::filesystem::path& project_root,
              const std::string& package_name)
    -> util::Result<DiscoveredLayout>;

}  // namespace cargoxx::layout
// in cargoxx.linkdb
export module cargoxx.linkdb;

import std;
import cargoxx.util;

export namespace cargoxx::linkdb {

struct Recipe {
    std::string nixpkgs_attr;
    std::string find_package;          // raw CMake snippet, post-substitution
    std::vector<std::string> targets;  // post-substitution
    std::string source;                // 'curated' | 'manual' | etc.
};

struct Database {
    static auto open() -> util::Result<Database>;

    auto resolve(const std::string& package,
                 const std::string& version,
                 const std::vector<std::string>& components)
        -> util::Result<Recipe>;

    auto add_manual(const std::string& package,
                    const std::string& version_range,
                    const Recipe& r) -> util::Result<void>;

    // private: holds sqlite handle + parsed curated JSON
};

}  // namespace cargoxx::linkdb
// in cargoxx.codegen
export module cargoxx.codegen;

import std;
import cargoxx.manifest;
import cargoxx.layout;
import cargoxx.linkdb;
import cargoxx.lockfile;

export namespace cargoxx::codegen {

struct GenerateInputs {
    const manifest::Manifest& manifest;
    const layout::DiscoveredLayout& layout;
    const lockfile::Lockfile& lock;
    std::vector<linkdb::Recipe> recipes;  // one per dependency, same order
    std::filesystem::path project_root;
};

auto flake_nix(const GenerateInputs& in) -> std::string;
auto cmake_lists(const GenerateInputs& in) -> std::string;

}  // namespace cargoxx::codegen
The two generator functions are pure: input → string. They do no I/O. The caller writes the result.
4. Error model — implementation
// in cargoxx.util
export namespace cargoxx::util {

enum class ErrorCode {
    // Manifest (E0001-E0019)
    ManifestNotFound = 1,
    ManifestParseError,
    ManifestInvalidField,
    ManifestUnknownField,  // strict-parse mode only
    ManifestVersionInvalid,

    // Layout (E0020-E0039)
    LayoutNoTarget = 20,
    LayoutAmbiguousLib,
    LayoutInvalidName,

    // Resolution (E0040-E0059)
    ResolutionUnknownPackage = 40,
    ResolutionNetworkError,
    ResolutionUnsatisfiable,
    ResolutionVersionNotFound,

    // Linkdb (E0060-E0079)
    LinkdbUnknownPackage = 60,
    LinkdbCorrupt,
    LinkdbComponentNotSupported,

    // Build / exec (E0080-E0099)
    ExecCommandFailed = 80,
    ExecToolNotFound,
    BuildCmakeFailed,
    BuildNixFailed,

    // Internal (E0100+)
    Internal = 100,
    NotImplemented,
};

struct Error {
    ErrorCode code;
    std::string message;
    std::string hint;
    std::optional<std::filesystem::path> location;
    std::optional<std::pair<int, int>> line_col;
};

template <typename T>
class Result {
    // std::expected<T, Error> when available; otherwise tl::expected.
    // Public surface: has_value(), value(), error().
};

auto format(const Error& e) -> std::string;  // produces SPEC.md §12 output

}  // namespace cargoxx::util
We do not throw exceptions across module boundaries. Result<T> is the only way to propagate failure. throw is permitted only inside a single .cpp file when the catch site is in the same file.
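The range comments on ErrorCode suggest that each value is printed as E followed by four zero-padded digits: ManifestNotFound = 1 becomes E0001, and LayoutInvalidName = 22 becomes E0022. Assuming SPEC.md §12 uses that form, the code-rendering part of util::format could look like the minimal sketch below. code_string is a hypothetical helper, and the enum is trimmed to a few members for brevity.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Trimmed copy of cargoxx::util::ErrorCode; values match §4.
enum class ErrorCode {
    ManifestNotFound = 1,
    ManifestParseError,
    ManifestInvalidField,
    LayoutNoTarget = 20,
    LayoutAmbiguousLib,
    LayoutInvalidName,
    Internal = 100,
};

// Hypothetical helper: "E" plus the numeric value, zero-padded to 4 digits.
std::string code_string(ErrorCode c) {
    std::ostringstream os;
    os << 'E' << std::setw(4) << std::setfill('0') << static_cast<int>(c);
    return os.str();
}
```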
5. Subprocess discipline
All external commands go through cargoxx::exec::run:
// in cargoxx.exec
export namespace cargoxx::exec {

struct ExecResult {
    int exit_code;
    std::string stdout_text;
    std::string stderr_text;
};

struct ExecOptions {
    std::filesystem::path cwd;
    std::vector<std::pair<std::string, std::string>> env_overrides;
    std::optional<std::chrono::seconds> timeout;
    bool inherit_stdio = false;  // for `cargoxx run`
};

auto run(const std::string& program,
         const std::vector<std::string>& args,
         const ExecOptions& opts = {}) -> util::Result<ExecResult>;

}  // namespace cargoxx::exec
Backed by reproc. Never use system(), popen(), or shell strings — argv only, no shell expansion. Every external invocation is logged at debug level with the full argv and the cwd.
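Because run takes argv rather than a command string, the debug log line has to be rendered separately. The sketch below shows one way to do that, assuming a POSIX-style single-quoting convention for display. render_argv is a hypothetical helper. Its output is only ever logged and is never passed to a shell.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: renders program + args for the debug log.
// Arguments that are empty or contain whitespace, quotes, backslashes, or '$'
// are single-quoted, with embedded single quotes written as '\'' so the log
// line stays unambiguous. Nothing here is ever executed.
std::string render_argv(const std::string& program, const std::vector<std::string>& args) {
    auto quote = [](const std::string& a) -> std::string {
        if (!a.empty() && a.find_first_of(" \t\n'\"\\$") == std::string::npos) return a;
        std::string out = "'";
        for (char ch : a) {
            if (ch == '\'') out += "'\\''";
            else out += ch;
        }
        return out + "'";
    };
    std::string line = quote(program);
    for (const auto& a : args) line += " " + quote(a);
    return line;
}
```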
6. Generators — testability
Generators are pure functions over plain data inputs (GenerateInputs). Tests assert exact string equality against golden files in tests/e2e/projects/<name>/expected/.
To regenerate goldens during development:
CARGOXX_TEST_REGENERATE=1 ctest -R codegen
The test runner detects the variable, writes new goldens, and reports the run as a notice (not a pass). CI never sets this variable.
Whitespace and trailing newline are part of the contract. Generators emit \n line endings unconditionally.
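The sketch below shows the comparison step the golden runner might perform, assuming one golden file per generated output. check_golden and GoldenOutcome are hypothetical names. Files are opened in binary mode so that \n is compared byte-for-byte on every platform, which matches the line-ending rule above.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical outcome type; Regenerated is reported as a notice, not a pass.
enum class GoldenOutcome { Match, Mismatch, Regenerated };

// Hypothetical helper: exact byte comparison against a golden file, or an
// overwrite when CARGOXX_TEST_REGENERATE is set in the environment.
GoldenOutcome check_golden(const std::string& actual, const std::filesystem::path& golden) {
    if (std::getenv("CARGOXX_TEST_REGENERATE") != nullptr) {
        std::ofstream out(golden, std::ios::binary);  // binary: keep \n as-is
        out << actual;
        return GoldenOutcome::Regenerated;
    }
    std::ifstream in(golden, std::ios::binary);
    if (!in) return GoldenOutcome::Mismatch;  // a missing golden never passes
    std::ostringstream expected;
    expected << in.rdbuf();
    return expected.str() == actual ? GoldenOutcome::Match : GoldenOutcome::Mismatch;
}
```

A missing trailing newline counts as a mismatch, so the "whitespace is part of the contract" rule is enforced without extra code.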
7. Manifest parser — edge cases
- Comments: toml++ preserves them on round-trip if we round-trip via toml::table. cargoxx add MUST NOT strip the user's comments. Implementation: parse to toml::table, mutate, serialize.
- Unknown top-level keys: warn but accept, for forward compatibility (see SPEC.md §4 reserved fields).
- Unknown keys inside [package] or [build]: error.
- Dependency value is neither a string nor a table: error E0003.
- Empty [dependencies]: valid.
- name containing characters outside [a-zA-Z0-9_-]: error E0022.
- name starting with a digit: error.
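The two name rules at the end of the list can be checked without a regex. The minimal sketch below rests on a few assumptions. check_package_name is a hypothetical helper. The empty name is treated as invalid. The leading-digit case reuses E0022, although the list does not give it a code. Characters are compared explicitly rather than with std::isalnum, because std::isalnum is locale-dependent.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: returns the §4 error number for an invalid package
// name, or std::nullopt if the name is valid.
std::optional<int> check_package_name(const std::string& name) {
    constexpr int LAYOUT_INVALID_NAME = 22;  // E0022
    if (name.empty()) return LAYOUT_INVALID_NAME;  // assumption: empty is invalid
    for (unsigned char c : name) {
        bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                  (c >= '0' && c <= '9') || c == '_' || c == '-';
        if (!ok) return LAYOUT_INVALID_NAME;
    }
    // Leading digit: the spec says "error"; reusing E0022 is an assumption.
    if (name[0] >= '0' && name[0] <= '9') return LAYOUT_INVALID_NAME;
    return std::nullopt;
}
```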
8. Layout discovery — algorithm
discover(project_root, package_name):
    let lib          = project_root / "src" / "lib.cppm"
    let main         = project_root / "src" / "main.cpp"
    let bin_dir      = project_root / "src" / "bin"
    let tests_dir    = project_root / "tests"
    let examples_dir = project_root / "examples"

    # Collect library sources if lib.cppm exists.
    # Paths are sorted so output never depends on directory iteration order.
    library = None
    if exists(lib):
        all_cppm = [lib]
        all_cpp  = []
        for entry in sorted(walk(project_root / "src")):
            if entry == lib: continue
            if entry.parent == bin_dir: continue
            if entry == main: continue
            if entry.ext == ".cppm": all_cppm.push(entry)
            elif entry.ext == ".cpp": all_cpp.push(entry)
        library = Target {
            kind: Library,
            name: package_name,
            entry: lib,
            module_units: all_cppm,
            additional_sources: all_cpp,
        }

    binaries = []
    if exists(main):
        binaries.push(Target { kind: Binary, name: package_name, entry: main })
    if exists(bin_dir):
        for f in sorted(list_dir(bin_dir)):
            if f.ext == ".cpp":
                binaries.push(Target { kind: Binary, name: f.stem, entry: f })

    # tests/ and examples/ are optional; a missing directory contributes nothing
    tests = []
    if exists(tests_dir):
        tests = [Target { kind: Test, name: f.stem, entry: f }
                 for f in sorted(list_dir(tests_dir)) if f.ext == ".cpp"]
    examples = []
    if exists(examples_dir):
        examples = [Target { kind: Example, name: f.stem, entry: f }
                    for f in sorted(list_dir(examples_dir)) if f.ext == ".cpp"]

    if library is None and binaries.empty():
        return Err(LayoutNoTarget)
    return Ok(DiscoveredLayout { library, binaries, tests, examples })
walk is non-recursive into bin/, tests/, examples/ — those are flat folders. It IS recursive into other subdirectories of src/ (e.g. src/internal/foo.cppm is part of the library).
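For the flat folders (src/bin/, tests/, examples/), discovery reduces to a sorted, non-recursive listing of .cpp files. The sketch below uses std::filesystem. list_flat_cpp is a hypothetical helper, and treating a missing directory as empty is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: .cpp files directly inside `dir` (subdirectories are
// not entered), sorted by path so generated output never depends on the
// order in which the OS returns directory entries.
std::vector<fs::path> list_flat_cpp(const fs::path& dir) {
    std::vector<fs::path> out;
    std::error_code ec;
    if (!fs::is_directory(dir, ec)) return out;  // missing dir: no targets
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() && entry.path().extension() == ".cpp")
            out.push_back(entry.path());
    }
    std::sort(out.begin(), out.end());
    return out;
}
```

Each returned path p maps to a target named p.stem(), as in the pseudocode.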
9. CMake generator — algorithm
Pseudocode:
cmake_lists(in):
    out = StringBuilder()
    out += header(in.manifest)  # cmake_minimum_required, project, CXX flags

    # find_package per dependency, in manifest order
    for dep, recipe in zip(in.manifest.dependencies, in.recipes):
        out += emit_find_package(dep, recipe)

    # Library target if discovered
    if in.layout.library:
        out += emit_library(in.layout.library, in.recipes)

    # Primary binary (src/main.cpp): links the library if present
    primary_bin = first(in.layout.binaries, .entry endsWith "src/main.cpp")
    if primary_bin:
        out += emit_primary_binary(in.layout, primary_bin, in.recipes)

    # Additional binaries from src/bin/
    for b in in.layout.binaries where b is not primary_bin:
        out += emit_extra_binary(b, in.layout, in.recipes)

    # Tests
    if in.layout.tests:
        out += "enable_testing()\n"
    for t in in.layout.tests:
        out += emit_test(t, in.layout, in.recipes)

    # Examples
    for e in in.layout.examples:
        out += emit_example(e, in.layout, in.recipes)

    # Build flags
    out += emit_build_flags(in.manifest.build, all_target_names(in.layout))
    return out.str()
Each emit_* returns a string with a trailing blank line. The output is deterministic given identical inputs — no timestamps, no nondeterministic ordering, no machine-dependent paths.
find_package emission
For a recipe with no components:
find_package(<<recipe.find_package>>)
For a recipe with components and the dep specifies them:
find_package(<<find_package with {{components}} replaced by COMPONENTS list>>)
{{components}} expands to a space-separated list. {{component}} inside targets expands to one entry per requested component. Example for boost with ["filesystem", "system"]:
find_package(Boost REQUIRED COMPONENTS filesystem system)
And targets become Boost::filesystem and Boost::system.
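The sketch below shows the two substitutions. It assumes the curated recipe stores its find_package body as "Boost REQUIRED COMPONENTS {{components}}" and its target as "Boost::{{component}}"; the exact template strings in data/linkdb.json are not specified here. replace_all, expand_find_package, and expand_targets are hypothetical helpers. Passing targets without a {{component}} marker through unchanged is also an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: replaces every occurrence of `marker` with `value`,
// resuming the search after the inserted text.
std::string replace_all(std::string text, const std::string& marker, const std::string& value) {
    for (std::size_t pos = text.find(marker); pos != std::string::npos;
         pos = text.find(marker, pos + value.size())) {
        text.replace(pos, marker.size(), value);
    }
    return text;
}

// {{components}} becomes one space-separated list inside find_package(...).
std::string expand_find_package(const std::string& tmpl, const std::vector<std::string>& comps) {
    std::string joined;
    for (const auto& c : comps) joined += (joined.empty() ? "" : " ") + c;
    return "find_package(" + replace_all(tmpl, "{{components}}", joined) + ")";
}

// {{component}} makes each target template yield one target per component.
std::vector<std::string> expand_targets(const std::vector<std::string>& tmpls,
                                        const std::vector<std::string>& comps) {
    std::vector<std::string> out;
    for (const auto& t : tmpls) {
        if (t.find("{{component}}") == std::string::npos) {
            out.push_back(t);  // no marker: keep as-is (assumption)
            continue;
        }
        for (const auto& c : comps) out.push_back(replace_all(t, "{{component}}", c));
    }
    return out;
}
```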
10. flake.nix generator — algorithm
flake_nix(in):
    nixpkgs_rev = in.lock.nixpkgs_rev  # all deps share one rev
    deps_attrs  = [recipe.nixpkgs_attr for recipe in in.recipes]
    deduped     = stable_dedup(deps_attrs)
    return template_substitute(FLAKE_TEMPLATE, {
        description: in.manifest.package.name,
        nixpkgs_rev: nixpkgs_rev,
        dep_attrs:   deduped,
    })
FLAKE_TEMPLATE is a string constant. Substitution is plain text replacement of <<...>> markers, not a Nix-aware transform.
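Both helpers named in the pseudocode are small. The sketch below implements stable_dedup and a template_substitute that handles only <<key>> markers. The signatures are assumptions, and joining the dep_attrs list into text before substitution is left to the caller. Unknown markers are kept verbatim, so a typo in FLAKE_TEMPLATE shows up in the golden diff instead of silently vanishing.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Keeps the first occurrence of each attr in input order, so adding an
// unrelated dependency never reorders the existing ones.
std::vector<std::string> stable_dedup(const std::vector<std::string>& attrs) {
    std::set<std::string> seen;
    std::vector<std::string> out;
    for (const auto& a : attrs)
        if (seen.insert(a).second) out.push_back(a);
    return out;
}

// Plain text replacement of <<key>> markers: no Nix parsing, no escaping.
std::string template_substitute(const std::string& tmpl,
                                const std::map<std::string, std::string>& vars) {
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t open = tmpl.find("<<", pos);
        if (open == std::string::npos) break;
        std::size_t close = tmpl.find(">>", open + 2);
        if (close == std::string::npos) break;
        auto it = vars.find(tmpl.substr(open + 2, close - open - 2));
        out += tmpl.substr(pos, open - pos);
        // Known key: its value. Unknown key: the marker itself, unchanged.
        out += it != vars.end() ? it->second : tmpl.substr(open, close + 2 - open);
        pos = close + 2;
    }
    return out + tmpl.substr(pos);
}
```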
11. Version resolution — implementation
class Resolver {
    auto resolve(deps: vector<Dependency>) -> Result<ResolutionPlan>:
        # 1. For each dep, query candidate (version, rev) pairs from nixhub
        candidates: map<string, vector<(version, rev)>>
        for dep in deps:
            candidates[dep.name] = query(dep)

        # 2. Aggregate revisions
        all_revs    = union of revs across all values of candidates
        sorted_revs = sort_descending_by_commit_date(all_revs)

        # 3. Try each rev (newest first); accept the first one where every
        #    dep has a matching version
        for rev in sorted_revs[:50]:  # cap
            plan = []
            for dep in deps:
                m = candidates[dep.name].filter(_.rev == rev)
                                        .filter(satisfies(_.version, dep.version_spec))
                                        .max_by(_.version)
                if m is None: break
                plan.push((dep.name, m.version, rev))
            if plan.size == deps.size:
                return Ok(ResolutionPlan { rev, entries: plan })

        return Err(ResolutionUnsatisfiable)
};
Network calls are wrapped in a 10-second timeout each. nixhub.io failures fall through to lazamar; both failing falls through to nixpkgs_git. The local nixpkgs git clone is created lazily on first use.
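The rev-selection loop from step 3 can be written as a C++ sketch. Candidate, Dep, PlanEntry, ResolutionPlan, and pick_rev are hypothetical stand-ins. The caller supplies revisions already sorted newest first, plus a satisfies predicate; the real range matching lives in util/semver.cpp. version_less is a simplified numeric comparison that does not handle pre-release tags or non-numeric nixpkgs versions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct Candidate { std::string version; std::string rev; };
struct Dep { std::string name; std::string version_spec; };
struct PlanEntry { std::string name; std::string version; };
struct ResolutionPlan { std::string rev; std::vector<PlanEntry> entries; };

// Simplified: compares dotted numeric versions; missing parts count as 0.
bool version_less(const std::string& a, const std::string& b) {
    std::istringstream sa(a), sb(b);
    std::string pa, pb;
    while (true) {
        bool ha = static_cast<bool>(std::getline(sa, pa, '.'));
        bool hb = static_cast<bool>(std::getline(sb, pb, '.'));
        if (!ha && !hb) return false;  // all parts equal
        long va = ha ? std::stol(pa) : 0, vb = hb ? std::stol(pb) : 0;
        if (va != vb) return va < vb;
    }
}

using Satisfies = std::function<bool(const std::string& version, const std::string& spec)>;

// Returns the newest rev (within `cap`) at which every dep has a satisfying
// version, choosing the highest such version per dep; nullopt maps to
// ResolutionUnsatisfiable.
std::optional<ResolutionPlan> pick_rev(const std::vector<Dep>& deps,
                                       const std::map<std::string, std::vector<Candidate>>& candidates,
                                       const std::vector<std::string>& revs_newest_first,
                                       const Satisfies& satisfies, std::size_t cap = 50) {
    for (std::size_t i = 0; i < revs_newest_first.size() && i < cap; ++i) {
        const std::string& rev = revs_newest_first[i];
        ResolutionPlan plan{rev, {}};
        for (const auto& dep : deps) {
            const Candidate* best = nullptr;
            auto it = candidates.find(dep.name);
            if (it != candidates.end()) {
                for (const auto& c : it->second) {
                    if (c.rev != rev || !satisfies(c.version, dep.version_spec)) continue;
                    if (!best || version_less(best->version, c.version)) best = &c;
                }
            }
            if (!best) break;  // this rev cannot satisfy dep; try the next rev
            plan.entries.push_back({dep.name, best->version});
        }
        if (plan.entries.size() == deps.size()) return plan;
    }
    return std::nullopt;
}
```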
12. Lockfile semantics
The lockfile is rewritten in full on every successful add / remove / build. We do not attempt incremental edits. This simplifies the writer and avoids drift between manifest and lock.
Cargoxx.lock is read at the start of build:
- If absent → run resolution, then write.
- If present → check that every manifest dep has a satisfying entry. If yes → use as-is. If no → run resolution.
cargoxx update (deferred to v0.2) will force re-resolution.
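The "satisfying entry" check takes only a few lines. The sketch below assumes the lockfile has already been loaded into a name-to-version map (LockedVersions is hypothetical) and that satisfies is the same predicate the resolver uses.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct Dep { std::string name; std::string version_spec; };

// Hypothetical in-memory view of Cargoxx.lock: package name -> locked version.
using LockedVersions = std::map<std::string, std::string>;

// True if every manifest dependency has a lock entry whose version satisfies
// its spec (build then uses the lock as-is); false means "run resolution and
// rewrite the whole lockfile".
bool lock_satisfies_manifest(const std::vector<Dep>& deps, const LockedVersions& lock,
                             const std::function<bool(const std::string&, const std::string&)>& satisfies) {
    for (const auto& d : deps) {
        auto it = lock.find(d.name);
        if (it == lock.end() || !satisfies(it->second, d.version_spec)) return false;
    }
    return true;
}
```

Lock entries for packages that are no longer in the manifest do not affect the result. Because remove rewrites the lockfile in full, such entries should not normally exist.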
13. Testing strategy
Three layers.
Unit tests
One tests/<feature>.cpp per module. Test pure functions: parser, semver, codegen helpers. Catch2 with TEST_CASE. No I/O outside of tmp_path.
Golden-file tests
Each subdirectory in tests/e2e/projects/ contains:
<project>/
├── input/
│ ├── Cargoxx.toml
│ └── src/...
├── expected/
│ ├── flake.nix
│ ├── CMakeLists.txt
│ └── Cargoxx.lock
└── meta.toml # describes the test (e.g. "fixed_rev = abc123" to make output deterministic)
The runner copies input/ into a temp dir, runs cargoxx build --no-build with a stubbed resolver (returning meta.toml's pinned rev), and diffs every generated file against expected/.
End-to-end build tests
A small set of projects that actually compile via Nix. Marked slow, skipped on dev machines without nix in PATH. Run in CI with cached /nix/store.
Curated DB verification
scripts/verify-curated-db.sh constructs a tiny project per package, runs cargoxx build, and verifies the binary links. Run on every PR that touches data/linkdb.json. This is how we catch upstream Nixpkgs changes that rename attrs.
14. Logging
spdlog initialized at info level by default. --verbose raises to debug, --quiet to warn.
Format:
[<level>] <component>: <message>
Components are module names (manifest, resolver, codegen, …). Every external command (subprocess, HTTP) is logged at debug with the full argv / URL.
User-facing errors are formatted via util::format(Error) and printed to stderr without log decoration. Diagnostic logs go through spdlog.
15. Bootstrap and self-hosting
Phase 0 (historical) — hand-written CMakeLists.txt and flake.nix at the repo root.
Phase 2 (current, since M6) — fully self-hosted.
Cargoxx.toml describes cargoxx's own deps with nixpkgs names: sqlite, reproc, catch2_3. cargoxx build runs the auto-resolver chain (nixpkgs probe → realize → nix_cmake_scan → pc_scan), confirms each recipe via verify_link, and generates build/CMakeLists.txt and the root flake.nix. Both files are committed (tracked) so the build is reproducible without first building cargoxx, and [build].include_dirs = ["third_party"] keeps the vendored headers on the include path.
Bootstrap path:
pre-built cargoxx → cargoxx build → next cargoxx
A clean clone with an empty ~/.cache/cargoxx/linkdb.sqlite auto-resolves all three deps on first cargoxx build (sqlite goes through pkg-config because nixpkgs ships no SQLite3Config.cmake; reproc/catch2_3 go through nix_cmake_scan). For continuity, a known-good cargoxx binary is shipped as a release artifact; from-scratch bootstrap is not in v0.1 scope.
16. Milestones
Each milestone is one mergeable PR series. No milestone is "done" until its tests pass in CI.
M0 — repo skeleton. Empty modules, CMakeLists.txt that builds an empty cargoxx binary, flake.nix with toolchain. CI green.
M1 — manifest + layout. manifest::parse, manifest::write, layout::discover. Unit tests. cargoxx new works (writes Cargoxx.toml and source skeleton, no codegen yet).
M2 — linkdb + curated. linkdb.json with all 25 packages. linkdb::Database::resolve works for curated entries. SQLite overlay schema created on first run.
M3 — codegen. codegen::flake_nix and codegen::cmake_lists. Golden tests for 4-6 representative projects. cargoxx build --no-build produces correct files.
M4 — exec + build. exec::run. cargoxx build invokes nix and cmake end-to-end. cargoxx run, cargoxx test, cargoxx clean.
M5 — resolver + add/remove. resolver::Resolver against nixhub.io. cargoxx add fmt works. Lockfile updates correctly.
M6 — polish. Error message overhaul to match SPEC.md §12. --verbose / --quiet. Self-hosting (Phase 1).
Post-v0.1 (out of this spec): automatic linkdb resolution, workspaces, cargoxx publish, Windows.
17. Coding conventions
- C++23 modules, no headers in src/ (third_party/ excepted).
- Names: snake_case for functions and variables, PascalCase for types and enum values (ErrorCode::ManifestNotFound), SCREAMING_SNAKE_CASE for constants (FLAKE_TEMPLATE).
- One module per directory: foo/foo.cppm exports cargoxx.foo.
- No raw new/delete. Use smart pointers or value types.
- std::filesystem::path for paths everywhere. Strings only at the very edge (CLI parsing, JSON).
- No global mutable state. Configuration is passed explicitly.
- Format with clang-format using the config in .clang-format (LLVM style, 100-column).
- Lints: -Wall -Wextra -Wpedantic -Wconversion. CI fails on warnings.
18. Performance budget
cargoxx is interactive. Targets:
| Operation | Budget |
|---|---|
| cargoxx new | < 100 ms |
| cargoxx add fmt (cached resolution) | < 200 ms |
| cargoxx add fmt (network resolution) | < 5 s |
| cargoxx build (no codegen change) | < 50 ms before invoking cmake |
| Codegen for a 50-source project | < 100 ms |
Profile if budgets are exceeded. SQLite I/O is by far the most likely culprit — open the connection once per process, reuse prepared statements.
19. Open questions for the implementation phase
These are decisions that should be made by the implementer with the user's input. Don't silently pick one — surface the choice when you reach the relevant milestone.
- Should cargoxx run preserve the user's PATH or only inject Nix's? (Recommend: only Nix's, for reproducibility.)
- Should the generated flake.nix pin flake-utils or inline the function? (Recommend: pin; smaller diffs on regeneration.)
- When the layout has both lib.cppm and main.cpp, should the binary always link the library even if it doesn't import it? (Recommend: yes; it's harmless and matches Cargo.)
- Should Cargoxx.lock include a hash of Cargoxx.toml to detect tampering? (Recommend: no for v0.1; it complicates the format.)
- macOS uses clang_18 from Nix, and so does Linux. Is there any reason to prefer GCC on Linux? (Recommend: no; Clang has the most mature module support.)