cargoxx/TECH_SPEC.md
2026-05-07 23:32:46 +00:00

cargoxx — technical specification

Companion to SPEC.md. Where SPEC.md defines what cargoxx does, this document defines how it is built. It is the contract between the project and the implementation.

This is the v0.1 design. It is intentionally conservative and rigid — the goal is a working, debuggable tool, not an extensible platform.


1. Source tree

The cargoxx repository itself follows the layout cargoxx will eventually generate. This is deliberate: the moment cargoxx can build a non-trivial C++ project, we switch its own build to use itself (see §15, bootstrap).

cargoxx/
├── Cargoxx.toml                # populated once we self-host (§15)
├── CMakeLists.txt              # hand-written until self-hosted; lives at root for now
├── flake.nix                   # hand-written until self-hosted
├── flake.lock
├── README.md
├── SPEC.md
├── TECH_SPEC.md
├── AGENTS.md
├── LICENSE
├── data/
│   └── linkdb.json             # curated link database (§9 in SPEC.md)
├── src/
│   ├── main.cpp                # CLI entry point
│   ├── lib.cppm                # primary module, re-exports submodules
│   ├── manifest/
│   │   ├── manifest.cppm
│   │   ├── parser.cpp
│   │   └── writer.cpp
│   ├── lockfile/
│   │   ├── lockfile.cppm
│   │   └── lockfile.cpp
│   ├── layout/
│   │   ├── layout.cppm         # source-tree discovery, target inference
│   │   └── layout.cpp
│   ├── codegen/
│   │   ├── codegen.cppm
│   │   ├── flake.cpp           # flake.nix generator
│   │   └── cmake.cpp           # CMakeLists.txt generator
│   ├── linkdb/
│   │   ├── linkdb.cppm
│   │   ├── curated.cpp         # loads embedded JSON
│   │   ├── overlay.cpp         # user SQLite cache
│   │   └── recipe.cpp
│   ├── resolver/
│   │   ├── resolver.cppm
│   │   ├── nixhub.cpp
│   │   ├── lazamar.cpp
│   │   └── nixpkgs_git.cpp     # local git fallback
│   ├── exec/
│   │   ├── exec.cppm
│   │   └── subprocess.cpp      # wrapping reproc
│   ├── cli/
│   │   ├── cli.cppm
│   │   ├── cmd_new.cpp
│   │   ├── cmd_add.cpp
│   │   ├── cmd_remove.cpp
│   │   ├── cmd_build.cpp
│   │   ├── cmd_run.cpp
│   │   ├── cmd_test.cpp
│   │   └── cmd_clean.cpp
│   └── util/
│       ├── util.cppm
│       ├── error.cpp           # error type and formatting
│       ├── log.cpp             # spdlog wrapper
│       └── semver.cpp          # version range matching
├── tests/
│   ├── manifest_parse.cpp
│   ├── layout_discovery.cpp
│   ├── linkdb_lookup.cpp
│   ├── codegen_flake.cpp
│   ├── codegen_cmake.cpp
│   ├── semver.cpp
│   └── e2e/                    # golden-file tests, see §13
│       ├── projects/
│       │   ├── hello/
│       │   ├── lib_only/
│       │   ├── multi_bin/
│       │   └── with_fmt/
│       └── runner.cpp
├── third_party/                # vendored single-header libs
│   ├── toml.hpp                # toml++
│   ├── json.hpp                # nlohmann/json
│   ├── httplib.h               # cpp-httplib
│   ├── CLI11.hpp
│   └── spdlog/                 # spdlog (header-only build)
└── scripts/
    ├── bootstrap-build.sh      # one-shot build from a clean tree
    └── verify-curated-db.sh    # checks every entry in data/linkdb.json

third_party/ is vendored on purpose. During the early bootstrap phases (§15) we avoid Nix for cargoxx's own header-only dependencies, because cargoxx is the thing being bootstrapped and we want a short, debuggable path from clean clone to working binary.

reproc and sqlite3 are NOT vendored — they come from Nix in the bootstrap flake.nix. They have C sources or build systems and aren't drop-in headers.


2. Module layout

All cargoxx C++ sources are modules. The dependency graph between modules is:

cargoxx (lib.cppm, root module)
├── cargoxx.util
├── cargoxx.exec        depends on: util
├── cargoxx.manifest    depends on: util
├── cargoxx.lockfile    depends on: util, manifest
├── cargoxx.linkdb      depends on: util
├── cargoxx.resolver    depends on: util, exec, linkdb
├── cargoxx.layout      depends on: util
├── cargoxx.codegen     depends on: util, manifest, linkdb, layout, lockfile
└── cargoxx.cli         depends on: everything above

main.cpp imports cargoxx.cli and dispatches on argv. No business logic in main.cpp.

Each .cppm declares one module: export module cargoxx.manifest; etc. The root lib.cppm is export module cargoxx; and re-exports submodules selectively.


3. Core types

Definitions below are normative for the public interface. Implementation details (constructors, helpers) are at the agent's discretion.

// in cargoxx.manifest
export module cargoxx.manifest;

import std;
import cargoxx.util;

export namespace cargoxx::manifest {

struct Dependency {
    std::string name;
    std::string version_spec;            // e.g. "10.2", "^1.0", "*"
    std::vector<std::string> components; // empty if not a componentized package
};

struct BuildSettings {
    bool warnings_as_errors = false;
    std::vector<std::string> sanitizers;
};

enum class Edition { Cpp20, Cpp23, Cpp26 };

struct Package {
    std::string name;
    std::string version;
    Edition edition = Edition::Cpp23;
    std::vector<std::string> authors;
    std::optional<std::string> license;
};

struct Manifest {
    Package package;
    std::vector<Dependency> dependencies;
    BuildSettings build;
};

auto parse(const std::filesystem::path& path) -> util::Result<Manifest>;
auto write(const Manifest& m, const std::filesystem::path& path) -> util::Result<void>;

}

// in cargoxx.layout
export module cargoxx.layout;

import std;
import cargoxx.util;   // util::Result is used by discover()

export namespace cargoxx::layout {

enum class TargetKind { Library, Binary, Test, Example };

struct Target {
    TargetKind kind;
    std::string name;
    std::filesystem::path entry;             // primary source file
    std::vector<std::filesystem::path> additional_sources;
    std::vector<std::filesystem::path> module_units;  // .cppm files
};

struct DiscoveredLayout {
    std::optional<Target> library;           // exactly 0 or 1
    std::vector<Target> binaries;            // 0..N
    std::vector<Target> tests;               // 0..N
    std::vector<Target> examples;            // 0..N
};

auto discover(const std::filesystem::path& project_root,
              const std::string& package_name)
    -> util::Result<DiscoveredLayout>;

}

// in cargoxx.linkdb
export module cargoxx.linkdb;

import std;
import cargoxx.util;

export namespace cargoxx::linkdb {

struct Recipe {
    std::string nixpkgs_attr;
    std::string find_package;        // raw CMake snippet, post-substitution
    std::vector<std::string> targets;// post-substitution
    std::string source;              // 'curated' | 'manual' | etc
};

struct Database {
    static auto open() -> util::Result<Database>;

    auto resolve(const std::string& package,
                 const std::string& version,
                 const std::vector<std::string>& components)
        -> util::Result<Recipe>;

    auto add_manual(const std::string& package,
                    const std::string& version_range,
                    const Recipe& r) -> util::Result<void>;

    // private: holds sqlite handle + parsed curated JSON
};

}

// in cargoxx.codegen
export module cargoxx.codegen;

import std;
import cargoxx.manifest;
import cargoxx.layout;
import cargoxx.linkdb;
import cargoxx.lockfile;

export namespace cargoxx::codegen {

struct GenerateInputs {
    const manifest::Manifest& manifest;
    const layout::DiscoveredLayout& layout;
    const lockfile::Lockfile& lock;
    std::vector<linkdb::Recipe> recipes;     // one per dependency, same order
    std::filesystem::path project_root;
};

auto flake_nix(const GenerateInputs& in) -> std::string;
auto cmake_lists(const GenerateInputs& in) -> std::string;

}

The two generator functions are pure: input → string. They do no I/O. The caller writes the result.


4. Error model — implementation

// in cargoxx.util
export namespace cargoxx::util {

enum class ErrorCode {
    // Manifest (E0001-E0019)
    ManifestNotFound = 1,
    ManifestParseError,
    ManifestInvalidField,
    ManifestUnknownField,         // strict-parse mode only
    ManifestVersionInvalid,

    // Layout (E0020-E0039)
    LayoutNoTarget = 20,
    LayoutAmbiguousLib,
    LayoutInvalidName,

    // Resolution (E0040-E0059)
    ResolutionUnknownPackage = 40,
    ResolutionNetworkError,
    ResolutionUnsatisfiable,
    ResolutionVersionNotFound,

    // Linkdb (E0060-E0079)
    LinkdbUnknownPackage = 60,
    LinkdbCorrupt,
    LinkdbComponentNotSupported,

    // Build / exec (E0080-E0099)
    ExecCommandFailed = 80,
    ExecToolNotFound,
    BuildCmakeFailed,
    BuildNixFailed,

    // Internal (E0100+)
    Internal = 100,
    NotImplemented,
};

struct Error {
    ErrorCode code;
    std::string message;
    std::string hint;
    std::optional<std::filesystem::path> location;
    std::optional<std::pair<int, int>> line_col;
};

template <typename T>
class Result {
    // std::expected<T, Error> when available; otherwise tl::expected.
    // Public surface: has_value(), value(), error().
};

auto format(const Error& e) -> std::string;     // produces SPEC.md §12 output

}

We do not throw exceptions across module boundaries. Result<T> is the only way to propagate failure. throw is permitted only inside a single .cpp file when the catch site is in the same file.
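
As an illustration of the rule above, here is a minimal sketch of a Result-style type and of rendering a code in the E0001 style used in the enum comments. It is not the project's actual util::Result: it is built on std::variant, covers only non-void T, uses headers instead of import std; so it compiles standalone, and the names code_label and require_name are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <variant>

// Subset of the spec's ErrorCode values, enough for the example.
enum class ErrorCode { ManifestNotFound = 1, ManifestInvalidField = 3, LayoutInvalidName = 22 };

struct Error {
    ErrorCode code;
    std::string message;
};

// Stand-in for util::Result<T>, exposing only has_value(), value(), error().
template <typename T>
class Result {
public:
    Result(T value) : state_(std::move(value)) {}
    Result(Error err) : state_(std::move(err)) {}

    bool has_value() const { return std::holds_alternative<T>(state_); }
    const T& value() const { assert(has_value()); return std::get<T>(state_); }
    const Error& error() const { assert(!has_value()); return std::get<Error>(state_); }

private:
    std::variant<T, Error> state_;
};

// Renders a code as "E" followed by four zero-padded digits.
std::string code_label(ErrorCode code) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "E%04d", static_cast<int>(code));
    return buf;
}

// Hypothetical caller: failure travels in the return value, nothing is thrown.
Result<std::string> require_name(const std::string& name) {
    if (name.empty()) return Error{ErrorCode::ManifestInvalidField, "package name is empty"};
    return name;
}
```

A caller checks has_value() before touching value(); on failure it can forward error() unchanged to its own caller, which keeps the failure path visible in every signature.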


5. Subprocess discipline

All external commands go through cargoxx::exec::run:

struct ExecResult {
    int exit_code;
    std::string stdout_text;
    std::string stderr_text;
};

struct ExecOptions {
    std::filesystem::path cwd;
    std::vector<std::pair<std::string, std::string>> env_overrides;
    std::optional<std::chrono::seconds> timeout;
    bool inherit_stdio = false;          // for `cargoxx run`
};

auto run(const std::string& program,
         const std::vector<std::string>& args,
         const ExecOptions& opts = {}) -> util::Result<ExecResult>;

Backed by reproc. Never use system(), popen(), or shell strings — argv only, no shell expansion. Every external invocation is logged at debug level with the full argv and the cwd.
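
To make the "argv only, no shell" rule concrete, the sketch below spawns a process from an argument vector using POSIX posix_spawnp. This is a stand-in for illustration only: the real implementation is specified to wrap reproc, and the function name run_argv is hypothetical. It is POSIX-only and does not capture output or apply a timeout.

```cpp
#include <spawn.h>
#include <sys/wait.h>

#include <string>
#include <vector>

extern char** environ;

// Runs argv[0] (looked up on PATH) with the given arguments. No shell is involved,
// so characters such as $, *, or ; reach the child process unchanged.
// Returns the exit code, or -1 if spawning failed or the child did not exit normally.
int run_argv(const std::vector<std::string>& argv) {
    if (argv.empty()) return -1;

    // posix_spawnp expects a null-terminated array of mutable char pointers.
    std::vector<char*> raw;
    raw.reserve(argv.size() + 1);
    for (const std::string& arg : argv) raw.push_back(const_cast<char*>(arg.c_str()));
    raw.push_back(nullptr);

    pid_t pid = 0;
    if (posix_spawnp(&pid, raw[0], nullptr, nullptr, raw.data(), environ) != 0) return -1;

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because each argument is passed as its own argv entry, a value like "$HOME" arrives in the child as those five literal characters, which is the property the rule is protecting.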


6. Generators — testability

Generators are pure functions over plain value inputs (the GenerateInputs struct from §3). Tests assert exact string equality against golden files in tests/e2e/projects/<name>/expected/.

To regenerate goldens during development:

CARGOXX_TEST_REGENERATE=1 ctest -R codegen

The test runner detects the env var, writes new goldens, and reports as a notice (not a pass). CI never sets this var.

Whitespace and trailing newline are part of the contract. Generators emit \n line endings unconditionally.
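
A golden comparison along these lines could look like the sketch below. The names GoldenOutcome and check_golden are hypothetical, and treating a missing golden file as a mismatch is an assumption. The caller is expected to decide the regenerate flag, for example by checking whether CARGOXX_TEST_REGENERATE is set.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

enum class GoldenOutcome { Match, Mismatch, Regenerated };

// Compares generator output byte-for-byte with a golden file.
// When `regenerate` is true the golden is overwritten instead of compared.
GoldenOutcome check_golden(const std::string& actual,
                           const std::filesystem::path& golden,
                           bool regenerate) {
    if (regenerate) {
        // Binary mode so that "\n" is never translated on any platform.
        std::ofstream out(golden, std::ios::binary | std::ios::trunc);
        out << actual;
        return GoldenOutcome::Regenerated;
    }
    std::ifstream in(golden, std::ios::binary);
    if (!in) return GoldenOutcome::Mismatch;  // assumption: missing golden counts as a failure
    std::string expected((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
    return expected == actual ? GoldenOutcome::Match : GoldenOutcome::Mismatch;
}
```

Comparing whole strings rather than normalized lines is what makes a missing trailing newline show up as a failure.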


7. Manifest parser — edge cases

  • Comments: toml++ preserves them on round-trip if we round-trip via toml::table. cargoxx add MUST NOT strip the user's comments. Implementation: parse to toml::table, mutate, serialize.
  • Unknown top-level keys: warn but accept. Forward-compat (see SPEC.md §4 reserved fields).
  • Unknown keys inside [package], [build]: error.
  • Dependency value is neither string nor table: error E0003.
  • Empty [dependencies]: valid.
  • name containing characters outside [a-zA-Z0-9_-]: error E0022.
  • name starting with digit: error.
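
The two name rules above can be sketched as a single predicate. The function name is hypothetical, and rejecting the empty string is an assumption, since the list does not mention it.

```cpp
#include <string_view>

// True if `name` uses only [a-zA-Z0-9_-] and does not start with a digit.
bool is_valid_package_name(std::string_view name) {
    if (name.empty()) return false;  // assumption: an empty name is invalid
    if (name.front() >= '0' && name.front() <= '9') return false;
    for (char c : name) {
        const bool allowed = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                             (c >= '0' && c <= '9') || c == '_' || c == '-';
        if (!allowed) return false;
    }
    return true;
}
```

Explicit character ranges are used instead of std::isalnum so the result does not depend on the process locale.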

8. Layout discovery — algorithm

discover(project_root, package_name):
    let lib = project_root / "src" / "lib.cppm"
    let main = project_root / "src" / "main.cpp"
    let bin_dir = project_root / "src" / "bin"
    let tests_dir = project_root / "tests"
    let examples_dir = project_root / "examples"

    # Collect library sources if lib.cppm exists
    library = None
    if exists(lib):
        all_cppm = [lib]
        all_cpp = []
        for entry in walk(project_root / "src"):
            if entry == lib: continue
            if entry.parent == bin_dir: continue
            if entry == main: continue
            if entry.ext == ".cppm": all_cppm.push(entry)
            elif entry.ext == ".cpp":  all_cpp.push(entry)
        library = Target {
            kind: Library,
            name: package_name,
            entry: lib,
            module_units: all_cppm,
            additional_sources: all_cpp,
        }

    binaries = []
    if exists(main):
        binaries.push(Target {
            kind: Binary,
            name: package_name,
            entry: main,
        })
    if exists(bin_dir):
        for f in list_dir(bin_dir):
            if f.ext == ".cpp":
                binaries.push(Target {
                    kind: Binary,
                    name: f.stem,
                    entry: f,
                })

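    # tests/ and examples/ are optional; a missing directory yields no targets
    tests = []
    if exists(tests_dir):
        tests = [Target { kind: Test, name: f.stem, entry: f }
                 for f in list_dir(tests_dir) if f.ext == ".cpp"]
    examples = []
    if exists(examples_dir):
        examples = [Target { kind: Example, name: f.stem, entry: f }
                    for f in list_dir(examples_dir) if f.ext == ".cpp"]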

    if library is None and binaries.empty():
        return Err(LayoutNoTarget)

    return Ok(DiscoveredLayout { library, binaries, tests, examples })

walk is non-recursive into bin/, tests/, examples/ — those are flat folders. It IS recursive into other subdirectories of src/ (e.g. src/internal/foo.cppm is part of the library).
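
The per-file decisions made while walking src/ can be sketched as a pure classification function. SourceRole and classify are hypothetical names; the function takes a path relative to the project root and ignores the surrounding context, so the two library roles only matter when src/lib.cppm actually exists.

```cpp
#include <filesystem>
#include <string>

enum class SourceRole {
    LibraryEntry, LibraryModuleUnit, LibrarySource, PrimaryBinary, ExtraBinary, Ignored
};

// `rel` is relative to the project root, e.g. "src/internal/foo.cppm".
SourceRole classify(const std::filesystem::path& rel) {
    namespace fs = std::filesystem;
    auto it = rel.begin();
    if (it == rel.end() || *it != fs::path("src")) return SourceRole::Ignored;
    const std::string ext = rel.extension().string();

    if (rel == fs::path("src") / "lib.cppm") return SourceRole::LibraryEntry;
    if (rel == fs::path("src") / "main.cpp") return SourceRole::PrimaryBinary;

    ++it;
    if (it != rel.end() && *it == fs::path("bin")) {
        // src/bin/ is flat: only direct .cpp children become binaries.
        const bool direct_child = rel.parent_path() == fs::path("src") / "bin";
        return (direct_child && ext == ".cpp") ? SourceRole::ExtraBinary : SourceRole::Ignored;
    }

    // Everything else under src/, at any depth, belongs to the library.
    if (ext == ".cppm") return SourceRole::LibraryModuleUnit;
    if (ext == ".cpp") return SourceRole::LibrarySource;
    return SourceRole::Ignored;
}
```

Keeping the classification free of filesystem access makes it straightforward to unit-test without building a temporary directory tree.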


9. CMake generator — algorithm

Pseudocode:

cmake_lists(in):
    out = StringBuilder()

    out += header(in.manifest)            # cmake_minimum_required, project, CXX flags

    # find_package per dependency, in manifest order
    for dep, recipe in zip(in.manifest.dependencies, in.recipes):
        out += emit_find_package(dep, recipe)

    # Library target if discovered
    if in.layout.library:
        out += emit_library(in.layout.library, in.recipes)

    # Primary binary (src/main.cpp) — links library if present
    primary_bin = first(in.layout.binaries, .entry endsWith "src/main.cpp")
    if primary_bin:
        out += emit_primary_binary(in.layout, primary_bin, in.recipes)

    # Additional binaries from src/bin/
    for b in in.layout.binaries where b is not primary_bin:
        out += emit_extra_binary(b, in.layout, in.recipes)

    # Tests
    if in.layout.tests:
        out += "enable_testing()\n"
        for t in in.layout.tests:
            out += emit_test(t, in.layout, in.recipes)

    # Examples
    for e in in.layout.examples:
        out += emit_example(e, in.layout, in.recipes)

    # Build flags
    out += emit_build_flags(in.manifest.build, all_target_names(in.layout))

    return out.str()

Each emit_* returns a string with a trailing blank line. The output is deterministic given identical inputs — no timestamps, no nondeterministic ordering, no machine-dependent paths.

find_package emission

For a recipe with no components:

find_package(<<recipe.find_package>>)

For a recipe with components and the dep specifies them:

find_package(<<find_package with {{components}} replaced by COMPONENTS list>>)

{{components}} expands to a space-separated list. {{component}} inside targets expands to one entry per requested component. Example for boost with ["filesystem", "system"]:

find_package(Boost REQUIRED COMPONENTS filesystem system)

And targets become Boost::filesystem and Boost::system.
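
The two substitutions can be sketched as follows. The pre-substitution template strings in the usage ("Boost REQUIRED COMPONENTS {{components}}" and "Boost::{{component}}") are assumptions about how a curated entry might be written, and all function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces every occurrence of `marker` in `text` with `value`.
std::string replace_all(std::string text, const std::string& marker, const std::string& value) {
    for (std::size_t pos = text.find(marker); pos != std::string::npos;
         pos = text.find(marker, pos + value.size())) {
        text.replace(pos, marker.size(), value);
    }
    return text;
}

// Joins components with single spaces, as {{components}} requires.
std::string join_with_spaces(const std::vector<std::string>& parts) {
    std::string out;
    for (const std::string& part : parts) {
        if (!out.empty()) out += ' ';
        out += part;
    }
    return out;
}

// Expands {{components}} and wraps the result in find_package(...).
std::string expand_find_package(const std::string& tmpl,
                                const std::vector<std::string>& components) {
    return "find_package(" + replace_all(tmpl, "{{components}}", join_with_spaces(components)) + ")";
}

// A target template containing {{component}} yields one target per component;
// templates without the marker pass through unchanged.
std::vector<std::string> expand_targets(const std::vector<std::string>& templates,
                                        const std::vector<std::string>& components) {
    std::vector<std::string> out;
    for (const std::string& tmpl : templates) {
        if (tmpl.find("{{component}}") == std::string::npos) {
            out.push_back(tmpl);
            continue;
        }
        for (const std::string& c : components) out.push_back(replace_all(tmpl, "{{component}}", c));
    }
    return out;
}
```

With components filesystem and system, the first function reproduces the Boost line shown above and the second yields Boost::filesystem followed by Boost::system, in the order the components were requested.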


10. flake.nix generator — algorithm

flake_nix(in):
    nixpkgs_rev = in.lock.nixpkgs_rev    # all deps share one rev
    deps_attrs = [recipe.nixpkgs_attr for recipe in in.recipes]
    deduped = stable_dedup(deps_attrs)

    return template_substitute(FLAKE_TEMPLATE, {
        description: in.manifest.package.name,
        nixpkgs_rev: nixpkgs_rev,
        dep_attrs: deduped,
    })

FLAKE_TEMPLATE is a string constant. Substitution is plain text replacement of <<...>> markers, not a Nix-aware transform.
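
A minimal sketch of the two helpers named in the pseudocode follows. The signatures are assumptions: in particular, this version takes values that are already strings, so the caller would need to render dep_attrs into text (for example space-separated) before substitution.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Removes duplicates while keeping first-occurrence order, so output stays deterministic.
std::vector<std::string> stable_dedup(const std::vector<std::string>& items) {
    std::set<std::string> seen;
    std::vector<std::string> out;
    for (const std::string& item : items) {
        if (seen.insert(item).second) out.push_back(item);
    }
    return out;
}

// Replaces each <<key>> marker with its value. Plain text only: no knowledge of Nix syntax.
std::string template_substitute(std::string text,
                                const std::map<std::string, std::string>& values) {
    for (const auto& [key, value] : values) {
        const std::string marker = "<<" + key + ">>";
        for (std::size_t pos = text.find(marker); pos != std::string::npos;
             pos = text.find(marker, pos + value.size())) {
            text.replace(pos, marker.size(), value);
        }
    }
    return text;
}
```

Markers without a matching key are left in place, which makes a forgotten value easy to spot in a golden-file diff.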


11. Version resolution — implementation

class Resolver {
    auto resolve(deps: vector<Dependency>) -> Result<ResolutionPlan>:
        # 1. For each dep, query candidate versions + revisions from nixhub
        candidates: map<string, vector<(version, rev)>>
        for dep in deps:
            candidates[dep.name] = query(dep)

        # 2. Aggregate revisions
        all_revs = union of revs across candidates values
        sorted_revs = sort_descending_by_commit_date(all_revs)

        # 3. Try each rev (newest first), find one where every dep has a matching version
        for rev in sorted_revs[:50]:   # cap
            plan = []
            for dep in deps:
                m = candidates[dep.name].filter(_.rev == rev)
                            .filter(satisfies(_.version, dep.version_spec))
                            .max_by(.version)
                if m is None: break
                plan.push((dep.name, m.version, rev))
            if plan.size == deps.size:
                return Ok(ResolutionPlan { rev, entries: plan })

        return Err(ResolutionUnsatisfiable)
};

Network calls are wrapped in a 10-second timeout each. nixhub.io failures fall through to lazamar; both failing falls through to nixpkgs_git. The local nixpkgs git clone is created lazily on first use.
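
Step 3 of the pseudocode, the search for a single revision that satisfies every dependency, can be sketched in C++ as below. The type names are hypothetical, the satisfies predicate is passed in because range matching lives elsewhere (util/semver.cpp), and version ordering here assumes dotted numeric versions, ignoring any other characters.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct Candidate { std::string version; std::string rev; };
struct Dep { std::string name; std::string version_spec; };
struct PlanEntry { std::string name; std::string version; };
struct Plan { std::string rev; std::vector<PlanEntry> entries; };

using Satisfies = std::function<bool(const std::string& version, const std::string& spec)>;

// "10.2.1" -> {10, 2, 1}. Assumption: only digits and dots are significant.
std::vector<int> version_key(const std::string& v) {
    std::vector<int> parts(1, 0);
    for (char c : v) {
        if (c == '.') parts.push_back(0);
        else if (c >= '0' && c <= '9') parts.back() = parts.back() * 10 + (c - '0');
    }
    return parts;
}

// Tries revisions newest first and returns the first one where every dependency
// has a satisfying version; std::nullopt corresponds to ResolutionUnsatisfiable.
std::optional<Plan> pick_plan(const std::vector<Dep>& deps,
                              const std::map<std::string, std::vector<Candidate>>& candidates,
                              const std::vector<std::string>& revs_newest_first,
                              const Satisfies& satisfies,
                              std::size_t max_revs = 50) {
    const std::size_t limit = std::min(max_revs, revs_newest_first.size());
    for (std::size_t i = 0; i < limit; ++i) {
        const std::string& rev = revs_newest_first[i];
        Plan plan{rev, {}};
        for (const Dep& dep : deps) {
            const Candidate* best = nullptr;
            auto found = candidates.find(dep.name);
            if (found != candidates.end()) {
                for (const Candidate& c : found->second) {
                    if (c.rev != rev || !satisfies(c.version, dep.version_spec)) continue;
                    if (!best || version_key(best->version) < version_key(c.version)) best = &c;
                }
            }
            if (!best) break;  // this revision cannot host every dependency
            plan.entries.push_back({dep.name, best->version});
        }
        if (plan.entries.size() == deps.size()) return plan;
    }
    return std::nullopt;
}
```

Since all dependencies share one nixpkgs revision, the search is over revisions rather than over per-package versions, and the cap bounds the work when the candidate lists are long.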


12. Lockfile semantics

The lockfile is rewritten in full on every successful add / remove / build. We do not attempt incremental edits. This simplifies the writer and avoids drift between manifest and lock.

Cargoxx.lock is read at the start of build:

  • If absent → run resolution, then write.
  • If present → check that every manifest dep has a satisfying entry. If yes → use as-is. If no → run resolution.

cargoxx update (deferred to v0.2) will force re-resolution.
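
The read-time decision above reduces to a small function. This is a sketch under assumptions: the lock is modeled as a name-to-version map, LockAction and decide are hypothetical names, and the satisfies predicate is supplied by the caller.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

enum class LockAction { UseAsIs, Resolve };

struct Dep { std::string name; std::string version_spec; };
using LockedVersions = std::map<std::string, std::string>;  // package name -> locked version

// Decides whether an existing lock can be reused for the current manifest.
LockAction decide(const std::vector<Dep>& manifest_deps,
                  const std::optional<LockedVersions>& lock,
                  const std::function<bool(const std::string&, const std::string&)>& satisfies) {
    if (!lock) return LockAction::Resolve;  // Cargoxx.lock is absent
    for (const Dep& dep : manifest_deps) {
        auto entry = lock->find(dep.name);
        if (entry == lock->end() || !satisfies(entry->second, dep.version_spec)) {
            return LockAction::Resolve;  // missing or stale entry
        }
    }
    return LockAction::UseAsIs;
}
```

Note that extra lock entries for packages no longer in the manifest do not trigger resolution in this sketch; whether they should is not stated in this section.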


13. Testing strategy

Three layers.

Unit tests

One tests/<feature>.cpp per module. Test pure functions: parser, semver, codegen helpers. Catch2 with TEST_CASE. No I/O outside of tmp_path.

Golden-file tests

Each subdirectory in tests/e2e/projects/ contains:

<project>/
├── input/
│   ├── Cargoxx.toml
│   └── src/...
├── expected/
│   ├── flake.nix
│   ├── CMakeLists.txt
│   └── Cargoxx.lock
└── meta.toml         # describes the test (e.g. "fixed_rev = abc123" to make output deterministic)

The runner copies input/ into a temp dir, runs cargoxx build --no-build with a stubbed resolver (returning meta.toml's pinned rev), and diffs every generated file against expected/.

End-to-end build tests

A small set of projects that actually compile via Nix. Marked slow, skipped on dev machines without nix in PATH. Run in CI with cached /nix/store.

Curated DB verification

scripts/verify-curated-db.sh constructs a tiny project per package, runs cargoxx build, and verifies the binary links. Run on every PR that touches data/linkdb.json. This is how we catch upstream Nixpkgs changes that rename attrs.


14. Logging

spdlog is initialized at info level by default. --verbose lowers the threshold to debug; --quiet raises it to warn.

Format:

[<level>] <component>: <message>

Components are module names (manifest, resolver, codegen, …). Every external command (subprocess, HTTP) is logged at debug with the full argv / URL.

User-facing errors are formatted via util::format(Error) and printed to stderr without log decoration. Diagnostic logs go through spdlog.
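
The flag-to-level mapping and the line format can be sketched without spdlog as follows. The names are hypothetical, and letting --verbose win when both flags are given is an assumption, since the section does not say.

```cpp
#include <string>

enum class Level { Debug, Info, Warn };

// Maps the CLI flags to a log threshold.
// Assumption: --verbose takes precedence if both flags are present.
Level level_from_flags(bool verbose, bool quiet) {
    if (verbose) return Level::Debug;
    if (quiet) return Level::Warn;
    return Level::Info;
}

std::string level_name(Level level) {
    switch (level) {
        case Level::Debug: return "debug";
        case Level::Info:  return "info";
        case Level::Warn:  return "warn";
    }
    return "info";
}

// Produces "[<level>] <component>: <message>".
std::string format_line(Level level, const std::string& component, const std::string& message) {
    return "[" + level_name(level) + "] " + component + ": " + message;
}
```

With spdlog itself, a pattern along the lines of "[%l] %n: %v" with one named logger per component would presumably give the same shape, though the exact level spelling depends on spdlog's own names.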


15. Bootstrap and self-hosting

Three phases.

Phase 0 — hand-written CMake (commits before milestone M3). CMakeLists.txt and flake.nix at the repo root are written by humans. Plain CMake builds cargoxx; cargoxx is not yet involved in its own build.

Phase 1 — generated CMake, hand-written flake. At milestone M3 (codegen complete), commit a populated Cargoxx.toml. Generate build/CMakeLists.txt from it. Delete the root CMakeLists.txt. The root flake.nix stays hand-written because cargoxx doesn't know about its own host-language deps yet.

Phase 2 — fully self-hosted. At milestone M5, vendor all third-party headers into third_party/ and have cargoxx generate the flake too. The bootstrap path becomes: pre-built cargoxx binary → run cargoxx build → produce next cargoxx.

For continuity, ship a known-good cargoxx binary as a release artifact. Anyone bootstrapping from source clones the repo, downloads the latest release binary, and runs ./bootstrap-cargoxx build. If we ever want to bootstrap from absolute zero, scripts/bootstrap-build.sh does it with a hand-written CMake invocation.


16. Milestones

Each milestone is one mergeable PR series. No milestone is "done" until its tests pass in CI.

M0 — repo skeleton. Empty modules, CMakeLists.txt that builds an empty cargoxx binary, flake.nix with toolchain. CI green.

M1 — manifest + layout. manifest::parse, manifest::write, layout::discover. Unit tests. cargoxx new works (writes Cargoxx.toml and source skeleton, no codegen yet).

M2 — linkdb + curated. linkdb.json with all 25 packages. linkdb::Database::resolve works for curated entries. SQLite overlay schema created on first run.

M3 — codegen. codegen::flake_nix and codegen::cmake_lists. Golden tests for 4-6 representative projects. cargoxx build --no-build produces correct files.

M4 — exec + build. exec::run. cargoxx build invokes nix and cmake end-to-end. cargoxx run, cargoxx test, cargoxx clean.

M5 — resolver + add/remove. resolver::Resolver against nixhub.io. cargoxx add fmt works. Lockfile updates correctly.

M6 — polish. Error message overhaul to match SPEC.md §12. --verbose / --quiet. Self-hosting (Phase 1).

Post-v0.1 (out of this spec): automatic linkdb resolution, workspaces, cargoxx publish, Windows.


17. Coding conventions

  • C++23 modules, no headers in src/ (third_party/ excepted).
  • Names: snake_case for functions and variables, PascalCase for types and enum values (ErrorCode::ManifestNotFound), SCREAMING_SNAKE_CASE for constants (FLAKE_TEMPLATE).
  • One module per directory. foo/foo.cppm exports cargoxx.foo.
  • No raw new/delete. Smart pointers or value types.
  • std::filesystem::path for paths everywhere. Strings only at the very edge (CLI parsing, JSON).
  • No global mutable state. Configuration is passed explicitly.
  • Format with clang-format using the config in .clang-format (LLVM style, 100-column).
  • Lints: -Wall -Wextra -Wpedantic -Wconversion. CI fails on warnings.

18. Performance budget

cargoxx is interactive. Targets:

| Operation | Budget |
|---|---|
| cargoxx new | < 100 ms |
| cargoxx add fmt (cached resolution) | < 200 ms |
| cargoxx add fmt (network resolution) | < 5 s |
| cargoxx build (no codegen change) | < 50 ms before invoking cmake |
| Codegen for a 50-source project | < 100 ms |

Profile if budgets are exceeded. SQLite I/O is the most likely culprit: open the connection once per process and reuse prepared statements.


19. Open questions for the implementation phase

These are decisions that should be made by the implementer with the user's input. Don't silently pick one — surface the choice when you reach the relevant milestone.

  1. Should cargoxx run preserve the user's PATH or only inject Nix's? (Recommend: only Nix's, for reproducibility.)
  2. Should generated flake.nix pin flake-utils or inline the function? (Recommend: pin, smaller diffs on regeneration.)
  3. When the layout has both lib.cppm and main.cpp, should the binary always link the library even if it doesn't import it? (Recommend: yes, it's harmless and matches Cargo.)
  4. Should Cargoxx.lock include a hash of Cargoxx.toml to detect tampering? (Recommend: no for v0.1, complicates the format.)
  5. macOS uses clang_18 from Nix; Linux too. Is there any reason to prefer GCC on Linux? (Recommend: no, Clang has the most mature module support.)