Initial commit

2026-05-07 23:32:46 +00:00
commit 6e922b7249
20 changed files with 1616 additions and 0 deletions

TECH_SPEC.md Normal file

@@ -0,0 +1,696 @@
# cargoxx — technical specification
Companion to `SPEC.md`. Where `SPEC.md` defines what cargoxx does, this document defines how it is built. It is the contract between the project and the implementation.
This is the v0.1 design. It is intentionally conservative and rigid — the goal is a working, debuggable tool, not an extensible platform.
---
## 1. Source tree
The cargoxx repository itself follows the layout cargoxx will eventually generate. This is deliberate: the moment cargoxx can build a non-trivial C++ project, we switch its own build to use itself (see §15, bootstrap).
```
cargoxx/
├── Cargoxx.toml # populated once we self-host (§15)
├── CMakeLists.txt # hand-written until self-hosted; lives at root for now
├── flake.nix # hand-written until self-hosted
├── flake.lock
├── README.md
├── SPEC.md
├── TECH_SPEC.md
├── AGENTS.md
├── LICENSE
├── data/
│ └── linkdb.json # curated link database (§9 in SPEC.md)
├── src/
│ ├── main.cpp # CLI entry point
│ ├── lib.cppm # primary module, re-exports submodules
│ ├── manifest/
│ │ ├── manifest.cppm
│ │ ├── parser.cpp
│ │ └── writer.cpp
│ ├── lockfile/
│ │ ├── lockfile.cppm
│ │ └── lockfile.cpp
│ ├── layout/
│ │ ├── layout.cppm # source-tree discovery, target inference
│ │ └── layout.cpp
│ ├── codegen/
│ │ ├── codegen.cppm
│ │ ├── flake.cpp # flake.nix generator
│ │ └── cmake.cpp # CMakeLists.txt generator
│ ├── linkdb/
│ │ ├── linkdb.cppm
│ │ ├── curated.cpp # loads embedded JSON
│ │ ├── overlay.cpp # user SQLite cache
│ │ └── recipe.cpp
│ ├── resolver/
│ │ ├── resolver.cppm
│ │ ├── nixhub.cpp
│ │ ├── lazamar.cpp
│ │ └── nixpkgs_git.cpp # local git fallback
│ ├── exec/
│ │ ├── exec.cppm
│ │ └── subprocess.cpp # wrapping reproc
│ ├── cli/
│ │ ├── cli.cppm
│ │ ├── cmd_new.cpp
│ │ ├── cmd_add.cpp
│ │ ├── cmd_remove.cpp
│ │ ├── cmd_build.cpp
│ │ ├── cmd_run.cpp
│ │ ├── cmd_test.cpp
│ │ └── cmd_clean.cpp
│ └── util/
│ ├── util.cppm
│ ├── error.cpp # error type and formatting
│ ├── log.cpp # spdlog wrapper
│ └── semver.cpp # version range matching
├── tests/
│ ├── manifest_parse.cpp
│ ├── layout_discovery.cpp
│ ├── linkdb_lookup.cpp
│ ├── codegen_flake.cpp
│ ├── codegen_cmake.cpp
│ ├── semver.cpp
│ └── e2e/ # golden-file tests, see §13
│ ├── projects/
│ │ ├── hello/
│ │ ├── lib_only/
│ │ ├── multi_bin/
│ │ └── with_fmt/
│ └── runner.cpp
├── third_party/ # vendored single-header libs
│ ├── toml.hpp # toml++
│ ├── json.hpp # nlohmann/json
│ ├── httplib.h # cpp-httplib
│ ├── CLI11.hpp
│ └── spdlog/ # spdlog (header-only build)
└── scripts/
├── bootstrap-build.sh # one-shot build from a clean tree
└── verify-curated-db.sh # checks every entry in data/linkdb.json
```
`third_party/` is vendored on purpose. During the early bootstrap phases (§15) we avoid Nix for cargoxx's own header-only dependencies, because cargoxx is the thing being bootstrapped and we want a short, debuggable path from clean clone to working binary.
`reproc` and `sqlite3` are NOT vendored — they come from Nix in the bootstrap `flake.nix`. They have C sources or build systems and aren't drop-in headers.
---
## 2. Module layout
All cargoxx C++ sources are modules. The dependency graph between modules is:
```
cargoxx (lib.cppm, root module)
├── cargoxx.util
├── cargoxx.exec depends on: util
├── cargoxx.manifest depends on: util
├── cargoxx.lockfile depends on: util, manifest
├── cargoxx.linkdb depends on: util
├── cargoxx.resolver depends on: util, exec, linkdb
├── cargoxx.layout depends on: util
├── cargoxx.codegen depends on: util, manifest, linkdb, layout, lockfile
└── cargoxx.cli depends on: everything above
```
`main.cpp` imports `cargoxx.cli` and dispatches on argv. No business logic in `main.cpp`.
Each `.cppm` declares one module: `export module cargoxx.manifest;` etc. The root `lib.cppm` is `export module cargoxx;` and re-exports submodules selectively.
---
## 3. Core types
Definitions below are normative for the public interface. Implementation details (constructors, helpers) are at the agent's discretion.
```cpp
// in cargoxx.manifest
export module cargoxx.manifest;
import std;
import cargoxx.util;
export namespace cargoxx::manifest {
struct Dependency {
std::string name;
std::string version_spec; // e.g. "10.2", "^1.0", "*"
std::vector<std::string> components; // empty if not a componentized package
};
struct BuildSettings {
bool warnings_as_errors = false;
std::vector<std::string> sanitizers;
};
enum class Edition { Cpp20, Cpp23, Cpp26 };
struct Package {
std::string name;
std::string version;
Edition edition = Edition::Cpp23;
std::vector<std::string> authors;
std::optional<std::string> license;
};
struct Manifest {
Package package;
std::vector<Dependency> dependencies;
BuildSettings build;
};
auto parse(const std::filesystem::path& path) -> util::Result<Manifest>;
auto write(const Manifest& m, const std::filesystem::path& path) -> util::Result<void>;
}
```
```cpp
// in cargoxx.layout
export module cargoxx.layout;
import std;
import cargoxx.util; // for util::Result
export namespace cargoxx::layout {
enum class TargetKind { Library, Binary, Test, Example };
struct Target {
TargetKind kind;
std::string name;
std::filesystem::path entry; // primary source file
std::vector<std::filesystem::path> additional_sources;
std::vector<std::filesystem::path> module_units; // .cppm files
};
struct DiscoveredLayout {
std::optional<Target> library; // exactly 0 or 1
std::vector<Target> binaries; // 0..N
std::vector<Target> tests; // 0..N
std::vector<Target> examples; // 0..N
};
auto discover(const std::filesystem::path& project_root,
const std::string& package_name)
-> util::Result<DiscoveredLayout>;
}
```
```cpp
// in cargoxx.linkdb
export module cargoxx.linkdb;
import std;
import cargoxx.util;
export namespace cargoxx::linkdb {
struct Recipe {
std::string nixpkgs_attr;
std::string find_package; // raw CMake snippet, post-substitution
std::vector<std::string> targets;// post-substitution
std::string source; // 'curated' | 'manual' | etc
};
struct Database {
static auto open() -> util::Result<Database>;
auto resolve(const std::string& package,
const std::string& version,
const std::vector<std::string>& components)
-> util::Result<Recipe>;
auto add_manual(const std::string& package,
const std::string& version_range,
const Recipe& r) -> util::Result<void>;
// private: holds sqlite handle + parsed curated JSON
};
}
```
```cpp
// in cargoxx.codegen
export module cargoxx.codegen;
import std;
import cargoxx.manifest;
import cargoxx.layout;
import cargoxx.linkdb;
import cargoxx.lockfile;
export namespace cargoxx::codegen {
struct GenerateInputs {
const manifest::Manifest& manifest;
const layout::DiscoveredLayout& layout;
const lockfile::Lockfile& lock;
std::vector<linkdb::Recipe> recipes; // one per dependency, same order
std::filesystem::path project_root;
};
auto flake_nix(const GenerateInputs& in) -> std::string;
auto cmake_lists(const GenerateInputs& in) -> std::string;
}
```
The two generator functions are pure: input → string. They do no I/O. The caller writes the result.
---
## 4. Error model — implementation
```cpp
// in cargoxx.util
export namespace cargoxx::util {
enum class ErrorCode {
// Manifest (E0001-E0019)
ManifestNotFound = 1,
ManifestParseError,
ManifestInvalidField,
ManifestUnknownField, // strict-parse mode only
ManifestVersionInvalid,
// Layout (E0020-E0039)
LayoutNoTarget = 20,
LayoutAmbiguousLib,
LayoutInvalidName,
// Resolution (E0040-E0059)
ResolutionUnknownPackage = 40,
ResolutionNetworkError,
ResolutionUnsatisfiable,
ResolutionVersionNotFound,
// Linkdb (E0060-E0079)
LinkdbUnknownPackage = 60,
LinkdbCorrupt,
LinkdbComponentNotSupported,
// Build / exec (E0080-E0099)
ExecCommandFailed = 80,
ExecToolNotFound,
BuildCmakeFailed,
BuildNixFailed,
// Internal (E0100+)
Internal = 100,
NotImplemented,
};
struct Error {
ErrorCode code;
std::string message;
std::string hint;
std::optional<std::filesystem::path> location;
std::optional<std::pair<int, int>> line_col;
};
template <typename T>
class Result {
// std::expected<T, Error> when available; otherwise tl::expected.
// Public surface: has_value(), value(), error().
};
auto format(const Error& e) -> std::string; // produces SPEC.md §12 output
}
```
We do not throw exceptions across module boundaries. `Result<T>` is the only way to propagate failure. `throw` is permitted only inside a single `.cpp` file when the catch site is in the same file.
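As an illustration of this rule, here is a minimal sketch of value-based failure propagation. It is not the cargoxx implementation: `Result` is approximated with `std::variant` rather than `std::expected`, the enum is trimmed to a few entries, and `code_label` and `read_edition` are hypothetical helpers invented for the example (the way an edition is spelled in the manifest is an assumption).

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <variant>

// Trimmed-down stand-ins for the cargoxx.util types (illustration only).
enum class ErrorCode {
    ManifestNotFound = 1,
    ManifestParseError,
    ManifestInvalidField,
    LayoutNoTarget = 20,
};

struct Error {
    ErrorCode code;
    std::string message;
};

// Minimal Result<T>: holds either a value or an Error, never both.
template <typename T>
class Result {
public:
    Result(T value) : state_(std::move(value)) {}
    Result(Error error) : state_(std::move(error)) {}

    bool has_value() const { return std::holds_alternative<T>(state_); }
    const T& value() const { return std::get<T>(state_); }
    const Error& error() const { return std::get<Error>(state_); }

private:
    std::variant<T, Error> state_;
};

// Renders a code as its four-digit label, e.g. ManifestNotFound -> "E0001".
std::string code_label(ErrorCode code) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "E%04d", static_cast<int>(code));
    return buf;
}

// Hypothetical parser step: failure is returned as a value, not thrown.
Result<int> read_edition(const std::string& text) {
    if (text == "20" || text == "23" || text == "26") {
        return std::stoi(text);
    }
    return Error{ErrorCode::ManifestInvalidField, "unknown edition: " + text};
}
```

A caller checks `has_value()` and either uses `value()` or forwards `error()` upward unchanged, which is what keeps exceptions from crossing module boundaries.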
---
## 5. Subprocess discipline
All external commands go through `cargoxx::exec::run`:
```cpp
struct ExecResult {
int exit_code;
std::string stdout_text;
std::string stderr_text;
};
struct ExecOptions {
std::filesystem::path cwd;
std::vector<std::pair<std::string, std::string>> env_overrides;
std::optional<std::chrono::seconds> timeout;
bool inherit_stdio = false; // for `cargoxx run`
};
auto run(const std::string& program,
const std::vector<std::string>& args,
const ExecOptions& opts = {}) -> util::Result<ExecResult>;
```
Backed by `reproc`. Never use `system()`, `popen()`, or shell strings — argv only, no shell expansion. Every external invocation is logged at `debug` level with the full argv and the cwd.
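The logging requirement can be sketched as follows. This is only an illustration of rendering an argv vector for the debug log; `describe_invocation` and the exact text layout are assumptions, and the real code would hand the string to the spdlog wrapper rather than return it.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: renders one invocation for the debug log.
// Each argument is quoted separately, so an argument containing spaces
// is visibly one argv element and is never re-split by a shell.
std::string describe_invocation(const std::string& program,
                                const std::vector<std::string>& args,
                                const std::string& cwd) {
    std::string line = "exec: [";
    line += '"' + program + '"';
    for (const auto& arg : args) {
        line += ", \"" + arg + "\"";
    }
    line += "] cwd=" + cwd;
    return line;
}
```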
---
## 6. Generators — testability
Generators are pure functions over plain value inputs (`GenerateInputs`, §3). Tests assert exact string equality against golden files in `tests/e2e/projects/<name>/expected/`.
To regenerate goldens during development:
```
CARGOXX_TEST_REGENERATE=1 ctest -R codegen
```
The test runner detects the env var, writes new goldens, and reports as a notice (not a pass). CI never sets this var.
Whitespace and trailing newline are part of the contract. Generators emit `\n` line endings unconditionally.
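A golden comparison along these lines might look like the sketch below. `check_golden` and `GoldenOutcome` are hypothetical names, and treating only the value `1` as "regenerate" is an assumption. Files are read and written in binary mode so that line endings and the trailing newline are compared byte for byte.

```cpp
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

enum class GoldenOutcome { Match, Mismatch, Regenerated };

// Reads a whole file as raw bytes (no newline translation).
std::string read_bytes(const std::filesystem::path& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Compares generator output against a golden file, or rewrites the golden
// when CARGOXX_TEST_REGENERATE=1 is set in the environment.
GoldenOutcome check_golden(const std::string& actual,
                           const std::filesystem::path& golden) {
    const char* regen = std::getenv("CARGOXX_TEST_REGENERATE");
    if (regen != nullptr && std::string(regen) == "1") {
        std::ofstream out(golden, std::ios::binary | std::ios::trunc);
        out << actual;
        return GoldenOutcome::Regenerated;  // reported as a notice, not a pass
    }
    if (!std::filesystem::exists(golden)) {
        return GoldenOutcome::Mismatch;  // a missing golden never passes
    }
    return read_bytes(golden) == actual ? GoldenOutcome::Match
                                        : GoldenOutcome::Mismatch;
}
```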
---
## 7. Manifest parser — edge cases
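- Comments: `cargoxx add` MUST NOT strip the user's comments. Planned implementation: parse to `toml::table`, mutate, serialize. This only meets the requirement if the serializer keeps comments; to our knowledge `toml++` does not retain comments on round-trip, so verify this before relying on it.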
- Unknown top-level keys: warn but accept. Forward-compat (see SPEC.md §4 reserved fields).
- Unknown keys inside `[package]`, `[build]`: error.
- Dependency value is neither string nor table: error E0003.
- Empty `[dependencies]`: valid.
- `name` containing characters outside `[a-zA-Z0-9_-]`: error E0022.
- `name` starting with digit: error.
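The two `name` rules above can be sketched as a single predicate. `is_valid_package_name` is a hypothetical helper, and rejecting the empty string is an assumption the list does not state explicitly.

```cpp
#include <string_view>

// Returns true when `name` uses only [a-zA-Z0-9_-] and does not start with a digit.
bool is_valid_package_name(std::string_view name) {
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    auto is_alpha = [](char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    };
    if (name.empty()) return false;            // assumption: empty names are invalid
    if (is_digit(name.front())) return false;  // must not start with a digit
    for (char c : name) {
        if (!(is_alpha(c) || is_digit(c) || c == '_' || c == '-')) {
            return false;                      // character outside the allowed set
        }
    }
    return true;
}
```

Explicit range checks are used instead of `std::isalnum` so that the result does not depend on the process locale.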
---
## 8. Layout discovery — algorithm
```
discover(project_root, package_name):
let lib = project_root / "src" / "lib.cppm"
let main = project_root / "src" / "main.cpp"
let bin_dir = project_root / "src" / "bin"
let tests_dir = project_root / "tests"
let examples_dir = project_root / "examples"
# Collect library sources if lib.cppm exists
library = None
if exists(lib):
all_cppm = [lib]
all_cpp = []
for entry in walk(project_root / "src"):
if entry == lib: continue
if entry.parent == bin_dir: continue
if entry == main: continue
if entry.ext == ".cppm": all_cppm.push(entry)
elif entry.ext == ".cpp": all_cpp.push(entry)
library = Target {
kind: Library,
name: package_name,
entry: lib,
module_units: all_cppm,
additional_sources: all_cpp,
}
binaries = []
if exists(main):
binaries.push(Target {
kind: Binary,
name: package_name,
entry: main,
})
if exists(bin_dir):
for f in list_dir(bin_dir):
if f.ext == ".cpp":
binaries.push(Target {
kind: Binary,
name: f.stem,
entry: f,
})
tests = [Target { kind: Test, name: f.stem, entry: f }
for f in list_dir(tests_dir) if f.ext == ".cpp"]
examples = [Target { kind: Example, name: f.stem, entry: f }
for f in list_dir(examples_dir) if f.ext == ".cpp"]
if library is None and binaries.empty():
return Err(LayoutNoTarget)
return Ok(DiscoveredLayout { library, binaries, tests, examples })
```
`walk` is non-recursive into `bin/`, `tests/`, `examples/` — those are flat folders. It IS recursive into other subdirectories of `src/` (e.g. `src/internal/foo.cppm` is part of the library).
---
## 9. CMake generator — algorithm
Pseudocode:
```
cmake_lists(in):
out = StringBuilder()
out += header(in.manifest) # cmake_minimum_required, project, CXX flags
# find_package per dependency, in manifest order
for dep, recipe in zip(in.manifest.dependencies, in.recipes):
out += emit_find_package(dep, recipe)
# Library target if discovered
if in.layout.library:
out += emit_library(in.layout.library, in.recipes)
# Primary binary (src/main.cpp) — links library if present
primary_bin = first(in.layout.binaries, .entry endsWith "src/main.cpp")
if primary_bin:
out += emit_primary_binary(in.layout, primary_bin, in.recipes)
# Additional binaries from src/bin/
for b in in.layout.binaries where b is not primary_bin:
out += emit_extra_binary(b, in.layout, in.recipes)
# Tests
if in.layout.tests:
out += "enable_testing()\n"
for t in in.layout.tests:
out += emit_test(t, in.layout, in.recipes)
# Examples
for e in in.layout.examples:
out += emit_example(e, in.layout, in.recipes)
# Build flags
out += emit_build_flags(in.manifest.build, all_target_names(in.layout))
return out.str()
```
Each `emit_*` returns a string with a trailing blank line. The output is deterministic given identical inputs — no timestamps, no nondeterministic ordering, no machine-dependent paths.
### find_package emission
For a recipe with no components:
```
find_package(<<recipe.find_package>>)
```
For a recipe with components and the dep specifies them:
```
find_package(<<find_package with {{components}} replaced by COMPONENTS list>>)
```
`{{components}}` expands to a space-separated list. `{{component}}` inside `targets` expands to one entry per requested component. Example for boost with `["filesystem", "system"]`:
```
find_package(Boost REQUIRED COMPONENTS filesystem system)
```
And targets become `Boost::filesystem` and `Boost::system`.
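The two placeholder expansions can be sketched as plain string replacement. The template strings used here (`Boost REQUIRED COMPONENTS {{components}}` and `Boost::{{component}}`) are assumed shapes for a curated entry, not copied from `data/linkdb.json`, and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces every occurrence of `marker` in `text` with `value`.
std::string replace_all(std::string text, const std::string& marker,
                        const std::string& value) {
    std::size_t pos = 0;
    while ((pos = text.find(marker, pos)) != std::string::npos) {
        text.replace(pos, marker.size(), value);
        pos += value.size();  // continue after the inserted text
    }
    return text;
}

// {{components}} expands to a space-separated list inside find_package(...).
std::string expand_find_package(const std::string& tmpl,
                                const std::vector<std::string>& components) {
    std::string joined;
    for (const auto& c : components) {
        if (!joined.empty()) joined += ' ';
        joined += c;
    }
    return "find_package(" + replace_all(tmpl, "{{components}}", joined) + ")";
}

// {{component}} expands to one target per requested component.
std::vector<std::string> expand_targets(const std::string& tmpl,
                                        const std::vector<std::string>& components) {
    std::vector<std::string> out;
    for (const auto& c : components) {
        out.push_back(replace_all(tmpl, "{{component}}", c));
    }
    return out;
}
```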
---
## 10. flake.nix generator — algorithm
```
flake_nix(in):
nixpkgs_rev = in.lock.nixpkgs_rev # all deps share one rev
deps_attrs = [recipe.nixpkgs_attr for recipe in in.recipes]
deduped = stable_dedup(deps_attrs)
return template_substitute(FLAKE_TEMPLATE, {
description: in.manifest.package.name,
nixpkgs_rev: nixpkgs_rev,
dep_attrs: deduped,
})
```
`FLAKE_TEMPLATE` is a string constant. Substitution is plain text replacement of `<<...>>` markers, not a Nix-aware transform.
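A minimal sketch of the two helpers named in the pseudocode is shown below. The signatures are assumptions, and the sketch assumes substituted values never contain `<<...>>` markers themselves.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Removes duplicates while keeping the first occurrence of each item in place.
std::vector<std::string> stable_dedup(const std::vector<std::string>& items) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> out;
    for (const auto& item : items) {
        if (seen.insert(item).second) out.push_back(item);
    }
    return out;
}

// Plain text replacement of <<key>> markers; no knowledge of Nix syntax.
std::string template_substitute(std::string text,
                                const std::map<std::string, std::string>& values) {
    for (const auto& [key, value] : values) {
        const std::string marker = "<<" + key + ">>";
        std::size_t pos = 0;
        while ((pos = text.find(marker, pos)) != std::string::npos) {
            text.replace(pos, marker.size(), value);
            pos += value.size();
        }
    }
    return text;
}
```

Keeping first-occurrence order in `stable_dedup` means the generated flake lists packages in manifest order, so adding an unrelated dependency does not reshuffle existing lines.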
---
## 11. Version resolution — implementation
```
class Resolver {
auto resolve(deps: vector<Dependency>) -> Result<ResolutionPlan>:
# 1. For each dep, query candidate versions + revisions from nixhub
candidates: map<string, vector<(version, rev)>>
for dep in deps:
candidates[dep.name] = query(dep)
# 2. Aggregate revisions
all_revs = union of revs across candidates values
sorted_revs = sort_descending_by_commit_date(all_revs)
# 3. Try each rev (newest first), find one where every dep has a matching version
for rev in sorted_revs[:50]: # cap
plan = []
for dep in deps:
m = candidates[dep.name].filter(_.rev == rev)
.filter(satisfies(_.version, dep.version_spec))
.max_by(.version)
if m is None: break
plan.push((dep.name, m.version, rev))
if plan.size == deps.size:
return Ok(ResolutionPlan { rev, entries: plan })
return Err(ResolutionUnsatisfiable)
};
```
Network calls are wrapped in a 10-second timeout each. nixhub.io failures fall through to lazamar; both failing falls through to `nixpkgs_git`. The local nixpkgs git clone is created lazily on first use.
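Step 3 of the pseudocode, choosing one nixpkgs revision that satisfies every dependency, might be written as below. The struct names, `pick_rev`, and the callback parameters are hypothetical; range matching and version ordering are passed in as callbacks because their real implementation (`semver.cpp`) is outside this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct Candidate { std::string version; std::string rev; };
struct Dep { std::string name; std::string version_spec; };
struct PlanEntry { std::string name; std::string version; };
struct Plan { std::string rev; std::vector<PlanEntry> entries; };

using Satisfies = std::function<bool(const std::string& version, const std::string& spec)>;
using VersionLess = std::function<bool(const std::string& a, const std::string& b)>;

// Tries revisions newest first and returns the first one where every dep
// has at least one matching version; picks the highest match per dep.
std::optional<Plan> pick_rev(const std::vector<Dep>& deps,
                             const std::map<std::string, std::vector<Candidate>>& candidates,
                             const std::vector<std::string>& revs_newest_first,
                             const Satisfies& satisfies,
                             const VersionLess& version_less,
                             std::size_t cap = 50) {
    const std::size_t limit = std::min(cap, revs_newest_first.size());
    for (std::size_t i = 0; i < limit; ++i) {
        const std::string& rev = revs_newest_first[i];
        Plan plan{rev, {}};
        for (const Dep& dep : deps) {
            const Candidate* best = nullptr;
            const auto it = candidates.find(dep.name);
            if (it != candidates.end()) {
                for (const Candidate& c : it->second) {
                    if (c.rev != rev || !satisfies(c.version, dep.version_spec)) continue;
                    if (best == nullptr || version_less(best->version, c.version)) best = &c;
                }
            }
            if (best == nullptr) break;  // this rev cannot satisfy dep; try the next rev
            plan.entries.push_back({dep.name, best->version});
        }
        if (plan.entries.size() == deps.size()) return plan;
    }
    return std::nullopt;  // maps to ResolutionUnsatisfiable
}
```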
---
## 12. Lockfile semantics
The lockfile is rewritten in full on every successful `add` / `remove` / `build`. We do not attempt incremental edits. This simplifies the writer and avoids drift between manifest and lock.
`Cargoxx.lock` is read at the start of `build`:
- If absent → run resolution, then write.
- If present → check that every manifest dep has a satisfying entry. If yes → use as-is. If no → run resolution.
`cargoxx update` (deferred to v0.2) will force re-resolution.
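The read-time decision can be sketched as a small pure function. The `Lock` shape here (one revision plus a name-to-version map) is a hypothetical simplification, since this document does not define the lockfile format, and `satisfies` stands in for the real version-range check.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct Dep { std::string name; std::string version_spec; };

// Hypothetical, simplified view of Cargoxx.lock.
struct Lock {
    std::string nixpkgs_rev;
    std::map<std::string, std::string> versions;  // package name -> locked version
};

enum class LockAction { Resolve, UseAsIs };

using Satisfies = std::function<bool(const std::string& version, const std::string& spec)>;

// Absent lock, or any manifest dep without a satisfying entry: re-resolve.
LockAction decide(const std::optional<Lock>& lock,
                  const std::vector<Dep>& deps,
                  const Satisfies& satisfies) {
    if (!lock) return LockAction::Resolve;
    for (const Dep& dep : deps) {
        const auto it = lock->versions.find(dep.name);
        if (it == lock->versions.end() || !satisfies(it->second, dep.version_spec)) {
            return LockAction::Resolve;
        }
    }
    return LockAction::UseAsIs;
}
```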
---
## 13. Testing strategy
Three layers.
### Unit tests
One `tests/<feature>.cpp` per module. Test pure functions: parser, semver, codegen helpers. Catch2 with `TEST_CASE`. No I/O outside of `tmp_path`.
### Golden-file tests
Each subdirectory in `tests/e2e/projects/` contains:
```
<project>/
├── input/
│ ├── Cargoxx.toml
│ └── src/...
├── expected/
│ ├── flake.nix
│ ├── CMakeLists.txt
│ └── Cargoxx.lock
└── meta.toml # describes the test (e.g. "fixed_rev = abc123" to make output deterministic)
```
The runner copies `input/` into a temp dir, runs `cargoxx build --no-build` with a stubbed resolver (returning `meta.toml`'s pinned rev), and diffs every generated file against `expected/`.
### End-to-end build tests
A small set of projects that actually compile via Nix. Marked `slow`, skipped on dev machines without `nix` in PATH. Run in CI with cached `/nix/store`.
### Curated DB verification
`scripts/verify-curated-db.sh` constructs a tiny project per package, runs `cargoxx build`, and verifies the binary links. Run on every PR that touches `data/linkdb.json`. This is how we catch upstream Nixpkgs changes that rename attrs.
---
## 14. Logging
`spdlog` is initialized at `info` level by default. `--verbose` raises verbosity to `debug`; `--quiet` lowers it to `warn`.
Format:
```
[<level>] <component>: <message>
```
Components are module names (`manifest`, `resolver`, `codegen`, …). Every external command (subprocess, HTTP) is logged at `debug` with the full argv / URL.
User-facing errors are formatted via `util::format(Error)` and printed to stderr without log decoration. Diagnostic logs go through spdlog.
---
## 15. Bootstrap and self-hosting
Three phases.
**Phase 0 — hand-written CMake (commits before milestone M3).**
`CMakeLists.txt` and `flake.nix` at the repo root are written by humans. `cmake` builds `cargoxx`.
**Phase 1 — generated CMake, hand-written flake.**
At milestone M3 (codegen complete), commit a populated `Cargoxx.toml`. Generate `build/CMakeLists.txt` from it. Delete the root `CMakeLists.txt`. The root `flake.nix` stays hand-written because cargoxx doesn't know about its own host-language deps yet.
**Phase 2 — fully self-hosted.**
At milestone M5, vendor all third-party headers into `third_party/` and have cargoxx generate the flake too. The bootstrap path becomes: pre-built cargoxx binary → run `cargoxx build` → produce next cargoxx.
For continuity, ship a known-good cargoxx binary as a release artifact. Anyone bootstrapping from source clones the repo, downloads the latest release binary, and runs `./bootstrap-cargoxx build`. If we ever want to bootstrap from absolute zero, `scripts/bootstrap-build.sh` does it with a hand-written CMake invocation.
---
## 16. Milestones
Each milestone is one mergeable PR series. No milestone is "done" until its tests pass in CI.
**M0 — repo skeleton.** Empty modules, CMakeLists.txt that builds an empty `cargoxx` binary, flake.nix with toolchain. CI green.
**M1 — manifest + layout.** `manifest::parse`, `manifest::write`, `layout::discover`. Unit tests. `cargoxx new` works (writes `Cargoxx.toml` and source skeleton, no codegen yet).
**M2 — linkdb + curated.** `linkdb.json` with all 25 packages. `linkdb::Database::resolve` works for curated entries. SQLite overlay schema created on first run.
**M3 — codegen.** `codegen::flake_nix` and `codegen::cmake_lists`. Golden tests for 4-6 representative projects. `cargoxx build --no-build` produces correct files.
**M4 — exec + build.** `exec::run`. `cargoxx build` invokes nix and cmake end-to-end. `cargoxx run`, `cargoxx test`, `cargoxx clean`.
**M5 — resolver + add/remove.** `resolver::Resolver` against nixhub.io. `cargoxx add fmt` works. Lockfile updates correctly.
**M6 — polish.** Error message overhaul to match SPEC.md §12. `--verbose` / `--quiet`. Self-hosting (Phase 1).
Post-v0.1 (out of this spec): automatic linkdb resolution, workspaces, `cargoxx publish`, Windows.
---
## 17. Coding conventions
- C++23 modules, no headers in `src/` (third_party/ excepted).
- Names: `snake_case` for functions and variables, `PascalCase` for types and enum values (`ErrorCode::ManifestNotFound`), `SCREAMING_SNAKE_CASE` for constants.
- One module per directory. `foo/foo.cppm` exports `cargoxx.foo`.
- No raw `new`/`delete`. Smart pointers or value types.
- `std::filesystem::path` for paths everywhere. Strings only at the very edge (CLI parsing, JSON).
- No global mutable state. Configuration is passed explicitly.
- Format with `clang-format` using the config in `.clang-format` (LLVM style, 100-column).
- Lints: `-Wall -Wextra -Wpedantic -Wconversion`. CI fails on warnings.
---
## 18. Performance budget
cargoxx is interactive. Targets:
| Operation | Budget |
| --- | --- |
| `cargoxx new` | < 100 ms |
| `cargoxx add fmt` (cached resolution) | < 200 ms |
| `cargoxx add fmt` (network resolution) | < 5 s |
| `cargoxx build` (no codegen change) | < 50 ms before invoking cmake |
| Codegen for a 50-source project | < 100 ms |
Profile if budgets are exceeded. SQLite I/O is the most likely culprit: open the connection once per process and reuse prepared statements.
---
## 19. Open questions for the implementation phase
These are decisions that should be made by the implementer with the user's input. Don't silently pick one — surface the choice when you reach the relevant milestone.
1. Should `cargoxx run` preserve the user's PATH or only inject Nix's? (Recommend: only Nix's, for reproducibility.)
2. Should generated `flake.nix` pin `flake-utils` or inline the function? (Recommend: pin, smaller diffs on regeneration.)
3. When the layout has both `lib.cppm` and `main.cpp`, should the binary always link the library even if it doesn't `import` it? (Recommend: yes, it's harmless and matches Cargo.)
4. Should `Cargoxx.lock` include a hash of `Cargoxx.toml` to detect tampering? (Recommend: no for v0.1, complicates the format.)
5. macOS uses `clang_18` from Nix; Linux too. Is there any reason to prefer GCC on Linux? (Recommend: no, Clang has the most mature module support.)