CLI End-to-End Manual Test
The CLI smoke test below exercises the full BranchBox workflow on a disposable repository. Run it whenever you want to validate that branchbox init, devcontainer bootstrapping, feature lifecycle commands, and cleanup all behave as expected (e.g., before tagging a release).
Prerequisites
- Docker Engine + docker compose
- Rust toolchain (so cargo build -p branchbox-cli succeeds)
- jq (used by the automation script to read devcontainer.json)
- Host machine can run privileged containers (the generated devcontainer enables Docker-in-Docker)
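A quick preflight can catch a missing tool before the longer steps start. The sketch below only probes `PATH` for the tools named in the prerequisites; the `missing_tools` helper is a hypothetical name, and the check does not verify that the host can actually run privileged containers.

```shell
#!/bin/sh
# Print each named tool that is not on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the prerequisites list.
missing=$(missing_tools docker cargo jq)

if [ -n "$missing" ]; then
    # Unquoted on purpose: one output line per missing tool.
    printf 'missing prerequisite: %s\n' $missing >&2
else
    echo "all prerequisite tools found on PATH"
fi

# `docker compose` is a plugin, so probe it separately once docker exists.
if command -v docker >/dev/null 2>&1 && ! docker compose version >/dev/null 2>&1; then
    echo "missing prerequisite: docker compose plugin" >&2
fi
```

The script reports problems without aborting, so it is safe to run on a machine that is only partly set up.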
Set BRANCHBOX_SKIP_HOST_VALIDATION=1 while running these steps so the workflow skips host safety checks. The script described later does this automatically.
Manual Flow
- Seed a disposable repo
  - Create a fresh git repo under /tmp/branchbox-cli-e2e/seed-app.
  - Drop in a tiny Rust project (one Cargo.toml, one src/main.rs), git add, and git commit.
  - Export BRANCHBOX_PROJECTS_DIR to a second temp directory so reorganization stays isolated.
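The seeding step can be sketched as a short script. The crate name, commit message, git identity, and the `E2E_ROOT` and `SEED_REPO` variable names are illustrative assumptions; only the repo path and `BRANCHBOX_PROJECTS_DIR` come from the step itself.

```shell
#!/bin/sh
set -e

# Root path comes from the seeding step; override E2E_ROOT to seed elsewhere.
E2E_ROOT="${E2E_ROOT:-/tmp/branchbox-cli-e2e}"
SEED_REPO="$E2E_ROOT/seed-app"

# Start from a clean slate so reruns do not pick up stale state.
rm -rf "${SEED_REPO:?}"
mkdir -p "$SEED_REPO/src"
cd "$SEED_REPO"

cat > Cargo.toml <<'EOF'
[package]
name = "seed-app"
version = "0.1.0"
edition = "2021"
EOF

cat > src/main.rs <<'EOF'
fn main() {
    println!("hello from seed-app");
}
EOF

git init -q
git add Cargo.toml src/main.rs
# Inline identity so the commit works on hosts without a global git config.
git -c user.name="BranchBox E2E" -c user.email="e2e@example.invalid" \
    commit -q -m "Seed disposable Rust project"

# Second temp directory that keeps reorganized worktrees isolated.
BRANCHBOX_PROJECTS_DIR="$(mktemp -d "${TMPDIR:-/tmp}/branchbox-projects.XXXXXX")"
export BRANCHBOX_PROJECTS_DIR
echo "seed repo: $SEED_REPO"
echo "projects dir: $BRANCHBOX_PROJECTS_DIR"
```

Run it in the shell you will use for the next step (or source it) so the exported `BRANCHBOX_PROJECTS_DIR` is still set when you call branchbox init.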
- Initialize BranchBox
  - From inside the repo run:
    BRANCHBOX_SKIP_HOST_VALIDATION=1 \
    BRANCHBOX_PROJECTS_DIR="$BRANCHBOX_PROJECTS_DIR" \
    branchbox init --stack rust --reorganize --use-parent-structure -y
  - Expect a main/ worktree to appear under the projects directory, .devcontainer/ to be generated, .env.sample to be stamped, and .branchbox/registry.json to exist.
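The expected artifacts can be checked mechanically rather than by eye. This sketch assumes a layout in which main/ and .branchbox/ sit side by side under one directory, with .devcontainer/ and .env.sample inside main/. That layout is an assumption, not something the checklist states, so adjust the paths to match what branchbox init actually produces. `check_init_layout` is a hypothetical helper, demonstrated here against a fake tree.

```shell
#!/bin/sh
# Print every expected path that is missing; prints nothing when all exist.
# $1 = directory that holds the main/ worktree (assumed layout).
check_init_layout() {
    root=$1
    for path in \
        "main" \
        "main/.devcontainer" \
        "main/.env.sample" \
        ".branchbox/registry.json"
    do
        [ -e "$root/$path" ] || printf 'missing: %s\n' "$root/$path"
    done
}

# Self-contained demonstration against a fake layout in a temp directory.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/main/.devcontainer" "$demo_root/.branchbox"
: > "$demo_root/main/.env.sample"
check_init_layout "$demo_root"      # reports only the missing registry.json
: > "$demo_root/.branchbox/registry.json"
check_init_layout "$demo_root"      # prints nothing now
```

Against a real run you would pass the directory that received the reorganized project instead of `demo_root`.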
- Bring up the main devcontainer
  - Ensure main/.env exists (copy from .env.sample if needed) so docker compose can load the env file list.
  - Run docker compose -f main/.devcontainer/compose.yaml up -d --build (supply --project-directory main/.devcontainer if you prefer explicit context).
  - docker compose exec rust-dev git --version should succeed, confirming the container has git and the repo bind mount.
  - Tear down with docker compose ... down -v --remove-orphans.
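The bring-up, check, and teardown sequence can be wrapped so that teardown always runs, even when the git check fails. The sketch below prints the commands by default and executes them only when `RUN=1` is set. The `run` and `main_devcontainer_smoke` helpers and the `RUN` switch are illustrative, not part of BranchBox. It assumes the current directory is the one that contains main/.

```shell
#!/bin/sh
# Print commands by default; set RUN=1 to execute them (needs Docker).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

compose_yaml="main/.devcontainer/compose.yaml"

main_devcontainer_smoke() {
    # docker compose needs main/.env to resolve its env file list.
    if [ ! -f main/.env ] && [ -f main/.env.sample ]; then
        run cp main/.env.sample main/.env
    fi
    # -T disables TTY allocation so the check also works from scripts.
    run docker compose -f "$compose_yaml" up -d --build \
        && run docker compose -f "$compose_yaml" exec -T rust-dev git --version
    status=$?
    # Always tear down, then report the outcome of the up/exec pair.
    run docker compose -f "$compose_yaml" down -v --remove-orphans
    return $status
}

main_devcontainer_smoke
```

Running it without `RUN=1` is a cheap way to review the exact commands before letting them touch Docker.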
- Start a feature worktree
  - From the container directory run branchbox feature start cli-e2e-smoke.
  - Expect:
    - New worktree directory <container>/cli-e2e-smoke/ with a .git file pointing to the shared gitdir.
    - Git branch feature/cli-e2e-smoke.
    - .devcontainer/ copied to the feature, .env duplicated with feature-specific APP_URL/COMPOSE_PROJECT_NAME.
    - Specs module creates/updates docs/features/in-progress/cli-e2e-smoke.md.
  - Build the feature devcontainer via docker compose -f <feature>/.devcontainer/compose.yaml up -d --build and verify git --version inside the container.
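The first two expectations (a .git file rather than a .git directory, plus the feature branch) are easy to verify with plain git. In the sketch below, `git worktree add` stands in for branchbox feature start purely so the example can run anywhere. `check_feature_worktree` is a hypothetical helper, and it does not cover the .env or spec expectations.

```shell
#!/bin/sh
# Succeeds when $1 is a linked worktree (its .git is a file holding a
# "gitdir:" pointer) and branch $2 exists in that repository.
check_feature_worktree() {
    dir=$1
    branch=$2
    [ -f "$dir/.git" ] || { echo "no .git file in $dir"; return 1; }
    grep -q '^gitdir: ' "$dir/.git" \
        || { echo "no gitdir pointer in $dir/.git"; return 1; }
    git -C "$dir" rev-parse --verify --quiet "refs/heads/$branch" >/dev/null \
        || { echo "branch $branch not found"; return 1; }
}

# Demonstration with plain git standing in for BranchBox.
container=$(mktemp -d)
git init -q "$container/main"
git -C "$container/main" \
    -c user.name="BranchBox E2E" -c user.email="e2e@example.invalid" \
    commit -q --allow-empty -m "init"
git -C "$container/main" worktree add -b feature/cli-e2e-smoke \
    "$container/cli-e2e-smoke" >/dev/null 2>&1

check_feature_worktree "$container/cli-e2e-smoke" feature/cli-e2e-smoke \
    && echo "feature worktree looks sane"
```

The main worktree fails this check by design, because its .git is a directory.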
- Teardown and verify cleanup
  - Run branchbox feature teardown cli-e2e-smoke --delete-branch --complete-spec.
  - Confirm the feature directory is gone, git branch --list feature/cli-e2e-smoke returns empty, the devcontainer directory vanished with the worktree, and the spec moved from docs/features/in-progress/ to docs/features/completed/.
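The cleanup checks translate directly into a few file and branch tests. This sketch assumes the docs/features/ tree lives in the main worktree, which the checklist does not state, so treat the spec paths as placeholders. `check_teardown` is a hypothetical helper, demonstrated against a fake post-teardown state.

```shell
#!/bin/sh
# Print one line per cleanup expectation that does not hold.
# $1 = main worktree, $2 = feature directory, $3 = feature name.
check_teardown() {
    main_dir=$1
    feature_dir=$2
    name=$3
    [ ! -e "$feature_dir" ] \
        || echo "feature directory still exists: $feature_dir"
    [ -z "$(git -C "$main_dir" branch --list "feature/$name")" ] \
        || echo "stale branch: feature/$name"
    [ ! -e "$main_dir/docs/features/in-progress/$name.md" ] \
        || echo "spec still in in-progress/"
    [ -f "$main_dir/docs/features/completed/$name.md" ] \
        || echo "spec missing from completed/"
}

# Demonstration against a fake post-teardown state.
demo=$(mktemp -d)
git init -q "$demo/main"
mkdir -p "$demo/main/docs/features/completed"
: > "$demo/main/docs/features/completed/cli-e2e-smoke.md"
check_teardown "$demo/main" "$demo/cli-e2e-smoke" cli-e2e-smoke   # prints nothing
```

Because the feature's devcontainer directory lives inside the worktree, the first check also covers it.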
Document every discrepancy (missing main/, failed container launch, stale branches, etc.) before releasing.
Automation Script
The repository ships scripts/manual-cli-e2e.sh, which runs the entire flow above:
- Builds branchbox if needed.
- Seeds a throwaway git repo under $(mktemp) and forces branchbox init to reorganize into a sibling temp directory.
- Brings main + feature devcontainers up via docker compose, confirming git works inside both containers.
- Starts a feature, validates registry/git state, then tears it down with --delete-branch --complete-spec.
- Ensures .devcontainer/.branchbox.env exists in both the main worktree and its feature copy so per-worktree overrides stay intact.
- Injects JSONC comments into devcontainer.json to confirm BranchBox accepts commented configs before syncing.
- Exercises branchbox devcontainer sync --dry-run with copy and symlink strategies so downstream tooling can rely on the command.
- Seeds a backlog spec under docs/features/backlog/ and verifies the specs module promotes it to in-progress/ on start and completed/ on teardown via FEATURES_DIR.
- Captures branchbox feature list --json (while active) and --json --all (after teardown) to ensure the richer registry metadata matches reality.
- Records every failed expectation and exits non-zero with a summary of bugs.
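The last item, recording failures and summarizing them at the end instead of stopping at the first one, follows a common shell pattern. The sketch below illustrates that pattern only. It is not taken from scripts/manual-cli-e2e.sh, and the `expect` and `report` helpers are hypothetical names.

```shell
#!/bin/sh
FAILURE_COUNT=0
FAILURE_LOG=""

# expect "description" command [args...]
# Runs the command; on failure records the description instead of aborting.
expect() {
    description=$1
    shift
    if ! "$@" >/dev/null 2>&1; then
        FAILURE_COUNT=$((FAILURE_COUNT + 1))
        FAILURE_LOG="${FAILURE_LOG}  - ${description}
"
    fi
}

# Print the summary and return non-zero when anything failed.
report() {
    if [ "$FAILURE_COUNT" -eq 0 ]; then
        echo "all expectations passed"
        return 0
    fi
    printf '%s failed expectation(s):\n%s' "$FAILURE_COUNT" "$FAILURE_LOG"
    return 1
}

# Stand-in checks; a real harness would probe BranchBox, git, and Docker state.
expect "temp directory exists" test -d "${TMPDIR:-/tmp}"
expect "deliberately failing stand-in check" test ! -e /
report || echo "a harness would exit non-zero here"
```

Collecting failures this way means one run surfaces every broken stage, which suits a release checklist better than aborting on the first error.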
Usage:
# Regular run (default)
./scripts/manual-cli-e2e.sh
# Verbose tracing + extra BranchBox logs
./scripts/manual-cli-e2e.sh --mode verbose
# Pretend/dry-run (log steps, skip BranchBox + Docker)
./scripts/manual-cli-e2e.sh --mode pretend
# Spin up the HTTP drain stub and verify acks
./scripts/manual-agent-e2e.sh --cp-stub
--mode verbose enables shell tracing and passes verbose flags to BranchBox commands so you can watch every git/module operation. --mode pretend is a safe dry-run that logs each action without invoking BranchBox or Docker while still performing lightweight repo scaffolding under /tmp. Combine any mode with KEEP_E2E_TMP=1 to preserve the temporary workspace for manual inspection.
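The `KEEP_E2E_TMP=1` behavior can be pictured as an exit trap that skips deletion when the variable is set. This is a sketch of the general pattern, not the script's actual code, and the workspace name template and the `cleanup_workspace` helper are assumptions.

```shell
#!/bin/sh
# Temporary workspace for a run; the name pattern is illustrative.
E2E_TMP=$(mktemp -d "${TMPDIR:-/tmp}/branchbox-e2e.XXXXXX")

# Remove the workspace unless KEEP_E2E_TMP=1 asks to preserve it.
cleanup_workspace() {
    if [ "${KEEP_E2E_TMP:-0}" = "1" ]; then
        echo "keeping workspace for inspection: $E2E_TMP"
    else
        rm -rf "${E2E_TMP:?}"
    fi
}

# Run the cleanup whenever the script exits, on success or failure.
trap cleanup_workspace EXIT

echo "workspace: $E2E_TMP"
```

With the trap in place, a failed run still cleans up after itself unless you explicitly ask to keep the workspace.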
--cp-stub starts a disposable Python HTTP server inside the devcontainer, points the agent’s BRANCHBOX_CP_ENDPOINT at it, and prints both the stub log and the control_plane_status.last_ack_event_id cursor once the CLI harness finishes. Use this whenever you want to see the durable-ack logic in action or reproduce control-plane failures locally.
Need a quick health check without rerunning the harness? Use branchbox agent status --json. It reports whether the drain is configured/connected and when the last delivery or failure occurred, so you can diagnose token/endpoint issues.
Run the script locally before publishing releases (or wire it into CI once Docker is available). When it fails, use the manual checklist above to dig into the exact stage and file detailed bug reports.
Mac App ↔ Agent Loop
Milestone 2 adds a minimal SwiftUI client under macos/ so we can validate end-to-end agent orchestration on macOS. Run this loop in addition to the CLI harness whenever you touch the agent, control-plane drain, or desktop app code:
- Start the agent locally
  - From the repo root run cargo run -p branchbox-agent (or scripts/manual-agent-e2e.sh to reuse the smoke harness). Set BRANCHBOX_AGENT_DIR so the daemon stores its SQLite queue outside your real workspace if you want a clean slate.
  - Optional: point the HTTP drain at a staging endpoint with BRANCHBOX_CP_ENDPOINT=https://example.test/hooks/devices BRANCHBOX_CP_TOKEN=fake-token so you can confirm batches land outside stdout.
- Configure workspace + gRPC address for the mac app
  - On macOS set the expected workspace path via defaults write dev.branchbox.app workspace "$(pwd)".
  - Override the transport as needed with export BRANCHBOX_AGENT_GRPC_ADDR=127.0.0.1:50515 or by editing ~/Library/Preferences/dev.branchbox.app.plist.
- Run the SwiftUI preview
  - From a mac host run cd macos && swift run BranchBoxApp. The window should list all features detected by FeatureService/List. Rows tagged “CLI” indicate the fallback path kicked in because the gRPC transport was unavailable.
  - Linux devcontainers cannot build the SwiftUI target because the Apple SDKs that provide OSLog, SwiftUI, and friends only ship with macOS/Xcode, so this step must execute on a macOS machine or CI runner.
- Start + teardown from the UI
  - Use the “Start feature” form to launch a new worktree (toggle minimal mode + prompt seed as needed). Confirm the action flows through gRPC (watch the agent logs) and that the entry appears with the right status.
  - Select “Teardown” on the new feature. Verify the worktree disappears, the specs module runs, and the UI updates.
- Confirm control-plane delivery
  - Tail the agent logs to ensure the HTTP drain batches the start + teardown events (look for control plane lines and host metadata). When pointing at a stub endpoint you should see HTTP 200s; otherwise the agent logs that it fell back to local logging.
Document any divergence (UI not updating, CLI fallback misfiring, HTTP drain errors) in the Milestone 2 tracking issue before marking a PR ready for review.
Release-blocking matrix
Every release candidate must pass the harness in all modes and stacks listed below. This matrix mirrors the requirements in AGENTS.md and RELEASING.md. Document the results in your release notes so reviewers know the workflow was exercised end-to-end.
./scripts/manual-cli-e2e.sh
./scripts/manual-cli-e2e.sh --mode verbose
./scripts/manual-cli-e2e.sh --mode pretend
STACK=generic ./scripts/manual-cli-e2e.sh
STACK=rails ./scripts/manual-cli-e2e.sh
STACK=node ./scripts/manual-cli-e2e.sh
If you touch a different adapter or stack, repeat with STACK=<stack> for that target as well. Use KEEP_E2E_TMP=1 when you need to preserve the temporary workspace for debugging and summarize any deviations in the release PR before attempting cargo release.
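To avoid skipping an entry, the matrix can be driven from a small loop. The sketch below prints each invocation by default and runs it only with `RUN=1`. The `run_entry` and `release_matrix` helpers and the `RUN` switch are illustrative, and the script is assumed to be started from the repository root.

```shell
#!/bin/sh
HARNESS=./scripts/manual-cli-e2e.sh

# run_entry STACK [harness args...]; pass "" as STACK to use the default stack.
# Prints the command by default; set RUN=1 to execute it (needs Docker).
run_entry() {
    stack=$1
    shift
    if [ "${RUN:-0}" = "1" ]; then
        if [ -n "$stack" ]; then
            STACK=$stack "$HARNESS" "$@"
        else
            "$HARNESS" "$@"
        fi
        return $?
    fi
    line=$HARNESS
    if [ $# -gt 0 ]; then line="$line $*"; fi
    if [ -n "$stack" ]; then line="STACK=$stack $line"; fi
    printf '+ %s\n' "$line"
}

# Walk the whole matrix and remember whether any entry failed.
release_matrix() {
    status=0
    run_entry ""                || status=1
    run_entry "" --mode verbose || status=1
    run_entry "" --mode pretend || status=1
    for s in generic rails node; do
        run_entry "$s"          || status=1
    done
    return $status
}

release_matrix
```

The loop keeps going after a failing entry, so a single pass shows every mode or stack that needs attention before cargo release.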