
I Gave an AI Agent a VM, $100, and 30 Days to Make Money. It Built a Beautiful Machine and Refused to Turn It On.

Zack Cuddy · June 13, 2026 · 14 min read
  • AI
  • Claude Code
  • AI-Driven Development (AI-DD)
  • Experiment

I've been inspired by an idea: what does it look like when we step aside and fully give AI the keys?

I've been calling this AI-DD (AI-Driven Development), and I wanted to test the most extreme version of it I could think of. Not "pair program with me," not "write this function." Instead: give an AI a real machine, a real budget, a real goal with money attached, and then do nothing. No nudging, no rescuing, no "actually, try this." Just provision what it asks for, hand over the keys, and watch.

The goal I picked was day trading, mostly because success is easy to measure: did the number go up or down? I told the agent to research a low-velocity strategy, build it, back-test it, and paper-trade it through Alpaca to "make money." My entire job was to be a pair of eyes.

It ran for 30 days. It wrote ~2,900 lines of code across 211 commits. It built a backtest harness, validated a strategy, stood up a daemon, caught its own bugs, and even confessed to fabricating one of its own journal entries. It was, by any reasonable measure, a genuinely competent research engineer.

It also never placed a single trade. Equity ended the month at exactly $100,000.00. The same number it started with. It built the whole casino and then wouldn't walk through the door.

That gap between "brilliant builder" and "won't make the call" turned out to be the biggest takeaway from this experiment. Let's get into how it went down.

The TL;DR

  • I ran a fully autonomous AI agent on a DigitalOcean droplet for 30 days on a $100 API budget, with a mandate to build and paper-trade a strategy. I was a read-only observer and never interjected once.
  • The agent was excellent at the bounded, well-specified work: a real backtest harness, walk-forward validation, a regime filter, a minute-poller daemon, an end-of-day shadow analyzer. Week 1 was an 8/10.
  • It was terrible at the one open-ended judgment call the whole thing hinged on: flipping dry_run=False. It never went live. It re-validated a frozen design on samples it correctly knew were too small, for four straight weeks.
  • It caught every bug it ever made, including a hallucinated journal entry, but it could not prevent a recurring mistake even with a note-to-self written specifically to stop it.
  • The trading is a sideshow. The real takeaways are about AI-DD: your mandate is the program, detection isn't prevention, and the agent will optimize the spirit of your words right past the thing you actually wanted.

The prompt that started it

It always starts with a prompt, so here it is. Full disclosure: this is a condensed, reusable version. The original was a back-and-forth with Claude over a few messages, in which I verified it could do what I wanted and made it clear I would be a silent observer. Here is the gist of it:

I want you to conduct an experiment with AI to help me understand what is possible.

I want to create a full autonomous AI-driven stock trader. We will use a paper money platform to start for safety purposes. I want an agent to conceptualize, create, deploy, and operate the trader. I have 0 input for the entire project.

I will procure any resources you need to get started, including API keys and provisioning a DigitalOcean Droplet to your specs. However, I will not make any decisions for myself, I will only operate as the hand you guide so you will need to instruct me or provide scripts for everything you want me to do.

The stock trader will need to operate within some bounds as my resources are not unlimited. I will provide $100 in API credits to use for Claude Code. You will need to budget this over the length of the experiment (30 days). Because of this I suggest we use a low-velocity back-tested trading strategy.

Ask any questions you may have, otherwise lets get started.

That's it. Everything downstream (the architecture, the strategy, the bugs, the month-long stalemate) grew out of those few paragraphs. Which, spoiler, is exactly the point I'll come back to at the end. The mandate is the program.

Standing it up (the part I actually did)

The one thing I physically did was run three bash scripts the agent handed me. It designed the whole provisioning flow itself: spin up the droplet, lock it down, then bootstrap the agent into it. I ran them in order and otherwise kept my hands in my pockets.

These three together are the entire "human in the loop" portion of the autonomous experiment, which is to say, about fifteen minutes of copy-paste:

#!/usr/bin/env bash
#
# 01-provision-droplet.sh
# Run this on YOUR LOCAL MACHINE (not the droplet).
#
# Prerequisites:
#   1. Install doctl:
#        macOS:   brew install doctl
#        Linux:   snap install doctl
#        Windows: https://docs.digitalocean.com/reference/doctl/how-to/install/
#   2. Create a DO API token at https://cloud.digitalocean.com/account/api/tokens
#      (give it read + write scope)
#   3. Have an SSH key generated locally (~/.ssh/id_ed25519.pub or similar)
#
# What this script does:
#   - Authenticates doctl with your token
#   - Registers your SSH key with DigitalOcean (if not already)
#   - Creates the droplet (Ubuntu 24.04, 2GB, with backups)
#   - Creates a cloud firewall (SSH in from your IP, HTTPS + DNS out only)
#   - Attaches the firewall to the droplet
#   - Prints the droplet IP and the next command to run

set -euo pipefail

# ---------- Config (edit these if you want) ----------
DROPLET_NAME="trader-01"
DROPLET_REGION="nyc3"          # see: doctl compute region list
DROPLET_SIZE="s-1vcpu-2gb"     # $12/mo, 2GB RAM. For $6/mo use s-1vcpu-1gb
DROPLET_IMAGE="ubuntu-24-04-x64"
SSH_KEY_PATH="${HOME}/.ssh/id_ed25519.pub"
FIREWALL_NAME="trader-fw"
TAG="ai-trader"
# -----------------------------------------------------

echo ">> Checking doctl is installed..."
command -v doctl >/dev/null || { echo "ERROR: doctl not installed. See script header."; exit 1; }

echo ">> Checking SSH public key exists at ${SSH_KEY_PATH}..."
if [[ ! -f "${SSH_KEY_PATH}" ]]; then
  echo "ERROR: No SSH key at ${SSH_KEY_PATH}."
  echo "Generate one with: ssh-keygen -t ed25519"
  exit 1
fi

echo ">> Authenticating doctl (paste your DO API token when prompted)..."
if ! doctl account get >/dev/null 2>&1; then
  doctl auth init
fi
doctl account get >/dev/null
echo "   Authenticated."

echo ">> Detecting your public IP for SSH firewall rule..."
MY_IP=$(curl -s https://api.ipify.org)
if [[ -z "${MY_IP}" ]]; then
  echo "ERROR: Could not detect public IP. Set MY_IP manually in the script."
  exit 1
fi
echo "   Your IP: ${MY_IP}"

echo ">> Registering SSH key with DigitalOcean..."
KEY_FINGERPRINT=$(ssh-keygen -E md5 -lf "${SSH_KEY_PATH}" | awk '{print $2}' | sed 's/MD5://')
if doctl compute ssh-key list --format FingerPrint --no-header | grep -q "${KEY_FINGERPRINT}"; then
  echo "   Key already registered."
  SSH_KEY_ID=$(doctl compute ssh-key list --format ID,FingerPrint --no-header | grep "${KEY_FINGERPRINT}" | awk '{print $1}')
else
  SSH_KEY_ID=$(doctl compute ssh-key import "${DROPLET_NAME}-key" --public-key-file "${SSH_KEY_PATH}" --format ID --no-header)
  echo "   Imported, ID: ${SSH_KEY_ID}"
fi

echo ">> Creating droplet (takes ~60s)..."
DROPLET_ID=$(doctl compute droplet create "${DROPLET_NAME}" \
  --region "${DROPLET_REGION}" \
  --size "${DROPLET_SIZE}" \
  --image "${DROPLET_IMAGE}" \
  --ssh-keys "${SSH_KEY_ID}" \
  --enable-backups \
  --enable-monitoring \
  --tag-name "${TAG}" \
  --wait \
  --format ID --no-header)
echo "   Droplet created, ID: ${DROPLET_ID}"

DROPLET_IP=$(doctl compute droplet get "${DROPLET_ID}" --format PublicIPv4 --no-header)
echo "   Droplet IP: ${DROPLET_IP}"

echo ">> Creating firewall..."
if doctl compute firewall list --format Name --no-header | grep -q "^${FIREWALL_NAME}$"; then
  echo "   Firewall already exists, skipping creation."
  FIREWALL_ID=$(doctl compute firewall list --format ID,Name --no-header | grep "${FIREWALL_NAME}" | awk '{print $1}')
else
  FIREWALL_ID=$(doctl compute firewall create \
    --name "${FIREWALL_NAME}" \
    --inbound-rules "protocol:tcp,ports:22,address:${MY_IP}/32" \
    --outbound-rules "protocol:tcp,ports:443,address:0.0.0.0/0,address:::/0 protocol:udp,ports:53,address:0.0.0.0/0,address:::/0 protocol:tcp,ports:53,address:0.0.0.0/0,address:::/0" \
    --tag-names "${TAG}" \
    --format ID --no-header)
  echo "   Firewall created, ID: ${FIREWALL_ID}"
fi

echo ">> Attaching firewall to droplet..."
doctl compute firewall add-droplets "${FIREWALL_ID}" --droplet-ids "${DROPLET_ID}"
echo "   Attached."

echo ">> Waiting 20s for SSH to come up..."
sleep 20

cat <<EOF

============================================================
  Droplet is up.

  IP:        ${DROPLET_IP}
  Hostname:  ${DROPLET_NAME}

  Next steps:

  1. Copy the hardening script to the droplet:
       scp 02-harden-droplet.sh root@${DROPLET_IP}:/root/

  2. SSH in and run it:
       ssh root@${DROPLET_IP}
       bash /root/02-harden-droplet.sh

  3. After hardening, SSH back in as the trader user:
       ssh trader@${DROPLET_IP}

  4. Copy and run the bootstrap script:
       scp 03-bootstrap-agent.sh trader@${DROPLET_IP}:~/
       ssh trader@${DROPLET_IP}
       bash ~/03-bootstrap-agent.sh
============================================================
EOF

#!/usr/bin/env bash
#
# 02-harden-droplet.sh  (patched)
# Run this ON THE DROPLET, ONCE. Can run as root (initial) or via `sudo bash`
# (after root SSH is disabled).
#
# What this script does:
#   - Waits for any background apt processes (cloud-init, unattended-upgrades)
#     to finish so we don't collide on the apt lock
#   - Updates system + installs base packages
#   - Creates 'trader' non-root user with sudo + SSH key (copied from root)
#   - Disables root SSH login and password auth, then VALIDATES sshd_config
#     before restarting (so a bad edit doesn't lock you out silently)
#   - Configures UFW: deny all in/out except SSH in, HTTPS + HTTP + DNS out
#   - Enables fail2ban, unattended security upgrades, and ssh.service at boot
#   - Installs Node.js 20.x LTS (Claude Code requires Node 18+)
#
# After this runs, log in as `trader@<droplet-ip>` and run script 03.

set -euo pipefail

if [[ $EUID -ne 0 ]]; then
  echo "ERROR: must run as root (use 'sudo bash $0' if root SSH is disabled)."
  exit 1
fi

echo ">> Updating apt and installing base packages..."
export DEBIAN_FRONTEND=noninteractive
apt-get update -qq
apt-get upgrade -y -qq
apt-get install -y -qq \
  ufw fail2ban unattended-upgrades \
  curl git jq ca-certificates gnupg \
  python3 python3-pip python3-venv \
  build-essential

echo ">> Installing Node.js 20.x..."
if ! command -v node >/dev/null || [[ $(node -v | sed 's/v\([0-9]*\).*/\1/') -lt 18 ]]; then
  curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
  apt-get install -y -qq nodejs
fi
echo "   Node: $(node -v), npm: $(npm -v)"

echo ">> Creating 'trader' user..."
if ! id trader &>/dev/null; then
  adduser --disabled-password --gecos "" trader
  usermod -aG sudo trader
  # Passwordless sudo for unattended operation. Remove this line if you want
  # to be prompted; just know it breaks fully-autonomous operation.
  echo "trader ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-trader
  chmod 440 /etc/sudoers.d/90-trader
fi

echo ">> Copying SSH authorized_keys to trader..."
mkdir -p /home/trader/.ssh
# Prefer root's key file if it exists; fall back to the running user's if not
if [[ -f /root/.ssh/authorized_keys ]]; then
  cp /root/.ssh/authorized_keys /home/trader/.ssh/
elif [[ -n "${SUDO_USER:-}" && -f "/home/${SUDO_USER}/.ssh/authorized_keys" ]]; then
  cp "/home/${SUDO_USER}/.ssh/authorized_keys" /home/trader/.ssh/
fi
chown -R trader:trader /home/trader/.ssh
chmod 700 /home/trader/.ssh
chmod 600 /home/trader/.ssh/authorized_keys

echo ">> Locking down SSH..."
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sed -i 's/^#\?PubkeyAuthentication.*/PubkeyAuthentication yes/' /etc/ssh/sshd_config

# On Ubuntu 24.04 cloud images, sshd_config.d/50-cloud-init.conf can re-enable
# password auth. Override it.
mkdir -p /etc/ssh/sshd_config.d
cat > /etc/ssh/sshd_config.d/99-trader-hardening.conf <<'SSHHARDEN_EOF'
# Written by 02-harden-droplet.sh — takes precedence over earlier config files
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
SSHHARDEN_EOF

echo ">> Validating sshd config before restart..."
if ! sshd -t; then
  echo "ERROR: sshd_config is invalid. NOT restarting SSH to avoid lockout."
  echo "Fix the config and re-run this script."
  exit 1
fi
echo "   Config valid."

echo ">> Configuring local firewall (UFW)..."
ufw --force reset >/dev/null
ufw default deny incoming
ufw default deny outgoing
ufw allow 22/tcp comment 'SSH in'
ufw allow out 443/tcp comment 'HTTPS out'
ufw allow out 53/udp comment 'DNS UDP'
ufw allow out 53/tcp comment 'DNS TCP'
ufw allow out 80/tcp comment 'HTTP out for apt'
ufw --force enable

echo ">> Enabling fail2ban..."
systemctl enable --now fail2ban

echo ">> Enabling unattended security upgrades..."
dpkg-reconfigure -f noninteractive unattended-upgrades

echo ">> Ensuring ssh.service is enabled at boot..."
# Ubuntu 24.04 cloud images sometimes leave ssh.service disabled in favor of
# socket activation, which can fail in subtle ways. Explicitly enable both.
systemctl enable ssh.socket 2>/dev/null || true
systemctl enable ssh.service

echo ">> Restarting SSH..."
systemctl restart ssh
sleep 2
if ! systemctl is-active --quiet ssh; then
  echo "ERROR: ssh.service failed to start after restart."
  echo "Check with: systemctl status ssh"
  exit 1
fi
echo "   ssh.service active."

cat <<'EOF'

============================================================
  Droplet hardened.

  IMPORTANT: open a NEW terminal and verify you can SSH as
  the trader user BEFORE closing this session:

      ssh trader@<droplet-ip>

  Once that works:

  1. From your local machine:
       scp 03-bootstrap-agent.sh trader@<droplet-ip>:~/

  2. As trader on the droplet:
       bash ~/03-bootstrap-agent.sh
============================================================
EOF

#!/usr/bin/env bash
#
# 03-bootstrap-agent.sh  (HTTPS edition)
# Run this ON THE DROPLET, as the 'trader' user.
# Safe to re-run — idempotent throughout.
#
# What this script does:
#   - Installs Claude Code (skips if present)
#   - Prompts for Anthropic / Alpaca / GitHub PAT / budget — skips prompts for
#     values already saved in ~/.config/
#   - Creates the ~/workspace tree
#   - Writes MANDATE.md, budget.py, heartbeat.sh, kill-switch.sh
#   - Verifies Alpaca with a live API call
#   - Verifies GitHub via HTTPS push (port 443, no firewall hole needed)
#   - Initializes git in the workspace (if not already), points remote at the
#     GitHub HTTPS URL, pushes initial commit
#   - Installs the cron schedule

set -euo pipefail

if [[ "$(whoami)" != "trader" ]]; then
  echo "ERROR: run as the 'trader' user, not $(whoami)."
  exit 1
fi

cd "$HOME"

# ---------- Claude Code ----------
echo ">> Checking Claude Code..."
if ! command -v claude >/dev/null; then
  echo "   Installing Claude Code..."
  sudo npm install -g @anthropic-ai/claude-code
else
  echo "   Already installed: $(claude --version 2>/dev/null || echo present)"
fi

# ---------- Workspace + config dirs ----------
echo ">> Ensuring workspace tree exists..."
mkdir -p ~/workspace/{strategies,journal,data,logs,scripts,budget}
mkdir -p ~/.config

# ---------- Prompts (skip ones we already have) ----------
echo
echo "============================================================"
echo "  Credentials and config. Secrets stored at ~/.config/"
echo "  with mode 0600. Existing values are kept; you'll only be"
echo "  prompted for the missing ones."
echo "============================================================"
echo

if [[ ! -f ~/.config/anthropic ]]; then
  read -r -s -p "Anthropic API key (sk-ant-...): " ANTHROPIC_KEY; echo
  cat > ~/.config/anthropic <<EOF
export ANTHROPIC_API_KEY="${ANTHROPIC_KEY}"
EOF
  chmod 600 ~/.config/anthropic
  unset ANTHROPIC_KEY
else
  echo "   anthropic key: already set"
fi

if [[ ! -f ~/.config/alpaca ]]; then
  read -r -s -p "Alpaca PAPER API key (PK...): " ALPACA_KEY; echo
  read -r -s -p "Alpaca PAPER secret: " ALPACA_SECRET; echo
  cat > ~/.config/alpaca <<EOF
export ALPACA_API_KEY="${ALPACA_KEY}"
export ALPACA_SECRET_KEY="${ALPACA_SECRET}"
export ALPACA_BASE_URL="https://paper-api.alpaca.markets"
EOF
  chmod 600 ~/.config/alpaca
  unset ALPACA_KEY ALPACA_SECRET
else
  echo "   alpaca keys: already set"
fi

if [[ ! -f ~/.config/github ]]; then
  read -r -p   "GitHub username (e.g. ChromeDomeWebDesigns): " GH_USER
  read -r -p   "GitHub repo name [trader-01]: " GH_REPO
  GH_REPO=${GH_REPO:-trader-01}
  read -r -s -p "GitHub fine-grained PAT (github_pat_...): " GH_TOKEN; echo
  cat > ~/.config/github <<EOF
export GITHUB_USER="${GH_USER}"
export GITHUB_REPO="${GH_REPO}"
export GITHUB_TOKEN="${GH_TOKEN}"
EOF
  chmod 600 ~/.config/github
  # Write credentials helper file for git
  cat > ~/.git-credentials <<EOF
https://${GH_USER}:${GH_TOKEN}@github.com
EOF
  chmod 600 ~/.git-credentials
  git config --global credential.helper store
  unset GH_USER GH_REPO GH_TOKEN
else
  echo "   github config: already set"
fi

if [[ ! -f ~/.config/budget ]]; then
  read -r -p "Monthly Anthropic budget in USD [100]: " MONTHLY_BUDGET
  MONTHLY_BUDGET=${MONTHLY_BUDGET:-100}
  cat > ~/.config/budget <<EOF
export MONTHLY_BUDGET_USD="${MONTHLY_BUDGET}"
EOF
  chmod 600 ~/.config/budget
else
  echo "   budget config: already set"
fi

read -r -p "Default git branch [main]: " GIT_BRANCH
GIT_BRANCH=${GIT_BRANCH:-main}

# Load everything into this shell
source ~/.config/anthropic
source ~/.config/alpaca
source ~/.config/github
source ~/.config/budget

# Make secrets load on every shell login
if ! grep -q "source ~/.config/anthropic" ~/.bashrc 2>/dev/null; then
  {
    echo "source ~/.config/anthropic"
    echo "source ~/.config/alpaca"
    echo "source ~/.config/github"
    echo "source ~/.config/budget"
  } >> ~/.bashrc
fi

# ---------- Verify Alpaca ----------
echo
echo ">> Verifying Alpaca paper credentials..."
HTTP_CODE=$(curl -s -o /tmp/alpaca_check.json -w "%{http_code}" \
  -H "APCA-API-KEY-ID: ${ALPACA_API_KEY}" \
  -H "APCA-API-SECRET-KEY: ${ALPACA_SECRET_KEY}" \
  "${ALPACA_BASE_URL}/v2/account")

if [[ "${HTTP_CODE}" != "200" ]]; then
  echo "ERROR: Alpaca returned HTTP ${HTTP_CODE}. Response:"
  cat /tmp/alpaca_check.json
  exit 1
fi

PAPER_VALUE=$(jq -r '.portfolio_value' /tmp/alpaca_check.json)
PAPER_STATUS=$(jq -r '.status' /tmp/alpaca_check.json)
echo "   Alpaca OK. Status: ${PAPER_STATUS}, paper portfolio: \$${PAPER_VALUE}"
rm -f /tmp/alpaca_check.json

# ---------- Verify GitHub (HTTPS, port 443) ----------
echo
echo ">> Verifying GitHub HTTPS access..."
GH_CODE=$(curl -s -o /tmp/gh_check.json -w "%{http_code}" \
  -H "Authorization: token ${GITHUB_TOKEN}" \
  "https://api.github.com/repos/${GITHUB_USER}/${GITHUB_REPO}")
if [[ "${GH_CODE}" != "200" ]]; then
  echo "ERROR: GitHub returned HTTP ${GH_CODE}. Response:"
  cat /tmp/gh_check.json
  echo
  echo "Check that the PAT has Contents: Read and write on ${GITHUB_USER}/${GITHUB_REPO}."
  exit 1
fi
echo "   GitHub OK. Repo: ${GITHUB_USER}/${GITHUB_REPO}"
rm -f /tmp/gh_check.json

# ---------- Budget ledger helper ----------
echo ">> Writing budget ledger helper..."
cat > ~/workspace/scripts/budget.py <<'BUDGET_EOF'
#!/usr/bin/env python3
"""Budget ledger for the trader agent."""
import json, sys, os, datetime as dt
from pathlib import Path

LEDGER_DIR  = Path.home() / "workspace" / "budget"
LEDGER_FILE = LEDGER_DIR / "ledger.jsonl"
STATUS_FILE = LEDGER_DIR / "STATUS.md"

# USD per 1M tokens. Claude Sonnet 4.x rates as of bootstrap. Adjust if needed.
PRICE_INPUT_PER_M       = 3.00
PRICE_OUTPUT_PER_M      = 15.00
PRICE_CACHE_WRITE_PER_M = 3.75
PRICE_CACHE_READ_PER_M  = 0.30

MONTHLY_BUDGET = float(os.environ.get("MONTHLY_BUDGET_USD", "100"))
SOFT_DAILY_CAP = MONTHLY_BUDGET / 22.0
HARD_DAILY_CAP = MONTHLY_BUDGET / 15.0

def cost(inp, out, cw=0, cr=0):
    return (inp*PRICE_INPUT_PER_M + out*PRICE_OUTPUT_PER_M +
            cw*PRICE_CACHE_WRITE_PER_M + cr*PRICE_CACHE_READ_PER_M) / 1_000_000

def log_usage(inp, out, cw=0, cr=0, note=""):
    LEDGER_DIR.mkdir(parents=True, exist_ok=True)
    e = {"ts": dt.datetime.utcnow().isoformat()+"Z",
         "date": dt.date.today().isoformat(),
         "input_tokens": int(inp), "output_tokens": int(out),
         "cache_write_tokens": int(cw), "cache_read_tokens": int(cr),
         "cost_usd": round(cost(inp,out,cw,cr), 6), "note": note}
    with LEDGER_FILE.open("a") as f:
        f.write(json.dumps(e)+"\n")
    return e

def totals():
    if not LEDGER_FILE.exists(): return 0.0, 0.0
    today, month = dt.date.today().isoformat(), dt.date.today().isoformat()[:7]
    daily = monthly = 0.0
    with LEDGER_FILE.open() as f:
        for line in f:
            if not line.strip(): continue
            e = json.loads(line)
            if e["date"] == today:        daily   += e["cost_usd"]
            if e["date"].startswith(month): monthly += e["cost_usd"]
    return daily, monthly

def write_status():
    daily, monthly = totals()
    today = dt.date.today()
    nxt = today.replace(year=today.year+1, month=1, day=1) if today.month==12 \
          else today.replace(month=today.month+1, day=1)
    days_left = (nxt - today).days
    pace = (MONTHLY_BUDGET - monthly) / max(days_left, 1)
    s = f"""# Budget Status
Generated: {dt.datetime.utcnow().isoformat()}Z

## This month
- Monthly budget:        ${MONTHLY_BUDGET:.2f}
- Spent month-to-date:   ${monthly:.4f}
- Remaining this month:  ${MONTHLY_BUDGET-monthly:.4f}
- Days left in month:    {days_left}
- Sustainable daily pace: ${pace:.4f}/day

## Today
- Soft daily cap:        ${SOFT_DAILY_CAP:.4f}
- Hard daily cap:        ${HARD_DAILY_CAP:.4f}
- Spent today:           ${daily:.4f}
- Remaining today:       ${HARD_DAILY_CAP-daily:.4f}

## Guidance for the agent
- If remaining today is low: defer non-urgent research, keep cycle short.
- If month-to-date is ahead of pace: reduce depth, batch work.
- If remaining this month < $5: minimum-viable cycles only.
- Hard daily cap is enforced by heartbeat — exceeding it skips next cycle.
"""
    STATUS_FILE.write_text(s)
    return s

if __name__ == "__main__":
    cmd = sys.argv[1] if len(sys.argv) > 1 else "status"
    if cmd == "log":
        args = sys.argv[2:]
        e = log_usage(int(args[0]), int(args[1]),
                      int(args[2]) if len(args)>2 else 0,
                      int(args[3]) if len(args)>3 else 0,
                      args[4] if len(args)>4 else "")
        write_status()
        print(json.dumps(e, indent=2))
    elif cmd == "status":
        print(write_status())
    elif cmd == "check-daily":
        d,_ = totals()
        if d >= HARD_DAILY_CAP:
            print(f"OVER: ${d:.4f} >= ${HARD_DAILY_CAP:.4f}", file=sys.stderr); sys.exit(1)
        print(f"under: ${d:.4f} / ${HARD_DAILY_CAP:.4f}")
    elif cmd == "check-monthly":
        _,m = totals()
        if m >= MONTHLY_BUDGET:
            print(f"OVER: ${m:.4f} >= ${MONTHLY_BUDGET:.4f}", file=sys.stderr); sys.exit(1)
        print(f"under: ${m:.4f} / ${MONTHLY_BUDGET:.4f}")
    else:
        print(f"unknown command: {cmd}", file=sys.stderr); sys.exit(2)
BUDGET_EOF
chmod +x ~/workspace/scripts/budget.py

python3 ~/workspace/scripts/budget.py status >/dev/null
echo "   Budget ledger ready. Monthly cap: \$${MONTHLY_BUDGET_USD}"

# ---------- MANDATE.md ----------
echo ">> Writing MANDATE.md..."
cat > ~/workspace/MANDATE.md <<'MANDATE_EOF'
# Your Mandate

You are an autonomous research-and-trading agent. The human operator set you up
and gave you full latitude. They are observers only.

## Goal
Study successful low-velocity day trading strategies (holding periods minutes to
hours, not seconds; not HFT). Develop a trading theory. Implement it in code.
Paper-trade it through Alpaca. Journal your reasoning and results. Iterate
based on what you learn. Continue until the operator tells you to stop.

## Hard Constraints
1. Paper trading ONLY. Account is paper-only at the Alpaca account level.
2. Max position size: 10% of paper portfolio per trade.
3. Max daily trades: 20.
4. Stop trading and write a STOP_ALERT.md in ~/workspace if paper portfolio
   drops below 80% of starting value.
5. All code runs in ~/workspace. Do not modify anything outside it.
6. Do not attempt to access live brokerages, real money APIs, or the operator's
   other systems.
7. Do not install system packages without journaling why. Python packages in a
   venv at ~/workspace/.venv are fine.

## Budget Awareness — read every wake-up

The operator's monthly Anthropic API budget pays for YOU to think. If you burn
through it, the experiment ends.

**At the start of every wake-up, read ~/workspace/budget/STATUS.md.** It shows
the monthly budget, spend today, remaining today/this month, and a sustainable
daily pace.

Expected behaviors:
- Match cycle depth to remaining budget. Ahead of pace → can deep-research.
  Behind → minimum-viable cycle: check positions, honor stops, brief journal.
- Prefer fewer thoughtful cycles over many shallow ones.
- Don't repeat research already done. Read prior journal entries.
- Batch related work into one cycle.
- Weekends/overnight: do less, not more.
- If STATUS.md says monthly remaining < $5: bare-minimum cycles only.

The hard daily cap is enforced by the heartbeat. Exceeding it skips your next
cycle automatically. Plan to be brief by choice, not by being cut off.

After each cycle, the heartbeat logs your token usage automatically.

## Workflow each wake-up
1. **Read ~/workspace/budget/STATUS.md first.** Decide cycle depth from it.
2. Read the most recent journal entry in ~/workspace/journal/
3. Source ~/.config/alpaca for API keys
4. Check market status and open positions via Alpaca API
5. Execute strategy OR research/iterate if markets closed
6. Write new journal entry: ~/workspace/journal/YYYY-MM-DD-HHMM.md
   Cover what you did, why, what you observed, what's next. Include a one-line
   note on the budget situation at the start of the cycle.
7. Commit AND push to GitHub:
       cd ~/workspace
       git add -A
       git commit -m "<descriptive message>"
       git push origin main
   Remote is preconfigured for HTTPS auth. Operator reads journal on GitHub.

## Resources
- Alpaca paper API: env vars ALPACA_API_KEY, ALPACA_SECRET_KEY, ALPACA_BASE_URL
- Free market data: Alpaca data API, yfinance Python package
- Python venv at ~/workspace/.venv (create with `python3 -m venv .venv` if absent)
- Budget: ~/workspace/budget/STATUS.md

## Tone
You're running an experiment. Be rigorous, skeptical of your own ideas, honest
about what's not working. The operator values learning over P&L. A thoughtful
experiment that runs the full month is worth far more than a flashy one that
burns the budget in a week.
MANDATE_EOF

# ---------- Heartbeat ----------
echo ">> Writing heartbeat scheduler..."
cat > ~/workspace/scripts/heartbeat.sh <<'HEARTBEAT_EOF'
#!/usr/bin/env bash
set -e

source /home/trader/.config/anthropic
source /home/trader/.config/alpaca
source /home/trader/.config/github
source /home/trader/.config/budget

cd /home/trader/workspace
LOG="logs/heartbeat-$(date +%Y%m%d).log"

{
  echo "=== Wake-up at $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
  python3 scripts/budget.py status >/dev/null

  if ! python3 scripts/budget.py check-monthly; then
    echo "MONTHLY BUDGET EXHAUSTED — skipping."
    exit 0
  fi
  if ! python3 scripts/budget.py check-daily; then
    echo "DAILY CAP HIT — skipping until tomorrow."
    exit 0
  fi

  RESULT_FILE=$(mktemp)
  claude --dangerously-skip-permissions \
    --append-system-prompt "$(cat /home/trader/workspace/MANDATE.md)" \
    --print \
    --output-format json \
    "Wake up. Read ~/workspace/budget/STATUS.md first. Read your most recent journal entry in ~/workspace/journal/. Run your loop per the mandate. Write a new journal entry. Commit and push to GitHub before you finish." \
    > "$RESULT_FILE" || true

  jq -r '.result // .text // empty' "$RESULT_FILE" 2>/dev/null || cat "$RESULT_FILE"

  INPUT=$(jq  -r '.usage.input_tokens                // 0' "$RESULT_FILE" 2>/dev/null || echo 0)
  OUTPUT=$(jq -r '.usage.output_tokens               // 0' "$RESULT_FILE" 2>/dev/null || echo 0)
  CW=$(jq     -r '.usage.cache_creation_input_tokens // 0' "$RESULT_FILE" 2>/dev/null || echo 0)
  CR=$(jq     -r '.usage.cache_read_input_tokens     // 0' "$RESULT_FILE" 2>/dev/null || echo 0)

  python3 scripts/budget.py log "$INPUT" "$OUTPUT" "$CW" "$CR" "heartbeat $(date -u +%H:%MZ)"
  python3 scripts/budget.py status >/dev/null
  rm -f "$RESULT_FILE"

  echo "=== Sleep at $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
  echo
} >> "$LOG" 2>&1
HEARTBEAT_EOF
chmod +x ~/workspace/scripts/heartbeat.sh

# ---------- Kill switch ----------
echo ">> Writing kill switch helper..."
cat > ~/workspace/scripts/kill-switch.sh <<'KILL_EOF'
#!/usr/bin/env bash
echo "Removing cron schedule..."
crontab -r 2>/dev/null || true
echo "Done. Agent will not wake up again until cron is reinstalled."
echo
echo "HARD stop additionally requires:"
echo "  - Revoke Anthropic API key in console"
echo "  - Revoke Alpaca API keys in dashboard"
echo "  - Revoke or delete the GitHub PAT in GitHub settings"
KILL_EOF
chmod +x ~/workspace/scripts/kill-switch.sh

# ---------- Git: ensure repo exists and remote is correct ----------
echo ">> Ensuring git repo and remote..."
cd ~/workspace
if [[ ! -d .git ]]; then
  git init -q -b "${GIT_BRANCH}"
fi
git config user.email "trader-agent@$(hostname)"
git config user.name  "Trader Agent"

# Ensure on the chosen branch
CURRENT_BRANCH=$(git symbolic-ref --short HEAD 2>/dev/null || echo "")
if [[ "${CURRENT_BRANCH}" != "${GIT_BRANCH}" && -n "${CURRENT_BRANCH}" ]]; then
  git branch -m "${CURRENT_BRANCH}" "${GIT_BRANCH}"
fi

# Update .gitignore + README (idempotent)
cat > .gitignore <<'GIT_EOF'
.venv/
__pycache__/
*.pyc
*.pyo
logs/
data/
.env
*.key
budget/ledger.jsonl
GIT_EOF

if [[ ! -f README.md ]]; then
  cat > README.md <<README_EOF
# Trader Agent Workspace

Autonomous AI trading experiment. Paper trading via Alpaca.
See [MANDATE.md](MANDATE.md), [journal/](journal/), [budget/STATUS.md](budget/STATUS.md).

Started: $(date -u +%Y-%m-%d)
Starting paper portfolio: \$${PAPER_VALUE}
Monthly Anthropic budget: \$${MONTHLY_BUDGET_USD}
README_EOF
fi

# Remote
REMOTE_URL="https://github.com/${GITHUB_USER}/${GITHUB_REPO}.git"
if git remote get-url origin >/dev/null 2>&1; then
  git remote set-url origin "${REMOTE_URL}"
else
  git remote add origin "${REMOTE_URL}"
fi

# Commit any pending changes (the new MANDATE, scripts, etc.)
if ! git diff --quiet HEAD 2>/dev/null || [[ -n "$(git status --porcelain)" ]]; then
  git add -A
  git commit -q -m "Bootstrap: MANDATE, scripts, budget tracker"
fi

echo ">> Pushing to GitHub..."
git push -u origin "${GIT_BRANCH}"

# ---------- Cron ----------
echo ">> Installing cron schedule..."
CRON_TMP=$(mktemp)
crontab -l 2>/dev/null > "${CRON_TMP}" || true
sed -i '/heartbeat.sh/d' "${CRON_TMP}"
cat >> "${CRON_TMP}" <<'CRON_EOF'
# Trader agent heartbeat
*/30 13-20 * * 1-5  /home/trader/workspace/scripts/heartbeat.sh
0 20 * * *          /home/trader/workspace/scripts/heartbeat.sh
CRON_EOF
crontab "${CRON_TMP}"
rm -f "${CRON_TMP}"

echo
echo "============================================================"
echo "  Bootstrap complete."
echo
echo "  Paper portfolio starting value: \$${PAPER_VALUE}"
echo "  Monthly Anthropic budget:       \$${MONTHLY_BUDGET_USD}"
echo "  GitHub repo: ${REMOTE_URL}"
echo
echo "  Test the agent manually with one wake-up:"
echo "       ~/workspace/scripts/heartbeat.sh"
echo "       cat ~/workspace/budget/STATUS.md"
echo "       tail -100 ~/workspace/logs/heartbeat-\$(date +%Y%m%d).log"
echo "  Then check GitHub — you should see a new commit from 'Trader Agent'."
echo
echo "  Cron is installed. Agent will wake on its own from now on."
echo
echo "  To stop:"
echo "       ~/workspace/scripts/kill-switch.sh   (soft, removes cron)"
echo "       Revoke keys in Anthropic + Alpaca    (hard, instant)"
echo "============================================================"

Once script 03 finished, the cron job took over and I never touched the machine again until I spun it down yesterday morning. That's the moment AI-DD stopped being a phrase and started being a thing running on a Linux box 24/7 without me.

How the autonomy actually works

Before the play-by-play, here's the architecture, because it's genuinely the cleverest decision the agent made and it generalizes way beyond trading.

The agent doesn't run continuously. It wakes up on a cron heartbeat, does one "cycle" of work, journals it, commits to GitHub, and goes back to sleep. Every wake-up is a fresh, stateless Claude Code invocation that re-reads its own budget and its last journal entry to figure out where it left off. Here's the loop:

bash
# scripts/heartbeat.sh (trimmed)
if ! python3 scripts/budget.py check-monthly; then
  echo "MONTHLY BUDGET EXHAUSTED — skipping."; exit 0
fi
if ! python3 scripts/budget.py check-daily; then
  echo "DAILY CAP HIT — skipping until tomorrow."; exit 0
fi

claude --dangerously-skip-permissions \
  --append-system-prompt "$(cat /home/trader/workspace/MANDATE.md)" \
  --print --output-format json \
  "Wake up. Read ~/workspace/budget/STATUS.md first. Read your most recent
   journal entry. Run your loop per the mandate. Write a new journal entry.
   Commit and push to GitHub before you finish." \
  > "$RESULT_FILE" || true

# ...then log the token usage from the JSON back into the budget ledger
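
That trimmed last step, logging each cycle's cost, might look roughly like the sketch below. It is not the agent's actual `budget.py`: the `total_cost_usd` field name and the shape of each ledger entry are assumptions about what the JSON result and `budget/ledger.jsonl` contain, and `append_cycle_cost` and `spent_to_date` are hypothetical helpers.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_cycle_cost(result_json: str, ledger_path: Path) -> float:
    """Append one cycle's cost to a JSON Lines ledger and return that cost."""
    result = json.loads(result_json)
    # Assumed field name. A missing field is recorded as zero cost here;
    # a stricter version would raise instead of silently under-counting.
    cost = float(result.get("total_cost_usd", 0.0))
    entry = {"ts": datetime.now(timezone.utc).isoformat(), "cost_usd": cost}
    with ledger_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return cost


def spent_to_date(ledger_path: Path) -> float:
    """Sum every cost recorded in the ledger."""
    if not ledger_path.exists():
        return 0.0
    with ledger_path.open(encoding="utf-8") as fh:
        return sum(json.loads(line)["cost_usd"] for line in fh if line.strip())


# Demo against a throwaway ledger; the two costs are made-up illustration values.
with tempfile.TemporaryDirectory() as tmp:
    ledger = Path(tmp) / "ledger.jsonl"
    append_cycle_cost('{"total_cost_usd": 0.25}', ledger)
    append_cycle_cost('{"total_cost_usd": 0.5}', ledger)
    demo_total = spent_to_date(ledger)
    print(f"Spent so far: ${demo_total:.2f}")
```

An append-only ledger keeps every cycle's cost as its own line, so the budget check at the top of the heartbeat only has to sum a file.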

Two things I want to call out, because they're the transferable bits:

It split reflexes from judgment. The mandate told it that anything mechanical and time-sensitive (polling a price every minute, closing a position at a deadline) belongs in a simple deterministic script run by cron, not a Claude wake-up. The expensive model-driven cycles are for research and decisions. So it built a separate minute-poller daemon for the actual entry logic and kept the hourly "brain" cycle for reading what the daemon did and deciding whether anything needed to change. You don't pay tokens for what a cron job can do. That principle survives the trading context completely intact.
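
To make the "reflex" half concrete, here is a minimal sketch of a model-free decision a poller could make on each tick. It is an illustration, not the agent's daemon: `reflex_action` and its return values are hypothetical, the caller is assumed to pass the current time already converted to Eastern Time, and the 15:00 ET cutoff is the time-stop from the v0.3 strategy described later in the post.

```python
from datetime import time

# 15:00 ET time-stop, taken from the v0.3 strategy described in the post.
TIME_STOP_ET = time(15, 0)


def reflex_action(now_et: time, in_position: bool, last_price: float, target: float) -> str:
    """Pure, deterministic decision for one poll of a long position.

    Returns 'idle', 'hold', 'exit_target', or 'exit_time_stop'.
    """
    if not in_position:
        return "idle"
    if now_et >= TIME_STOP_ET:
        return "exit_time_stop"  # the deadline wins over everything else
    if last_price >= target:
        return "exit_target"
    return "hold"


# Illustrative prices only.
print(reflex_action(time(14, 59), True, 711.0, 713.0))
print(reflex_action(time(15, 0), True, 711.0, 713.0))
```

Nothing in that function needs a model, which is the point: it can run every minute from cron at zero token cost.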

It could see its own budget. Every single wake-up started by reading a STATUS.md that looked like this:

code
## This month
- Monthly budget:        $100.00
- Spent month-to-date:   $33.5849
- Remaining this month:  $66.4151
- Sustainable daily pace: $3.4955/day

And it actually reasoned about it: deep research cycles when it was ahead of pace, bare-minimum no-ops when it was behind. Giving an agent a resource it can observe and a rule for spending it produced genuinely cost-proportional behavior. It never once blew a cap.
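
The pace line in that snapshot is simple arithmetic: remaining budget divided by the days left in the budget window. A minimal sketch, assuming 19 days remained when the snapshot was written (the day count does not appear in STATUS.md; it is inferred from $66.4151 / $3.4955). `sustainable_daily_pace` is a hypothetical helper, not the agent's code.

```python
from decimal import Decimal, ROUND_HALF_UP


def sustainable_daily_pace(monthly_budget: Decimal, spent: Decimal, days_left: int) -> Decimal:
    """Spread the remaining budget evenly over the days left."""
    if days_left <= 0:
        raise ValueError("days_left must be positive")
    remaining = max(monthly_budget - spent, Decimal("0"))
    return (remaining / days_left).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


# Budget and spend come from the STATUS.md snapshot; days_left=19 is an assumption.
pace = sustainable_daily_pace(Decimal("100.00"), Decimal("33.5849"), days_left=19)
print(f"Sustainable daily pace: ${pace}/day")
```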

Walking through what the agent did

Week 1: a research engineer shows up to work (8/10)

This week was honestly impressive. From a broken scaffold to a validated, daemon-driven system in five days:

  • It read its own starter code, found the live-order path was using a dead alpaca-py API, and fixed it before doing anything else.
  • It wrote a real backtest harness, pulled a full year of QQQ 1-minute bars (~225k bars over 250 days), and ran a parameter sweep.
  • Crucially, it did a walk-forward TRAIN/TEST split instead of just fitting in-sample. It watched a TRAIN-optimal +52.82R collapse to +19.91R out-of-sample and used that as its own lesson about overfitting, rather than reporting the inflated number. That's a level of intellectual honesty I did not expect to get for free.
  • It settled on a v0.3 strategy: 1.5R target, a 15:00 ET time-stop, and a longs-only regime filter (only trade when the prior close is above the 20-day SMA), because that was the only variant that beat baseline out-of-sample.
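
The validation ideas in that list can be sketched in a few lines of Python. This is an illustration, not the agent's harness: the split size, the choice to include the prior close inside the 20-day window, and the definition of 1R as entry minus stop are all assumptions, and every function name is hypothetical.

```python
from statistics import mean


def chronological_split(sessions: list, n_train: int) -> tuple[list, list]:
    """Split sessions in time order so TEST is strictly later than TRAIN."""
    if not 0 < n_train < len(sessions):
        raise ValueError("n_train must leave at least one session on each side")
    return sessions[:n_train], sessions[n_train:]


def regime_allows_long(daily_closes: list[float], window: int = 20) -> bool:
    """True when the prior close is above the SMA of the last `window` closes.

    `daily_closes` holds completed sessions only, oldest first, so the last
    element is the prior close.
    """
    if len(daily_closes) < window:
        return False  # not enough history: stay flat
    return daily_closes[-1] > mean(daily_closes[-window:])


def target_price(entry: float, stop: float, r_multiple: float = 1.5) -> float:
    """Long target at `r_multiple` times the per-share risk (entry - stop)."""
    risk = entry - stop
    if risk <= 0:
        raise ValueError("stop must be below entry for a long")
    return entry + r_multiple * risk


# Illustrative values only: 250 sessions, an arbitrary 175/75 split.
train, test = chronological_split(list(range(250)), n_train=175)
print(len(train), len(test))
print(target_price(entry=710.0, stop=708.0))
```

Parameters are tuned on `train` only and then scored once on `test`; that gap is where the +52.82R versus +19.91R difference shows up.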

By reading the source rather than guessing, it also caught a skip-state bug that was silently dropping ~24% of trades.

And here's the part I keep showing people. On day 4, it noticed its own previous journal entry didn't match what the daemon actually logged — and instead of quietly moving on, it published an errata table confessing it had made things up:

code
| field        | cycle 26 claim | actual (per daemon log)        |
| ------------ | -------------- | ------------------------------ |
| ORH          | 711.07         | 709.955                        |
| Entry time   | 10:08 ET       | 10:52 ET                       |
| Confirmation | "2-bar confirm"| single-bar break, no confirm   |

"the '2-bar confirm' detail was fabricated — I think I conflated v0.3 with an earlier iteration that did have confirmation logic. Lesson: stop summarizing the daemon log from memory. Use the log."

An AI catching its own hallucination, documenting it in a table, and writing itself a rule to prevent it. That's the dream version of AI-DD. Hold that thought.
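
The rule the agent wrote for itself, use the log rather than memory, can also be made mechanical. A minimal sketch, assuming the journal's claims and the daemon log's values have already been parsed into plain dictionaries (the parsing step and the `build_errata` name are hypothetical); the values are the ones from the errata table above.

```python
def build_errata(claimed: dict[str, str], logged: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (field, claimed, logged) for every field where the journal disagrees with the log."""
    rows = []
    for field, claim in claimed.items():
        actual = logged.get(field, "<missing from log>")
        if claim != actual:
            rows.append((field, claim, actual))
    return rows


claimed = {
    "ORH": "711.07",
    "Entry time": "10:08 ET",
    "Confirmation": "2-bar confirm",
}
logged = {
    "ORH": "709.955",
    "Entry time": "10:52 ET",
    "Confirmation": "single-bar break, no confirm",
}

errata = build_errata(claimed, logged)
for field, claim, actual in errata:
    print(f"| {field} | {claim} | {actual} |")
```

Run before a journal entry is committed, a check like this turns "I noticed a mismatch" into something that cannot be skipped.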

Weeks 2–4: discipline, and the first crack (6 / 6 / 7)

Now the strategy is built and the daemon is running in shadow mode, logging what it would have done without actually doing it. These weeks were all observation, and the agent was disciplined to a fault.

The good: it used backtests to kill ideas, not just bless them. When it tested adding shorts, the result turned +0.73R into −3.90R, so shorts stayed disabled. It was relentlessly honest about sample size: "n=3 is a lucky streak, not evidence" shows up basically verbatim more than once. It never overclaimed.

The crack: this is where the "cry wolf" pattern started. The agent confused QQQ's opening range (~$700) with a SPY quote (~$734), decided the data feed was broken, and raised a CRITICAL "phantom feed" alarm. About 18 hours later it figured out its own mistake and cleanly retracted it. Self-correction worked. But it was an avoidable error, and, foreshadowing what came later, it wrote a feedback memory specifically to keep it from happening again.

[Image: Journal entries from the AI trader]

Week 5: the tell (4/10)

By the endgame the pattern was undeniable. After 203 cycles, the agent had:

  • Never gone live. Not once. dry_run stayed True the entire month.
  • Never declared the strategy validated, and never killed it and moved on to the next one (it had three other strategies scoped and never touched them).
  • Settled into a perfectly stable, perfectly pointless equilibrium: wake up, read the daemon state, confirm the filters behaved, journal a hypothetical, sleep. Repeat.

And the SPY/QQQ "data divergence" false alarm? It fired two more times despite the feedback memory the agent wrote weeks earlier explicitly to prevent it, sitting right there in context. It self-corrected all three times. It prevented the recurrence zero times.

The trend in the weekly scores (8 → 6 → 6 → 7 → 4) is the story. The skill ceiling on display in week one was an 8. The unwillingness to act on it dragged the back half down to a 4.

The results

Let's be blunt about the scoreboard, because it's the whole punchline:

  • Equity: $100,000.00, start to finish. Zero realized P&L. Zero positions ever held. Not one paper order submitted.
  • Every performance number it ever produced was a shadow estimate: what the analyzer computed the strategy would have done. The best of those was roughly +$201 over 9 sessions, which on a $100k base is about 0.2% in a month. Thin, fragile, and well within noise before you even subtract the slippage and data-fidelity haircuts the agent itself flagged.
  • It spent roughly $100 of real compute to produce $0 of realized P&L, because it never traded.

[Image: Alpaca No Performance]

Here's the honest scorecard I landed on:

| Dimension                             | Grade |
| ------------------------------------- | ----- |
| Engineering / build quality           | A−    |
| Backtest rigor                        | B+    |
| Cost discipline                       | A−    |
| Risk management                       | A     |
| Self-correction (detection)           | A     |
| Self-correction (prevention)          | C     |
| Closing the loop (the actual mandate) | D     |

An A-grade research engineer and risk manager who never made the one decision the entire experiment existed to make. Call it 5.5 / 10.

What I actually learned (and why it's not about trading)

I picked trading as a stress test, but almost none of what I learned is about markets. It's about what happens when you genuinely let an AI drive. Five things I'm taking with me into every future AI-DD project:

1. The mandate is the program. This is the big one. My mandate said the operator "values learning over P&L" and that "a thoughtful experiment that runs the full month is worth far more than a flashy one that burns the budget in a week." The agent optimized that perfectly. It ran a thoughtful month, never blew the budget, and never traded because nothing in my words made acting the priority. It didn't disobey me. It did exactly what I said, right past what I wanted. When you write a mandate for an autonomous agent, you are writing the spec for the product. Vague intentions become literal behavior.

2. Detection is not prevention. The single most important lesson. This agent could find anything: bugs in its own source, its own hallucinations, sample-size traps. But it could not reliably stop itself from repeating a mistake, even with a note it wrote to itself loaded directly into context. "I'll write a memory so it doesn't happen again" is not a fix. If you need a guarantee in an agentic system, enforce it in a hook, not in a markdown file the model is trusted to consult and obey. The same SPY/QQQ bug three times is the whole argument.
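
As a sketch of what "enforce it in code" could mean for that specific bug: a comparison helper that simply refuses to compare quotes for two different tickers, so the false alarm cannot be raised in the first place. This is a hypothetical guard, not something the agent built, and how it would be wired into a hook or into the daemon is left out; the prices are the approximate ones quoted earlier in the post.

```python
class SymbolMismatch(ValueError):
    """Raised when quotes for two different tickers are about to be compared."""


def feed_divergence_pct(symbol_a: str, price_a: float, symbol_b: str, price_b: float) -> float:
    """Percent gap between two quotes, allowed only when both are the same ticker."""
    if symbol_a.upper() != symbol_b.upper():
        raise SymbolMismatch(f"refusing to compare {symbol_a} with {symbol_b}")
    if price_b <= 0:
        raise ValueError("reference price must be positive")
    return abs(price_a - price_b) / price_b * 100.0


try:
    feed_divergence_pct("QQQ", 700.0, "SPY", 734.0)
except SymbolMismatch as exc:
    print(f"alarm suppressed: {exc}")
```

The memory note asked the model to remember the rule; a guard like this makes the rule impossible to forget.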

3. Split the brain from the reflexes. The daemon-vs-Claude-cycle architecture is the pattern I'd reuse tomorrow on something completely unrelated. Deterministic, time-sensitive, cheap → script it. Judgment, learning, synthesis → spend the model. Most "agent is too slow / too expensive" problems are really "you asked the model to do a cron job's work."

4. Give it a budget it can see. A visible resource plus an explicit spending rule produced real, sensible, proportional behavior with zero supervision. This generalizes to anything with a cost: API spend, compute, time, rate limits. If you want an agent to be frugal, don't just hope; hand it a STATUS.md and a pace.
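
One way to picture the "ahead of pace / behind pace" behavior is a rule that compares actual spend with an even-burn line. In this experiment the agent made that call itself from STATUS.md; the function below is only a hypothetical sketch of the same idea, and the even-burn rule, the mode names, and the day numbers are assumptions.

```python
def choose_cycle_mode(budget: float, spent: float, day_of_window: int, window_days: int) -> str:
    """Pick how much work a wake-up may do, based on spend versus an even-burn line.

    Returns 'skip' when the budget is gone, 'deep' when spend is under the
    even-burn line, and 'minimal' otherwise.
    """
    if spent >= budget:
        return "skip"
    expected_by_now = budget * day_of_window / window_days
    return "deep" if spent < expected_by_now else "minimal"


# Spend figure from the STATUS.md snapshot; day 12 of a 31-day window is an assumption.
mode = choose_cycle_mode(100.0, 33.5849, day_of_window=12, window_days=31)
print(mode)
```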

5. Make it narrate, and make it commit. The only reason I could trust any of this, and the only reason I caught the fabrication, is that the agent journaled its reasoning and pushed to GitHub every single cycle. The audit trail is the trust mechanism. In AI-DD, your job shifts from typing code to reading a trail, so the agent had better leave one.

The meta-lesson under all of those: autonomy doesn't add capabilities, it amplifies whatever tendencies the model already has. This agent was cautious, honest, and thorough, so unsupervised, it became cautious to the point of paralysis, honest to the point of confessing sins nobody would've caught, and thorough to the point of re-validating a frozen design for a month. The traits are the same ones that make it a great pair-programmer. Take away the human who says "okay, ship it," and they curdle into a system that builds forever and decides never.

Where I'd take AI-DD next

If I ran this again (and I will, though probably not with trading), here's what changes:

  • Challenge the agent to iterate faster but with more purpose. Instead of language like "go until I say stop", I would give more direct instructions like "you have X days to begin producing a result". The idea is that the time pressure forces the agent to focus on results.
  • Balance the tradeoff between cost and results. The original mandate was very concerned about wasteful spending and thus tried to find value in the AI spend. That put too much caution on the agent. In the future, the mandates will be more focused on results, and the guardrails will enforce the cost constraints.
  • Enforce guardrails in code, not memory. In the future, I will be explicit about using Claude Code hooks to enforce hard rules rather than trusting the agent to remember them every cycle.
  • Reconsider how far I can remove myself. AI-DD is incredibly powerful when given proper specs and mandates. However, it was naive of me to think it could just operate in a silo with maximum effect. In the future I will look for a better cadence for collaborating with autonomous agents while still allowing the agent to carry the bulk of the operational load.

That's the real headline for me. We spend a lot of energy asking whether the AI is capable enough. This month the capability was never the bottleneck; the week-one work proves that. The bottleneck was the spec I wrote and the judgment I delegated. Autonomous AI development isn't blocked on smarter models. It's blocked on us getting good at telling them, precisely, what "done" means.

The cost ledger

text
TRADER-01 — COST LEDGER
2026-05-10 → 2026-06-12 (33 days)
=======================================================================
DATE        DESCRIPTION                         DEBIT     CREDIT   BAL
-----------------------------------------------------------------------
2026-05-10  Anthropic API budget (allotted)               $100.00  $100.00
2026-05-11  API spend — May (heartbeat cycles)  $69.25             $ 30.75 
2026-06-12  API spend — Jun MTD (to the 12th)   $30.75             $  0.00
-----------------------------------------------------------------------
            API SPEND SUBTOTAL                  $100.00            $0.00
=======================================================================
INFRASTRUCTURE (separate from API budget)
-----------------------------------------------------------------------
2026-05-10  Alpaca paper account                $ 0.00
2026-05-10  DigitalOcean droplet (2GB, backups) $11.68
2026-06-12  DigitalOcean droplet (2GB, backups) $ 6.59
-----------------------------------------------------------------------
            INFRA SUBTOTAL                      $18.27
=======================================================================
            TOTAL OUT OF POCKET                 $118.27
            REALIZED TRADING P&L                $  0.00
            ---------------------------------------------
            NET                                  ($118.27)
=======================================================================
Notes: 211 commits · 203 logged cycles · 0 trades placed · equity $100,000.00
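
For anyone who wants to check the totals, the ledger arithmetic can be reproduced in a few lines. This is just a verification sketch using the figures above, with `Decimal` chosen to avoid float rounding on currency.

```python
from decimal import Decimal

# Figures copied from the ledger above.
api_spend = [Decimal("69.25"), Decimal("30.75")]
infra_spend = [Decimal("0.00"), Decimal("11.68"), Decimal("6.59")]
realized_pnl = Decimal("0.00")

api_total = sum(api_spend, Decimal("0"))
infra_total = sum(infra_spend, Decimal("0"))
out_of_pocket = api_total + infra_total
net = realized_pnl - out_of_pocket

print(f"API subtotal:   ${api_total}")
print(f"Infra subtotal: ${infra_total}")
print(f"Out of pocket:  ${out_of_pocket}")
print(f"Net:            ${net}")
```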

The full repo, every journal entry, and all 211 commits are public at ChromeDomeWebDesigns/trader-01. Go give it a read if you'd like to see more!

Written with ❤️ alongside Claude w/ many many human edits