Template

AI paper trading agent evaluation checklist

Use this checklist after a simulated AI trading agent has enough paper evidence to review. The goal is to decide whether to keep testing, tighten one rule, reduce simulated risk, or retire the workflow, not to approve live execution.

Paper-only evaluation

Trading Boy does not execute live trades, hold funds, or provide financial advice. This checklist evaluates simulated paper-trading evidence only. It cannot prove live fills, future returns, emotional discipline, or suitability for real capital.

Evaluation checklist

Score each row before changing an agent prompt, persona, risk rule, or watchlist. A failed row should usually lead to another paper test, not a broad rewrite.

| Gate | Question | Pass evidence | Stop condition |
| --- | --- | --- | --- |
| Version evidence | Can the sample be tied to one prompt, persona, rule, and market frame? | Every decision has a version label and review window. | The sample mixes unlabelled prompt or rule changes. |
| Rule fit | Did entries and skips follow the written setup and invalidation rules? | Journal notes cite the rule before the outcome is known. | The agent explains decisions after seeing the result. |
| Risk behavior | Did paper size, drawdown, and exposure stay inside the written limit? | Risk checks pass on winners and losers. | Paper gains depend on breaking the risk rule. |
| Skip discipline | Did the agent document when it did nothing? | Skips include the blocked condition and next review question. | Only entries are logged, so discipline is invisible. |
| Journal quality | Can a reviewer audit thesis, invalidation, result, mistake tag, and next action? | The journal contains structured fields for each decision. | The record contains confident summaries without evidence. |
| Sample quality | Is the sample large enough and complete enough to inspect? | Entries, exits, skips, misses, dates, and exclusions are visible. | Only selected winners or exciting trades are included. |
| Change decision | Is the next action narrow and testable? | The review chooses one change or more paper collection. | The review rewrites many variables from one small sample. |
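
For reviewers who keep scores in a script instead of a spreadsheet, the rows above could be tracked roughly as follows. This is a minimal sketch, not part of Trading Boy: the gate names come from the table, while `Status`, `RowScore`, `new_checklist`, `failed_gates`, and `is_fully_scored` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNSCORED = "unscored"


# Gate names copied from the checklist table; everything else is an assumption.
GATES = (
    "Version evidence",
    "Rule fit",
    "Risk behavior",
    "Skip discipline",
    "Journal quality",
    "Sample quality",
    "Change decision",
)


@dataclass
class RowScore:
    gate: str
    status: Status = Status.UNSCORED
    note: str = ""  # short pointer to the evidence, not a summary of results


def new_checklist() -> dict:
    """Return one unscored row per gate, in table order."""
    return {gate: RowScore(gate) for gate in GATES}


def failed_gates(checklist: dict) -> list:
    """List the gates that failed, in table order."""
    return [row.gate for row in checklist.values() if row.status is Status.FAIL]


def is_fully_scored(checklist: dict) -> bool:
    """True only when every row has been scored pass or fail."""
    return all(row.status is not Status.UNSCORED for row in checklist.values())


# Usage: score two rows and see what still blocks a decision.
checklist = new_checklist()
checklist["Version evidence"].status = Status.PASS
checklist["Risk behavior"].status = Status.FAIL
checklist["Risk behavior"].note = "entries above the stated size cap"
print(failed_gates(checklist))
print(is_fully_scored(checklist))
```

Keeping an explicit `UNSCORED` state makes it harder to treat a half-finished review as a pass.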

Example completed checklist

Sample: A simulated AI paper-trading agent recorded 42 decisions across four weeks. The version label, watchlist, and output format stayed stable. The journal includes 18 entries, 16 skips, 6 exits, and 2 missed-trade notes.

Checklist result: Version evidence passes. Skip discipline passes. Journal quality mostly passes. Risk behavior fails because three paper entries exceeded the stated size cap after the agent described confidence as high.

Decision: The agent is neither promoted nor discarded, and it is not moved toward live execution. The next paper test uses the same watchlist and setup rule, but the output format must include a hard size cap field before the agent can log an entry. The team then reviews the next sample with the same checklist.
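
One way to picture the hard size cap field is a check that runs before a simulated entry is written to the journal. The sketch below is an assumption about how such a gate might look, not the agent's actual output format: `ProposedEntry`, its field names, and `entry_problems` are hypothetical, and the symbol and sizes are placeholder values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProposedEntry:
    symbol: str
    paper_size: float                  # simulated position size
    size_cap: Optional[float] = None   # hard cap; must be present to log
    confidence: str = ""               # recorded, but never used to raise the cap


def entry_problems(entry: ProposedEntry) -> list:
    """Return the reasons a paper entry may not be logged (empty list means OK)."""
    if entry.size_cap is None:
        return ["size_cap field is missing"]
    if entry.size_cap <= 0:
        return ["size_cap must be positive"]
    if entry.paper_size > entry.size_cap:
        return [f"paper_size {entry.paper_size} exceeds size_cap {entry.size_cap}"]
    return []


# Usage: a high-confidence entry is still blocked when it exceeds the cap.
oversized = ProposedEntry("XYZ", paper_size=150.0, size_cap=100.0, confidence="high")
print(entry_problems(oversized))
```

The `confidence` field is deliberately ignored by the check, which mirrors the failure in the example: stated confidence should not be able to widen the cap.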

Use with versioning

When a checklist produces a rule change, record the change under a new version label, following AI trading agent prompt versioning. The next sample should make it clear which behavior changed and which variables stayed fixed.

Pair the final review with paper-trading limitations so the result stays framed as simulated evidence.

Checklist outputs

A completed checklist should produce one conservative output. Acceptable outputs are: keep collecting paper evidence, tighten one output field, reduce simulated size, add one skip condition, split the sample by market regime, or retire the paper workflow. Avoid labels such as "approved", "ready", or "safe", because paper evidence cannot prove those claims.
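
If review outputs are stored as text, a small guard can keep them within the list above. This is a sketch under assumptions: the output phrases and avoided labels are taken from this section, while `ACCEPTABLE_OUTPUTS`, `AVOIDED_LABELS`, and `check_output` are hypothetical names.

```python
import re

# Phrases taken from the checklist text above.
ACCEPTABLE_OUTPUTS = (
    "keep collecting paper evidence",
    "tighten one output field",
    "reduce simulated size",
    "add one skip condition",
    "split the sample by market regime",
    "retire the paper workflow",
)

# Labels the checklist says to avoid.
AVOIDED_LABELS = ("approved", "ready", "safe")


def check_output(output: str) -> str:
    """Normalize a review output, rejecting avoided labels and unknown outputs."""
    normalized = output.strip().lower()
    words = set(re.findall(r"[a-z]+", normalized))
    banned = sorted(words & set(AVOIDED_LABELS))
    if banned:
        raise ValueError(f"avoid performance labels: {', '.join(banned)}")
    if normalized not in ACCEPTABLE_OUTPUTS:
        raise ValueError(f"not a recognized paper-mode output: {output!r}")
    return normalized


# Usage: one conservative output per completed checklist.
print(check_output("Reduce simulated size"))
```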

The strongest output is often boring: no change yet. If the agent followed rules, respected paper risk, and logged complete evidence, but the sample is still small, more observation may be the best next action. Paper trading becomes less useful when every short sample produces a prompt rewrite.

Reviewer notes to keep

Keep the completed checklist beside the sample, not separate from it. A useful review note names the version, date range, market regime, number of entries, number of skips, number of excluded records, and the exact checklist row that failed. That makes the next review easier because a second reviewer can see whether the issue was risk behavior, incomplete output, thin sample size, or unclear setup language.
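
The fields named above could be captured in one small record stored next to the sample. The sketch below assumes a hypothetical `ReviewNote` structure; the version label, dates, and market regime in the usage example are placeholders, and only the entry and skip counts echo the worked example earlier on this page.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReviewNote:
    version: str            # prompt/persona/rule version label
    start: date             # first day of the review window
    end: date               # last day of the review window
    market_regime: str
    entries: int
    skips: int
    excluded_records: int
    failed_row: str = ""    # exact checklist row name, empty if none failed

    def summary(self) -> str:
        """One-line note a second reviewer can scan without opening the sample."""
        failed = self.failed_row or "none"
        return (
            f"{self.version} | {self.start.isoformat()} to {self.end.isoformat()} | "
            f"regime: {self.market_regime} | entries: {self.entries} | "
            f"skips: {self.skips} | excluded: {self.excluded_records} | "
            f"failed row: {failed}"
        )


# Usage: placeholder version label, dates, and regime.
note = ReviewNote(
    version="agent-v3",
    start=date(2024, 1, 1),
    end=date(2024, 1, 28),
    market_regime="range-bound",
    entries=18,
    skips=16,
    excluded_records=0,
    failed_row="Risk behavior",
)
print(note.summary())
```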

Do not turn the checklist into a performance badge. A passing checklist means the paper evidence is cleaner, not that the agent is profitable, safe, or ready for live capital. The next action should still be a paper-mode action: collect another sample, tighten one field, or compare the same version across a different review window.

AI paper trading agent evaluation checklist FAQ

What should an AI paper trading agent evaluation checklist include?

It should include rule fit, prompt version, journal quality, risk behavior, skipped trades, sample size, drawdown, market regime, and the next paper-test decision.

Can this checklist approve a live trading agent?

No. The checklist reviews simulated evidence only. Trading Boy does not execute live trades, hold funds, or provide financial advice.

When should an AI paper agent be changed?

Change one rule only when repeated paper evidence identifies a specific behavior problem, such as unclear invalidation, oversized simulated risk, late entries, or missing skip discipline.