How to evaluate paper trading results

Evaluate paper trading results by studying the quality of the process behind each simulated decision, not only the paper profit and loss. A strong review checks rule fit, risk behavior, drawdown, skipped trades, journal completeness, market context, and whether the next change is justified by evidence.

Paper-first safety frame

Trading Boy does not execute live trades, hold funds, or provide financial advice. Use this page to evaluate simulated practice results, improve a repeatable paper-trading process, and decide what to test next before any real-money decision is considered outside Trading Boy.

Paper trading results evaluation rubric

The right question is not "Did this sample make paper money?" The right question is "Can I explain why these simulated results happened, what behavior created them, and what should change next?" Use the rubric below after every paper sample, whether you trade manually, use the AI paper trading agent, or combine alerts with a journal.

Dimension	What to review	Pass signal	Weak signal
Sample size	Count entries, exits, skips, and market days covered.	The review separates repeated behavior from one-off noise.	A rule is rewritten after one exciting paper win or loss.
Rule fit	Compare each decision with the written setup, invalidation, and skip rules from AI trading agent rules or your manual plan.	Each paper entry has a clear reason that existed before the outcome.	The journal explains the trade after the result is known.
Risk behavior	Check planned risk, position size, exposure, correlation, and stop placement against risk controls.	Simulated losses stay inside the written limits.	Paper profit comes from breaking size or exposure boundaries.
Drawdown	Review maximum paper drawdown, losing streaks, and recovery behavior.	The process includes pause rules and review triggers.	The strategy keeps firing while evidence deteriorates.
Skipped trades	Read the no-trade records, missed setups, and invalidated setups.	Skips explain discipline and protect against overtrading.	Only entries are logged, so restraint cannot be evaluated.
Journal quality	Check whether the paper trading journal template captures thesis, trigger, invalidation, exit reason, result, and review note.	Another reviewer can understand the decision without guessing.	The record contains screenshots or emotions but no structured evidence.
Next action	Decide whether to continue, tighten one rule, reduce simulated risk, change the prompt, or stop the test.	The next step targets one repeated behavior.	The entire workflow is rewritten after a small sample.

Start with the result, then audit the path

Paper trading results should be read backward from outcome to process. The outcome tells you where to look. The process tells you whether the result deserves trust.

Begin with a plain summary: total paper trades, wins, losses, scratches, skipped trades, maximum paper drawdown, average planned risk, and the date range. Then add market context. A strategy tested only during a strong trend should not be treated the same as a strategy tested through chop, news, and failed breakouts. A calm period can hide weak risk settings, while a volatile period can make a reasonable idea look unstable.
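The plain summary described above can be produced with a few lines of code. The sketch below is only an illustration: the `PaperTrade` record, its field names, and the `summarize` helper are assumptions made for this example, not a Trading Boy format. Drawdown is measured here as the largest peak-to-trough drop in cumulative paper P&L.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PaperTrade:
    # Hypothetical journal record; field names are assumptions for this sketch.
    closed_on: date
    pnl: float           # simulated profit or loss
    planned_risk: float  # risk written down before the entry


def max_drawdown(pnls):
    """Largest peak-to-trough drop of the cumulative paper P&L."""
    equity = peak = worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def summarize(trades, skipped):
    """Build the plain summary for one paper sample."""
    trades = sorted(trades, key=lambda t: t.closed_on)
    pnls = [t.pnl for t in trades]
    return {
        "trades": len(trades),
        "wins": sum(p > 0 for p in pnls),
        "losses": sum(p < 0 for p in pnls),
        "scratches": sum(p == 0 for p in pnls),
        "skipped": skipped,
        "max_drawdown": max_drawdown(pnls),
        "avg_planned_risk": (
            sum(t.planned_risk for t in trades) / len(trades) if trades else 0.0
        ),
        "date_range": (
            (trades[0].closed_on, trades[-1].closed_on) if trades else None
        ),
    }
```

Market context is not something this summary can compute. It still has to be written next to the numbers by the reviewer.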

After the summary, inspect the decisions. For every paper entry, ask whether the setup was defined before the trade. If the thesis, entry trigger, stop, and invalidation were filled out in a pre-trade checklist, the result is easier to evaluate. If the trade was explained only after it won, the sample is less reliable. The same logic applies to exits. Use a post-trade review template so the close of the trade becomes evidence instead of a memory.

Do not ignore skipped trades. Skips are often the best proof that a paper strategy has boundaries. If a paper trader or agent records why it did nothing, you can see whether the rules prevent low-quality setups. If the journal only records entries, the review will overstate activity and understate discipline.

Metrics that matter more than win rate

Win rate is easy to read and easy to overvalue. A 70 percent win rate with tiny wins and large losses can be weaker than a 42 percent win rate with controlled risk and strong payoff. Paper return has the same problem. It can be distorted by one outsized simulated trade, an unrealistic fill assumption, or a position size that the rules never allowed.
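A short expectancy calculation makes the win-rate comparison concrete. The win rates come from the paragraph above. The average win and loss sizes, expressed in multiples of planned risk (R), are assumptions chosen only to illustrate "tiny wins and large losses" against "controlled risk and strong payoff".

```python
def expectancy_r(win_rate, avg_win_r, avg_loss_r):
    """Average result per trade in R: win_rate * avg_win - (1 - win_rate) * avg_loss."""
    return win_rate * avg_win_r - (1.0 - win_rate) * avg_loss_r


# Assumed payoffs: 70% winners earn 0.5R on average, losers cost 2R.
high_win_rate = expectancy_r(0.70, avg_win_r=0.5, avg_loss_r=2.0)

# Assumed payoffs: 42% winners earn 2R on average, losers cost 1R.
low_win_rate = expectancy_r(0.42, avg_win_r=2.0, avg_loss_r=1.0)
```

Under these assumed payoffs the 70 percent sample loses about 0.25R per trade, while the 42 percent sample gains about 0.26R per trade. Different payoff assumptions would change the numbers, which is the point: win rate alone does not settle the question.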

Track metrics that describe behavior. Rule-fit pass rate shows how often the strategy did what it said it would do. Average planned risk shows whether the sample stayed consistent. Maximum paper drawdown shows the stress level a trader would have had to tolerate. Skip rate shows whether the trader is selective. Review completion rate shows whether enough evidence exists for a serious decision. Those metrics connect naturally to the trading feedback loop, because they point to the part of the process that needs work.
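The behavior metrics above can be computed from the journal itself. This is a minimal sketch, assuming each decision is a dictionary with the hypothetical keys `kind`, `followed_rules`, and `reviewed`. The skip-rate definition used here (skips divided by entries plus skips) is also an assumption, since the text does not fix a formula.

```python
def behavior_metrics(decisions):
    """Compute process metrics from a list of journal records (dicts)."""
    entries = [d for d in decisions if d["kind"] == "entry"]
    skips = [d for d in decisions if d["kind"] == "skip"]

    def rate(part, whole):
        # Avoid dividing by zero when a sample has no records of a kind.
        return part / whole if whole else 0.0

    return {
        # Share of entries that matched the written setup and risk rules.
        "rule_fit_pass_rate": rate(
            sum(d["followed_rules"] for d in entries), len(entries)
        ),
        # Share of entry-or-skip decisions where the trader stood aside.
        "skip_rate": rate(len(skips), len(entries) + len(skips)),
        # Share of all records that have a completed review note.
        "review_completion_rate": rate(
            sum(d["reviewed"] for d in decisions), len(decisions)
        ),
    }
```

A low review completion rate is worth reading first, because it limits how much the other numbers can be trusted.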

If an AI workflow is involved, add prompt version, persona version, rule version, and output format to the review. The AI trading agent prompt template and trading agent persona template can make those versions easier to compare. Without version labels, a paper result is hard to reproduce and even harder to improve.
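One way to keep version labels usable is to group journal records by them before comparing results. The sketch below assumes each record is a dictionary carrying the four labels named above under hypothetical key names. It is not a Trading Boy export format.

```python
from collections import defaultdict

# Hypothetical key names for the four labels described in the text.
VERSION_KEYS = ("prompt_version", "persona_version", "rule_version", "output_format")


def group_by_version(decisions):
    """Group journal records so each AI configuration is reviewed separately."""
    groups = defaultdict(list)
    for record in decisions:
        key = tuple(record[k] for k in VERSION_KEYS)
        groups[key].append(record)
    return dict(groups)
```

Records that are missing a label raise a `KeyError` here on purpose, so an unlabeled result is noticed instead of being silently mixed into another version.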

Example evaluation

Sample: A trader reviews 34 simulated crypto paper-trading decisions over 18 sessions. The journal shows 16 entries, 10 skips, 5 exits from prior entries, and 3 missed-trade notes. Paper return is positive, but the largest winning entry accounts for most of the result.

Process review: The entries mostly followed the setup rule, but four trades used a wider stop than the written plan allowed. The skipped trades were useful because they showed that the strategy avoided several late breakouts. The missed-trade notes showed a repeated hesitation after valid retests.

Rubric score: Sample size is acceptable for an early review. Rule fit is mixed. Risk behavior needs work because the stop distance changed without a written reason. Journal quality passes because the thesis, invalidation, exit reason, and review note are complete.

Decision: The trader does not promote the strategy or abandon it. The next sample keeps the same setup rule, lowers the maximum planned paper risk, and adds a required explanation when stop distance changes. The next review compares only that behavior.
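The example notes that one winning entry accounts for most of the paper return. A quick concentration check can make that visible in any sample. The helper below is a sketch with a hypothetical name, and any P&L figures passed to it are the reviewer's own data. The example above does not give per-trade numbers.

```python
def largest_win_share(pnls):
    """Share of total paper profit that comes from the single best trade.

    Returns None when the sample is not net positive or has no winner,
    because the share is not meaningful in that case.
    """
    total = sum(pnls)
    best = max(pnls, default=0.0)
    if total <= 0 or best <= 0:
        return None
    return best / total
```

A share near or above 1.0 means the rest of the sample roughly broke even or lost money, so the result depends on one trade and deserves a larger sample before any rule is changed.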

Change rules slowly

A paper result should usually create one focused change, not a complete rewrite. If early entries are the problem, tighten confirmation. If drawdown is the problem, revisit paper size and pause rules with the risk review workflow. If journaling is the problem, fix the fields before judging strategy quality.

Separate strategy from execution

A good idea can look bad when execution is undisciplined, and a weak idea can look good during a favorable market. Compare results with a paper trading benchmark so the review separates market context, strategy edge, and process behavior.

Build a repeatable review workflow

A repeatable workflow makes paper trading results easier to trust because every sample is reviewed the same way. It also keeps the trader from changing the rules every time the outcome feels good or bad.

Define the test before the sample starts. Write the setup, invalidation, entry trigger, skip rule, position-sizing rule, and review date before the first simulated trade.
Record every decision. Include entries, exits, skipped trades, missed trades, and paused sessions. A missing record is a missing part of the evaluation.
Review batches, not moods. Use a fixed sample window so one strong paper win or one frustrating loss does not control the decision.
Tag repeated errors. Common tags include late entry, early entry, poor stop placement, oversized simulated risk, unclear thesis, missing exit plan, FOMO, revenge, and ignored news risk.
Choose one next action. Continue collecting evidence, tighten one rule, reduce paper risk, update the prompt, pause the setup, or retire the test.
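The tagging and single-next-action steps above can be sketched with a simple counter. The tag names come from the list in the workflow. The `min_repeats` threshold of 3 is an assumption for illustration, not a recommended value.

```python
from collections import Counter


def most_repeated_error(tagged_trades, min_repeats=3):
    """Return the most frequent error tag if it repeats often enough, else None.

    tagged_trades is a list of tag lists, one list per reviewed decision.
    """
    counts = Counter(tag for tags in tagged_trades for tag in tags)
    if not counts:
        return None
    tag, n = counts.most_common(1)[0]
    return tag if n >= min_repeats else None


# Illustrative batch: each inner list holds the tags from one review.
batch = [
    ["late entry"],
    ["late entry", "FOMO"],
    [],
    ["poor stop placement"],
    ["late entry"],
]
focus = most_repeated_error(batch)
```

Returning `None` when no tag repeats often enough corresponds to the "continue collecting evidence" option: the batch does not yet justify a rule change.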

Where this fits in the paper trading system

Evaluation is the bridge between practice and improvement. Start from the paper trading hub, define the decision process with a rules page or checklist, record the sample in a journal, then use this page to decide what the evidence means. If you are comparing automated behavior, pair this guide with AI paper trading agent evaluation. If you are comparing strategy versions, use the benchmark review worksheet to keep each version accountable to the same review fields.

The final output should be short and concrete: what happened, why it happened, what evidence supports the conclusion, and what changes before the next paper sample. That discipline matters because paper trading is only useful when the review improves the next test.

Paper trading results FAQ

What is the best way to evaluate paper trading results?

Evaluate paper trading results by checking sample size, rule fit, planned risk, drawdown, skipped trades, journal completeness, and whether the next action follows from repeated evidence.

Is paper trading profit enough to judge a strategy?

No. Paper profit can be useful, but it is not enough by itself. A profitable sample can still include late entries, oversized simulated positions, missing invalidation, or weak journaling.

How many paper trades should I review before changing rules?

Use enough paper trades to see repeated behavior, then change one rule at a time. If the sample is small or market conditions are unusual, keep collecting evidence before rewriting the strategy.

Does Trading Boy place live trades from paper trading results?

No. Trading Boy does not execute live trades, hold funds, or provide financial advice. The workflow is for simulated practice, structured review, and paper-first learning.