Twenty-eight AI-designed trading strategies entered the backtest gauntlet. Five futures strategies exploded. The most sophisticated quant designs flopped. And the Round 2 leader is a guy who just buys cheap stocks.
Each of the twenty-eight designs was coded as a QuantConnect Python backtest, then run through a walk-forward test: an in-sample (IS) window (2012-2017) and an out-of-sample (OOS) window (2018-2023). Only the OOS results count. The strategies never saw the OOS period. Their designers never knew the dates.
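For readers who want the mechanics, here is a minimal sketch of scoring one equity curve on the two windows. Only the window dates come from the setup above. The helper names (`cagr`, `window_cagr`) and the equity values are made up for illustration, and this is not the QuantConnect code used in the competition.

```python
from datetime import date

# Window boundaries from the text; everything else here is illustrative.
IN_SAMPLE = (date(2012, 1, 1), date(2017, 12, 31))
OUT_OF_SAMPLE = (date(2018, 1, 1), date(2023, 12, 31))


def cagr(start_value, end_value, start, end):
    """Compound annual growth rate between two dated equity values."""
    years = (end - start).days / 365.25
    return (end_value / start_value) ** (1 / years) - 1


def window_cagr(equity, window):
    """CAGR over the equity points that fall inside `window`.

    `equity` is a list of (date, value) pairs sorted by date.
    """
    start, end = window
    points = [(d, v) for d, v in equity if start <= d <= end]
    (d0, v0), (d1, v1) = points[0], points[-1]
    return cagr(v0, v1, d0, d1)


# Hypothetical equity curve, not a competition result.
equity = [
    (date(2012, 1, 1), 100_000),
    (date(2017, 12, 31), 180_000),
    (date(2018, 1, 1), 180_000),
    (date(2023, 12, 31), 300_000),
]

is_cagr = window_cagr(equity, IN_SAMPLE)
oos_cagr = window_cagr(equity, OUT_OF_SAMPLE)
print(f"IS CAGR:  {is_cagr:.1%}")
print(f"OOS CAGR: {oos_cagr:.1%}")
```

The point of the split is that only `oos_cagr` goes on the scoreboard. The in-sample figure is kept purely for comparison.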
This is where theory gets punched in the mouth by data.
The Scoreboard
| Rank | Persona | Strategy | OOS CAGR | Max DD |
|---|---|---|---|---|
| 1 | Klarman | Margin of Safety Value | 11.1% | 49.8% |
| 2 | Gray | Quality-Value + Momentum Barbell | 8.8% | 29.1% |
| 3 | Zweig | Don’t Fight the Fed or Tape | 8.2% | 26.2% |
| 4 | Muller | Factor-Neutral Sector Convergence | 8.2% | 23.5% |
| 5 | Murphy | Intermarket Sector Rotation | 7.5% | 29.9% |
| 6 | Tudor Jones | 200-Day Macro Trend Defense | 7.2% | 23.5% |
| 7 | Clenow | Exponential Regression Momentum | 6.8% | 18.6% |
| 8 | Silver | Bayesian Regime Ensemble | 6.6% | 22.9% |
| 9 | Dalio | Dynamic All-Weather | 4.9% | 17.5% |
| 10 | Shiller | CAPE Proxy Rotation | 4.6% | 26.8% |
| 11 | Simons | Multi-Signal Ensemble | 4.2% | 17.4% |
| 12 | Soros | Reflexive Boom-Bust Regime | 3.7% | 15.6% |
| 13 | Greenblatt | Magic Formula Enhanced | 2.7% | 46.0% |
| 18 | Volkov | VRP Harvester | 0.1% | 32.8% |
| 19 | Carver | Futures Trend Ensemble | 0.0% | 103.3% |
| 20 | Trout | Multi-Pattern Statistical | 0.0% | 200.5% |
| 25 | Dennis | Turtle Futures Breakout | -51.7% | 100.0% |
| — | Buffett | Durable Moat Compounder | blocked | — |
Take a minute with that table.
The Value Investor Leads
Seth Klarman’s Margin of Safety Harvester tops Round 2 at 11.1% OOS CAGR.
This is the guy who wrote the literal book on buying cheap assets from forced sellers. His strategy: screen for balance-sheet quality, rank by cheapness, hold 25 positions, keep 25% cash as a reserve, sell when it’s not cheap anymore.
Nothing sexy. No regime detection. No signal ensembles. No volatility targeting. Buy cheap, own quality, sell when the price catches up.
The remarkable thing: his in-sample CAGR was 11.2%. Out-of-sample was 11.1%. Near-zero degradation across completely different market regimes - the 2012-2017 bull and the 2018-2023 chaos (COVID crash, recovery, rate shock, AI rally). That’s not luck. That’s a real edge.
The catch? 49.8% max drawdown. Concentrated value with no hedging doesn’t have airbags. The returns are real. The ride is brutal.
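The "near-zero degradation" claim is simple arithmetic on the two CAGR figures. A small sketch, where `degradation` is a name I picked for illustration:

```python
def degradation(is_cagr, oos_cagr):
    """Return the IS-to-OOS drop in percentage points and as a share of IS CAGR."""
    absolute = is_cagr - oos_cagr
    relative = absolute / is_cagr
    return absolute, relative


# Klarman's figures from the text: 11.2% in-sample, 11.1% out-of-sample.
abs_drop, rel_drop = degradation(11.2, 11.1)
print(f"{abs_drop:.1f} points, {rel_drop:.1%} of in-sample CAGR")
```

A negative result from the same function would mean the strategy did better out of sample than in sample.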
The Result Nobody Expected
Second place: Wes Gray at 8.8% with a much more palatable 29.1% drawdown.
His design was almost comically simple. Twenty-five quality-value stocks and twenty-five momentum stocks, 2% each, monthly rebalance. No timing. No overlays. Just two factors that historically zig and zag at different times.
I’d pegged him as a mid-table finisher. Value-plus-momentum is well-documented and heavily traded. But the diversification between two genuinely uncorrelated factors smoothed returns enough to keep compounding through drawdowns. Factor diversification as a strategy, not a tactic.
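The barbell described above is easy to sketch as a weight table. The ticker names are placeholders, and the handling of a stock that lands in both sleeves (it simply gets both 2% allocations) is my assumption, since the design as described does not say.

```python
def barbell_weights(quality_value, momentum, weight=0.02):
    """Equal-weight both sleeves at `weight` per position.

    Assumption: a ticker that appears in both sleeves receives both allocations.
    """
    weights = {}
    for ticker in list(quality_value) + list(momentum):
        weights[ticker] = weights.get(ticker, 0.0) + weight
    return weights


# Placeholder tickers: 25 quality-value names and 25 momentum names.
qv_sleeve = [f"QV{i:02d}" for i in range(25)]
mom_sleeve = [f"MOM{i:02d}" for i in range(25)]

weights = barbell_weights(qv_sleeve, mom_sleeve)
total = sum(weights.values())
print(f"{len(weights)} positions, {total:.0%} invested")
```

Rebuilding this table once a month from fresh rankings is the whole rebalance. There is no timing input anywhere in it.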
The Most Remarkable Number in the Set
Fourth place. Peter Muller. 8.2% OOS CAGR. His in-sample? 7.5%. Out-of-sample beat in-sample.
In 450+ strategies across my entire research program, this almost never happens. It suggests his sector mean-reversion signal - sectors that deviate from their averages tend to revert - may be a genuine inefficiency that actually strengthened during 2018-2023 as markets got more volatile.
Tightest drawdown in the top five at 23.5%. If you adjusted for risk, Muller might be the real winner.
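One rough way to check that is to divide OOS CAGR by max drawdown, a Calmar-style ratio. That choice of measure is my assumption, not an official scoring rule for the competition. The numbers are the top five rows of the scoreboard.

```python
# (persona, OOS CAGR %, max drawdown %) from the scoreboard's top five rows.
top_five = [
    ("Klarman", 11.1, 49.8),
    ("Gray", 8.8, 29.1),
    ("Zweig", 8.2, 26.2),
    ("Muller", 8.2, 23.5),
    ("Murphy", 7.5, 29.9),
]


def return_per_drawdown(cagr, max_dd):
    """CAGR earned per point of max drawdown (higher is better)."""
    return cagr / max_dd


ranked = sorted(
    top_five,
    key=lambda row: return_per_drawdown(row[1], row[2]),
    reverse=True,
)

for name, cagr, max_dd in ranked:
    print(f"{name:<8} {return_per_drawdown(cagr, max_dd):.2f}")
```

On these five rows Muller ranks first and Klarman last. Other risk measures could order them differently.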
The Futures Bloodbath
Now the crater where the elephant used to be.
| Persona | Strategy | OOS CAGR | Max DD |
|---|---|---|---|
| Carver | 12-Signal Trend Ensemble | 0.0% | 103.3% |
| Trout | Multi-Pattern Statistical | 0.0% | 200.5% |
| Dennis | Turtle Breakout | -51.7% | 100.0% |
| Hoffstein | Return Stacking | -3.4% | 68.9% |
| Simons | Multi-Signal Ensemble | 4.2% | 17.4% |
A max drawdown of 200.5% means Monroe Trout’s strategy lost about twice its capital, which only leverage makes possible. Rob Carver - whose design was the most sophisticated trend-following system in the competition, with 12 diversified signal pairs and zero fitted parameters - flatlined at 0.0% CAGR with a 103.3% drawdown. Richard Dennis, the original Turtle trader, watched his modernized system compound at -51.7% a year, hit a 100% drawdown, and never recover.
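If a drawdown above 100% sounds impossible, here is a sketch of how it happens. This uses the generic peak-to-trough definition. I am not claiming it matches QuantConnect's internal calculation, and both equity curves are invented.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline as a fraction of the running peak.

    Assumes the curve starts with positive equity.
    """
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Invented curves: the levered account goes negative, i.e. it owes money.
unlevered = [100, 120, 90, 110]
levered = [100, 120, -30, 10]

print(f"Unlevered max drawdown: {max_drawdown(unlevered):.1%}")
print(f"Levered max drawdown:   {max_drawdown(levered):.1%}")
```

An unlevered long-only account bottoms out at zero, so its drawdown caps at 100%. A levered futures account can lose more than it holds, and the drawdown figure keeps climbing past that line.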
The sole survivor? Jim Simons. 4.2% CAGR, 17.4% drawdown. Not spectacular, but he’s alive. The multi-signal ensemble - six weak predictors combined - provided enough diversification to keep leverage survivable. Everyone else’s leverage killed them.
Remember: six personas independently converged on futures trend following. It was the strongest signal in the design sprint. And it produced the worst results in the competition.
The Convergence Verdict
Futures trend following (6 converged): Spectacular failure. The most “obvious” answer was the worst answer.
Macro-regime rotation (8 converged): Mixed but positive. Zweig (8.2%), Tudor Jones (7.2%), Silver (6.6%), Dalio (4.9%) all made money. Solid mid-table.
The Round 2 leader? Value investing - one of the smaller clusters. Independent convergence signals that something is real, but it doesn’t signal that it’s optimal.
What Actually Worked
| Type | Avg OOS CAGR | Avg MaxDD |
|---|---|---|
| Fundamental/Value | 5.9% | 38.5% |
| ETF Macro/Rotation | 5.3% | 23.0% |
| Futures | 0.8% | 76.3% |
The less leveraged and more diversified the approach, the more robust it was out of sample. Klarman holds 25 stocks. Gray holds 50. The futures strategies held 5-18 contracts at 3-10x leverage. Guess which group survived.
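The leverage arithmetic behind that is worth spelling out. This is a single-period sketch that assumes static leverage and ignores costs, margin calls, and rebalancing. The function names are mine.

```python
def levered_return(underlying_return, leverage):
    """Return on account equity for a given move in the underlying."""
    return underlying_return * leverage


def wipeout_move(leverage):
    """Adverse move in the underlying that loses 100% of account equity."""
    return 1 / leverage


for leverage in (1, 3, 10):
    loss = levered_return(-0.05, leverage)
    print(
        f"{leverage:>2}x: a 5% adverse move costs {abs(loss):.0%} of equity; "
        f"a {wipeout_move(leverage):.1%} move wipes the account out"
    )
```

At the 3-10x range quoted above, an adverse move of roughly 10% to 33% is enough to erase the account in this simplified model. An unlevered stock portfolio would need a 100% move to do the same.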
The Meta-Lesson
For 45 sessions, my research kept gravitating toward macro-regime timing. Yield curve signals. VIX regimes. I couldn’t stop.
Then I ran a competition where 80 independent minds designed their best strategies, and the Round 2 leader was a value investor who doesn’t time the market at all.
Klarman doesn’t check the yield curve. He doesn’t care about the VIX. He buys cheap quality companies and waits. 100% stock selection, 0% timing.
The best macro timer in the competition (Zweig at 8.2%) was outperformed by a value investor who ignores macro entirely (Klarman at 11.1%).
The bias was real. And the competition broke it. Sort of. Because as you’ll see next - there’s a catch. A big one.
Next: I stop to ask an uncomfortable question - why was everybody so bad?
