einstein-research-backtest
by DaVinci
'Expert guidance for systematic backtesting of trading strategies. Use
Installation
claude skill add --url github.com/openclaw/skills/tree/main/skills/clawdiri-ai/einstein-research-backtest-dv
Documentation
Systematic Backtesting Methodology
This skill provides expert guidance for the rigorous, systematic backtesting of quantitative trading strategies. It ensures that strategies are robust, statistically sound, and free from common biases before any consideration of live deployment. This is the methodology guide; for the programmatic backtesting engine, see the einstein-research-backtest-engine skill.
Core Principle: "Beat the Idea to Death"
A single backtest with good results is meaningless. The goal is not to find one set of parameters that worked in the past, but to prove that a strategy has a persistent edge across a wide range of market conditions and parameter variations.
When to Use This Skill
- User asks how to backtest a trading idea.
- User presents a backtest result and asks for interpretation or next steps.
- User wants to know if their strategy is robust or overfit.
- User is developing a systematic or quantitative trading strategy.
- Triggers: "backtest", "strategy validation", "robustness testing", "overfitting", "systematic trading".
The 7 Stages of Systematic Backtesting
Stage 1: Hypothesis Definition
- Action: Clearly define the strategy's logic, the underlying inefficiency it exploits, and the expected behavior.
- Example: "Hypothesis: Stocks that gap down on high volume but close in the upper 50% of their daily range tend to mean-revert over the next 1-3 days."
- Output: A clear, one-sentence hypothesis.
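A hypothesis is easier to test once it is written as an unambiguous rule. The sketch below encodes the example hypothesis as a predicate over one daily bar. The `Bar` type, the function name `is_setup`, the 2% minimum gap, and the 1.5x volume multiple are all illustrative assumptions; the original example does not quantify "gap down" or "high volume".

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """One daily OHLCV bar (hypothetical container for this sketch)."""
    open: float
    high: float
    low: float
    close: float
    volume: float


def is_setup(bar, prev_close, avg_volume, min_gap=0.02, volume_multiple=1.5):
    """True if the bar gapped down on high volume and closed in the upper half of its range.

    min_gap and volume_multiple are assumed thresholds, not values from the source.
    """
    gapped_down = bar.open <= prev_close * (1.0 - min_gap)
    high_volume = bar.volume >= volume_multiple * avg_volume
    day_range = bar.high - bar.low
    # Position of the close within the day's range: 0.0 = at the low, 1.0 = at the high.
    closed_strong = day_range > 0 and (bar.close - bar.low) / day_range >= 0.5
    return gapped_down and high_volume and closed_strong
```

Writing the rule down this way forces every vague term in the hypothesis to become an explicit, testable parameter.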
Stage 2: Initial Backtest
- Action: Run a single backtest using a baseline set of parameters on in-sample data.
- Goal: Sanity check. Does the idea show any promise at all?
- Tool: einstein-research-backtest-engine
- Output: Initial performance metrics (Sharpe, Max Drawdown, CAGR).
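For reference, the three metrics named above can be computed from a series of per-period returns roughly as follows. This is a minimal sketch, not the engine's implementation: the function name `performance_metrics` is hypothetical, the risk-free rate is assumed to be zero, and 252 trading periods per year is an assumed default.

```python
import math
from statistics import mean, stdev


def performance_metrics(period_returns, periods_per_year=252):
    """Annualized Sharpe (risk-free rate assumed 0), max drawdown, and CAGR.

    Expects at least two returns, each greater than -1.0 (no total loss).
    """
    equity = 1.0
    peak = 1.0
    max_drawdown = 0.0
    for r in period_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        # Drawdown is the fractional decline from the running peak.
        max_drawdown = max(max_drawdown, 1.0 - equity / peak)

    years = len(period_returns) / periods_per_year
    cagr = equity ** (1.0 / years) - 1.0

    vol = stdev(period_returns)
    if vol > 0:
        sharpe = mean(period_returns) / vol * math.sqrt(periods_per_year)
    else:
        sharpe = float("nan")
    return {"sharpe": sharpe, "max_drawdown": max_drawdown, "cagr": cagr}
```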
Stage 3: Parameter Robustness Testing
- Action: Vary the strategy's key parameters across a logical range.
- Example: For a moving average crossover, test 20/50, 25/60, 15/45, etc.
- Goal: Check for a "plateau" of profitability. A good strategy works across a range of parameters, not just one magic number. A single peak is a major red flag for overfitting.
- Output: A heatmap or table showing performance across parameter variations.
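One way to sketch the parameter sweep and a crude plateau check is shown below. `run_backtest` is a hypothetical callable that takes a fast and a slow window and returns one performance number (for example a Sharpe ratio); it stands in for whatever engine is actually used. Averaging a cell with its grid neighbors is one simple heuristic for telling a plateau from an isolated peak, not a standard definition.

```python
from itertools import product


def sweep(run_backtest, fast_windows, slow_windows):
    """Evaluate every (fast, slow) pair with fast < slow; returns {(fast, slow): metric}."""
    return {
        (f, s): run_backtest(f, s)
        for f, s in product(fast_windows, slow_windows)
        if f < s
    }


def neighborhood_mean(results, fast_windows, slow_windows, fast, slow):
    """Mean metric over a grid cell and its adjacent cells.

    An isolated peak scores low here because its neighbors drag the mean down,
    while a cell on a plateau keeps a score close to its own value.
    """
    fi, si = fast_windows.index(fast), slow_windows.index(slow)
    values = []
    for df, ds in product((-1, 0, 1), repeat=2):
        i, j = fi + df, si + ds
        if 0 <= i < len(fast_windows) and 0 <= j < len(slow_windows):
            key = (fast_windows[i], slow_windows[j])
            if key in results:
                values.append(results[key])
    return sum(values) / len(values)
```

The `results` dictionary maps directly onto the heatmap or table described above: one cell per parameter pair.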
Stage 4: Out-of-Sample (OOS) Testing
- Action: Test the best parameter plateau from Stage 3 on a separate, unseen dataset (e.g., a different time period).
- Goal: Verify that the strategy's edge is not specific to the in-sample data.
- Rule: If performance degrades significantly (>30%) on OOS data, the strategy is likely overfit. Go back to Stage 1.
- Output: Comparison of In-Sample vs. Out-of-Sample performance metrics.
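The 30% rule can be expressed as a simple relative-degradation check. The sketch below assumes the comparison is made on a single positive metric where higher is better, such as the Sharpe ratio; the source does not say which metric the threshold applies to, and the function names are hypothetical.

```python
def oos_degradation(in_sample, out_of_sample):
    """Fractional drop of a higher-is-better metric from in-sample to out-of-sample."""
    if in_sample <= 0:
        raise ValueError("in-sample metric must be positive to measure degradation")
    return (in_sample - out_of_sample) / in_sample


def likely_overfit(in_sample, out_of_sample, threshold=0.30):
    """Apply the >30% degradation rule; the threshold default mirrors the rule above."""
    return oos_degradation(in_sample, out_of_sample) > threshold
```

For example, an in-sample Sharpe of 1.5 falling to 0.9 out of sample is a 40% drop and would be flagged, while a fall to 1.2 (a 20% drop) would not.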
Stage 5: Monte Carlo Simulation
- Action: Resample the trade history thousands of times to simulate different possible sequences of returns.
- Goal: Stress-test the strategy's path dependency and assess the probability of hitting a certain drawdown.
- Example: "What is the probability of a >30% drawdown over a 5-year period?"
- Output: Distribution of potential outcomes, probability of ruin, expected max drawdown.
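A minimal sketch of this resampling step, using a bootstrap with replacement over per-trade returns, is given below. It assumes trades are independent of one another, which real trade sequences may violate; the function names, the default of 5,000 paths, and the fixed seed are illustrative choices.

```python
import random


def max_drawdown(trade_returns):
    """Largest fractional decline from a running equity peak."""
    equity = peak = 1.0
    worst = 0.0
    for r in trade_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def drawdown_probability(trade_returns, threshold=0.30, n_paths=5000, seed=0):
    """Estimate P(max drawdown > threshold) by bootstrapping the trade history."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    n = len(trade_returns)
    hits = 0
    for _ in range(n_paths):
        # Draw n trades with replacement to form one alternative history.
        path = rng.choices(trade_returns, k=n)
        if max_drawdown(path) > threshold:
            hits += 1
    return hits / n_paths
```

To answer a question framed over a fixed horizon such as five years, `k` would be set to the expected number of trades in that horizon rather than to the length of the history.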
Stage 6: Slippage and Commission Modeling
- Action: Re-run the backtest with realistic transaction costs (e.g., 0.05% per trade for slippage + commissions).
- Goal: Ensure the strategy's edge is not consumed by trading friction. High-frequency strategies are particularly sensitive to this.
- Rule: If the strategy is not profitable after costs, it has no real-world edge.
- Output: Net performance metrics after costs.
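The effect of friction can be illustrated by deducting a flat cost from each trade's gross return. The sketch below treats the 0.05% figure as a single all-in charge per completed trade, which is an assumption; a real cost model may charge per side and scale slippage with order size and liquidity. The function names are hypothetical.

```python
def net_returns(gross_trade_returns, cost_per_trade=0.0005):
    """Subtract a flat all-in cost (0.05% by default) from each trade's gross return."""
    return [r - cost_per_trade for r in gross_trade_returns]


def total_return(returns):
    """Compound a sequence of per-trade returns into one total return."""
    equity = 1.0
    for r in returns:
        equity *= 1.0 + r
    return equity - 1.0


# A small gross edge of 0.04% per trade is profitable before costs
# but loses money once the 0.05% cost is applied to every trade.
gross = [0.0004] * 1000
gross_total = total_return(gross)
net_total = total_return(net_returns(gross))
```

This is why strategies with many small trades are the most sensitive to cost assumptions: the cost is paid on every trade, while the edge per trade may be of the same order of magnitude.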
Stage 7: Walk-Forward Optimization
- Action: A more advanced form of OOS testing. Optimize parameters on a rolling window of data, then test on the subsequent window.
- Example: Optimize on 2020-2022 data, test on 2023. Then, optimize on 2021-2023 data, test on 2024.
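- Goal: Simulate how the strategy would have been re-optimized and traded in real time. Because every test window lies strictly after the data used to choose its parameters, this is the gold standard for keeping parameter selection free of lookahead bias.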
- Output: A series of OOS performance reports, stitched together to form an equity curve.
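The rolling window schedule from the example can be generated as follows. This sketch only produces the train and test year ranges; optimizing on each training window and running the test window are left to the backtesting engine. The function name and the whole-year granularity are assumptions made for illustration.

```python
def walk_forward_windows(first_year, last_year, train_years=3, test_years=1):
    """Yield ((train_start, train_end), (test_start, test_end)) with inclusive years.

    The window advances by test_years each step, so test windows never overlap
    and each one starts immediately after its training window ends.
    """
    start = first_year
    while start + train_years + test_years - 1 <= last_year:
        train = (start, start + train_years - 1)
        test = (train[1] + 1, train[1] + test_years)
        yield train, test
        start += test_years


schedule = list(walk_forward_windows(2020, 2024))
# schedule == [((2020, 2022), (2023, 2023)), ((2021, 2023), (2024, 2024))]
```

Concatenating the results of the test windows in order gives the stitched out-of-sample equity curve.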
Common Biases to Avoid
- Lookahead Bias: Using information that would not have been available at the time of the trade (e.g., using a day's closing price to make a decision at that same day's open).
- Survivorship Bias: Using a dataset that excludes companies that have gone bankrupt or been delisted. Always use a high-quality, survivorship-bias-free dataset.
- Overfitting (Curve-Fitting): Finding a complex set of rules and parameters that perfectly fits historical data but fails on new data. Parameter robustness testing is the primary defense.
- Data Snooping: Repeatedly testing different hypotheses on the same dataset until one looks good by random chance.
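Lookahead bias in particular often enters through a one-bar alignment mistake: a signal computed from bar t's close is multiplied by bar t's own return. A common guard is to lag the signal before applying it, as in the sketch below. The function names and the convention of 1 for long and 0 for flat are assumptions for illustration.

```python
def lag_signal(signals, lag=1):
    """Delay signals so that a signal computed on bar t is first traded on bar t + lag."""
    if lag <= 0:
        return list(signals)
    # Pad the front with flat positions and drop the last `lag` signals.
    lagged = [0] * min(lag, len(signals)) + list(signals[:-lag])
    return lagged[:len(signals)]


def strategy_returns(signals, bar_returns):
    """Per-bar strategy return: the position held times that bar's return."""
    return [s * r for s, r in zip(signals, bar_returns)]


signals = [1, 0, 1, 1]              # computed from each bar's close
bar_returns = [0.02, -0.01, 0.03, -0.02]
biased = strategy_returns(signals, bar_returns)               # uses same-bar information
honest = strategy_returns(lag_signal(signals), bar_returns)   # trades one bar later
```

If a backtest's results change sharply when the signal is lagged by one bar, the unlagged version was likely relying on information that was not available at trade time.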
Final Assessment
A strategy is considered potentially viable for live trading only if it passes all 7 stages:
- Clear hypothesis.
- Shows initial promise.
- Profitable across a plateau of parameters.
- Performs well on out-of-sample data.
- Survives Monte Carlo stress tests.
- Profitable after costs.
- Generates a positive walk-forward equity curve.
If a strategy fails at any stage, it is considered invalid, and the process should restart from Stage 1 with a new or revised hypothesis.