Financial Data Collection
financial-data-collector
by daymade
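Fetches stock prices, financial statements, WACC parameters, and analyst estimates for US-listed companies from yfinance, validates them, and outputs standardized JSON with source attribution. Suited to pulling data ahead of DCF, comparable-company valuation, and financial analysis.
For fundamental analysis of US stocks, it pulls financial and market data from free public sources in one step and produces structured JSON ready for DCF and comparables work, with no manual cleanup.
Installation
claude skill add --url github.com/daymade/claude-code-skills/tree/main/financial-data-collector
Documentation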
Financial Data Collector
Collect and validate real financial data for US public companies using free data sources. Output is a standardized JSON file ready for consumption by other financial skills.
Critical Constraints
NO FALLBACK values. If a field cannot be retrieved, set it to null with _source: "missing".
Never substitute defaults (e.g., the expression beta or 1.0, which silently replaces a missing beta with 1.0). The downstream skill decides how to handle missing data.
Data source attribution is mandatory. Every data section must have a _source field.
CapEx sign convention: yfinance returns CapEx as negative (cash outflow). Preserve the original sign. Document the convention in output metadata. Do NOT flip signs.
yfinance FCF ≠ Investment bank FCF. yfinance FCF = Operating CF + CapEx, with no deduction for stock-based compensation (SBC). Flag this in output metadata so downstream DCF skills don't overstate FCF.
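The no-fallback and attribution rules above can be sketched as one small helper. This is a minimal sketch, not the skill's actual code: `clean` and `build_section` are hypothetical names, the input is assumed to be a plain dict (such as the one yfinance exposes as `Ticker.info`), and tagging a whole section as "missing" only when every field is absent is an assumption about how the rule applies at section level.

```python
import math

def clean(value):
    """Return a float, or None when the value is absent or NaN. Never a default."""
    if value is None:
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number

def build_section(raw, fields, source):
    """Build one output section (hypothetical helper).

    Fields that cannot be retrieved become None. The section is tagged
    "missing" only when every requested field is absent (an assumption);
    otherwise it carries the given source label.
    """
    section = {name: clean(raw.get(name)) for name in fields}
    has_data = any(v is not None for v in section.values())
    section["_source"] = source if has_data else "missing"
    return section

# beta is present, credit_rating is not: it stays None rather than a guess.
wacc = build_section({"beta": 1.284}, ["beta", "credit_rating"], "yfinance")
```

Keeping the None in place leaves the decision about a missing beta to the downstream skill, which is the point of the constraint.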
Workflow
Step 1: Collect Data
Run the collection script:
python scripts/collect_data.py TICKER [--years 5] [--output path/to/output.json]
The script collects from these sources, in priority order:
- yfinance — market data, historical financials, beta, analyst estimates
- yfinance ^TNX — 10Y Treasury yield as risk-free rate proxy
- User supplement — for years where yfinance returns NaN (report to user, do not guess)
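The schema stores `risk_free_rate` as a decimal (0.0396), so the ^TNX quote needs a unit conversion. A minimal sketch, assuming ^TNX is quoted in percentage points (3.96 meaning 3.96%); verify this against a live quote before relying on it. `tnx_to_rate` is a hypothetical helper name, not part of the collection script.

```python
import math

def tnx_to_rate(tnx_quote):
    """Convert a ^TNX quote to a decimal rate.

    Assumption: the quote is in percentage points (3.96 -> 0.0396).
    Returns None instead of a default when the quote is absent or NaN.
    """
    if tnx_quote is None or math.isnan(tnx_quote):
        return None
    return round(tnx_quote / 100.0, 6)

risk_free_rate = tnx_to_rate(3.96)
```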
Step 2: Validate Data
python scripts/validate_data.py path/to/output.json
Checks: field completeness, cross-field consistency (Market Cap = Price × Shares), range sanity (WACC 5-20%, beta 0.3-3.0), sign conventions.
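Three of these checks can be sketched against the output schema as follows. This is an illustration, not the contents of scripts/validate_data.py: the `validate` function, the 2% tolerance, and the sample numbers are all assumptions.

```python
def validate(data, tolerance=0.02):
    """Return a list of human-readable issues; an empty list means no check failed."""
    issues = []
    market = data.get("market_data", {})
    price = market.get("current_price")
    shares = market.get("shares_outstanding_millions")
    cap = market.get("market_cap_millions")

    # Cross-field consistency: Market Cap = Price x Shares, within a tolerance.
    if None not in (price, shares, cap):
        implied = price * shares
        if abs(implied - cap) > tolerance * cap:
            issues.append(f"market cap {cap} differs from price x shares {implied:.0f}")

    # Range sanity: beta is expected between 0.3 and 3.0.
    beta = market.get("beta_5y_monthly")
    if beta is not None and not 0.3 <= beta <= 3.0:
        issues.append(f"beta {beta} outside 0.3-3.0")

    # Sign convention: CapEx is a cash outflow, so it should not be positive.
    for year, row in data.get("cash_flow", {}).items():
        capex = row.get("capex")
        if capex is not None and capex > 0:
            issues.append(f"{year}: capex {capex} is positive")
    return issues

# Illustrative round numbers, not real company data.
sample = {
    "market_data": {"current_price": 100.0, "shares_outstanding_millions": 50,
                    "market_cap_millions": 5000, "beta_5y_monthly": 1.2},
    "cash_flow": {"2024": {"capex": -120.0}},
}
print(validate(sample))
```

Null fields are skipped here rather than reported, on the assumption that field completeness is checked separately.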
Step 3: Deliver JSON
Single file: {TICKER}_financial_data.json. Schema in references/output-schema.md.
Do NOT create: README, CSV, summary reports, or any auxiliary files.
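Writing the single output file might look like the sketch below. `write_output` is a hypothetical helper; the file name follows the {TICKER}_financial_data.json pattern above, and the strict NaN handling is a design suggestion rather than something the source specifies.

```python
import json
from pathlib import Path

def write_output(data, out_dir="."):
    """Write the collected data to {TICKER}_financial_data.json and return the path."""
    path = Path(out_dir) / f"{data['ticker']}_financial_data.json"
    with path.open("w", encoding="utf-8") as fh:
        # allow_nan=False makes json.dump raise ValueError on a stray NaN,
        # so missing values have to be explicit nulls (None) before delivery.
        json.dump(data, fh, indent=2, allow_nan=False)
    return path
```

Python's json module writes NaN as a bare `NaN` token by default, which is not valid JSON; rejecting it at write time keeps the file parseable by strict readers.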
Output Schema (Summary)
{
  "ticker": "META",
  "company_name": "Meta Platforms, Inc.",
  "data_date": "2026-03-02",
  "currency": "USD",
  "unit": "millions_usd",
  "data_sources": { "market_data": "...", "2022_to_2024": "..." },
  "market_data": { "current_price": 648.18, "shares_outstanding_millions": 2187, "market_cap_millions": 1639607, "beta_5y_monthly": 1.284 },
  "income_statement": { "2024": { "revenue": 164501, "ebit": 69380, "tax_expense": ..., "net_income": ..., "_source": "yfinance" } },
  "cash_flow": { "2024": { "operating_cash_flow": ..., "capex": -37256, "depreciation_amortization": 15498, "free_cash_flow": ..., "change_in_nwc": ..., "_source": "yfinance" } },
  "balance_sheet": { "2024": { "total_debt": 30768, "cash_and_equivalents": 77815, "net_debt": -47047, "current_assets": ..., "current_liabilities": ..., "_source": "yfinance" } },
  "wacc_inputs": { "risk_free_rate": 0.0396, "beta": 1.284, "credit_rating": null, "_source": "yfinance + ^TNX" },
  "analyst_estimates": { "revenue_next_fy": 251113, "revenue_fy_after": 295558, "eps_next_fy": 29.59, "_source": "yfinance" },
  "metadata": { "_capex_convention": "negative = cash outflow", "_fcf_note": "yfinance FCF = OperatingCF + CapEx. Does NOT deduct SBC." }
}
Full schema with all field definitions: references/output-schema.md
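In the sample balance sheet, `net_debt` equals `total_debt` minus `cash_and_equivalents`. A minimal sketch of that relationship with null propagation; whether the collection script derives the field this way is an assumption, and `net_debt` as a function name is hypothetical.

```python
def net_debt(total_debt, cash_and_equivalents):
    """Net debt = total debt - cash and equivalents; None if either input is missing."""
    if total_debt is None or cash_and_equivalents is None:
        return None
    return total_debt - cash_and_equivalents

# Values from the 2024 sample above (millions of USD).
print(net_debt(30768, 77815))  # -47047, a net cash position
```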
<correct_patterns>
Handling Missing Years
if pd.isna(revenue):
    result[year] = {"revenue": None, "_source": "yfinance returned NaN — supplement from 10-K"}
# Report missing years to the user. Do NOT skip or fill with estimates.
CapEx Sign Preservation
capex = cash_flow.loc["Capital Expenditure", year_col] # -37256.0
result["capex"] = float(capex) # Preserve negative
Datetime Column Indexing
year_col = [c for c in financials.columns if c.year == target_year][0]
revenue = financials.loc["Total Revenue", year_col]
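The list-comprehension lookup raises IndexError when no column matches the target year. A hedged variant that returns None in that case is sketched below; `find_year_column` is a hypothetical helper, and plain `datetime` objects stand in for the pandas Timestamp column labels.

```python
from datetime import datetime

def find_year_column(columns, target_year):
    """Return the first column label whose .year matches, or None if absent."""
    return next((c for c in columns if getattr(c, "year", None) == target_year), None)

# Stand-in column labels; with yfinance these would be financials.columns.
columns = [datetime(2024, 12, 31), datetime(2023, 12, 31)]
year_col = find_year_column(columns, 2023)      # datetime(2023, 12, 31)
missing_col = find_year_column(columns, 2020)   # None: report the gap, do not guess
```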
Field Name Guards
if "Total Revenue" in financials.index:
    revenue = financials.loc["Total Revenue", year_col]
elif "Revenue" in financials.index:
    revenue = financials.loc["Revenue", year_col]
else:
    revenue = None
</correct_patterns>
<common_mistakes>
Mistake 1: Default Values for Missing Data
# ❌ WRONG
beta = info.get("beta", 1.0)
growth = data.get("growth") or 0.02
# ✅ RIGHT
beta = info.get("beta") # May be None — that's OK
Mistake 2: Assuming All Years Have Data
# ❌ WRONG — 2020-2021 may be NaN
revenue = float(financials.loc["Total Revenue", year_col])
# ✅ RIGHT
value = financials.loc["Total Revenue", year_col]
revenue = float(value) if pd.notna(value) else None
Mistake 3: Using yfinance FCF in DCF Models Directly
yfinance FCF does NOT deduct SBC. For mega-caps like META, SBC can be $20-30B/yr, making yfinance FCF ~30% higher than investment-bank FCF. Always flag this in output.
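The gap between the two FCF definitions can be made explicit with a small calculation. This is a sketch under stated assumptions: `fcf_views` is a hypothetical helper, the SBC-adjusted figure is simply yfinance-style FCF minus SBC, and the inputs are illustrative round numbers rather than company data.

```python
def fcf_views(operating_cash_flow, capex, sbc):
    """Return FCF as yfinance defines it plus an SBC-adjusted variant.

    capex keeps its original negative sign, so it is added, not subtracted.
    Any missing input yields None for the figures that depend on it.
    """
    if operating_cash_flow is None or capex is None:
        return {"fcf_yfinance": None, "fcf_after_sbc": None}
    fcf = operating_cash_flow + capex
    adjusted = None if sbc is None else fcf - sbc
    return {"fcf_yfinance": fcf, "fcf_after_sbc": adjusted}

# Illustrative inputs: OCF 1000, CapEx -400, SBC 150.
views = fcf_views(1000.0, -400.0, 150.0)
# fcf_yfinance = 600.0, fcf_after_sbc = 450.0, so the unadjusted figure is a third higher.
```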
Mistake 4: Flipping CapEx Sign
# ❌ WRONG — double-negation risk downstream
capex = abs(cash_flow.loc["Capital Expenditure", year_col])
# ✅ RIGHT — preserve original, document convention
capex = float(cash_flow.loc["Capital Expenditure", year_col]) # -37256.0
</common_mistakes>
Known yfinance Pitfalls
See references/yfinance-pitfalls.md for detailed field mapping and workarounds.