Web Scraping
firecrawl
by antonia-sz
Web scraping and content extraction using Firecrawl API. Use when users need to crawl websites, extract structured data, convert web pages to markdown, scrape multiple URLs, or build knowledge bases from web content. Supports single page extraction, site-wide crawling, batch processing, and structured data extraction with CSS selectors.
Installation
claude skill add --url https://github.com/openclaw/skills
Documentation
Firecrawl Skill
Powerful web scraping powered by Firecrawl - turn websites into LLM-ready markdown.
Overview
Firecrawl provides APIs for:
- Scrape - Single page extraction to markdown
- Crawl - Entire site crawling with depth control
- Map - URL discovery from a starting point
- Batch - Multiple URL processing
- Extract - Structured data extraction with schemas
Prerequisites
- Firecrawl API key - a free tier is available at https://firecrawl.dev
- Install Python dependencies:
requests
Configuration
Set environment variable:
export FIRECRAWL_API_KEY="fc-your-api-key"
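On the Python side, the key exported above could be picked up roughly as follows. This is a minimal sketch, not the skill's actual code: the helper name `build_auth_headers` is hypothetical, and sending the key as a bearer token in the `Authorization` header is an assumption about how the script authenticates.

```python
import os


def build_auth_headers(env=None):
    """Build HTTP headers that carry the Firecrawl API key.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    api_key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set")
    return {
        # Assumption: the key is sent as a bearer token.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing fast when the variable is missing gives a clearer error than an authentication failure returned later by the API.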
Usage
Single Page Scraping
# Basic scrape
firecrawl scrape https://example.com
# With specific options
firecrawl scrape https://example.com --formats markdown,html --only-main-content
# Wait for JS rendering
firecrawl scrape https://spa-app.com --wait-for 2000
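The scrape options above plausibly map onto a JSON request body. The sketch below shows one way that mapping could look; the endpoint URL and the field names `onlyMainContent` and `waitFor` are assumptions about the Firecrawl HTTP API and should be checked against its reference, and `build_scrape_payload` is a hypothetical helper.

```python
# Assumed endpoint; verify against the Firecrawl API reference.
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_payload(url, formats=("markdown",),
                         only_main_content=False, wait_for_ms=None):
    """Translate CLI-style scrape options into a JSON-serializable dict."""
    payload = {"url": url, "formats": list(formats)}
    if only_main_content:
        # Mirrors --only-main-content (field name is an assumption).
        payload["onlyMainContent"] = True
    if wait_for_ms is not None:
        # Mirrors --wait-for, in milliseconds (field name is an assumption).
        payload["waitFor"] = int(wait_for_ms)
    return payload


# Sending it with the `requests` dependency could look like this (not run here):
#   requests.post(SCRAPE_ENDPOINT, json=payload, headers=headers, timeout=60)
```

For example, `build_scrape_payload("https://spa-app.com", wait_for_ms=2000)` corresponds to the JS-rendering command above.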
Site Crawling
# Crawl entire site (up to limit)
firecrawl crawl https://docs.example.com --limit 50
# With depth control
firecrawl crawl https://blog.example.com --max-depth 2 --limit 100
# Include/exclude patterns
firecrawl crawl https://site.com --include "/blog/*" --exclude "/admin/*"
# Custom formats
firecrawl crawl https://docs.example.com --formats markdown,links
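To make the include/exclude behaviour concrete, here is a small local sketch of how such patterns could filter discovered URLs. It assumes glob-style matching on the URL path, based only on the `/blog/*` syntax shown above; the service may interpret patterns differently (for example as regular expressions), and `is_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_allowed(url, include=(), exclude=()):
    """Return True if the URL's path passes the include/exclude patterns.

    Exclude patterns win over include patterns. With no include patterns,
    every path that is not excluded is allowed.
    """
    path = urlparse(url).path or "/"
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    if include:
        return any(fnmatchcase(path, pattern) for pattern in include)
    return True
```

With `include=["/blog/*"]` and `exclude=["/admin/*"]`, a URL such as `https://site.com/blog/post-1` is kept, while `https://site.com/admin/users` and `https://site.com/about` are skipped.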
URL Mapping
# Discover all URLs from a site
firecrawl map https://example.com
# With search term
firecrawl map https://docs.python.org --search "tutorial"
Batch Processing
# Scrape multiple URLs
firecrawl batch urls.txt --output ./scraped/
# From JSON list
firecrawl batch urls.json --formats markdown --concurrency 5
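The input files for batch mode are not specified above. The sketch below assumes `urls.txt` holds one URL per line and `urls.json` holds a JSON array of strings, and it treats concurrency as simple fixed-size batches; both helpers (`load_urls`, `chunked`) are hypothetical.

```python
import json
from pathlib import Path


def load_urls(path):
    """Load a de-duplicated URL list from a .txt or .json file.

    Assumed formats: one URL per line for text files (blank lines and
    lines starting with '#' are skipped), a JSON array for .json files.
    """
    text = Path(path).read_text(encoding="utf-8")
    if str(path).endswith(".json"):
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of URLs")
        candidates = [str(item) for item in data]
    else:
        candidates = text.splitlines()

    seen, urls = set(), []
    for candidate in candidates:
        url = candidate.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Seven URLs with a batch size of 5 would be processed as one batch of five followed by one batch of two.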
Structured Extraction
# Extract specific data using CSS selectors
firecrawl extract https://example.com/products \
--schema '{"name": ".product-title", "price": ".price", "description": ".desc"}'
# Extract using a schema stored in a JSON file
firecrawl extract https://news.example.com/article --schema article-schema.json
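The two commands above pass `--schema` either as inline JSON or as a file name. One way a script could tell the two apart is sketched below; the rule "starts with `{` means inline JSON" and the helper name `load_schema` are assumptions, not documented behaviour.

```python
import json
from pathlib import Path


def load_schema(arg):
    """Resolve a --schema argument to a dict.

    Assumption: a value starting with '{' is inline JSON; anything else
    is treated as a path to a JSON file.
    """
    text = arg.strip()
    if text.startswith("{"):
        schema = json.loads(text)
    else:
        schema = json.loads(Path(text).read_text(encoding="utf-8"))
    if not isinstance(schema, dict):
        raise ValueError("schema must be a JSON object")
    return schema
```

For the inline example above, the result is a plain mapping from field names to CSS selectors, such as `{"name": ".product-title", "price": ".price", "description": ".desc"}`.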
Output Formats
Markdown
Clean, LLM-ready markdown with:
- Headings preserved
- Links converted to markdown format
- Images with alt text
- Tables formatted as markdown tables
HTML
Raw or cleaned HTML
Links
Extracted link lists for further crawling
Screenshot
Page screenshot (if requested)
Use Cases
Knowledge Base Building
# Crawl documentation site
firecrawl crawl https://docs.framework.com --limit 200 -o ./kb/
# Merge into single file for RAG
cat ./kb/*.md > knowledge-base.md
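Plain `cat` joins pages with no boundary between them, which can make retrieval chunks straddle two documents. A possible Python alternative is sketched below; the `merge_markdown` helper, the HTML comment recording the source file, and the `---` separator are illustrative choices, not part of the skill.

```python
from pathlib import Path


def merge_markdown(src_dir, out_file):
    """Merge all *.md files in src_dir into out_file and return the file count.

    Files are merged in sorted order. Each page is prefixed with an HTML
    comment naming its source file, and pages are separated by a
    horizontal rule.
    """
    src, out = Path(src_dir), Path(out_file)
    # Skip the output file itself in case it lives inside src_dir.
    files = sorted(p for p in src.glob("*.md") if p.resolve() != out.resolve())
    parts = []
    for path in files:
        body = path.read_text(encoding="utf-8").strip()
        parts.append(f"<!-- source: {path.name} -->\n\n{body}\n")
    out.write_text("\n---\n\n".join(parts), encoding="utf-8")
    return len(files)
```

Calling `merge_markdown("./kb", "knowledge-base.md")` would produce the same combined file as the `cat` command, with page boundaries preserved.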
Research & Analysis
# Scrape competitor pricing
firecrawl batch competitors.txt --extract pricing-schema.json
# Monitor blog updates
firecrawl map https://blog.company.com --since 2024-01-01
Content Migration
# Export old CMS content
firecrawl crawl https://old-site.com --formats markdown,html -o ./export/
Scripts
All functionality is provided by scripts/firecrawl.py, which covers:
- API authentication
- Automatic rate limiting
- Retry logic for failures
- Progress tracking for large crawls
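The retry behaviour listed above is not described in detail. A common approach is exponential backoff, sketched below under stated assumptions: the helper `with_retries` is hypothetical, catching every `Exception` is a simplification, and the actual script may use a different strategy.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry on failure with exponential backoff.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised. `sleep` is injectable so the backoff
    can be observed in tests without actually waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

In practice the wrapped callable would be a single API request, and only transient errors (timeouts, rate-limit responses) would usually be worth retrying.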
Integration
Works well with:
- markdown-sync-pro - Sync scraped content to Notion/GitHub
- arxiv-paper - Combine with academic paper downloads
- maybe-finance - Scrape financial data for analysis