Web Scraping

firecrawl

by antonia-sz

Web scraping and content extraction using the Firecrawl API. Use when users need to crawl websites, extract structured data, convert web pages to markdown, scrape multiple URLs, or build knowledge bases from web content. Supports single-page extraction, site-wide crawling, batch processing, and structured data extraction with CSS selectors.

4.5k · Coding & Debugging · Not scanned · April 20, 2026

Installation

claude skill add --url https://github.com/openclaw/skills

Documentation

Firecrawl Skill

Web scraping powered by Firecrawl: turn websites into LLM-ready markdown.

Overview

Firecrawl provides APIs for:

  • Scrape - Single page extraction to markdown
  • Crawl - Entire site crawling with depth control
  • Map - URL discovery from a starting point
  • Batch - Multiple URL processing
  • Extract - Structured data extraction with schemas

Prerequisites

  1. Firecrawl API key - a free tier is available at https://firecrawl.dev
  2. Python dependency: requests (install with pip install requests)

Configuration

Set environment variable:

bash
export FIRECRAWL_API_KEY="fc-your-api-key"
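
A script can read this variable at startup and fail early when it is missing. The sketch below is one way that might look; it is not the contents of scripts/firecrawl.py. The helper name `build_headers` and its `env` parameter are hypothetical, and the Bearer-token header is assumed from Firecrawl's public API documentation.

```python
import os


def build_headers(env=None):
    """Return HTTP headers for an authenticated Firecrawl request.

    `build_headers` is a hypothetical helper. `env` defaults to os.environ
    and is a parameter only so the function can be exercised without
    touching the real environment.
    """
    env = os.environ if env is None else env
    api_key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not api_key:
        # Fail before any network call so the error message is unambiguous.
        raise RuntimeError("FIRECRAWL_API_KEY is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Calling `build_headers()` with no arguments reads the key exported above.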

Usage

Single Page Scraping

bash
# Basic scrape
firecrawl scrape https://example.com

# With specific options
firecrawl scrape https://example.com --formats markdown,html --only-main-content

# Wait for JS rendering
firecrawl scrape https://spa-app.com --wait-for 2000
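
For readers who want to see what these flags could translate to at the HTTP level, here is a minimal sketch. It assumes the Firecrawl v1 scrape endpoint and its `formats`, `onlyMainContent`, and `waitFor` body fields; check the current API reference before relying on them. The function names are hypothetical, and this is not necessarily how the skill's own script is written. It uses only the standard library so it can be read without extra dependencies.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current Firecrawl API reference.
SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_payload(url, formats=("markdown",), only_main_content=False,
                         wait_for_ms=None):
    """Translate CLI-style options into a JSON request body (hypothetical helper)."""
    payload = {"url": url, "formats": list(formats)}
    if only_main_content:
        payload["onlyMainContent"] = True
    if wait_for_ms is not None:
        # Milliseconds to wait before capturing, e.g. for JS-rendered pages.
        payload["waitFor"] = int(wait_for_ms)
    return payload


def scrape(url, api_key, **options):
    """POST the payload and return the decoded JSON response."""
    body = json.dumps(build_scrape_payload(url, **options)).encode("utf-8")
    request = urllib.request.Request(
        SCRAPE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Keeping payload construction separate from the network call makes the option mapping easy to inspect and test offline.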

Site Crawling

bash
# Crawl entire site (up to limit)
firecrawl crawl https://docs.example.com --limit 50

# With depth control
firecrawl crawl https://blog.example.com --max-depth 2 --limit 100

# Include/exclude patterns
firecrawl crawl https://site.com --include "/blog/*" --exclude "/admin/*"

# Custom formats
firecrawl crawl https://docs.example.com --formats markdown,links
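
The interaction between the depth, limit, and include/exclude options can be illustrated with a breadth-first traversal. The sketch below is only a model of that behavior under stated assumptions: patterns are treated as shell-style globs matched against the URL path, the start URL is always visited, and `get_links` is a hypothetical callback that returns the links found on a page. The real service may apply these rules differently.

```python
from collections import deque
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def path_allowed(url, include=(), exclude=()):
    """Apply exclude patterns first, then include patterns, to the URL path."""
    path = urlparse(url).path or "/"
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    # An empty include list means "allow everything not excluded".
    return not include or any(fnmatchcase(path, pattern) for pattern in include)


def plan_crawl(start_url, get_links, max_depth=2, limit=100,
               include=(), exclude=()):
    """Return URLs in breadth-first order, bounded by depth and page count."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue and len(visited) < limit:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth bound
        for link in get_links(url):
            if link not in seen and path_allowed(link, include, exclude):
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Breadth-first order means that when the limit is reached, pages closest to the start URL have been collected first.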

URL Mapping

bash
# Discover all URLs from a site
firecrawl map https://example.com

# With search term
firecrawl map https://docs.python.org --search "tutorial"

Batch Processing

bash
# Scrape multiple URLs
firecrawl batch urls.txt --output ./scraped/

# From JSON list
firecrawl batch urls.json --formats markdown --concurrency 5
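
The input file formats are not specified above. The sketch below assumes that a text file holds one URL per line (blank lines and lines starting with `#` are skipped, which is an added convention) and that a JSON file holds a flat array of URL strings. `load_urls`, `run_batch`, and the `fetch` callback are hypothetical names; the concurrency model shown is a plain thread pool, which may differ from what the skill does.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_urls(path):
    """Read URLs from a .json array or a line-oriented text file."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        return [str(url) for url in json.loads(text)]
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


def run_batch(urls, fetch, concurrency=5):
    """Call fetch(url) for every URL with at most `concurrency` workers.

    Returns a dict mapping each URL to its result, or to the exception it
    raised, so that one failure does not abort the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as error:
                results[url] = error
    return results
```

Recording exceptions per URL lets the caller decide afterwards which URLs to retry.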

Structured Extraction

bash
# Extract specific data using CSS selectors
firecrawl extract https://example.com/products \
  --schema '{"name": ".product-title", "price": ".price", "description": ".desc"}'

# Extract to JSON
firecrawl extract https://news.example.com/article --schema article-schema.json
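
The inline schema above maps field names to CSS class selectors. How the skill resolves those selectors is not shown here, so the following is only an illustration of the idea on already-downloaded HTML, using the standard library. It handles single `.class` selectors only, takes the first match for each field, and assumes well-formed markup where every non-void element is closed. `ClassTextExtractor` and `extract` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and so must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element matching each `.class` selector."""

    def __init__(self, schema):
        super().__init__()
        # Map class name -> field name, e.g. "product-title" -> "name".
        self.fields = {sel.lstrip("."): field for field, sel in schema.items()}
        self.result = {field: None for field in schema}
        self.active = None  # field currently being captured, if any
        self.depth = 0      # nesting depth inside the captured element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.active is not None:
            self.depth += 1
            return
        for name in (dict(attrs).get("class") or "").split():
            field = self.fields.get(name)
            if field is not None and self.result[field] is None:
                self.active, self.depth, self.chunks = field, 1, []
                return

    def handle_endtag(self, tag):
        if self.active is None or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:
            # Join the pieces and collapse runs of whitespace.
            self.result[self.active] = " ".join("".join(self.chunks).split())
            self.active = None

    def handle_data(self, data):
        if self.active is not None:
            self.chunks.append(data)


def extract(html, schema):
    """Return {field: text or None} for a {field: ".class"} schema."""
    parser = ClassTextExtractor(schema)
    parser.feed(html)
    parser.close()
    return parser.result
```

Fields whose selector matches nothing stay `None`, which makes missing data easy to detect downstream.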

Output Formats

Markdown

Clean, LLM-ready markdown with:

  • Headings preserved
  • Links converted to markdown format
  • Images with alt text
  • Tables formatted as markdown tables

HTML

Raw or cleaned HTML

Links

Extracted link lists for further crawling

Screenshot

Page screenshot (if requested)

Use Cases

Knowledge Base Building

bash
# Crawl documentation site
firecrawl crawl https://docs.framework.com --limit 200 -o ./kb/

# Merge into single file for RAG
cat ./kb/*.md > knowledge-base.md
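
Plain `cat` loses track of which page each passage came from. If that matters for retrieval, a small script can merge the files with a source marker before each one. This is a sketch under the assumption that the crawl wrote one `.md` file per page into a single directory; the function name and the HTML-comment marker format are illustrative choices, not part of the skill.

```python
from pathlib import Path


def merge_markdown(source_dir, output_file):
    """Concatenate *.md files in name order, tagging each with its filename.

    Returns the number of files merged.
    """
    source_dir = Path(source_dir)
    output_file = Path(output_file)
    parts = []
    for path in sorted(source_dir.glob("*.md")):
        if path.resolve() == output_file.resolve():
            continue  # never merge the output file into itself
        body = path.read_text(encoding="utf-8").strip()
        parts.append(f"<!-- source: {path.name} -->\n{body}\n")
    output_file.write_text("\n".join(parts), encoding="utf-8")
    return len(parts)
```

For example, `merge_markdown("./kb", "knowledge-base.md")` would produce the same single file as the `cat` command, with one marker line per source page.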

Research & Analysis

bash
# Scrape competitor pricing
firecrawl batch competitors.txt --extract pricing-schema.json

# Monitor blog updates
firecrawl map https://blog.company.com --since 2024-01-01

Content Migration

bash
# Export old CMS content
firecrawl crawl https://old-site.com --formats markdown,html -o ./export/

Scripts

All functionality is provided by scripts/firecrawl.py, which handles:

  • Handles API authentication
  • Automatic rate limiting
  • Retry logic for failures
  • Progress tracking for large crawls
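
The script itself is not reproduced here, so the following is only a sketch of how retry logic and rate limiting are commonly implemented, not a description of scripts/firecrawl.py. It assumes exponential backoff between attempts and a fixed minimum interval between calls; `call_with_retry` and `RateLimiter` are hypothetical names, and the `sleep` and `clock` parameters exist so the behavior can be tested without real waiting.

```python
import time


def call_with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff.

    Delays are base_delay, 2*base_delay, 4*base_delay, ... The last
    exception is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


class RateLimiter:
    """Allow at most one call every `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
```

A caller would typically invoke `limiter.wait()` before each request and wrap the request itself in `call_with_retry`.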

Integration

Works well with:

  • markdown-sync-pro - Sync scraped content to Notion/GitHub
  • arxiv-paper - Combine with academic paper downloads
  • maybe-finance - Scrape financial data for analysis
