crawlee-web-scraper
by bryantegomoh
Resilient web scraper with bot-detection evasion using the Crawlee library. Use when web_fetch is blocked by rate limits or bot detection. Supports single URLs, bulk file input, and automatic fallback from requests to Crawlee on 403/429 responses.
安装
claude skill add --url github.com/openclaw/skills/tree/main/skills/bryantegomoh/crawlee-web-scraper文档
crawlee-web-scraper
Drop-in replacement for web_fetch when sites block automated requests. Crawlee handles session management, retry logic, and bot-detection evasion automatically.
Scripts
crawlee_fetch.py— main scraper; accepts a single URL or a file of URLs; returns JSONcrawlee_http.py— library helper; triesrequestsfirst, falls back to Crawlee on 403/429/503
Usage
# Single URL, return HTML preview
python3 scripts/crawlee_fetch.py --url "https://example.com"
# Single URL, extract text (strips HTML tags)
python3 scripts/crawlee_fetch.py --url "https://example.com" --extract-text
# Bulk scrape from file
python3 scripts/crawlee_fetch.py --urls-file urls.txt --output results.json
Library usage
from crawlee_http import fetch_with_fallback
resp = fetch_with_fallback("https://example.com")
print(resp.status_code, resp.text[:500])
Output
JSON array with one object per URL:
[
{
"url": "https://example.com",
"status": 200,
"fetched_at": "2026-01-01T00:00:00Z",
"length": 12345,
"text": "Page content..."
}
]
Installation
pip install crawlee requests
When to use
web_fetchreturns 403 / 429 / empty- Bulk scraping 10+ URLs
- Sites using Cloudflare or similar bot protection
相关 Skills
agent-browser
by chulla-ceja
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
接口规范
by alexxxiong
API 规范管理工具 - 跨项目 API 文档的初始化、更新、查询与搜索。Triggers: 'API文档', 'API规范', '接口文档', '路由解析', 'apispec', 'API lookup', 'API search'.
investment-research
by caijichang212
Perform structured investment research (投研分析) for a company/stock/ETF/sector using a repeatable framework: fundamentals (basic/财务报表与商业模式), technical analysis (技术指标与关键价位), industry research (行业景气与竞争格局), valuation (估值对比/情景), catalysts and risks, and produce a professional research report + actionable plan. Use when the user asks for: equity/ETF analysis, earnings/financial statement breakdown, peer/industry comparison, valuation ranges, bull/base/bear scenarios, technical trend/support-resistance, or a full research memo.
相关 MCP 服务
Puppeteer 浏览器控制
编辑精选by Anthropic
Puppeteer 是让 Claude 自动操作浏览器进行网页抓取和测试的 MCP 服务器。
✎ 这个服务器解决了手动编写 Puppeteer 脚本的繁琐问题,适合需要自动化网页交互的开发者,比如抓取动态内容或做端到端测试。不过,作为参考实现,它可能缺少生产级的安全防护,建议在可控环境中使用。
网页抓取
编辑精选by Anthropic
Fetch 是 MCP 官方参考服务器,让 AI 能抓取网页并转为 Markdown 格式。
✎ 这个服务器解决了 AI 直接处理网页内容时格式混乱的问题,适合需要让 Claude 分析在线文档或新闻的开发者。不过作为参考实现,它缺乏生产级的安全配置,你得自己处理反爬虫和隐私风险。
Brave 搜索
编辑精选by Anthropic
Brave Search 是让 Claude 直接调用 Brave 搜索 API 获取实时网络信息的 MCP 服务器。
✎ 如果你想让 AI 助手帮你搜索最新资讯或技术文档,这个工具能绕过传统搜索的限制,直接返回结构化数据。特别适合需要实时信息的开发者,比如查 API 更新或竞品动态。不过它依赖 Brave 的 API 配额,高频使用可能受限。