Web Scraper

Name: Web Scraper
Rating: 5 (4460 reviews)
Author: bryantegomoh

crawlee-web-scraper

by bryantegomoh

Resilient web scraper with bot-detection evasion using the Crawlee library. Use when web_fetch is blocked by rate limits or bot detection. Supports single URLs, bulk file input, and automatic fallback from requests to Crawlee on 403/429 responses.

4.5k · Search & Fetch · Not scanned · March 23, 2026

Install

claude skill add --url github.com/openclaw/skills/tree/main/skills/bryantegomoh/crawlee-web-scraper

Documentation

crawlee-web-scraper

Drop-in replacement for web_fetch when sites block automated requests. Crawlee handles session management, retry logic, and bot-detection evasion automatically.

Scripts

crawlee_fetch.py — main scraper; accepts a single URL or a file of URLs; returns JSON
crawlee_http.py — library helper; tries requests first, falls back to Crawlee on 403/429/503
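
The fallback behavior described for crawlee_http.py can be sketched roughly as follows. The source of crawlee_http.py is not shown here, so this is only an illustration of the pattern, not the skill's actual code: `fetch_with_fallback_sketch`, `FetchResult`, and the injected `primary` and `fallback` callables are hypothetical names. In practice `primary` would presumably wrap a plain `requests` call and `fallback` a Crawlee-based fetch.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Status codes that trigger the fallback, as listed in the script description.
FALLBACK_STATUSES = frozenset({403, 429, 503})

# A fetcher takes a URL and returns (status_code, body). Hypothetical interface.
Fetcher = Callable[[str], Tuple[int, str]]


@dataclass
class FetchResult:
    """Hypothetical result container; the real helper's return type may differ."""
    status_code: int
    text: str
    via: str  # "primary" or "fallback"


def fetch_with_fallback_sketch(url: str, primary: Fetcher, fallback: Fetcher) -> FetchResult:
    """Try the cheap fetcher first; switch to the resilient one when blocked."""
    status, body = primary(url)
    if status in FALLBACK_STATUSES:
        status, body = fallback(url)
        return FetchResult(status, body, via="fallback")
    return FetchResult(status, body, via="primary")


# Stub fetchers stand in for requests and Crawlee so the sketch runs offline.
result = fetch_with_fallback_sketch(
    "https://example.com",
    primary=lambda url: (403, ""),
    fallback=lambda url: (200, "<html>ok</html>"),
)
print(result.status_code, result.via)  # 200 fallback
```

Passing the fetchers in as arguments keeps the decision logic testable without network access. The sketch does not handle connection errors raised by the primary fetcher; whether the real helper does is not stated.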

Usage

bash

# Single URL, return HTML preview
python3 scripts/crawlee_fetch.py --url "https://example.com"

# Single URL, extract text (strips HTML tags)
python3 scripts/crawlee_fetch.py --url "https://example.com" --extract-text

# Bulk scrape from file
python3 scripts/crawlee_fetch.py --urls-file urls.txt --output results.json
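
The format of urls.txt is not specified above. A minimal sketch for preparing such a file, assuming one URL per line, might look like this. The helper name `write_urls_file` is hypothetical and not part of the skill.

```python
from pathlib import Path
from urllib.parse import urlparse


def write_urls_file(urls, path="urls.txt"):
    """Write unique http(s) URLs to `path`, one per line (assumed format)."""
    seen = set()
    kept = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Keep only absolute http/https URLs, and skip duplicates.
        if parsed.scheme in ("http", "https") and parsed.netloc and url not in seen:
            seen.add(url)
            kept.append(url)
    Path(path).write_text("".join(u + "\n" for u in kept), encoding="utf-8")
    return kept
```

Calling `write_urls_file(["https://example.com", "https://example.org/a"])` would produce a two-line urls.txt that can then be passed to `--urls-file`.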

Library usage

python

from crawlee_http import fetch_with_fallback

resp = fetch_with_fallback("https://example.com")
print(resp.status_code, resp.text[:500])

Output

JSON array with one object per URL:

json

[
  {
    "url": "https://example.com",
    "status": 200,
    "fetched_at": "2026-01-01T00:00:00Z",
    "length": 12345,
    "text": "Page content..."
  }
]
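
As an illustration of consuming this output, the sketch below separates usable records from URLs that may need a retry. It relies only on the fields shown above. The second sample record is made up for the example, and `split_results` is a hypothetical helper.

```python
import json

# Sample shaped like the documented output; the second record is illustrative only.
SAMPLE = """[
  {"url": "https://example.com", "status": 200,
   "fetched_at": "2026-01-01T00:00:00Z", "length": 12345, "text": "Page content..."},
  {"url": "https://example.org", "status": 429,
   "fetched_at": "2026-01-01T00:00:05Z", "length": 0, "text": ""}
]"""


def split_results(records):
    """Return (ok_records, failed_urls) based on status and non-empty text."""
    ok, failed = [], []
    for rec in records:
        if rec["status"] == 200 and rec["text"]:
            ok.append(rec)
        else:
            failed.append(rec["url"])
    return ok, failed


ok, failed = split_results(json.loads(SAMPLE))
print(len(ok), failed)  # 1 ['https://example.org']
```

For a real run, `json.loads(SAMPLE)` would be replaced by loading the file written with `--output results.json`.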

Installation

bash

pip install crawlee requests

When to use

web_fetch returns 403, 429, or an empty response
Bulk scraping 10+ URLs
Sites using Cloudflare or similar bot protection

