io.github.bamchi/scrapi
Coding & Debugging, by bamchi
Web scraping for AI agents: converts URLs into clean, LLM-ready Markdown, with support for bypassing anti-bot mechanisms.
README
⚡ Fast & Reliable — Built on 8+ years of web scraping expertise, 1,900+ production crawlers, and battle-tested anti-bot handling.
What is this?
An MCP (Model Context Protocol) server that lets AI agents fetch and read web pages. Simply give it a URL, and it returns clean, LLM-ready content — fast.
Before: AI can't read web pages directly
After: "Summarize this article" just works ✨
Features
- 🌐 URL → Markdown: Preserves headings, lists, links
- 📄 URL → Text: Plain text extraction
- 🏷️ Metadata: Title, author, date, images
- 🧹 Clean Output: No ads, no navigation, no scripts
- ⚡ JavaScript Rendering: Works with SPAs
- 💳 Built-in Billing: Credit tracking, subscription management, usage analytics (MCP keys)
- 🔄 Auto-Retry: 429 rate limit responses automatically retried with Retry-After
- 🌍 Dual Transport: Stdio (npx) + Streamable HTTP for flexible deployment
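The auto-retry feature can be pictured with a small sketch (an illustration only, not the server's actual code): on a 429 response, wait the number of seconds given in the Retry-After header, then try again, up to a retry limit.

```typescript
// Illustrative sketch of retry-on-429 with Retry-After (not the server's
// real implementation). `doFetch` stands in for any HTTP call that
// returns a status code and response headers.
type MiniResponse = { status: number; headers: Map<string, string> };

export async function fetchWithRetry(
  doFetch: () => Promise<MiniResponse>,
  maxRetries = 3
): Promise<MiniResponse> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch();
    // Stop once we have a non-429 response or run out of retries.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const retryAfterSec = Number(res.headers.get("retry-after") ?? "1");
    await new Promise((resolve) => setTimeout(resolve, retryAfterSec * 1000));
  }
}
```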
Transport Modes
Scrapi MCP Server supports two transport modes:
| Mode | Best For | Node.js Required |
|---|---|---|
| Stdio | Claude Desktop, Cursor, Cline, Claude Code | Yes (auto via npx) |
| Streamable HTTP | All clients, Node.js-free environments | No |
Prerequisites
- Scrapi MCP account (separate from the main Scrapi account)
- Claude Desktop, Cline, or Cursor installed
- Node.js 20+ (for the Stdio transport; not needed for Streamable HTTP)
Installation
Option A: npx (Recommended)
No installation needed. Just configure your MCP client to use npx.
{
"mcpServers": {
"scrapi": {
"command": "npx",
"args": ["-y", "@scrapi.ai/mcp-server"],
"env": {
"SCRAPI_API_KEY": "your-api-key"
}
}
}
}
Tip: You can also pass the API key via CLI argument instead of env var:
"args": ["-y", "@scrapi.ai/mcp-server", "--api-key", "your-api-key"]
See Step 2 for where to put this configuration.
Option B: Install from Source
# Clone the repository
git clone https://github.com/bamchi/scrapi-mcp-server.git
cd scrapi-mcp-server
# Install dependencies and build
npm install && npm run build
Step 1: Get Your API Key
- Go to https://scrapi.ai
- Sign up or log in
- Visit the MCP Dashboard — your Free plan (500 credits/month) and API key are created automatically
- Copy your API key (it starts with the hsmcp_ prefix)
Step 2: Configure MCP Server
Claude Desktop
Option A: Via Settings (Recommended)
- Open Claude Desktop
- Click Settings (gear icon, bottom left)
- Select Developer tab
- Click "Edit Config" button
- Add the mcpServers configuration (see below)
- Save and restart Claude Desktop (Cmd+Q, then reopen)
Option B: Edit config file directly
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
Configuration (npx):
{
"mcpServers": {
"scrapi": {
"command": "npx",
"args": ["-y", "@scrapi.ai/mcp-server"],
"env": {
"SCRAPI_API_KEY": "your-api-key"
}
}
}
}
Configuration (from source):
{
"mcpServers": {
"scrapi": {
"command": "node",
"args": ["/absolute/path/to/scrapi-mcp-server/dist/index.js"],
"env": {
"SCRAPI_API_KEY": "your-api-key"
}
}
}
}
Note: Replace /absolute/path/to/ with the actual path where you cloned the repository.
Cline
Config file location:
- macOS: ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
- Windows: %APPDATA%\Code\User\globalStorage\saoudrizwan.claude-dev\settings\cline_mcp_settings.json
Configuration (npx):
{
"mcpServers": {
"scrapi": {
"command": "npx",
"args": ["-y", "@scrapi.ai/mcp-server"],
"env": {
"SCRAPI_API_KEY": "your-api-key"
}
}
}
}
Configuration (from source):
{
"mcpServers": {
"scrapi": {
"command": "node",
"args": ["/absolute/path/to/scrapi-mcp-server/dist/index.js"],
"env": {
"SCRAPI_API_KEY": "your-api-key"
}
}
}
}
Cursor
Create or edit .cursor/mcp.json in your project root:
Configuration (npx):
{
"mcpServers": {
"scrapi": {
"command": "npx",
"args": ["-y", "@scrapi.ai/mcp-server"],
"env": {
"SCRAPI_API_KEY": "your-api-key"
}
}
}
}
Configuration (from source):
{
"mcpServers": {
"scrapi": {
"command": "node",
"args": ["/absolute/path/to/scrapi-mcp-server/dist/index.js"],
"env": {
"SCRAPI_API_KEY": "your-api-key"
}
}
}
}
Claude Code
Option 1: CLI command (Recommended)
claude mcp add scrapi-ai -s user -e SCRAPI_API_KEY=your-api-key -- npx -y @scrapi.ai/mcp-server
Or with --api-key:
claude mcp add scrapi-ai -s user -- npx -y @scrapi.ai/mcp-server --api-key your-api-key
Option 2: Edit config file
Edit ~/.claude.json or project .mcp.json:
{
"mcpServers": {
"scrapi": {
"command": "npx",
"args": ["-y", "@scrapi.ai/mcp-server", "--api-key", "your-api-key"]
}
}
}
Streamable HTTP
Connect via Streamable HTTP — no Node.js installation needed on the client side.
Endpoint: https://scrapi.ai/mcp
Cursor (.cursor/mcp.json):
{
"mcpServers": {
"scrapi": {
"url": "https://scrapi.ai/mcp",
"headers": {
"Authorization": "Bearer your-api-key"
}
}
}
}
Claude Code (CLI):
claude mcp add --transport http scrapi https://scrapi.ai/mcp \
--header "Authorization: Bearer your-api-key"
Cline (cline_mcp_settings.json):
{
"mcpServers": {
"scrapi": {
"type": "streamableHttp",
"url": "https://scrapi.ai/mcp",
"headers": {
"Authorization": "Bearer your-api-key"
}
}
}
}
Claude Desktop (claude_desktop_config.json):
{
"mcpServers": {
"scrapi": {
"command": "npx",
"args": [
"mcp-remote",
"https://scrapi.ai/mcp",
"--header",
"Authorization: Bearer your-api-key"
]
}
}
}
Note: Claude Desktop requires the mcp-remote proxy for HTTP connections.
<details> <summary>Self-host the HTTP server (advanced)</summary>
Run your own instance instead of using the hosted endpoint:
SCRAPI_API_KEY=your-api-key npx -y -p @scrapi.ai/mcp-server scrapi-http
# or from source:
SCRAPI_API_KEY=your-api-key node dist/http.js
The server starts at http://localhost:3000 with the MCP endpoint at /mcp. Configure with PORT and HOST environment variables. Replace the URL in the client configurations above with your self-hosted URL (e.g. http://localhost:3000/mcp).
Health check: GET http://localhost:3000/health
</details>
Step 3: Restart Your AI Client
- Claude Desktop: Fully quit (Cmd+Q on macOS, Alt+F4 on Windows) and reopen
- Claude Code: Restart the session
- Cline: Restart VS Code
- Cursor: Restart the editor
You should see the MCP server connection indicator.
Available Tools
scrape_url
Scrapes a webpage and returns AI-readable content.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | ✅ | URL to scrape |
| format | string | No | markdown (default) or text |
Example:
{
"url": "https://example.com/article",
"format": "markdown"
}
Markdown Output:
# Article Title
> Author: John Doe | Published: 2024-01-15
## Introduction
This is the main content of the article, converted to clean markdown...
## Key Points
- Point 1: Important detail
- Point 2: Another insight
- [Related Link](https://example.com/related)
Text Output:
Article Title
Author: John Doe | Published: 2024-01-15
Introduction
This is the main content of the article, converted to plain text...
Key Points
- Point 1: Important detail
- Point 2: Another insight
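For reference, an MCP client invokes this tool with a standard tools/call JSON-RPC request. A sketch of the wire message (shape per the MCP specification; values illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "scrape_url",
    "arguments": {
      "url": "https://example.com/article",
      "format": "markdown"
    }
  }
}
```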
scrape_urls
Scrapes multiple webpages in parallel and returns AI-readable content.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| urls | string[] | ✅ | URLs to scrape (max 10) |
| format | string | No | markdown (default) or text |
Example:
{
"urls": ["https://example.com/page1", "https://example.com/page2"],
"format": "text"
}
Output:
[
{
"url": "https://example.com/page1",
"content": "Page 1 Title\n\nThis is the content of page 1..."
},
{
"url": "https://example.com/page2",
"content": "Page 2 Title\n\nThis is the content of page 2..."
}
]
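Conceptually, the batch tool behaves like the sketch below (an illustration, not the server's code): it enforces the 10-URL cap, fetches all pages concurrently, and pairs each URL with its content in input order. `scrapeOne` is a placeholder for whatever fetch-and-convert step the real server performs.

```typescript
// Conceptual sketch of scrape_urls (illustrative only). `scrapeOne` stands
// in for the real fetch-and-convert step.
export async function scrapeUrlsSketch(
  urls: string[],
  scrapeOne: (url: string) => Promise<string>
): Promise<{ url: string; content: string }[]> {
  if (urls.length > 10) {
    throw new Error("scrape_urls accepts at most 10 URLs");
  }
  // Fire all requests concurrently; Promise.all preserves input order.
  const contents = await Promise.all(urls.map(scrapeOne));
  return urls.map((url, i) => ({ url, content: contents[i] }));
}
```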
scraper_server_status
Check the status of all ScraperServer instances. Shows server health, circuit breaker state, failure counts, and timing info.
Parameters: None
Example:
{}
Output:
## ScraperServer Status
Total: 3 | Available: 2
| Name | OS | Status | Failures | Last Success | Last Failure |
|------|----|--------|----------|--------------|--------------|
| pluto | linux | OK | 0 | 01/30 14:23:05 | - |
| mars | mac | FAIL | 2 | 01/29 10:00:00 | 01/30 13:55:12 |
| venus | linux | OPEN | 3 | 01/28 09:00:00 | 01/30 12:00:00 |
### Issues
- **mars**: Connection refused - connect(2)
- **venus**: Circuit breaker open until 01/30 12:30:00
- **venus**: Net::ReadTimeout
Status values:
| Status | Description |
|---|---|
| OK | Server is healthy |
| FAIL | Server is unhealthy |
| OPEN | Circuit breaker open (isolated for 30 min) |
| N/A | Not yet checked |
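These statuses map onto a classic circuit-breaker pattern. A minimal sketch follows; the thresholds are assumptions inferred from the output above (the example trips OPEN at 3 failures, and the table says an open breaker isolates a server for 30 minutes), and the real server's bookkeeping may differ.

```typescript
// Minimal circuit-breaker sketch matching the statuses above (illustrative).
// `now` is injectable so the cooldown can be simulated in tests.
export class BreakerSketch {
  private failures = 0;
  private openUntil = 0; // epoch ms until which the server is isolated

  constructor(
    private readonly threshold = 3,
    private readonly cooldownMs = 30 * 60 * 1000,
    private readonly now: () => number = Date.now
  ) {}

  status(): "OK" | "FAIL" | "OPEN" {
    if (this.now() < this.openUntil) return "OPEN";
    return this.failures === 0 ? "OK" : "FAIL";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openUntil = 0;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold) {
      // Trip the breaker: isolate this server for the cooldown period.
      this.openUntil = this.now() + this.cooldownMs;
    }
  }
}
```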
get_usage
Check your API usage and remaining credits.
Parameters: None
Example:
{}
Output:
## MCP Credits
| Item | Value |
|------|-------|
| Plan | starter |
| Subscription Credits | 1,500 |
| Purchased Credits | 200 |
| Total Remaining | 1,700 |
| Period End | 2026-03-01 |
get_billing
Retrieve detailed billing information including subscription, plans, daily usage, and spending limits.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| action | string | ✅ | subscription, plans, daily_usage, or spending_limits |
| start_date | string | No | Start date for daily_usage (YYYY-MM-DD, default: 30 days ago) |
| end_date | string | No | End date for daily_usage (YYYY-MM-DD, default: today) |
Example — Current subscription:
{ "action": "subscription" }
## MCP Subscription
| Item | Value |
|------|-------|
| Plan | starter (Starter) |
| Status | active |
| Monthly Credits | 2,000 |
| Price | $19.00/mo |
| Rate Limit | 30 RPM |
| Burst Limit | 5 concurrent |
| Period End | 2026-03-01 |
Example — Available plans:
{ "action": "plans" }
## Available MCP Plans
| Plan | Credits/mo | Price | RPM | Burst |
|------|-----------|-------|-----|-------|
| Free (free) | 500 | Free | 10 | 2 |
| Starter (starter) | 2,000 | $19.00/mo | 30 | 5 |
| Pro (pro) | 10,000 | $49.00/mo | 60 | 10 |
| Business (business) | 50,000 | $149.00/mo | 120 | 20 |
Example — Daily usage history:
{ "action": "daily_usage", "start_date": "2026-02-01", "end_date": "2026-02-07" }
## Daily Usage (2026-02-01 ~ 2026-02-07)
| Date | Requests | Credits | Top Tool |
|------|----------|---------|----------|
| 2026-02-07 | 45 | 45 | scrape#scrape (45) |
| 2026-02-06 | 120 | 120 | scrape#scrape (100) |
**Total**: 165 requests, 165 credits
Example — Spending limits:
{ "action": "spending_limits" }
## Spending Limits
| Item | Value |
|------|-------|
| Daily Limit | 500 credits |
| Today's Usage | 120 credits |
| Usage % | 24.0% |
Usage Examples
Example 1: Summarize a News Article
User: Summarize this article: https://news.example.com/article/12345
Claude: [calls scrape_url]
Here's a summary of the article:
## Key Points
- Point 1: ...
- Point 2: ...
- Point 3: ...
Example 2: Fetch Page Content
User: Get the content from https://example.com/data
Claude: [calls scrape_url]
# Page Title
> Source: https://example.com/data
The page content is returned in clean Markdown format...
Example 3: Research Competitor Pricing
User: What's the pricing on https://competitor.com/product/abc
Claude: [calls scrape_url]
Here's the pricing information:
- **Product**: ABC Premium
- **Regular Price**: $99.00
- **Sale Price**: $79.00 (20% off)
Example 4: Read API Documentation
User: Read https://docs.example.com/api/v2 and write integration code
Claude: [calls scrape_url]
I've analyzed the API documentation. Here's the integration code:
// api-client.ts
export class ExampleApiClient {
private baseUrl = 'https://api.example.com/v2';
async getData(): Promise<Response> {
// ...
}
}
How It Works
┌─────────────────┐
│ User │
│ "Summarize this │
│ URL for me" │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Claude Desktop │
│ / Cursor │
└────────┬────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐
│ MCP Server │────►│ Scrapi API │
│ (scrape_url) │ │ (format param) │
└────────┬────────┘ └────────┬────────┘
│ │
│◄──────────────────────┘
│ Markdown/Text Response
▼
┌─────────────────┐
│ AI Response │
│ (Summary, etc.) │
└─────────────────┘
Why Scrapi?
Built by the team behind Scrapi, with 8+ years of web scraping experience:
- ✅ 1,900+ production crawlers
- ✅ JavaScript rendering support
- ✅ Anti-bot handling
- ✅ 99.9% uptime
Troubleshooting
"API key is required"
Make sure your API key is provided via one of these methods:
- Environment variable: Set SCRAPI_API_KEY in your configuration
- CLI argument: Pass --api-key your-key in the args
"Invalid API key"
Verify that your API key is correct and active in your Scrapi dashboard.
npx using an old cached version
If you upgraded but still see old behavior, clear the npx cache:
npx clear-npx-cache
MCP Server not connecting
- Ensure Node.js 20+ is installed
- Try running node /absolute/path/to/scrapi-mcp-server/dist/index.js manually to check for errors
- Fully quit Claude Desktop (Cmd+Q on macOS, Alt+F4 on Windows) and restart
- Check Settings > Developer to verify the server is listed
Developer tab not visible
Update Claude Desktop to the latest version: Claude menu → "Check for Updates..."
Support
- Email: support@scrapi.ai
- Issues: GitHub Issues
License
MIT © Scrapi