io.github.ScrapeGraphAI/scrapegraph-mcp
Coding & Debugging · by scrapegraphai
Provides AI-powered web scraping and data extraction via the ScrapeGraph API, well suited to structured data-collection tasks.
ScrapeGraph MCP Server
A production-ready Model Context Protocol (MCP) server that provides seamless integration with the ScrapeGraph AI API. This server enables language models to leverage advanced AI-powered web scraping capabilities with enterprise-grade reliability.
Table of Contents
- Key Features
- Quick Start
- Available Tools
- Setup Instructions
- Remote Server Usage
- Local Usage
- Google ADK Integration
- Example Use Cases
- Error Handling
- Common Issues
- Development
- Contributing
- Documentation
- Technology Stack
- License
Key Features
- 8 Powerful Tools: From simple markdown conversion to complex multi-page crawling and agentic workflows
- AI-Powered Extraction: Intelligently extract structured data using natural language prompts
- Multi-Page Crawling: SmartCrawler supports asynchronous crawling with configurable depth and page limits
- Infinite Scroll Support: Handle dynamic content loading with configurable scroll counts
- JavaScript Rendering: Full support for JavaScript-heavy websites
- Flexible Output Formats: Get results as markdown, structured JSON, or custom schemas
- Easy Integration: Works seamlessly with Claude Desktop, Cursor, and any MCP-compatible client
- Enterprise-Ready: Robust error handling, timeout management, and production-tested reliability
- Simple Deployment: One-command installation via Smithery or manual setup
- Comprehensive Documentation: Detailed developer docs in the .agent/ folder
Quick Start
1. Get Your API Key
Sign up and get your API key from the ScrapeGraph Dashboard
2. Install with Smithery (Recommended)
npx -y @smithery/cli install @ScrapeGraphAI/scrapegraph-mcp --client claude
3. Start Using
Ask Claude or Cursor:
- "Convert https://scrapegraphai.com to markdown"
- "Extract all product prices from this e-commerce page"
- "Research the latest AI developments and summarize findings"
That's it! The server is now available to your AI assistant.
Available Tools
The server provides 8 enterprise-ready tools for AI-powered web scraping:
Core Scraping Tools
1. markdownify
Transform any webpage into clean, structured markdown format.
markdownify(website_url: str)
- Credits: 2 per request
- Use case: Quick webpage content extraction in markdown
2. smartscraper
Leverage AI to extract structured data from any webpage with support for infinite scrolling.
smartscraper(
    user_prompt: str,
    website_url: str,
    number_of_scrolls: int = None,
    markdown_only: bool = None
)
- Credits: 10 (base), plus a variable amount depending on scrolling
- Use case: AI-powered data extraction with custom prompts
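For example, a hypothetical invocation (the URL, prompt, and scroll count are placeholders):
smartscraper(
    user_prompt="Extract all product names and prices",
    website_url="https://example.com/shop",
    number_of_scrolls=5,
)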
3. searchscraper
Execute AI-powered web searches with structured, actionable results.
searchscraper(
    user_prompt: str,
    num_results: int = None,
    number_of_scrolls: int = None,
    time_range: str = None  # Filter by: past_hour, past_24_hours, past_week, past_month, past_year
)
- Credits: Variable (3-20 websites × 10 credits)
- Use case: Multi-source research and data aggregation
- Time filtering: Use time_range to filter results by recency (e.g., "past_week" for recent results)
Advanced Scraping Tools
4. scrape
Basic scraping endpoint to fetch page content with optional heavy JavaScript rendering.
scrape(website_url: str, render_heavy_js: bool = None)
- Use case: Simple page content fetching with JS rendering support
5. sitemap
Extract sitemap URLs and structure for any website.
sitemap(website_url: str)
- Use case: Website structure analysis and URL discovery
Multi-Page Crawling
6. smartcrawler_initiate
Initiate intelligent multi-page web crawling (asynchronous operation).
smartcrawler_initiate(
    url: str,
    prompt: str = None,
    extraction_mode: str = "ai",
    depth: int = None,
    max_pages: int = None,
    same_domain_only: bool = None
)
- AI Extraction Mode: 10 credits per page - extracts structured data
- Markdown Mode: 2 credits per page - converts to markdown
- Returns: request_id for polling
- Use case: Large-scale website crawling and data extraction
7. smartcrawler_fetch_results
Retrieve results from asynchronous crawling operations.
smartcrawler_fetch_results(request_id: str)
- Returns: Status and results when crawling is complete
- Use case: Poll for crawl completion and retrieve results
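Because crawling runs asynchronously, a typical client initiates a crawl and then polls for completion. A minimal sketch, assuming the two tools are called as Python functions, that the initiate response exposes the documented request_id under that key, and that fetched results carry a status field:
import time

crawl = smartcrawler_initiate(
    url="https://docs.example.com",  # placeholder target
    prompt="Extract every API endpoint with a short description",
    extraction_mode="ai",
    depth=2,
    max_pages=50,
    same_domain_only=True,
)

# Poll until the crawl reports completion ("completed" is the status named under Common Issues)
while True:
    results = smartcrawler_fetch_results(request_id=crawl["request_id"])
    if results.get("status") == "completed":
        break
    time.sleep(10)

print(results)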
Intelligent Agent-Based Scraping
8. agentic_scrapper
Run advanced agentic scraping workflows with customizable steps and structured output schemas.
agentic_scrapper(
    url: str,
    user_prompt: str = None,
    output_schema: dict = None,
    steps: list = None,
    ai_extraction: bool = None,
    persistent_session: bool = None,
    timeout_seconds: float = None
)
- Use case: Complex multi-step workflows with custom schemas and persistent sessions
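A hypothetical multi-step invocation (the URL, steps, and schema below are illustrative placeholders, not a documented recipe):
agentic_scrapper(
    url="https://portal.example.com",
    user_prompt="Log in, open the reports page, and extract the latest summary",
    output_schema={"title": "string", "date": "string", "summary": "string"},
    steps=[
        "Fill in the login form with the provided credentials",
        "Navigate to the Reports section",
        "Extract the most recent report",
    ],
    ai_extraction=True,
    persistent_session=True,
    timeout_seconds=120.0,
)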
Setup Instructions
To use this server, you'll need a ScrapeGraph API key. Follow these steps to obtain one:
- Navigate to the ScrapeGraph Dashboard
- Create an account and generate your API key
Automated Installation via Smithery
For automated installation of the ScrapeGraph API Integration Server using Smithery:
npx -y @smithery/cli install @ScrapeGraphAI/scrapegraph-mcp --client claude
Claude Desktop Configuration
Update your Claude Desktop configuration file with the following settings (remember to add your API key inside the config):
{
  "mcpServers": {
    "@ScrapeGraphAI-scrapegraph-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "@smithery/cli@latest",
        "run",
        "@ScrapeGraphAI/scrapegraph-mcp",
        "--config",
        "\"{\\\"scrapegraphApiKey\\\":\\\"YOUR-SGAI-API-KEY\\\"}\""
      ]
    }
  }
}
The configuration file is located at:
- Windows: %APPDATA%/Claude/claude_desktop_config.json
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Cursor Integration
Add the ScrapeGraphAI MCP server in the settings (located at the top right of the Cursor page).
Remote Server Usage
Connect to our hosted MCP server - no local installation required!
Claude Desktop Configuration (Remote)
Add this to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "scrapegraph-mcp": {
      "command": "npx",
      "args": [
        "mcp-remote@0.1.25",
        "https://scrapegraph-mcp.onrender.com/mcp",
        "--header",
        "X-API-Key:YOUR_API_KEY"
      ]
    }
  }
}
Cursor Configuration (Remote)
Cursor supports native HTTP MCP connections. Add to your Cursor MCP settings (~/.cursor/mcp.json):
{
  "mcpServers": {
    "scrapegraph-mcp": {
      "url": "https://scrapegraph-mcp.onrender.com/mcp",
      "headers": {
        "X-API-Key": "YOUR_API_KEY"
      }
    }
  }
}
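The hosted endpoint can also be reached outside an editor. A minimal sketch, assuming the official mcp Python SDK's streamable-HTTP client:
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    # Same endpoint and X-API-Key header as the configs above
    async with streamablehttp_client(
        "https://scrapegraph-mcp.onrender.com/mcp",
        headers={"X-API-Key": "YOUR_API_KEY"},
    ) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())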
Benefits of Remote Server
- No local setup - Just configure and start using
- Always up-to-date - Automatically receives latest updates
- Cross-platform - Works on any OS with Node.js
Local Usage
To run the MCP server locally for development or testing, follow these steps:
Prerequisites
- Python 3.13 or higher
- pip or uv package manager
- ScrapeGraph API key
Installation
- Clone the repository (if you haven't already):
git clone https://github.com/ScrapeGraphAI/scrapegraph-mcp
cd scrapegraph-mcp
- Install the package:
# Using pip
pip install -e .
# Or using uv (faster)
uv pip install -e .
- Set your API key:
# macOS/Linux
export SGAI_API_KEY=your-api-key-here
# Windows (PowerShell)
$env:SGAI_API_KEY="your-api-key-here"
# Windows (CMD)
set SGAI_API_KEY=your-api-key-here
Running the Server Locally
You can run the server directly:
# Using the installed command
scrapegraph-mcp
# Or using Python module
python -m scrapegraph_mcp.server
The server will start and communicate via stdio (standard input/output), which is the standard MCP transport method.
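You can also smoke-test the stdio transport from a script. A minimal sketch, assuming the official mcp Python SDK:
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(
        command="python",
        args=["-m", "scrapegraph_mcp.server"],
        env={"SGAI_API_KEY": "your-api-key-here"},
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "markdownify", {"website_url": "https://scrapegraphai.com"}
            )
            print(result.content)

asyncio.run(main())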
Testing with MCP Inspector
Test your local server using the MCP Inspector tool:
npx @modelcontextprotocol/inspector python -m scrapegraph_mcp.server
This provides a web interface to test all available tools interactively.
Configuring Claude Desktop for Local Server
To use your locally running server with Claude Desktop, update your configuration file:
macOS/Linux (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "scrapegraph-mcp-local": {
      "command": "python",
      "args": [
        "-m",
        "scrapegraph_mcp.server"
      ],
      "env": {
        "SGAI_API_KEY": "your-api-key-here"
      }
    }
  }
}
Windows (%APPDATA%\Claude\claude_desktop_config.json):
{
  "mcpServers": {
    "scrapegraph-mcp-local": {
      "command": "python",
      "args": [
        "-m",
        "scrapegraph_mcp.server"
      ],
      "env": {
        "SGAI_API_KEY": "your-api-key-here"
      }
    }
  }
}
Note: Make sure Python is in your PATH. You can verify by running python --version in your terminal.
Configuring Cursor for Local Server
In Cursor's MCP settings, add a new server with:
- Command: python
- Args: ["-m", "scrapegraph_mcp.server"]
- Environment Variables: {"SGAI_API_KEY": "your-api-key-here"}
Troubleshooting Local Setup
Server not starting:
- Verify Python is installed: python --version
- Check that the package is installed: pip list | grep scrapegraph-mcp
- Ensure the API key is set: echo $SGAI_API_KEY (macOS/Linux) or echo %SGAI_API_KEY% (Windows)
Tools not appearing:
- Check Claude Desktop logs:
  - macOS: ~/Library/Logs/Claude/
  - Windows: %APPDATA%\Claude\Logs\
- Verify the server starts without errors when run directly
- Check that the configuration JSON is valid
Import errors:
- Reinstall the package: pip install -e . --force-reinstall
- Verify dependencies: pip install -r requirements.txt (if available)
Google ADK Integration
The ScrapeGraph MCP server can be integrated with Google ADK (Agent Development Kit) to create AI agents with web scraping capabilities.
Prerequisites
- Python 3.13 or higher
- Google ADK installed
- ScrapeGraph API key
Installation
- Install Google ADK (if not already installed):
pip install google-adk
- Set your API key:
export SGAI_API_KEY=your-api-key-here
Basic Integration Example
Create an agent file (e.g., agent.py) with the following configuration:
import os

from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

# Path to the scrapegraph-mcp server directory
SCRAPEGRAPH_MCP_PATH = "/path/to/scrapegraph-mcp"

# Path to the server.py file
SERVER_SCRIPT_PATH = os.path.join(
    SCRAPEGRAPH_MCP_PATH,
    "src",
    "scrapegraph_mcp",
    "server.py",
)

root_agent = LlmAgent(
    model='gemini-2.0-flash',
    name='scrapegraph_assistant_agent',
    instruction='Help the user with web scraping and data extraction using ScrapeGraph AI. '
                'You can convert webpages to markdown, extract structured data using AI, '
                'perform web searches, crawl multiple pages, and automate complex scraping workflows.',
    tools=[
        MCPToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command='python3',
                    args=[
                        SERVER_SCRIPT_PATH,
                    ],
                    env={
                        'SGAI_API_KEY': os.getenv('SGAI_API_KEY'),
                    },
                ),
                timeout=300.0,
            ),
            # Optional: Filter which tools from the MCP server are exposed
            # tool_filter=['markdownify', 'smartscraper', 'searchscraper']
        ),
    ],
)
Configuration Options
Timeout Settings:
- Default timeout is 5 seconds, which may be too short for web scraping operations
- Recommended: Set timeout=300.0 (as in the example above)
- Adjust based on your use case (crawling operations may need even longer timeouts)
Tool Filtering:
- By default, all 8 tools are exposed to the agent
- Use tool_filter to limit which tools are available: tool_filter=['markdownify', 'smartscraper', 'searchscraper']
API Key Configuration:
- Set via environment variable: export SGAI_API_KEY=your-key
- Or pass directly in the env dict: 'SGAI_API_KEY': 'your-key-here'
- The environment variable approach is recommended for security
Usage Example
Once configured, your agent can use natural language to interact with web scraping tools:
# The agent can now handle queries like:
# - "Convert https://example.com to markdown"
# - "Extract all product prices from this e-commerce page"
# - "Search for recent AI research papers and summarize them"
# - "Crawl this documentation site and extract all API endpoints"
For more information about Google ADK, visit the official documentation.
Example Use Cases
The server enables sophisticated queries across various scraping scenarios:
Single Page Scraping
- Markdownify: "Convert the ScrapeGraph documentation page to markdown"
- SmartScraper: "Extract all product names, prices, and ratings from this e-commerce page"
- SmartScraper with scrolling: "Scrape this infinite scroll page with 5 scrolls and extract all items"
- Basic Scrape: "Fetch the HTML content of this JavaScript-heavy page with full rendering"
Search and Research
- SearchScraper: "Research and summarize recent developments in AI-powered web scraping"
- SearchScraper: "Search for the top 5 articles about machine learning frameworks and extract key insights"
- SearchScraper: "Find recent news about GPT-4 and provide a structured summary"
- SearchScraper with time_range: "Search for AI news from the past week only" (uses time_range="past_week")
Website Analysis
- Sitemap: "Extract the complete sitemap structure from the ScrapeGraph website"
- Sitemap: "Discover all URLs on this blog site"
Multi-Page Crawling
- SmartCrawler (AI mode): "Crawl the entire documentation site and extract all API endpoints with descriptions"
- SmartCrawler (Markdown mode): "Convert all pages in the blog to markdown up to 2 levels deep"
- SmartCrawler: "Extract all product information from an e-commerce site, maximum 100 pages, same domain only"
Advanced Agentic Scraping
- Agentic Scraper: "Navigate through a multi-step authentication form and extract user dashboard data"
- Agentic Scraper with schema: "Follow pagination links and compile a dataset with schema: {title, author, date, content}"
- Agentic Scraper: "Execute a complex workflow: login, navigate to reports, download data, and extract summary statistics"
Error Handling
The server implements robust error handling with detailed, actionable error messages for:
- API authentication issues
- Malformed URL structures
- Network connectivity failures
- Rate limiting and quota management
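Failed tool calls come back as plain error dictionaries rather than raised exceptions (the same pattern shown under Contributing), so callers can branch on an error key. A hypothetical check:
result = smartscraper(user_prompt="Extract prices", website_url="https://example.com")
if "error" in result:
    # e.g. "Error 401: Unauthorized" (see Common Issues below)
    print("Scrape failed:", result["error"])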
Common Issues
Windows-Specific Connection
When running on Windows systems, you may need to use the following command to connect to the MCP server:
C:\Windows\System32\cmd.exe /c npx -y @smithery/cli@latest run @ScrapeGraphAI/scrapegraph-mcp --config "{\"scrapegraphApiKey\":\"YOUR-SGAI-API-KEY\"}"
This ensures proper execution in the Windows environment.
Other Common Issues
"ScrapeGraph client not initialized"
- Cause: Missing API key
- Solution: Set the SGAI_API_KEY environment variable or provide the key via --config
"Error 401: Unauthorized"
- Cause: Invalid API key
- Solution: Verify your API key at the ScrapeGraph Dashboard
"Error 402: Payment Required"
- Cause: Insufficient credits
- Solution: Add credits to your ScrapeGraph account
SmartCrawler not returning results
- Cause: Still processing (asynchronous operation)
- Solution: Keep polling smartcrawler_fetch_results() until status is "completed"
Tools not appearing in Claude Desktop
- Cause: Server not starting or configuration error
- Solution: Check Claude logs at ~/Library/Logs/Claude/ (macOS) or %APPDATA%\Claude\Logs\ (Windows)
For detailed troubleshooting, see the .agent documentation.
Development
Prerequisites
- Python 3.13 or higher
- pip or uv package manager
- ScrapeGraph API key
Installation from Source
# Clone the repository
git clone https://github.com/ScrapeGraphAI/scrapegraph-mcp
cd scrapegraph-mcp
# Install dependencies
pip install -e ".[dev]"
# Set your API key
export SGAI_API_KEY=your-api-key
# Run the server
scrapegraph-mcp
# or
python -m scrapegraph_mcp.server
Testing with MCP Inspector
Test your server locally using the MCP Inspector tool:
npx @modelcontextprotocol/inspector scrapegraph-mcp
This provides a web interface to test all available tools.
Code Quality
Linting:
ruff check src/
Type Checking:
mypy src/
Format Checking:
ruff format --check src/
Project Structure
scrapegraph-mcp/
├── src/
│ └── scrapegraph_mcp/
│ ├── __init__.py # Package initialization
│ └── server.py # Main MCP server (all code in one file)
├── .agent/ # Developer documentation
│ ├── README.md # Documentation index
│ └── system/ # System architecture docs
├── assets/ # Images and badges
├── pyproject.toml # Project metadata & dependencies
├── smithery.yaml # Smithery deployment config
└── README.md # This file
Contributing
We welcome contributions! Here's how you can help:
Adding a New Tool
- Add a method to the ScapeGraphClient class in server.py:
def new_tool(self, param: str) -> Dict[str, Any]:
    """Tool description."""
    url = f"{self.BASE_URL}/new-endpoint"
    data = {"param": param}

    response = self.client.post(url, headers=self.headers, json=data)
    if response.status_code != 200:
        raise Exception(f"Error {response.status_code}: {response.text}")

    return response.json()
- Add the MCP tool decorator:
@mcp.tool()
def new_tool(param: str) -> Dict[str, Any]:
    """
    Tool description for AI assistants.

    Args:
        param: Parameter description

    Returns:
        Dictionary containing results
    """
    if scrapegraph_client is None:
        return {"error": "ScrapeGraph client not initialized. Please provide an API key."}

    try:
        return scrapegraph_client.new_tool(param)
    except Exception as e:
        return {"error": str(e)}
- Test with MCP Inspector:
npx @modelcontextprotocol/inspector scrapegraph-mcp
- Update documentation:
  - Add the tool to this README
  - Update the .agent documentation
- Submit a pull request
Development Workflow
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run linting and type checking
- Test with MCP Inspector and Claude Desktop
- Update documentation
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Code Style
- Line length: 100 characters
- Type hints: Required for all functions
- Docstrings: Google-style docstrings
- Error handling: Return error dicts, don't raise exceptions in tools
- Python version: Target 3.13+
For detailed development guidelines, see the .agent documentation.
Documentation
For comprehensive developer documentation, see:
- .agent/README.md - Complete developer documentation index
- .agent/system/project_architecture.md - System architecture and design
- .agent/system/mcp_protocol.md - MCP protocol integration details
Technology Stack
Core Framework
- Python 3.13+ - Modern Python with type hints
- FastMCP - Lightweight MCP server framework
- httpx 0.24.0+ - Modern async HTTP client
Development Tools
- Ruff - Fast Python linter and formatter
- mypy - Static type checker
- Hatchling - Modern build backend
Deployment
- Smithery - Automated MCP server deployment
- Docker - Container support with Alpine Linux
- stdio transport - Standard MCP communication
API Integration
- ScrapeGraph AI API - Enterprise web scraping service
- Base URL: https://api.scrapegraphai.com/v1
- Authentication: API key-based
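For reference, the server is a thin wrapper around this API. A hedged sketch of a direct call with httpx; the endpoint path and the SGAI-APIKEY header name are assumptions to verify against the ScrapeGraph API docs:
import httpx

response = httpx.post(
    "https://api.scrapegraphai.com/v1/markdownify",  # assumed endpoint path
    headers={"SGAI-APIKEY": "your-api-key-here"},    # assumed auth header name
    json={"website_url": "https://scrapegraphai.com"},
    timeout=60.0,
)
response.raise_for_status()
print(response.json())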
License
This project is distributed under the MIT License. For detailed terms and conditions, please refer to the LICENSE file.
Acknowledgments
Special thanks to tomekkorbak for his implementation of oura-mcp-server, which served as a starting point for this repo.
Resources
Official Links
- ScrapeGraph AI Homepage
- ScrapeGraph Dashboard - Get your API key
- ScrapeGraph API Documentation
- GitHub Repository
MCP Resources
- Model Context Protocol - Official MCP specification
- FastMCP Framework - Framework used by this server
- MCP Inspector - Testing tool
- Smithery - MCP server distribution
- mcp-name: io.github.ScrapeGraphAI/scrapegraph-mcp
AI Assistant Integration
- Claude Desktop - Desktop app with MCP support
- Cursor - AI-powered code editor
Support
- GitHub Issues - Report bugs or request features
- Developer Documentation - Comprehensive dev docs
Made with ❤️ by ScrapeGraphAI Team