Web Extraction
web-search-scraper-api-skill
by browseract-ai
This skill helps users automatically extract complete Markdown content from any website via the BrowserAct Web Search Scraper API. The Agent should proactively apply this skill when users express needs like extract complete markdown from a specific website, scrape the content of an article link, get the text from a target url, convert a webpage to markdown format, fetch the main content of a blog post, extract data from a given web page, parse the html of a website into markdown, download the readable text from a news article, obtain the content of a tutorial page, extract all the markdown text from any http or https url, scrape documentation from a web link, or grab the text of a single webpage.
Installation
claude skill add --url github.com/openclaw/skills/tree/main/skills/browseract-ai/web-search-scraper-api-skill
Documentation
Web Search Scraper API Skill
📖 Introduction
This skill provides users with a one-stop web page extraction service through the BrowserAct Web Search Scraper API template. It can directly extract structured markdown content from any given URL. By simply inputting the target URL, you can get clean and usable markdown data.
✨ Features
- No hallucinations, ensuring stable and precise data extraction: Pre-set workflows avoid AI generative hallucinations.
- No human-machine verification issues: No need to deal with reCAPTCHA or other verification challenges.
- No IP access restrictions or geofencing: No need to handle regional IP limitations.
- Faster execution: Compared to purely AI-driven browser automation solutions, tasks complete more quickly.
- Extremely high cost-effectiveness: Compared to AI solutions that consume a lot of Tokens, it can significantly reduce the cost of data acquisition.
🔑 API Key Guidance Process
Before running, the Agent must check the BROWSERACT_API_KEY environment variable. If it is not set, take no other action first; ask the user to provide it and wait.
In that case, the Agent must inform the user:
"Since you have not configured the BrowserAct API Key, please go to the BrowserAct Console first to get your Key."
🛠️ Input Parameters Details
The Agent should configure the following parameters flexibly, based on the user's needs, when calling the script:
- target_url
  - Type: string
  - Description: The website URL to extract content from. Supports any HTTP/HTTPS URL.
  - Example: https://www.browseract.com
🚀 Invocation Method (Recommended)
The Agent should execute the following standalone script, which returns the result in a single command:
# Example invocation
python -u ./scripts/web_search_scraper_api.py "target_url"
⏳ Execution Status Monitoring
Since the task involves automated browser operations, it may take a long time (several minutes). The script will continuously output status logs with timestamps (e.g., [14:30:05] Task Status: running) while running.
Notice for the Agent:
- While waiting for the script to return results, keep monitoring the terminal output.
- As long as the terminal keeps printing new status logs, the task is running normally; do not misjudge it as a deadlock or unresponsiveness.
- If the status remains unchanged for a long time, or the script stops producing output without returning a result, the retry mechanism can be triggered.
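The monitoring behavior above can be sketched as a polling loop. This is an illustrative sketch only: `fetch_status` is a hypothetical callable standing in for however the script queries task state, and the status names mirror the log example above:

```python
import time
from datetime import datetime

def poll_task(fetch_status, interval_s: float = 5.0, stall_limit: int = 60):
    """Poll a task until it finishes, printing timestamped status logs.

    `fetch_status` is a hypothetical callable returning a dict such as
    {"status": "running"} or {"status": "finished", "content": "..."}.
    """
    unchanged = 0
    last_status = None
    while True:
        state = fetch_status()
        status = state.get("status")
        # Matches the log format shown above, e.g. [14:30:05] Task Status: running
        print(f"[{datetime.now():%H:%M:%S}] Task Status: {status}")
        if status == "finished":
            return state
        if status == "failed":
            raise RuntimeError(state.get("error", "task failed"))
        unchanged = unchanged + 1 if status == last_status else 0
        last_status = status
        if unchanged >= stall_limit:
            # Status frozen for too long: surface it so the retry logic can act.
            raise TimeoutError("task status unchanged for too long")
        time.sleep(interval_s)
```
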
📊 Data Output Description
Upon successful execution, the script will directly parse and print the result from the API response. The result contains:
- content: The complete markdown content of the webpage.
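Pulling the `content` field out of the printed result can be sketched as below. The exact response shape is an assumption (a flat JSON object with a `content` key); the real API payload may nest it differently:

```python
import json

def extract_content(raw: str) -> str:
    """Parse the script's JSON output and return the markdown `content` field.

    Assumes the script prints a JSON object like {"content": "..."} on success.
    """
    data = json.loads(raw)
    content = data.get("content")
    if not content:
        raise ValueError("no content field in API response")
    return content
```
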
⚠️ Error Handling & Retry Mechanism
During the execution of the script, if an error occurs (such as network fluctuation or task failure), the Agent should follow this logic:
1. Check the output content:
   - If the output contains "Invalid authorization", the API Key is invalid or expired. Do not retry; guide the user to recheck and provide a correct API Key.
   - If the output does not contain "Invalid authorization" but the task execution fails (for example, the output starts with "Error:" or the returned result is empty), the Agent should automatically re-execute the script once.
2. Retry limit:
   - Automatic retry is limited to once. If the second attempt still fails, stop retrying and report the specific error message to the user.
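The retry policy above can be expressed as a small wrapper. This is a sketch under stated assumptions: `run_once` is a hypothetical callable that returns the script's output string (or raises on failure), standing in for the actual subprocess invocation:

```python
def run_with_retry(run_once, max_retries: int = 1) -> str:
    """Apply the skill's retry policy around a single-shot runner.

    "Invalid authorization" aborts immediately (bad API key, do not retry);
    any other failure triggers exactly one automatic retry.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            output = run_once()
        except Exception as exc:
            last_error = exc
            continue
        if "Invalid authorization" in output:
            # Bad or expired key: stop and ask the user for a correct one.
            raise PermissionError("API Key invalid or expired; do not retry")
        if output.strip() and not output.startswith("Error:"):
            return output
        last_error = RuntimeError(output or "empty result")
    raise RuntimeError(f"failed after {max_retries + 1} attempts: {last_error}")
```
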
🌟 Typical Use Cases
- Article Extraction: Scrape the main content of a news article link into markdown.
- Blog Post Parsing: Download the readable text from a target blog post URL.
- Webpage to Markdown: Convert any given website URL into clean markdown format.
- Documentation Scraping: Fetch the contents of a tutorial or documentation page for offline reading.
- Content Monitoring: Automatically extract the text from a specific webpage for updates.
- Data Processing: Parse the HTML of an arbitrary HTTP/HTTPS URL to structure its content.
Related Skills
agent-browser
by chulla-ceja
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
接口规范
by alexxxiong
API specification management tool: initialization, update, lookup, and search of API documentation across projects. Triggers: 'API文档', 'API规范', '接口文档', '路由解析', 'apispec', 'API lookup', 'API search'.
investment-research
by caijichang212
Perform structured investment research for a company/stock/ETF/sector using a repeatable framework: fundamentals (basics, financial statements, and business model), technical analysis (technical indicators and key price levels), industry research (industry sentiment and competitive landscape), valuation (valuation comparison/scenarios), catalysts and risks, and produce a professional research report + actionable plan. Use when the user asks for: equity/ETF analysis, earnings/financial statement breakdown, peer/industry comparison, valuation ranges, bull/base/bear scenarios, technical trend/support-resistance, or a full research memo.
Related MCP Servers
Puppeteer Browser Control
Editor's Pick · by Anthropic
Puppeteer is an MCP server that lets Claude drive a browser automatically for web scraping and testing.
This server removes the tedium of hand-writing Puppeteer scripts and suits developers who need automated web interaction, such as scraping dynamic content or running end-to-end tests. As a reference implementation, however, it may lack production-grade security hardening, so use it in a controlled environment.
Web Fetching
Editor's Pick · by Anthropic
Fetch is the official MCP reference server that lets AI fetch web pages and convert them to Markdown.
This server fixes the formatting mess that arises when AI processes raw web content directly, and suits developers who want Claude to analyze online documents or news. As a reference implementation it lacks production-grade security configuration, so you must handle anti-scraping measures and privacy risks yourself.
Brave Search
Editor's Pick · by Anthropic
Brave Search is an MCP server that lets Claude call the Brave Search API directly for real-time web information.
If you want an AI assistant to search the latest news or technical documentation, this tool bypasses the limits of traditional search and returns structured data directly. It is especially suited to developers who need real-time information, such as checking API updates or competitor moves. Note that it relies on Brave's API quota, so heavy use may be rate-limited.