Twitter Scraper

Twint

by ckchzh

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API. Tags: twitter-scraper, python, elasticsearch, kibana, osint.

4.5k | Search & Retrieval | Not scanned | March 23, 2026

Installation

claude skill add --url github.com/openclaw/skills/tree/main/skills/ckchzh/social-scraper

Documentation

Social Scraper

Social Scraper v2.0.0 — a general-purpose utility toolkit for logging, tracking, and managing data entries from the command line. Each command records timestamped entries into its own log file and supports review of recent history.

Commands

The script (scripts/script.sh) exposes the following commands via a case dispatcher:

| Command | Description |
| --- | --- |
| run <input> | Record a "run" entry. Without args, shows the 20 most recent run entries. |
| check <input> | Record a "check" entry. Without args, lists recent check entries. |
| convert <input> | Record a "convert" entry. Without args, lists recent convert entries. |
| analyze <input> | Record an "analyze" entry. Without args, lists recent analyze entries. |
| generate <input> | Record a "generate" entry. Without args, lists recent generate entries. |
| preview <input> | Record a "preview" entry. Without args, lists recent preview entries. |
| batch <input> | Record a "batch" entry. Without args, lists recent batch entries. |
| compare <input> | Record a "compare" entry. Without args, lists recent compare entries. |
| export <input> | Record an "export" entry. Without args, lists recent export entries. |
| config <input> | Record a "config" entry. Without args, lists recent config entries. |
| status <input> | Record a "status" entry. Without args, lists recent status entries. |
| report <input> | Record a "report" entry. Without args, lists recent report entries. |
| stats | Show summary statistics across all log files (entry counts per type, total, disk usage). |
| export <fmt> | Export all data in json, csv, or txt format to $DATA_DIR/export.<fmt>. |
| search <term> | Search all log files for a term (case-insensitive grep). |
| recent | Show the 20 most recent lines from history.log. |
| status | Health check: shows version, data directory, total entries, disk usage, last activity. |
| help | Display the full help/usage message. |
| version | Print social-scraper v2.0.0. |

Note: The export and status commands appear twice in the case statement. The first match (entry-logging form) takes precedence. The standalone _export and _status helper functions are reachable only if the entry-logging branches are bypassed.
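The precedence rule in this note can be reproduced in isolation. The sketch below is not the actual scripts/script.sh; `dispatch` and its messages are hypothetical names, used only to show that Bash runs the first matching case branch, so a later branch with the same pattern is never reached.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dispatcher: the "export" pattern appears twice on purpose.
dispatch() {
  local cmd="${1:-}"
  shift || true   # tolerate being called with no arguments
  case "$cmd" in
    export) echo "entry-logging branch: $*" ;;      # first match wins
    export) echo "standalone export branch: $*" ;;  # never reached
    *)      echo "unknown command: $cmd" ;;
  esac
}

dispatch export csv
```

Giving the standalone forms distinct patterns (or checking the argument inside a single branch) is one way such a collision could be avoided.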

How Each Entry Command Works

  1. If called without arguments, it tails the last 20 lines of <command>.log.
  2. If called with arguments, it:
    • Timestamps the input (YYYY-MM-DD HH:MM|<input>)
    • Appends it to $DATA_DIR/<command>.log
    • Prints confirmation with the current total count
    • Logs the action to history.log
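The steps above can be sketched as a single Bash function. This is a minimal illustration under stated assumptions, not the script's actual code: `log_entry` is a hypothetical name, the confirmation message wording is invented, and DATA_DIR falls back to a throwaway temporary directory so the sketch never touches real data.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumption: a temporary directory stands in for ~/.local/share/social-scraper/.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
mkdir -p "$DATA_DIR"

# Hypothetical helper covering both modes of an entry command.
log_entry() {
  local cmd="$1"
  shift
  local file="$DATA_DIR/$cmd.log"
  if [ "$#" -eq 0 ]; then
    # No arguments: show the last 20 lines, if the log exists yet.
    if [ -f "$file" ]; then tail -n 20 "$file"; fi
    return 0
  fi
  local input="$*"
  # Append "YYYY-MM-DD HH:MM|<input>" to the per-command log.
  printf '%s|%s\n' "$(date '+%Y-%m-%d %H:%M')" "$input" >> "$file"
  # Record the action in the unified history log.
  printf '%s %s: %s\n' "$(date '+%m-%d %H:%M')" "$cmd" "$input" >> "$DATA_DIR/history.log"
  # Confirm with the running total for this command.
  local total
  total=$(grep -c '' "$file")
  echo "Saved $cmd entry (total: $total)"
}

log_entry run "scraped @user timeline 500 tweets"   # record an entry
log_entry run                                        # list recent entries
```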

Data Storage

All data is stored as plain-text log files under:

code
~/.local/share/social-scraper/
├── run.log
├── check.log
├── convert.log
├── analyze.log
├── generate.log
├── preview.log
├── batch.log
├── compare.log
├── export.log
├── config.log
├── status.log
├── report.log
└── history.log          # unified activity log

Each log line uses pipe-delimited format: YYYY-MM-DD HH:MM|<value>

The history.log uses: MM-DD HH:MM <command>: <value>
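For downstream processing, a pipe-delimited entry line can be split with plain parameter expansion. The sketch below assumes only the documented line format; `parse_entry` is a hypothetical helper and the sample line is made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Split one "YYYY-MM-DD HH:MM|<value>" line into timestamp and value.
# Only the first pipe separates the fields; later pipes stay in the value.
parse_entry() {
  local line="$1"
  local ts="${line%%|*}"     # text before the first pipe
  local value="${line#*|}"   # text after the first pipe
  printf 'timestamp=%s\nvalue=%s\n' "$ts" "$value"
}

parse_entry "2026-03-23 14:05|scraped @user timeline 500 tweets"
```

Splitting on the first pipe only matters because logged values are free text and may themselves contain pipe characters.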

Requirements

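  • Bash 4.0+ (the script uses local variables and set -euo pipefail)
  • Standard Unix utilities: date, wc, du, tail, cat, basename, grep, sed
  • No external dependencies, API keys, or network access required
  • Works on Linux and macOS (macOS ships Bash 3.2 by default, so a newer Bash must be installed to meet the 4.0+ requirement)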

When to Use

  1. Quick data logging — when you need a lightweight CLI to record timestamped scraping results without setting up a database
  2. Scrape tracking — log each scraping run with parameters and review them later with recent or search
  3. Batch scrape records — track batch scraping jobs for auditing and reproducibility
  4. Data export — pull all logged entries into JSON, CSV, or TXT for reporting or integration with analytics pipelines
  5. Health monitoring — use stats and status to get a quick overview of scraping activity volume and disk usage

Examples

Log a scraping run and review history

bash
# Record a scraping session
bash scripts/script.sh run "scraped @user timeline 500 tweets"

# Check recent runs
bash scripts/script.sh run

Analyze and report on collected data

bash
# Log an analysis
bash scripts/script.sh analyze "sentiment breakdown for #topic"

# Generate a report entry
bash scripts/script.sh report "weekly scrape summary: 3200 entries"

Search across all logs

bash
bash scripts/script.sh search "timeline"
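The search command is described as a case-insensitive grep over all log files. A minimal sketch of that idea follows, assuming the logs are the *.log files in DATA_DIR; `search_logs` and the sample entries are hypothetical, and a temporary directory is used so the sketch is self-contained.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory with sample entries (made up for illustration).
DATA_DIR="$(mktemp -d)"
echo "2026-03-23 14:05|scraped @user Timeline 500 tweets" > "$DATA_DIR/run.log"
echo "2026-03-23 14:10|sentiment breakdown for #topic" > "$DATA_DIR/analyze.log"

# Hypothetical helper: case-insensitive search across every log file.
search_logs() {
  local term="$1"
  # -i ignores case, -H prints the file name before each match.
  # grep exits 1 when nothing matches, so "|| true" keeps set -e from aborting.
  grep -iH -- "$term" "$DATA_DIR"/*.log || true
}

search_logs "timeline"
```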

Export everything as CSV

bash
bash scripts/script.sh export csv
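# Intended output: ~/.local/share/social-scraper/export.csv
# (Per the note under Commands, the entry-logging "export" branch matches first,
# so this call may record an "export" entry instead of writing the file.)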
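How the pipe-delimited logs might be turned into CSV is sketched below. The script's real export format is not documented here, so the column layout (type,timestamp,value), the helper name `export_csv`, and the decision to skip history.log are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory with sample data (made up for illustration).
DATA_DIR="$(mktemp -d)"
echo '2026-03-23 14:05|scraped "@user" timeline' > "$DATA_DIR/run.log"
echo '03-23 14:05 run: scraped "@user" timeline' > "$DATA_DIR/history.log"

# Hypothetical CSV export with an assumed column layout.
export_csv() {
  local out="$DATA_DIR/export.csv" q='"' f type line ts value
  echo "type,timestamp,value" > "$out"
  for f in "$DATA_DIR"/*.log; do
    [ -e "$f" ] || continue
    type="$(basename "$f" .log)"
    # history.log uses a different line format, so it is left out here.
    if [ "$type" = "history" ]; then continue; fi
    while IFS= read -r line; do
      ts="${line%%|*}"       # text before the first pipe
      value="${line#*|}"     # text after the first pipe
      # Double any embedded quotes so the value is a valid quoted CSV field.
      printf '%s,%s,"%s"\n' "$type" "$ts" "${value//$q/$q$q}" >> "$out"
    done < "$f"
  done
  echo "$out"
}

cat "$(export_csv)"
```

Quoting the value column keeps commas and quotes in free-text entries from breaking the row structure.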

View summary statistics

bash
bash scripts/script.sh stats
# Shows per-type counts, totals, and disk usage
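The kind of summary that stats reports (per-type counts, a total, and disk usage) can be approximated with a short loop. This is a sketch under assumptions, not the script's implementation: `show_stats`, the output layout, and the sample entries are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory with sample entries (made up for illustration).
DATA_DIR="$(mktemp -d)"
printf '%s\n' "2026-03-23 14:05|first run" "2026-03-23 15:00|second run" > "$DATA_DIR/run.log"
printf '%s\n' "2026-03-23 14:10|one check" > "$DATA_DIR/check.log"

# Hypothetical helper: per-type entry counts, a total, and disk usage.
show_stats() {
  local f n total=0
  for f in "$DATA_DIR"/*.log; do
    [ -e "$f" ] || continue          # skip the literal pattern when no logs exist
    n=$(grep -c '' "$f" || true)     # count lines; prints 0 for an empty file
    printf '%-10s %s\n' "$(basename "$f" .log)" "$n"
    total=$((total + n))
  done
  echo "total      $total"
  echo "disk usage $(du -sh "$DATA_DIR" | cut -f1)"
}

show_stats
```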

Configuration

Set the DATA_DIR variable (or modify it in the script) to change the storage directory. Default: ~/.local/share/social-scraper/
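Whether the script reads DATA_DIR from the environment is not stated above. A common Bash pattern for this kind of override is sketched below as an assumption; `resolve_data_dir` is a hypothetical helper, not a function from the script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed pattern: use DATA_DIR from the environment when it is set,
# otherwise fall back to the documented default location.
resolve_data_dir() {
  echo "${DATA_DIR:-$HOME/.local/share/social-scraper}"
}

resolve_data_dir
```

If the script follows this pattern, a one-off override would look like `DATA_DIR=/tmp/scrape-logs bash scripts/script.sh run "test entry"`; otherwise the variable has to be edited in the script itself.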

Output

All commands print to stdout. Redirect to a file as needed:

bash
bash scripts/script.sh report > weekly-report.txt

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com
