Datasets

by ckchzh

Browse, load, and quickly manipulate ready-to-use AI/ML datasets. Use when searching for datasets, loading training data, or transforming formats.

4.5k · Data & Storage · Not scanned · March 23, 2026

Install

claude skill add --url github.com/openclaw/skills/tree/main/skills/ckchzh/datasets

Documentation

Datasets

A data processing toolkit for ingesting, transforming, querying, and managing dataset entries from the command line. All operations are logged with timestamps and stored locally.

Commands

Data Operations

Each data command works in two modes: run without arguments to view recent entries, or pass input to record a new entry.

| Command | Description |
|---|---|
| `datasets ingest <input>` | Ingest data: record a new ingest entry or view recent ones |
| `datasets transform <input>` | Transform data: record a transformation or view recent ones |
| `datasets query <input>` | Query data: record a query or view recent ones |
| `datasets filter <input>` | Filter data: record a filter operation or view recent ones |
| `datasets aggregate <input>` | Aggregate data: record an aggregation or view recent ones |
| `datasets visualize <input>` | Visualize data: record a visualization or view recent ones |
| `datasets export <input>` | Export data: record an export entry or view recent ones |
| `datasets sample <input>` | Sample data: record a sample or view recent ones |
| `datasets schema <input>` | Schema management: record a schema entry or view recent ones |
| `datasets validate <input>` | Validate data: record a validation or view recent ones |
| `datasets pipeline <input>` | Pipeline management: record a pipeline step or view recent ones |
| `datasets profile <input>` | Profile data: record a profile or view recent ones |
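The two modes above can be pictured with a short Bash sketch (Bash being the tool's stated runtime). It is a minimal illustration, not the actual `datasets` source: the function name `data_command`, the `DATASETS_DIR` override, the timestamp format, the 20-entry view limit, and the layout of `history.log` lines are all assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketch of one data command (ingest, transform, query, ...).
# DATASETS_DIR is an assumed override; the documented location is the default.
data_command() {
  local name="$1"
  shift
  local dir="${DATASETS_DIR:-$HOME/.local/share/datasets}"
  local log="$dir/$name.log"
  mkdir -p "$dir"

  # Mode 1: no input, so show the most recent entries (limit of 20 assumed).
  if [ "$#" -eq 0 ]; then
    if [ -f "$log" ]; then
      tail -n 20 "$log"
    else
      echo "No $name entries yet."
    fi
    return 0
  fi

  # Mode 2: input given, so append a timestamp|value entry (timestamp format
  # assumed) and mirror the action into history.log (line layout assumed).
  local ts
  ts="$(date '+%Y-%m-%d %H:%M:%S')"
  printf '%s|%s\n' "$ts" "$*" >> "$log"
  printf '%s|%s: %s\n' "$ts" "$name" "$*" >> "$dir/history.log"
  echo "Recorded $name entry."
}
```

With the function sourced, `data_command ingest "loaded training_data.csv 10000 rows"` appends one line to `ingest.log`, and a bare `data_command ingest` prints that line back.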

Utility Commands

| Command | Description |
|---|---|
| `datasets stats` | Show summary statistics: entry counts per category, total entries, disk usage |
| `datasets export <fmt>` | Export all data to a file (formats: json, csv, txt) |
| `datasets search <term>` | Search all log files for a term (case-insensitive) |
| `datasets recent` | Show the last 20 entries from the activity history |
| `datasets status` | Health check: version, data directory, entry count, disk usage, last activity |
| `datasets help` | Show available commands |
| `datasets version` | Show version (v2.0.0) |
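The utility commands map naturally onto the standard tools listed under Requirements. The following is a rough sketch, not the real implementation: the function names and `DATASETS_DIR` are hypothetical, and excluding `history.log` from the per-category counts and matching the search term literally with `grep -F` are guesses about the tool's behavior.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketches of three utility commands. DATASETS_DIR is an assumed
# override of the documented data directory.
datasets_dir() {
  echo "${DATASETS_DIR:-$HOME/.local/share/datasets}"
}

# stats: entry count per category log, a total, and disk usage.
datasets_stats() {
  local dir log name count total=0
  dir="$(datasets_dir)"
  for log in "$dir"/*.log; do
    [ -e "$log" ] || continue             # glob matched nothing
    name="$(basename "$log" .log)"
    [ "$name" = "history" ] && continue   # assumption: history is not a category
    count=$(( $(wc -l < "$log") ))        # arithmetic strips wc's padding
    echo "$name: $count"
    total=$(( total + count ))
  done
  echo "total: $total"
  echo "disk usage: $(du -sh "$dir" | cut -f1)"
}

# search: case-insensitive (-i), literal (-F, an assumption) match in every log.
datasets_search() {
  local dir
  dir="$(datasets_dir)"
  grep -iF -- "$1" "$dir"/*.log || echo "No matches for '$1'."
}

# recent: last 20 lines of the activity history.
datasets_recent() {
  local dir
  dir="$(datasets_dir)"
  tail -n 20 "$dir/history.log" 2>/dev/null || echo "No activity yet."
}
```

Plain `grep -i` would treat the term as a regular expression; `-F` is used here only so that a term such as `a.b` matches literally.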

Data Storage

All data is stored locally at ~/.local/share/datasets/:

  • Each data command writes to its own log file (e.g., ingest.log, transform.log)
  • Entries are stored as timestamp|value pairs (pipe-delimited)
  • All actions are tracked in history.log with timestamps
  • Export generates files in the data directory (export.json, export.csv, or export.txt)
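Because each entry is a single `timestamp|value` line, the logs can be read back with `IFS='|' read`, which puts everything after the first pipe into the last variable, so values that themselves contain `|` survive intact. The sketch uses that to write a CSV export. It is an illustration under assumptions (hypothetical function name and `DATASETS_DIR` override, a `type,timestamp,value` column layout, `history.log` skipped, values quoted in RFC 4180 style), not the tool's actual export format.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical CSV export over the per-command logs. DATASETS_DIR, the column
# layout and skipping history.log are assumptions, not the documented format.
datasets_export_csv() {
  local dir out log name ts value
  local dq='"'
  dir="${DATASETS_DIR:-$HOME/.local/share/datasets}"
  out="$dir/export.csv"
  mkdir -p "$dir"
  echo "type,timestamp,value" > "$out"
  for log in "$dir"/*.log; do
    [ -e "$log" ] || continue             # glob matched nothing
    name="$(basename "$log" .log)"
    [ "$name" = "history" ] && continue
    # With IFS='|' and two names, ts gets the text before the first '|'
    # and value gets the rest of the line, including any later '|'.
    while IFS='|' read -r ts value; do
      # Quote the value and double any embedded double quotes.
      printf '%s,%s,"%s"\n' "$name" "$ts" "${value//$dq/$dq$dq}"
    done < "$log" >> "$out"
  done
  echo "$out"
}
```

The function prints the path of the file it wrote, which is `export.csv` inside the data directory.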

Requirements

  • Bash (with set -euo pipefail)
  • Standard Unix utilities: date, wc, du, grep, tail, cat, sed
  • No external dependencies or API keys required

When to Use

  • To log and track data processing operations (ingest, transform, query, etc.)
  • To maintain a searchable history of data pipeline activities
  • To export accumulated records in JSON, CSV, or plain text format
  • As part of larger automation or data-pipeline workflows
  • When you need a lightweight, local-only dataset operation tracker

Examples

```bash
# Record a new ingest entry
datasets ingest "loaded training_data.csv 10000 rows"

# View recent transform entries
datasets transform

# Record a query
datasets query "filter by date > 2026-01-01"

# Search across all logs
datasets search "training"

# Export everything as JSON
datasets export json

# Check overall statistics
datasets stats

# View recent activity
datasets recent

# Health check
datasets status
```

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com 💬 Feedback & Feature Requests: https://bytesagain.com/feedback

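Related Skills

Tech Stack Evaluation

by alirezarezvani

Universal
Popular

Compares frameworks, databases, and cloud services, quantifying 5-year TCO, security risk, ecosystem vitality, and migration complexity. Suited to technology selection, stack upgrades, and replacement-roadmap decisions.

Helps you compare tech stacks systematically: instead of looking only at features, it quantifies TCO, security, and ecosystem health together, so selection and migration decisions are better grounded.

Data & Storage
Not scanned · 17.5k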

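Senior Data Scientist

by alirezarezvani

Universal
Popular

Covers experiment design, feature engineering, predictive modeling, causal inference, and model evaluation. Suited to A/B testing, time-series analysis, and production-grade ML in Python/R/SQL, supporting data-driven decisions.

Takes you from A/B tests and causal analysis through predictive modeling in one place, pairing rigorous statistical methods with business communication; especially useful for turning data findings into action.

Data & Storage
Not scanned · 17.5k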

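Senior Architect

by alirezarezvani

Universal
Popular

Suited to system design reviews, ADR records, and scalability planning. Analyzes dependencies and coupling, weighs monolith vs. microservices as well as database and tech-stack choices, and outputs Mermaid, PlantUML, and ASCII architecture diagrams.

For system design, technology selection, and scaling plans, it helps you untangle architecture decisions and dependencies faster and can produce Mermaid/PlantUML diagrams directly, which makes design discussions much more efficient.

Data & Storage
Not scanned · 17.5k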

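Related MCP Servers

by Anthropic

Popular

PostgreSQL is an MCP server that lets Claude query and manage your database directly.

This server removes the pain of writing SQL queries by hand and is especially suited to data analysts and backend developers who want to explore a database schema quickly. Because it is a reference implementation, however, assess the security risks carefully before using it in production, and don't expect it to handle complex transactions.

Data & Storage
86.9k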

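SQLite Database

Editor's Pick

by Anthropic

Popular

SQLite is an MCP server that lets AI query local databases directly for data analysis.

This server solves the problem of AI being unable to access SQLite databases directly and suits developers who need to analyze local datasets quickly. As a reference implementation, however, it may lack production-grade security features, so using it in a controlled environment is recommended.

Data & Storage
86.6k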

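by Firecrawl

Popular

Firecrawl is an MCP server that lets AI scrape web pages directly and extract structured data.

It spares you the hassle of writing crawlers by hand and gives Claude direct access to dynamic web content. It is best suited to researchers and developers who need real-time data, for example to monitor competitor prices or collect news. Note, however, that it depends on a third-party API, which may raise privacy and cost concerns.

Data & Storage
6.5k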
