Datasets

by ckchzh

Browse and load ready-to-use AI/ML datasets with fast manipulation. Use when searching datasets, loading training data, transforming formats.

View Chinese version with editor review

安装

claude skill add --url github.com/openclaw/skills/tree/main/skills/ckchzh/datasets

文档

Datasets

A data processing toolkit for ingesting, transforming, querying, and managing dataset entries from the command line. All operations are logged with timestamps and stored locally.

Commands

Data Operations

Each data command works in two modes: run without arguments to view recent entries, or pass input to record a new entry.

CommandDescription
datasets ingest <input>Ingest data — record a new ingest entry or view recent ones
datasets transform <input>Transform data — record a transformation or view recent ones
datasets query <input>Query data — record a query or view recent ones
datasets filter <input>Filter data — record a filter operation or view recent ones
datasets aggregate <input>Aggregate data — record an aggregation or view recent ones
datasets visualize <input>Visualize data — record a visualization or view recent ones
datasets export <input>Export data — record an export entry or view recent ones
datasets sample <input>Sample data — record a sample or view recent ones
datasets schema <input>Schema management — record a schema entry or view recent ones
datasets validate <input>Validate data — record a validation or view recent ones
datasets pipeline <input>Pipeline management — record a pipeline step or view recent ones
datasets profile <input>Profile data — record a profile or view recent ones

Utility Commands

CommandDescription
datasets statsShow summary statistics — entry counts per category, total entries, disk usage
datasets export <fmt>Export all data to a file (formats: json, csv, txt)
datasets search <term>Search all log files for a term (case-insensitive)
datasets recentShow last 20 entries from activity history
datasets statusHealth check — version, data directory, entry count, disk usage, last activity
datasets helpShow available commands
datasets versionShow version (v2.0.0)

Data Storage

All data is stored locally at ~/.local/share/datasets/:

  • Each data command writes to its own log file (e.g., ingest.log, transform.log)
  • Entries are stored as timestamp|value pairs (pipe-delimited)
  • All actions are tracked in history.log with timestamps
  • Export generates files in the data directory (export.json, export.csv, or export.txt)

Requirements

  • Bash (with set -euo pipefail)
  • Standard Unix utilities: date, wc, du, grep, tail, cat, sed
  • No external dependencies or API keys required

When to Use

  • To log and track data processing operations (ingest, transform, query, etc.)
  • To maintain a searchable history of data pipeline activities
  • To export accumulated records in JSON, CSV, or plain text format
  • As part of larger automation or data-pipeline workflows
  • When you need a lightweight, local-only dataset operation tracker

Examples

bash
# Record a new ingest entry
datasets ingest "loaded training_data.csv 10000 rows"

# View recent transform entries
datasets transform

# Record a query
datasets query "filter by date > 2026-01-01"

# Search across all logs
datasets search "training"

# Export everything as JSON
datasets export json

# Check overall statistics
datasets stats

# View recent activity
datasets recent

# Health check
datasets status

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com 💬 Feedback & Feature Requests: https://bytesagain.com/feedback