本地AI文生视频
text-to-video-local-ai
by bwbernardweston18
>
安装
claude skill add --url https://github.com/openclaw/skills文档
Getting Started
Send me your text prompts and I'll handle the local AI video generation. Or just describe what you're after.
Try saying:
- "generate a two-sentence scene description into a 1080p MP4"
- "generate a 10-second video clip of a sunset over mountains from this text description"
- "generating videos from written descriptions without cloud upload for privacy-conscious creators"
Automatic Setup
On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".
Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.
Free token: Generate a UUID as client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).
Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
Text to Video Local AI — Generate Videos From Text Locally
This tool takes your text prompts and runs local AI video generation through a cloud rendering pipeline. You upload, describe what you want, and download the result.
Say you have a two-sentence scene description and want to generate a 10-second video clip of a sunset over mountains from this text description — the backend processes it in about 1-3 minutes and hands you a 1080p MP4.
Tip: shorter, specific prompts produce more accurate video results than vague long descriptions.
Matching Input to Actions
User prompts referencing text to video local ai, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
Cloud Render Pipeline Details
Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.
Base URL: https://mega-api-prod.nemovideo.ai
| Endpoint | Method | Purpose |
|---|---|---|
/api/tasks/me/with-session/nemo_agent | POST | Start a new editing session. Body: {"task_name":"project","language":"<lang>"}. Returns session_id. |
/run_sse | POST | Send a user message. Body includes app_name, session_id, new_message. Stream response with Accept: text/event-stream. Timeout: 15 min. |
/api/upload-video/nemo_agent/me/<sid> | POST | Upload a file (multipart) or URL. |
/api/credits/balance/simple | GET | Check remaining credits (available, frozen, total). |
/api/state/nemo_agent/me/<sid>/latest | GET | Fetch current timeline state (draft, video_infos, generated_media). |
/api/render/proxy/lambda | POST | Start export. Body: {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll status every 30s. |
Accepted file types: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
Skill attribution — read from this file's YAML frontmatter at runtime:
X-Skill-Source:text-to-video-local-aiX-Skill-Version: from frontmatterversionX-Skill-Platform: detect from install path (~/.clawhub/→clawhub,~/.cursor/skills/→cursor, elseunknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Reading the SSE Stream
Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working — show "⏳ Still working..." every 2 minutes.
About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
Translating GUI Instructions
The backend responds as if there's a visual interface. Map its instructions to API calls:
- "click" or "点击" → execute the action via the relevant endpoint
- "open" or "打开" → query session state to get the data
- "drag/drop" or "拖拽" → send the edit command through SSE
- "preview in timeline" → show a text summary of current tracks
- "Export" or "导出" → run the export workflow
Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.
Example timeline summary:
Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)
Common Workflows
Quick edit: Upload → "generate a 10-second video clip of a sunset over mountains from this text description" → Download MP4. Takes 1-3 minutes for a 30-second clip.
Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.
Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.
Tips and Tricks
The backend processes faster when you're specific. Instead of "make it look better", try "generate a 10-second video clip of a sunset over mountains from this text description" — concrete instructions get better results.
Max file size is 500MB. Stick to TXT, DOCX, PDF, MD for the smoothest experience.
Export as MP4 for widest compatibility across platforms and devices.
相关 Skills
Claude接口
by anthropics
面向接入 Claude API、Anthropic SDK 或 Agent SDK 的开发场景,自动识别项目语言并给出对应示例与默认配置,快速搭建 LLM 应用。
✎ 想把Claude能力接进应用或智能体,用claude-api上手快、兼容Anthropic与Agent SDK,集成路径清晰又省心
RAG架构师
by alirezarezvani
聚焦生产级RAG系统设计与优化,覆盖文档切块、检索链路、索引构建、召回评估等关键环节,适合搭建可扩展、高准确率的知识库问答与检索增强应用。
✎ 面向RAG落地,把知识库、向量检索和生成链路系统串联起来,做架构设计时更清晰,也更少踩坑。
提示工程专家
by alirezarezvani
覆盖Prompt优化、Few-shot设计、结构化输出、RAG评测与Agent工作流编排,适合分析token成本、评估LLM输出质量,并搭建可落地的AI智能体系统。
✎ 把提示优化、LLM评测到RAG与智能体设计串成一套方法,适合想系统提升AI开发效率的人。
相关 MCP 服务
顺序思维
编辑精选by Anthropic
Sequential Thinking 是让 AI 通过动态思维链解决复杂问题的参考服务器。
✎ 这个服务器展示了如何让 Claude 像人类一样逐步推理,适合开发者学习 MCP 的思维链实现。但注意它只是个参考示例,别指望直接用在生产环境里。
知识图谱记忆
编辑精选by Anthropic
Memory 是一个基于本地知识图谱的持久化记忆系统,让 AI 记住长期上下文。
✎ 帮 AI 和智能体补上“记不住”的短板,用本地知识图谱沉淀长期上下文,连续对话更聪明,数据也更可控。
PraisonAI
编辑精选by mervinpraison
PraisonAI 是一个支持自反思和多 LLM 的低代码 AI 智能体框架。
✎ 如果你需要快速搭建一个能 24/7 运行的 AI 智能体团队来处理复杂任务(比如自动研究或代码生成),PraisonAI 的低代码设计和多平台集成(如 Telegram)让它上手极快。但作为非官方项目,它的生态成熟度可能不如 LangChain 等主流框架,适合愿意尝鲜的开发者。