通义音频实验室

qwen-audio-lab

by aliyx

Hybrid text-to-speech, reusable voice cloning, and narrated audio generation for macOS plus Aliyun Qwen. Use when the user wants to convert text into speech, clone and reuse a voice from a reference recording, generate narration files from plain text or text files, or create PPT speaker-note voiceovers.

4.5k内容与创意未扫描2026年4月20日

安装

claude skill add --url https://github.com/openclaw/skills

文档

Qwen Audio Lab

Use this skill for text-to-speech on macOS or with Aliyun Qwen.

Choose the backend

  • Use mac-say for fast local playback, notifications, and low-friction speech on a Mac.
  • Use qwen-tts when the user wants better naturalness, reusable output files, custom voices, or voice cloning.
  • If DASHSCOPE_API_KEY is missing, fall back to mac-say for local playback.

Environment

  • DASHSCOPE_API_KEY: required for Qwen synthesis and voice cloning.
  • QWEN_AUDIO_REGION: optional, cn (default) or intl.
  • QWEN_AUDIO_OUTPUT_DIR: optional directory for generated audio files. Defaults to ~/.openclaw/data/qwen-audio-lab/output.
  • QWEN_AUDIO_STATE_DIR: optional directory for local state such as remembered voices. Defaults to ~/.openclaw/data/qwen-audio-lab/state.

Commands

Run all commands through:

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py <command> [...]

Preferred high-level commands

Use these first for most user-facing narration tasks:

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text --text "这是要转成语音的正文"
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file --text-file /path/to/script.txt
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt --ppt /path/to/file.pptx

Use the older commands only when you specifically want the legacy workflow names. Generated audio and remembered voice state now default to ~/.openclaw/data/qwen-audio-lab/ instead of the skill folder.

Local macOS speech

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py mac-say \
  --text "开会了,别忘了带电脑" \
  --voice Tingting

Qwen TTS from inline text

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --text "你好,我是你的语音助手。" \
  --voice Cherry \
  --model qwen3-tts-flash \
  --language-type Chinese \
  --download

Qwen TTS from a text file

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --text-file /path/to/script.txt \
  --voice Cherry \
  --download

Qwen TTS from stdin

bash
cat /path/to/script.txt | python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
  --stdin \
  --voice Cherry \
  --download

Clone a voice

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py clone-voice \
  --audio /path/to/reference.mp3 \
  --name claw-voice-01 \
  --target-model qwen3-tts-vc-2026-01-22
  • Keep the cloning target-model aligned with the synthesis model family.
  • Use a clean speech sample with minimal background noise.
  • Ask before cloning a third party voice when consent is unclear.

Design a voice from a text prompt

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py design-voice \
  --prompt "沉稳的中年男性播音员,音色低沉浑厚,适合纪录片旁白。" \
  --name doc-voice-01 \
  --target-model qwen3-tts-vd-2026-01-26 \
  --preview-format wav

Legacy command: reuse the latest cloned voice

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py speak-last-cloned \
  --text "你好,这是我的声音测试。" \
  --download

High-level narration from any text source

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text \
  --text "这是要转成语音的正文" \
  --output narration.wav

python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file \
  --text-file /path/to/script.txt
  • Default voice source is last-cloned.
  • Use --voice-source last-designed to use the latest designed voice instead.
  • Use --voice and optionally --model to force a specific voice id and synthesis model.

Legacy command: narrate PPT speaker notes with the latest cloned voice

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py ppt-own-voice   --ppt "/path/to/file.pptx"

High-level PPT narration

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt   --ppt "/path/to/file.pptx"
  • Default voice source is last-cloned.
  • Use --voice-source last-designed to switch to the latest designed voice.
  • Use --voice and optionally --model to force a specific voice id and synthesis model.
  • Keep ppt-own-voice as the backward-compatible alias for the original workflow.

Inspect or manage remembered voices

bash
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py list-voices
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py show-last-voice --kind cloned
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py delete-voice --voice claw-voice-01

Workflow rules

  • Reuse an existing cloned voice before asking for a new sample.
  • Ask for a reference recording if the user wants their own voice and no cloned voice exists yet.
  • Prefer the narrate-* commands as the primary high-level interface for narration tasks.
  • Keep speak-last-cloned and ppt-own-voice for backward compatibility with older workflows.
  • Keep only final outputs by default after segmented synthesis unless the user explicitly asks to keep fragments.

相关 Skills

文档共著

by anthropics

Universal
热门

围绕文档、提案、技术规格、决策记录等写作任务,按上下文收集、结构迭代、读者测试三步协作共创,减少信息遗漏,写出更清晰、经得起他人阅读的内容。

写文档、方案或技术规格时容易思路散、信息漏,它用结构化共著流程帮你高效传递上下文、反复打磨内容,还能从读者视角做验证。

内容与创意
未扫描139.0k

内部沟通

by anthropics

Universal
热门

按公司常用模板和语气快速起草内部沟通内容,覆盖 3P 更新、状态报告、领导汇报、项目进展、事故复盘、FAQ 与 newsletter,适合需要统一格式的团队沟通场景。

按公司偏好的模板快速产出状态汇报、领导更新和 FAQ,既省去反复改稿,也让内部沟通更统一、更专业。

内容与创意
未扫描139.0k

平面设计

by anthropics

Universal
热门

先生成视觉哲学,再落地成原创海报、艺术画面或其他静态设计,输出 .png/.pdf,强调构图、色彩与空间表达,适合需要高完成度视觉成品的场景。

做海报、插画或静态视觉稿时,用它能快速产出兼顾美感与版式的PNG/PDF成品,原创设计更省心,也更适合规避版权风险。

内容与创意
未扫描139.0k

相关 MCP 服务

免费的加密新闻聚合 MCP,汇集 Bitcoin、Ethereum、DeFi、Solana 与 altcoins 资讯源。

内容与创意
212

by ProfessionalWiki

热门

让 Large Language Model 客户端无缝连接任意 MediaWiki 站点,可创建、更新、搜索页面,并通过 OAuth 2.0 安全管理内容。

内容与创意16 个工具
94

by roomi-fields

热门

Automate Google NotebookLM — Q&A with citations, audio, video, content generation

内容与创意
79

评论