Tongyi Audio Lab
qwen-audio-lab
by aliyx
Hybrid text-to-speech, reusable voice cloning, and narrated audio generation for macOS plus Aliyun Qwen. Use when the user wants to convert text into speech, clone and reuse a voice from a reference recording, generate narration files from plain text or text files, or create PPT speaker-note voiceovers.
Install
claude skill add --url https://github.com/openclaw/skills
Documentation
Qwen Audio Lab
Use this skill for text-to-speech on macOS or with Aliyun Qwen.
Choose the backend
- Use mac-say for fast local playback, notifications, and low-friction speech on a Mac.
- Use qwen-tts when the user wants better naturalness, reusable output files, custom voices, or voice cloning.
- If DASHSCOPE_API_KEY is missing, fall back to mac-say for local playback.
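The backend rules above can be sketched as a small decision function. This is only an illustration: choose_backend and its needs_qwen_features parameter are hypothetical names, not part of qwen_audio.py, and the real skill may weigh these conditions differently.

```python
def choose_backend(env, needs_qwen_features):
    """Return "qwen-tts" or "mac-say" following the rules listed above.

    env: a mapping of environment variables (for example os.environ).
    needs_qwen_features: True when the task needs better naturalness,
        reusable output files, custom voices, or voice cloning.
    Hypothetical helper; not a function of the real script.
    """
    has_key = bool(env.get("DASHSCOPE_API_KEY"))
    if needs_qwen_features and has_key:
        return "qwen-tts"
    # Simple local playback, or no API key available: fall back to mac-say.
    return "mac-say"
```

Calling choose_backend(os.environ, needs_qwen_features=True) on a machine without DASHSCOPE_API_KEY returns "mac-say", which matches the fallback rule.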
Environment
- DASHSCOPE_API_KEY: required for Qwen synthesis and voice cloning.
- QWEN_AUDIO_REGION: optional, cn (default) or intl.
- QWEN_AUDIO_OUTPUT_DIR: optional directory for generated audio files. Defaults to ~/.openclaw/data/qwen-audio-lab/output.
- QWEN_AUDIO_STATE_DIR: optional directory for local state such as remembered voices. Defaults to ~/.openclaw/data/qwen-audio-lab/state.
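As a rough sketch of how these variables and their defaults fit together, the following resolves them from an environment mapping. The resolve_settings function and the returned dictionary keys are assumptions made for illustration, and rejecting an unknown region is also an assumption; the real script may handle that case differently.

```python
from pathlib import Path

# Default data root taken from the documented defaults above.
DEFAULT_DATA_DIR = Path("~/.openclaw/data/qwen-audio-lab")


def resolve_settings(env):
    """Resolve the documented variables from a mapping such as os.environ.

    Hypothetical helper; key names in the result are illustrative only.
    """
    region = env.get("QWEN_AUDIO_REGION") or "cn"
    if region not in ("cn", "intl"):
        # Assumption: treat anything other than the two documented values as an error.
        raise ValueError(f"unsupported QWEN_AUDIO_REGION: {region!r}")
    output_dir = Path(env.get("QWEN_AUDIO_OUTPUT_DIR") or DEFAULT_DATA_DIR / "output")
    state_dir = Path(env.get("QWEN_AUDIO_STATE_DIR") or DEFAULT_DATA_DIR / "state")
    return {
        "api_key": env.get("DASHSCOPE_API_KEY"),  # None when unset
        "region": region,
        "output_dir": output_dir.expanduser(),
        "state_dir": state_dir.expanduser(),
    }
```

Passing os.environ gives the effective settings for the current shell; passing a plain dict makes it easy to check the defaults without touching the real environment.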
Commands
Run all commands through:
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py <command> [...]
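When the script is driven from Python rather than a shell, the same invocation pattern can be wrapped in a small helper. This is a minimal sketch: build_command and run_command are hypothetical names, and only the script path and the `<command> [...]` shape come from the line above.

```python
import os
import subprocess

# Script path as documented above; "~" is expanded because no shell is involved.
SCRIPT = os.path.expanduser("~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py")


def build_command(command, *args):
    """Return the argv list for `python3 qwen_audio.py <command> [...]`."""
    return ["python3", SCRIPT, command, *args]


def run_command(command, *args):
    """Run the script and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(command, *args), check=True)
```

For example, run_command("narrate-file", "--text-file", "/path/to/script.txt") corresponds to the narrate-file command shown in this document. Passing arguments as a list avoids shell quoting problems with text that contains spaces or quotes.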
Preferred high-level commands
Use these first for most user-facing narration tasks:
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text --text "这是要转成语音的正文"
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file --text-file /path/to/script.txt
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt --ppt /path/to/file.pptx
Use the older commands only when you specifically want the legacy workflow names.
Generated audio and remembered voice state now default to ~/.openclaw/data/qwen-audio-lab/ instead of the skill folder.
Local macOS speech
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py mac-say \
--text "开会了,别忘了带电脑" \
--voice Tingting
Qwen TTS from inline text
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
--text "你好,我是你的语音助手。" \
--voice Cherry \
--model qwen3-tts-flash \
--language-type Chinese \
--download
Qwen TTS from a text file
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
--text-file /path/to/script.txt \
--voice Cherry \
--download
Qwen TTS from stdin
cat /path/to/script.txt | python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py qwen-tts \
--stdin \
--voice Cherry \
--download
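The stdin form is also convenient when the text already lives in a Python string, since no temporary file is needed. The sketch below assumes the script reads UTF-8 from stdin, which the source does not state; stdin_request and synthesize_from_string are hypothetical helper names.

```python
import os
import subprocess

SCRIPT = os.path.expanduser("~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py")


def stdin_request(text, voice="Cherry"):
    """Return (argv, payload) for the `qwen-tts --stdin` form shown above."""
    argv = ["python3", SCRIPT, "qwen-tts", "--stdin", "--voice", voice, "--download"]
    # Assumption: the script expects UTF-8 encoded text on stdin.
    return argv, text.encode("utf-8")


def synthesize_from_string(text, voice="Cherry"):
    """Pipe the text to the script, equivalent to `cat file | ... --stdin`."""
    argv, payload = stdin_request(text, voice)
    return subprocess.run(argv, input=payload, check=True)
```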
Clone a voice
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py clone-voice \
--audio /path/to/reference.mp3 \
--name claw-voice-01 \
--target-model qwen3-tts-vc-2026-01-22
- Keep the cloning target-model aligned with the synthesis model family.
- Use a clean speech sample with minimal background noise.
- Ask before cloning a third party's voice when consent is unclear.
Design a voice from a text prompt
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py design-voice \
--prompt "沉稳的中年男性播音员,音色低沉浑厚,适合纪录片旁白。" \
--name doc-voice-01 \
--target-model qwen3-tts-vd-2026-01-26 \
--preview-format wav
Legacy command: reuse the latest cloned voice
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py speak-last-cloned \
--text "你好,这是我的声音测试。" \
--download
High-level narration from any text source
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-text \
--text "这是要转成语音的正文" \
--output narration.wav
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-file \
--text-file /path/to/script.txt
- Default voice source is last-cloned.
- Use --voice-source last-designed to use the latest designed voice instead.
- Use --voice and optionally --model to force a specific voice id and synthesis model.
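The voice selection options above can be sketched as an argument builder for narrate-text. The function name narrate_text_args is hypothetical. Two behaviors here are assumptions rather than documented facts: an explicit --voice is treated as taking precedence over --voice-source, and --model is only passed together with --voice.

```python
def narrate_text_args(text, voice_source="last-cloned", voice=None, model=None, output=None):
    """Build the argument list for the narrate-text command.

    Hypothetical helper; only the flag names come from the documentation above.
    """
    args = ["narrate-text", "--text", text]
    if voice is not None:
        # Force a specific voice id, optionally with a synthesis model.
        args += ["--voice", voice]
        if model is not None:
            args += ["--model", model]
    elif voice_source != "last-cloned":
        # last-cloned is the documented default, so the flag is omitted for it.
        args += ["--voice-source", voice_source]
    if output is not None:
        args += ["--output", output]
    return args
```

With no options, the result is just the command and the text, so the script falls back to the latest cloned voice. Passing voice_source="last-designed" adds the --voice-source flag described in the list above.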
Legacy command: narrate PPT speaker notes with the latest cloned voice
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py ppt-own-voice --ppt "/path/to/file.pptx"
High-level PPT narration
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py narrate-ppt --ppt "/path/to/file.pptx"
- Default voice source is last-cloned.
- Use --voice-source last-designed to switch to the latest designed voice.
- Use --voice and optionally --model to force a specific voice id and synthesis model.
- Keep ppt-own-voice as the backward-compatible alias for the original workflow.
Inspect or manage remembered voices
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py list-voices
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py show-last-voice --kind cloned
python3 ~/.openclaw/skills/qwen-audio-lab/scripts/qwen_audio.py delete-voice --voice claw-voice-01
Workflow rules
- Reuse an existing cloned voice before asking for a new sample.
- Ask for a reference recording if the user wants their own voice and no cloned voice exists yet.
- Prefer the narrate-* commands as the primary high-level interface for narration tasks.
- Keep speak-last-cloned and ppt-own-voice for backward compatibility with older workflows.
- Keep only final outputs by default after segmented synthesis unless the user explicitly asks to keep fragments.