Audio Content Generator Skill: Usage Guide

2026-03-30 | Source: 网淘吧

🎙️ Audio Content Generator

Generate high-quality audiobooks, podcasts, or educational audio on demand, using AI-written scripts and ElevenLabs text-to-speech.

Quick Start

Create an audiobook chapter:

User: "Create a 5-minute audiobook chapter about a dragon discovering friendship"

Generate a podcast:

User: "Make a 10-minute podcast about the history of coffee"

Create educational content:

User: "Generate a 15-minute educational audio explaining how neural networks work"

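Content Formats

Audiobooks

Style: narrative storytelling with emotional depth

A clear beginning, middle, and end
Descriptive language and vivid imagery
Dramatic pacing with deliberate pauses
An emotional tone that matches the story
Voice effects such as [whispers], [excited], and [serious] for added impact

Example structure: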

[Opening hook - set the scene]
[long pause]

[Story development with character emotions]
[short pause] between sentences
[long pause] between paragraphs

[Climax with dramatic tension]
[long pause]

[Resolution and emotional closure]

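Podcasts

Style: conversational and engaging

A warm, welcoming opening (15-30 seconds)
Main content that flows naturally
Transitions between topics
A memorable closing that sums up the key takeaways
A conversational tone throughout

Example structure: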

**Intro:** "Welcome to [topic]. I'm excited to share..."
[short pause]

**Main Content:** "Let's start with... [topic 1]"
[long pause] between segments

**Outro:** "Thanks for listening! Remember..."

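Educational Content

Style: clear explanations that support learning

A concise introduction to the complex topic
Step-by-step breakdowns
Real-world examples and analogies
A recap of key concepts at the end
Enthusiastic delivery, with [excited] used to emphasize important points

Example structure: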

**Introduction:** What is [topic], and why does it matter?

**Main Content:**
- Concept 1: Explanation + Example
- Concept 2: Explanation + Example
- Concept 3: Explanation + Example

**Summary:** Key takeaways and next steps

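Duration Guidelines

Word count to duration:

5 minutes ≈ 375 words
10 minutes ≈ 750 words
15 minutes ≈ 1,125 words
20 minutes ≈ 1,500 words
30 minutes ≈ 2,250 words

Pacing reference: an average rate of about 75 words per minute

Practical range:

Minimum: 2 minutes (about 150 words)
Maximum: 30 minutes (about 2,250 words)
Ideal: 5-15 minutes (best engagement)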
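One way to check a draft against these ranges is to estimate its duration from the word count. The sketch below is a hypothetical POSIX shell helper, not part of the skill: `estimate_minutes` assumes the 75 words-per-minute figure above and strips bracketed tags such as [long pause] before counting, since they are not read aloud as words.

```bash
# Hypothetical helper: estimate spoken minutes for a script file, using the
# guide's planning rate of 75 words per minute.
WORDS_PER_MINUTE=75

estimate_minutes() {
  # Drop bracketed tags like [long pause] so they are not counted as words,
  # then count whitespace-separated words.
  words=$(sed 's/\[[^]]*]//g' "$1" | wc -w | tr -d ' ')
  # Round to the nearest whole minute using integer arithmetic.
  echo $(( (words + WORDS_PER_MINUTE / 2) / WORDS_PER_MINUTE ))
}
```

For example, running `estimate_minutes script.txt` on a 750-word script prints `10`.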

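Workflow

Step 1: Understand the Request

Parse the user's request to determine:

Content type (audiobook, podcast, or educational; infer it from the topic if not stated)
Topic (what the content should cover)
Target duration (in minutes)
Tone (dramatic, casual, educational, etc.)
Special requirements (a particular tone of voice, points to emphasize)

Step 2: Calculate the Word Count

target_words = target_minutes × 75

Example: 10 minutes = 10 × 75 = 750 words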
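The calculation, together with the 2-30 minute range from the duration guidelines, can be written as a small shell function. `target_words` is an illustrative name, and rejecting out-of-range requests (rather than silently clamping them) is an assumption about how the limits should be enforced.

```bash
# Hypothetical helper: target word count for a requested duration, using the
# guide's 75 words/minute rate and its 2-30 minute practical range.
target_words() {
  case $1 in
    ''|*[!0-9]*) echo "error: minutes must be a whole number" >&2; return 1 ;;
  esac
  if [ "$1" -lt 2 ] || [ "$1" -gt 30 ]; then
    echo "error: $1 minutes is outside the 2-30 minute range" >&2
    return 1
  fi
  echo $(( $1 * 75 ))
}
```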

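Step 3: Generate the Script

Write the complete script according to these rules:

Content guidelines:

Open strong with an engaging hook
Keep a natural, conversational flow
Use active voice and simple sentence structures
Include relevant examples and stories
End with a satisfying conclusion

Formatting rules: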

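Add [short pause] after sentences (sparingly, not after every sentence)
Add [long pause] between paragraphs or major sections
Use voice effects strategically: [whispers], [shouts], [excited], [serious], [sarcastic], [sings], [laughs]
Write numbers as words: "twenty-three", not "23"
Spell out abbreviations on first use: "AI, or artificial intelligence"
Avoid complex punctuation (dashes are fine, but semicolons don't read well aloud)
Remove markup formatting before text-to-speech conversion

Step 4: Present the Script

Show the script to the user and ask:

Here's the [format] script I've created (approximately [length] minutes):

[Display the script]

Would you like me to:
1. Generate the audio now
2. Make changes to the script
3. Adjust the length or tone

Step 5: Handle User Feedback

If the user requests changes:

Regenerate the script with adjustments
Maintain the target word count
Present the revised version

If the user approves:

Proceed to audio generation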

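Step 6: Generate the Audio

Format the script for TTS:

Remove any remaining markdown formatting (headings, bold, italics)
Make sure voice effects use the correct [effect] format
Check that pauses are placed appropriately
Verify that numbers and abbreviations are spelled out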
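Parts of this checklist can be automated. The sketch below assumes the script is a plain-text file; `prepare_for_tts` and `check_digits` are hypothetical helpers. They cover only the mechanical items (markdown removal and a check for leftover numerals); pause placement and spelled-out abbreviations still need review.

```bash
# Hypothetical helpers for the mechanical parts of the TTS checklist.

# prepare_for_tts FILE: print FILE with markdown removed. Strips leading
# #-style headings, every asterisk (bold and italic markers) and __bold__
# markers; bracketed effect tags such as [long pause] are left untouched.
prepare_for_tts() {
  sed -e 's/^#\{1,6\}[[:space:]]*//' \
      -e 's/[*]//g' \
      -e 's/__//g' "$1"
}

# check_digits FILE: print lines that still contain numerals (with line
# numbers) and return 1 so they can be spelled out; return 0 otherwise.
check_digits() {
  if grep -n '[0-9]' "$1"; then
    echo "warning: spell out the numbers above before TTS" >&2
    return 1
  fi
  return 0
}
```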

Invoke the TTS script:

Important: the ELEVENLABS_API_KEY environment variable is already configured on the system. Call the TTS script directly.

```bash
uv run /home/clawdbot/clawdbot/skills/sag/scripts/tts.py \
  -o /tmp/audio-gen-[timestamp]-[topic-slug].mp3 \
  -m eleven_multilingual_v2 \
  "[formatted_script]"
```

For long scripts, use a heredoc:

```bash
uv run /home/clawdbot/clawdbot/skills/sag/scripts/tts.py \
  -o /tmp/audio-gen-[timestamp]-[topic-slug].mp3 \
  -m eleven_multilingual_v2 \
  "$(cat <<'EOF'
[formatted_script]
EOF
)"
```
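The `[timestamp]` and `[topic-slug]` placeholders in these commands can be filled in as shown below. This sketch assumes a Unix-epoch timestamp (the example filenames such as audio-gen-1234567890-lonely-robot.mp3 suggest one) and an ASCII-only slug; `slugify` and `output_path` are hypothetical names, and `date +%s` is supported by GNU and BSD date but is not strictly POSIX.

```bash
# Hypothetical helpers that fill in the output-path placeholders.

# slugify TEXT: lowercase words joined by single hyphens (ASCII only).
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}

# output_path TOPIC: /tmp/audio-gen-<unix-timestamp>-<slug>.mp3
output_path() {
  printf '/tmp/audio-gen-%s-%s.mp3\n' "$(date +%s)" "$(slugify "$1")"
}

# Example usage (not executed here):
# out=$(output_path "Lonely robot")
# uv run /home/clawdbot/clawdbot/skills/sag/scripts/tts.py \
#   -o "$out" -m eleven_multilingual_v2 "$(cat script.txt)"
```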

Return the result:

MEDIA:/tmp/audio-gen-[timestamp]-[topic-slug].mp3

Your [format] is ready! [Brief description of content]. Duration: approximately [X] minutes.

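Voice Effects (Audio Tags)

Available voice modulation effects (use sparingly for the greatest impact):

[whispers] - soft, intimate delivery
[shouts] - loud, emphatic delivery
[excited] - enthusiastic, energetic tone
[serious] - grave, solemn tone
[sarcastic] - ironic, mocking tone
[sings] - musical, melodic delivery
[laughs] - amused, cheerful tone
[short pause] - brief silence (about 0.5 seconds)
[long pause] - longer silence (about 1-2 seconds)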
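Because a misspelled tag may be read aloud or ignored rather than performed, it can help to check a script for tags outside this list before generating audio. The checker below is a sketch that assumes the English tag spellings listed above; `unknown_tags` is a hypothetical name.

```bash
# Hypothetical checker: list bracketed tags that are not in the table above.
# Assumes the English tag spellings shown there.
ALLOWED_TAGS='whispers|shouts|excited|serious|sarcastic|sings|laughs|short pause|long pause'

unknown_tags() {
  # Extract every [tag], strip the brackets, de-duplicate, drop allowed ones.
  found=$(grep -o '\[[^]]*]' "$1" | sed -e 's/^\[//' -e 's/]$//' \
    | sort -u | grep -Ev "^($ALLOWED_TAGS)\$" || true)
  if [ -n "$found" ]; then
    printf '%s\n' "$found"
    return 1
  fi
  return 0
}
```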

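Best practices:

Apply effects to emotionally intense moments, not to every sentence
Pauses are the most powerful tool for controlling pacing
Voice effects work best in audiobooks and dramatic content
Podcasts and educational content should stay mostly natural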

Error Handling

Script Too Long

If the generated script exceeds the target length by more than 20%:

The script I generated is [X] words ([Y] minutes), which is longer than your target of [Z] minutes. Would you like me to:
1. Condense it to fit the target length
2. Split it into multiple parts
3. Keep it as is

Script Too Short

If the generated script falls more than 20% below the target:

The script is [X] words ([Y] minutes), shorter than your target. Would you like me to:
1. Expand it with more detail
2. Add additional examples or stories
3. Generate as is

TTS Generation Failure

If the TTS script fails:

I've created the script, but I'm unable to generate the audio right now. Here's your script:

[Display script]

Error: [specific error message]

You can:
1. Check that ELEVENLABS_API_KEY is configured
2. Use the script with your own text-to-speech tool
3. Try again in a moment
4. Ask me to troubleshoot the audio generation

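Common TTS issues:

API key not set: check ELEVENLABS_API_KEY in the configuration
Rate limiting: wait a moment and try again
Text too long: split it into smaller segments (about 5,000 characters maximum)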
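For scripts longer than the per-request limit, one approach is to split on paragraph boundaries so that no sentence is cut in half. The sketch below assumes paragraphs are separated by blank lines and uses the ~5,000-character figure above as a default; `split_script` is a hypothetical helper, and joining the resulting audio files afterwards is not covered. Some awk implementations count bytes rather than characters in `length()`, so leave a margin for non-ASCII text.

```bash
# Hypothetical helper: split a script into chunks of at most LIMIT characters
# (default 5000), breaking only at blank-line paragraph boundaries.
# split_script FILE OUTDIR [LIMIT] writes OUTDIR/chunk-001.txt, chunk-002.txt, ...
# A single paragraph longer than LIMIT becomes its own (oversized) chunk.
split_script() {
  limit=${3:-5000}
  mkdir -p "$2" || return 1
  awk -v RS= -v limit="$limit" -v dir="$2" '
    # Write the current chunk to the next numbered file and reset it.
    function flush() {
      if (chunk != "") {
        n++
        file = sprintf("%s/chunk-%03d.txt", dir, n)
        printf "%s\n", chunk > file
        close(file)
        chunk = ""
      }
    }
    {
      # With RS empty, each record is one paragraph.
      candidate = (chunk == "") ? $0 : (chunk "\n\n" $0)
      if (length(candidate) > limit) { flush(); chunk = $0 }
      else chunk = candidate
    }
    END { flush() }
  ' "$1"
}
```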

Invalid Requests

For unrealistic requests (for example, "a 100-hour audiobook"):

That length would require [X] words and take significant time to generate. I recommend:
- Breaking it into multiple episodes/chapters
- Targeting 5-30 minutes per audio file
- Creating a series instead of one long file
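The numbers in this reply can be computed up front. The sketch below assumes 75 words per minute and the 30-minute per-file maximum from the limitations; `plan_series` is a hypothetical name, and the last part may be shorter than the others.

```bash
# Hypothetical helper: size a long request as a series of files.
# plan_series TOTAL_MINUTES [MINUTES_PER_PART] (default 30 minutes per part)
plan_series() {
  total=$1
  per_part=${2:-30}
  parts=$(( (total + per_part - 1) / per_part ))  # ceiling division
  echo "$(( total * 75 )) words total: $parts parts of up to $(( per_part * 75 )) words"
}
```

For the 100-hour example (6,000 minutes), `plan_series 6000` prints `450000 words total: 200 parts of up to 2250 words`.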

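Tips for Best Results

Engaging audiobooks

Focus on characters' emotions and sensory details
Use pauses to build dramatic tension
Vary sentence length to create rhythm
Include inner monologue and reflection

Engaging podcasts

Open with a question or a surprising fact
Use conversational phrases: "You know what's interesting..."
Include relatable examples from everyday life
End with actionable takeaways

Effective educational content

Use the "explain like I'm five" approach
Progress gradually from simple concepts to complex ones
Repeat key terms and definitions
Give multiple examples for clarity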

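Technical Notes

TTS implementation:

Uses the Python script ~/.clawdbot/clawdbot/skills/sag/scripts/tts.py
No binary installation required (pure Python plus the requests library)
Calls the ElevenLabs API directly
Works on Linux and macOS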

File storage:

Audio files are saved in /tmp/
Filename format: audio-gen-[timestamp]-[topic-slug].mp3
Files are cleaned up automatically after 24 hours

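API requirements:

Script generation uses the Anthropic API (already configured)
Text-to-speech uses the ElevenLabs API (configured via ELEVENLABS_API_KEY)
Both services must be configured and have available credits

Supported models:

eleven_multilingual_v2 - best quality (default)
eleven_turbo_v2 - faster generation
eleven_turbo_v2_5 - fastest generation
eleven_multilingual_v1 - legacy model

Cost estimate:

10 minutes of audio (about 750 words): about $1.43
- Claude API: about $0.075
- ElevenLabs: about $1.35
Longer content scales proportionally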
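Since costs scale proportionally, a rough estimate for any length can be derived from the 10-minute figures above. This is only a sketch: `estimate_cost` is a hypothetical name, and it inherits the guide's numbers, which depend on each provider's current pricing.

```bash
# Hypothetical estimator: scale the ~$1.43 per 10 minutes figure above
# (~$0.075 Claude API + ~$1.35 ElevenLabs) linearly with duration.
# estimate_cost MINUTES prints an approximate cost in US dollars.
estimate_cost() {
  awk -v m="$1" 'BEGIN { printf "%.2f\n", m / 10 * (0.075 + 1.35) }'
}
```

For example, `estimate_cost 20` prints `2.85`.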

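Generation time:

Script generation: 5-30 seconds (depending on length)
Audio generation: 5-15 seconds (ElevenLabs processing time)
Total: usually under 1 minute for 10 minutes of audio

Limitations

Maximum length: about 30 minutes per audio file (about 2,250 words)
- For longer content, create multiple parts or episodes
Single voice: currently one narrator voice is supported
- No multi-voice dialogue or character voices
No background music: narration only
- No background music, sound effects, or audio mixing
Generated on demand: every request produces new content
- No prebuilt templates or cached audio
Language: primarily English
- ElevenLabs supports other languages, but content generation is optimized for English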

Conversation Examples

Example 1: Quick Audiobook

User: Create a 5-minute audiobook chapter about a lonely robot learning to feel emotions

Agent: I'll create a 5-minute sci-fi audiobook chapter. Let me write the story...

[Generates ~400-word narrative with emotional arc]

Here's the story I've created about Unit-7, a maintenance robot who discovers the meaning of friendship through an unexpected encounter. Would you like me to generate the audio or make changes first?

User: Generate the audio

Agent: [Invokes SAG with formatted script]

MEDIA:/tmp/audio-gen-1234567890-lonely-robot.mp3

Your audiobook chapter is ready! The story follows Unit-7's emotional awakening through a touching encounter with a young engineer. Duration: approximately 5 minutes.

Example 2: Podcast with Revisions

User: Make a 10-minute podcast about sustainable living tips

Agent: [Generates podcast script with intro, 5 practical tips, outro]

Here's a 10-minute podcast with 5 actionable sustainable living tips. The tone is encouraging and practical. Would you like me to:
1. Generate audio now
2. Make it more urgent/climate-focused
3. Add more specific examples
4. Adjust the length

User: Add more specific examples for each tip

Agent: [Regenerates with detailed examples]

Better? I've added specific examples like calculating your carbon footprint, composting at home, and choosing sustainable brands. Ready for audio?

User: Perfect!

Agent: MEDIA:/tmp/audio-gen-1234567890-sustainable-living.mp3

Your sustainable living podcast is ready with 5 practical tips, each with concrete examples you can start using today! Duration: 10 minutes.

Example 3: Educational Content

User: Create a 15-minute educational audio explaining quantum computing for beginners

Agent: I'll create a beginner-friendly 15-minute explanation of quantum computing...

[Generates ~1,100 words with analogies, examples, and clear structure]

Script preview:
- Introduction: What is quantum computing? (2 min)
- Classical vs Quantum: The key differences (4 min)
- Qubits and Superposition: How it works (5 min)
- Real Applications: What it means for you (3 min)
- Conclusion: The future of computing (1 min)

Should I generate the audio?

User: Yes please

Agent: MEDIA:/tmp/audio-gen-1234567890-quantum-computing.mp3

Your educational audio on quantum computing is ready! It uses everyday analogies to explain qubits, superposition, and real-world applications. Duration: approximately 15 minutes.

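When to Use This Skill

Use the audio generation skill when the user asks to:

"Create an audiobook about..."
"Make a podcast episode about..."
"Generate audio explaining..."
"Record a story about..."
"Make an audio guide for..."
"Create a narrated version of..."
"Make an audio lesson on..."

Do not use it for:

Transcribing existing audio (use the whisper skill instead)
Converting user-written text to audio (use the SAG skill directly)
Real-time voice interaction (use the voice-call plugin)
Multi-speaker dialogue or interviews (a current limitation)

Advanced Usage

Voice Selection

Users can request a specific voice: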

User: Create a podcast with the Rachel voice

Agent: [Checks voice configuration for "Rachel" voice ID and uses it in SAG command]

Series/Episodes

For multi-part content:

User: Create a 3-episode series about space exploration

Agent: I'll create Episode 1 first. Each episode will be 10 minutes. Let's start with "The Dawn of the Space Age"...

[After completion]

Episode 1 is ready! Would you like me to continue with Episode 2?

Format Mixing

Combine formats to create a distinctive style:

User: Create an educational podcast that tells a story

Agent: [Generates content that combines storytelling narrative with educational explanations]

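Troubleshooting

Problem: The audio sounds robotic or unnatural
Solution: Add more pauses and voice effects. Use contractions and colloquial language.

Problem: The script length doesn't match the request
Solution: Regenerate with an explicit word-count target. Check the calculation (75 words per minute).

Problem: The content is too technical or too simple
Solution: Ask the user about the target audience and adjust the complexity accordingly.

Problem: The SAG command fails
Solution: Check that ELEVENLABS_API_KEY is set. Confirm that the SAG skill is installed and working.

Problem: The user wants to edit the script by hand
Solution: Provide the script as plain text. The user can edit it and paste it back to generate the audio.

💡 Pro tip: Always finish the script and get the user's confirmation before generating audio. This saves time and API costs and ensures the final output matches what the user wants.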
