AudioPod Skill Usage Guide
2026-03-29
Source: 网淘吧
AudioPod AI
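A complete audio processing API: music generation, stem separation, text-to-speech, noise reduction, transcription, speaker diarization, and wallet management.
Setup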
pip install audiopod # Python
npm install audiopod # Node.js
Authentication: set the AUDIOPOD_API_KEY environment variable, or pass the key to the client constructor.

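Getting an API key
- Sign up at https://audiopod.ai/auth/signup (free, no credit card required)
- Then go to https://www.audiopod.ai/dashboard/account/api-keys
- Click "Create API Key" and copy the key (it starts with ap_)
- Top up your wallet at https://www.audiopod.ai/dashboard/account/wallet (pay as you go, no subscription)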
from audiopod import AudioPod
client = AudioPod() # uses AUDIOPOD_API_KEY env var
# or: client = AudioPod(api_key="ap_...")
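The two ways of supplying the key can be wrapped in a small check that fails early when the key is missing or malformed. This is a minimal sketch based only on the setup notes above (the key comes from AUDIOPOD_API_KEY or an explicit argument, and starts with ap_); the helper name `resolve_api_key` is hypothetical and not part of the audiopod SDK.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from an explicit argument or the environment.

    Hypothetical helper, not part of the audiopod SDK.
    """
    env = os.environ if env is None else env
    key = explicit or env.get("AUDIOPOD_API_KEY", "")
    if not key:
        raise ValueError("Set AUDIOPOD_API_KEY or pass api_key explicitly")
    if not key.startswith("ap_"):
        # The guide states that keys start with "ap_".
        raise ValueError("AudioPod API keys are expected to start with 'ap_'")
    return key
```

The result could then be passed as `AudioPod(api_key=resolve_api_key())`.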
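AI Music Generation
Generate songs, rap, instrumentals, samples, and vocals from text prompts.
Tasks: text2music (song with vocals), text2rap (rap), prompt2instrumental (instrumental), lyric2vocals (vocals only), text2samples (loops/samples), audio-to-audio (style transfer), song bloom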
Python SDK
# Generate a full song with lyrics
result = client.music.song(
prompt="Upbeat pop, synth, drums, 120 bpm, female vocals, radio-ready",
lyrics="Verse 1:\nWalking down the street on a sunny day\n\nChorus:\nWe're on fire tonight!",
duration=60
)
print(result["output_url"])
# Generate rap
result = client.music.rap(
prompt="Lo-Fi Hip Hop, 100 BPM, male rap, melancholy, keyboard chords",
lyrics="Verse 1:\nStarted from the bottom, now we climbing...",
duration=60
)
# Generate instrumental (no lyrics needed)
result = client.music.instrumental(
prompt="Atmospheric ambient soundscape, uplifting, driving mood",
duration=30
)
# Generic generate with explicit task
result = client.music.generate(
prompt="Electronic dance music, high energy",
task="text2samples", # any task type
duration=30
)
# Async: submit then poll
job = client.music.create(
prompt="Chill lofi beat",
duration=30,
task="prompt2instrumental"
)
result = client.music.wait_for_completion(job["id"], timeout=600)
# Get available genre presets
presets = client.music.get_presets()
# List/manage jobs
jobs = client.music.list(skip=0, limit=50)
job = client.music.get(job_id=123)
client.music.delete(job_id=123)
cURL
# Song with lyrics
curl -X POST "https://api.audiopod.ai/api/v1/music/text2music" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt":"upbeat pop, synth, 120bpm, female vocals", "lyrics":"Walking down the street...", "audio_duration":60}'
# Rap
curl -X POST "https://api.audiopod.ai/api/v1/music/text2rap" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt":"Lo-Fi Hip Hop, male rap, 100 BPM", "lyrics":"Started from the bottom...", "audio_duration":60}'
# Instrumental
curl -X POST "https://api.audiopod.ai/api/v1/music/prompt2instrumental" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt":"ambient soundscape, uplifting", "audio_duration":30}'
# Samples/loops
curl -X POST "https://api.audiopod.ai/api/v1/music/text2samples" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt":"drum loop, sad mood", "audio_duration":15}'
# Vocals only
curl -X POST "https://api.audiopod.ai/api/v1/music/lyric2vocals" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt":"clean vocals, happy", "lyrics":"Eternal chorus of unity...", "audio_duration":30}'
# Check job status / get result
curl "https://api.audiopod.ai/api/v1/music/jobs/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Get genre presets
curl "https://api.audiopod.ai/api/v1/music/presets" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List jobs
curl "https://api.audiopod.ai/api/v1/music/jobs?skip=0&limit=50" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Delete job
curl -X DELETE "https://api.audiopod.ai/api/v1/music/jobs/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
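Parameters
| Field | Required | Description |
|---|---|---|
| prompt | Yes | Style/genre description |
| lyrics | For songs/rap/vocals | Song lyrics with verse/chorus structure |
| audio_duration | No | Duration in seconds (default: 30) |
| Genre preset | No | Genre preset name (from the presets endpoint) |
| Display name | No | Display name for the track |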
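The rules in the table can be checked on the client side before a request is sent. The sketch below builds the JSON body used in the cURL examples, using only the field names that appear there (prompt, lyrics, audio_duration). The function names are hypothetical, and the assumption that lyrics are mandatory for exactly these three tasks is read off the table rather than confirmed against the API.

```python
from typing import Any, Dict, Optional

# Tasks whose cURL examples send lyrics (assumed to require them).
LYRIC_TASKS = {"text2music", "text2rap", "lyric2vocals"}


def build_music_payload(task: str, prompt: str,
                        lyrics: Optional[str] = None,
                        audio_duration: int = 30) -> Dict[str, Any]:
    """Build a request body for /api/v1/music/{task} (hypothetical helper)."""
    if not prompt:
        raise ValueError("prompt is required")
    if task in LYRIC_TASKS and not lyrics:
        raise ValueError(f"lyrics are required for {task}")
    payload: Dict[str, Any] = {"prompt": prompt, "audio_duration": audio_duration}
    if lyrics:
        payload["lyrics"] = lyrics
    return payload


def music_url(task: str,
              base: str = "https://api.audiopod.ai/api/v1/music") -> str:
    """Return the endpoint URL for a task, matching the cURL examples."""
    return f"{base}/{task}"
```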
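Stem Separation
Split audio into individual instrument and vocal stems.
Modes
| Mode | Stems | Output | Use case |
|---|---|---|---|
| single | 1 | The specified stem only | Vocal isolation, drum extraction |
| two | 2 | Vocals + instrumental | Karaoke tracks |
| four | 4 | Vocals, drums, bass, other | Standard remixing (default) |
| six | 6 | + guitar, piano | Full instrument separation |
| producer | 8 | + kick, snare, hi-hat | Beat production |
| studio | 12 | + cymbals, sub-bass, synth | Professional mixing |
| mastering | 16 | Maximum detail | Track analysis |
Single-stem options: vocals, drums, bass, guitar, piano, other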
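Choosing a mode amounts to finding the smallest one that contains the stems you need. The sketch below covers only the single, four, and six modes, whose contents the table lists in full. The helper `pick_mode` is hypothetical, and the identifiers "guitar" and "piano" are assumed spellings (the guide's code only shows "vocals", "drums", "bass", and "other").

```python
from typing import Dict, Iterable

SINGLE_STEMS = {"vocals", "drums", "bass", "guitar", "piano", "other"}

# Ordered from smallest to largest so the first match is the cheapest fit.
MULTI_MODES = [
    ("four", {"vocals", "drums", "bass", "other"}),
    ("six", {"vocals", "drums", "bass", "other", "guitar", "piano"}),
]


def pick_mode(wanted: Iterable[str]) -> Dict[str, str]:
    """Return keyword arguments (mode, and stem for single mode)."""
    wanted_set = set(wanted)
    if not wanted_set:
        raise ValueError("request at least one stem")
    unknown = wanted_set - SINGLE_STEMS
    if unknown:
        raise ValueError(f"unknown stems: {sorted(unknown)}")
    if len(wanted_set) == 1:
        return {"mode": "single", "stem": next(iter(wanted_set))}
    for mode, stems in MULTI_MODES:
        if wanted_set <= stems:
            return {"mode": mode}
    raise ValueError("no listed mode covers the requested stems")
```

The returned dictionary is shaped so that it could be unpacked into a call such as `client.stems.separate(url=..., **pick_mode(["vocals", "piano"]))`.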
Python SDK
# Sync: extract and wait for result
result = client.stems.separate(
url="https://youtube.com/watch?v=VIDEO_ID",
mode="six",
timeout=600
)
for stem, url in result["download_urls"].items():
print(f"{stem}: {url}")
# From local file
result = client.stems.separate(file="/path/to/song.mp3", mode="four")
# Single stem extraction
result = client.stems.separate(
url="https://youtube.com/watch?v=ID",
mode="single",
stem="vocals"
)
# Async: submit then poll
job = client.stems.extract(url="https://youtube.com/watch?v=ID", mode="six")
print(f"Job ID: {job['id']}")
status = client.stems.status(job["id"])
# or wait:
result = client.stems.wait_for_completion(job["id"], timeout=600)
# List available modes
modes = client.stems.modes()
# Job management
jobs = client.stems.list(skip=0, limit=50, status="COMPLETED")
job = client.stems.get(job_id=1234)
client.stems.delete(job_id=1234)
cURL
# Extract from URL
curl -X POST "https://api.audiopod.ai/api/v1/stem-extraction/api/extract" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "url=https://youtube.com/watch?v=VIDEO_ID" \
-F "mode=six"
# Extract from file
curl -X POST "https://api.audiopod.ai/api/v1/stem-extraction/api/extract" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "file=@/path/to/song.mp3" \
-F "mode=four"
# Single stem
curl -X POST "https://api.audiopod.ai/api/v1/stem-extraction/api/extract" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "url=URL" \
-F "mode=single" \
-F "stem=vocals"
# Check job status
curl "https://api.audiopod.ai/api/v1/stem-extraction/status/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List available modes
curl "https://api.audiopod.ai/api/v1/stem-extraction/modes" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List jobs (filter by status: PENDING, PROCESSING, COMPLETED, FAILED)
curl "https://api.audiopod.ai/api/v1/stem-extraction/jobs?skip=0&limit=50&status=COMPLETED" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Get specific job
curl "https://api.audiopod.ai/api/v1/stem-extraction/jobs/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Delete job
curl -X DELETE "https://api.audiopod.ai/api/v1/stem-extraction/jobs/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
Response format
{
"id": 1234,
"status": "COMPLETED",
"download_urls": {
"vocals": "https://...",
"drums": "https://...",
"bass": "https://...",
"other": "https://..."
},
"quality_scores": {
"vocals": 0.95,
"drums": 0.88
}
}
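A response of this shape can be reduced to a list of usable stems. The following sketch assumes exactly the keys shown above (status, download_urls, quality_scores) and that quality_scores may omit some stems, as it does in the sample. The function name and the score threshold are illustrative, not part of the API.

```python
from typing import Any, Dict, List, Optional, Tuple


def summarize_stems(response: Dict[str, Any],
                    min_score: float = 0.0
                    ) -> List[Tuple[str, str, Optional[float]]]:
    """Return (stem, url, score) rows for a completed job."""
    if response.get("status") != "COMPLETED":
        raise RuntimeError(f"job not finished: {response.get('status')}")
    urls = response.get("download_urls", {})
    scores = response.get("quality_scores", {})
    rows = []
    for stem, url in urls.items():
        score = scores.get(stem)
        # Stems without a score are kept; scored stems must meet the threshold.
        if score is None or score >= min_score:
            rows.append((stem, url, score))
    return rows
```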
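Text-to-Speech
Convert text to speech with 50+ voices in 60+ languages. Voice cloning is supported.
Voice types
- 50+ production-ready voices: 60+ languages supported, with automatic language detection
- Custom clones: clone any voice from an audio sample of about 5 seconds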
Python SDK
# Generate speech and wait for result
result = client.voice.generate(
text="Hello, world! This is a test.",
voice_id=123,
speed=1.0
)
print(result["output_url"])
# Async: submit then poll
job = client.voice.speak(
text="Hello world",
voice_id=123,
speed=1.0
)
status = client.voice.get_job(job["id"])
result = client.voice.wait_for_completion(job["id"], timeout=300)
# List all available voices
voices = client.voice.list()
for v in voices:
print(f"{v['id']}: {v['name']}")
# Clone a voice (needs ~5 sec audio sample)
new_voice = client.voice.create(
name="My Voice Clone",
audio_file="./sample.mp3",
description="Cloned from recording"
)
# Get/delete voice
voice = client.voice.get(voice_id=123)
client.voice.delete(voice_id=123)
cURL (raw HTTP, most reliable)
# List all voices
curl "https://api.audiopod.ai/api/v1/voice/voice-profiles" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Generate speech (FORM DATA, not JSON!)
curl -X POST "https://api.audiopod.ai/api/v1/voice/voices/{VOICE_UUID}/generate" \
-H "Authorization: Bearer $AUDIOPOD_API_KEY" \
-d "input_text=Hello world, this is a test" \
-d "audio_format=mp3" \
-d "speed=1.0"
# Poll job status
curl "https://api.audiopod.ai/api/v1/voice/tts-jobs/{JOB_ID}/status" \
-H "Authorization: Bearer $AUDIOPOD_API_KEY"
# SDK-style endpoints (alternative)
# Generate via SDK endpoint
curl -X POST "https://api.audiopod.ai/api/v1/voice/tts/generate" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"text":"Hello world","voice_id":123,"speed":1.0}'
# Poll via SDK endpoint
curl "https://api.audiopod.ai/api/v1/voice/tts/status/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List voices (SDK endpoint)
curl "https://api.audiopod.ai/api/v1/voice/voices" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Clone a voice
curl -X POST "https://api.audiopod.ai/api/v1/voice/voices" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "name=My Voice" \
-F "file=@sample.mp3" \
-F "description=Cloned voice"
# Delete voice
curl -X DELETE "https://api.audiopod.ai/api/v1/voice/voices/VOICE_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
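Generation parameters
| Field | Required | Description |
|---|---|---|
| input_text | Yes | Text to speak (max 5000 characters). Raw HTTP requests use input_text; the SDK uses text |
| audio_format | No | mp3, wav, ogg (default: mp3) |
| speed | No | 0.25 - 4.0 (default: 1.0) |
| language | No | ISO code; auto-detected if omitted |
Response format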
// Generate response
{"job_id": 12345, "status": "pending", "credits_reserved": 25}
// Status response (completed)
{"status": "completed", "output_url": "https://r2-url/generated.mp3"}
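Because generation returns a pending job first and the result only appears in a later status response, a client needs a polling loop. The sketch below is generic: `fetch_status` is any callable that returns the status JSON as a dict (for example, a wrapper around the status endpoint). The "pending" and "completed" values come from the responses above. The "failed" value is an assumption borrowed from the job statuses listed for stem extraction.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(fetch_status: Callable[[], Dict[str, Any]],
                    timeout: float = 300.0,
                    interval: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep
                    ) -> Dict[str, Any]:
    """Call fetch_status until the job completes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        state = str(status.get("status", "")).lower()
        if state == "completed":
            return status
        if state == "failed":  # assumed failure value, not shown for TTS
            raise RuntimeError(f"job failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not complete in time")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.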
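Important notes
- The raw HTTP generate endpoint takes form data, not JSON. Its field is input_text, not text
- The SDK endpoint (/api/v1/voice/tts/generate) takes JSON, and its field is text
- Output files may be WAV data with an .mp3 extension; convert them with ffmpeg -i output.mp3 -c:a aac real.m4a
- Each generation costs about 55 credits, billed against the wallet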
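Whether a downloaded file is really WAV can be checked from its first bytes: a WAV file starts with the ASCII tag RIFF, followed by a four-byte size and then the tag WAVE. The sketch below performs that check and builds the ffmpeg argument list from the note above. Both function names are hypothetical.

```python
from typing import List


def looks_like_wav(header: bytes) -> bool:
    """True if the first 12 bytes carry a RIFF/WAVE signature."""
    return (len(header) >= 12
            and header[:4] == b"RIFF"
            and header[8:12] == b"WAVE")


def ffmpeg_convert_args(src: str, dst: str) -> List[str]:
    """Argument list mirroring the ffmpeg command in the notes above."""
    return ["ffmpeg", "-i", src, "-c:a", "aac", dst]
```

In practice the header would come from `open(path, "rb").read(12)`, and the argument list could be run with `subprocess.run(args, check=True)` when the check returns True and ffmpeg is installed.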
Speaker Diarization
Separate audio by speaker using automatic voice classification.
Python SDK
# Diarize and wait for result
result = client.speaker.identify(
file="./meeting.mp3",
num_speakers=3, # optional hint for accuracy
timeout=600
)
for segment in result["segments"]:
print(f"Speaker {segment['speaker']}: {segment['text']} [{segment['start']:.1f}s - {segment['end']:.1f}s]")
# From URL
result = client.speaker.identify(
url="https://youtube.com/watch?v=VIDEO_ID",
num_speakers=2
)
# Async: submit then poll
job = client.speaker.diarize(
file="./meeting.mp3",
num_speakers=3
)
result = client.speaker.wait_for_completion(job["id"], timeout=600)
# Job management
jobs = client.speaker.list(skip=0, limit=50, status="COMPLETED")
job = client.speaker.get(job_id=123)
client.speaker.delete(job_id=123)
cURL
# Diarize from file
curl -X POST "https://api.audiopod.ai/api/v1/speaker/diarize" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "file=@meeting.mp3" \
-F "num_speakers=3"
# Diarize from URL
curl -X POST "https://api.audiopod.ai/api/v1/speaker/diarize" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "url=https://youtube.com/watch?v=VIDEO_ID" \
-F "num_speakers=2"
# Check job status
curl "https://api.audiopod.ai/api/v1/speaker/jobs/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List jobs
curl "https://api.audiopod.ai/api/v1/speaker/jobs?skip=0&limit=50" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Delete job
curl -X DELETE "https://api.audiopod.ai/api/v1/speaker/jobs/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
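Diarization segments lend themselves to simple aggregation, such as total talk time per speaker. This sketch assumes each segment carries the speaker, start, and end keys used in the SDK example for this section. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def talk_time_by_speaker(segments: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Sum segment durations (in seconds) for each speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        # Guard against malformed segments whose end precedes their start.
        totals[str(seg["speaker"])] += max(0.0, seg["end"] - seg["start"])
    return dict(totals)
```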
Speech-to-Text (Transcription)
Python SDK
# Transcribe URL and wait
result = client.transcription.transcribe(
url="https://youtube.com/watch?v=VIDEO_ID",
speaker_diarization=True,
min_speakers=2,
max_speakers=5,
timeout=600
)
print(f"Language: {result['detected_language']}")
for seg in result["segments"]:
print(f"[{seg['start']:.1f}s] {seg.get('speaker','?')}: {seg['text']}")
# Batch: multiple URLs at once
result = client.transcription.transcribe(
urls=["https://youtube.com/watch?v=ID1", "https://youtube.com/watch?v=ID2"],
speaker_diarization=True
)
# Upload local file
job = client.transcription.upload(
file_path="./recording.mp3",
language="en",
speaker_diarization=True
)
result = client.transcription.wait_for_completion(job["id"], timeout=600)
# Async: submit then poll
job = client.transcription.create(
url="https://youtube.com/watch?v=ID",
language="en",
speaker_diarization=True,
word_timestamps=True,
min_speakers=2,
max_speakers=4
)
result = client.transcription.wait_for_completion(job["id"], timeout=600)
# Get transcript in different formats
transcript_json = client.transcription.get_transcript(job_id=123, format="json")
transcript_srt = client.transcription.get_transcript(job_id=123, format="srt")
transcript_vtt = client.transcription.get_transcript(job_id=123, format="vtt")
transcript_txt = client.transcription.get_transcript(job_id=123, format="txt")
# Job management
jobs = client.transcription.list(skip=0, limit=50, status="COMPLETED")
job = client.transcription.get(job_id=123)
client.transcription.delete(job_id=123)
cURL
# Transcribe from URL
curl -X POST "https://api.audiopod.ai/api/v1/transcribe/transcribe" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url":"https://youtube.com/watch?v=ID","enable_speaker_diarization":true,"word_timestamps":true}'
# Transcribe multiple URLs
curl -X POST "https://api.audiopod.ai/api/v1/transcribe/transcribe" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"urls":["URL1","URL2"],"enable_speaker_diarization":true}'
# Upload file for transcription
curl -X POST "https://api.audiopod.ai/api/v1/transcribe/transcribe-upload" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "files=@recording.mp3" \
-F "language=en" \
-F "enable_speaker_diarization=true"
# Get job status
curl "https://api.audiopod.ai/api/v1/transcribe/jobs/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Get transcript in specific format (json, srt, vtt, txt)
curl "https://api.audiopod.ai/api/v1/transcribe/jobs/JOB_ID/transcript?format=srt" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List jobs
curl "https://api.audiopod.ai/api/v1/transcribe/jobs?offset=0&limit=50" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Delete job
curl -X DELETE "https://api.audiopod.ai/api/v1/transcribe/jobs/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
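Parameters
| Field | Required | Description |
|---|---|---|
| url / urls | Yes (or file) | URL(s) to transcribe (YouTube, SoundCloud, and direct links are supported) |
| language | No | ISO 639-1 code (auto-detected if omitted) |
| enable_speaker_diarization | No | Enable speaker identification (default: false) |
| min_speakers / max_speakers | No | Speaker-count hints that improve diarization |
| word_timestamps | No | Enable word-level timestamps (default: true) |
Output formats
- json: full structured output with segments, timestamps, and speakers
- srt: SubRip subtitle format
- vtt: WebVTT subtitle format
- txt: plain-text transcript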
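To show how the json and srt outputs relate, the sketch below renders segments as SubRip text locally. It is an illustration of the SRT layout (a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the text), not a replacement for requesting format=srt from the API. It assumes segments carry start, end, text, and an optional speaker, as in the SDK examples. Both function names are hypothetical.

```python
from typing import Any, Dict, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Dict[str, Any]]) -> str:
    """Render transcript segments as SubRip subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        speaker = seg.get("speaker")
        text = f"{speaker}: {seg['text']}" if speaker else seg["text"]
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Subtitle blocks are separated by a blank line.
    return "\n".join(blocks)
```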
Noise Reduction
Remove background noise from audio and video files.
Python SDK
# Denoise and wait for result
result = client.denoiser.denoise(file="./noisy-audio.mp3", timeout=600)
print(f"Clean audio: {result['output_url']}")
# From URL
result = client.denoiser.denoise(url="https://example.com/noisy.mp3")
# Async: submit then poll
job = client.denoiser.create(file="./noisy-audio.mp3")
result = client.denoiser.wait_for_completion(job["id"], timeout=600)
# From URL (async)
job = client.denoiser.create(url="https://example.com/noisy.mp3")
# Job management
jobs = client.denoiser.list(skip=0, limit=50, status="COMPLETED")
job = client.denoiser.get(job_id=123)
client.denoiser.delete(job_id=123)
cURL
# Denoise from file
curl -X POST "https://api.audiopod.ai/api/v1/denoiser/denoise" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "file=@noisy-audio.mp3"
# Denoise from URL
curl -X POST "https://api.audiopod.ai/api/v1/denoiser/denoise" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "url=https://example.com/noisy.mp3"
# Check job status
curl "https://api.audiopod.ai/api/v1/denoiser/jobs/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List jobs
curl "https://api.audiopod.ai/api/v1/denoiser/jobs?skip=0&limit=50" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Delete job
curl -X DELETE "https://api.audiopod.ai/api/v1/denoiser/jobs/JOB_ID" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
Wallet and Billing
Check your balance, estimate costs, and view usage history.
Python SDK
# Get current balance
balance = client.wallet.get_balance()
print(f"Balance: ${balance['balance_usd']}")
# Check if balance is sufficient for an operation
check = client.wallet.check_balance(
service_type="stem_extraction",
duration_seconds=180
)
print(f"Sufficient: {check['sufficient']}")
# Estimate cost before running
estimate = client.wallet.estimate_cost(
service_type="transcription",
duration_seconds=300
)
print(f"Cost: ${estimate['cost_usd']}")
# Get pricing for all services
pricing = client.wallet.get_pricing()
# View usage history
usage = client.wallet.get_usage(page=1, limit=50)
cURL
# Get balance
curl "https://api.audiopod.ai/api/v1/api-wallet/balance" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Check balance sufficiency
curl -X POST "https://api.audiopod.ai/api/v1/api-wallet/check-balance" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"service_type":"stem_extraction","duration_seconds":180}'
# Estimate cost
curl -X POST "https://api.audiopod.ai/api/v1/api-wallet/estimate-cost" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"service_type":"transcription","duration_seconds":300}'
# Get pricing
curl "https://api.audiopod.ai/api/v1/api-wallet/pricing" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Usage history
curl "https://api.audiopod.ai/api/v1/api-wallet/usage?page=1&limit=50" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
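The balance and estimate calls can be combined into a pre-flight check before submitting a long job. This sketch works on the dictionaries those calls return, assuming only the balance_usd and cost_usd keys shown in the SDK examples. The function name and the optional reserve are illustrative.

```python
from typing import Any, Dict


def can_afford(balance: Dict[str, Any], estimate: Dict[str, Any],
               reserve_usd: float = 0.0) -> bool:
    """True if the balance covers the estimate and leaves the reserve."""
    available = float(balance["balance_usd"])
    cost = float(estimate["cost_usd"])
    return available - cost >= reserve_usd
```

It could be called as `can_afford(client.wallet.get_balance(), client.wallet.estimate_cost(...))`; the `float` conversion also tolerates amounts returned as strings.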
API Endpoint Summary
| Service | Endpoint | Method |
|---|---|---|
| Music | /api/v1/music/{task} | POST |
| Music jobs | /api/v1/music/jobs/{id} | GET/DELETE |
| Music presets | /api/v1/music/presets | GET |
| Stem extraction | /api/v1/stem-extraction/api/extract | POST (multipart) |
| Stem status | /api/v1/stem-extraction/status/{id} | GET |
| Stem modes | /api/v1/stem-extraction/modes | GET |
| Stem jobs | /api/v1/stem-extraction/jobs | GET |
| TTS generate | /api/v1/voice/voices/{uuid}/generate | POST (form data) |
| TTS generate (SDK) | /api/v1/voice/tts/generate | POST (JSON) |
| TTS status | /api/v1/voice/tts-jobs/{id}/status | GET |
| TTS status (SDK) | /api/v1/voice/tts/status/{id} | GET |
| Voice list | /api/v1/voice/voice-profiles | GET |
| Voice list (SDK) | /api/v1/voice/voices | GET |
| Speaker | /api/v1/speaker/diarize | POST (multipart) |
| Speaker jobs | /api/v1/speaker/jobs/{id} | GET/DELETE |
| Transcribe URL | /api/v1/transcribe/transcribe | POST (JSON) |
| Transcribe upload | /api/v1/transcribe/transcribe-upload | POST (multipart) |
| Transcript output | /api/v1/transcribe/jobs/{id}/transcript?format= | GET |
| Transcribe jobs | /api/v1/transcribe/jobs | GET |
| Denoise | /api/v1/denoiser/denoise | POST (multipart) |
| Denoise jobs | /api/v1/denoiser/jobs/{id} | GET/DELETE |
| Wallet balance | /api/v1/api-wallet/balance | GET |
| Wallet pricing | /api/v1/api-wallet/pricing | GET |
| Wallet usage | /api/v1/api-wallet/usage | GET |
Authentication headers
Two authentication methods work:
- X-API-Key: ap_... (works for most endpoints)
- Authorization: Bearer ap_... (works for TTS generate/status)
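For raw HTTP calls, the header choice can be captured in one place. The sketch below builds either header form and prepares, without sending, a GET request for the wallet balance endpoint using only the standard library. The function names are hypothetical, and deciding which endpoints need the Bearer form is left to the caller.

```python
import urllib.request
from typing import Dict


def auth_headers(api_key: str, bearer: bool = False) -> Dict[str, str]:
    """Return the X-API-Key header, or the Bearer form when requested."""
    if bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}


def balance_request(api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the wallet balance."""
    url = "https://api.audiopod.ai/api/v1/api-wallet/balance"
    return urllib.request.Request(url, headers=auth_headers(api_key))
```

The prepared request could be sent with `urllib.request.urlopen(balance_request(key))`.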
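Known issues
- SDK method signatures may differ from the raw API; when in doubt, refer to the cURL examples
- TTS output files are stored in Cloudflare R2 and can be downloaded via the output_url in the job status
- TTS output files may be WAV data with an .mp3 extension; convert them with ffmpeg before sending via WhatsApp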