🔄 卡若AI sync 2026-03-03 14:29 | Updated: 水桥 platform integration, 卡木, 运营中枢 workbench | Excluded >20 MB: 14 files
@@ -19,7 +19,7 @@ updated: "2026-03-03"
 ## ⭐ Soul Party Clip Pipeline (default)

 ```
-Raw video → MLX transcription → subtitles to Simplified → highlight detection (current model/AI) → batch clipping → soul_enhance → final clips
+Raw video → MLX transcription → subtitles to Simplified → highlight detection (API first / best model, falls back to Ollama → rules) → batch clipping → soul_enhance → final clips
              ↑                                                ↓
   Traditional→Simplified conversion + error fixes right after extraction      cover + subtitles (already Simplified) + 10% speed-up + filler-word removal
 ```
@@ -53,8 +53,9 @@ eval "$(~/miniforge3/bin/conda shell.zsh hook)"
 conda activate mlx-whisper
 mlx_whisper audio.wav --model mlx-community/whisper-small-mlx --language zh --output-format all

-# 2. Highlight detection (Ollama → rules; the pipeline converts the transcript to Simplified before reading it)
+# 2. Highlight detection (API first; if unconfigured, Ollama → rules; the pipeline converts the transcript to Simplified before reading it)
 python3 identify_highlights.py -t transcript.srt -o highlights.json -n 6
+# Needs OPENAI_API_KEY, or OPENAI_API_BASES/KEYS/MODELS; default model gpt-4o

 # 3. Clipping
 python3 batch_clip.py -i 视频.mp4 -l highlights.json -o clips/ -p soul
@@ -262,7 +263,7 @@ python3 scripts/burn_subtitles_clean.py -i enhanced.mp4 -s clean.srt -o 成片.m
 | **soul_vertical_crop.py** | Batch mid-section crop for Soul vertical video (landscape → 498×1080, white borders removed) | ⭐⭐⭐ |
 | **scene_detect_to_highlights.py** | Shot/scene detection → highlights.json (PySceneDetect; can feed batch_clip) | ⭐⭐ |
 | chapter_themes_to_highlights.py | Extract segments from chapter .md themes (local model → highlights.json) | ⭐⭐⭐ |
-| identify_highlights.py | Highlight detection (Ollama → rules) | ⭐⭐ |
+| identify_highlights.py | Highlight detection (API first → Ollama → rules, default gpt-4o) | ⭐⭐ |
 | batch_clip.py | Batch clipping | ⭐⭐ |
 | one_video.py | One-command final video from a single source video | ⭐⭐ |
 | burn_subtitles_clean.py | Subtitle burn-in (no shadow) | ⭐ |
@@ -291,6 +292,14 @@ conda activate mlx-whisper
 mlx_whisper audio.wav --model mlx-community/whisper-small-mlx --language zh --output-format all
 ```

+### Highlight-detection model (API first)
+
+Highlight detection defaults to the **best currently available model**: it tries an **OpenAI-compatible API** first (see below), falls back to local Ollama when the API is unconfigured or fails, and uses the rule-based fallback last.
+
+- **Single endpoint**: `OPENAI_API_BASE`, `OPENAI_API_KEY`, `OPENAI_MODEL` (default `gpt-4o`).
+- **Multi-endpoint failover**: `OPENAI_API_BASES`, `OPENAI_API_KEYS`, `OPENAI_MODELS` (comma-separated, tried in order).
+- No hard-coded secrets; keys are read from environment variables. See `运营中枢/参考资料/卡若AI异常处理与红线.md` and the API-stability rules.
+
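The failover order implied by these variables can be condensed into a runnable sketch (mirroring `_build_api_provider_queue` from `identify_highlights.py`; the endpoint URLs and keys below are placeholders):

```python
DEFAULT_API_MODEL = "gpt-4o"

def build_queue(env: dict) -> list:
    """Mirror of _build_api_provider_queue: the multi-endpoint lists win;
    missing per-endpoint keys/models fall back to the single-endpoint vars."""
    split = lambda s: [x.strip() for x in (s or "").split(",") if x.strip()]
    bases = split(env.get("OPENAI_API_BASES"))
    keys = split(env.get("OPENAI_API_KEYS"))
    models = split(env.get("OPENAI_MODELS"))
    one_key = (env.get("OPENAI_API_KEY") or "").strip()
    one_model = (env.get("OPENAI_MODEL") or DEFAULT_API_MODEL).strip()
    queue = []
    for i, b in enumerate(bases):
        key = keys[i] if i < len(keys) else one_key
        model = models[i] if i < len(models) else one_model
        if b and key:  # endpoints without any usable key are skipped
            queue.append({"base_url": b.rstrip("/"), "api_key": key, "model": model})
    if not bases and one_key:
        queue.append({"base_url": (env.get("OPENAI_API_BASE") or "https://api.openai.com/v1").rstrip("/"),
                      "api_key": one_key, "model": one_model})
    return queue

# Two endpoints; the second borrows the single-endpoint key and default model.
q = build_queue({
    "OPENAI_API_BASES": "https://a.example/v1, https://b.example/v1",  # placeholder URLs
    "OPENAI_API_KEYS": "key-a",                                        # placeholder key
    "OPENAI_MODELS": "gpt-4o",
    "OPENAI_API_KEY": "key-shared",                                    # placeholder key
})
assert [p["base_url"] for p in q] == ["https://a.example/v1", "https://b.example/v1"]
assert q[1]["api_key"] == "key-shared" and q[1]["model"] == "gpt-4o"
```

Unsetting every variable yields an empty queue, which is exactly the condition under which the script drops to the Ollama tier.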
 ### Dependency check

 ```bash
@@ -303,7 +312,7 @@ conda activate mlx-whisper
 python -c "import mlx_whisper; print('OK')"

 # Python libraries
-pip3 list | grep -E "moviepy|Pillow|opencc"
+pip3 list | grep -E "moviepy|Pillow|opencc|openai"
 ```

 ### Install dependencies

@@ -34,7 +34,7 @@
 ```

 - **batch_clip**: outputs to `clips/`
-- **soul_enhance -o 成片/ --vertical --title-only**: cover (prefers the question as the first 3 seconds) + subtitles + **full filler-word removal** + vertical crop, written straight to `成片/` with the title as the filename
+- **soul_enhance -o 成片/ --vertical --title-only**: **filename = cover title = `title` from highlights** (de-dashed: `:|、—、/` etc. replaced with spaces), so name and title match, with no index numbers or dashes; subtitle burn-in (follows the speech); full filler-word removal; vertical crop written straight to `成片/`

 ---
@@ -52,7 +52,7 @@

 ## 5. Final video: cover + subtitles + vertical

-- **Cover**: stays **inside the frame** on the 498×1080 vertical canvas; **semi-transparent look** (background alpha=165, underlying frame shows through); dark gradient (deep green→green), Soul logo top-left, title text **strictly centered** with 44 px side margins, multi-line auto-wrap without clipping. Opacity is tuned via `VERTICAL_COVER_ALPHA` (0–255) in `soul_enhance.py`.
+- **Cover**: stays **inside the frame** on the 498×1080 vertical canvas; **semi-transparent look** (background alpha=165); dark gradient, Soul logo top-left; **cover title = output filename = highlights.title** (identical after de-dashing, no `:|—/`, no index numbers); title strictly centered, multi-line auto-wrap. Opacity tuned via `VERTICAL_COVER_ALPHA`.
 - **Subtitles**: shown only after the cover ends, **centered** in the vertical frame; burned in as **image overlays** (each subtitle image with `-loop 1` + `enable=between(t,a,b)`); if the system FFmpeg ships libass you can switch to SRT + the subtitles filter; filler words are cleaned uniformly by soul_enhance. Add `--force-burn-subs` when re-burning subtitles.
 - **Vertical**: 498×1080; crop parameters match `参考资料/竖屏中段裁剪参数说明.md`
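The centering and margin rules above can be sketched without Pillow. This assumes a fixed glyph width (`FONT_W` is a hypothetical stand-in for the real font metrics, which `soul_enhance.py` gets from Pillow):

```python
# Sketch of the vertical-cover title layout described above.
# Assumes every CJK glyph renders at a fixed width (FONT_W); the real
# script measures text with Pillow fonts instead.
COVER_W = 498
SIDE_MARGIN = 44          # left/right padding required by the spec
FONT_W = 40               # hypothetical glyph width in px

def wrap_title(title: str) -> list:
    """Break the title into lines that fit inside the 44 px margins."""
    max_chars = (COVER_W - 2 * SIDE_MARGIN) // FONT_W  # 410 // 40 = 10 glyphs
    return [title[i:i + max_chars] for i in range(0, len(title), max_chars)] or [""]

def center_x(line: str) -> int:
    """X offset that horizontally centers a line on the 498 px canvas."""
    return (COVER_W - len(line) * FONT_W) // 2

lines = wrap_title("私域引流的三个底层逻辑与实操")  # 14 glyphs → 2 lines
assert len(lines) == 2
assert all(len(l) <= 10 for l in lines)
assert center_x(lines[0]) >= SIDE_MARGIN  # a full line never spills past the margin
```

The same computation generalizes to any alpha/gradient choice, since opacity does not affect layout.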
@@ -66,8 +66,16 @@

 **Keep only two directories**: **切片** (clips) and **成片** (finals). No other intermediate directories are kept.

+**Unified naming and title**: output filename = cover title = `title` in `highlights.json`; the title is de-dashed (`:|、—、/` etc. replaced with spaces), guaranteeing no index numbers or stray symbols, with name and title identical.
+
 ---

+## Local-processing notes (consistent with the JianYing (剪映) reverse-engineering analysis)
+
+Videos go through the **local pipeline** with no dependency on JianYing binaries: MLX Whisper transcription → highlights/timestamps (highlights.json) → batch_clip slicing → soul_enhance (filler-word removal + cover + subtitles). Cover titles show **no index numbers like 123**, only the highlight/question copy.
+
+Reference: section 5, "self-implementation suggestions", in `剪映_智能剪口播与智能片段分割_逆向分析.md`.
+
 ## Command cheatsheet (112-session example)

 ```bash
@@ -76,6 +84,6 @@
 # 3. Clipping
 python3 batch_clip.py -i "原视频.mp4" -l highlights.json -o 切片/ -p soul112

-# 4–5. Final video (filler-word removal + cover + subtitles)
+# 4–5. Final video (filler-word removal + cover + subtitles, overwrites previous finals)
 python3 soul_enhance.py -c 切片/ -l highlights.json -t transcript.srt -o 成片/ --vertical --title-only --force-burn-subs
 ```
@@ -7,6 +7,7 @@
 import argparse
 import json
 import os
+import re
 import subprocess
 import sys
 from pathlib import Path
@@ -59,9 +60,20 @@ def _is_mostly_chinese(text: str) -> bool:
     return chinese / max(1, len(text.strip())) > 0.3


+def _title_no_slash(s: str) -> str:
+    """De-dash the title: replace :|、—、/ etc. with spaces, consistent with soul_enhance"""
+    if not s:
+        return s
+    s = str(s).strip()
+    for c in "::||—--/、":
+        s = s.replace(c, " ")
+    s = re.sub(r"\s+", " ", s).strip()
+    return s
+
+
 def sanitize_filename(name: str, max_length: int = 50, chinese_only: bool = True) -> str:
-    """Clean the filename into unified Simplified Chinese; keep only the Chinese part if English is mixed in"""
-    name = _to_simplified(str(name))
+    """Clean the filename: de-dash the title first, then keep only Chinese, spaces, and _-"""
+    name = _title_no_slash(name) or _to_simplified(str(name))
     safe_chars = []
     for c in name:
         if c in " _-" or "\u4e00" <= c <= "\u9fff":
@@ -2,8 +2,8 @@
 # -*- coding: utf-8 -*-
 """
 Highlight detection - AI analyzes the video transcript and outputs highlight-segment JSON
-Cascade: Ollama (local 卡若AI) → rule-based fallback
-Uses only existing capabilities; no dependency on Gemini/Groq
+Cascade: API first (best currently available model) → local Ollama → rule-based fallback
+The API uses OPENAI_API_BASE/KEY/MODEL, or OPENAI_API_BASES/KEYS/MODELS (comma-separated) for failover.
 """
 import argparse
 import json
@@ -17,6 +17,8 @@ DEFAULT_CTA = "关注我,每天学一招私域干货"
 CLIP_COUNT = 15
 MIN_DURATION = 60   # at least 1 minute
 MAX_DURATION = 300  # at most 5 minutes
+# Default API model: prefer the best currently available (overridable via OPENAI_MODEL / OPENAI_MODELS)
+DEFAULT_API_MODEL = "gpt-4o"


 def parse_srt_segments(srt_path: str) -> list:
@@ -250,6 +252,62 @@ def _ensure_chinese_highlights(data: list) -> list:
 OLLAMA_MODELS = ["qwen2.5:3b", "qwen2.5:1.5b"]  # prefer 3b, more capable


+def _split_csv(s: str) -> list:
+    return [x.strip() for x in (s or "").split(",") if x.strip()]
+
+
+def _build_api_provider_queue() -> list:
+    """
+    Build the API endpoint queue from OPENAI_API_BASES/KEYS/MODELS, or the single-endpoint
+    OPENAI_API_BASE/KEY/MODEL. Returns [{"base_url", "api_key", "model"}, ...]; empty list when unconfigured.
+    """
+    bases = _split_csv(os.environ.get("OPENAI_API_BASES", ""))
+    keys = _split_csv(os.environ.get("OPENAI_API_KEYS", ""))
+    models = _split_csv(os.environ.get("OPENAI_MODELS", ""))
+    single_base = (os.environ.get("OPENAI_API_BASE") or "https://api.openai.com/v1").strip()
+    single_key = (os.environ.get("OPENAI_API_KEY") or "").strip()
+    single_model = (os.environ.get("OPENAI_MODEL") or DEFAULT_API_MODEL).strip() or DEFAULT_API_MODEL
+    queue = []
+    if bases:
+        for i, b in enumerate(bases):
+            key = keys[i] if i < len(keys) and keys[i] else single_key
+            model = models[i] if i < len(models) and models[i] else single_model
+            if b and key:
+                queue.append({"base_url": b.rstrip("/"), "api_key": key, "model": model})
+    elif single_key:
+        queue.append({"base_url": single_base.rstrip("/"), "api_key": single_key, "model": single_model})
+    return queue
+
+
+def call_openai_api(transcript: str, clip_count: int, provider: dict) -> str:
+    """Call an OpenAI-compatible API (Chat Completions) with the given base_url / api_key / model."""
+    try:
+        from openai import OpenAI
+    except ImportError:
+        raise RuntimeError("未安装 openai 库,请执行: pip install openai")
+    prompt = _build_prompt(transcript, clip_count)
+    system = (
+        "你是短视频策划师。用户会提供视频文字稿,你只输出一个 JSON 数组。"
+        "若某片段内有人提问(观众/连麦者问的问题),必须提取提问原文填 question,且 hook_3sec 用该提问(前3秒先展示提问再回答);无提问则 hook_3sec 用金句/悬念。"
+        "格式含 title, start_time, end_time, hook_3sec, cta_ending, transcript_excerpt, reason;有提问时加 question。"
+        "禁止输出任何非 JSON 内容。"
+    )
+    client = OpenAI(api_key=provider["api_key"], base_url=provider["base_url"])
+    resp = client.chat.completions.create(
+        model=provider["model"],
+        messages=[
+            {"role": "system", "content": system},
+            {"role": "user", "content": prompt},
+        ],
+        temperature=0.2,
+        max_tokens=8192,
+    )
+    content = (resp.choices[0].message.content or "").strip()
+    if not content:
+        raise RuntimeError("API 返回空内容")
+    return content
+
+
 def call_ollama(transcript: str, clip_count: int = CLIP_COUNT, model: str = "qwen2.5:3b") -> str:
     """Call the local 卡若AI model (Ollama) via the chat endpoint to avoid conversational misparses"""
     import requests
@@ -301,23 +359,47 @@ def main():
     if len(text) < 100:
         print("❌ 文字稿过短,请检查 SRT 格式", file=sys.stderr)
         sys.exit(1)
-    # Cascade: Ollama 3b → 1.5b → rule fallback (--require-ai skips the rule fallback)
+    # Cascade: API first (best currently available model) → Ollama → rule fallback (--require-ai skips the rule fallback)
     data = None
     raw = ""
-    for model in OLLAMA_MODELS:
+    api_queue = _build_api_provider_queue()
+    for provider in api_queue:
         try:
-            print(f"正在调用 Ollama {model} 分析高光片段...")
-            raw = call_ollama(text, args.clips, model)
+            print(f"正在调用 API {provider.get('model', '?')} 分析高光片段...")
+            raw = call_openai_api(text, args.clips, provider)
             if not raw:
-                raise ValueError("模型返回空")
+                raise ValueError("API 返回空")
             data = _parse_ai_json(raw)
             if data and isinstance(data, list) and len(data) > 0:
-                print(f"  ✓ {model} 成功,识别 {len(data)} 段")
+                print(f"  ✓ API ({provider.get('model', '?')}) 成功,识别 {len(data)} 段")
                 break
         except Exception as e:
-            print(f"  {model} 失败: {e}", file=sys.stderr)
+            print(f"  API ({provider.get('model', '?')}) 失败: {e}", file=sys.stderr)
             if raw:
                 print(f"  返回预览: {str(raw)[:400]}...", file=sys.stderr)
             data = None
+    if (not data or not isinstance(data, list)) and not api_queue:
+        pass  # no API configured; go on to try Ollama
+    elif data and isinstance(data, list) and len(data) > 0:
+        pass  # API succeeded; keep data
+    else:
+        data = None
+    if not data or not isinstance(data, list):
+        for model in OLLAMA_MODELS:
+            try:
+                print(f"正在调用 Ollama {model} 分析高光片段...")
+                raw = call_ollama(text, args.clips, model)
+                if not raw:
+                    raise ValueError("模型返回空")
+                data = _parse_ai_json(raw)
+                if data and isinstance(data, list) and len(data) > 0:
+                    print(f"  ✓ {model} 成功,识别 {len(data)} 段")
+                    break
+            except Exception as e:
+                print(f"  {model} 失败: {e}", file=sys.stderr)
+                if raw:
+                    print(f"  返回预览: {str(raw)[:400]}...", file=sys.stderr)
+                data = None
     if not data or not isinstance(data, list):
         if getattr(args, "require_ai", False):
             print("❌ 必须用 AI 识别,当前无可用模型或解析失败", file=sys.stderr)
@@ -183,14 +183,26 @@ def draw_text_with_outline(draw, pos, text, font, color, outline_color, outline_
     # main text
     draw.text((x, y), text, font=font, fill=color)


+def _normalize_title_for_display(title: str) -> str:
+    """De-dash the title for clarity: replace :|、—、/ etc. with spaces"""
+    if not title:
+        return ""
+    s = _to_simplified(str(title).strip())
+    for char in "::||—--/、":
+        s = s.replace(char, " ")
+    s = re.sub(r"\s+", " ", s).strip()
+    return s
+
+
 def sanitize_filename(name: str, max_length: int = 50) -> str:
-    """Output filename: keep only Chinese, spaces, and _-, consistent with batch_clip"""
-    name = _to_simplified(str(name))
+    """Output filename: de-dash the title first, then keep only Chinese, spaces, and _-"""
+    name = _normalize_title_for_display(name) or _to_simplified(str(name))
     safe = []
     for c in name:
         if c in " _-" or "\u4e00" <= c <= "\u9fff":
             safe.append(c)
     result = "".join(safe).strip()
     result = re.sub(r"\s+", " ", result).strip()
     if len(result) > max_length:
         result = result[:max_length]
     return result.strip(" _-") or "片段"
@@ -416,9 +428,19 @@ def _draw_vertical_gradient(draw, width, height, top_rgb, bottom_rgb, alpha=255)
         draw.rectangle([0, y, width, y + 1], fill=(r, g, b, alpha))


+def _strip_cover_number_prefix(text):
+    """Cover titles show no index numbers: strip leading 1. 2. 01、切片1, 123 etc."""
+    if not text:
+        return text
+    text = re.sub(r'^\s*切片\s*\d+\s*[\.\s、::]*\s*', '', text)
+    text = re.sub(r'^\s*\d+[\.\s、::]*\s*', '', text)
+    return text.strip()
+
+
 def create_cover_image(hook_text, width, height, output_path, video_path=None):
-    """Create the cover card. At 498x1080 vertical: premium gradient background, text strictly centered inside the frame, Soul logo top-left."""
+    """Create the cover card. At 498x1080 vertical: premium gradient background, text strictly centered inside the frame, Soul logo top-left; no index numbers like 123 on the cover."""
     hook_text = _to_simplified(str(hook_text or "").strip())
+    hook_text = _strip_cover_number_prefix(hook_text)
     if not hook_text:
         hook_text = "精彩切片"
     style = STYLE['cover']
@@ -725,12 +747,13 @@ def enhance_clip(clip_path, output_path, highlight_info, temp_dir, transcript_pa

     print(f"  分辨率: {width}x{height}, 时长: {duration:.1f}秒")

-    # First 3 seconds prefer the asked question: with question present, the cover/pre-roll shows the question first, then the answer plays
-    hook_text = highlight_info.get('question') or highlight_info.get('hook_3sec') or highlight_info.get('title') or ''
-    if not hook_text and clip_path:
+    # Unify cover and output filename: both use the theme title (de-dashed), so name and title match, dash-free and cleaner
+    raw_title = highlight_info.get('title') or highlight_info.get('hook_3sec') or ''
+    if not raw_title and clip_path:
         m = re.search(r'\d+[_\s]+(.+?)(?:_enhanced)?\.mp4$', os.path.basename(clip_path))
         if m:
-            hook_text = m.group(1).strip()
+            raw_title = m.group(1).strip()
+    hook_text = _normalize_title_for_display(raw_title) or raw_title or '精彩切片'
     cover_duration = STYLE['cover']['duration']

     # Vertical output: cover/subtitles are rendered at 498x1080 and overlaid on the cropped region, so text and subtitles stay complete and centered on the vertical frame