🔄 卡若AI sync 2026-02-27 10:53 | Updated: Cursor rules, 水桥 platform integration, 卡木, Operations Hub workbench | Excluded >20MB: 14 files
@@ -29,7 +29,7 @@ alwaysApply: true
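 - **Mismatch** → backtrack → search (GitHub, Skills, the web) for a solution → rethink (concise output) → re-execute → re-verify, until it succeeds or it is stated clearly that the goal cannot be reached.

 ### Step 3: Reply format = mandatory retrospective (required in every conversation inside 卡若AI)
-- **Inside the 卡若AI workbench, every AI reply in every conversation uses the "retrospective format".** Every AI and every sub-role must structure its replies as a retrospective.
+- **Every AI reply in every conversation inside 卡若AI uses the "retrospective format"; this is mandatory behavior, with no exceptions.** Every AI, every sub-role, and every scenario must structure its replies as a retrospective.
 - **Multi-turn conversations**: ending each reply with a **short retrospective** is recommended (🎯 goal · result · completion rate + ▶ next step, one line each).
 - **End of conversation**: the final reply **must** close with a **full retrospective**, following `运营中枢/参考资料/卡若复盘格式_固定规则.md`:
   - The retrospective block title **must include the date and time** (YYYY-MM-DD HH:mm);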
@@ -1,6 +1,6 @@
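-# 《一场soul的创业实验》 Chapter 9 Writing Guidelines
+# 《一场soul的创业实验》 Soul Writing Guidelines

-> These guidelines belong to the **Soul创业实验** Skill → subcategory "Writing". When writing a Chapter 9 single-session article, follow this file together with the writing notes in the main SKILL.
+> These guidelines belong to the **Soul创业实验** Skill → subcategory "Writing". **Chapter 9 single-session articles, Chapter 4, and other Soul-related sections** all follow this file. Every future Soul-related article uses this set of guidelines.

 ---
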
62
02_卡人(水)/水桥_平台对接/智能纪要/output/第110场_智能纪要_20260226.md
Normal file
@@ -0,0 +1,62 @@
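# Soul Session 110: Smart Minutes

**Session**: No. 110 | The full Soul monetization logic, made public
**Date**: February 26, 2026
**Duration**: 144 minutes
**Room entries**: 406 people

---

## Keywords

Soul monetization, clips, side gigs, Mini Program 2.0, resource group, membership, AI enablement, distributed matrix

---

## 1. Core Content

### Soul monetization logic (proven over four months)

- **Pay out first**: recruit people to cut clips and do the work, and pay for every action taken; posting Moments (瞬间), bringing in leads, and closing a project sale are all paid; content production (writing summaries, posting standout shares) is paid; for side gigs, contact the admins; anyone with a computer who can edit video is welcome to ask.
- **Save money**: Soul supply-chain cooperation saves more than 200,000 a year on servers plus rent; the water vendor signed up for a year, and so did the finance company; the essence of consumption is saving money.
- **Ways of collecting money**: a resource group of 93 business owners, admitted only after sharing or by referral; posting projects on the Mini Program and charging for promotion; project profit-sharing and member groups; the main business is enablement (the esports screen-and-match model is proven, with monthly profit of 100,000 to 200,000).

### Results in numbers

- Four months in: 40+ people doing side gigs, 7 admins, 93 people in the owners' group, plus a number of members.
- Monthly party audience after deduplication: just over 10,000, staying 10 to 15 minutes each on average; 30+ clips bring about 20 million plays a month.

### Mini Program 2.0 and sharing revenue

- When someone pays 1 yuan for a shared article, 90% goes to the sharer, 5% to Tencent, 1 point to us, and 4 points to the admins.
- Each party promotes that day's article; a payer stays bound to the sharer for 30 days; the Mini Program has a few thousand daily active users.

### Key points from the live Q&A

- **No. 3**: converting back and forth on Soul → the monetization logic is simply pay out, save, collect; 2.0 is live; plan on at least half a year of hosting parties.
- **No. 4, factory e-commerce**: a thousand blank (unfinished) styles; generate images with AI, use 豆包 to find tools, and run a paid trial on the first three styles.
- **No. 5, interior design**: move into short video and AI image generation around existing skills; that is easier than switching tracks.
- **No. 6, AI distribution across 10 Douyin accounts**: several accounts on one IP address will inevitably cause problems; use a distributed matrix and the account-rental supply chain (ordinary account 100/month, better ones 200, accounts with followers 1,000).
- **No. 7, selling solar power**: find compute providers and companies that need compute, and earn the spread; at its core this is sales.
- **No. 8, PhD / university lecturer**: 17,000 students across two schools; an AI comic-drama course; connect with 红果; the 飞飞 time model; turning the schools into a big opportunity.

---

## 2. Quotes

- When it comes to making money, anything that can be laid out in the open is no secret.
- Host a party and you make money.
- The essence of consumption is saving money.
- Even without direct income, creating value brings income.
- Whatever you already do, enable it with AI and raise your efficiency.
- Paid circles are better; the free ones are full of people who don't know the subject, and the talk goes nowhere.
- Anything you can keep doing must be something that can be handled just by chatting.

---

## 3. Next Steps

- Side gigs or clipping → contact the admins (@管理, @南风)
- Want to learn about AI → join the paid circle
- Mini Program 2.0 is live; 3.0 iterates in March (vectorizing the content of 109 sessions, AI agents)
- To connect on something, or to do side gigs or clipping, find the admins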
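The 1-yuan split quoted under "Mini Program 2.0 and sharing revenue" can be checked with a little arithmetic. The sketch below is illustrative only: `split_payment` and the party labels are hypothetical names, and it reads "1 point" and "4 points" as 1% and 4%, which the minutes do not state explicitly (the four shares do add up to 100% under that reading).

```python
from decimal import Decimal

# Shares as quoted in the minutes. Reading "1 point" / "4 points" as 1% / 4%
# is an assumption made for this sketch.
SHARES = {
    "sharer": Decimal("0.90"),
    "tencent": Decimal("0.05"),
    "platform": Decimal("0.01"),
    "admins": Decimal("0.04"),
}


def split_payment(amount: Decimal) -> dict:
    """Split one payment among the parties according to SHARES."""
    return {party: amount * share for party, share in SHARES.items()}


if __name__ == "__main__":
    for party, value in split_payment(Decimal("1.00")).items():
        print(f"{party}: {value} yuan")
```

Using `Decimal` rather than floats keeps the per-party amounts exact, which matters once small payments are summed over many shares.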
0
02_卡人(水)/水桥_平台对接/飞书管理/脚本/.added_wanzhi_20h
Normal file
45
02_卡人(水)/水桥_平台对接/飞书管理/脚本/add_calendar_wanzhi_moment.sh
Executable file
@@ -0,0 +1,45 @@
#!/bin/bash
# Add a repeating event to the local Mac Calendar: write one 玩值电竞 WeChat Moments post every day at 20:00
# Run it once; the event then repeats daily

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
LOG_MARKER="$SCRIPT_DIR/.added_wanzhi_20h"

# Avoid adding the event twice
if [ -f "$LOG_MARKER" ]; then
  echo "✅ The repeating 20:00 玩值电竞 WeChat Moments event already exists (see Calendar); skipping"
  exit 0
fi

# Starting tonight at 20:00, repeating every day
osascript <<'APPLESCRIPT'
set eventTitle to "写一篇玩值电竞朋友圈"
set eventSummary to "玩值电竞 · 每日朋友圈"
-- Start at 20:00 today and last 30 minutes
set today to current date
set hours of today to 20
set minutes of today to 0
set startDate to today
set endDate to startDate + (30 * minutes)

tell application "Calendar"
  -- Use the first writable calendar (usually "Work" or the main calendar)
  set calList to (every calendar whose writable is true)
  if (count of calList) is 0 then
    set calList to calendars
  end if
  if (count of calList) is 0 then
    return "No calendar found"
  end if
  set targetCal to item 1 of calList
  make new event at end of events of targetCal with properties {summary:eventSummary, description:eventTitle, start date:startDate, end date:endDate, recurrence:"FREQ=DAILY"}
  return "Added"
end tell
APPLESCRIPT

if [ $? -eq 0 ]; then
  touch "$LOG_MARKER"
  echo "✅ Added a repeating event to the local Calendar: write one 玩值电竞 WeChat Moments post every day at 20:00"
else
  echo "⚠️ Failed to add the event; create it manually in the local Calendar app: 20:00 every day, repeating daily, titled 「玩值电竞 · 每日朋友圈」"
fi
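The script combines two small ideas: a marker file that makes the run idempotent, and an event window that starts at 20:00 today and lasts 30 minutes. The Python sketch below mirrors that logic for illustration only. `event_window` and `run_once` are hypothetical names, and nothing here talks to Calendar.

```python
from datetime import datetime, timedelta
from pathlib import Path
import tempfile


def event_window(now: datetime, hour: int = 20, minutes: int = 30):
    """Return (start, end) for an event starting at `hour`:00 on the day of `now`."""
    start = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    return start, start + timedelta(minutes=minutes)


def run_once(marker: Path, action) -> bool:
    """Run `action` unless `marker` exists; create the marker only after success."""
    if marker.exists():
        return False   # already done on an earlier run
    action()           # an exception here leaves no marker behind
    marker.touch()
    return True


if __name__ == "__main__":
    start, end = event_window(datetime(2026, 2, 27, 10, 53))
    print(start, end)
    with tempfile.TemporaryDirectory() as tmpdir:
        marker = Path(tmpdir) / ".added_wanzhi_20h"
        print(run_once(marker, lambda: None))  # first call runs the action
        print(run_once(marker, lambda: None))  # second call is skipped
```

One difference from the AppleScript: this sketch also zeroes the seconds, whereas the script sets only the hours and minutes of `current date`.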
@@ -153,27 +153,25 @@ def get_today_tasks():
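     today = datetime.now()
     date_str = f"{today.month}月{today.day}日"  # e.g. "2月27日"

-    # Daily fixed items (2026-02-27): One-Person Company first, 玩值电竞 second
+    # Daily fixed items: development <20%, emphasis on operations and direction; the nightly 20:00 玩值电竞 WeChat Moments post is already in the local Calendar
     tasks = [
         {
             "person": "卡若",
-            "events": ["One-Person Company Agent", "玩值电竞", "Important unfinished items", "Feishu log iteration"],
+            "events": ["One-Person Company", "玩值电竞", "Operations and direction", "Feishu log"],
             "quadrant": "重要紧急",  # "important and urgent"
             "t_targets": [
-                "One-Person Company Agent → video clips/articles/livestreams/Mini Program/WeChat Moments/aggregation (5%)",
-                "玩值电竞 → Docker deployment and feature progress (second priority) (25%)",
-                "卡若AI 4 optimizations → slimming + consolidation + rules + output + guardrails (in progress)",
-                "Feishu log → daily iteration + progress percentage updates (100%)",
+                "One-Person Company → distribution aggregation (5%)",
+                "玩值电竞 → Docker/features (25%); nightly 20:00 WeChat Moments post → repeating local Calendar event",
+                "Feishu log → daily iteration (100%)",
             ],
             "n_process": [
-                "[One-Person Company] video clip distribution, articles on all platforms, daily livestream, Mini Program, WeChat Moments → aggregation platform",
-                "[玩值电竞] Docker 3001, MongoDB wanzhi_esports, ongoing iteration",
-                "[Important unfinished] 卡若AI optimization, book Mini Program, 玉宁 livestream, etc., ongoing iteration",
-                "[Log] update the previous day's progress and completion daily; keep iterating on unfinished items",
+                "[Operations] for the export and 婼瑄's export, see 卡若Ai的文件夹/执行日志",
+                "[Direction] One-Person Company first, 玩值电竞 second; keep development content within 20%",
+                "[Log] update the previous day's progress and completion daily",
             ],
-            "t_thoughts": ["One-Person Company first, 玩值电竞 second; important unfinished items join the daily iteration"],
-            "w_work": ["One-Person Company Agent", "玩值电竞", "卡若AI optimization", "Feishu log entry"],
-            "f_feedback": ["One-Person Company → initiated 5% 🔄", "玩值电竞 → in progress 25% 🔄", "卡若AI optimization → in progress 🔄", "Log → daily iteration 100% ✅"]
+            "t_thoughts": ["The log centers on operations and the future; development gets only a summary"],
+            "w_work": ["One-Person Company", "玩值电竞", "Feishu log", "Export/婼瑄 log"],
+            "f_feedback": ["One-Person Company 5% 🔄", "玩值电竞 25% 🔄", "Log 100% ✅"]
         }
     ]
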
@@ -17,33 +17,30 @@ from auto_log import get_token_silent, write_log, open_result, resolve_wiki_toke
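

 def build_tasks_0227():
-    """Tasks for Feb 27: One-Person Company (first) + 玩值电竞 (second) + important unfinished items + discussion summary"""
+    """Feb 27: development <20%, emphasis on operations and future direction; the export and 婼瑄's export are already logged"""
     return [
         {
             "person": "卡若",
-            "events": ["One-Person Company Agent", "玩值电竞", "Important unfinished items", "Feishu log iteration"],
+            "events": ["One-Person Company", "玩值电竞", "Operations and direction", "Feishu log"],
             "quadrant": "重要紧急",  # "important and urgent"
             "t_targets": [
-                "One-Person Company Agent → video clips/articles/livestreams/Mini Program/WeChat Moments/aggregation (5%) [iterating since Feb 17]",
-                "玩值电竞 → Docker deployment and feature progress (second priority) (25%) [iterating since Feb 17]",
-                "卡若AI 4 optimizations → slimming + consolidation + rules + output + guardrails (in progress)",
-                "Feishu log → daily iteration + progress percentage updates (100%)",
+                "One-Person Company → distribution aggregation (5%), since Feb 17",
+                "玩值电竞 → Docker/features (25%), since Feb 17; the nightly 20:00 WeChat Moments post is already in Calendar",
+                "Feishu log → daily iteration (100%)",
             ],
             "n_process": [
-                "[One-Person Company] video clip distribution, articles on all platforms, daily livestream, Mini Program, WeChat Moments → aggregation platform",
-                "[玩值电竞] Docker 3001, MongoDB wanzhi_esports, ongoing iteration",
-                "[Important unfinished] 卡若AI optimization, book Mini Program, 玉宁 livestream, etc., ongoing iteration since Feb 17",
-                "[Yesterday, Feb 26] 卡若AI 56%, 一场创业实验 → 永平, GitHub yongpxu-soul",
+                "[Operations] the export and 婼瑄's export are summarized → 执行日志/2026-02-27_导出与婼瑄导出汇总.md",
+                "[Direction] One-Person Company first, 玩值电竞 second; keep development content within 20%",
+                "[Yesterday, Feb 26] 卡若AI 56%, 创业实验 → 永平 yongpxu-soul",
             ],
             "t_thoughts": [
-                "One-Person Company first, 玩值电竞 second; unfinished progress iterated from Feb 17 to today",
+                "The log centers on operations and the future, with only a summary of development; the daily 20:00 玩值电竞 WeChat Moments post has been added to Calendar",
             ],
-            "w_work": ["One-Person Company Agent", "玩值电竞", "卡若AI optimization", "Feishu log entry"],
+            "w_work": ["One-Person Company", "玩值电竞", "Feishu log", "Export/婼瑄 log"],
             "f_feedback": [
-                "One-Person Company → initiated 5% 🔄",
-                "玩值电竞 → in progress 25% 🔄",
-                "卡若AI optimization → in progress 🔄",
-                "Daily discussion → Feishu log, One-Person Company Agent, 玩值电竞",
+                "One-Person Company 5% 🔄",
+                "玩值电竞 25% 🔄; 20:00 WeChat Moments post → repeating local Calendar event",
+                "Export and 婼瑄 summary ✅",
             ],
         }
     ]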
@@ -12,6 +12,8 @@ updated: "2026-02-17"
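
 > **Language**: all documents, subtitles, and cover copy use **Simplified Chinese**. soul_enhance converts Traditional to Simplified automatically.

+> **Soul video output**: finished Soul edits are all exported to `/Users/karuo/Movies/soul视频/最终版/`; source videos go in `原视频/`, and intermediate artifacts in `其他/`.
+
 > **Linkage rule**: every time video clipping runs, automatically check whether "clip motion-graphics packaging" is needed. If the user mentions an intro/outro, programmatic packaging, or batch covers, call the `切片动效包装/10秒视频` template to render, then composite the result with the clip.

 ## ⭐ Soul party clipping workflow (default)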
253
03_卡木(木)/木叶_视频内容/视频切片/脚本/remove_filler_segments.py
Normal file
@@ -0,0 +1,253 @@
#!/usr/bin/env python3
"""
Remove speech segments that consist only of filler words such as "嗯" from a video.
Pipeline: extract audio → transcribe → find filler-only time ranges → cut them out with ffmpeg → write a new video
"""
import argparse
import re
import subprocess
import tempfile
from pathlib import Path

# Pure filler words (when a whole subtitle line is only these, the whole segment is removed)
FILLER_ONLY = [
    '嗯', '啊', '呃', '额', '哦', '噢', '唉', '哎', '诶', '喔',
    '嗯嗯', '啊啊', '呃呃', '嗯啊', '啊嗯',
]


def parse_srt_all(srt_path: str) -> list:
    """Parse an SRT file and return every segment (short lines included)."""
    with open(srt_path, "r", encoding="utf-8") as f:
        content = f.read()
    segments = []
    pattern = r"(\d+)\n(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})\n(.*?)(?=\n\n|\Z)"
    for m in re.findall(pattern, content, re.DOTALL):
        sh, sm, ss = int(m[1]), int(m[2]), int(m[3])
        eh, em, es = int(m[5]), int(m[6]), int(m[7])
        start_sec = sh * 3600 + sm * 60 + ss + int(m[4]) / 1000
        end_sec = eh * 3600 + em * 60 + es + int(m[8]) / 1000
        text = m[9].strip().replace("\n", " ").strip()
        segments.append({
            "start_sec": start_sec,
            "end_sec": end_sec,
            "text": text,
        })
    return segments


def is_filler_only(text: str) -> bool:
    """Return True if the text is only filler words (punctuation variants included: 嗯、 嗯。 嗯, etc.)."""
    t = re.sub(r"[\s,。、,.\-—…]+", "", text.strip())
    if not t:
        return True
    for f in FILLER_ONLY:
        if t == f:
            return True
    # Consecutive repeats of 嗯, 啊, etc.
    if re.match(r"^[嗯啊呃噢哦]+$", t):
        return True
    # After stripping leading fillers, a remainder of at most 2 characters that is not purely letters/digits (CJK characters count as letters) is also treated as filler
    rest = re.sub(r"^[嗯啊呃噢哦]+", "", t)
    if len(rest) <= 2 and not rest.isalnum():
        return True
    return False


def parse_time(s: str) -> float:
    """Parse a time value: accepts 123.45, 00:01:23.45, or 01:23."""
    s = s.strip()
    if ":" in s:
        parts = s.split(":")
        if len(parts) == 3:
            h, m, sec = float(parts[0]), float(parts[1]), float(parts[2])
            return h * 3600 + m * 60 + sec
        elif len(parts) == 2:
            return float(parts[0]) * 60 + float(parts[1])
    return float(s)


def parse_remove_list(path: str) -> list:
    """Parse a manual removal list and return [(start_sec, end_sec), ...]."""
    out = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split()
            if len(parts) >= 2:
                out.append((parse_time(parts[0]), parse_time(parts[1])))
    return out


def build_keep_ranges(segments: list, total_duration: float, remove_ranges_override=None) -> list:
    """Build the time ranges to keep, [(start, end), ...], from the segments to remove."""
    if remove_ranges_override is not None:
        remove_ranges = remove_ranges_override
    else:
        remove_ranges = [
            (s["start_sec"], s["end_sec"])
            for s in segments
            if is_filler_only(s["text"])
        ]
    if not remove_ranges:
        return [(0, total_duration)]

    remove_ranges.sort(key=lambda x: x[0])
    keep = []
    current = 0.0
    for rs, r_end in remove_ranges:  # r_end rather than re, so the re module is not shadowed
        if rs > current + 0.1:  # keep [current, rs)
            keep.append((current, rs))
        current = max(current, r_end)
    if current < total_duration - 0.1:
        keep.append((current, total_duration))
    return keep


def run_ffmpeg(args: list) -> bool:
    r = subprocess.run(args, capture_output=True, text=True)
    return r.returncode == 0


def main():
    ap = argparse.ArgumentParser(description='Remove speech segments made up of filler words such as "嗯" from a video')
    ap.add_argument("video", help="Input video path")
    ap.add_argument("-o", "--output", help="Output path (default: <original name>_去嗯.mp4)")
    ap.add_argument("--transcript", "-t", help="Existing transcript.srt (transcribed first if not given)")
    ap.add_argument("--dry-run", action="store_true", help="Only print the segments to remove; do not process the video")
    ap.add_argument("--debug", action="store_true", help='Print every segment containing "嗯" for debugging')
    ap.add_argument("--save-transcript", help="Save the transcribed SRT to the given path for inspection")
    ap.add_argument("--remove-list", metavar="FILE", help="File of manually specified time ranges to remove, one per line: start_seconds end_seconds, or 00:01:23 00:01:25")
    args = ap.parse_args()

    video_path = Path(args.video).resolve()
    if not video_path.exists():
        print(f"❌ Video not found: {video_path}")
        return 1

    output_path = Path(args.output) if args.output else video_path.parent / f"{video_path.stem}_去嗯.mp4"

    # 1. Get the video duration
    r = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", str(video_path)],
        capture_output=True, text=True
    )
    total_duration = float(r.stdout.strip()) if r.returncode == 0 else 0
    if total_duration <= 0:
        print("❌ Could not determine the video duration")
        return 1

    transcript_path = Path(args.transcript) if args.transcript else None

    # 2. If --remove-list is given, use it directly and skip transcription
    remove_list_ranges = None
    if args.remove_list:
        rlp = Path(args.remove_list)
        if rlp.exists():
            remove_list_ranges = parse_remove_list(str(rlp))
            print(f"Read {len(remove_list_ranges)} time ranges to remove from {args.remove_list}")
        else:
            print(f"❌ --remove-list file not found: {args.remove_list}")
            return 1

    # 3. With neither a transcript nor a remove list, transcribe
    if (not transcript_path or not transcript_path.exists()) and not remove_list_ranges:
        with tempfile.TemporaryDirectory() as tmpdir:
            tmpp = Path(tmpdir)
            audio_path = tmpp / "audio.wav"
            print("Extracting audio...")
            if not run_ffmpeg([
                "ffmpeg", "-y", "-i", str(video_path), "-vn", "-acodec", "pcm_s16le", "-ar", "16000", str(audio_path)
            ]):
                print("❌ Audio extraction failed")
                return 1
            transcript_path = tmpp / "transcript.srt"
            print("Transcribing with MLX Whisper (requires the conda env mlx-whisper)...")
            r = subprocess.run([
                "mlx_whisper", str(audio_path),
                "--model", "mlx-community/whisper-small-mlx",
                "--language", "zh",
                "--output-format", "srt",
                "--output-name", "transcript"
            ], capture_output=True, text=True, cwd=tmpdir)
            if r.returncode != 0:
                print("❌ Transcription failed; first run: conda activate mlx-whisper")
                print(r.stderr[:500] if r.stderr else "")
                return 1
            if args.save_transcript:
                import shutil
                shutil.copy(str(transcript_path), args.save_transcript)
                print(f"  Transcript saved: {args.save_transcript}")
            segments = parse_srt_all(str(transcript_path))
    else:
        segments = parse_srt_all(str(transcript_path)) if (transcript_path and transcript_path.exists()) else []

    # 4. Decide which segments to remove
    if remove_list_ranges:
        filler_segments = [{"start_sec": a, "end_sec": b} for a, b in remove_list_ranges]
        print(f"Removing {len(filler_segments)} manually specified time ranges")
        for s in filler_segments:
            print(f"  Remove: {s['start_sec']:.2f}s - {s['end_sec']:.2f}s")
        keep_ranges = build_keep_ranges(segments, total_duration, remove_list_ranges)
    else:
        filler_segments = [s for s in segments if is_filler_only(s["text"])]
        contain_ng = [s for s in segments if "嗯" in s["text"]]
        print(f"{len(segments)} subtitle segments in total, {len(filler_segments)} of them filler-only (to be removed)")
        if getattr(args, "debug", False) and contain_ng:
            print(f'  {len(contain_ng)} segments contain "嗯":')
            for s in contain_ng:
                print(f"    {s['start_sec']:.2f}s-{s['end_sec']:.2f}s 「{s['text']}」 is_filler={is_filler_only(s['text'])}")
        for s in filler_segments:
            print(f"  Remove: {s['start_sec']:.2f}s - {s['end_sec']:.2f}s 「{s['text']}」")
        keep_ranges = build_keep_ranges(segments, total_duration)

    if args.dry_run:
        print("--dry-run mode; video not processed")
        return 0
    if len(keep_ranges) == 1 and keep_ranges[0][0] == 0 and keep_ranges[0][1] >= total_duration - 0.5:
        print("No filler segments to remove; skipping")
        return 0

    # 5. Cut the kept ranges with ffmpeg and concatenate them
    with tempfile.TemporaryDirectory() as tmpdir:
        tmpp = Path(tmpdir)
        seg_files = []
        for i, (start, end) in enumerate(keep_ranges):
            dur = end - start
            if dur < 0.1:
                continue
            seg = tmpp / f"seg_{i:04d}.mp4"
            ok = run_ffmpeg([
                "ffmpeg", "-y", "-ss", str(start), "-t", str(dur),
                "-i", str(video_path), "-c", "copy", str(seg)
            ])
            if ok and seg.exists():
                seg_files.append(seg)

        if not seg_files:
            print("❌ No valid segments were produced")
            return 1

        # concat list
        list_path = tmpp / "list.txt"
        with open(list_path, "w") as f:
            for seg in seg_files:
                f.write(f"file '{seg}'\n")

        print(f"Concatenating {len(seg_files)} segments...")
        ok = run_ffmpeg([
            "ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_path),
            "-c", "copy", str(output_path)
        ])
        if not ok:
            print("❌ Concatenation failed")
            return 1

    print(f"✅ Written: {output_path}")
    return 0


if __name__ == "__main__":
    exit(main())
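The core of `build_keep_ranges` is an interval complement: sort the ranges to remove, walk a cursor across them, and keep every gap larger than a small tolerance. The standalone sketch below restates that idea on hand-made numbers. `keep_ranges` and the sample intervals are illustrative assumptions, not part of the committed script.

```python
def keep_ranges(remove, total, eps=0.1):
    """Return the parts of [0, total] not covered by the `remove` intervals."""
    keep, cursor = [], 0.0
    for start, end in sorted(remove):
        if start > cursor + eps:     # a gap worth keeping before this removal
            keep.append((cursor, start))
        cursor = max(cursor, end)    # overlapping removals merge naturally
    if cursor < total - eps:
        keep.append((cursor, total))
    return keep


if __name__ == "__main__":
    # Unsorted and overlapping input on a 10-second clip.
    print(keep_ranges([(5.0, 6.0), (2.0, 3.0), (2.5, 3.5)], total=10.0))
    # prints [(0.0, 2.0), (3.5, 5.0), (6.0, 10.0)]
```

Taking `max(cursor, end)` is what makes overlapping or nested removal ranges safe: the cursor never moves backwards, so no kept range can start inside a removed one.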
160
03_卡木(木)/木叶_视频内容/视频切片/脚本/remove_ng_auto.py
Normal file
@@ -0,0 +1,160 @@
#!/usr/bin/env python3
"""
Fully automatic removal of "嗯" speech segments from a video.
Uses whisper-timestamped word-level timestamps to detect 嗯, then cuts the output with ffmpeg
"""
import argparse
import json
import re
import subprocess
import tempfile
from pathlib import Path

# Filler words (matched at word level)
FILLER_CHARS = "嗯啊呃额哦噢唉哎诶喔"


def extract_audio(video_path: str, out_path: str) -> bool:
    r = subprocess.run([
        "ffmpeg", "-y", "-i", video_path, "-vn", "-acodec", "pcm_s16le", "-ar", "16000", out_path
    ], capture_output=True)
    return r.returncode == 0


def get_duration(video_path: str) -> float:
    r = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", video_path],
        capture_output=True, text=True
    )
    return float(r.stdout.strip()) if r.returncode == 0 else 0


def transcribe_word_level(audio_path: str, language: str = "zh") -> dict:
    import whisper_timestamped as whisper
    audio = whisper.load_audio(audio_path)
    model = whisper.load_model("small", device="cpu")  # the "small" model: more accurate, picks up filler words more readily
    # Use initial_prompt to nudge the model into keeping fillers such as 嗯
    return whisper.transcribe(
        model, audio, language=language, detect_disfluencies=True,
        initial_prompt="嗯 啊 呃 然后 就是 那个 所以说。这是一段中文语音转写,请保留说话人发出的嗯、啊等语气词。"
    )


def find_filler_ranges(result: dict) -> list:
    """Find the time ranges of filler words (嗯 etc.) in a word-level transcription: word level, with a segment-level fallback."""
    ranges = []
    for seg in result.get("segments", []):
        for w in seg.get("words", []):
            text = (w.get("text") or "").strip()
            t_clean = re.sub(r"[\s,。、,.\-—…]+", "", text)
            if not t_clean:
                continue
            if re.match(r"^[嗯啊呃噢哦额唉哎诶喔]+$", t_clean):
                s, e = w.get("start"), w.get("end")
                if s is not None and e is not None and e - s > 0.05:
                    ranges.append((float(s), float(e)))
        # Segment-level fallback: the whole segment is only filler
        seg_text = re.sub(r"[\s,。、,.\-—…]+", "", (seg.get("text") or "").strip())
        if seg_text and re.match(r"^[嗯啊呃噢哦额唉哎诶喔]+$", seg_text):
            ss, se = seg.get("start"), seg.get("end")
            if ss is not None and se is not None:
                r = (float(ss), float(se))
                if r not in ranges and all(r[0] >= x[1] or r[1] <= x[0] for x in ranges):
                    ranges.append(r)
    return sorted(ranges, key=lambda x: x[0])


def build_keep_ranges(remove_ranges: list, total_duration: float) -> list:
    remove_ranges = sorted(remove_ranges, key=lambda x: x[0])
    keep = []
    current = 0.0
    for rs, r_end in remove_ranges:  # r_end rather than re, so the re module is not shadowed
        if rs > current + 0.05:
            keep.append((current, rs))
        current = max(current, r_end)
    if current < total_duration - 0.05:
        keep.append((current, total_duration))
    return keep


def run_ffmpeg(args: list) -> bool:
    return subprocess.run(args, capture_output=True).returncode == 0


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("video", help="Input video")
    ap.add_argument("-o", "--output", help="Output path")
    args = ap.parse_args()

    video_path = Path(args.video).resolve()
    if not video_path.exists():
        print("❌ Video not found")
        return 1

    output_path = Path(args.output) if args.output else video_path.parent / f"{video_path.stem}_去嗯.mp4"
    total_duration = get_duration(str(video_path))
    if total_duration <= 0:
        print("❌ Could not determine the duration")
        return 1

    with tempfile.TemporaryDirectory() as tmpdir:
        tmp = Path(tmpdir)
        audio_path = tmp / "audio.wav"
        print("1. Extracting audio...")
        if not extract_audio(str(video_path), str(audio_path)):
            print("❌ Audio extraction failed")
            return 1

        print("2. Word-level transcription with whisper-timestamped (detecting filler words)...")
        try:
            result = transcribe_word_level(str(audio_path))
        except Exception as e:
            print(f"❌ Transcription failed: {e}")
            return 1

        remove_ranges = find_filler_ranges(result)
        print(f"   Detected {len(remove_ranges)} filler words (嗯 etc.)")
        for s, e in remove_ranges[:10]:
            print(f"      {s:.2f}s - {e:.2f}s")
        if len(remove_ranges) > 10:
            print(f"      ... {len(remove_ranges)} in total")

        if not remove_ranges:
            print("   No filler words detected; copying the original video as the output")
            import shutil
            shutil.copy(str(video_path), str(output_path))
            print(f"✅ Copied: {output_path}")
            return 0

        keep_ranges = build_keep_ranges(remove_ranges, total_duration)

        print("3. Cutting and concatenating with ffmpeg...")
        seg_files = []
        for i, (start, end) in enumerate(keep_ranges):
            dur = end - start
            if dur < 0.1:
                continue
            seg = tmp / f"seg_{i:04d}.mp4"
            if run_ffmpeg(["ffmpeg", "-y", "-ss", str(start), "-t", str(dur), "-i", str(video_path), "-c", "copy", str(seg)]):
                seg_files.append(seg)

        if not seg_files:
            print("❌ Segment generation failed")
            return 1

        list_path = tmp / "list.txt"
        with open(list_path, "w") as f:
            for seg in seg_files:
                f.write(f"file '{seg}'\n")

        if not run_ffmpeg(["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_path), "-c", "copy", str(output_path)]):
            print("❌ Concatenation failed")
            return 1

        print(f"✅ Written: {output_path}")
        return 0


if __name__ == "__main__":
    exit(main())
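The word-level pass in `find_filler_ranges` can be tried without running any speech model. The sketch below applies the same matching rule to a hand-written dictionary shaped like a word-level transcription result. `filler_word_ranges` and `SAMPLE` are hypothetical, the sample is not real model output, and the segment-level fallback is left out for brevity.

```python
import re

FILLER_CHARS = "嗯啊呃额哦噢唉哎诶喔"
FILLER_RE = re.compile(f"^[{FILLER_CHARS}]+$")
PUNCT_RE = re.compile(r"[\s,。、,.\-—…]+")


def filler_word_ranges(result: dict, min_len: float = 0.05) -> list:
    """Collect (start, end) for every word made up only of filler characters."""
    ranges = []
    for seg in result.get("segments", []):
        for word in seg.get("words", []):
            text = PUNCT_RE.sub("", word.get("text") or "")
            start, end = word.get("start"), word.get("end")
            if not text or start is None or end is None:
                continue
            if FILLER_RE.match(text) and end - start > min_len:
                ranges.append((float(start), float(end)))
    return sorted(ranges)


# Hand-written sample; real transcription output carries more fields.
SAMPLE = {"segments": [{"words": [
    {"text": "嗯,", "start": 0.0, "end": 0.4},
    {"text": "我们", "start": 0.5, "end": 0.9},
    {"text": "啊", "start": 1.0, "end": 1.02},   # shorter than min_len, ignored
    {"text": "呃呃", "start": 1.5, "end": 2.0},
]}]}

if __name__ == "__main__":
    print(filler_word_ranges(SAMPLE))  # prints [(0.0, 0.4), (1.5, 2.0)]
```

Building the character class from `FILLER_CHARS` keeps the filler list in one place, whereas the committed script defines `FILLER_CHARS` but repeats the characters inside its regular expressions.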
@@ -166,3 +166,4 @@
 | 2026-02-26 00:43:29 | 🔄 卡若AI sync 2026-02-26 00:43 | Updated: Operations Hub workbench | Excluded >20MB: 14 files |
 | 2026-02-26 16:43:05 | 🔄 卡若AI sync 2026-02-26 16:41 | Updated: 金仓, 水桥 platform integration, 卡木, Operations Hub workbench | Excluded >20MB: 14 files |
 | 2026-02-27 05:06:58 | 🔄 卡若AI sync 2026-02-27 05:06 | Updated: 水桥 platform integration, Operations Hub workbench | Excluded >20MB: 14 files |
+| 2026-02-27 05:21:57 | 🔄 卡若AI sync 2026-02-27 05:21 | Updated: 水桥 platform integration, Operations Hub workbench | Excluded >20MB: 14 files |
@@ -169,3 +169,4 @@
 | 2026-02-26 00:43:29 | Success | Success | 🔄 卡若AI sync 2026-02-26 00:43 | Updated: Operations Hub workbench | Excluded >20MB: 14 files | [Repository](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |
 | 2026-02-26 16:43:05 | Success | Success | 🔄 卡若AI sync 2026-02-26 16:41 | Updated: 金仓, 水桥 platform integration, 卡木, Operations Hub workbench | Excluded >20MB: 14 files | [Repository](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |
 | 2026-02-27 05:06:58 | Success | Success | 🔄 卡若AI sync 2026-02-27 05:06 | Updated: 水桥 platform integration, Operations Hub workbench | Excluded >20MB: 14 files | [Repository](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |
+| 2026-02-27 05:21:57 | Success | Success | 🔄 卡若AI sync 2026-02-27 05:21 | Updated: 水桥 platform integration, Operations Hub workbench | Excluded >20MB: 14 files | [Repository](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |