Add READMIN.md

2026-05-23 18:05:04 +08:00 · 2026-05-23 18:05:04 +08:00 · a6001e3688
commit a6001e3688
parent 24d34f695f
1 changed files with 416 additions and 0 deletions
--- a/READMIN.md
+++ b/READMIN.md
@@ -0,0 +1,416 @@
+---
+date: 2026-05-07
+topic: "Go Agent Framework - Orca"
+status: validated
+---
+
+# Go Agent Framework (Orca) Design Document
+
+## Problem Statement
+
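+Build a foundational Agent framework in Go that supports multi-Agent collaboration, persistent session memory, automatic Skill discovery, sandboxed execution, and custom Tool registration, and that connects to a local Ollama model (gemma4:e4b).
+
+**Core challenges:**
+- How to implement a lightweight, highly concurrent multi-Agent system in Go
+- How to safely execute user commands and Skill scripts
+- How to design an extensible plugin mechanism (Skill / Tool)
+- How to manage session context and memory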
+
+## Constraints
+
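+1. **Language:** pure Go, with minimal external dependencies
+2. **Storage:** JSON Lines (no SQLite or other database dependency)
+3. **Isolation:** process-level restrictions (chroot + resource limits), with no dependency on Docker
+4. **Model:** local Ollama models only, defaulting to gemma4:e4b
+5. **Skill directory:** Skill definitions are read from `~/.agents/skills/`
+6. **Deployment:** a single binary that starts with zero configuration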
+
+## Approach
+
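+### Architectural Style: Microkernel + Actor Model
+
+The framework is built on a **microkernel architecture**: every feature (Skills, Tools, the LLM driver) registers with the core as a **plugin**.
+
+Each **Agent instance is an independent Actor** that communicates over a **Message Bus**. This maps directly onto Go's goroutine + channel concurrency model.
+
+**Why this combination?**
+- The microkernel keeps the core minimal, so Skills and Tools can be hot-plugged
+- The Actor model supports high concurrency naturally and avoids shared state
+- Together they are lightweight, highly extensible, and idiomatic in Go
+
+**Rejected alternatives:**
+- Docker sandbox: too heavy, and it violates the minimal-dependency principle
+- SQLite storage: adds a dependency, and JSONL is sufficient
+- Central coordinator: a single bottleneck, and less flexible than the Actor model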
+
+## Architecture
+
+### Overall Architecture Diagram
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                        CLI / API Layer                       │
+└─────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────┐
+│                  Core Kernel (microkernel)                  │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
+│  │  Message Bus │  │  Plugin Reg  │  │  Session Manager │  │
+│  │  (channel)   │  │  (registry)  │  │   (JSONL-based)  │  │
+│  └──────────────┘  └──────────────┘  └──────────────────┘  │
+└─────────────────────────────────────────────────────────────┘
+                              │
+              ┌───────────────┼───────────────┐
+              ▼               ▼               ▼
+┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
+│   Agent Actor   │ │   Agent Actor   │ │   Agent Actor   │
+│  (Specialist 1) │ │  (Specialist 2) │ │  (Orchestrator) │
+└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
+         │                   │                   │
+         └───────────────────┼───────────────────┘
+                             ▼
+┌─────────────────────────────────────────────────────────────┐
+│                      Plugin Layer                            │
+│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐│
+│  │  Skills  │  │  Tools   │  │  Ollama  │  │ Custom Tools ││
+│  │(Skill Mgr)│  │(Tool Mgr)│  │ (Driver) │  │  (Registry)  ││
+│  └──────────┘  └──────────┘  └──────────┘  └──────────────┘│
+└─────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────┐
+│                      Sandbox Layer                           │
+│         (Process-level isolation + Resource limits)          │
+└─────────────────────────────────────────────────────────────┘
+```
+
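+### Module Responsibilities
+
+| Module | Responsibility |
+|------|------|
+| **Core Kernel** | Message routing, plugin lifecycle management, session coordination |
+| **Message Bus** | Asynchronous message passing built on Go channels |
+| **Plugin Registry** | Unified registry for Skills, Tools, and LLM drivers |
+| **Session Manager** | Reads and writes JSONL session history and manages the context window |
+| **Agent Actor** | An independent goroutine that holds state and sends/receives messages |
+| **Skill Manager** | Scans `~/.agents/skills/`, parses SKILL.md, and loads Skills |
+| **Tool Manager** | Registers and invokes built-in and custom tools |
+| **Ollama Driver** | Wraps the Ollama HTTP API, with streaming responses |
+| **Sandbox** | Safely runs shell commands and scripts with resource and time limits |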
+
+## Components
+
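+### 1. Core Kernel (Microkernel)
+
+**Responsibility:** the minimal core of the framework, responsible only for message routing and the plugin lifecycle.
+
+**Design notes:**
+- Define plugin contracts with Go `interface` types or generics
+- Load all registered plugins at startup
+- Provide an event bus for communication between plugins
+- Contain **no** business logic (such as LLM calls or command execution)
+
+**Core interfaces:**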
+```go
+// Plugin is the contract every plugin must implement.
+type Plugin interface {
+    Name() string
+    Init(kernel *Kernel) error
+    Shutdown() error
+}
+
+// MessageBus routes messages between plugins and Agents by topic.
+type MessageBus interface {
+    Publish(topic string, msg Message) error
+    Subscribe(topic string, handler Handler) (Subscription, error)
+}
+```
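To make the `MessageBus` contract more concrete, here is a minimal in-memory sketch built on channels. It is an illustration under assumptions, not the framework's implementation: the `Bus` type, the simplified `Message` and `Handler` types, the buffer size of 16, and the `Close` method are all hypothetical, and `Subscribe` omits the `Subscription` return value for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a simplified stand-in for the framework's message type.
type Message struct {
	From    string
	Content string
}

// Handler processes one message delivered on a topic.
type Handler func(Message)

// Bus is a hypothetical in-memory message bus: one buffered channel and
// one delivery goroutine per subscription.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]chan Message
	wg   sync.WaitGroup
}

func NewBus() *Bus {
	return &Bus{subs: make(map[string][]chan Message)}
}

// Subscribe registers handler for topic and starts its delivery goroutine.
func (b *Bus) Subscribe(topic string, handler Handler) {
	ch := make(chan Message, 16)
	b.mu.Lock()
	b.subs[topic] = append(b.subs[topic], ch)
	b.mu.Unlock()

	b.wg.Add(1)
	go func() {
		defer b.wg.Done()
		for msg := range ch {
			handler(msg)
		}
	}()
}

// Publish sends msg to every subscriber of topic. It returns an error when
// nobody is subscribed, and blocks while a subscriber's buffer is full.
func (b *Bus) Publish(topic string, msg Message) error {
	b.mu.RLock()
	defer b.mu.RUnlock()
	chans := b.subs[topic]
	if len(chans) == 0 {
		return fmt.Errorf("no subscribers for topic %q", topic)
	}
	for _, ch := range chans {
		ch <- msg
	}
	return nil
}

// Close closes every subscription channel and waits until the delivery
// goroutines have drained their pending messages.
func (b *Bus) Close() {
	b.mu.Lock()
	for _, chans := range b.subs {
		for _, ch := range chans {
			close(ch)
		}
	}
	b.subs = make(map[string][]chan Message)
	b.mu.Unlock()
	b.wg.Wait()
}

func main() {
	bus := NewBus()
	bus.Subscribe("task.request", func(m Message) {
		fmt.Printf("%s says: %s\n", m.From, m.Content)
	})
	if err := bus.Publish("task.request", Message{From: "orchestrator", Content: "summarize"}); err != nil {
		fmt.Println("publish failed:", err)
	}
	bus.Close()
}
```

Giving each subscriber its own channel and goroutine keeps a slow handler from stalling the others until its buffer fills. A real bus would also need to decide what to do at that point (block, drop, or report an error).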
+
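+### 2. Actor System (Multi-Agent Engine)
+
+**Responsibility:** manage Agent lifecycles and message passing.
+
+**Design notes:**
+- Each Agent is an independent goroutine that receives messages over a channel
+- Each Agent holds its own state (role, context, tool list)
+- Three Agent types are supported: Orchestrator, Worker, and Specialist
+- Message types: `TaskRequest`, `TaskResponse`, `ToolCall`, `Observation`
+
+**Agent state machine:**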
+```
+Idle → Processing → [ToolCall] → WaitingForTool → Processing → Completed
+                    ↓
+              [Error] → Failed
+```
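The state machine above can be encoded as a transition table. The sketch below is one possible reading of the diagram: the event names (`EvTaskRequest`, `EvDone`, and so on) are hypothetical, and it assumes that an error can move an Agent to `Failed` from either `Processing` or `WaitingForTool`.

```go
package main

import "fmt"

// State is one node of the Agent state machine.
type State string

const (
	Idle           State = "Idle"
	Processing     State = "Processing"
	WaitingForTool State = "WaitingForTool"
	Completed      State = "Completed"
	Failed         State = "Failed"
)

// Event is something that happens to an Agent (names are illustrative).
type Event string

const (
	EvTaskRequest Event = "TaskRequest"
	EvToolCall    Event = "ToolCall"
	EvObservation Event = "Observation"
	EvDone        Event = "Done"
	EvError       Event = "Error"
)

// transitions lists every allowed move; Completed and Failed are terminal.
var transitions = map[State]map[Event]State{
	Idle:           {EvTaskRequest: Processing},
	Processing:     {EvToolCall: WaitingForTool, EvDone: Completed, EvError: Failed},
	WaitingForTool: {EvObservation: Processing, EvError: Failed},
}

// Next returns the state reached from s on event e, or an error if the
// table has no such transition.
func Next(s State, e Event) (State, error) {
	if next, ok := transitions[s][e]; ok {
		return next, nil
	}
	return s, fmt.Errorf("invalid transition: %s on %s", s, e)
}

func main() {
	s := Idle
	for _, e := range []Event{EvTaskRequest, EvToolCall, EvObservation, EvDone} {
		next, err := Next(s, e)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("%s --%s--> %s\n", s, e, next)
		s = next
	}
}
```

Keeping the transitions in data rather than in nested `switch` statements makes it easy to unit-test that illegal moves, such as a tool call after completion, are rejected.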
+
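+### 3. Session Manager (Session Memory)
+
+**Responsibility:** persist session history and manage the context window.
+
+**Design notes:**
+- One JSONL file per session: `~/.orca/sessions/{session_id}.jsonl`
+- One JSON object per line: `{role, content, timestamp, metadata}`
+- `GetContext(windowSize)` returns the most recent `windowSize` messages
+- Sessions can be listed, searched, and archived
+
+**Why JSON Lines?**
+- Appends are O(1) and do not require loading the whole file
+- Human-readable, which makes debugging easy
+- Zero dependencies, with no database driver required
+- A simple file lock provides concurrency safety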
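As a rough illustration of the JSONL layout, the sketch below appends entries and reads back the last few. The `Entry` type, the `Append` function, the standalone `GetContext(path, windowSize)` signature, and the `demo` helper are assumptions made for this example. File locking is left out, and this `GetContext` scans the whole file, so only the append path is O(1).

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// Entry mirrors the {role, content, timestamp, metadata} line format.
type Entry struct {
	Role      string            `json:"role"`
	Content   string            `json:"content"`
	Timestamp time.Time         `json:"timestamp"`
	Metadata  map[string]string `json:"metadata,omitempty"`
}

// Append writes one entry as a single JSON line at the end of the file.
func Append(path string, e Entry) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	line, err := json.Marshal(e)
	if err != nil {
		return err
	}
	_, err = f.Write(append(line, '\n'))
	return err
}

// GetContext returns the last windowSize entries by scanning the file and
// keeping a sliding window of the most recent lines.
func GetContext(path string, windowSize int) ([]Entry, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var window []Entry
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // accept lines up to 1 MiB
	for sc.Scan() {
		var e Entry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			return nil, fmt.Errorf("bad session line: %w", err)
		}
		window = append(window, e)
		if len(window) > windowSize {
			window = window[1:]
		}
	}
	return window, sc.Err()
}

// demo writes three messages to a temporary session file and reads back
// the two most recent ones.
func demo() ([]Entry, error) {
	dir, err := os.MkdirTemp("", "orca-session")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "demo.jsonl")
	for i, text := range []string{"hello", "hi, how can I help?", "list files"} {
		role := "user"
		if i%2 == 1 {
			role = "assistant"
		}
		if err := Append(path, Entry{Role: role, Content: text, Timestamp: time.Now()}); err != nil {
			return nil, err
		}
	}
	return GetContext(path, 2)
}

func main() {
	entries, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("%s: %s\n", e.Role, e.Content)
	}
}
```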
+
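+### 4. Skill Manager (Skill System)
+
+**Responsibility:** discover and load Skills automatically.
+
+**Design notes:**
+- Scan every subdirectory of `~/.agents/skills/` at startup
+- Parse the `SKILL.md` in each Skill directory
+- Extract metadata: `name`, `description`, `triggers` (trigger words)
+- A Skill may include script files (in a `scripts/` directory)
+- `FindSkill(query string)` matches Skills by trigger word
+
+**Skill structure:**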
+```yaml
+name: "md2pdf"
+description: "Convert Markdown to PDF..."
+triggers: ["pdf", "markdown", "export"]
+scripts:
+  - "scripts/convert.py"
+  - "scripts/setup.sh"
+```
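Trigger-word matching could work roughly as follows. This sketch assumes case-insensitive substring matching and picks the Skill with the most matching triggers. The design only says "based on trigger words", so the scoring rule, the `Skill` struct, and the `git-helper` example are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Skill holds the metadata extracted from a SKILL.md file.
type Skill struct {
	Name        string
	Description string
	Triggers    []string
}

// FindSkill returns the skill whose triggers appear most often in query,
// using case-insensitive substring matching. The boolean is false when no
// trigger matches at all; ties go to the earlier skill in the slice.
func FindSkill(skills []Skill, query string) (Skill, bool) {
	q := strings.ToLower(query)
	var best Skill
	bestHits := 0
	for _, s := range skills {
		hits := 0
		for _, t := range s.Triggers {
			if t != "" && strings.Contains(q, strings.ToLower(t)) {
				hits++
			}
		}
		if hits > bestHits {
			best, bestHits = s, hits
		}
	}
	return best, bestHits > 0
}

func main() {
	skills := []Skill{
		{Name: "md2pdf", Description: "Convert Markdown to PDF", Triggers: []string{"pdf", "markdown", "export"}},
		{Name: "git-helper", Description: "Hypothetical second skill", Triggers: []string{"commit", "branch"}},
	}
	if s, ok := FindSkill(skills, "Export this Markdown file as a PDF"); ok {
		fmt.Println("matched skill:", s.Name)
	}
}
```

Substring matching is cheap but can misfire on short triggers, so a real implementation might tokenize the query or let the LLM choose among the candidate Skills.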
+
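+### 5. Tool Manager (Tool System)
+
+**Responsibility:** manage the registration and invocation of executable tools.
+
+**Design notes:**
+- **Built-in tools:** `exec` (run a command), `read_file`, `write_file`, `list_dir`
+- **Skill tools:** registered automatically from a Skill's `scripts/` directory
+- **Custom tools:** registered in code by implementing the `Tool` interface
+- Each tool defines a name, a description, a parameter schema, and an execute function
+- The LLM invokes tools through Function Calling
+
+**Tool interface:**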
+```go
+// Tool is the contract every built-in, Skill, or custom tool implements.
+type Tool interface {
+    Name() string
+    Description() string
+    Parameters() JSONSchema
+    Execute(ctx Context, args map[string]any) (Result, error)
+}
+```
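A custom tool and its registration might look like the sketch below. To stay self-contained it uses a trimmed-down `Tool` interface: `Parameters()` is omitted, `Execute` returns a plain string, and `context.Context` stands in for the design's `Context` type. `Registry`, `upperTool`, and `demo` are hypothetical names, and the registry is not safe for concurrent use.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Tool is a simplified version of the design's Tool interface.
type Tool interface {
	Name() string
	Description() string
	Execute(ctx context.Context, args map[string]any) (string, error)
}

// Registry maps tool names to implementations.
type Registry struct {
	tools map[string]Tool
}

func NewRegistry() *Registry {
	return &Registry{tools: make(map[string]Tool)}
}

// Register adds a tool and rejects duplicate names.
func (r *Registry) Register(t Tool) error {
	if _, exists := r.tools[t.Name()]; exists {
		return fmt.Errorf("tool %q already registered", t.Name())
	}
	r.tools[t.Name()] = t
	return nil
}

// Call looks up a tool by name and executes it with the given arguments.
func (r *Registry) Call(ctx context.Context, name string, args map[string]any) (string, error) {
	t, ok := r.tools[name]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", name)
	}
	return t.Execute(ctx, args)
}

// upperTool is an illustrative custom tool that upper-cases its input.
type upperTool struct{}

func (upperTool) Name() string        { return "upper" }
func (upperTool) Description() string { return "Upper-case the text argument." }

func (upperTool) Execute(_ context.Context, args map[string]any) (string, error) {
	text, ok := args["text"].(string)
	if !ok {
		return "", errors.New(`argument "text" must be a string`)
	}
	return strings.ToUpper(text), nil
}

// demo registers the tool and calls it the way a Function Calling result
// from the LLM would be dispatched.
func demo() (string, error) {
	reg := NewRegistry()
	if err := reg.Register(upperTool{}); err != nil {
		return "", err
	}
	return reg.Call(context.Background(), "upper", map[string]any{"text": "hello"})
}

func main() {
	out, err := demo()
	fmt.Println(out, err)
}
```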
+
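+### 6. Ollama Driver (LLM Driver)
+
+**Responsibility:** wrap the Ollama API behind a unified LLM call interface.
+
+**Design notes:**
+- Default model: `gemma4:e4b`
+- Supports streaming responses (Ollama's native API streams newline-delimited JSON; SSE applies to its OpenAI-compatible endpoint)
+- Supports Function Calling (through the `tools` parameter)
+- Truncates the context window automatically
+- Configurable parameters: temperature, top_p, max_tokens (`num_predict` in Ollama's native options)
+
+**API wrapper:**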
+```go
+// LLMClient is the unified interface that the Ollama driver implements.
+type LLMClient interface {
+    Chat(messages []Message, tools []Tool) (Response, error)
+    ChatStream(messages []Message, tools []Tool) (Stream, error)
+}
+```
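For orientation, a non-streaming call to Ollama's `/api/chat` endpoint could be wrapped roughly as shown here. The request and response structs cover only the `model`, `messages`, `stream`, `message`, and `done` fields; tools and options are omitted. `Chat`, `fakeOllama`, and `demo` are hypothetical names, and the fake transport exists only so that the sketch runs without an Ollama server.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// ChatMessage is one turn of the conversation.
type ChatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []ChatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

type chatResponse struct {
	Message ChatMessage `json:"message"`
	Done    bool        `json:"done"`
}

// Chat sends one non-streaming request to baseURL/api/chat and returns
// the assistant message from the reply.
func Chat(ctx context.Context, hc *http.Client, baseURL, model string, msgs []ChatMessage) (ChatMessage, error) {
	body, err := json.Marshal(chatRequest{Model: model, Messages: msgs, Stream: false})
	if err != nil {
		return ChatMessage{}, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/api/chat", bytes.NewReader(body))
	if err != nil {
		return ChatMessage{}, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := hc.Do(req)
	if err != nil {
		return ChatMessage{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return ChatMessage{}, fmt.Errorf("ollama returned %s", resp.Status)
	}
	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return ChatMessage{}, err
	}
	return out.Message, nil
}

// fakeOllama answers requests in-process by echoing the last message.
type fakeOllama struct{}

func (fakeOllama) RoundTrip(r *http.Request) (*http.Response, error) {
	var in chatRequest
	if err := json.NewDecoder(r.Body).Decode(&in); err != nil || len(in.Messages) == 0 || r.URL.Path != "/api/chat" {
		return nil, fmt.Errorf("fake: bad request")
	}
	last := in.Messages[len(in.Messages)-1]
	body, _ := json.Marshal(chatResponse{
		Message: ChatMessage{Role: "assistant", Content: "echo: " + last.Content},
		Done:    true,
	})
	return &http.Response{
		StatusCode: http.StatusOK,
		Status:     "200 OK",
		Header:     make(http.Header),
		Body:       io.NopCloser(bytes.NewReader(body)),
		Request:    r,
	}, nil
}

// demo runs Chat against the fake transport instead of a real server.
func demo() (ChatMessage, error) {
	hc := &http.Client{Transport: fakeOllama{}}
	return Chat(context.Background(), hc, "http://localhost:11434", "gemma4:e4b",
		[]ChatMessage{{Role: "user", Content: "ping"}})
}

func main() {
	reply, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %s\n", reply.Role, reply.Content)
}
```

Passing the `*http.Client` in as a parameter is also what makes the mock-based tests described in the Testing Strategy section straightforward.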
+
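+### 7. Sandbox (Sandboxed Execution)
+
+**Responsibility:** safely run terminal commands and scripts.
+
+**Design notes:**
+- Create child processes with `os/exec`
+- Resource limits: CPU time, memory, output size
+- Timeout control: 30 seconds by default, configurable
+- Working-directory restriction: optional chroot (which requires root privileges) or a designated working directory
+- Environment isolation: only whitelisted environment variables are passed through
+- Docker is **not** used, to keep the framework lightweight
+
+**Security policy:**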
+```yaml
+sandbox:
+  timeout: 30s
+  max_memory: 512MB
+  max_output: 64KB
+  allowed_env: [PATH, HOME, USER]
+  working_dir: /tmp/orca-sandbox
+  read_only_dirs: []
+  blocked_commands: [rm -rf /, mkfs, dd]
+```
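Three of the policy items above (timeout, output cap, environment whitelist) can be sketched with the standard library alone. The names `Run`, `cappedBuffer`, and `filterEnv` are hypothetical. Memory and CPU limits, chroot, and command blocking are not shown, and the `echo` call in `main` assumes a Unix-like system.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// cappedBuffer keeps at most max bytes of output and drops the rest.
type cappedBuffer struct {
	buf bytes.Buffer
	max int
}

func (c *cappedBuffer) Write(p []byte) (int, error) {
	if room := c.max - c.buf.Len(); room > 0 {
		if len(p) > room {
			c.buf.Write(p[:room])
		} else {
			c.buf.Write(p)
		}
	}
	// Report the full length so the child's output is not treated as a
	// short write once the cap is reached.
	return len(p), nil
}

// filterEnv keeps only KEY=VALUE pairs whose key is whitelisted. It always
// returns a non-nil slice, because a nil Env would make exec.Cmd inherit
// the parent's whole environment.
func filterEnv(environ, allowed []string) []string {
	keep := make(map[string]bool, len(allowed))
	for _, k := range allowed {
		keep[k] = true
	}
	out := make([]string, 0, len(allowed))
	for _, kv := range environ {
		if key, _, ok := strings.Cut(kv, "="); ok && keep[key] {
			out = append(out, kv)
		}
	}
	return out
}

// Run executes a command in dir with a timeout, a whitelisted environment,
// and capped combined stdout/stderr.
func Run(timeout time.Duration, maxOutput int, dir, name string, args ...string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Dir = dir
	cmd.Env = filterEnv(os.Environ(), []string{"PATH", "HOME", "USER"})
	out := &cappedBuffer{max: maxOutput}
	cmd.Stdout = out
	cmd.Stderr = out

	err := cmd.Run()
	if ctx.Err() == context.DeadlineExceeded {
		return out.buf.String(), fmt.Errorf("command timed out after %s", timeout)
	}
	return out.buf.String(), err
}

func main() {
	out, err := Run(2*time.Second, 64*1024, os.TempDir(), "echo", "hello")
	fmt.Printf("output=%q err=%v\n", out, err)
}
```

`exec.CommandContext` kills the child when the deadline passes, which covers the "forcibly terminate" behavior described under Error Handling. Enforcing `max_memory` would need OS-level mechanisms such as rlimits or cgroups, which are outside this sketch.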
+
+## Data Flow
+
+### Typical Interaction Flow
+
+```
+   User input
+        │
+        ▼
+┌───────────────┐
+│    CLI/API    │
+└───────┬───────┘
+        │
+        ▼
+┌───────────────┐     ┌──────────────────────┐
+│  Session Mgr  │────▶│ Load history context │
+└───────┬───────┘     └──────────────────────┘
+        │
+        ▼
+┌───────────────┐
+│ Orchestrator  │ (Agent Actor)
+│     Agent     │
+└───────┬───────┘
+        │
+        ▼
+┌───────────────┐     ┌───────────────────────┐
+│   Skill Mgr   │────▶│ Match relevant Skills │
+└───────┬───────┘     └───────────────────────┘
+        │
+        ▼
+┌───────────────┐     ┌─────────────┐
+│ Ollama Driver │────▶│ Send prompt │
+└───────┬───────┘     └─────────────┘
+        │
+        ▼
+┌───────────────┐
+│ LLM Response  │
+│   (Function   │
+│    Calling)   │
+└───────┬───────┘
+        │
+        ▼
+┌───────────────┐     ┌───────────────────┐
+│   Tool Call   │────▶│ Run Tool /        │
+│               │     │ sandboxed command │
+└───────┬───────┘     └───────────────────┘
+        │
+        ▼
+┌───────────────┐
+│  Observation  │ (tool execution result)
+└───────┬───────┘
+        │
+        ▼
+┌───────────────┐     ┌────────────────────────────┐
+│ Orchestrator  │────▶│ Decide: continue or finish │
+└───────┬───────┘     └────────────────────────────┘
+        │
+        ▼
+┌───────────────┐
+│ Save session  │
+│ Return result │
+└───────────────┘
+```
+
+### Message Type Definitions
+
+```go
+type Message struct {
+    ID        string
+    Type      MessageType // TaskRequest, TaskResponse, ToolCall, Observation, Error
+    From      string      // Agent ID
+    To        string      // Agent ID or "broadcast"
+    Content   interface{} // payload; its concrete type depends on Type
+    Timestamp time.Time
+}
+
+type TaskRequest struct {
+    Query     string
+    SessionID string
+    Context   []ChatMessage
+}
+
+type ToolCall struct {
+    ToolName  string
+    Arguments map[string]interface{}
+}
+
+type Observation struct {
+    ToolCallID string
+    Output     string
+    Error      string
+}
+```
+
+## Error Handling
+
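+### Strategy
+
+1. **Layered error handling:**
+   - **Kernel layer:** plugin fails to load → log it, skip that plugin, and continue startup
+   - **Agent layer:** task fails → return an error message and let the Orchestrator decide whether to retry or abort
+   - **Tool layer:** tool fails → return a structured error that the LLM can use to adjust its strategy
+   - **Sandbox layer:** command times out or exceeds its memory limit → kill the process and return an error
+
+2. **Retry policy:**
+   - LLM API calls: retry 3 times with exponential backoff
+   - Tool execution: no automatic retry (to avoid loops); the LLM decides
+
+3. **Graceful degradation:**
+   - Ollama unavailable → prompt the user to check the service
+   - Skill fails to parse → skip that Skill without affecting the others
+   - Sandbox execution fails → return the error so the LLM can try another tool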
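The retry policy for LLM API calls could be implemented along these lines. The sketch reads "3 times" as three attempts in total and doubles the delay after each failure. The `Retry` signature, the base delay, and the `demo` helper are assumptions made for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Retry calls fn up to attempts times. After each failure except the last
// it waits, doubling the delay every time, and stops early if ctx is
// cancelled.
func Retry(ctx context.Context, attempts int, base time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point in sleeping after the final attempt
		}
		select {
		case <-time.After(delay):
			delay *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// demo simulates an LLM call that fails the first `failures` times and
// reports how many calls were made in total.
func demo(failures int) (calls int, err error) {
	err = Retry(context.Background(), 3, time.Millisecond, func() error {
		calls++
		if calls <= failures {
			return errors.New("temporary failure")
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := demo(2)
	fmt.Printf("calls=%d err=%v\n", calls, err)
}
```

Wrapping the last error with `%w` lets callers still inspect the underlying cause with `errors.Is` or `errors.As` after the retries are exhausted.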
+
+### Error Types
+
+```go
+type ErrorCategory int
+
+const (
+    ErrCategoryKernel    ErrorCategory = iota // kernel error
+    ErrCategoryAgent                          // Agent error
+    ErrCategoryTool                           // tool error
+    ErrCategorySandbox                        // sandbox error
+    ErrCategoryLLM                            // LLM error
+    ErrCategoryNetwork                        // network error
+)
+```
+
+## Testing Strategy
+
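+### Test Pyramid
+
+1. **Unit tests (60%):**
+   - `Kernel`: plugin registration/unregistration, message routing
+   - `SessionManager`: JSONL reads and writes, context-window truncation
+   - `SkillManager`: Skill parsing, trigger-word matching
+   - `Sandbox`: resource limits, timeout control
+   - `OllamaDriver`: HTTP request wrapping (against a mock server)
+
+2. **Integration tests (30%):**
+   - Agent + Tool: end-to-end task execution
+   - Agent + LLM: exercise the Function Calling flow with a mock LLM
+   - Skill + Sandbox: load a Skill and run its scripts
+
+3. **E2E tests (10%):**
+   - Full CLI workflow
+   - Multi-Agent collaboration scenarios
+
+### Mock Strategy
+
+- `LLMClient`: defined as an interface so tests can inject a mock
+- `Sandbox`: offers a `DryRun` mode that records commands without running them
+- `MessageBus`: an in-memory implementation for tests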
+
+## Open Questions
+
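+1. **Skill execution:** should Skill scripts be invoked through a shell or executed directly in Go? The current design leans toward shell invocation (through the Sandbox), but Python/Node scripts need their respective runtimes.
+   - **Assumption:** the user's environment already has the required runtimes (Python, Node, etc.); the Sandbox is responsible only for safe execution.
+
+2. **Function Calling format:** how well does gemma4:e4b support Function Calling?
+   - **Assumption:** use Ollama's `tools` parameter format, and fall back to prompt-based tool calling if it is not supported.
+
+3. **Multi-Agent collaboration granularity:** do Agents collaborate as peers or in a hierarchy?
+   - **Assumption:** both modes are supported, hierarchical (Orchestrator + Workers) and peer-to-peer, selected by user configuration.
+
+4. **Session sharing:** can multiple Agents share one session context?
+   - **Assumption:** yes. The Session Manager uses a file lock to allow concurrent reads, with only one Agent writing at a time.
+
+5. **Tool parameter schema:** full JSON Schema or a simplified format?
+   - **Assumption:** a simplified JSON Schema (string/number/boolean/array/object plus description).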