Building a Local, Offline RAG Knowledge Base with Go + Ollama - 平芜编程栈

# Building a Local, Offline RAG Knowledge Base with Go + Ollama

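## Background

In LLM application development, RAG (Retrieval-Augmented Generation) has become the most widely used approach to knowledge augmentation. Calling cloud APIs, however, raises cost, privacy, and stability concerns. This article shows how to build a RAG knowledge-base Q&A system that runs entirely offline, using Go and local models served by Ollama.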

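## Project Overview

The project is implemented in Go. Core features:

- ✅ **Fully offline** - Q&A keeps working with the network disconnected
- ✅ **Local model support** - Ollama, OpenAI, and DeepSeek backends
- ✅ **Automatic document chunking** - Markdown is split by heading level
- ✅ **Multi-turn conversations** - session-based context memory
- ✅ **Configuration-driven** - a single config file switches every parameter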

## Architecture

```
┌─────────────────────────────────────────────┐
│                  HTTP API                   │
│               (Gin framework)               │
└─────────────────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        ▼             ▼             ▼
 Document service  RAG service  Session manager
        │             │             │
        ▼             ▼             ▼
  Text chunking  Vector search    SQLite
   (Markdown)   (Memory/Milvus) (history store)
        │             │
        ▼             ▼
┌──────────────┐ ┌──────────────┐
│  Embedding   │ │     LLM      │
│   (Ollama)   │ │   (Ollama)   │
│ nomic-embed  │ │  qwen2.5:7b  │
│    -text     │ │              │
└──────────────┘ └──────────────┘
```

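## Core Modules

### 1. Configuration Management

All parameters live in a TOML config file, so switching the model backend is a one-line change:

```toml
# config.toml
[vector]
type = "memory"            # can be switched to "milvus"
dim = 768                  # nomic-embed-text vector dimension

[embedding]
provider = "ollama"        # can be switched to "openai" or "deepseek"
model = "nomic-embed-text"

[llm]
provider = "ollama"
model = "qwen2.5:7b"

[rag]
topk = 5                   # number of chunks to retrieve
min_score = 0.6            # minimum similarity threshold
```

Core of the config parsing code: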

```go
func LoadConfig(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}

	// Defaults, overridden by whatever the file sets.
	cfg := &Config{
		Server:    ServerConfig{Host: "localhost", Port: 8080},
		Embedding: EmbeddingConfig{Provider: "ollama", Model: "nomic-embed-text"},
		LLM:       LLMConfig{Provider: "ollama", Model: "qwen2.5:7b"},
	}

	// Parse TOML sections line by line.
	currentSection := ""
	for _, line := range strings.Split(string(data), "\n") {
		// Trim first so indentation and Windows "\r" line endings
		// don't defeat the prefix/suffix checks below.
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			currentSection = line[1 : len(line)-1]
			continue
		}
		// ... parse key=value under currentSection
	}
	return cfg, nil
}
```
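The elided key=value step might look something like the sketch below. It is an illustration, not the project's actual parser: `parseSections` and its map-of-maps return type are hypothetical, it covers only the flat `key = value` subset of TOML used in this config, and its comment stripping assumes values never contain a `#`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSections is a hypothetical helper: it reads the flat subset of TOML
// used in config.toml into section -> key -> value. Not a full TOML parser.
func parseSections(data string) map[string]map[string]string {
	out := map[string]map[string]string{}
	section := ""
	for _, line := range strings.Split(data, "\n") {
		// Naive comment stripping: assumes no '#' inside values.
		if i := strings.Index(line, "#"); i != -1 {
			line = line[:i]
		}
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			section = strings.TrimSpace(line[1 : len(line)-1])
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a key=value line; skip it
		}
		if out[section] == nil {
			out[section] = map[string]string{}
		}
		// Drop surrounding double quotes from string values.
		out[section][strings.TrimSpace(key)] = strings.Trim(strings.TrimSpace(value), `"`)
	}
	return out
}

func main() {
	cfg := parseSections("[llm]\nprovider = \"ollama\" # local backend\n\n[rag]\ntopk = 5\n")
	fmt.Println(cfg["llm"]["provider"], cfg["rag"]["topk"]) // prints "ollama 5"
}
```

Values come back as strings; converting `topk` or `min_score` to numbers (for example with `strconv`) would happen when they are copied into the typed `Config` struct.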

### 2. Local Embeddings

Ollama ships a native embedding API, so no OpenAI key is needed:

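```go
func (c *Client) ollamaEmbeddings(ctx context.Context, texts []string) ([][]float32, error) {
	baseURL := "http://localhost:11434"

	// /api/embed accepts a single string or an array of strings as "input",
	// so the whole batch goes out in one request.
	reqBytes, err := json.Marshal(map[string]interface{}{
		"model": c.config.Model, // nomic-embed-text
		"input": texts,
	})
	if err != nil {
		return nil, err
	}

	// Build the request with ctx so callers can cancel or time out.
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/api/embed", bytes.NewReader(reqBytes))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama request failed: %d", resp.StatusCode)
	}

	var result struct {
		Embeddings [][]float32 `json:"embeddings"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, err
	}
	// Guard against a short or empty response instead of indexing blindly.
	if len(result.Embeddings) != len(texts) {
		return nil, fmt.Errorf("expected %d embeddings, got %d", len(texts), len(result.Embeddings))
	}
	return result.Embeddings, nil
}
```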

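**Key points**:

- Uses Ollama's newer `/api/embed` endpoint
- nomic-embed-text produces 768-dimensional vectors
- `/api/embed` accepts an array of inputs, so texts can be embedded in batches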
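Once chunks are embedded, the in-memory store has to rank them against the query vector using the `topk` and `min_score` settings from the config. The sketch below shows one way to do that, assuming cosine similarity (the article does not say which metric the project uses). `Hit`, `cosine`, and `search` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Hit is a hypothetical result type: the index of a stored vector and its score.
type Hit struct {
	Index int
	Score float64
}

// cosine returns the cosine similarity of a and b. It returns 0 when the
// lengths differ, the vectors are empty, or either has zero norm.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// search scores every stored vector against the query, drops hits below
// minScore, and returns at most topK hits, best first.
func search(query []float32, store [][]float32, topK int, minScore float64) []Hit {
	hits := make([]Hit, 0, len(store))
	for i, v := range store {
		if s := cosine(query, v); s >= minScore {
			hits = append(hits, Hit{Index: i, Score: s})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if len(hits) > topK {
		hits = hits[:topK]
	}
	return hits
}

func main() {
	store := [][]float32{{1, 0}, {0, 1}, {1, 1}}
	for _, h := range search([]float32{1, 0}, store, 5, 0.6) {
		fmt.Printf("chunk %d score %.2f\n", h.Index, h.Score)
	}
}
```

In the toy run, the vector orthogonal to the query scores 0 and is filtered out by the 0.6 threshold. A brute-force scan like this is linear in the number of chunks, which is fine for a small in-memory store and is the part a vector database such as Milvus would replace.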

### 3. Unified LLM Client Interface

A single `Client` interface makes it possible to switch between backends:

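```go
type Client interface {
	Chat(ctx context.Context, messages []Message) (string, error)
	ChatStream(ctx context.Context, messages []Message, callback func(string)) error
	GetModel() string
}

// Ollama implementation
type OllamaClient struct {
	model   string
	baseURL string
}

func (c *OllamaClient) Chat(ctx context.Context, messages []Message) (string, error) {
	reqBytes, err := json.Marshal(map[string]interface{}{
		"model":    c.model,
		"messages": messages,
		"stream":   false,
	})
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.baseURL+"/api/chat", bytes.NewReader(reqBytes))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("ollama chat request failed: %d", resp.StatusCode)
	}
	// With "stream": false the reply arrives as a single JSON object.
	var result struct {
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return "", err
	}
	return result.Message.Content, nil
}

// OpenAI/DeepSeek-compatible implementation
type OpenAIClient struct {
	model   string
	apiKey  string
	baseURL string
}
```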
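`ChatStream` appears in the interface but its body is not shown. With `"stream": true`, Ollama's `/api/chat` replies with newline-delimited JSON objects, each carrying a fragment in `message.content` and a final one with `"done": true`. The sketch below covers only the parsing half, under that assumption. `streamChunks` and `collect` are hypothetical helpers, not the project's code.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// streamChunks is a hypothetical helper: it reads newline-delimited JSON
// chat chunks from r and passes each content fragment to callback.
func streamChunks(r io.Reader, callback func(string)) error {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		var chunk struct {
			Message struct {
				Content string `json:"content"`
			} `json:"message"`
			Done bool `json:"done"`
		}
		if err := json.Unmarshal([]byte(line), &chunk); err != nil {
			return fmt.Errorf("decode stream chunk: %w", err)
		}
		if chunk.Message.Content != "" {
			callback(chunk.Message.Content)
		}
		if chunk.Done {
			return nil // final chunk: stop reading
		}
	}
	return scanner.Err()
}

// collect runs streamChunks over an in-memory body and joins the fragments.
func collect(body string) (string, error) {
	var sb strings.Builder
	err := streamChunks(strings.NewReader(body), func(s string) { sb.WriteString(s) })
	return sb.String(), err
}

func main() {
	body := "{\"message\":{\"content\":\"Hel\"},\"done\":false}\n" +
		"{\"message\":{\"content\":\"lo\"},\"done\":true}\n"
	text, err := collect(body)
	fmt.Println(text, err) // prints "Hello <nil>"
}
```

In a real `ChatStream`, `r` would be the HTTP response body, and the callback could forward each fragment to the caller, for example over SSE.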

### 4. Markdown Document Splitter

Documents are split at headings so that each chunk keeps its place in the document structure:

```go
type MarkdownSplitter struct{}

// Compiled once at package level instead of on every Split call.
var headingRegex = regexp.MustCompile(`^(#{1,6})\s+(.+)`)

func (s *MarkdownSplitter) Split(text string) []Chunk {
	chunks := make([]Chunk, 0)
	var currentContent strings.Builder
	var currentHeading string

	// flush saves the section collected so far as one chunk.
	flush := func() {
		if currentContent.Len() > 0 {
			chunks = append(chunks, Chunk{
				Content: currentContent.String(),
				Metadata: map[string]interface{}{
					"heading": currentHeading,
				},
			})
		}
		currentContent.Reset()
	}

	for _, line := range strings.Split(text, "\n") {
		if match := headingRegex.FindStringSubmatch(line); match != nil {
			// New heading: save the previous chunk and start a new one.
			flush()
			currentHeading = match[2]
		} else {
			currentContent.WriteString(line + "\n")
		}
	}
	flush() // the last section has no following heading to trigger a save
	return chunks
}
```
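The splitter above records only the nearest heading, and it would treat a `#` comment inside a fenced code block as a heading. The sketch below is one possible extension, not the project's code: it tracks the full heading path (for example `Guide > Install`) and ignores headings inside code fences. `Section`, `splitWithPath`, and `fence` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var headingRe = regexp.MustCompile(`^(#{1,6})\s+(.+)`)

// fence is the three-backtick code fence marker, built with Repeat so this
// listing does not itself contain a literal fence.
var fence = strings.Repeat("`", 3)

// Section is a hypothetical chunk type: body text plus its full heading path.
type Section struct {
	Path    string
	Content string
}

// splitWithPath splits Markdown at headings, records each chunk's ancestor
// headings, and skips heading detection inside fenced code blocks.
func splitWithPath(text string) []Section {
	var sections []Section
	var body strings.Builder
	path := make([]string, 6) // path[i] is the current heading at level i+1
	depth := 0
	inFence := false

	flush := func() {
		if content := strings.TrimSpace(body.String()); content != "" {
			parts := make([]string, 0, depth)
			for _, h := range path[:depth] {
				if h != "" { // skipped levels leave empty slots
					parts = append(parts, h)
				}
			}
			sections = append(sections, Section{Path: strings.Join(parts, " > "), Content: content})
		}
		body.Reset()
	}

	for _, line := range strings.Split(text, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			inFence = !inFence
		}
		if !inFence {
			if m := headingRe.FindStringSubmatch(line); m != nil {
				flush()
				depth = len(m[1])
				path[depth-1] = strings.TrimSpace(m[2])
				for i := depth; i < len(path); i++ {
					path[i] = "" // deeper headings no longer apply
				}
				continue
			}
		}
		body.WriteString(line + "\n")
	}
	flush() // keep the final section
	return sections
}

func main() {
	doc := "# Guide\nintro\n## Install\n" + fence + "\n# not a heading\n" + fence + "\n"
	for _, s := range splitWithPath(doc) {
		fmt.Printf("%q: %q\n", s.Path, s.Content)
	}
}
```

Storing the whole path in chunk metadata gives the retriever and the prompt a bit more context than a bare subsection title such as "Install".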

### 5. Multi-turn Conversation Context

Message history is keyed by session ID and stored in SQLite:

```go
func (s *RAGService) Ask(req *AskRequest) (*AskResponse, error) {
	// Ask takes no context parameter, so start from a background context.
	ctx := context.Background()

	// 1. Get or create the session
	sessionID := req.SessionID
	if sessionID == "" {
		sessionID = fmt.Sprintf("session_%d", time.Now().UnixNano())
	}

	// 2. Load history, keeping only the 10 most recent messages
	history, err := s.repo.GetChatHistory(sessionID)
	if err != nil {
		return nil, fmt.Errorf("load chat history: %w", err)
	}
	messages := make([]llm.Message, 0, 11)
	startIdx := 0
	if len(history) > 10 {
		startIdx = len(history) - 10
	}
	for _, msg := range history[startIdx:] {
		messages = append(messages, llm.Message{
			Role:    msg.Role,
			Content: msg.Content,
		})
	}

	// 3. Append the current question
	messages = append(messages, llm.Message{
		Role:    "user",
		Content: req.Question,
	})

	// 4. Call the LLM; on failure, return before anything is stored
	answer, err := s.llmClient.Chat(ctx, messages)
	if err != nil {
		return nil, fmt.Errorf("llm chat: %w", err)
	}

	// 5. Save both turns of the exchange
	s.repo.CreateMessage(&ChatMessage{
		SessionID: sessionID,
		Role:      "user",
		Content:   req.Question,
	})
	s.repo.CreateMessage(&ChatMessage{
		SessionID: sessionID,
		Role:      "assistant",
		Content:   answer,
	})

	return &AskResponse{
		SessionID: sessionID,
		Answer:    answer,
	}, nil
}
```
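The listing above focuses on session handling and does not show where retrieved chunks enter the prompt. One common arrangement, sketched below as an assumption rather than the project's actual approach, is to put the retrieved text in a system message ahead of the trimmed history. `buildMessages` and the prompt wording are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Message mirrors the role/content pairs used in the chat code.
type Message struct {
	Role    string
	Content string
}

// buildMessages is a hypothetical helper: a system message carrying the
// retrieved chunks, then the last maxHistory messages, then the new question.
func buildMessages(history []Message, chunks []string, question string, maxHistory int) []Message {
	if maxHistory < 0 {
		maxHistory = 0
	}
	msgs := make([]Message, 0, len(history)+2)
	if len(chunks) > 0 {
		var sb strings.Builder
		sb.WriteString("Answer using only the context below. If it is not enough, say so.\n\n")
		for i, c := range chunks {
			fmt.Fprintf(&sb, "[%d] %s\n", i+1, c)
		}
		msgs = append(msgs, Message{Role: "system", Content: sb.String()})
	}
	// Keep only the most recent turns so the prompt stays bounded.
	if len(history) > maxHistory {
		history = history[len(history)-maxHistory:]
	}
	msgs = append(msgs, history...)
	return append(msgs, Message{Role: "user", Content: question})
}

func main() {
	history := []Message{
		{Role: "user", Content: "What is Go?"},
		{Role: "assistant", Content: "A compiled language."},
	}
	msgs := buildMessages(history, []string{"Go has goroutines."}, "Tell me about concurrency", 10)
	for _, m := range msgs {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```

Numbering the chunks makes it easier to ask the model to cite which piece of context an answer came from.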

## Deployment and Testing

### Prerequisites

```bash
# 1. Install Ollama
winget install Ollama.Ollama

# 2. Pull the models
ollama pull nomic-embed-text   # embedding model (274MB)
ollama pull qwen2.5:7b         # LLM (4.7GB)
```

### Starting the Service

```bash
cd cmd/rag
go build -o rag-server.exe .
./rag-server.exe
```

### Test Workflow

```bash
# 1. Health check
curl http://localhost:8080/api/v1/health

# 2. Upload a document
curl -X POST http://localhost:8080/api/v1/documents/upload \
  -F "file=@test.md"

# 3. Single-turn question
curl -X POST http://localhost:8080/api/v1/chat/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the main features of Go?"}'

# 4. Follow-up question (use the session_id returned above)
curl -X POST http://localhost:8080/api/v1/chat/ask \
  -H "Content-Type: application/json" \
  -d '{"session_id": "session_xxx", "question": "Tell me more about concurrency"}'
```

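## Test Results

| Feature | Status | Notes |
|------|------|------|
| Document chunking | ✅ | 9 Markdown chunks stored successfully |
| Vectorization | ✅ | 768-dimensional vectors generated correctly |
| RAG retrieval | ✅ | Top-5 chunks recalled accurately |
| LLM answers | ✅ | Qwen2.5:7b responds normally |
| Multi-turn chat | ✅ | Context carried across turns correctly |
| Offline operation | ✅ | Q&A still works with the network disconnected |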

## Problems Encountered and Fixes

### Problem 1: Embedding URL Built Incorrectly

**Error**: `ollama request failed: 404`

**Cause**: The OpenAI-compatible configuration carried a `/v1` prefix, but Ollama's native API lives at `/api/embed`.

**Fix**: Handle the baseURL correctly in the `ollamaEmbeddings` method:

```go
baseURL := "http://localhost:11434"
if c.config.BaseURL != "" {
	baseURL = strings.TrimSuffix(c.config.BaseURL, "/v1")
}
```
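One gap in the snippet above: a configured URL ending in `/v1/` (with a trailing slash) would pass through `TrimSuffix` unchanged. A slightly more defensive version might look like the sketch below. `ollamaBaseURL` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// ollamaBaseURL is a hypothetical helper: it maps a configured base URL
// (possibly OpenAI-style, ending in /v1 or /v1/) to the root used by
// Ollama's native API, falling back to the default local address.
func ollamaBaseURL(configured string) string {
	if configured == "" {
		return "http://localhost:11434"
	}
	base := strings.TrimRight(configured, "/") // drop trailing slashes first
	base = strings.TrimSuffix(base, "/v1")
	return strings.TrimRight(base, "/")
}

func main() {
	fmt.Println(ollamaBaseURL("http://localhost:11434/v1/") + "/api/embed")
	// prints "http://localhost:11434/api/embed"
}
```

Using the same helper in both the embedding and chat clients keeps the two from drifting apart.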

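### Problem 2: TOML Config Parsing Failure

**Error**: Config values included the trailing comment text, e.g. `ollama" # optional...`

**Cause**: The parser did not handle `#` comments.

**Fix**: Strip comments from the value:

```go
value := strings.TrimSpace(parts[1])
if idx := strings.Index(value, "#"); idx != -1 {
	value = strings.TrimSpace(value[:idx])
}
```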
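The `strings.Index` approach cuts at the first `#` even when it sits inside a quoted value (for example a password or a model tag containing `#`). If that matters, a quote-aware variant along the lines of the sketch below could be used. `stripComment` is a hypothetical helper that handles only double-quoted strings with backslash escapes, not every TOML string form.

```go
package main

import (
	"fmt"
	"strings"
)

// stripComment is a hypothetical helper: it removes a trailing '#' comment
// but leaves any '#' that appears inside a double-quoted string.
func stripComment(line string) string {
	inQuotes := false
	for i := 0; i < len(line); i++ {
		switch line[i] {
		case '\\':
			if inQuotes {
				i++ // skip the escaped character, e.g. \"
			}
		case '"':
			inQuotes = !inQuotes
		case '#':
			if !inQuotes {
				return strings.TrimSpace(line[:i])
			}
		}
	}
	return strings.TrimSpace(line)
}

func main() {
	fmt.Println(stripComment(`model = "gpt#4" # model name`))
	// prints: model = "gpt#4"
}
```

Scanning bytes rather than runes is safe here because `"`, `#`, and `\` are ASCII and never occur inside a multi-byte UTF-8 sequence.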

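### Problem 3: Vector Dimension Mismatch

**Error**: `vector dim 0 not match collection definition`

**Cause**: The embedding call returned an empty vector because of the baseURL misconfiguration.

**Fix**: Use the same baseURL handling in the embedding client and the LLM client.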
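A cheap guard can surface this kind of failure earlier, before an empty vector ever reaches the vector store. The sketch below assumes the configured `dim` (768 here) is available where embeddings are produced. `checkDims` is a hypothetical name.

```go
package main

import "fmt"

// checkDims is a hypothetical guard: it reports the first vector whose
// length differs from the configured dimension.
func checkDims(vectors [][]float32, dim int) error {
	for i, v := range vectors {
		if len(v) != dim {
			return fmt.Errorf("embedding %d has dim %d, want %d", i, len(v), dim)
		}
	}
	return nil
}

func main() {
	vectors := [][]float32{make([]float32, 768), {}}
	fmt.Println(checkDims(vectors, 768)) // prints "embedding 1 has dim 0, want 768"
}
```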

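## Next Steps

1. **Streaming output** - add an SSE endpoint at `/api/v1/chat/stream`
2. **Milvus integration** - replace the in-memory store to support large datasets
3. **More document formats** - parse PDF and Word files
4. **Hybrid retrieval** - combine BM25 with vector search
5. **Web UI** - add a front-end interface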

## Project Repository

GitHub: https://github.com/Blue-wu/golllm

Branch: `feature/local-model`

---

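**Summary**: This project delivers a RAG knowledge-base system that runs fully offline. The key ingredients are:

1. Local Ollama models in place of cloud APIs
2. A unified client interface that allows switching between backends
3. Configuration-driven design that lowers maintenance cost

This gives a workable technical basis for private deployments, sensitive-data handling, and cost control.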
