25 changes: 25 additions & 0 deletions README.md
@@ -137,6 +137,31 @@ weclone-cli make-dataset
```
More Parameter Details: [Data Preprocessing](https://docs.weclone.love/docs/deploy/data_preprocessing.html#related-parameters)

#### Online LLM Providers for Data Cleaning

When using `online_llm_clear`, you can set `llm_provider` to quickly configure a supported provider. The provider presets automatically populate `base_url` and `model_name` (which can still be overridden explicitly):

| Provider | `llm_provider` | Default Model | API Docs |
|----------|----------------|---------------|----------|
| OpenAI | `"openai"` | `gpt-4o-mini` | [platform.openai.com](https://platform.openai.com/docs) |
| DeepSeek | `"deepseek"` | `deepseek-chat` | [platform.deepseek.com](https://platform.deepseek.com/docs) |
| MiniMax | `"minimax"` | `MiniMax-M2.7` | [platform.minimax.io](https://platform.minimax.io/docs/api-reference/text-openai-api) |
| Custom | `"custom"` | *(manual)* | Any OpenAI-compatible API |

Example using MiniMax for data cleaning:
```jsonc
"make_dataset_args": {
  "online_llm_clear": true,
  "llm_provider": "minimax",
  "llm_api_key": "your-minimax-api-key",
  // base_url and model_name are auto-filled:
  // base_url → https://api.minimax.io/v1
  // model_name → MiniMax-M2.7
}
```

> You can also use the mainland China endpoint by explicitly setting `"base_url": "https://api.minimaxi.com/v1"` (note the extra "i").
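Conceptually, the preset mechanism described above can be sketched in Python as follows. The `PROVIDER_PRESETS` table and `resolve_llm_config` function are illustrative names, not WeClone's actual internals, and the OpenAI/DeepSeek endpoint URLs are assumed defaults:

```python
# Illustrative sketch of provider-preset resolution (not WeClone's actual code).
# Presets mirror the table above; explicitly supplied values always win.

PROVIDER_PRESETS = {
    "openai":   {"base_url": "https://api.openai.com/v1", "model_name": "gpt-4o-mini"},
    "deepseek": {"base_url": "https://api.deepseek.com", "model_name": "deepseek-chat"},
    "minimax":  {"base_url": "https://api.minimax.io/v1", "model_name": "MiniMax-M2.7"},
}

def resolve_llm_config(args: dict) -> dict:
    """Fill in base_url/model_name from the provider preset unless set explicitly."""
    preset = PROVIDER_PRESETS.get(args.get("llm_provider", "custom"), {})
    resolved = dict(args)
    for key, value in preset.items():
        resolved.setdefault(key, value)  # keep user-supplied overrides
    return resolved

cfg = resolve_llm_config({
    "online_llm_clear": True,
    "llm_provider": "minimax",
    "llm_api_key": "your-minimax-api-key",
})
```

With `"llm_provider": "custom"` (or unset), nothing is auto-filled and `base_url`/`model_name` must be given explicitly, matching the *(manual)* row in the table.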

## Configure Parameters and Fine-tune Model

- (Optional) Modify `model_name_or_path`, `template`, `lora_target` in `settings.jsonc` to select other locally downloaded models.
25 changes: 25 additions & 0 deletions README_zh.md
@@ -134,6 +134,31 @@ weclone-cli make-dataset
```
数据处理更多参数说明:[数据预处理](https://docs.weclone.love/zh/docs/deploy/data_preprocessing.html#%E7%9B%B8%E5%85%B3%E5%8F%82%E6%95%B0)

#### Online LLM Providers for Data Cleaning

When using `online_llm_clear`, you can set `llm_provider` to quickly configure a supported provider; `base_url` and `model_name` are auto-filled (and can still be overridden manually):

| Provider | `llm_provider` | Default Model | API Docs |
|----------|----------------|---------------|----------|
| OpenAI | `"openai"` | `gpt-4o-mini` | [platform.openai.com](https://platform.openai.com/docs) |
| DeepSeek | `"deepseek"` | `deepseek-chat` | [platform.deepseek.com](https://platform.deepseek.com/docs) |
| MiniMax | `"minimax"` | `MiniMax-M2.7` | [platform.minimax.io](https://platform.minimax.io/docs/api-reference/text-openai-api) |
| Custom | `"custom"` | *(manual)* | Any OpenAI-compatible API |

Example using MiniMax for data cleaning:
```jsonc
"make_dataset_args": {
  "online_llm_clear": true,
  "llm_provider": "minimax",
  "llm_api_key": "your-minimax-api-key",
  // base_url and model_name are auto-filled:
  // base_url → https://api.minimax.io/v1
  // model_name → MiniMax-M2.7
}
```

> Users in mainland China can manually set `"base_url": "https://api.minimaxi.com/v1"` (note the extra "i").

## Configure Parameters and Fine-tune Model

- (Optional) Modify `model_name_or_path`, `template`, and `lora_target` in `settings.jsonc` to select other locally downloaded models.
5 changes: 4 additions & 1 deletion examples/mllm.template.jsonc
@@ -41,9 +41,12 @@
}
},
"online_llm_clear": false,
// llm_provider: one of "openai", "deepseek", "minimax", "custom"
// When set, auto-fills base_url and model_name (if not explicitly specified).
// "llm_provider": "minimax",
"base_url": "https://xxx/v1",
"llm_api_key": "xxxxx",
"model_name": "xxx", // A model with more parameters is recommended, e.g. DeepSeek-V3
"model_name": "xxx", // A model with more parameters is recommended, e.g. DeepSeek-V3, MiniMax-M2.7
"clean_batch_size": 10,
"vision_api": {
"enable": false, // Set to true to enable this feature
6 changes: 5 additions & 1 deletion examples/tg.template.jsonc
@@ -46,9 +46,13 @@
}
},
"online_llm_clear": false,
// llm_provider: "openai", "deepseek", "minimax", or "custom"
// When set, auto-fills base_url and model_name if not explicitly provided.
// e.g. "minimax" → base_url: https://api.minimax.io/v1, model_name: MiniMax-M2.7
// "llm_provider": "minimax",
"base_url": "https://xxx/v1",
"llm_api_key": "xxxxx",
"model_name": "xxx", // A model with more parameters is recommended, e.g. DeepSeek-V3
"model_name": "xxx", // A model with more parameters is recommended, e.g. DeepSeek-V3, MiniMax-M2.7
"clean_batch_size": 10,
"vision_api": {
"enable": false, // Set to true to enable this feature
6 changes: 5 additions & 1 deletion settings.template.jsonc
@@ -46,9 +46,13 @@
}
},
"online_llm_clear": false,
// llm_provider: one of "openai", "deepseek", "minimax", "custom"
// When set, auto-fills base_url and model_name (if not explicitly specified).
// e.g. "minimax" → base_url defaults to https://api.minimax.io/v1, model_name defaults to MiniMax-M2.7
// "llm_provider": "minimax",
"base_url": "https://xxx/v1",
"llm_api_key": "xxxxx",
"model_name": "xxx", // A model with more parameters is recommended, e.g. DeepSeek-V3
"model_name": "xxx", // A model with more parameters is recommended, e.g. DeepSeek-V3, MiniMax-M2.7
"clean_batch_size": 50,
"vision_api": {
"enable": false, // Set to true to enable this feature
Expand Down