<<AI人工智慧 PyTorch自學>> 10.3 Baichuan2 部署與分析－HCHUNGW的部落格

10.3 Baichuan2 部署與分析

baichuan作為首批開源的國產大語言模型，具備7B和13B兩個尺寸，在多個子任務上有出色表現，本節就來瞭解baichuan系列大模型。

Baichuan 簡介

Baichuan開源大模型由百川智慧研發，目前最新開源版本為Baichuan2，閉源版本為Baichuan3。

百川智慧成立於2023年4月10日，由前搜狗公司CEO王小川創立。公司以幫助大眾輕鬆、吾普惠地獲取世界知識和專業服務為使命，致力於通過語言AI的突破，構建中國最優秀的大模型底座。

Baichuan2提供7B，13B兩個尺寸，具體如下

	基座模型	對齊模型	對齊模型 4bits 量化
7B	Baichuan2-7B-Base	Baichuan2-7B-Chat	Baichuan2-7B-Chat-4bits
13B	Baichuan2-13B-Base	Baichuan2-13B-Chat	Baichuan2-13B-Chat-4bits

更多關於Baichuan的資訊，可查閱：

本地部署安裝

第一步，下載下載baichuan2代碼

git clone https://github.com/baichuan-inc/Baichuan2

Copy

第二步，下載7B-in4模型權重

git clone https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat-4bits

（也可以通過github desktop下載）

Copy

第三步，環境配置

根據Baichuan2中的 requirements.txt進行安裝，其中pytorch環境自行配置，要求 pytorch ≥ 2.x

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

Copy

第四步，報錯處理

根據官方的教程，安裝了requirements.txt後報錯，通常會報錯：

init model ...

A matching Triton is not available, some optimizations will not be enabled

Traceback (most recent call last):

File "C:\Users\yts32\anaconda3\envs\pt220\lib\site-packages\xformers\__init__.py", line 55, in _is_triton_available

from xformers.triton.softmax import softmax as triton_softmax # noqa

File "C:\Users\yts32\anaconda3\envs\pt220\lib\site-packages\xformers\triton\softmax.py", line 11, in <module>

import triton

ModuleNotFoundError: No module named 'triton'

C:\Users\yts32\anaconda3\envs\pt220\lib\site-packages\bitsandbytes\cuda_setup\main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

Copy

如果是linux，可以嘗試

pip install bitsandbytes==0.41.1 -q

pip install accelerate==0.25.0 -q

參考自：https://github.com/baichuan-inc/Baichuan2/issues/52

Copy

如果是windows，可以嘗試，先下載bitsandbytes-windows版的0.41.1的安裝包，再手動安裝。原因是通過pip install bitsandbytes，只能獲得linux的，而windows的目前最高版本時0.37.x，因此需要手動下載安裝。

https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl

pip install bitsandbytes-0.41.1-py3-none-win_amd64.whl

參考自：https://github.com/baichuan-inc/Baichuan2/issues/35

Copy

如果報錯：TypeError: 'NoneType' object is not subscriptable

pip install accelerate==0.25.0 -q

Copy

如果報錯：auto-gptq 0.7.1 requires accelerate>=0.26.0, but you have accelerate 0.25.0 which is incompatible.

https://github.com/AutoGPTQ/AutoGPTQ

pip install auto-gptq==0.6

Copy

第五步，配置路徑

model， model.generation_config， tokenizer三個的路徑需要配置

def init_model():

print("init model ...")

model = AutoModelForCausalLM.from_pretrained(

r"G:\04-model-weights\Baichuan2-7B-Chat-4bits",

torch_dtype=torch.float16,

device_map="auto",

trust_remote_code=True

)

model.generation_config = GenerationConfig.from_pretrained(

r"G:\04-model-weights\Baichuan2-7B-Chat-4bits"

)

tokenizer = AutoTokenizer.from_pretrained(

r"G:\04-model-weights\Baichuan2-7B-Chat-4bits",

use_fast=False,

trust_remote_code=True

)

return model, tokenizer

Copy

第六步，運行cli_demo.py

C:\Users\yts32\anaconda3\envs\chatglm\python.exe D:\github_desktop\Baichuan2\cli_demo.py

init model ...

bin C:\Users\yts32\anaconda3\envs\chatglm\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll

歡迎使用百川大模型，輸入進行對話，vim 多行輸入，clear 清空歷史，CTRL+C 中斷生成，stream 開關流式生成，exit 結束。

用戶：你好

Baichuan 2：

你好今天我能為您提供什麼幫助？

用戶：你是誰

Baichuan 2：我是百川大模型，是由百川智慧的工程師們創造的大語言模型，我可以和人類進行自然交流、解答問題、協助創作，幫助大眾輕鬆、普惠的獲得世界知識和專業服務。如果你有任何問題，可以隨時向我提問

Copy

模型結構分析

Baichuan2的模型結構可通過如下UML類圖瞭解，其他更多模型結構可以參考前兩節Qwen和ChatGLM的結構分析。

<<AI人工智慧 PyTorch自學>> 10.3 Baic

Prompt 結構分析

baichuan2的Prompt結構是經典的三角色設計，包括system, user, assistant。

在示例代碼中，並沒有給出system的預設，需要分析原始程式碼後才看到system可通過messages來維護。bichuan2中的messages等同於history的作用，用於記錄歷史對話資訊。

一個真實的messages如下：

[{'content': '你好', 'role': 'user'},

{'content': '你好今天我能為您提供什麼幫助？', 'role': 'assistant'},

{'content': '今天天氣如何', 'role': 'user'}]

Copy

特殊token處理

不同的角色之間，通常用特殊token標記，在baichun2代碼中，可通過generation_config中看到特殊token的index，但對應的text沒有顯示給出。

\.cache\huggingface\modules\transformers_modules\Baichuan2-7B-Chat-4bits\generation_utils.py

# 以下代碼是組裝歷史對話的程式碼片段，首先判斷當前角色，然後獲取角色分隔token

for message in round:

if message["role"] == "user":

round_tokens.append(model.generation_config.user_token_id)

else:

round_tokens.append(model.generation_config.assistant_token_id)

Copy

單輪推理長度限制

模型支援的上下文是4K，這裡包括輸入+輸出=4K，在單輪對話時，會對輸入長度做限制。

首先，預留2K是用於本輪對話的輸出，因此輸入的最大長度為4K-2K=2K。詳細代碼如下：

max_input_tokens = model.config.model_max_length - max_new_tokens

input_tokens = input_tokens[-max_input_tokens:] # truncate left

其中：

model.config.model_max_lengt = 4096

max_new_tokens = 2048

參考自：C:\Users\yts32\.cache\huggingface\modules\transformers_modules\Baichuan2-7B-Chat-4bits\generation_utils.py

Copy

對於輸入超過2K的情況，是會被向左截斷。

顯存與上下文長度分析

百川官方給出了字串長度與token之間的換算的比例，一般情況下Baichuan2大模型1個token約等於1.5個中文漢字。詳見產品定價：https://cq6qe6bvfr6.feishu.cn/wiki/DOxNw9t97iwL3hkPB41ctfsMnMI

通過分析發現：

在未進行對話時，顯存僅佔用5.3GB
第一次對話時，顯存立即飆升到6.4GB
前2000字元顯存消耗不高，2000之後顯存消耗激增
超過3500字元後，同樣出現了截斷（參考Qwen、ChatGLM的分析）

<<AI人工智慧 PyTorch自學>> 10.3 Baic

統計代碼如下，完整代碼cli_demo.py位於github

conversation_length = sum([len(content['content']) for content in messages])

import subprocess

import json

result = subprocess.run(['gpustat', '--json'], stdout=subprocess.PIPE)

output = result.stdout.decode()

data = json.loads(output)

used_memory = data['gpus'][0]['memory.used']

f.writelines("{}, {}\n".format(conversation_length, used_memory))

f.flush()

Copy

小結

本節對Baichuan2模型進行了本地部署安裝，並分析模型結構、prompt結構、推理上限機制、顯存分析等內容，有助於進一步理解LLM原理。

下一小節，分析Yi。

HCHUNGW

HCHUNGW的部落格

HCHUNGW 發表在痞客邦留言(0) 人氣()

HCHUNGW的部落格

破軍突破革新希望多元開放平等進步

<<AI人工智慧 PyTorch自學>> 10.3 Baichuan2 部署與分析

歷史上的今天

留言列表

站方公告

活動快報

天海旅...

我的好友

熱門文章

文章分類

最新文章

最新留言

動態訂閱

文章精選

文章搜尋

新聞交換(RSS)

誰來我家

參觀人氣

QR Code

POWERED BY

HCHUNGW的部落格

破軍 突破 革新 希望 多元 開放 平等 進步

<<AI人工智慧 PyTorch自學>> 10.3 Baichuan2 部署與分析

歷史上的今天

留言列表

站方公告

活動快報

天海旅...

我的好友

熱門文章

文章分類

最新文章

最新留言

動態訂閱

文章精選

文章搜尋

新聞交換(RSS)

誰來我家

參觀人氣

QR Code

POWERED BY

破軍突破革新希望多元開放平等進步