Basic Configuration
nvidia-smi
- shows NVIDIA GPU information, including the driver version and CUDA version.
nvidia-smi
Create conda env
conda create -n llm_env python=3.10.12 -y
my env
- requirements.txt
torch==2.5.1+cu121
transformers==4.46.3
peft==0.13.2
datasets==3.2.0
numpy==1.22.2
HPC env
torch==2.3.1
transformers==4.46.3
accelerate==0.34.0
peft==0.13.2
trl==0.14.0
datasets==3.2.0
wandb==0.19.6
numpy==1.23.5
Keep GPU running!
import torch
import time

device = torch.device('cuda')
# two arbitrary random tensors kept on the GPU (sizes chosen only for illustration)
tensor1 = torch.rand(4096, 4096, device=device)
tensor2 = torch.rand(4096, 4096, device=device)
while True:
    tensor3 = tensor1 @ tensor2  # keep the GPU busy with a matrix multiplication
    time.sleep(2)
    print(tensor3.device)
VSCode connect to HPC
Make sure the Python REPL extension is installed; it is used to run Python code interactively. It is also suggested that the project's .vscode/settings.json contains the following:
{
    "python.defaultInterpreterPath": "/opt/miniconda3/envs/pytorch/bin/python"
}
Model
Download model and tokenizer from Huggingface
Before downloading any model from Hugging Face, log in with your Hugging Face user access token.
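A minimal sketch of logging in with the token via huggingface_hub (the token string below is a placeholder):
from huggingface_hub import login

# Replace with your own Hugging Face user access token (placeholder value).
login(token="hf_xxx")
Then download the model and tokenizer: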
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = 'deepseek-ai/DeepSeek-R1-Distill-Llama-8B'
model_download_path = '/hpc2hdd/home/tzou317/models'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto', torch_dtype=torch.float32, force_download=True, cache_dir=model_download_path)
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=model_download_path, force_download=True)
torch_dtype
torch_dtype can be torch.bfloat16 or torch.float32.
Load model and tokenizer
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = '/hpc2hdd/home/tzou317/models/models--deepseek-ai--DeepSeek-R1-Distill-Llama-8B/snapshots/ebf7e8d03db3d86a442d22d30d499abb7ec27bea'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto', torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(model_name)
Make sure the directory given by model_name contains files like:
config.json
generation_config.json
model-00001-of-000002.safetensors
model-00002-of-000002.safetensors
model.safetensors.index.json
tokenizer.json
tokenizer_config.json
Show model structure:
print(model)
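To complement the structure printout, the total parameter count can be checked with a one-liner (a small sketch, assuming model is loaded as above):
total_params = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total_params / 1e9:.2f}B")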
Tokenizer
from transformers import AutoTokenizer
model_name = '/hpc2hdd/home/tzou317/models/models--deepseek-ai--DeepSeek-R1-Distill-Llama-8B/snapshots/ebf7e8d03db3d86a442d22d30d499abb7ec27bea'
tokenizer = AutoTokenizer.from_pretrained(model_name)
Make sure the directory given by model_name contains files like:
tokenizer.json
tokenizer_config.json
Tokens
Special Tokens
print(f'bos_token is: {tokenizer.bos_token}, bos_id is: {tokenizer.bos_token_id}')
print(f'eos_token is: {tokenizer.eos_token}, eos_id is: {tokenizer.eos_token_id}')
print(f'pad_token is: {tokenizer.pad_token}, pad_id is: {tokenizer.pad_token_id}')
Common operations
# find a token by token_id
mytoken = tokenizer.convert_ids_to_tokens(128011)
print(mytoken)
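Conversely, a token string can be mapped back to its id, and text can be round-tripped through encode/decode; a small sketch (the '<|User|>' token assumes DeepSeek's special tokens, and the sample text is arbitrary):
# token -> token_id
myid = tokenizer.convert_tokens_to_ids('<|User|>')
print(myid)

# text -> ids -> text round trip
ids = tokenizer.encode('你好', add_special_tokens=False)
print(ids)
print(tokenizer.decode(ids))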
apply_chat_template()
tokenizer.chat_template
python code
print(tokenizer.chat_template)
Variable initialization:
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}
Extracting the system message:
{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}
Adding the BOS token and the system message:
{{bos_token}}{{ns.system_prompt}}
Handling user messages:
{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}
Handling assistant messages (when content is empty):
{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- endfor %}{%- endif %}
If the assistant's content is empty, the tool-call information in the message is processed.
- Handling assistant messages (when content is not empty):
{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}
If the assistant's content is not empty, the message is handled according to the state of the ns.is_tool flag; this appears to be related to DeepSeek-R1's </think> tag. The assistant content is wrapped with an <|Assistant|> tag in front and an <|end▁of▁sentence|> tag at the end.
- Final part:
{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}
If ns.is_tool=True, an <|tool▁outputs▁end|> tag is added; if add_generation_prompt=True and ns.is_tool=False, an <|Assistant|> tag is added.
output
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}
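To see what the template actually produces, it can be applied to a toy conversation (a minimal sketch; the messages are made up):
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '你好'},
]
# Render without tokenizing so the inserted special tokens stay visible.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)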
AutoModelForCausalLM
Model Inference
single-turn inference
model.eval()
user_input = "在中文中应该用“随心所欲的创作”还是“所心所欲地创作”"
message = [{'role': 'user', 'content': user_input}]
model_input = tokenizer.apply_chat_template(message, tokenize=True, return_tensors='pt', add_generation_prompt=True).to(model.device)
response = model.generate(
    model_input,
    max_new_tokens=1000,
    do_sample=False,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id
)
output = tokenizer.decode(response[0], skip_special_tokens=True)
print(output)
multi-turn inference
model.eval()
messages = []
for idx, user_input in enumerate(iter(lambda: input("请输入你的问题:"), "")):
    messages.append({'role': 'user', 'content': user_input})
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to("cuda" if torch.cuda.is_available() else "cpu")
    generated_ids = model.generate(
        inputs["input_ids"],
        max_new_tokens=200,
        num_return_sequences=1,
        do_sample=True,
        attention_mask=inputs["attention_mask"],
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
    response = tokenizer.decode(generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(response)
    messages.append({'role': 'assistant', 'content': response})  # keep the reply so later turns see the full history
PEFT (LoRA)
LoRA fine-tune
Suppose \(r=16\); then the number of trainable parameters is reduced to \(\frac{4096\times16\times2}{4096\times4096}\approx0.8\%\) of the original.
\[\boldsymbol{\Delta W}_{4096\times4096}=\boldsymbol{B}_{4096\times16}\boldsymbol{A}_{16\times4096}\]
\(\boldsymbol{A}\) is initialized from \(\mathcal{N}(0, 1)\), and \(\boldsymbol{B}\) is initialized to \(\boldsymbol{0}\). The different initialization methods are a kind of balance between random search and stable training.
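A quick check of the ratio above in Python:
d, r = 4096, 16
full_params = d * d          # a full 4096 x 4096 weight update
lora_params = d * r * 2      # A: r x d, B: d x r
print(f"{lora_params / full_params:.2%}")  # 0.78%, i.e. roughly 0.8%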
Code
different datasource
From csv
from peft import LoraConfig, TaskType, PeftModel, LoftQConfig, get_peft_model
from trl import DataCollatorForCompletionOnlyLM, SFTTrainer, SFTConfig
from datasets import load_dataset
from transformers import TrainingArguments
import wandb
wandb.login(key="...")
dataset = load_dataset('csv', data_files={
    'train': '../train_dataset.csv',
    'validation': '../dev_dataset.csv'
})
peft_output_dir = '../fine-tune/LoRA_Layer'
def formatting_func(dataset):
    formatted_texts = []
    for i in range(len(dataset['prompts'])):  # be careful with this line: it iterates over the whole batch
        system_message = '下面是小明与小红两个人之间的对话,你需要模仿小红的讲话风格和内容然后与小明进行聊天。\n'
        user_message = dataset['prompts'][i]
        assistant_message = dataset['responses'][i]
        message = [
            {'role': 'system', 'content': system_message},
            {'role': 'user', 'content': user_message},
            {'role': 'assistant', 'content': assistant_message}
        ]
        text = tokenizer.apply_chat_template(
            message,
            tokenize=False,
            add_generation_prompt=False,
            bos_token=tokenizer.bos_token,
            eos_token=tokenizer.eos_token)
        formatted_texts.append(text)
    return formatted_texts
# print(formatting_func(dataset['train'])[0])
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=64,
    lora_alpha=32,
    use_rslora=True,
    lora_dropout=0.1,
    target_modules=[
        "q_proj",
        "v_proj",
        "k_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head"
    ]
    # init_lora_weights='loftq',
    # loftq_config=loftq_config
)
training_arguments = SFTConfig(
    output_dir=peft_output_dir,
    overwrite_output_dir=True,
    num_train_epochs=1,
    load_best_model_at_end=False,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    evaluation_strategy="steps",
    eval_steps=0.15,
    max_grad_norm=0.3,
    auto_find_batch_size=False,
    save_total_limit=3,
    gradient_accumulation_steps=16,
    save_steps=50,
    logging_steps=10,
    learning_rate=5e-5,
    weight_decay=0.01,
    bf16=False,
    warmup_ratio=0.01,
    group_by_length=True,
    lr_scheduler_type="cosine",
    report_to="wandb",
    neftune_noise_alpha=5,
    max_seq_length=3000,
    packing=False
)
instruction_template = '<|User|>'
response_template = '<|Assistant|>'
collator = DataCollatorForCompletionOnlyLM(instruction_template=instruction_template, response_template=response_template, tokenizer=tokenizer)
trainer = SFTTrainer(
    model=model,
    peft_config=peft_config,
    train_dataset=dataset['train'],
    eval_dataset=dataset['validation'],
    formatting_func=formatting_func,
    data_collator=collator,
    tokenizer=tokenizer,
    args=training_arguments
)
trainer.train()
trainer.model.save_pretrained(peft_output_dir)
When tokenizer.add_bos_token = False is set, each text in the training dataset obtained from train_dataloader = trainer.get_train_dataloader() has only one bos_token at the beginning. Without this setting, each text ends up with two bos_tokens. I don't know exactly why, but I suspect the cause lies in SFTTrainer(): the chat template already inserts a bos_token, and the tokenizer then prepends another one during tokenization.
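One way to verify this is to count bos_token_id occurrences in a batch from the train dataloader (a small check, assuming the trainer and tokenizer from the code above):
batch = next(iter(trainer.get_train_dataloader()))
first_ids = batch["input_ids"][0]
num_bos = (first_ids == tokenizer.bos_token_id).sum().item()
print(f"bos tokens in the first sample: {num_bos}")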
From jsonl
The .jsonl file format looks like:
{"conversations": [{"role": "user", "content": "你好,你是谁"}, {"role": "assistant", "content": "我是由华为公司开发的大模型"}]}
{"conversations": [{"role": "user", "content": "你是什么大模型?"}, {"role": "assistant", "content": "我是由华为公司开发的大模型"}]}
{"conversations": [{"role": "user", "content": "你是由谁开发的?"}, {"role": "assistant", "content": "我是由华为公司开发的大模型"}]}
...
from datasets import load_dataset
dataset = load_dataset("json", data_files="myjsonl.jsonl", split="train") # 文本未预先被分割,暂时就默认将其整个当作训练集
dataset = dataset.rename_column("conversations", "messages")
dataset = dataset.train_test_split(test_size=0.1)
train_data = dataset["train"]
valid_data = dataset["test"]
peft_config = LoraConfig(...)
training_arguments = SFTConfig(...)
trainer = SFTTrainer(
    model=model,
    peft_config=peft_config,
    train_dataset=train_data,
    eval_dataset=valid_data,
    data_collator=collator,
    tokenizer=tokenizer,
    args=training_arguments
)
trainer.train()
Verify the tokenizer chat template:
sample = dataset["train"][0]["messages"]
print(sample)
print(tokenizer.apply_chat_template(sample, tokenize=False))
finetune on multi-GPU
Remove device_map="auto" from AutoModelForCausalLM.from_pretrained(), run accelerate config, and then launch training with:
accelerate launch --multi_gpu train.py
data_collator
trainer.get_train_dataloader()
import itertools
train_dataloader = trainer.get_train_dataloader()
sample = next(itertools.islice(train_dataloader, 100, 101))
# The three tensors all have the same shape: (batch_size, seq_length)
print(sample['input_ids'].shape, sample['attention_mask'].shape, sample['labels'].shape, sep='\n')
first_sample = {k: v[0] for k, v in sample.items()}
input_text = tokenizer.decode(
    first_sample["input_ids"],
    skip_special_tokens=False
)
print(input_text)
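Since DataCollatorForCompletionOnlyLM only trains on the completion part, the prompt positions in labels should be set to -100; a quick check on the sample above:
labels = first_sample["labels"]
num_masked = (labels == -100).sum().item()
print(f"{num_masked} of {labels.shape[0]} positions are masked with -100 (ignored by the loss)")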
wandb
The key for wandb.login()
can be found at https://wandb.ai/site.
The training process will be reported to wandb.
LoRA Layer merge
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, "path/to/peft/checkpoint")
model = model.merge_and_unload()
print("Lora layer merged successfully")
Adapter & Prefix Tuning
Adapter tuning inserts adapter layers as trainable parameters after the self-attention module (before its residual connection) and after the MLP module (before its residual connection).
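A minimal PyTorch sketch of such a bottleneck adapter (hidden size, bottleneck size, and activation are illustrative choices, not a specific model's configuration):
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project -> non-linearity -> up-project; the output is added back to the hidden states."""
    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down_proj = nn.Linear(hidden_size, bottleneck_size)
        self.activation = nn.ReLU()
        self.up_proj = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states):
        # Adapter output is added to its input, keeping the original path intact.
        return hidden_states + self.up_proj(self.activation(self.down_proj(hidden_states)))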