[FT] Fine-tuning ChatGLM2

1. Preparation

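GPUs: A-series or H-series cards both work. Fine-tuning needs 1-2 cards. P-tuning needs one card with roughly 30-40 GB of GPU memory; full-parameter fine-tuning needs two cards, with total GPU memory usage above 100 GB.
Environment setup:
- Install into a virtual environment if possible; see https://blog.csdn.net/u010212101/article/details/103351853
- For the dependencies, see https://github.com/THUDM/ChatGLM2-6B/tree/main/ptuning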

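2. How to fine-tune

See: https://github.com/THUDM/ChatGLM2-6B/tree/main/ptuning

2.0. Training data format. Organize the data in either of the following formats; format 1 and format 2 both work:

Format 1 (one JSON object per line):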

{"content":"xxx","summary":"xxx"}
{"content":"xxx","summary":"xxx"}
... ...

Format 2 (a single JSON array; note the commas between records):

[
{"content":"xxx","summary":"xxx"},
{"content":"xxx","summary":"xxx"},
... ...
]
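
As a small sketch of how the two layouts relate, the Python below writes a list of records in either form, reads it back, and checks that every record carries string-valued content and summary fields. The helper names (check_records, to_json_lines, to_json_array, parse_either) and the sample texts are made up for illustration; they are not part of the ChatGLM2-6B repository.

```python
import json

REQUIRED_KEYS = ("content", "summary")

def check_records(records):
    """Raise ValueError if a record lacks a required key or holds a non-string value."""
    for i, rec in enumerate(records):
        for key in REQUIRED_KEYS:
            if not isinstance(rec.get(key), str):
                raise ValueError(f"record {i}: missing or non-string field {key!r}")
    return records

def to_json_lines(records):
    """Format 1: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in check_records(records))

def to_json_array(records):
    """Format 2: a single JSON array."""
    return json.dumps(check_records(records), ensure_ascii=False)

def parse_either(text):
    """Read text in either format back into a list of dicts."""
    text = text.strip()
    if text.startswith("["):
        return check_records(json.loads(text))
    return check_records([json.loads(line) for line in text.splitlines() if line.strip()])

# Hypothetical sample records, for illustration only.
samples = [
    {"content": "type: skirt; color: blue", "summary": "A blue skirt for summer."},
    {"content": "type: coat; material: wool", "summary": "A warm wool coat."},
]

# Both formats round-trip to the same records.
assert parse_either(to_json_lines(samples)) == samples
assert parse_either(to_json_array(samples)) == samples
```

Running a check like this before training catches a missing comma or a misspelled key early, before the tokenization step has to fail on it.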

2.1. P-tuning:

See: https://github.com/THUDM/ChatGLM2-6B/blob/main/ptuning/train.sh

PRE_SEQ_LEN=128
LR=2e-2
NUM_GPUS=1

torchrun --standalone --nnodes=1 --nproc-per-node=$NUM_GPUS main.py \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --validation_file AdvertiseGen/dev.json \
    --preprocessing_num_workers 10 \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path THUDM/chatglm2-6b \
    --output_dir output/adgen-chatglm2-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 128 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

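Notes:

Download the model to local disk from ModelScope or Hugging Face. ModelScope is recommended: Hugging Face downloads are slow and may need a proxy, which is a hassle.
- https://modelscope.cn/models/ZhipuAI/chatglm2-6b-32k/summary
Change model_name_or_path in the command above to your local model path.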
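If resources are tight, adjust the following parameters:
Batch size: per_device_train_batch_size, per_device_eval_batch_size, and gradient_accumulation_steps. Together these set the batch size; adjust all three as needed.
Training steps:
- max_steps, e.g. 5000 steps
max_source_length: maximum input length, in tokens
max_target_length: maximum output length, in tokens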
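
To make the batch-size trade-off concrete: with gradient accumulation, the number of samples per optimizer step is per_device_train_batch_size x gradient_accumulation_steps x number of GPUs, and max_steps counts optimizer steps. A minimal Python sketch using the values from the P-tuning script above (the function names are illustrative only):

```python
def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_gpus):
    """Samples consumed per optimizer step across all GPUs."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus

def samples_seen(max_steps, batch):
    """Total training samples consumed after max_steps optimizer steps."""
    return max_steps * batch

# Values from the P-tuning script: batch 1, accumulation 16, one GPU, 3000 steps.
batch = effective_batch_size(1, 16, 1)
print(batch)                      # 16
print(samples_seen(3000, batch))  # 48000

# Trading per-device batch size for accumulation keeps the effective batch
# unchanged while lowering peak memory per step.
assert effective_batch_size(4, 4, 1) == effective_batch_size(1, 16, 1)
```

When GPU memory is short, lowering per_device_train_batch_size and raising gradient_accumulation_steps by the same factor leaves the effective batch size as it was, at the cost of slower steps.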

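2.2. Full-parameter fine-tuning

Full-parameter fine-tuning is fairly slow, so DeepSpeed is recommended; it can be installed directly with pip.
With a few hundred to a few thousand training examples, full-parameter fine-tuning needs more than 100 GB of GPU memory, i.e. at least two A800 cards.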
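
A rough back-of-the-envelope check of that figure, as a sketch under stated assumptions: mixed-precision training with Adam is commonly budgeted at about 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights plus the two Adam moments), and the model is taken to have about 6 billion parameters, as its name suggests. The 80 GB per card is also an assumption. Activations and framework overhead come on top, and the function names are illustrative.

```python
import math

def training_state_gb(num_params, bytes_per_param=16):
    """Memory for weights, gradients and optimizer state, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

def min_cards(total_gb, gb_per_card):
    """Smallest number of cards whose combined memory covers total_gb."""
    return math.ceil(total_gb / gb_per_card)

est = training_state_gb(6e9)   # assumed ~6B parameters
print(f"{est:.0f} GB")         # 96 GB
print(min_cards(est, 80))      # 2, assuming 80 GB per card
```

This only bounds the model and optimizer state; it is consistent with the "more than 100 GB, at least two cards" observation once activations are added, but it is not a measurement.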

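The DeepSpeed command for full-parameter fine-tuning of ChatGLM2 is shown below. The reference script uses --num_gpus=4; set it to the number of cards you actually have.

See: https://github.com/THUDM/ChatGLM2-6B/blob/main/ptuning/ds_train_finetune.sh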

LR=1e-4

MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --num_gpus=4 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --test_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path THUDM/chatglm2-6b \
    --output_dir ./output/adgen-chatglm2-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --max_steps 5000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --fp16