name | about | labels
--- | --- | ---
Bug Report | Use this template for reporting a bug | kind/bug
On an Ascend 910A environment, single-node 8-card LoRA fine-tuning of the Baichuan2 large model fails. Reference doc: https://gitee.com/mindspore/mindformers/blob/dev/research/baichuan2/baichuan2.md#lora%E5%BE%AE%E8%B0%83
Hardware Environment (Ascend/GPU/CPU): Ascend 910A

Execution Mode (PyNative/Graph): not specified (both `/mode pynative` and `/mode graph` were left in the template)
Container startup:

```bash
docker run -itd -u root --ipc=host --network host --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 --device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /etc/localtime:/etc/localtime -v /etc/hccn.conf:/etc/hccn.conf -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /var/log/npu/:/usr/slog -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/bin/hccn_tool:/usr/bin/hccn_tool -v /mindformer-share/nj/dataset:/workspace/dataset -v /mindformer-share/nj/model:/workspace/model -v /mindformer-share/nj/training-framework:/workspace/training-framework --name mindformers_test swr.cn-central-221.ovaijisuan.com/mindformers/mindformers1.0_mindspore2.2.11:aarch_20240125 /bin/bash
```

Training launch:

```bash
bash research/run_singlenode.sh "python research/baichuan2/run_baichuan2.py --config research/baichuan2/run_baichuan2_7b_lora_910b.yaml --load_checkpoint /workspace/model/baichuan2 --auto_trans_ckpt True --use_parallel True --run_mode finetune --train_data /workspace/dataset/baichuan2/train" /workspace/training-framework/mindformers-1.0/hccl_8p_01234567_192.168.12.126.json [0,8] 8
```
Fine-tuning training log (`cat output/log/rank_0/mindformer.log`):
2024-05-11 17:37:35,948 - mindformers[mindformers/core/context/build_context.py:194] - INFO - full_batch will be forced to False when the parallel mode is stand_alone or data_parallel
[WARNING] HCCL_ADPT(17565,ffff82a4b1c0,python):2024-05-11-17:37:36.616.898 [mindspore/ccsrc/plugin/device/ascend/hal/hccl_adapter/hccl_adapter.cc:63] GenHcclOptions] The environment variable DEPLOY_MODE is not set. Now set to default value 0
2024-05-11 17:37:36,891 - mindformers[mindformers-1.0/research/baichuan2/run_baichuan2.py:88] - INFO - 当前工作路径:/workspace/training-framework/mindformers-1.0/
2024-05-11 17:37:36,927 - mindformers[mindformers/tools/utils.py:153] - INFO - set output path to '/workspace/training-framework/mindformers-1.0/output'
2024-05-11 17:37:36,930 - mindformers[mindformers/tools/register/register.py:160] - INFO - get_instance_from_cfg.cfg={'type': 'CausalLanguageModelingTrainer', 'model_name': 'baichuan2_7b_lora'}
2024-05-11 17:37:36,931 - mindformers[mindformers/trainer/base_trainer.py:90] - INFO - Now Running Task is: text_generation, Model is: baichuan2_7b_lora
2024-05-11 17:37:36,933 - mindformers[mindformers/trainer/base_trainer.py:131] - WARNING - Input model name is not in the supported list or unspecified.
2024-05-11 17:37:36,934 - mindformers[mindformers/trainer/base_trainer.py:132] - WARNING - See the list of supported task and model name: OrderedDict([('general', OrderedDict([('common', '/workspace/training-framework/mindformers-1.0/configs/general/run_general_task.yaml')])), ('masked_image_modeling', OrderedDict([('mae_vit_base_p16', '/workspace/training-framework/mindformers-1.0/configs/mae/run_mae_vit_base_p16_224_800ep.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/mae/run_mae_vit_base_p16_224_800ep.yaml')])), ('image_classification', OrderedDict([('vit_base_p16', '/workspace/training-framework/mindformers-1.0/configs/vit/run_vit_base_p16_224_100ep.yaml'), ('swin_base_p4w7', '/workspace/training-framework/mindformers-1.0/configs/swin/run_swin_base_p4w7_224_100ep.yaml'), ('mindspore/vit_base_p16', '/workspace/training-framework/mindformers-1.0/configs/vit/run_vit_base_p16_224_100ep.yaml'), ('mindspore/swin_base_p4w7', '/workspace/training-framework/mindformers-1.0/configs/swin/run_swin_base_p4w7_224_100ep.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/vit/run_vit_base_p16_224_100ep.yaml')])), ('fill_mask', OrderedDict([('bert_base_uncased', '/workspace/training-framework/mindformers-1.0/configs/bert/run_bert_base_uncased.yaml'), ('bert_tiny_uncased', '/workspace/training-framework/mindformers-1.0/configs/bert/run_bert_tiny_uncased.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/bert/run_bert_tiny_uncased.yaml')])), ('contrastive_language_image_pretrain', OrderedDict([('clip_vit_b_32', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_b_32_pretrain_flickr8k.yaml'), ('blip2_stage1_vit_g', '/workspace/training-framework/mindformers-1.0/configs/blip2/run_blip2_stage1_vit_g_qformer_pretrain.yaml'), ('blip2_stage2_vit_g_baichuan_7b', '/workspace/training-framework/mindformers-1.0/configs/blip2/run_blip2_stage2_vit_g_baichuan_7b.yaml'), ('blip2_stage2_vit_g_llama_7b', '/workspace/training-framework/mindformers-1.0/configs/blip2/run_blip2_stage2_vit_g_llama_7b.yaml'), ('mindspore/clip_vit_b_32', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_b_32_pretrain_flickr8k.yaml'), ('clip_vit_b_16', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_b_16_pretrain_flickr8k.yaml'), ('clip_vit_l_14', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_l_14_pretrain_flickr8k.yaml'), ('clip_vit_l_14@336', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_l_14@336_pretrain_flickr8k.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_b_32_pretrain_flickr8k.yaml')])), ('image_to_text_retrieval', OrderedDict([('blip2_stage1_evaluator', '/workspace/training-framework/mindformers-1.0/configs/blip2/run_blip2_stage1_vit_g_retrieval_flickr30k.yaml')])), ('zero_shot_image_classification', OrderedDict([('clip_vit_b_32', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_b_32_zero_shot_image_classification_cifar100.yaml'), ('mindspore/clip_vit_b_32', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_b_32_zero_shot_image_classification_cifar100.yaml'), ('clip_vit_b_16', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_b_16_zero_shot_image_classification_cifar100.yaml'), ('clip_vit_l_14', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_l_14_zero_shot_image_classification_cifar100.yaml'), 
('clip_vit_l_14@336', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_l_14@336_zero_shot_image_classification_cifar100.yaml'), ('blip2_stage1_classification', '/workspace/training-framework/mindformers-1.0/configs/blip2/run_blip2_stage1_vit_g_zero_shot_image_classification_cifar100.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/clip/run_clip_vit_b_32_zero_shot_image_classification_cifar100.yaml')])), ('image_to_text_generation', OrderedDict([('itt_blip2_stage2_vit_g_baichuan_7b', '/workspace/training-framework/mindformers-1.0/configs/blip2/run_blip2_stage2_vit_g_baichuan_7b_image_to_text_generation.yaml'), ('itt_blip2_stage2_vit_g_llama_7b', '/workspace/training-framework/mindformers-1.0/configs/blip2/run_blip2_stage2_vit_g_llama_7b_image_to_text_generation.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/clip/run_blip2_stage2_vit_g_llama_7b_image_to_text_generation.yaml')])), ('translation', OrderedDict([('t5_small', '/workspace/training-framework/mindformers-1.0/configs/t5/run_t5_small_on_wmt16.yaml'), ('t5_tiny', '/workspace/training-framework/mindformers-1.0/configs/t5/run_t5_tiny_on_wmt16.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/t5/run_t5_small_on_wmt16.yaml')])), ('text_classification', OrderedDict([('txtcls_bert_base_uncased', '/workspace/training-framework/mindformers-1.0/configs/txtcls/run_txtcls_bert_base_uncased.yaml'), ('txtcls_bert_base_uncased_mnli', '/workspace/training-framework/mindformers-1.0/configs/txtcls/run_txtcls_bert_base_uncased_mnli.yaml'), ('mindspore/txtcls_bert_base_uncased_mnli', '/workspace/training-framework/mindformers-1.0/configs/txtcls/run_txtcls_bert_base_uncased_mnli.yaml'), ('gpt2_txtcls', '/workspace/training-framework/mindformers-1.0/configs/gpt2/run_gpt2_txtcls.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/txtcls/run_txtcls_bert_base_uncased.yaml')])), ('token_classification', OrderedDict([('tokcls_bert_base_chinese', '/workspace/training-framework/mindformers-1.0/configs/tokcls/run_tokcls_bert_base_chinese.yaml'), ('tokcls_bert_base_chinese_cluener', '/workspace/training-framework/mindformers-1.0/configs/tokcls/run_tokcls_bert_base_chinese_cluener.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/tokcls/run_tokcls_bert_base_chinese.yaml')])), ('question_answering', OrderedDict([('qa_bert_base_uncased', '/workspace/training-framework/mindformers-1.0/configs/qa/run_qa_bert_base_uncased.yaml'), ('qa_bert_base_uncased_squad', '/workspace/training-framework/mindformers-1.0/configs/qa/run_qa_bert_base_uncased.yaml'), ('mindspore/qa_bert_base_uncased', '/workspace/training-framework/mindformers-1.0/configs/qa/run_qa_bert_base_uncased.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/qa/run_qa_bert_base_uncased.yaml')])), ('text_generation', OrderedDict([('gpt2', '/workspace/training-framework/mindformers-1.0/configs/gpt2/run_gpt2.yaml'), ('gpt2_lora', '/workspace/training-framework/mindformers-1.0/configs/gpt2/run_gpt2_lora.yaml'), ('gpt2_13b', '/workspace/training-framework/mindformers-1.0/configs/gpt2/run_gpt2_13b.yaml'), ('gpt2_52b', '/workspace/training-framework/mindformers-1.0/configs/gpt2/run_gpt2_52b.yaml'), ('gpt2_xl', '/workspace/training-framework/mindformers-1.0/configs/gpt2/run_gpt2_xl.yaml'), ('gpt2_xl_lora', '/workspace/training-framework/mindformers-1.0/configs/gpt2/run_gpt2_xl_lora.yaml'), ('llama_7b', 
'/workspace/training-framework/mindformers-1.0/configs/llama/run_llama_7b.yaml'), ('llama_13b', '/workspace/training-framework/mindformers-1.0/configs/llama/run_llama_13b.yaml'), ('llama_65b', '/workspace/training-framework/mindformers-1.0/configs/llama/run_llama_65b.yaml'), ('llama2_7b', '/workspace/training-framework/mindformers-1.0/configs/llama2/run_llama2_7b.yaml'), ('llama2_13b', '/workspace/training-framework/mindformers-1.0/configs/llama2/run_llama2_13b.yaml'), ('llama2_70b', '/workspace/training-framework/mindformers-1.0/configs/llama2/run_llama2_70b.yaml'), ('codellama_34b', '/workspace/training-framework/mindformers-1.0/configs/codellama/run_codellama_34b_910b.yaml'), ('llama_7b_lora', '/workspace/training-framework/mindformers-1.0/configs/llama/run_llama_7b_lora.yaml'), ('pangualpha_2_6b', '/workspace/training-framework/mindformers-1.0/configs/pangualpha/run_pangualpha_2_6b.yaml'), ('pangualpha_13b', '/workspace/training-framework/mindformers-1.0/configs/pangualpha/run_pangualpha_13b.yaml'), ('glm_6b', '/workspace/training-framework/mindformers-1.0/configs/glm/run_glm_6b_finetune.yaml'), ('glm_6b_chat', '/workspace/training-framework/mindformers-1.0/configs/glm/run_glm_6b_infer.yaml'), ('glm_6b_lora', '/workspace/training-framework/mindformers-1.0/configs/glm/run_glm_6b_lora.yaml'), ('glm_6b_lora_chat', '/workspace/training-framework/mindformers-1.0/configs/glm/run_glm_6b_lora_infer.yaml'), ('glm2_6b', '/workspace/training-framework/mindformers-1.0/configs/glm2/run_glm2_6b.yaml'), ('glm2_6b_lora', '/workspace/training-framework/mindformers-1.0/configs/glm2/run_glm2_6b_lora.yaml'), ('glm2_6b_ptuning2', '/workspace/training-framework/mindformers-1.0/configs/glm2/run_glm2_6b_ptuning2.yaml'), ('glm3_6b', '/workspace/training-framework/mindformers-1.0/configs/glm3/run_glm3_6b.yaml'), ('codegeex2_6b', '/workspace/training-framework/mindformers-1.0/configs/codegeex2/run_codegeex2_6b.yaml'), ('bloom_560m', '/workspace/training-framework/mindformers-1.0/configs/bloom/run_bloom_560m.yaml'), ('bloom_7.1b', '/workspace/training-framework/mindformers-1.0/configs/bloom/run_bloom_7.1b.yaml'), ('bloom_65b', '/workspace/training-framework/mindformers-1.0/configs/bloom/run_bloom_65b.yaml'), ('bloom_176b', '/workspace/training-framework/mindformers-1.0/configs/bloom/run_bloom_176b.yaml'), ('baichuan_7b', '/workspace/training-framework/mindformers-1.0/research/baichuan/run_baichuan_7b.yaml'), ('baichuan2_7b', '/workspace/training-framework/mindformers-1.0/research/baichuan2/run_baichuan2_7b.yaml'), ('baichuan2_13b', '/workspace/training-framework/mindformers-1.0/research/baichuan2/run_baichuan2_13b.yaml'), ('ziya_13b', '/workspace/training-framework/mindformers-1.0/research/ziya/run_ziya_13b.yaml'), ('skywork_13b', '/workspace/training-framework/mindformers-1.0/research/skywork/run_skywork_13b.yaml'), ('internlm_7b', '/workspace/training-framework/mindformers-1.0/research/internlm/run_internlm_7b.yaml'), ('internlm_7b_lora', '/workspace/training-framework/mindformers-1.0/research/internlm/run_internlm_7b_lora.yaml'), ('qwen_7b', '/workspace/training-framework/mindformers-1.0/research/qwen/run_qwen_7b.yaml'), ('qwen_7b_lora', '/workspace/training-framework/mindformers-1.0/research/qwen/run_qwen_7b_lora.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/gpt2/run_gpt2.yaml')])), ('segment_anything', OrderedDict([('sam_vit_b', '/workspace/training-framework/mindformers-1.0/configs/sam/run_sam_vit-b.yaml'), ('sam_vit_l', 
'/workspace/training-framework/mindformers-1.0/configs/sam/run_sam_vit-l.yaml'), ('sam_vit_h', '/workspace/training-framework/mindformers-1.0/configs/sam/run_sam_vit-h.yaml'), ('common', '/workspace/training-framework/mindformers-1.0/configs/sam/run_sam_vit-h.yaml')]))])
2024-05-11 17:37:36,937 - mindformers[mindformers/trainer/base_trainer.py:133] - WARNING - The default model config: /workspace/training-framework/mindformers-1.0/configs/gpt2/run_gpt2.yaml will now be used for the text_generation task
2024-05-11 17:37:36,939 - mindformers[mindformers/core/parallel_config.py:45] - INFO - initial recompute_config from dict: {'recompute': True, 'select_recompute': False, 'parallel_optimizer_comm_recompute': False, 'mp_comm_recompute': True, 'recompute_slice_activation': True}
2024-05-11 17:37:36,941 - mindformers[mindformers/core/parallel_config.py:51] - INFO - initial parallel_config from dict: {'data_parallel': 8, 'model_parallel': 1, 'pipeline_stage': 1, 'micro_batch_num': 1, 'vocab_emb_dp': True, 'gradient_aggregation_group': 4}
2024-05-11 17:37:36,943 - mindformers[mindformers/trainer/base_trainer.py:233] - INFO - The current parallel mode is data_parallel, batch size per card will not be changed: batch_size_per_card = 2
2024-05-11 17:37:36,944 - mindformers[mindformers/trainer/base_trainer.py:237] - INFO - global_batch_size = batch_size_per_card * device_num * gradient_accumulation_steps = 16 = 2 * 8 * 1
2024-05-11 17:37:36,946 - mindformers[mindformers/trainer/base_trainer.py:246] - INFO - parallel_config will be change to default config: [ParallelConfig]
_recompute:[ParallelConfig]
_recompute:True
_select_recompute:False
_parallel_optimizer_comm_recompute:False
_mp_comm_recompute:True
_recompute_slice_activation:True
select_recompute:False
use_seq_parallel:False
_gradient_aggregation_group:4
_embed_dp_mp_config:[ParallelConfig]
_dp_mp_config:[ParallelConfig]
_data_parallel:1
_model_parallel:1
use_seq_parallel:False
select_recompute:False
_vocab_emb_dp:True
use_seq_parallel:False
select_recompute:False
_pp_config:[ParallelConfig]
_pipeline_stage:1
_micro_batch_num:1
_moe_config:[ParallelConfig]
_dpmp:[ParallelConfig]
_data_parallel:1
_model_parallel:1
use_seq_parallel:False
select_recompute:False
_expert_parallel:1
use_seq_parallel:False
select_recompute:False
.
2024-05-11 17:37:36,951 - mindformers[mindformers/trainer/base_trainer.py:629] - INFO - .........Build Dataset For Train..........
2024-05-11 17:37:36,953 - mindformers[mindformers/trainer/base_trainer.py:353] - INFO - .........Build Dataset From Config..........
2024-05-11 17:37:36,955 - mindformers[mindformers/tools/register/register.py:160] - INFO - get_instance_from_cfg.cfg={'type': 'CausalLanguageModelDataset', 'dataset_config': {'data_loader': {'type': 'MindDataset', 'dataset_dir': '/workspace/dataset/baichuan2/train', 'shuffle': True}, 'tokenizer': {'type': 'Baichuan2Tokenizer', 'vocab_file': '../../model/baichuan2/tokenizer.model'}, 'input_columns': ['input_ids', 'labels'], 'num_parallel_workers': 8, 'python_multiprocessing': False, 'drop_remainder': True, 'repeat': 1, 'numa_enable': False, 'prefetch_size': 1, 'do_eval': False, 'seed': 0, 'auto_tune': False, 'filepath_prefix': './autotune', 'autotune_per_step': 10, 'profile': False, 'batch_size': 2}}
2024-05-11 17:37:36,957 - mindformers[mindformers/dataset/causal_language_model_dataset.py:166] - INFO - Now Create Causal Language Model Dataset.
2024-05-11 17:37:36,960 - mindformers[mindformers/tools/register/register.py:160] - INFO - get_instance_from_cfg.cfg={'type': 'MindDataset', 'shuffle': True}
2024-05-11 17:37:36,969 - mindformers[mindformers/trainer/utils.py:149] - INFO - Will be Training epochs:1, sink_size:4
2024-05-11 17:37:36,971 - mindformers[mindformers/trainer/utils.py:151] - INFO - Create training dataset finish, dataset size:625
2024-05-11 17:37:36,973 - mindformers[mindformers/tools/check_rules.py:122] - WARNING - full_batch could only be used under semi_auto_parallel or auto_parallel, but get data_parallel, full_batch has been forced to False
2024-05-11 17:37:36,975 - mindformers[mindformers/trainer/base_trainer.py:661] - INFO - .........Build Net For Train..........
2024-05-11 17:37:36,977 - mindformers[mindformers/trainer/base_trainer.py:388] - INFO - .........Build Network From Config..........
2024-05-11 17:37:36,978 - mindformers[mindformers/tools/register/register.py:160] - INFO - get_instance_from_cfg.cfg={'type': 'LlamaConfig', 'batch_size': 1, 'seq_length': 512, 'hidden_size': 4096, 'num_layers': 32, 'num_heads': 32, 'vocab_size': 125696, 'multiple_of': 256, 'rms_norm_eps': 1e-06, 'bos_token_id': 1, 'eos_token_id': 2, 'pad_token_id': 0, 'ignore_token_id': -100, 'user_token_id': 195, 'assistant_token_id': 196, 'compute_dtype': 'float16', 'layernorm_compute_type': 'float32', 'softmax_compute_type': 'float32', 'rotary_dtype': 'float32', 'param_init_type': 'float16', 'use_past': False, 'compute_in_2d': True, 'use_flash_attention': False, 'offset': 0, 'checkpoint_name_or_path': None, 'repetition_penalty': 1.05, 'temperature': 1.0, 'max_decode_length': 512, 'top_k': 5, 'top_p': 0.85, 'do_sample': False, 'pet_config': {'pet_type': 'lora', 'lora_rank': 8, 'lora_alpha': 32, 'lora_dropout': 0.1, 'target_modules': '.*query_key_value*'}}
2024-05-11 17:37:36,980 - mindformers[mindformers/models/llama/llama_config.py:184] - WARNING - Argument `compute_in_2d` is deprecated.
2024-05-11 17:37:36,981 - mindformers[mindformers/tools/register/register.py:160] - INFO - get_instance_from_cfg.cfg={'type': 'Baichuan7BV2ForCausalLM'}
2024-05-11 17:37:36,983 - mindformers[mindformers/version_control.py:60] - INFO - The Cell Reuse compilation acceleration feature is not supported when the environment variable ENABLE_CELL_REUSE is 0 or MindSpore version is earlier than 2.1.0 or stand_alone mode or pipeline_stages <= 1
2024-05-11 17:37:36,985 - mindformers[mindformers/version_control.py:64] - INFO -
The current ENABLE_CELL_REUSE=0, please set the environment variable as follows:
export ENABLE_CELL_REUSE=1 to enable the Cell Reuse compilation acceleration feature.
2024-05-11 17:37:36,986 - mindformers[mindformers/version_control.py:73] - INFO - The Cell Reuse compilation acceleration feature only works in pipeline parallel mode(pipeline_stage>1).Current pipeline stage=1, the feature is disabled by default.
[WARNING] ME(17565:281472873574848,MainProcess):2024-05-11-17:37:36.997.741 [mindspore/ops/primitive.py:228] The in_strategy of the operator in your network will not take effect in data_parallel mode. This means the the shard function called in the network is ignored.
If you want to enable it, please use semi auto or auto parallel mode by context.set_auto_parallel_context(parallel_mode=ParallelMode.SEMI_AUTO_PARALLEL or context.set_auto_parallel_context(parallel_mode=ParallelMode.AUTO_PARALLEL)
2024-05-11 17:37:45,608 - mindformers[mindformers/modules/layers.py:554] - WARNING - The user passed the custom defined activation function True. If the user want to enable shard for the activation cell, the user should set the shard for each primitives in the cell.
[WARNING] ME(17565:281472873574848,MainProcess):2024-05-11-17:37:45.613.373 [mindspore/common/parameter.py:786] This interface may be deleted in the future.
2024-05-11 17:37:48,173 - mindformers[mindformers/modules/layers.py:554] - WARNING - The user passed the custom defined activation function True. If the user want to enable shard for the activation cell, the user should set the shard for each primitives in the cell.
2024-05-11 17:37:50,832 - mindformers[mindformers/modules/layers.py:554] - WARNING - The user passed the custom defined activation function True. If the user want to enable shard for the activation cell, the user should set the shard for each primitives in the cell.
2024-05-11 17:37:53,419 - mindformers[mindformers/modules/layers.py:554] - WARNING - The user passed the custom defined activation function True. If the user want to enable shard for the activation cell, the user should set the shard for each primitives in the cell.
2024-05-11 17:37:55,974 - mindformers[mindformers/modules/layers.py:554] - WARNING - The user passed the custom defined activation function True. If the user want to enable shard for the activation cell, the user should set the shard for each primitives in the cell.
2024-05-11 17:37:58,651 - mindformers[mindformers/modules/layers.py:554] - WARNING - The user passed the custom defined activation function True. If the user want to enable shard for the activation cell, the user should set the shard for each primitives in the cell.
2024-05-11 17:38:01,326 - mindformers[mindformers/modules/layers.py:554] - WARNING - The user passed the custom defined activation function True. If the user want to enable shard for the activation cell, the user should set the shard for each primitives in the cell.
2024-05-11 17:38:04,069 - mindformers[mindformers/modules/layers.py:554] - WARNING - The user passed the custom defined activation function True. If the user want to enable shard for the activation cell, the user should set the shard for each primitives in the cell.
2024-05-11 17:38:06,745 - mindformers[mindformers/modules/layers.py:554] - WARNING - The user passed the custom defined activation function True. If the user want to enable shard for the activation cell, the user should set the shard for each primitives in the cell.
2024-05-11 17:38:09,424 - mindformers[mindformers/modules/layers.py:554] - WARNING - The user passed the custom defined activation function True. If the user want to enable shard for the activation cell, the user should set the shard for each primitives in the cell.
2024-05-11 17:39:15,593 - mindformers[mindformers/models/base_model.py:117] - INFO - model built, but weights is unloaded, since the config has no checkpoint_name_or_path attribute or checkpoint_name_or_path is None.
2024-05-11 17:39:15,611 - mindformers[mindformers/models/base_model.py:117] - INFO - model built, but weights is unloaded, since the config has no checkpoint_name_or_path attribute or checkpoint_name_or_path is None.
[INFO] 2024-05-11 17:39:15,616 [17565] [SDK] : Start to freeze model for delta, mode: lora, include list: None, exclude list: None
[INFO] 2024-05-11 17:39:15,616 [17565] [SDK] : Start to freeze model, include list: ['*'], exclude list: ['*mindpet_delta_lora*']
[INFO] 2024-05-11 17:39:15,625 [17565] [SDK] : End to freeze model.
[INFO] 2024-05-11 17:39:15,625 [17565] [SDK] : End to freeze model for delta.
2024-05-11 17:39:15,633 - mindformers[mindformers/trainer/base_trainer.py:540] - INFO - Network Parameters: 0 M.
2024-05-11 17:39:15,635 - mindformers[mindformers/trainer/base_trainer.py:686] - INFO - .........Build Optimizer For Train..........
2024-05-11 17:39:15,637 - mindformers[mindformers/trainer/base_trainer.py:435] - INFO - .........Build Optimizer From Config..........
2024-05-11 17:39:15,639 - mindformers[mindformers/trainer/base_trainer.py:469] - INFO - .........Build LR Schedule From Config..........
2024-05-11 17:39:15,640 - mindformers[mindformers/tools/register/register.py:160] - INFO - get_instance_from_cfg.cfg={'type': 'CosineWithWarmUpLR', 'learning_rate': 5e-05, 'lr_end': 2e-06, 'total_steps': 624, 'warmup_steps': 0}
2024-05-11 17:39:15,646 - mindformers[mindformers/trainer/optimizer_grouped_parameters.py:74] - WARNING - dynamic_lr_schedule will be reset and invalid when layer_scale is False.
2024-05-11 17:39:15,650 - mindformers[mindformers/trainer/optimizer_grouped_parameters.py:113] - INFO - Param groups = {}
2024-05-11 17:39:15,652 - mindformers[mindformers/trainer/base_trainer.py:451] - INFO - .........Build Optimizer From Config config.optimizer={'type': 'FP32StateAdamWeightDecay', 'beta1': 0.9, 'beta2': 0.98, 'eps': 1e-08}..........
2024-05-11 17:39:15,653 - mindformers[mindformers/tools/register/register.py:160] - INFO - get_instance_from_cfg.cfg={'type': 'FP32StateAdamWeightDecay', 'beta1': 0.9, 'beta2': 0.98, 'eps': 1e-08}
2024-05-11 17:39:15,659 - mindformers[mindformers/tools/cloud_adapter/cloud_monitor.py:43] - ERROR - Traceback (most recent call last):
File "/workspace/training-framework/mindformers-1.0/mindformers/tools/register/register.py", line 193, in get_instance_from_cfg
return obj_cls(**args)
File "/workspace/training-framework/mindformers-1.0/mindformers/core/optim/optim.py", line 437, in __init__
super(nn.AdamWeightDecay, self).__init__(learning_rate, params, weight_decay)
File "/root/miniconda3/envs/mindspore2.2.11_py39/lib/python3.9/site-packages/mindspore/nn/optim/optimizer.py", line 199, in __init__
parameters = self._parameters_base_check(parameters, "parameters")
File "/root/miniconda3/envs/mindspore2.2.11_py39/lib/python3.9/site-packages/mindspore/nn/optim/optimizer.py", line 400, in _parameters_base_check
raise ValueError(f"For 'Optimizer', the argument {param_info} must not be empty.")
ValueError: For 'Optimizer', the argument parameters must not be empty.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/workspace/training-framework/mindformers-1.0/mindformers/tools/cloud_adapter/cloud_monitor.py", line 34, in wrapper
result = run_func(*args, **kwargs)
File "/workspace/training-framework/mindformers-1.0/research/baichuan2/run_baichuan2.py", line 279, in main
trainer.finetune(finetune_checkpoint=ckpt, auto_trans_ckpt=config.auto_trans_ckpt, resume_training=resume)
File "/root/miniconda3/envs/mindspore2.2.11_py39/lib/python3.9/site-packages/mindspore/_checkparam.py", line 1313, in wrapper
return func(*args, **kwargs)
File "/workspace/training-framework/mindformers-1.0/mindformers/trainer/trainer.py", line 485, in finetune
self.trainer.train(
File "/workspace/training-framework/mindformers-1.0/mindformers/trainer/causal_language_modeling/causal_language_modeling.py", line 99, in train
self.training_process(
File "/workspace/training-framework/mindformers-1.0/mindformers/trainer/base_trainer.py", line 688, in training_process
optimizer = self.create_optimizer_scheduler(network, layer_scale=config.layer_scale)
File "/workspace/training-framework/mindformers-1.0/mindformers/trainer/base_trainer.py", line 452, in create_optimizer_scheduler
self.optimizer = build_optim(
File "/workspace/training-framework/mindformers-1.0/mindformers/core/optim/build_optim.py", line 67, in build_optim
return MindFormerRegister.get_instance_from_cfg(
File "/workspace/training-framework/mindformers-1.0/mindformers/tools/register/register.py", line 195, in get_instance_from_cfg
raise type(e)('{}: {}'.format(obj_cls.__name__, e))
ValueError: FP32StateAdamWeightDecay: For 'Optimizer', the argument parameters must not be empty.
Traceback (most recent call last):
File "/workspace/training-framework/mindformers-1.0/mindformers/tools/register/register.py", line 193, in get_instance_from_cfg
return obj_cls(**args)
File "/workspace/training-framework/mindformers-1.0/mindformers/core/optim/optim.py", line 437, in __init__
super(nn.AdamWeightDecay, self).__init__(learning_rate, params, weight_decay)
File "/root/miniconda3/envs/mindspore2.2.11_py39/lib/python3.9/site-packages/mindspore/nn/optim/optimizer.py", line 199, in __init__
parameters = self._parameters_base_check(parameters, "parameters")
File "/root/miniconda3/envs/mindspore2.2.11_py39/lib/python3.9/site-packages/mindspore/nn/optim/optimizer.py", line 400, in _parameters_base_check
raise ValueError(f"For 'Optimizer', the argument {param_info} must not be empty.")
ValueError: For 'Optimizer', the argument parameters must not be empty.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/workspace/training-framework/mindformers-1.0/research/baichuan2/run_baichuan2.py", line 347, in <module>
main(task=args.task,
File "/workspace/training-framework/mindformers-1.0/mindformers/tools/cloud_adapter/cloud_monitor.py", line 44, in wrapper
raise exc
File "/workspace/training-framework/mindformers-1.0/mindformers/tools/cloud_adapter/cloud_monitor.py", line 34, in wrapper
result = run_func(*args, **kwargs)
File "/workspace/training-framework/mindformers-1.0/research/baichuan2/run_baichuan2.py", line 279, in main
trainer.finetune(finetune_checkpoint=ckpt, auto_trans_ckpt=config.auto_trans_ckpt, resume_training=resume)
File "/root/miniconda3/envs/mindspore2.2.11_py39/lib/python3.9/site-packages/mindspore/_checkparam.py", line 1313, in wrapper
return func(*args, **kwargs)
File "/workspace/training-framework/mindformers-1.0/mindformers/trainer/trainer.py", line 485, in finetune
self.trainer.train(
File "/workspace/training-framework/mindformers-1.0/mindformers/trainer/causal_language_modeling/causal_language_modeling.py", line 99, in train
self.training_process(
File "/workspace/training-framework/mindformers-1.0/mindformers/trainer/base_trainer.py", line 688, in training_process
optimizer = self.create_optimizer_scheduler(network, layer_scale=config.layer_scale)
File "/workspace/training-framework/mindformers-1.0/mindformers/trainer/base_trainer.py", line 452, in create_optimizer_scheduler
self.optimizer = build_optim(
File "/workspace/training-framework/mindformers-1.0/mindformers/core/optim/build_optim.py", line 67, in build_optim
return MindFormerRegister.get_instance_from_cfg(
File "/workspace/training-framework/mindformers-1.0/mindformers/tools/register/register.py", line 195, in get_instance_from_cfg
raise type(e)('{}: {}'.format(obj_cls.__name__, e))
ValueError: FP32StateAdamWeightDecay: For 'Optimizer', the argument parameters must not be empty.
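Reading the log, the failure chain appears to be: `pet_config` sets `target_modules` to `'.*query_key_value*'`, the MindPet freeze step then freezes everything except `'*mindpet_delta_lora*'`, and the trainer reports "Network Parameters: 0 M.", i.e. no trainable parameters remain. The optimizer is then built from an empty parameter list, which is exactly what MindSpore's `Optimizer` rejects. A minimal sketch reproducing the optimizer error under that assumption (`nn.Dense` here is only a stand-in network, not the actual Baichuan2 model):

```python
# Minimal sketch: reproduce "For 'Optimizer', the argument parameters
# must not be empty" with a fully frozen network. nn.Dense is a
# stand-in; the real run freezes the Baichuan2 LoRA network.
import mindspore.nn as nn

net = nn.Dense(4, 4)

# Freeze every parameter, mimicking what the MindPet freeze step does
# when the LoRA target_modules regex matches no layer in the model:
for p in net.trainable_params():
    p.requires_grad = False

print(len(net.trainable_params()))  # 0 -> "Network Parameters: 0 M."

# Building the optimizer from the empty list raises the reported error:
opt = nn.AdamWeightDecay(net.trainable_params(), learning_rate=5e-5)
# ValueError: For 'Optimizer', the argument parameters must not be empty.
```

If that reading is right, the regex is the thing to check: `'.*query_key_value*'` targets GPT-style fused QKV layers, while Baichuan2-7B in mindformers is LLaMA-based, whose attention projections are typically named `wq`/`wk`/`wv`, so a target such as `'.*wq|.*wk|.*wv'` may be what the LoRA yaml intends. Worth verifying against the actual parameter names before relaunching.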
Please assign a maintainer to check this issue.
@fangwenyi @chengxiaoli @Shawny
Thank you for your question. You can comment //mindspore-assistant to get help faster:

Hi, we suggest moving to the mindformers issue tracker for more support: https://gitee.com/mindspore/mindformers/issues

Hi, since this issue has received no reply, we will close it later. If you still have questions, please share the specific details and set the issue status to WIP, and we will follow up. Thanks.