Flash attention is an optional way to accelerate training and inference. Only NVIDIA GPUs with the Turing, Ampere, Ada, or Hopper architectures (e.g., H100, A100, RTX 3090, T4, RTX 2080) support flash attention. You can use our models without installing it.
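Since flash attention is optional, a quick way to see whether it will be used is to check that the package is importable. The sketch below only tests for the `flash_attn` package (the import name used by the flash-attn PyPI distribution, an assumption here); it does not verify the GPU architecture:

```python
import importlib.util

def flash_attn_available() -> bool:
    # True if the optional flash-attn package is installed;
    # otherwise the model falls back to standard attention.
    return importlib.util.find_spec("flash_attn") is not None

print(flash_attn_available())
```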
For `transformers`, version 4.32.0 is preferred.
Please check if you have updated the code to the latest, and correctly downloaded all the sharded checkpoint files.
`qwen.tiktoken` is not found. What is it? This is the merge file of the tokenizer. You have to download it. Note that if you just `git clone` the repo without `git-lfs`, you cannot download this file.
Run the command `pip install -r requirements.txt`. You can find the file at https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt.
Yes, see `web_demo.py` for the web demo and `cli_demo.py` for the CLI demo. See the README for more information.
Yes, running `python cli_demo.py --cpu-only` will load the model and run inference on CPU only.
Yes. See the function `chat_stream` in `modeling_qwen.py`.
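As a rough sketch of how streamed output is typically consumed, assuming the streaming function yields the growing partial response at each step (a dummy generator stands in for the actual model call here):

```python
def chat_stream(prompt):
    # Dummy stand-in for the model's streaming chat call: yields the
    # response generated so far, growing one character per step.
    response = "Hello! How can I help?"
    for i in range(1, len(response) + 1):
        yield response[:i]

def consume_stream(prompt):
    # Print only the newly generated suffix of each partial response.
    printed = ""
    for partial in chat_stream(prompt):
        print(partial[len(printed):], end="", flush=True)
        printed = partial
    print()
    return printed

consume_stream("hi")
```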
This is because tokens represent bytes, so a single token may decode to a meaningless string. We have updated the default setting of our tokenizer to avoid such decoding results. Please update the code to the latest version.
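To illustrate the underlying issue with a generic byte-level example (not our tokenizer's actual token boundaries): slicing a UTF-8 byte sequence in the middle of a multi-byte character yields bytes that do not decode to readable text:

```python
text = "你好"               # two Chinese characters, 3 UTF-8 bytes each
raw = text.encode("utf-8")  # 6 bytes total

# A byte-level token may cover only part of a multi-byte character:
fragment = raw[:2]          # incomplete UTF-8 sequence
print(fragment.decode("utf-8", errors="replace"))  # replacement character, not "你"
```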
Please check if you are loading Qwen-Chat instead of Qwen. Qwen is the base model without alignment, which behaves differently from the SFT/Chat model.
Yes, quantization is supported via AutoGPTQ.
Updating the code to the latest version can help.
Please ensure that NTK is applied. `use_dynamic_ntk` and `use_logn_attn` in `config.json` should be set to `true` (`true` by default).
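For reference, the relevant fragment of `config.json` would look like this (other fields omitted):

```json
{
  "use_dynamic_ntk": true,
  "use_logn_attn": true
}
```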
Yes, we now support SFT, including full-parameter finetuning, LoRA, and Q-LoRA. You can also check other projects like FastChat, Firefly, LLaMA Efficient Tuning, etc.
However, we do not support RLHF for now. We will provide the code in the near future.
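For intuition on LoRA: the pretrained weight W stays frozen, and only a low-rank pair (A, B) is trained, so the adapted layer computes y = xW + s·(xA)B. A minimal pure-Python sketch of that forward pass (illustrative only, not our training code):

```python
def matmul(X, Y):
    # Naive matrix multiply for small lists-of-lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, scale=1.0):
    # y = x @ W + scale * (x @ A) @ B
    # W is frozen; A (d_in x r) and B (r x d_out) are the trainable low-rank pair.
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    return [[b + scale * d for b, d in zip(rb, rd)] for rb, rd in zip(base, delta)]

# Tiny example: identity W with a rank-1 update.
print(lora_forward([[1.0, 2.0]],
                   [[1.0, 0.0], [0.0, 1.0]],   # W (frozen)
                   [[1.0], [0.0]],             # A
                   [[0.0, 1.0]]))              # B
```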
In our training, we only use `<|endoftext|>` as the separator and padding token. You can set `bos_id`, `eos_id`, and `pad_id` to `tokenizer.eod_id`. Learn more about our tokenizer from our documents about the tokenizer.
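In finetuning code this might look as follows (a `SimpleNamespace` stands in for the real tokenizer here, and the id value 151643 is only illustrative):

```python
from types import SimpleNamespace

# Stand-in for the tokenizer; eod_id is the id of <|endoftext|>
# (151643 here is an illustrative value, not authoritative).
tokenizer = SimpleNamespace(eod_id=151643)

# Use <|endoftext|> as separator and padding token alike:
bos_id = eos_id = pad_id = tokenizer.eod_id
print(bos_id, eos_id, pad_id)
```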
When downloading our official Docker image, you may experience slow download speeds due to network issues. You can refer to Alibaba Cloud Container Image Service to accelerate the download of official images.