🐦 Follow me on X • 🤗 Hugging Face • 💻 Blog • 📙 Hands-on GNN
The LLM course is divided into three parts:
For an interactive version of this course, I created two LLM assistants that will answer questions and test your knowledge in a personalized way:
A list of notebooks and articles related to large language models.
Notebook | Description
---|---
🧐 LLM AutoEval | Automatically evaluate your LLMs using RunPod.
🥱 LazyMergekit | Easily merge models using MergeKit in one click.
🦎 LazyAxolotl | Fine-tune models in the cloud using Axolotl in one click.
⚡ AutoQuant | Quantize LLMs in GGUF, GPTQ, EXL2, AWQ, and HQQ formats in one click.
🌳 Model Family Tree | Visualize the family tree of merged models.
🚀 ZeroSpace | Automatically create a Gradio chat interface using a free ZeroGPU.
Notebook | Description | Article
---|---|---
Fine-tune Llama 2 with SFT | Step-by-step guide to supervised fine-tuning of Llama 2 in Google Colab. | Article
Fine-tune CodeLlama using Axolotl | End-to-end guide to the state-of-the-art tool for fine-tuning. | Article
Fine-tune Mistral-7b with SFT | Supervised fine-tuning of Mistral-7b in a free-tier Google Colab with TRL. | Article
Fine-tune Mistral-7b with DPO | Boost the performance of supervised fine-tuned models with DPO. | Article
Fine-tune Llama 3 with ORPO | Cheaper and faster fine-tuning in a single stage with ORPO. | Article
Notebook | Description | Article
---|---|---
1. Introduction to Quantization | Large language model optimization using 8-bit quantization. | Article
2. 4-bit Quantization using GPTQ | Quantize your own open-source LLMs to run them on consumer hardware. | Article
3. Quantization with GGUF and llama.cpp | Quantize Llama 2 models with llama.cpp and upload GGUF versions to the HF Hub. | Article
4. ExLlamaV2: The Fastest Library to Run LLMs | Quantize and run EXL2 models and upload them to the HF Hub. | Article
Notebook | Description | Article
---|---|---
Decoding Strategies in Large Language Models | A guide to text generation from beam search to nucleus sampling. | Article
Improve ChatGPT with Knowledge Graphs | Augment ChatGPT's answers with knowledge graphs. | Article
Merge LLMs with MergeKit | Create your own models easily, no GPU required! | Article
Create MoEs with MergeKit | Combine multiple experts into a single frankenMoE. | Article
This section introduces essential knowledge about mathematics, Python, and neural networks. You might not want to start here, but refer to it as needed.
Before mastering machine learning, it is important to understand the fundamental mathematical concepts that power these algorithms.
📚 Resources:
Python is a powerful and flexible programming language that's particularly good for machine learning, thanks to its readability, consistency, and robust ecosystem of data science libraries.
📚 Resources:
Neural networks are a fundamental part of many machine learning models, particularly in the realm of deep learning. To utilize them effectively, a comprehensive understanding of their design and mechanics is essential.
📚 Resources:
NLP is a fascinating branch of artificial intelligence that bridges the gap between human language and machine understanding. From simple text processing to understanding linguistic nuances, NLP plays a crucial role in many applications like translation, sentiment analysis, chatbots, and much more.
📚 Resources:
This section of the course focuses on learning how to build the best possible LLMs using the latest techniques.
While an in-depth knowledge of the Transformer architecture is not required, it is important to have a good understanding of its inputs (tokens) and outputs (logits). The vanilla attention mechanism is another crucial component to master, as improved versions of it are introduced later on.
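To make the attention mechanism concrete, here is a toy sketch of scaled dot-product attention in plain Python (an illustration of the math, not a real implementation; the example vectors are invented):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Scores each key against the query, normalizes the scores with softmax,
    and returns the resulting weighted average of the value vectors.
    """
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    d_v = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d_v)]

# Toy example: the query matches the first key, so the output
# is pulled toward the first value vector.
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
```

Real models run this in parallel over many heads and whole batches of token sequences, but the per-head computation is exactly this weighted average.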
📚 References:
While it's easy to find raw data from Wikipedia and other websites, it's difficult to collect pairs of instructions and answers in the wild. Like in traditional machine learning, the quality of the dataset will directly influence the quality of the model, which is why it might be the most important component in the fine-tuning process.
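A minimal sketch of what such a dataset looks like on disk, using the common `{"instruction", "output"}` convention (the example pairs are invented for illustration):

```python
import json

# Hypothetical instruction-answer pairs in the widely used Alpaca-style format.
pairs = [
    {"instruction": "Summarize the water cycle in one sentence.",
     "output": "Water evaporates, condenses into clouds, and returns as precipitation."},
    {"instruction": "Translate 'good morning' to French.",
     "output": "Bonjour."},
]

def to_jsonl(records):
    """Serialize instruction-answer pairs to JSONL, one example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

def validate(records):
    """Basic quality gate: every pair needs a non-empty instruction and output."""
    return all(r.get("instruction", "").strip() and r.get("output", "").strip()
               for r in records)

jsonl = to_jsonl(pairs)
```

Simple filters like `validate` are only a starting point; real dataset curation also involves deduplication, decontamination, and quality scoring.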
📚 References:
Pre-training is a very long and costly process, which is why this is not the focus of this course. It's good to have some level of understanding of what happens during pre-training, but hands-on experience is not required.
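At its core, pre-training minimizes cross-entropy on next-token prediction. A toy sketch of the objective (the vocabulary and probabilities below are invented for illustration):

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target token under the model's distribution."""
    return -math.log(probs[target_index])

# Toy vocabulary and a hypothetical post-softmax distribution for the next
# token of "the cat ___".
vocab = ["sat", "ran", "blue"]
predicted = [0.7, 0.2, 0.1]
loss = cross_entropy(predicted, vocab.index("sat"))
```

The loss is low when the model assigns high probability to the actual next token; averaged over trillions of tokens, this single objective is what pre-training optimizes.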
📚 References:
Pre-trained models are only trained on a next-token prediction task, which is why they're not helpful assistants. SFT allows you to tweak them to respond to instructions. Moreover, it allows you to fine-tune your model on any data (private, not seen by GPT-4, etc.) and use it without having to pay for an API like OpenAI's.
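One detail worth understanding is that during SFT the loss is usually computed only on the assistant's tokens: prompt positions are masked out. A minimal sketch (the token ids are hypothetical; `-100` is the ignore index used by PyTorch's cross-entropy):

```python
IGNORE_INDEX = -100  # label value skipped by the loss function

def mask_prompt(token_ids, prompt_len):
    """Return labels where prompt tokens are ignored and response tokens are kept."""
    return [IGNORE_INDEX] * prompt_len + token_ids[prompt_len:]

# Hypothetical token ids: the first 3 belong to the instruction,
# the rest to the answer the model should learn to produce.
tokens = [101, 2054, 2003, 7592, 2088, 102]
labels = mask_prompt(tokens, prompt_len=3)
```

Libraries like TRL handle this masking for you, but knowing it exists helps when debugging why a fine-tuned model parrots its own prompts.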
📚 References:
After supervised fine-tuning, RLHF is a step used to align the LLM's answers with human expectations. The idea is to learn preferences from human (or artificial) feedback, which can be used to reduce biases, censor models, or make them act in a more useful way. It is more complex than SFT and often seen as optional.
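DPO, used in the Mistral-7b notebook above, replaces the reward model and PPO loop of classic RLHF with a single loss over preference pairs. A sketch of that loss for one pair (the log-probabilities below are invented):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected answers under
    the policy and the frozen reference model. The loss is low when the policy
    favors the chosen answer more strongly than the reference does.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -math.log(sigmoid(beta * margin))

# Hypothetical log-probs: in the first case the policy already prefers the
# chosen answer, in the second it prefers the rejected one.
good = dpo_loss(policy_chosen=-5.0, policy_rejected=-9.0, ref_chosen=-6.0, ref_rejected=-6.0)
bad = dpo_loss(policy_chosen=-9.0, policy_rejected=-5.0, ref_chosen=-6.0, ref_rejected=-6.0)
```

In practice TRL's trainers compute this from batched model outputs; the sketch only shows why preferred completions are pushed up relative to the reference model.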
📚 References:
Evaluating LLMs is an undervalued part of the pipeline: it is time-consuming and only moderately reliable. Your downstream task should dictate what you want to evaluate, but always remember Goodhart's law: "When a measure becomes a target, it ceases to be a good measure."
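One of the simplest automatic metrics is perplexity, the exponential of the average negative log-likelihood per token. A sketch (the per-token log-probabilities are invented to contrast a confident and an uncertain model):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token.

    Lower is better: 1.0 means the model assigned probability 1 to every token.
    """
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Hypothetical per-token log-probabilities from two models on the same text.
confident = [-0.1, -0.2, -0.1, -0.3]
uncertain = [-2.0, -1.5, -2.5, -1.0]
```

Perplexity only measures fit to a text distribution, not helpfulness or factuality, which is why benchmark suites and human or LLM-as-judge evaluations are also needed.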
📚 References:
Quantization is the process of converting the weights (and activations) of a model to a lower precision. For example, weights stored in 16 bits can be converted to a 4-bit representation. This technique has become increasingly important for reducing the computational and memory costs associated with LLMs.
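The simplest scheme, symmetric absmax quantization, maps floats into a signed integer range and keeps a scale factor to approximately invert the mapping. A toy sketch on a handful of weights:

```python
def absmax_quantize(weights, bits=8):
    """Symmetric absmax quantization: scale floats into the signed integer range.

    Returns the integer weights and the scale needed to dequantize them.
    """
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Approximately recover the original floats (up to rounding error)."""
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.8]
q, scale = absmax_quantize(weights)
recovered = dequantize(q, scale)
```

Methods like GPTQ, GGUF's k-quants, and EXL2 are far more sophisticated (per-group scales, error compensation, mixed precision), but they all trade rounding error for a smaller memory footprint along these lines.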
📚 References:
📚 References:
This section of the course focuses on learning how to build LLM-powered applications that can be used in production, with a focus on augmenting models and deploying them.
Running LLMs can be difficult due to high hardware requirements. Depending on your use case, you might want to simply consume a model through an API (like GPT-4) or run it locally. In any case, additional prompting and guidance techniques can improve and constrain the output for your applications.
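Among prompting techniques, few-shot prompting is one of the simplest ways to steer output format without fine-tuning: show a few demonstrations, then the new input. A minimal sketch (the examples and template are invented):

```python
def few_shot_prompt(examples, query):
    """Build a few-shot prompt: demonstrations first, then the new input.

    The trailing "Output:" invites the model to complete the pattern
    established by the examples.
    """
    lines = []
    for inp, out in examples:
        lines.append(f"Input: {inp}\nOutput: {out}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

# Hypothetical sentiment-labeling demonstrations.
examples = [("great movie", "positive"), ("boring plot", "negative")]
prompt = few_shot_prompt(examples, "what a masterpiece")
```

The same string can be sent to a local model or an API; chat models typically want the demonstrations expressed as alternating user/assistant messages instead of one flat string.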
📚 References:
Creating a vector storage is the first step in building a Retrieval Augmented Generation (RAG) pipeline. Documents are loaded and split, and relevant chunks are used to produce vector representations (embeddings) that are stored for future use during inference.
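The splitting and embedding steps can be sketched as follows. This is a toy illustration: real pipelines use a trained embedding model, while here a hashed bag-of-words stands in for one:

```python
import math

def chunk(text, size=40, overlap=10):
    """Split text into overlapping character chunks, a common preprocessing step."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text, dim=16):
    """Toy hashed bag-of-words embedding, L2-normalized.

    Stand-in for a real embedding model such as a sentence transformer.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity; a plain dot product since the vectors are normalized."""
    return sum(x * y for x, y in zip(a, b))

chunks = chunk("a" * 100)
```

A vector database then stores these embeddings in an index (often approximate nearest-neighbor) so that similar chunks can be retrieved quickly at query time.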
📚 References:
With RAG, LLMs retrieve contextual documents from a database to improve the accuracy of their answers. RAG is a popular way of augmenting a model's knowledge without any fine-tuning.
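The retrieve-then-prompt loop can be sketched end to end. To stay self-contained, naive word overlap stands in for embedding similarity, and the corpus and prompt template are invented:

```python
def score(query, document):
    """Naive relevance score: word overlap between query and document."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    return len(q & d)

def retrieve(query, corpus, k=1):
    """Return the top-k most relevant documents for the query."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query, contexts):
    """Stuff the retrieved passages into the prompt before the question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (f"Answer using only the context below.\n"
            f"Context:\n{context_block}\nQuestion: {query}")

corpus = [
    "The Eiffel Tower is located in Paris.",
    "Photosynthesis converts light into chemical energy.",
]
question = "Where is the Eiffel Tower?"
docs = retrieve(question, corpus)
prompt = build_prompt(question, docs)
```

The assembled prompt is then sent to the LLM; grounding the answer in retrieved text is what lets RAG update a model's knowledge without touching its weights.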
📚 References:
Real-life applications can require complex pipelines, including SQL or graph databases, as well as automatically selecting relevant tools and APIs. These advanced techniques can improve a baseline solution and provide additional features.
📚 References:
Text generation is a costly process that requires expensive hardware. In addition to quantization, various techniques have been proposed to maximize throughput and reduce inference costs.
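A key example is the KV cache: reusing past keys and values instead of recomputing attention over the whole sequence at every step. A back-of-the-envelope sketch of why it helps (operation counts are illustrative, not exact FLOPs):

```python
def decode_cost_no_cache(prompt_len, new_tokens):
    """Attention ops if keys/values are recomputed for the whole sequence
    at every decoding step (cost grows quadratically per step)."""
    total = 0
    for step in range(new_tokens):
        seq_len = prompt_len + step + 1
        total += seq_len * seq_len   # full self-attention over the sequence
    return total

def decode_cost_with_cache(prompt_len, new_tokens):
    """Attention ops when past keys/values are cached: one prefill pass,
    then each new token attends once over the cached sequence."""
    total = prompt_len * prompt_len  # prefill
    for step in range(new_tokens):
        total += prompt_len + step + 1  # one query row against cached keys
    return total

slow = decode_cost_no_cache(100, 50)
fast = decode_cost_with_cache(100, 50)
```

The memory cost of storing the cache is what techniques like grouped-query attention and paged attention in turn try to reduce.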
📚 References:
Deploying LLMs at scale is an engineering feat that can require multiple clusters of GPUs. In other scenarios, demos and local apps can be achieved with a much lower complexity.
📚 References:
In addition to traditional security problems associated with software, LLMs have unique weaknesses due to the way they are trained and prompted.
📚 References:
This roadmap was inspired by the excellent DevOps Roadmap from Milan Milanović and Romano Roth.
Special thanks to:
Disclaimer: I am not affiliated with any sources listed here.