Hugging Face Blog

Beyond LoRA: Can you beat the most popular fine-tuning technique?

June 18, 2026

Published June 18, 2026 by Benjamin Bossan (BenjaminB), Sayak Paul (sayakpaul), Marian (hubnemo), and Kashif Rasul (kashif)

When you plan to fine-tune a model in a parameter-efficient way, think beyond LoRA

If you want to fine-tune an open model on your own data, you are probably interested in parameter-efficient fine-tuning, or PEFT for short. The term describes techniques that significantly reduce the memory required to fine-tune a model. Although there are dozens of these techniques, almost everyone chooses one called “LoRA”. In this blog post, we explore whether LoRA is really the best choice, what tools are available to make an informed decision, and how you can benefit from looking beyond LoRA.

What is PEFT and when do you need it

There are countless open models available, but they often aren't quite good enough for your use case. Prompting may help, but it usually isn't enough. Rather than training a new model from scratch, you should consider fine-tuning an existing one. Fine-tuning, however, is memory-hungry: you generally need enough memory to fit the whole model several times over. Quantization reduces a model's memory footprint, but quantized models can't be fine-tuned directly. So a set of techniques emerged to cut the memory needed for fine-tuning, called "parameter-efficient fine-tuning", or PEFT. With PEFT, you can fine-tune a model using only a fraction of that memory, and you can even fine-tune quantized models. PEFT offers other advantages as well, such as tiny checkpoint sizes, greater resistance to catastrophic forgetting, and the ability to serve multiple fine-tunes from the same base model.

At Hugging Face, we develop the PEFT library, which implements many PEFT techniques behind a unified API and integrates well with the rest of the ecosystem, for example Transformers and Diffusers.
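The phrase "enough memory to fit the whole model several times over" can be made concrete with a back-of-the-envelope estimate. The sketch below is not from the benchmark. It assumes a commonly cited rule of thumb: a trainable parameter trained with Adam in fp32 costs about 16 bytes (weight, gradient, and two optimizer states). With PEFT, the frozen base weights only need storage, at 2 bytes per parameter in bf16 or about 0.5 bytes in 4-bit. Activations and framework overhead are ignored, and the 7B model size and the 1% adapter size are hypothetical illustrative values.

```python
# Back-of-the-envelope memory estimate, assuming a common rule of thumb
# (not a measurement): activations, batch size, and framework overhead are ignored.

GIB = 1024 ** 3

# Assumed cost of one *trainable* parameter with Adam in fp32:
# 4 bytes (weight) + 4 (gradient) + 8 (two Adam moment buffers) = 16 bytes.
TRAINABLE_BYTES = 4 + 4 + 8


def full_finetune_bytes(n_params: int) -> float:
    """Full fine-tuning: every parameter is trainable and costs 16 bytes."""
    return float(n_params * TRAINABLE_BYTES)


def peft_bytes(n_params: int, adapter_fraction: float, base_bytes_per_param: float) -> float:
    """PEFT: the frozen base only needs storage; the small adapter is fully trained.

    adapter_fraction: adapter size relative to the base model (hypothetical value).
    base_bytes_per_param: 2.0 for bf16/fp16 weights, 0.5 for 4-bit quantized weights.
    """
    frozen = n_params * base_bytes_per_param
    adapter = n_params * adapter_fraction * TRAINABLE_BYTES
    return frozen + adapter


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-parameter model
    frac = 0.01        # hypothetical adapter with 1% as many parameters as the base
    print(f"full fine-tuning, fp32 + Adam: {full_finetune_bytes(n) / GIB:6.1f} GiB")
    print(f"PEFT, bf16 base:               {peft_bytes(n, frac, 2.0) / GIB:6.1f} GiB")
    print(f"PEFT, 4-bit quantized base:    {peft_bytes(n, frac, 0.5) / GIB:6.1f} GiB")
```

Under these assumptions, the 7B example comes out at roughly 104 GiB for full fine-tuning. With a small adapter, it needs about 14 GiB on a bf16 base or about 4.3 GiB on a 4-bit base. Real usage is higher once activations are counted, but the gap shows why PEFT, especially on a quantized base, fits on much smaller hardware.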
The PEFT library also supports multiple quantization methods, which makes parameter-efficient fine-tuning even more accessible. It provides a good starting point, whether you want to fine-tune on your own data or are researching a new PEFT method.

LoRA: The queen of fine-tuning techniques 👑

One parameter-efficient fine-tuning technique that emerged early and proved quite effective is called Low-Rank Adaptation, or “LoRA” for short. It works by adding a handful of parameters on top of the base model, freezing the base model weights, and training only those few parameters.

Among all PEFT techniques, LoRA is by far the most popular. Here are a few estimates:

Of a sample of 20,834 model cards on the Hugging Face Hub that mention exactly one PEFT technique, 20,509 mention LoRA (98.4%).

We also checked which PEFT techniques are popular for image generation on an external site. In a sample of 10,000 checkpoints, we found 7,111 LoRAs. The other PEFT techniques we identified were LoCon (363) and DoRA (11, arguably a LoRA variant). That means 95.0% of the identified PEFT checkpoints (7,111 out of 7,485) are LoRAs.

Searching GitHub for the code snippet from peft import <PEFT CONFIG> (example GitHub query), 71.3% of results are for LoRA. The runners-up are LoHa (3.7%) and AdaLoRA (3.5%).

Although these estimates are not perfect, they nonetheless show that LoRA is almost certainly by far the most common PEFT technique. This could simply mean that LoRA works best for everyone and its usage statistics reflect that. There is, however, another possibility: LoRA was one of the first popular PEFT techniques, so its usage may have become self-reinforcing. LoRA has the highest visibility, the most tutorials and examples, and the best support in downstream packages, so its popularity feeds on itself. This all leads to the question: Are we all leaving performance on the table by shunning better techniques?
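Before turning to that question, it helps to see what "adding a handful of parameters on top of the base model" means in practice. Below is a plain-Python sketch with no libraries. It follows the formulation of the original LoRA paper, not the PEFT library's code. A frozen weight W (d_out × d_in) gets a trainable low-rank update B·A, scaled by alpha / r, where A is r × d_in and B is d_out × r. A is initialized randomly and B is initialized to zero, so the adapted layer starts out identical to the base layer. The class and method names, the 1024 × 1024 layer size, rank 8, and alpha 16 are illustrative assumptions. Bias terms and the training loop are omitted.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Minimal LoRA-style linear layer: y = W x + (alpha / r) * B (A x).

    W is frozen; only A (r x d_in) and B (d_out x r) would be trained.
    """

    def __init__(self, weight, r, alpha, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight, d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        self.scaling = alpha / r
        # A: small random Gaussian init (trainable)
        self.lora_A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        # B: zero init (trainable), so B A = 0 and the layer starts out identical to the base
        self.lora_B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_B, matvec(self.lora_A, x))
        return [b + self.scaling * u for b, u in zip(base, update)]

    def trainable_params(self):
        return sum(map(len, self.lora_A)) + sum(map(len, self.lora_B))

    def frozen_params(self):
        return sum(map(len, self.weight))


if __name__ == "__main__":
    d_in, d_out, r = 1024, 1024, 8
    rng = random.Random(42)
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
    layer = LoRALinear(W, r=r, alpha=16)
    x = [1.0] * d_in
    print("frozen:", layer.frozen_params(), "trainable:", layer.trainable_params())
    # → frozen: 1048576 trainable: 16384
    print("matches base at init:", layer.forward(x) == matvec(W, x))
    # → matches base at init: True
```

At rank 8, this 1024 × 1024 layer keeps 1,048,576 frozen parameters but trains only 16,384 (about 1.6%). This is also why a LoRA checkpoint, which only needs to store A and B, is so small.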
After all, countless researchers have published papers claiming that their technique beats LoRA. Isn't that sufficient proof that we should move beyond LoRA to newer techniques?

Choosing the right PEFT technique based on paper results is problematic

There are dozens of papers that investigate fine-tuning techniques other than LoRA. The PEFT library alone implements more than 40 distinct PEFT techniques at the time of writing, and many more if you count variants. For almost all of them, you will find researchers claiming that their technique beats LoRA on their benchmarks. The trouble with these claims is that researchers are under pressure to produce results that beat existing baselines. Even without ill intent, this can bias the results, e.g. when less time is spent tuning the competing techniques than the one the researchers propose. One study found, for instance, that LoRA can match supposedly better PEFT techniques simply by tuning the learning rate (https://arxiv.org/abs/2602.04998). Another complication is that each paper chooses a different set of PEFT techniques to compare against and a different set of benchmarks to run. And even when the same technique is compared on the same benchmark, the code is often unavailable or hard to run yourself, which makes results difficult to reproduce. Overall, it's difficult to figure out which PEFT technique works best for you from paper results alone. You might therefore be tempted to just go with the default: LoRA.

How we approach benchmarking in PEFT

At Hugging Face, we thought about how we can help users make informed decisions about which PEFT technique to use. With the PEFT library, we already provide a package that implements many PEFT techniques and exposes them through the same API. The next step is to provide benchmarks that shed more light on this issue. For some time, we have had a benchmark that fine-tunes LLMs on a math dataset.
This benchmark takes a base LLM that has not been instruction-tuned and fine-tunes it to answer math questions using chain-of-thought reasoning. It thus checks whether the model can learn both to reason mathematically and to produce output in the expected format. To extend our findings to another modality, we also added an image generation benchmark. It tests whether a model can be fine-tuned to learn a new concept, a cat plushy, and generate it in new contexts without forgetting existing concepts.

Left: Sample question and answer from the MetaMathQA dataset. Right: Sample image from the cat plushy dataset.

All PEFT techniques are evaluated under exactly the same conditions: same base model, same dataset, same training and evaluation code, same hardware. Because different users have different needs, we track more than just test performance. Besides VRAM usage, we track metrics such as forgetting/drift, runtime, and checkpoint size. The benchmarks are designed to run on consumer hardware, and adding a new experiment only requires adding a new PEFT config and running a script. Since we compare all PEFT techniques on equal footing and have no horse in the race, we believe these benchmarks can paint an objective picture of how well different PEFT techniques work. If you have your own dataset, we argue that you can take a similar approach and use the PEFT library to evaluate multiple PEFT techniques.

Our findings: LoRA works well but is not necessarily the best choice

After finishing the benchmark runs, we found that although LoRA works well, other PEFT methods can beat it on one or more axes and should therefore be considered. The image below compares the performance of LoRA and five other PEFT techniques.

Some results from the benchmark. When it comes to test performance and memory usage, LoRA is not necessarily the best choice.
Left: MetaMathQA benchmark; right: image generation benchmark. Consult this Space for the most up-to-date results.

One way to interpret the results above is to think in terms of tradeof

Original source: Hugging Face Blog
