Following Hugging Face’s Zephyr recipe
Finding good training hyperparameters for new LLMs is always difficult and time-consuming. With Zephyr Gemma 7B, Hugging Face seems to have found a good recipe for fine-tuning Gemma. They used a combination of distilled supervised fine-tuning and DPO, similar to what they did for their original Zephyr based on Mistral 7B. However, training Gemma with DPO on consumer hardware is challenging because of its memory consumption.
In this article, I first review the recipe used by Hugging Face to train Zephyr Gemma 7B. Then, I show how to use this recipe with Unsloth, a framework implementing various optimizations for fast and memory-efficient training. The method presented in this article has a peak memory consumption of 19 GB of VRAM and a total training time of only 8 hours. In other words, DPO training for Gemma is possible on consumer hardware.
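To make that concrete, here is a minimal sketch of what DPO training with Unsloth can look like. The hyperparameters, the checkpoint name, and the dataset preprocessing are illustrative assumptions on my part, not the exact Zephyr Gemma recipe:

```python
from unsloth import FastLanguageModel, PatchDPOTrainer
from datasets import load_dataset
from transformers import TrainingArguments
from trl import DPOTrainer

PatchDPOTrainer()  # let Unsloth patch TRL's DPOTrainer for faster, leaner training

# Load Gemma in 4-bit; the Zephyr recipe actually starts DPO from the SFT
# checkpoint discussed in the next section.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-7b",  # illustrative; swap in the SFT checkpoint
    max_seq_length=1024,
    dtype=None,         # auto-detect (bfloat16 on recent GPUs)
    load_in_4bit=True,  # QLoRA-style quantization to keep VRAM low
)

# Attach LoRA adapters so only a small fraction of the weights is trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# The preference dataset used by the Zephyr recipe; flatten its chat-format
# columns into the plain strings DPOTrainer expects (simplified here --
# the full recipe applies a chat template instead)
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
dataset = dataset.map(lambda ex: {
    "prompt": ex["prompt"],
    "chosen": ex["chosen"][-1]["content"],
    "rejected": ex["rejected"][-1]["content"],
})

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with LoRA, the frozen base weights serve as the reference
    beta=0.1,        # assumed DPO temperature; check the model card for the real value
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=5e-7,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
        output_dir="zephyr-gemma-dpo",
    ),
)
trainer.train()
```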
Supervised Fine-tuning (SFT)
DPO must use as its reference a model trained with supervised fine-tuning (SFT) on an instruction dataset. Hugging Face also released this SFT model.
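As a sketch, this SFT checkpoint could be loaded as the starting point for DPO as follows; the repository id is my assumption based on Hugging Face's Zephyr Gemma release, so verify it against the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for Hugging Face's Zephyr Gemma SFT model
sft_checkpoint = "HuggingFaceH4/zephyr-7b-gemma-sft-v0.1"

tokenizer = AutoTokenizer.from_pretrained(sft_checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    sft_checkpoint,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # spread layers across available devices
)
```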