The Ten Commandments of Fine-Tuning in Prod (a Mastering LLMs Conference Talk)
Kyle Corbitt
May 23, 2024
5 minutes
Hi friends, this is Kyle Corbitt, OpenPipe's CEO. I recently had the pleasure of giving a talk at the Mastering LLMs Conference titled “Ten Commandments of Fine-Tuning in Prod.” The talk got a lot of positive feedback, and I had a lot of fun sharing the engineering best practices we’ve learned while training thousands of models and building the leading fine-tuning platform here at OpenPipe, so I’m excited to share those insights with you here too.
I’ve included a video of the full talk at the end of this post; jump to the bottom to dive into the details! But first, I’d like to whet your appetite with a quick summary of the commandments.
The Ten Commandments of Fine-Tuning in Production:
Thou Shalt Not Fine-Tune: Just use prompting! And, optionally, few-shot examples or RAG. Fine-tuning is expensive, slow, and complex. Only do it if your use case really requires it.
Thou Shalt Write a Freaking Prompt: Create a baseline and prove the task is possible with prompting.
Thou Shalt Review Thy Freaking Data: If you must fine-tune, make sure you understand your data thoroughly.
Thou Shalt Use Thy Actual Freaking Data: Your model will only be as good as the data it's trained on. Make sure your training data is as close as possible to the data your model will see in production.
Thou Shalt Reserve a Test Set: Always reserve a portion of your data for testing so you can evaluate your model's performance. (A minimal split is shown in the sketch after this list.)
Thou Shalt Choose an Appropriate Model: The more parameters a model has, the more expensive and slower it is to train. Choose a model that is appropriate for your task and your budget.
Thou Shalt Write Fast Evals: Write evaluation metrics that are fast to compute so you can quickly iterate on your model. (The sketch after this list includes one example.)
Also, Thou Shalt Write Slow Evals: Write evaluation metrics that are more comprehensive and take longer to compute, to give you a deeper understanding of your model's performance.
Thou Shalt Not Fire and Forget: Don't just deploy your model and forget about it. Monitor its performance and be prepared to retrain or update it as needed.
Thou Shalt Not Take the Commandments Too Seriously: These commandments are meant to be helpful guidelines, not hard and fast rules. Use your best judgment and adapt them to your specific needs.
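To make the test-set and fast-eval commandments concrete, here's a minimal sketch in Python. Everything in it is illustrative rather than prescriptive: the file name, the 90/10 split, and the exact-match metric are assumptions, and model_fn stands in for whatever function calls your model.

```python
import json
import random

# Load a JSONL dataset of chat-format examples (one JSON object per line).
# "dataset.jsonl" and the 90/10 split below are illustrative choices.
with open("dataset.jsonl") as f:
    examples = [json.loads(line) for line in f]

random.seed(42)  # fixed seed so the split is reproducible across runs
random.shuffle(examples)

split = int(len(examples) * 0.9)
train_set, test_set = examples[:split], examples[split:]

def fast_eval(model_fn, test_set):
    """Deliberately cheap eval: exact-match accuracy on the held-out set.

    model_fn is any callable mapping prompt messages to a completion string
    (a hypothetical interface for this sketch).
    """
    correct = 0
    for ex in test_set:
        prompt = ex["messages"][:-1]               # everything but the reference answer
        reference = ex["messages"][-1]["content"]  # the held-out assistant turn
        if model_fn(prompt).strip() == reference.strip():
            correct += 1
    return correct / len(test_set)
```

Exact match is crude on purpose: it runs in seconds, so you can score every training run against the same held-out set and save the slower, more comprehensive evals for the models that pass this bar.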
The Full Talk
Ready to Start Fine-Tuning in Prod?
With OpenPipe, it only takes a few minutes to start fine-tuning models that outperform SOTA LLMs. Our platform supports you across the entire model development lifecycle.
Start with seamless data collection by plugging in our SDK (100% compatible with OpenAI’s) to automatically capture real-world training data. Or simply upload your pre-existing dataset in JSONL format.
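For the upload path, here's what a couple of lines of that JSONL file might look like. I'm assuming the standard OpenAI chat fine-tuning format here, in keeping with the SDK's OpenAI compatibility; the ticket-classification content is made up for illustration.

```jsonl
{"messages": [{"role": "system", "content": "You classify support tickets."}, {"role": "user", "content": "My invoice is wrong."}, {"role": "assistant", "content": "billing"}]}
{"messages": [{"role": "system", "content": "You classify support tickets."}, {"role": "user", "content": "The app crashes on login."}, {"role": "assistant", "content": "bug"}]}
```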
Next, our platform makes it easy to select the highest-quality training set: manually review your data, apply SQL-like targeting criteria, or even use LLMs with custom criteria to select training data at scale with OpenPipe's “Pipelines” feature.
With your training data in hand, quickly fire off fine-tuning jobs across a variety of base models, then use our platform to easily evaluate your fine-tuned models against each other and against SOTA models from OpenAI, Anthropic, and more.
Finally, we automatically provision an inference endpoint you can send completion requests to – it’s that easy!
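To give a feel for that last step, here's a rough sketch of a completion request in Python. It assumes our SDK mirrors OpenAI's client interface, as described above; the model ID, environment variable name, and prompt are placeholders rather than confirmed details.

```python
# Sketch only: assumes OpenPipe's OpenAI-compatible Python SDK.
# The model ID, env var name, and prompt below are hypothetical placeholders.
import os
from openpipe import OpenAI

client = OpenAI(
    openpipe={"api_key": os.environ["OPENPIPE_API_KEY"]},  # assumed env var name
)

completion = client.chat.completions.create(
    model="openpipe:my-fine-tuned-model",  # hypothetical fine-tuned model ID
    messages=[
        {"role": "system", "content": "You classify support tickets."},
        {"role": "user", "content": "My invoice is wrong, can you help?"},
    ],
)
print(completion.choices[0].message.content)
```

Because the interface matches OpenAI's, switching an existing integration over is typically a matter of swapping the client import and the model name.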