The interplay of pre-training and fine-tuning: learn what's happening
Week 44 of Coding with Intelligence
“Open Source AI/LLMs” is about more than just model drops from open labs. It’s about the interplay of academic researchers & industry participants figuring out how to cost-effectively get the most performance from deep-learning-based models. It’s great to see the open knowledge around how to get the most performance/dollar/watt flourishing. If you’re building LLM apps, there’s no shortage of thoughtful experiments you can run to optimize inference speed, cost & accuracy.
Many of the listings in this week’s CoWI touch on this rapidly evolving landscape. I hope these curated resources make you a better AI researcher & engineer!
This could be interesting for improving OS models. I'd keep an eye on what he publishes over the next months!
Great news for OS model initiatives if true. Source is a Microsoft paper that's now retracted 👀
Just check out the example code in the README, it's so clean! It builds on the strengths of Pydantic for validated structured outputs from LLMs.
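To give a flavor of the pattern (this is a hedged sketch, not the README's own example): you define a Pydantic schema and validate the raw JSON an LLM returns against it. The `Recipe` model and the JSON string below are made up for illustration; `model_validate_json` is the Pydantic v2 API.

```python
from pydantic import BaseModel, ValidationError

class Recipe(BaseModel):
    # Hypothetical schema for illustration, not from the library's README
    name: str
    minutes: int

# Pretend this string came back from an LLM call
raw = '{"name": "Pancakes", "minutes": 20}'

try:
    recipe = Recipe.model_validate_json(raw)  # parses + type-checks in one step
    print(recipe.minutes)
except ValidationError as err:
    # Malformed or schema-violating LLM output lands here instead of
    # silently propagating bad data through your app
    print("LLM output failed validation:", err)
```

The win over plain `json.loads` is that type coercion and field checks happen up front, so downstream code can trust the object's shape.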
Prompt to UI is gaining traction.
This can be particularly useful for routing strategies/systems that send requests to LLMs at different cost/speed/quality levels.
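A minimal sketch of such a router, with purely illustrative model names and thresholds (not from any specific product):

```python
# Hypothetical cost-aware router: the tier names and the heuristic
# are assumptions made up for this sketch.
def route(prompt: str) -> str:
    """Pick a model tier based on a crude complexity estimate."""
    # Toy heuristic: long prompts, or ones asking for reasoning,
    # go to the stronger (more expensive) model.
    needs_reasoning = any(w in prompt.lower() for w in ("why", "prove", "derive"))
    if len(prompt.split()) > 100 or needs_reasoning:
        return "large-expensive-model"
    return "small-cheap-model"

print(route("What is the capital of France?"))
print(route("Prove that sqrt(2) is irrational"))
```

In practice the classifier would itself be a small model or learned scorer rather than a keyword check, but the cost/quality trade-off structure is the same.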
Voyager: in-memory nearest-neighbor search by Spotify
Voyager is used extensively in production at Spotify, and is queried hundreds of millions of times per day to power numerous user-facing features.
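For intuition about what such an index computes, here is a brute-force exact nearest-neighbor sketch in plain Python. Voyager's actual API differs, and it uses HNSW graphs to make this approximate-but-fast at millions of vectors; the code below is the naive baseline it accelerates.

```python
import math

def nearest(query, vectors, k=1):
    """Exact nearest-neighbor search by cosine distance; illustrative only.
    Libraries like Voyager approximate this with HNSW graphs so queries
    stay fast as the collection grows to millions of vectors."""
    def cosine_dist(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norm

    # Rank every stored vector by distance to the query (O(n) per query)
    ranked = sorted(range(len(vectors)), key=lambda i: cosine_dist(query, vectors[i]))
    return ranked[:k]

vecs = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1)]
print(nearest((1.0, 0.05), vecs, k=2))  # indices of the two closest vectors
```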
This Stanford paper introduces the idea of using a fine-tuned small model (e.g. Llama-2-chat-7B) and a pre-trained large model (e.g. Llama-2-base-70B) to get the benefits of fine-tuning without costly fine-tuning procedures on large pre-trained models. They pair the two models and use an EFT (Emulated Fine-Tuning) variant of the speculative decoding scheme to predict tokens with high throughput. The main goal of the paper is to disentangle the contributions of the pre-training and fine-tuning stages, the two-stage process by which modern LLMs like Llama-2 and GPT-4 are trained. They conclude that fine-tuning generally improves helpfulness, while scaling up pre-training tends to improve factuality.
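The core combination can be sketched in a few lines: apply the small model's fine-tuning "delta" (fine-tuned minus base log-probs) on top of the large base model's log-probs. This is a hedged sketch of the idea as I understand it from the paper; the function name and the toy numbers are made up for illustration.

```python
import math

def eft_next_token_probs(logp_large_base, logp_small_ft, logp_small_base):
    """Emulated-fine-tuning-style combination of per-token log-probs:
    large base model + (small fine-tuned - small base) delta,
    renormalized into a proper distribution via softmax."""
    combined = [lb + (sf - sb) for lb, sf, sb in
                zip(logp_large_base, logp_small_ft, logp_small_base)]
    # Softmax with max-subtraction for numerical stability
    m = max(combined)
    exps = [math.exp(c - m) for c in combined]
    z = sum(exps)
    return [e / z for e in exps]

# Toy 3-token vocabulary; values are illustrative, not from the paper
probs = eft_next_token_probs(
    logp_large_base=[-0.5, -1.5, -3.0],
    logp_small_ft=[-0.7, -1.0, -2.5],
    logp_small_base=[-0.6, -1.4, -2.8],
)
print(probs)  # fine-tuning delta shifts mass toward tokens the small FT model favors
```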
Similar to the approach proposed by DeepMind in https://arxiv.org/abs/2310.01714
Very interesting paper that gives more insight into how models are able to perform in-context learning. This is a related paper https://arxiv.org/abs/2310.15213
The paper is nicely done and shows the validation losses for familiar datasets in the RedPajama dataset for sources like StackExchange, GitHub, arXiv and Wikipedia.
Interpretability work is very exciting as success in the area will lead to more efficient architectures that improve performance on downstream tasks. This is a nice win for a technique in mechanistic interpretability called "circuit analysis". It looks at what happens inside the language model when performing a specific task, in this paper: answering multiple choice questions. Kudos DeepMind for contributing this result to the broader community!
If you're training & serving your own OS models, it might be worth evaluating.
By the fine folks from AI Snake Oil, a blog by two Princeton scholars.
Quite nice work by @dan_p_simpson. He narrates the mathematics nicely, so you can get the gist of the general areas of math involved in diffusion models. Good read if you're looking to get started making better diffusion models (for image creation, or otherwise sampling from distributions we observe solely through available data).
This could be useful if you're fine-tuning too, finding relevant bits for synthetic dataset creation using frontier models for domain oriented transformation.
His ideas about Solomonoff induction and the path from sequence models to more powerful (AGI) systems are especially intriguing.
Want more? Follow me on Twitter! @ricklamers