Did Qwen 1.5 72B just overtake closed-source Mistral-Medium & GPT-3.5 Turbo?
Week 6 of Coding with Intelligence
As others have noted, they worked with the ecosystem to ensure good adoption: axolotl (fine-tuning), AutoAWQ (quantization), llama.cpp / Ollama (local inference). The largest size (72B) looks on par with Mistral-Medium and outperforms Mixtral and GPT-3.5 Turbo.
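A minimal sketch of trying the family yourself, assuming the standard Hugging Face transformers chat interface for Qwen1.5 checkpoints; the 72B variant needs multiple GPUs, so swapping in a smaller size is a practical first step.

```python
# Minimal sketch: running a Qwen1.5 chat model with Hugging Face transformers.
# Assumes the standard transformers chat-template interface; the 72B checkpoint
# needs serious GPU memory, so try e.g. "Qwen/Qwen1.5-7B-Chat" locally first.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-72B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"  # device_map="auto" requires accelerate
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what makes Qwen1.5 notable in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```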
Interesting resource for researchers doing work on the pre-training phase of LLMs.
M3 stands for Multi-lingual (100+ languages), Multi-length (input lengths up to 8192 tokens), and Multi-functional (dense, lexical, and multi-vector retrieval).
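Assuming this refers to BAAI's BGE-M3 release, its FlagEmbedding package exposes all three retrieval modes from a single encode call. A sketch, with the exact output keys taken to be the ones the package documents:

```python
# Sketch: querying all three retrieval modes of an M3-style embedding model.
# Assumes BAAI's BGE-M3 via the FlagEmbedding package (pip install FlagEmbedding).
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

sentences = ["What is BGE M3?", "BGE M3 is a multi-functional embedding model."]

out = model.encode(
    sentences,
    return_dense=True,         # one dense vector per text
    return_sparse=True,        # lexical (token-level) weights
    return_colbert_vecs=True,  # multi-vector (ColBERT-style) representation
)

print(out["dense_vecs"].shape)       # e.g. (2, 1024) dense embeddings
print(out["lexical_weights"][0])     # mapping of token id -> weight
print(out["colbert_vecs"][0].shape)  # e.g. (num_tokens, dim) per text
```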
Why settle for one LLM if you can have many?
Looking forward to it!
Try it here.
It’s 3x-31x faster than vLLM kernels in certain cases.
DeepMind strikes again: a CoT improvement with up to 32% higher performance.
This awesome paper by Microsoft & MIT shows an increase in task performance when the weight matrices of certain layers of the network are replaced by their low-rank approximations. Crazy result that reminds me of classical overfitting and regularization techniques in ML. Must-read for everyone who works with LLMs whose weight matrices you can control (OSS LLMs, or if you're building your own models).
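The core operation is easy to try yourself: overwrite a weight matrix with its truncated-SVD reconstruction. A minimal sketch below; which layers to modify and how much rank to keep are the paper's actual findings, so the layer and fraction here are just placeholders.

```python
# Sketch: replace a linear layer's weight with its rank-k approximation via truncated SVD.
# Layer selection and rank choice are the interesting part of the paper; the values
# below are placeholders for illustration only.
import torch

@torch.no_grad()
def low_rank_approximate_(linear: torch.nn.Linear, keep_fraction: float = 0.1) -> None:
    """Overwrite `linear.weight` in place with a truncated-SVD reconstruction."""
    W = linear.weight.data
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    k = max(1, int(keep_fraction * S.numel()))       # number of singular values to keep
    linear.weight.data = (U[:, :k] * S[:k]) @ Vh[:k, :]

# Example: apply to a single (toy) projection layer.
layer = torch.nn.Linear(512, 2048)
low_rank_approximate_(layer, keep_fraction=0.05)
```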
The GitHub repo.
Improve your data mix.
Clever tree-based retrieval strategy that meaningfully improves document Q&A performance. ICLR paper.
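For intuition, here is the general shape of tree-based retrieval (not necessarily this paper's exact algorithm, check the source): cluster and summarize chunks bottom-up, then retrieve over leaves and summaries together. `embed`, `summarize`, `cluster`, and `similarity` are hypothetical helpers.

```python
# Rough sketch of tree-based retrieval: build summary nodes over leaf chunks,
# then rank leaves and summaries in one pool at query time. The helpers passed
# in (embed, summarize, cluster, similarity) are hypothetical stand-ins for an
# embedding model, an LLM summarizer, a clustering routine, and cosine similarity.
from dataclasses import dataclass

@dataclass
class Node:
    text: str
    embedding: list[float]
    level: int

def build_tree(chunks, embed, summarize, cluster, levels: int = 2) -> list[Node]:
    nodes = [Node(c, embed(c), 0) for c in chunks]   # level-0 leaves
    frontier = nodes
    for level in range(1, levels + 1):
        parents = []
        for group in cluster(frontier):              # group similar nodes
            summary = summarize([n.text for n in group])
            parents.append(Node(summary, embed(summary), level))
        nodes.extend(parents)
        frontier = parents
    return nodes

def retrieve(query: str, nodes: list[Node], embed, similarity, top_k: int = 5) -> list[Node]:
    q = embed(query)
    # "Collapsed" retrieval: leaves and summaries compete in the same ranking.
    return sorted(nodes, key=lambda n: similarity(q, n.embedding), reverse=True)[:top_k]
```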
Another technique to improve the pretraining data mix.
Google and ByteDance are leading in generative video. Very impressive. Content creation is changed forever.
Really cool & convincing text-to-audio combined with text-to-face.
Recording from yours truly demoing the Autogram project, which uses Ollama for local inference.
A great resource if you're experimenting with fine-tuning!
Interesting new approach to measuring reasoning ability.
The big thing: preference data is no longer needed!
Want more? Follow me on Twitter! @ricklamers