Did Qwen 1.5 72B just overtake closed-source Mistral-Medium & GPT-3.5 Turbo?
Week 6 of Coding with Intelligence
📰 News
- Qwen 1.5 released: 0.5B, 1.8B, 4B, 7B, 14B, and 72B - As with other recent releases, they worked with the ecosystem ahead of launch for good adoption: axolotl (fine-tuning), AutoAWQ (quantization), llama.cpp / Ollama (local inference). The largest size (72B) appears on par with Mistral-Medium and outperforms Mixtral and GPT-3.5 Turbo.
- DeepSeek releases a 7B math-focused model that approaches GPT-4 on math reasoning
- OLMo: AI2 releases fully open 7B LLM - Interesting resource for researchers doing work on the pre-training phase of LLMs. 
- BAAI releases the M3 series embedding model - M3 stands for Multi-lingual (100+ languages), Multi-length (inputs up to 8,192 tokens), and Multi-functional (dense, lexical, and multi-vector retrieval). A quick usage sketch follows this list.
- vLLM gets multi-LoRA support, contributed by the Punica project - Why settle for one LLM when you can serve many adapters from a single base model? See the second sketch after this list.
- The next DeepSeek Coder model will be a MoE - Looking forward to it!
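The three retrieval modes of the M3 model are easiest to see in code. Here is a minimal sketch using BAAI's FlagEmbedding package; the argument and output key names follow its README, so treat them as assumptions and verify against the version you install.

```python
# Minimal BGE-M3 sketch: one encode call returns dense, lexical (sparse),
# and multi-vector (ColBERT-style) representations.
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

docs = [
    "BGE-M3 supports dense, lexical, and multi-vector retrieval.",
    "It covers 100+ languages and inputs up to 8192 tokens.",
]

out = model.encode(
    docs,
    return_dense=True,        # one embedding vector per document
    return_sparse=True,       # per-token lexical weights (BM25-style matching)
    return_colbert_vecs=True, # one vector per token for late interaction
)

print(out["dense_vecs"].shape)      # dense embeddings
print(out["lexical_weights"][0])    # sparse term weights for the first doc
print(len(out["colbert_vecs"][0]))  # number of token vectors for the first doc
```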
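And to make the multi-LoRA item concrete: with vLLM's LoRA support, a single base model can serve a different adapter per request. The sketch below uses hypothetical adapter names and local paths, and the exact API may shift between vLLM versions.

```python
# Multi-LoRA serving sketch: one base model, a different adapter per request.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
params = SamplingParams(temperature=0.0, max_tokens=64)

# Each request names an adapter as (name, integer id, local path); paths here are hypothetical.
sql = llm.generate(
    ["Translate to SQL: list all active users"],
    params,
    lora_request=LoRARequest("sql_adapter", 1, "/path/to/sql-lora"),
)
support = llm.generate(
    ["Draft a friendly reply to a refund request"],
    params,
    lora_request=LoRARequest("support_adapter", 2, "/path/to/support-lora"),
)

print(sql[0].outputs[0].text)
print(support[0].outputs[0].text)
```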
📦 Repos
- Hugging Face open-sources the Assistants feature in HuggingChat - Try it here.
- FlashInfer: a library of state-of-the-art LLM inference and serving kernels - It's 3x-31x faster than vLLM's kernels in certain cases.
📄 Papers
- Self-Discover: Large Language Models Self-Compose Reasoning Structures - DeepMind strikes again: an improvement over Chain-of-Thought with up to 32% higher performance.
- The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction - This awesome paper by Microsoft & MIT shows an increase in task performance when the weight matrices of certain layers of the network are replaced by their low-rank approximations. A crazy result that is reminiscent of classical overfitting and regularization in ML. Must-read for anyone working with LLMs whose weight matrices they can modify (open-weight models, or models you build yourself); a minimal sketch of the operation follows this list.
- KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization - Code is available in the GitHub repo.
- D4: Improving LLM Pretraining via Document De-Duplication and Diversification - Improve your data mix. 
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval - Clever tree-based retrieval strategy that meaningfully improves document Q&A performance. ICLR paper.
- DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining - Another technique to improve the pretraining data mix. 
- ByteDance releases Boximator, a controllable text-to-motion model for video synthesis - Google and ByteDance are leading in generative video. Very impressive; content creation is changed forever.
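If you want to try the layer-selective rank reduction idea on an open-weight model, the core operation is just a truncated SVD of a single weight matrix. The PyTorch sketch below illustrates that operation; the layer choice and rank fraction are illustrative placeholders, not the paper's settings.

```python
# Core operation behind layer-selective rank reduction: replace a weight
# matrix with its rank-k SVD approximation. Which matrix and which rank
# help (or hurt) is exactly what the paper investigates.
import torch

def low_rank_approx(weight: torch.Tensor, rank_fraction: float = 0.1) -> torch.Tensor:
    """Return the truncated-SVD reconstruction of `weight` at the given rank fraction."""
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    k = max(1, int(rank_fraction * S.numel()))
    return (U[:, :k] * S[:k]) @ Vh[:k, :]

# Hypothetical usage on a Hugging Face transformer layer (layer index is arbitrary):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained("gpt2")
# proj = model.transformer.h[10].mlp.c_proj
# proj.weight.data = low_rank_approx(proj.weight.data).to(proj.weight.dtype)
```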
📱 Demos
- Real-time streaming avatar by HeyGen - Really cool & convincing text-to-audio with text-to-face. 
- Run a Mistral model fine-tuned for grammar correction as a macOS shortcut - A recording from yours truly demoing the Autogram project, which uses Ollama for local inference.
📚 Resources
- Nous Research releases OpenHermes-2.5: 15 datasets cleaned & merged into one (filterable) - A great resource if you're experimenting with fine-tuning! 
- NPHardEval Leaderboard: measuring Reasoning Abilities of LLMs through Complexity Classes - Interesting new approach to measuring reasoning ability. 
- Aligning LLMs using a constitution: Constitutional AI - The big thing: human preference data is no longer needed!
Want more? Follow me on Twitter! @ricklamers

