Did Qwen 1.5 72B just overtake closed-source Mistral-Medium & GPT-3.5 Turbo?
Week 6 of Coding with Intelligence
📰 News
- Qwen 1.5 released: 0.5B, 1.8B, 4B, 7B, 14B, and 72B - As with other recent releases, they worked with the ecosystem ahead of launch for good adoption: axolotl (fine-tuning), AutoAWQ (quantization), llama.cpp / Ollama (local inference). The largest size (72B) appears on par with Mistral-Medium and outperforms Mixtral and GPT-3.5 Turbo.
- DeepSeek releases a 7B math-focused model that approaches GPT-4 on math reasoning
- OLMo: AI2 releases fully open 7B LLM - Interesting resource for researchers doing work on the pre-training phase of LLMs. 
- BAAI releases the M3 series embedding model - M3 stands for Multi-lingual (100+ languages), Multi-length (inputs up to 8,192 tokens), and Multi-functional (dense, lexical, and multi-vector retrieval). A quick usage sketch follows this list.
- vLLM gets multi-LoRA support, contributed by the Punica project - Why settle for one LLM when you can serve many adapters from a single base model? See the second sketch after this list.
- The next DeepSeek Coder model will be a MoE - Looking forward to it!
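The three retrieval modes of the M3 model are easiest to see in code. Here is a minimal sketch using BAAI's FlagEmbedding package; the argument and output key names follow its README, so treat them as assumptions and verify against the version you install.

```python
# Minimal BGE-M3 sketch: one encode call returns dense, lexical (sparse),
# and multi-vector (ColBERT-style) representations.
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

docs = [
    "BGE-M3 supports dense, lexical, and multi-vector retrieval.",
    "It covers 100+ languages and inputs up to 8192 tokens.",
]

out = model.encode(
    docs,
    return_dense=True,        # one embedding vector per document
    return_sparse=True,       # per-token lexical weights (BM25-style matching)
    return_colbert_vecs=True, # one vector per token for late interaction
)

print(out["dense_vecs"].shape)      # dense embeddings
print(out["lexical_weights"][0])    # sparse term weights for the first doc
print(len(out["colbert_vecs"][0]))  # number of token vectors for the first doc
```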
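And to make the multi-LoRA item concrete: with vLLM's LoRA support, a single base model can serve a different adapter per request. The sketch below uses hypothetical adapter names and local paths, and the exact API may shift between vLLM versions.

```python
# Multi-LoRA serving sketch: one base model, a different adapter per request.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
params = SamplingParams(temperature=0.0, max_tokens=64)

# Each request names an adapter as (name, integer id, local path); paths here are hypothetical.
sql = llm.generate(
    ["Translate to SQL: list all active users"],
    params,
    lora_request=LoRARequest("sql_adapter", 1, "/path/to/sql-lora"),
)
support = llm.generate(
    ["Draft a friendly reply to a refund request"],
    params,
    lora_request=LoRARequest("support_adapter", 2, "/path/to/support-lora"),
)

print(sql[0].outputs[0].text)
print(support[0].outputs[0].text)
```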
📦 Repos
- Hugging Face open-sources the Assistants feature in HuggingChat - Try it here.
- FlashInfer: a library of state-of-the-art LLM inference and serving kernels - It's 3x-31x faster than vLLM's kernels in certain cases.
📄 Papers
- Self-Discover: Large Language Models Self-Compose Reasoning Structures - DeepMind strikes again: an improvement over Chain-of-Thought with up to 32% higher performance.
- The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction - This awesome paper by Microsoft & MIT shows an increase in task performance when the weight matrices of certain layers of the network are replaced by their low-rank approximations. A crazy result that is reminiscent of classical overfitting and regularization in ML. Must-read for anyone working with LLMs whose weight matrices they can modify (open-weight models, or models you build yourself); a minimal sketch of the operation follows this list.
- KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization - Code is available in the GitHub repo.
- D4: Improving LLM Pretraining via Document De-Duplication and Diversification - Improve your data mix. 
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval - Clever tree-based retrieval strategy that meaningfully improves document Q&A performance. ICLR paper.
- DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining - Another technique to improve the pretraining data mix. 
- ByteDance releases Boximator, a controllable text-to-motion model for video synthesis - Google and ByteDance are leading in generative video. Very impressive; content creation is changed forever.
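If you want to try the layer-selective rank reduction idea on an open-weight model, the core operation is just a truncated SVD of a single weight matrix. The PyTorch sketch below illustrates that operation; the layer choice and rank fraction are illustrative placeholders, not the paper's settings.

```python
# Core operation behind layer-selective rank reduction: replace a weight
# matrix with its rank-k SVD approximation. Which matrix and which rank
# help (or hurt) is exactly what the paper investigates.
import torch

def low_rank_approx(weight: torch.Tensor, rank_fraction: float = 0.1) -> torch.Tensor:
    """Return the truncated-SVD reconstruction of `weight` at the given rank fraction."""
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    k = max(1, int(rank_fraction * S.numel()))
    return (U[:, :k] * S[:k]) @ Vh[:k, :]

# Hypothetical usage on a Hugging Face transformer layer (layer index is arbitrary):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained("gpt2")
# proj = model.transformer.h[10].mlp.c_proj
# proj.weight.data = low_rank_approx(proj.weight.data).to(proj.weight.dtype)
```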
📱 Demos
- Real-time streaming avatar by HeyGen - Really cool & convincing text-to-audio with text-to-face. 
- Run a Mistral model fine-tuned for grammar correction as a macOS shortcut - A recording from yours truly demoing the Autogram project, which uses Ollama for local inference.
📚 Resources
- Nous Research releases OpenHermes-2.5: 15 datasets cleaned & merged into one (filterable) - A great resource if you're experimenting with fine-tuning! 
- NPHardEval Leaderboard: measuring Reasoning Abilities of LLMs through Complexity Classes - Interesting new approach to measuring reasoning ability. 
- Aligning LLMs using a constitution: Constitutional AI - The big thing: human preference data is no longer needed!
Want more? Follow me on Twitter! @ricklamers

