Stanford researchers drop SGLang: outperforms vLLM with up to 5x higher throughput
Week 4 of Coding with Intelligence
📰 News
Snorkel AI releases Mistral 7B DPO tune
Alpaca-Eval 2.0 score goes from Mistral-7B-Instruct-v0.2's 14.72 to 30.22, and with further DPO sample selection to 34.86, ranking 2nd on the leaderboard. The best model on the leaderboard is "gpt-4-turbo", which is also the judge that picks the optimal responses. An interesting use of a Pairwise Reward Model to iteratively generate DPO data.
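For a feel of the idea, here's a minimal sketch of one round of iterative DPO data generation with a pairwise reward model. This is not Snorkel's actual pipeline; the `pairwise_reward` and `generate_candidates` helpers are hypothetical placeholders for real model calls.

```python
from itertools import combinations

def pairwise_reward(prompt: str, a: str, b: str) -> str:
    # Placeholder: a real pairwise reward model scores (prompt, a, b) jointly
    # and returns the preferred response; length is just a stand-in heuristic.
    return a if len(a) >= len(b) else b

def generate_candidates(prompt: str, n: int) -> list[str]:
    # Placeholder: sample n responses from the current policy model.
    return [f"response {i} to: {prompt}" for i in range(n)]

def build_dpo_example(prompt: str, n_candidates: int = 8) -> dict:
    candidates = generate_candidates(prompt, n_candidates)
    # Rank candidates by pairwise wins under the reward model.
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        wins[pairwise_reward(prompt, a, b)] += 1
    ranked = sorted(candidates, key=wins.get, reverse=True)
    # Best becomes "chosen", worst becomes "rejected"; train with DPO,
    # then repeat the loop with the improved policy.
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

print(build_dpo_example("Explain DPO in one sentence."))
```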
FireLLaVA: the first commercially permissive OSS LLaVA model
Impressive performance too!
📦 Repos
Fast and Expressive LLM Inference with RadixAttention and SGLang
Beats vLLM on several benchmarks, and comes from some of the same authors as vLLM.
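To get a feel for the frontend language, here's a small example adapted from the SGLang README at the time of writing (it assumes a locally launched SGLang server at port 30000; check the repo for the current API):

```python
# Adapted from the SGLang README; requires a running SGLang server.
from sglang import (RuntimeEndpoint, assistant, function, gen,
                    set_default_backend, system, user)

@function
def multi_turn_question(s, question_1, question_2):
    s += system("You are a helpful assistant.")
    s += user(question_1)
    s += assistant(gen("answer_1", max_tokens=256))
    # The shared prefix above can be reused across calls via
    # RadixAttention's KV-cache prefix sharing.
    s += user(question_2)
    s += assistant(gen("answer_2", max_tokens=256))

set_default_backend(RuntimeEndpoint("http://localhost:30000"))
state = multi_turn_question.run(
    question_1="What is the capital of France?",
    question_2="And what is its population?",
)
print(state["answer_1"], state["answer_2"])
```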
nanotron: minimal LLM training library with 3D parallelism
By Hugging Face
Medusa 2 release: axolotl training and self-distillation
It increases inference speed by up to 3.6x over the original model
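Very roughly, Medusa adds extra decoding heads that each guess a token a few positions ahead, and the base model then verifies the drafted continuation in a single forward pass. A toy sketch of greedy verification (not the actual Medusa code; the tree attention and self-distillation training live in the repo):

```python
import torch

def accept_draft(base_logits: torch.Tensor, draft_tokens: torch.Tensor) -> int:
    """Greedy verification: accept the longest prefix of drafted tokens
    that matches the base model's own argmax predictions.

    base_logits:  [k, vocab] base-model logits at the k drafted positions
    draft_tokens: [k] token ids proposed by the Medusa heads
    """
    preds = base_logits.argmax(dim=-1)
    matches = (preds == draft_tokens).long()
    # cumprod zeroes everything after the first mismatch.
    return int(matches.cumprod(dim=0).sum().item())

# e.g. 4 drafted positions over a toy vocab of 10
logits = torch.randn(4, 10)
draft = logits.argmax(dim=-1).clone()
draft[2] = (draft[2] + 1) % 10        # corrupt the 3rd guess
print(accept_draft(logits, draft))    # -> 2 accepted tokens
```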
📄 Papers
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
No measurable loss in perplexity or downstream task performance at 50% sparsity for OPT-175B is quite impressive!
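For intuition on what 50% unstructured sparsity means, here's a simple magnitude-pruning sketch. Note this is not SparseGPT's method, which instead solves a layer-wise reconstruction using second-order (Hessian) information; this only illustrates the sparsity pattern:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude weights (NOT SparseGPT's approach,
    which also updates the surviving weights to compensate)."""
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(1024, 1024)
w_sparse = magnitude_prune(w, 0.5)
print(f"sparsity: {(w_sparse == 0).float().mean():.2%}")
```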
Benchmarking Large Multimodal Models against Common Corruptions
Interestingly, CogVLM seems to be doing really well. You can find that model here.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
State space models go multimodal. From the abstract “Vim achieves higher performance compared to well-established vision transformers like DeiT, while also demonstrating significantly improved computation & memory efficiency”
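To make "bidirectional state space model" concrete, here is a toy diagonal linear SSM scanned over a patch sequence in both directions. This is a gross simplification for illustration only: Vim's blocks use Mamba's selective, input-dependent scan, not fixed parameters.

```python
import torch
import torch.nn as nn

class ToyBidirectionalSSM(nn.Module):
    """Toy diagonal SSM: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t,
    run forward and backward over the sequence and summed."""

    def __init__(self, dim: int):
        super().__init__()
        self.a = nn.Parameter(torch.rand(dim) * 0.9)  # per-channel decay
        self.b = nn.Parameter(torch.ones(dim))
        self.c = nn.Parameter(torch.ones(dim))

    def scan(self, x: torch.Tensor) -> torch.Tensor:
        # x: [seq, dim]; sequential recurrence for clarity, not speed
        h = torch.zeros_like(x[0])
        ys = []
        for t in range(x.shape[0]):
            h = self.a * h + self.b * x[t]
            ys.append(self.c * h)
        return torch.stack(ys)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Backward pass: scan the reversed sequence, then re-align.
        return self.scan(x) + self.scan(x.flip(0)).flip(0)

patches = torch.randn(196, 192)  # e.g. 14x14 patch tokens of dim 192
out = ToyBidirectionalSSM(192)(patches)
```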
Learning to Filter Context for Retrieval-Augmented Generaton
WARM: On the Benefits of Weight Averaged Reward Models
They merge multiple reward models into a single one that's more reliable and robust; weight averaging efficiently captures the strengths of each model while mitigating reward hacking.
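The averaging step itself is simple when the reward models share an architecture. A minimal sketch, assuming (as in the paper) the models are fine-tunes from a shared initialization:

```python
import torch

def warm_average(state_dicts: list[dict]) -> dict:
    """Uniformly average the parameters of several reward-model fine-tunes.
    Assumes identical architectures; buffers are averaged as floats too,
    which is fine for a sketch but sloppy for integer buffers."""
    avg = {}
    for key in state_dicts[0]:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg

# merged = warm_average([m.state_dict() for m in reward_models])
# reward_model.load_state_dict(merged)
```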
📚 Resources
State-of-the-art Code Generation with AlphaCodium – From Prompt Engineering to Flow Engineering
The significance of this article is that it captures an underlying trend in building with LLMs: breaking a domain-specific task down into distinct stages and applying an LLM at each stage, which improves over single-prompt strategies. So-called “Flow Engineering”. There is also an implementation (AGPL-licensed, not permissive) and a paper.
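A minimal sketch of the pattern, with hypothetical stages; AlphaCodium's actual flow is richer (it also generates AI tests and executes the code against them). The `llm` parameter stands in for any prompt-to-completion callable:

```python
from typing import Callable

def solve(problem: str, llm: Callable[[str], str], max_iters: int = 3) -> str:
    """Hypothetical multi-stage flow: reflect, draft, then critique-and-fix."""
    # Stage 1: reason about the problem before writing any code.
    notes = llm(f"Restate this problem and list edge cases:\n{problem}")
    # Stage 2: draft a solution conditioned on the reflection.
    code = llm(f"Problem:\n{problem}\nNotes:\n{notes}\nWrite a solution.")
    # Stage 3: iterate with critique instead of trusting the first draft.
    for _ in range(max_iters):
        review = llm(f"Review this code against the edge cases:\n{code}\n"
                     "Reply LGTM if correct.")
        if "LGTM" in review:
            break
        code = llm(f"Fix the code according to this review:\n{review}\n\n{code}")
    return code
```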
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning
Efficient instruction tuning that focuses on the most informative samples: the LLM's current abilities are used to bootstrap selection of the right training subset.
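A common generic recipe for picking a diverse subset from embeddings is k-center greedy selection, sketched below. This is an illustration of diversity-based sampling in general, not the paper's exact algorithm, which additionally evolves the selection with the model across tuning rounds:

```python
import numpy as np

def k_center_greedy(embeddings: np.ndarray, k: int) -> list[int]:
    """Greedily pick k points that maximize coverage: each new point is the
    one farthest from everything selected so far."""
    selected = [0]  # start from an arbitrary point
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    for _ in range(k - 1):
        idx = int(dists.argmax())
        selected.append(idx)
        # Track each point's distance to its nearest selected center.
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[idx], axis=1))
    return selected

subset = k_center_greedy(np.random.randn(1000, 64), 10)
print(subset)
```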
Foundations of Vector Retrieval (Book)
Free on arXiv
Trending page for AI/ML Papers
Based on GitHub stars and X likes
ML Engineering by DeepSpeed/HF Transformers/PyTorch contributor (Book)
It’s free, and somewhat WIP
Want more? Follow me on Twitter! @ricklamers