Stanford researchers drop SGLang: outperforms vLLM with up to 5x higher throughput
Week 4 of Coding with Intelligence
📰 News
Snorkel AI releases Mistral 7B DPO tune
Alpaca-Eval 2.0 score goes from Mistral-7B-Instruct-v0.2's 14.72 to 30.22, and with further DPO sample selection to 34.86, ranking 2nd on the leaderboard. The best model on the leaderboard is "gpt-4-turbo", which is also the judge that picks the optimal responses. An interesting use of a Pairwise Reward Model to iteratively generate DPO data.
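For a feel of the idea, here's a minimal sketch of one round of iterative DPO data generation with a pairwise reward model. This is not Snorkel's actual pipeline; the `pairwise_reward` and `generate_candidates` helpers are hypothetical placeholders for real model calls.

```python
from itertools import combinations

def pairwise_reward(prompt: str, a: str, b: str) -> str:
    # Placeholder: a real pairwise reward model scores (prompt, a, b) jointly
    # and returns the preferred response; length is just a stand-in heuristic.
    return a if len(a) >= len(b) else b

def generate_candidates(prompt: str, n: int) -> list[str]:
    # Placeholder: sample n responses from the current policy model.
    return [f"response {i} to: {prompt}" for i in range(n)]

def build_dpo_example(prompt: str, n_candidates: int = 8) -> dict:
    candidates = generate_candidates(prompt, n_candidates)
    # Rank candidates by pairwise wins under the reward model.
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        wins[pairwise_reward(prompt, a, b)] += 1
    ranked = sorted(candidates, key=wins.get, reverse=True)
    # Best becomes "chosen", worst becomes "rejected"; train with DPO,
    # then repeat the loop with the improved policy.
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

print(build_dpo_example("Explain DPO in one sentence."))
```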
FireLLaVA: the first commercially permissive OSS LLaVA model
Impressive performance too!
📦 Repos
Fast and Expressive LLM Inference with RadixAttention and SGLang
Beats vLLM on several benchmarks, and comes from some of the same authors as vLLM.
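To get a feel for the frontend language, here's a small example adapted from the SGLang README at the time of writing (it assumes a locally launched SGLang server at port 30000; check the repo for the current API):

```python
# Adapted from the SGLang README; requires a running SGLang server.
from sglang import (RuntimeEndpoint, assistant, function, gen,
                    set_default_backend, system, user)

@function
def multi_turn_question(s, question_1, question_2):
    s += system("You are a helpful assistant.")
    s += user(question_1)
    s += assistant(gen("answer_1", max_tokens=256))
    # The shared prefix above can be reused across calls via
    # RadixAttention's KV-cache prefix sharing.
    s += user(question_2)
    s += assistant(gen("answer_2", max_tokens=256))

set_default_backend(RuntimeEndpoint("http://localhost:30000"))
state = multi_turn_question.run(
    question_1="What is the capital of France?",
    question_2="And what is its population?",
)
print(state["answer_1"], state["answer_2"])
```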
nanotron: minimal LLM training library with 3D parallelism
By Hugging Face
Medusa 2 release: axolotl training and self-distillation
It increases inference speed by up to 3.6x over the original model
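Very roughly, Medusa adds extra decoding heads that each guess a token a few positions ahead, and the base model then verifies the drafted continuation in a single forward pass. A toy sketch of greedy verification (not the actual Medusa code; the tree attention and self-distillation training live in the repo):

```python
import torch

def accept_draft(base_logits: torch.Tensor, draft_tokens: torch.Tensor) -> int:
    """Greedy verification: accept the longest prefix of drafted tokens
    that matches the base model's own argmax predictions.

    base_logits:  [k, vocab] base-model logits at the k drafted positions
    draft_tokens: [k] token ids proposed by the Medusa heads
    """
    preds = base_logits.argmax(dim=-1)
    matches = (preds == draft_tokens).long()
    # cumprod zeroes everything after the first mismatch.
    return int(matches.cumprod(dim=0).sum().item())

# e.g. 4 drafted positions over a toy vocab of 10
logits = torch.randn(4, 10)
draft = logits.argmax(dim=-1).clone()
draft[2] = (draft[2] + 1) % 10        # corrupt the 3rd guess
print(accept_draft(logits, draft))    # -> 2 accepted tokens
```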
📄 Papers
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
No measurable loss in perplexity or downstream task performance at 50% sparsity for OPT-175B is quite impressive!
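For intuition on what 50% unstructured sparsity means, here's a simple magnitude-pruning sketch. Note this is not SparseGPT's method, which instead solves a layer-wise reconstruction using second-order (Hessian) information; this only illustrates the sparsity pattern:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude weights (NOT SparseGPT's approach,
    which also updates the surviving weights to compensate)."""
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(1024, 1024)
w_sparse = magnitude_prune(w, 0.5)
print(f"sparsity: {(w_sparse == 0).float().mean():.2%}")
```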
Benchmarking Large Multimodal Models against Common Corruptions
Interestingly, CogVLM seems to be doing really well. You can find that model here.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
State space models go multimodal. From the abstract “Vim achieves higher performance compared to well-established vision transformers like DeiT, while also demonstrating significantly improved computation & memory efficiency”
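To make "bidirectional state space model" concrete, here is a toy diagonal linear SSM scanned over a patch sequence in both directions. This is a gross simplification for illustration only: Vim's blocks use Mamba's selective, input-dependent scan, not fixed parameters.

```python
import torch
import torch.nn as nn

class ToyBidirectionalSSM(nn.Module):
    """Toy diagonal SSM: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t,
    run forward and backward over the sequence and summed."""

    def __init__(self, dim: int):
        super().__init__()
        self.a = nn.Parameter(torch.rand(dim) * 0.9)  # per-channel decay
        self.b = nn.Parameter(torch.ones(dim))
        self.c = nn.Parameter(torch.ones(dim))

    def scan(self, x: torch.Tensor) -> torch.Tensor:
        # x: [seq, dim]; sequential recurrence for clarity, not speed
        h = torch.zeros_like(x[0])
        ys = []
        for t in range(x.shape[0]):
            h = self.a * h + self.b * x[t]
            ys.append(self.c * h)
        return torch.stack(ys)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Backward pass: scan the reversed sequence, then re-align.
        return self.scan(x) + self.scan(x.flip(0)).flip(0)

patches = torch.randn(196, 192)  # e.g. 14x14 patch tokens of dim 192
out = ToyBidirectionalSSM(192)(patches)
```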
Learning to Filter Context for Retrieval-Augmented Generaton
WARM: On the Benefits of Weight Averaged Reward Models
They merge multiple reward models into a single one that's more reliable and robust; weight averaging efficiently captures the strengths of each model while mitigating reward hacking.
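The averaging step itself is simple when the reward models share an architecture. A minimal sketch, assuming (as in the paper) the models are fine-tunes from a shared initialization:

```python
import torch

def warm_average(state_dicts: list[dict]) -> dict:
    """Uniformly average the parameters of several reward-model fine-tunes.
    Assumes identical architectures; buffers are averaged as floats too,
    which is fine for a sketch but sloppy for integer buffers."""
    avg = {}
    for key in state_dicts[0]:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg

# merged = warm_average([m.state_dict() for m in reward_models])
# reward_model.load_state_dict(merged)
```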
📚 Resources
State-of-the-art Code Generation with AlphaCodium – From Prompt Engineering to Flow Engineering
The significance of this article is that it captures an underlying trend in building with LLMs: breaking a domain-specific task down into distinct stages and applying an LLM at each stage, which improves over single-prompt strategies. So-called “Flow Engineering”. There is also an implementation (AGPL-licensed, not permissive) and a paper.
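A minimal sketch of the pattern, with hypothetical stages; AlphaCodium's actual flow is richer (it also generates AI tests and executes the code against them). The `llm` parameter stands in for any prompt-to-completion callable:

```python
from typing import Callable

def solve(problem: str, llm: Callable[[str], str], max_iters: int = 3) -> str:
    """Hypothetical multi-stage flow: reflect, draft, then critique-and-fix."""
    # Stage 1: reason about the problem before writing any code.
    notes = llm(f"Restate this problem and list edge cases:\n{problem}")
    # Stage 2: draft a solution conditioned on the reflection.
    code = llm(f"Problem:\n{problem}\nNotes:\n{notes}\nWrite a solution.")
    # Stage 3: iterate with critique instead of trusting the first draft.
    for _ in range(max_iters):
        review = llm(f"Review this code against the edge cases:\n{code}\n"
                     "Reply LGTM if correct.")
        if "LGTM" in review:
            break
        code = llm(f"Fix the code according to this review:\n{review}\n\n{code}")
    return code
```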
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning
Efficient instruction tuning that focuses on the most informative samples: the LLM's current abilities are used to bootstrap selection of the right training subset.
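A common generic recipe for picking a diverse subset from embeddings is k-center greedy selection, sketched below. This is an illustration of diversity-based sampling in general, not the paper's exact algorithm, which additionally evolves the selection with the model across tuning rounds:

```python
import numpy as np

def k_center_greedy(embeddings: np.ndarray, k: int) -> list[int]:
    """Greedily pick k points that maximize coverage: each new point is the
    one farthest from everything selected so far."""
    selected = [0]  # start from an arbitrary point
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    for _ in range(k - 1):
        idx = int(dists.argmax())
        selected.append(idx)
        # Track each point's distance to its nearest selected center.
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[idx], axis=1))
    return selected

subset = k_center_greedy(np.random.randn(1000, 64), 10)
print(subset)
```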
Foundations of Vector Retrieval (Book)
Free on arXiv
Trending page for AI/ML Papers
Based on GitHub stars and X likes
ML Engineering by DeepSpeed/HF Transformers/PyTorch contributor (Book)
It’s free, and somewhat WIP
Want more? Follow me on Twitter! @ricklamers