Stanford researchers drop SGLang: outperforms vLLM with up to 5x higher throughput
Week 4 of Coding with Intelligence
AlpacaEval 2.0 score goes from Mistral-7B-Instruct-v0.2's 14.72 to 30.22, and with further DPO sample selection to 34.86, ranking 2nd on the leaderboard. The best model on the leaderboard is "gpt-4-turbo", which is also the judge that picks the optimal responses. Interesting use of iteratively generating DPO data with a pairwise reward model.
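For the curious, here's a minimal sketch of what iterative DPO data generation with a pairwise reward model can look like. This is not the paper's exact recipe: `generate` and `pairwise_better` are hypothetical placeholders for sampling from the current policy and querying the pairwise reward model.

```python
# Sketch: build DPO preference pairs by ranking sampled responses with a
# pairwise reward model. `generate(prompt)` samples one response from the
# current policy; `pairwise_better(prompt, a, b)` returns True if the
# reward model prefers response `a` over `b`. Both are placeholders.

def build_dpo_pairs(prompts, generate, pairwise_better, n_samples=4):
    """For each prompt, sample candidates and keep the best/worst pair."""
    pairs = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_samples)]
        # Round-robin pairwise comparisons: a candidate's score is the
        # number of head-to-head comparisons it wins.
        wins = [0] * len(candidates)
        for i in range(len(candidates)):
            for j in range(i + 1, len(candidates)):
                if pairwise_better(prompt, candidates[i], candidates[j]):
                    wins[i] += 1
                else:
                    wins[j] += 1
        best = candidates[max(range(len(wins)), key=wins.__getitem__)]
        worst = candidates[min(range(len(wins)), key=wins.__getitem__)]
        pairs.append({"prompt": prompt, "chosen": best, "rejected": worst})
    return pairs

# The "iterative" part: train DPO on `pairs`, then regenerate candidates
# with the improved policy and repeat.
```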
Impressive performance too!
Beats vLLM on several benchmarks, and it's from the same authors as vLLM.
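For a taste of the frontend, here's a short example adapted from SGLang's README at the time of writing (the endpoint URL and questions are illustrative; check the repo for the current API):

```python
import sglang as sgl

# Define a multi-turn program; sgl.gen() marks where the model generates.
@sgl.function
def multi_turn_question(s, question_1, question_2):
    s += sgl.system("You are a helpful assistant.")
    s += sgl.user(question_1)
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=256))
    s += sgl.user(question_2)
    s += sgl.assistant(sgl.gen("answer_2", max_tokens=256))

# Point the frontend at a running SGLang server (launched separately).
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = multi_turn_question.run(
    question_1="What is the capital of France?",
    question_2="And roughly how many people live there?",
)
print(state["answer_2"])
```

Shared prefixes across such programs are what SGLang's runtime caches and reuses, which is where much of the throughput win comes from.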
By Hugging Face
It increases inference speed by up to 3.6x over the original model
No measurable loss in downstream task performance or perplexity at 50% sparsity on the 175B OPT model is quite impressive!
Interestingly, CogVLM seems to be doing really well. You can find that model here.
State space models go multimodal. From the abstract “Vim achieves higher performance compared to well-established vision transformers like DeiT, while also demonstrating significantly improved computation & memory efficiency”
They merge multiple reward models into one that's more reliable and robust: WARM averages their weights, efficiently capturing the best of each to mitigate reward hacking.
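A minimal sketch of the weight-averaging idea, assuming all reward models were fine-tuned from the same base checkpoint so their parameters line up and can be averaged directly (this is the gist of WARM, not its full recipe):

```python
import torch

def average_reward_models(state_dicts):
    """Uniformly average the weights of several fine-tuned reward models.

    All models must share the same architecture, e.g. fine-tuned from a
    single base checkpoint with different seeds or data orders.
    """
    avg = {}
    for key in state_dicts[0]:
        avg[key] = torch.stack(
            [sd[key].float() for sd in state_dicts]
        ).mean(dim=0)
    return avg

# Usage: average N reward models into one, then load the result back.
# merged = average_reward_models([m.state_dict() for m in reward_models])
# reward_model.load_state_dict(merged)
```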
The significance of this article is that it captures an underlying trend in building with LLMs: breaking down a domain-specific task into distinct stages and using an LLM at each stage to improve over single-prompt strategies, so-called "Flow Engineering". There is also an implementation (AGPL licensed, not permissive) and a paper.
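To make the pattern concrete, here's a hypothetical multi-stage flow in the spirit of the article. These are not AlphaCodium's actual stages, and `llm` is a placeholder completion function (prompt in, text out):

```python
# Sketch of "flow engineering": decompose a coding task into stages and
# call the LLM at each stage instead of relying on one big prompt.

def solve_coding_problem(problem: str, llm) -> str:
    # Stage 1: reason about the problem in natural language first.
    reflection = llm(f"Restate this problem and list its edge cases:\n{problem}")
    # Stage 2: generate candidate tests before any code exists.
    tests = llm(f"Write input/output test cases for:\n{problem}\n\n{reflection}")
    # Stage 3: draft a solution conditioned on the analysis and tests.
    code = llm(
        f"Write a solution for:\n{problem}\n\n"
        f"Analysis:\n{reflection}\nTests:\n{tests}"
    )
    # Stage 4: iterate, asking the model to fix failures on its own tests.
    for _ in range(3):
        verdict = llm(
            f"Do these tests pass for this code? Answer PASS or explain:\n"
            f"{code}\n{tests}"
        )
        if verdict.strip().startswith("PASS"):
            break
        code = llm(f"Fix the code given this failure report:\n{verdict}\n\nCode:\n{code}")
    return code
```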
Efficient fine-tuning that focuses on the most relevant samples: it bootstraps from the LLM's current abilities to select the right training subset.
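One plausible reading of that idea, sketched with Hugging Face transformers: score candidate samples with the current model's loss and keep the ones it finds hardest. The paper's actual selection criterion may differ, and `keep_fraction` is an illustrative knob.

```python
import math
import torch

def select_training_subset(model, tokenizer, samples, keep_fraction=0.2):
    """Score candidates with the current causal LM and keep the
    highest-loss subset, a simple stand-in for bootstrapped selection."""
    scored = []
    model.eval()
    with torch.no_grad():
        for text in samples:
            ids = tokenizer(text, return_tensors="pt").input_ids
            loss = model(ids, labels=ids).loss.item()  # mean NLL per token
            scored.append((loss, text))
    scored.sort(reverse=True)  # hardest samples first
    k = max(1, math.floor(len(scored) * keep_fraction))
    return [text for _, text in scored[:k]]
```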
Free on arXiv
Based on GitHub stars and X likes
It’s free, and somewhat WIP
Want more? Follow me on Twitter! @ricklamers