8x faster inference with Flash-Decoding - open-source models are getting faster
Week 42 of Coding with Intelligence
📰 News
Together AI: Flash-Decoding for long-context inference
Up to 8x inference speedup on long sequences, from Tri Dao, the same author who brought us FlashAttention. The key idea is to split the keys/values of the KV cache into chunks, compute attention over each chunk in parallel, and merge the partial results with a log-sum-exp rescaling.
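To make the split-and-merge step concrete, here is a minimal sequential sketch of the math in Python/NumPy. The real kernel runs the chunks in parallel on the GPU and then reduces; the function and variable names below are mine, not from the Together AI code.

```python
import numpy as np

def chunked_decode_attention(q, K, V, chunk_size=1024):
    """Attention for a single decoding query against a long KV cache,
    computed chunk by chunk and merged with a log-sum-exp rescaling.
    Numerically equivalent to softmax(K @ q / sqrt(d)) @ V."""
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    m = -np.inf           # running max of attention scores
    s = 0.0               # running sum of exp(scores - m)
    acc = np.zeros(d)     # running weighted sum of values

    for start in range(0, K.shape[0], chunk_size):
        k_chunk = K[start:start + chunk_size]
        v_chunk = V[start:start + chunk_size]
        scores = (k_chunk @ q) * scale
        m_new = max(m, scores.max())
        # Rescale the previous accumulators to the new max, then add this chunk.
        correction = np.exp(m - m_new)
        p = np.exp(scores - m_new)
        s = s * correction + p.sum()
        acc = acc * correction + p @ v_chunk
        m = m_new
    return acc / s

# Sanity check against naive attention on random data.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((8192, 64))
V = rng.standard_normal((8192, 64))
scores = (K @ q) / np.sqrt(64)
weights = np.exp(scores - scores.max())
ref = (weights / weights.sum()) @ V
assert np.allclose(chunked_decode_attention(q, K, V), ref)
```

Because each chunk only needs its own partial max and sum, the chunks can be processed by independent GPU thread blocks and combined in a cheap final reduction, which is what recovers parallelism when the batch size is small and the sequence is long.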
📄 Papers
RAFA: A Framework for LLM Agents with Provable Sample Efficiency
BitNet: Scaling 1-bit Transformers for Large Language Models
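The core of BitNet is replacing nn.Linear with a BitLinear layer that uses 1-bit weights. Below is a simplified sketch of that idea: latent full-precision weights are binarized to sign(W - mean(W)), scaled by the mean absolute weight, and trained with a straight-through estimator. Names are mine, and the paper's BitLinear additionally quantizes activations and normalizes inputs, which this sketch omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Module):
    """Simplified 1-bit linear layer in the spirit of BitNet's BitLinear.
    Full-precision weights are kept for the optimizer; the forward pass uses
    binarized weights, with a straight-through estimator so gradients still
    flow to the latent weights."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean()                      # per-tensor scaling factor
        w_bin = torch.sign(w - w.mean()) * scale    # 1-bit weights {-scale, +scale}
        # Straight-through estimator: binarized weights forward, identity backward.
        w_ste = w + (w_bin - w).detach()
        return F.linear(x, w_ste)

layer = BitLinearSketch(64, 128)
out = layer(torch.randn(4, 64))
out.sum().backward()                                # gradients reach layer.weight
print(out.shape, layer.weight.grad.shape)
```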
🛠️ Products
📚 Resources
Finetuning LLMs with LoRA and QLoRA: Insights from Hundreds of Experiments
Fireside Chat with Ilya Sutskever and Jensen Huang: AI Today and Vision of the Future
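For context on the LoRA/QLoRA article above: LoRA freezes the pretrained weight matrix and learns only a low-rank update BA, so a layer trains r*(d_in + d_out) parameters instead of d_in*d_out. A minimal PyTorch sketch of the idea (class and parameter names are mine, not from the article):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update.
    Effective weight: W + (alpha / r) * B @ A, with only A and B trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze pretrained weights
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
out = layer(torch.randn(2, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)   # far fewer trainable parameters than 768*768
```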
Want more? Follow me on Twitter! @ricklamers