This week’s newsletter is slightly delayed because of a PACKED week at the AI Engineer World Fair in San Francisco. I hope you can forgive me; to make up for it, I've linked to timestamps for all my favorite talks from the conference.
I've also included the slides from my talk (it wasn’t recorded, but plenty of people showed up)!
📰 News
Some of my personal favorites:
Mixtral team: https://youtu.be/R0X7mPagRiE?t=2
Rémi from Outlines: https://youtu.be/R0X7mPagRiE?t=767
Daniel from Unsloth: https://youtu.be/R0X7mPagRiE?t=10470
Model Merging by Maxime Labonne: https://youtu.be/R0X7mPagRiE?t=11715
llamafile: https://youtu.be/5zE2sMka620?t=3163
Cursor: https://youtu.be/5zE2sMka620?t=11442
Dylan Patel from SemiAnalysis: https://youtu.be/JVSKlEmUr0k?t=1893
Cognition: https://youtu.be/JVSKlEmUr0k?t=13694
moondream: https://youtu.be/vaIiNZoXymg?t=9244
Cartesia: https://youtu.be/vaIiNZoXymg?t=11695
Gemma 2 27B released + technical report
And it’s available on Kaggle
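If you want to kick the tires, here's a minimal sketch of loading it with Hugging Face transformers. I'm assuming the model id is google/gemma-2-27b (check the model card for the exact id), and 27B of weights needs serious GPU memory or quantization:

```python
# Minimal sketch: loading Gemma 2 27B via Hugging Face transformers.
# The model id below is an assumption; verify it on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-27b"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs `accelerate`

inputs = tokenizer("The Gemma 2 technical report shows", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```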
📦 Repos
ReaL: Efficient RLHF Training for LLMs with Parameter Reallocation
PPO training benchmarks on LLaMA 7B, LLaMA 13B, and CodeLLaMA 34B, up to the largest LLaMA 70B.
TL;DR: evals should be easy to run; that's the promise of this repo. Includes MMLU, MATH, GPQA, DROP, MGSM, and HumanEval.
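To make the promise concrete, here's a hypothetical sketch (not the repo's actual API) of what a minimal multiple-choice eval like MMLU boils down to: ask the model for a letter, compare against the gold letter, report accuracy.

```python
# Hypothetical sketch, not the linked repo's API: the core of a
# multiple-choice eval (MMLU-style) is just prompt -> letter -> exact match.
from dataclasses import dataclass
from typing import Callable

@dataclass
class MCQuestion:
    prompt: str  # question text plus lettered answer options
    gold: str    # correct option letter, e.g. "B"

def run_mc_eval(questions: list[MCQuestion], ask: Callable[[str], str]) -> float:
    """`ask` wraps whatever model client you use and returns the raw reply text."""
    correct = 0
    for q in questions:
        reply = ask(q.prompt + "\nAnswer with a single letter.")
        pred = next((c for c in reply.upper() if c in "ABCD"), "")  # first A-D in the reply
        correct += int(pred == q.gold)
    return correct / len(questions) if questions else 0.0
```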
📄 Papers
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
Evaluating agents in simulated but representative environments is a good first step toward figuring out which practices and techniques improve agentic performance on real-world tasks.
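To illustrate the idea (with invented names, not WorkBench's actual API): the agent acts on a sandboxed environment through tools, and success is judged by the final environment state rather than by the agent's text.

```python
# Hypothetical sketch of outcome-based agent evaluation in a sandboxed
# "workplace": the agent mutates a mock calendar through tools, and the check
# compares the resulting environment state with the expected state.
class MockCalendar:
    def __init__(self):
        self.events = {}  # event name -> ISO datetime string

    def create_event(self, name: str, start: str):
        self.events[name] = start

def evaluate_task(agent_fn, expected_events: dict) -> bool:
    env = MockCalendar()
    agent_fn(env)  # the agent interacts only through the environment's tools
    return env.events == expected_events  # judge the outcome, not the transcript

# A trivial scripted "agent" standing in for an LLM-driven one:
ok = evaluate_task(
    lambda env: env.create_event("1:1 with Sam", "2024-07-01T10:00"),
    {"1:1 with Sam": "2024-07-01T10:00"},
)
print("task passed:", ok)
```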
Detecting hallucinations in large language models using semantic entropy
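The core idea, roughly sketched below: sample several answers to the same question, cluster answers that mean the same thing (the paper uses bidirectional entailment; here `same_meaning` is a placeholder predicate you'd supply), and compute entropy over the meaning clusters. High entropy over meanings, not over exact strings, is the hallucination signal.

```python
# Rough sketch of semantic entropy: entropy over clusters of semantically
# equivalent answers, rather than over the raw sampled strings.
import math
from typing import Callable

def semantic_entropy(answers: list[str], same_meaning: Callable[[str, str], bool]) -> float:
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    n = len(answers)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log(p) for p in probs)

# Toy usage: exact match stands in for an entailment model.
samples = ["Paris", "Paris", "Paris", "Lyon", "Paris"]
print(semantic_entropy(samples, lambda a, b: a == b))  # low entropy => answers agree
```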
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
"MoA increases the effective context length by 3.9× with the same average attention span, boosting retrieval accuracy by 1.5−7.1× over the uniform-attention baseline across Vicuna-7B, Vicuna-13B, and Llama3-8B models ... MoA achieves a 1.2−1.4× GPU memory reduction and boosts decode throughput by 5.5−6.7× for 7B and 13B dense models on a single GPU, with minimal impact on performance."
🛠️ Products
📚 Resources
My AI Engineer talk: Tool Use with Open-Source LLMs
Send me your thoughts on the subject or any questions on X!
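For a feel of the pattern the talk covers, here's a generic tool-use loop (not the exact code from my slides); `chat` stands in for whatever client you use to call an open-source model:

```python
# Generic sketch of a tool-use loop: the model is asked to reply with a JSON
# tool call, the host parses it, runs the matching Python function, and feeds
# the result back for a final answer.
import json
from typing import Callable

TOOLS = {
    "get_weather": lambda city: f"22C and sunny in {city}",  # stub implementation
}

SYSTEM = (
    "You can call tools. To call one, reply ONLY with JSON like "
    '{"tool": "get_weather", "arguments": {"city": "Paris"}}.'
)

def run_with_tools(user_msg: str, chat: Callable[[list[dict]], str]) -> str:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": user_msg}]
    reply = chat(messages)
    try:
        call = json.loads(reply)                      # did the model emit a tool call?
        result = TOOLS[call["tool"]](**call["arguments"])
        messages += [{"role": "assistant", "content": reply},
                     {"role": "user", "content": f"Tool result: {result}"}]
        return chat(messages)                         # let the model use the result
    except (json.JSONDecodeError, KeyError, TypeError):
        return reply                                  # plain answer, no tool needed
```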
From bare metal to a 70B model: infrastructure set-up and scripts
An Intuitive Explanation of Sparse Autoencoders for LLM Interpretability
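If you want the gist in code, here's a minimal sparse autoencoder of the kind the post explains: LLM activations are encoded into a wider, mostly-zero feature vector and decoded back, with an L1 penalty pushing features toward sparsity. Sizes and the penalty weight below are illustrative, not tuned values.

```python
# Minimal sparse autoencoder sketch: reconstruct activations through an
# overcomplete feature layer, with an L1 penalty encouraging sparse features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder(d_model=768, d_features=768 * 8)  # overcomplete dictionary
acts = torch.randn(32, 768)                               # stand-in for LLM activations
recon, feats = sae(acts)
loss = nn.functional.mse_loss(recon, acts) + 1e-3 * feats.abs().mean()  # recon + L1 sparsity
loss.backward()
```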
Want more? Follow me on X! @ricklamers