This week’s newsletter is slightly delayed because of a PACKED week at the AI Engineer World Fair in San Francisco. I hope you can forgive me; to make up for it, I've linked to timestamps for all my favorite talks from the conference.
I've also included the slides from my talk (it wasn’t recorded, but plenty of people showed up)!
📰 News
Some of my personal favorites:
Mixtral team: https://youtu.be/R0X7mPagRiE?t=2
Rémi from Outlines: https://youtu.be/R0X7mPagRiE?t=767
Daniel from Unsloth: https://youtu.be/R0X7mPagRiE?t=10470
Model Merging by Maxime Labonne: https://youtu.be/R0X7mPagRiE?t=11715
llamafile: https://youtu.be/5zE2sMka620?t=3163
Cursor: https://youtu.be/5zE2sMka620?t=11442
Dylan Patel from SemiAnalysis: https://youtu.be/JVSKlEmUr0k?t=1893
Cognition: https://youtu.be/JVSKlEmUr0k?t=13694
moondream: https://youtu.be/vaIiNZoXymg?t=9244
Cartesia: https://youtu.be/vaIiNZoXymg?t=11695
Gemma 2 27B released + technical report
And it’s available on Kaggle
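If you want to kick the tires, here's a minimal sketch of loading it with Hugging Face transformers. I'm assuming the model id is google/gemma-2-27b (check the model card for the exact id), and 27B of weights needs serious GPU memory or quantization:

```python
# Minimal sketch: loading Gemma 2 27B via Hugging Face transformers.
# The model id below is an assumption; verify it on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-27b"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs `accelerate`

inputs = tokenizer("The Gemma 2 technical report shows", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```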
📦 Repos
ReaL: Efficient RLHF Training for LLMs with Parameter Reallocation
PPO training benchmarks on LLaMA 7B, LLaMA 13B, and CodeLLaMA 34B, up to the largest LLaMA 70B.
TL;DR: evals should be easy to run; that's the promise of this repo. Includes MMLU, MATH, GPQA, DROP, MGSM, and HumanEval.
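To make the promise concrete, here's a hypothetical sketch (not the repo's actual API) of what a minimal multiple-choice eval like MMLU boils down to: ask the model for a letter, compare against the gold letter, report accuracy.

```python
# Hypothetical sketch, not the linked repo's API: the core of a
# multiple-choice eval (MMLU-style) is just prompt -> letter -> exact match.
from dataclasses import dataclass
from typing import Callable

@dataclass
class MCQuestion:
    prompt: str  # question text plus lettered answer options
    gold: str    # correct option letter, e.g. "B"

def run_mc_eval(questions: list[MCQuestion], ask: Callable[[str], str]) -> float:
    """`ask` wraps whatever model client you use and returns the raw reply text."""
    correct = 0
    for q in questions:
        reply = ask(q.prompt + "\nAnswer with a single letter.")
        pred = next((c for c in reply.upper() if c in "ABCD"), "")  # first A-D in the reply
        correct += int(pred == q.gold)
    return correct / len(questions) if questions else 0.0
```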
📄 Papers
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
Evaluating agents in simulated but representative environments is a good first step toward figuring out which practices and techniques improve agentic performance on real-world tasks.
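To illustrate the idea (with invented names, not WorkBench's actual API): the agent acts on a sandboxed environment through tools, and success is judged by the final environment state rather than by the agent's text.

```python
# Hypothetical sketch of outcome-based agent evaluation in a sandboxed
# "workplace": the agent mutates a mock calendar through tools, and the check
# compares the resulting environment state with the expected state.
class MockCalendar:
    def __init__(self):
        self.events = {}  # event name -> ISO datetime string

    def create_event(self, name: str, start: str):
        self.events[name] = start

def evaluate_task(agent_fn, expected_events: dict) -> bool:
    env = MockCalendar()
    agent_fn(env)  # the agent interacts only through the environment's tools
    return env.events == expected_events  # judge the outcome, not the transcript

# A trivial scripted "agent" standing in for an LLM-driven one:
ok = evaluate_task(
    lambda env: env.create_event("1:1 with Sam", "2024-07-01T10:00"),
    {"1:1 with Sam": "2024-07-01T10:00"},
)
print("task passed:", ok)
```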
Detecting hallucinations in large language models using semantic entropy
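The core idea, roughly sketched below: sample several answers to the same question, cluster answers that mean the same thing (the paper uses bidirectional entailment; here `same_meaning` is a placeholder predicate you'd supply), and compute entropy over the meaning clusters. High entropy over meanings, not over exact strings, is the hallucination signal.

```python
# Rough sketch of semantic entropy: entropy over clusters of semantically
# equivalent answers, rather than over the raw sampled strings.
import math
from typing import Callable

def semantic_entropy(answers: list[str], same_meaning: Callable[[str, str], bool]) -> float:
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    n = len(answers)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log(p) for p in probs)

# Toy usage: exact match stands in for an entailment model.
samples = ["Paris", "Paris", "Paris", "Lyon", "Paris"]
print(semantic_entropy(samples, lambda a, b: a == b))  # low entropy => answers agree
```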
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
"MoA increases the effective context length by 3.9× with the same average attention span, boosting retrieval accuracy by 1.5−7.1× over the uniform-attention baseline across Vicuna-7B, Vicuna-13B, and Llama3-8B models ... MoA achieves a 1.2−1.4× GPU memory reduction and boosts decode throughput by 5.5−6.7× for 7B and 13B dense models on a single GPU, with minimal impact on performance."
🛠️ Products
📚 Resources
My AI Engineer talk: Tool Use with Open-Source LLMs
Send me your thoughts on the subject or any questions on X!
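For a feel of the pattern the talk covers, here's a generic tool-use loop (not the exact code from my slides); `chat` stands in for whatever client you use to call an open-source model:

```python
# Generic sketch of a tool-use loop: the model is asked to reply with a JSON
# tool call, the host parses it, runs the matching Python function, and feeds
# the result back for a final answer.
import json
from typing import Callable

TOOLS = {
    "get_weather": lambda city: f"22C and sunny in {city}",  # stub implementation
}

SYSTEM = (
    "You can call tools. To call one, reply ONLY with JSON like "
    '{"tool": "get_weather", "arguments": {"city": "Paris"}}.'
)

def run_with_tools(user_msg: str, chat: Callable[[list[dict]], str]) -> str:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": user_msg}]
    reply = chat(messages)
    try:
        call = json.loads(reply)                      # did the model emit a tool call?
        result = TOOLS[call["tool"]](**call["arguments"])
        messages += [{"role": "assistant", "content": reply},
                     {"role": "user", "content": f"Tool result: {result}"}]
        return chat(messages)                         # let the model use the result
    except (json.JSONDecodeError, KeyError, TypeError):
        return reply                                  # plain answer, no tool needed
```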
From bare metal to a 70B model: infrastructure set-up and scripts
An Intuitive Explanation of Sparse Autoencoders for LLM Interpretability
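If you want the gist in code, here's a minimal sparse autoencoder of the kind the post explains: LLM activations are encoded into a wider, mostly-zero feature vector and decoded back, with an L1 penalty pushing features toward sparsity. Sizes and the penalty weight below are illustrative, not tuned values.

```python
# Minimal sparse autoencoder sketch: reconstruct activations through an
# overcomplete feature layer, with an L1 penalty encouraging sparse features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder(d_model=768, d_features=768 * 8)  # overcomplete dictionary
acts = torch.randn(32, 768)                               # stand-in for LLM activations
recon, feats = sae(acts)
loss = nn.functional.mse_loss(recon, acts) + 1e-3 * feats.abs().mean()  # recon + L1 sparsity
loss.backward()
```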
Want more? Follow me on X! @ricklamers