Why Llama 3 changes the game

Apr 24, 2024

📰 News

Reason 1: Meta releases Llama 3 - it comes in 70B and 8B variants
Benchmarks indicate it’s competitive with GPT-4 and Claude Opus.
Reason 2: Groq ships Llama 3 @ 300 tokens/s with tool calling
Frontier level tool calling scores on the Berkeley Tool Calling Leaderboard, competitive with GPT-4 and Claude Opus. I personally worked on shipping this at Groq, what a ride.
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Apple releases their own LLM (biggest size 3B and Instruct tuned). It’s also on HF.
BigAction, an open-source initiative to collect datasets to train and evaluate large action models
Interesting initiative. Also check out the LaVague on GitHub if you're interested in Large Action Models. Repo.
Llama 3 zero-shot jailbreak
98% attack success rate with 75-length prefix.
VASA-1 real-time talking faces by Microsoft Research
Super convincing! And did I mention it’s real time?
JAT: Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
Really cool exploration of generalist agents with a unified model (GPT-2 + ViT). What if this gets scaled up? 👀

📦 Repos

AgentRun
Code isolation for safe dynamic code execution (code generated by LLMs). Uses Docker containers for isolation (YMMV - isolation isn't the focus of Docker Containers by default - see their docs).
Penzai: A JAX research toolkit for building, editing, and visualizing neural networks.
Training-free Context Window extension
Llama 3 long-context soon? 👀
Mistral Common
Very useful tools for the Mixtral 8x22B token-native tokenizer.

📄 Papers

📚 Resources

A Visual Guide to Vision Transfromers
Very cool interactive experience, best viewed on desktop.
Fine-tune Llama 3 with ORPO
Must-read from fine-tuning Guru Maxime Labonne where he dives into practical details of fine-tuning Llama 3 using ORPO: Monolithic Preference Optimization without Reference Model.
🍷 FineWeb: 15T high-quality web tokens by Hugging Face
15T is how many tokens Llama 3 used, now that's a commodity too! Thanks 🤗!
Phi-3 mini (3.8B) 128K model by Microsoft
Quite decent for such a small model. Try it at LMSYS Chat.
OpenAI Assistants API v2
file_search tool that supports up to 10K files, tool_choice support, model configuration: top_p, temperature, response_format support.
Exploration of what comes after scaling up text tokens
Spoiler: unified video-language generative model, because only video data gives orders of magnitude increase.
Reward model distilled from Llama 3 lands 2nd spot in benchmark
10M annotations are expensive, and now available to all :)
Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
Fine-tuning for agent use-cases using agent trajectories is a powerful under-explored paradigm. Rich research opportunities here using limited compute using strong base models like Llama 3 70B!

Want more? Follow me on X! @ricklamers

Coding with Intelligence