π° News
Reason 1: Meta releases Llama 3 - it comes in 70B and 8B variants
Benchmarks indicate itβs competitive with GPT-4 and Claude Opus.
Reason 2: Groq ships Llama 3 @ 300 tokens/s with tool calling
Frontier level tool calling scores on the Berkeley Tool Calling Leaderboard, competitive with GPT-4 and Claude Opus. I personally worked on shipping this at Groq, what a ride.
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Apple releases their own LLM (biggest size 3B and Instruct tuned). Itβs also on HF.
BigAction, an open-source initiative to collect datasets to train and evaluate large action models
Interesting initiative. Also check out the LaVague on GitHub if you're interested in Large Action Models. Repo.
98% attack success rate with 75-length prefix.
VASA-1 real-time talking faces by Microsoft Research
Super convincing! And did I mention itβs real time?
JAT: Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
Really cool exploration of generalist agents with a unified model (GPT-2 + ViT). What if this gets scaled up? π
π¦ Repos
Code isolation for safe dynamic code execution (code generated by LLMs). Uses Docker containers for isolation (YMMV - isolation isn't the focus of Docker Containers by default - see their docs).
Penzai: A JAX research toolkit for building, editing, and visualizing neural networks.
Training-free Context Window extension
Llama 3 long-context soon? π
Very useful tools for the Mixtral 8x22B token-native tokenizer.
π Papers
Deconstructing In-Context Learning: Understanding Prompts via Corruption
> We find that repeating text within the prompt boosts model performance, and bigger models (β₯30B) are more sensitive to the semantics of the prompt.
Analyzing and Improving the Training Dynamics of Diffusion Models
Phenomenally written paper. Implementation on GitHub.
OpenAI introduces technique to avoid prompt injections
A simple idea: privileged instructions that should take precedence.
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Sparse inference of dense models
In-Context Learning State Vector with Inner and Momentum Optimization
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
There's also a repo and a demo (see repo, link looks subject to change). Weights promised to come soon. By Tencent researchers.
π Resources
A Visual Guide to Vision Transfromers
Very cool interactive experience, best viewed on desktop.
Must-read from fine-tuning Guru Maxime Labonne where he dives into practical details of fine-tuning Llama 3 using ORPO: Monolithic Preference Optimization without Reference Model.
π· FineWeb: 15T high-quality web tokens by Hugging Face
15T is how many tokens Llama 3 used, now that's a commodity too! Thanks π€!
Phi-3 mini (3.8B) 128K model by Microsoft
Quite decent for such a small model. Try it at LMSYS Chat.
file_search tool that supports up to 10K files, tool_choice support, model configuration: top_p, temperature, response_format support.
Exploration of what comes after scaling up text tokens
Spoiler: unified video-language generative model, because only video data gives orders of magnitude increase.
Reward model distilled from Llama 3 lands 2nd spot in benchmark
10M annotations are expensive, and now available to all :)
Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
Fine-tuning for agent use-cases using agent trajectories is a powerful under-explored paradigm. Rich research opportunities here using limited compute using strong base models like Llama 3 70B!
Want more? Follow me on X! @ricklamers