📰 News
OpenAI working on new reasoning technology under code name ‘Strawberry’ (Q*)
Claude 3.5 Sonnet doubles max output tokens to 8192
More tokens is more better :)
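If you want the larger output window via the API, here is a minimal sketch assuming the Anthropic Python SDK and the beta opt-in header documented at launch; double-check the current docs, since the header may no longer be required.

```python
# Minimal sketch: requesting the 8192-token output limit from Claude 3.5 Sonnet.
# Assumptions: ANTHROPIC_API_KEY is set, and the launch-time beta header
# "max-tokens-3-5-sonnet-2024-07-15" is still needed to unlock 8192 output
# tokens; if it has since been removed, drop extra_headers.
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=8192,  # previously capped at 4096
    extra_headers={"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},
    messages=[{"role": "user", "content": "Write a long, detailed essay about tokenizers."}],
)
print(message.content[0].text)
```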
Groq's release of the Tool Use Llama 3 fine-tuned models was well received
Thanks y'all for the positive comments! 🫶
📦 Repos
📄 Papers
AgentInstruct: Toward Generative Teaching with Agentic Flows
This paper is what the title of the newsletter refers to. AgentInstruct isn't the only work describing this fundamental approach of using agents to generate synthetic data and then training on it with backprop, but I think it's a good paper to get started on the topic; a minimal sketch of the idea follows below.
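Since the core loop is easy to picture in code, here is a hedged sketch of agent-generated synthetic training data. The two-role generator/refiner split, the prompts, the model name, and the build_training_set helper are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of agent-generated synthetic training data.
# Assumptions (not from the paper): an OpenAI-compatible client, the gpt-4o
# model name, a simple two-agent split, and a JSONL output format.
# AgentInstruct itself uses a much richer multi-agent flow.
import json
from openai import OpenAI

client = OpenAI()

def ask(system: str, user: str) -> str:
    """Single chat-completion call used by both agents."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def build_training_set(seed_texts: list[str], path: str = "synthetic.jsonl") -> None:
    with open(path, "w") as f:
        for seed in seed_texts:
            # Agent 1: turn a raw seed document into a challenging instruction.
            instruction = ask(
                "You write one challenging instruction grounded in the given text.",
                seed,
            )
            # Agent 2: produce a careful, high-quality answer to that instruction.
            answer = ask(
                "You answer the instruction as carefully and completely as possible.",
                instruction,
            )
            # Each line becomes one (instruction, response) pair to fine-tune on.
            f.write(json.dumps({"prompt": instruction, "completion": answer}) + "\n")

if __name__ == "__main__":
    build_training_set(["Transformers use self-attention to mix information across tokens."])
```

The resulting JSONL is what you would then fine-tune a smaller model on with ordinary supervised backprop.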
Cradle: Empowering Foundation Agents Towards General Computer Control
by the Google DeepMind team.
Teaching Transformers Causal Reasoning through Axiomatic Training
Generative Space-Time Enhancement for Video Generation
Demos look very impressive!
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Empirical Analysis of Layer Dynamics in Pretrained Transformer Models
Prover-Verifier Games improve legibility of language model outputs
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
RLHF is finicky; don't dive in without grokking this paper.
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
📱 Demos
Groq Tool Use Demo on Hugging Face 🤗 Spaces
In case you haven't played with the tool use models yet! Very nice deployment experience on HF Spaces; you should try it with Gradio.
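For reference, here is a minimal sketch of the kind of tool-use round trip the Space demonstrates, using the Groq Python SDK. The model ID, the get_weather tool, and its canned implementation are assumptions for illustration; check the Space and the Groq docs for the current names.

```python
# Minimal sketch of a tool-use round trip with a Groq-hosted Llama 3 tool-use model.
# Assumptions: the groq Python SDK is installed, GROQ_API_KEY is set, and the
# model ID and get_weather tool below are illustrative stand-ins.
import json
from groq import Groq

client = Groq()
MODEL = "llama3-groq-70b-8192-tool-use-preview"  # assumed model ID

def get_weather(city: str) -> str:
    """Stand-in tool; a real demo would call a weather API here."""
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 24})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Amsterdam?"}]

# First call: let the model decide whether to call the tool.
first = client.chat.completions.create(
    model=MODEL, messages=messages, tools=tools, tool_choice="auto"
)
msg = first.choices[0].message

if msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        # Execute the requested tool and feed the result back to the model.
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "name": call.function.name,
            "content": get_weather(**args),
        })
    # Second call: the model turns the tool output into a final answer.
    final = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```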
📚 Resources
What happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives
I especially like their recommendation at the end about making it easier to include the right context for the LLM. I think many ChatGPT-esque products still make that hard today. Cursor comes to mind as a product that has done this quite well with @-tagging, but it is limited to the coding domain.
Run CUDA, unmodified, on AMD GPUs
By a UK-based HPC consultant.
Interesting take. Still not a full solution to the agent reliability problem, however.
Want more? Follow me on X! @ricklamers