This week is DOUBLY packed since I couldn’t get last week’s release out in time due to a crunch period. I hope Llama 3.1 native tool use at Groq speeds (🏎️) makes up for it.
📰 News
Native tool use/function calling for Llama 3.1 lands on Groq API
I've worked on this personally, so let me know if you have any thoughts (@ricklamers on X)! It’s available for all model IDs with 3.1 in them. I even found a small trick to allow parallel tool calling, even though that’s not natively supported by the Llama 3.1 spec (you can disable it if you want using the OpenAI ‘parallel_tool_calls’ boolean in the chat completion payload).
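For reference, here’s a minimal sketch of what a tool-use request against the Groq API can look like (assuming the groq Python SDK; the get_weather tool, its schema, and the exact model ID are illustrative placeholders, not part of the announcement):

```python
import json
from groq import Groq  # pip install groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Illustrative tool schema; the function name and parameters are placeholders.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",  # any 3.1 model ID should work per the announcement
    messages=[{"role": "user", "content": "What's the weather in Amsterdam and Paris?"}],
    tools=tools,
    tool_choice="auto",
    parallel_tool_calls=False,  # OpenAI-style flag to disable the parallel tool calling trick
)

# Print whatever tool calls the model decided to make.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```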
Answer.AI releases answerai-colbert-small-v1
Beats bge-small-en
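If you want to kick the tires, here’s a small sketch assuming the RAGatouille library and the answerdotai/answerai-colbert-small-v1 checkpoint on the Hugging Face Hub (the documents and query are made up):

```python
from ragatouille import RAGPretrainedModel  # pip install ragatouille

# Load the late-interaction (ColBERT-style) retriever from the Hub.
RAG = RAGPretrainedModel.from_pretrained("answerdotai/answerai-colbert-small-v1")

docs = [
    "Groq raised a $640M Series D to accelerate AI inference.",
    "Anthropic launched prompt caching for Claude.",
]

# Rerank a handful of candidate documents for a query (no index needed).
results = RAG.rerank(query="Who raised a Series D?", documents=docs, k=2)
print(results)
```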
Dream Machine 1.5 released
They claim improved image-to-video performance, better text support and better prompt adherence.
Groq raises $640M in funding (Series D) to accelerate AI inference
Let’s go TEAM!
xAI releases Grok-2
Grok != Groq; an impressive model on paper, but with a limited rollout as of yet.
Anthropic launches prompt caching for Claude
Not too dissimilar from Gemini’s recent launch of context caching, although the details differ (Claude’s cache has a 5-minute TTL).
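A rough sketch of what this looks like with the anthropic Python SDK, assuming the launch-time beta header; the model ID and the long system document are stand-ins:

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

long_context = "..." * 1000  # stand-in for a large document you reuse across requests

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=512,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},  # beta opt-in at launch
    system=[
        {
            "type": "text",
            "text": long_context,
            # Marks this block as cacheable; cached prefixes expire after ~5 minutes.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the document above."}],
)
print(response.content[0].text)
```

Subsequent calls that reuse the same cached prefix within the TTL should be cheaper and faster than resending the full context.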
Sakana AI releases The AI Scientist
Most arguments against the ability of LLMs to create net new knowledge dismiss an important direction: programs guided by LLMs (agents) collecting primary data to execute hypothesis testing at scale. I’m very bullish on the direction Sakana is presenting here.
MultiOn claims breakthrough on agentic performance with Agent Q
The blog post links to this paper. As MultiOn is not open sourcing the work directly, I would take the results with healthy skepticism and evaluate them for yourself. Notably, one of the authors of the Stanford DPO paper (Rafael Rafailov) contributed to the Agent Q project.
Ideogram 2.0 released
Very impressive-looking images and text, plus an increased level of control over things like color palette or style (realism).
📦 Repos
Flash Linear Attention in Triton
E.g., Mamba2 and RWKV6.
📄 Papers
By Contextual AI; an alternative to KTO and DPO.
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU Clusters
📱 Demos
Runs in-browser using WASM, neat!
📚 Resources
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention
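The core idea is a score_mod callback that rewrites each attention score before softmax, which torch.compile can fuse into a single FlashAttention-style kernel. A tiny sketch, assuming a PyTorch build that ships torch.nn.attention.flex_attention (the sizes and bias constant are arbitrary):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

device = "cuda" if torch.cuda.is_available() else "cpu"
B, H, S, D = 2, 4, 256, 64  # arbitrary sizes for the sketch
q, k, v = (torch.randn(B, H, S, D, device=device) for _ in range(3))

# score_mod is called per attention score; here: a simple distance penalty
# (ALiBi-flavored) combined with causal masking.
def score_mod(score, b, h, q_idx, kv_idx):
    score = score - 0.1 * (q_idx - kv_idx)                       # relative-position bias
    return torch.where(q_idx >= kv_idx, score, -float("inf"))    # causal mask

out = flex_attention(q, k, v, score_mod=score_mod)
print(out.shape)  # torch.Size([2, 4, 256, 64])

# For the fused, FlashAttention-speed version, wrap it once:
# flex_attention = torch.compile(flex_attention)
```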
Cool optimization scale invariance result contextualized by Simo Ryu
Here’s the related paper he cites.
Databricks compares long-context performance across various SOTA models
GPT-4o seems to rule on unseen long-context Q&A.
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
Want more? Follow me on X! @ricklamers