📰 News
Anthropic releases Claude 3: Opus and Sonnet
It seems on par with GPT-4 and exceeds it in some dimensions. A third model, Haiku, will be released later. Opus pricing is steep: more expensive than GPT-4-Turbo, but slightly cheaper than GPT-4. You can think of Sonnet as the GPT-3.5-Turbo competitor.
Pricing tweet by Simon Willison with more details.
You can try it for free and without signup at LMSYS Chat.
Some reports of severe hallucination; here's one by PyTorch lead Soumith Chintala.
OpenAI publicly responds to Elon Musk
tl;dr: they show a bunch of internal emails. Ilya admits they only shared openly in the beginning for recruitment purposes. Interestingly, Ilya seems to have co-authored the blog post, implying he might be on good terms again with Sam & the rest of leadership. There's a weird claim in there, "Albania is using OpenAI's tools to accelerate its EU accession by as much as 5.5 years", which seems pretty ridiculous and gives the whole blog post a vibe of being rushed/poorly considered. Also, they define "Open" in a way that basically matches the definition of every for-profit company in existence: the "Open" in OpenAI means that everyone should benefit from the fruits of AI after it's built, but it's totally OK to not share the science.
Building your own GPTs but need more features & control? Readers of CoWI get early access by signing up here.
📦 Repos
bonito: a lightweight library for generating synthetic instruction tuning datasets for your data
It includes its own model optimized for generating instruction tuning data; more details are in their paper. The library was created by researchers from the CS department at Brown University.
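For a feel of the workflow, here's a minimal sketch roughly following their README; the exact argument names (e.g. `context_col`) are from memory and may differ, so check the repo before copying:

```python
from datasets import load_dataset
from vllm import SamplingParams
from bonito import Bonito

# Load the Bonito model (a Mistral-7B fine-tune specialized for task generation)
bonito = Bonito("BatsResearch/bonito-v1")

# Any dataset with a column of unannotated text from your target domain
unannotated = load_dataset(
    "BatsResearch/bonito-experiment", "unannotated_contract_nli"
)["train"].select(range(10))

# Generate a synthetic NLI instruction-tuning dataset from the raw text
sampling_params = SamplingParams(max_tokens=256, top_p=0.95, temperature=0.5, n=1)
synthetic = bonito.generate_tasks(
    unannotated,
    context_col="input",   # column holding the raw text (name may differ)
    task_type="nli",       # one of the supported task types
    sampling_params=sampling_params,
)
```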
distilabel: AI Feedback framework for scalable LLM alignment
Framework for generating fine-tuning/alignment data.
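I won't vouch for distilabel's exact API here, so below is a hypothetical sketch of the AI-feedback pattern it implements: generate candidate responses, have a judge model rank them, and store the result as a preference pair. All prompts and the model choice are illustrative; only the standard `openai` client calls are real.

```python
from openai import OpenAI

client = OpenAI()

def complete(prompt: str, temperature: float = 0.9) -> str:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",  # generator/judge model choice is up to you
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

instruction = "Explain LoRA fine-tuning in two sentences."

# 1) Generate two candidate responses
candidates = [complete(instruction) for _ in range(2)]

# 2) Ask a judge model which candidate is better (the "AI feedback" step)
verdict = complete(
    f"Instruction: {instruction}\n\n"
    f"Response A: {candidates[0]}\n\nResponse B: {candidates[1]}\n\n"
    "Which response is better? Answer with exactly 'A' or 'B'.",
    temperature=0.0,
)

# 3) Store as a preference pair for DPO/RLHF-style alignment training
if verdict.strip().startswith("A"):
    chosen, rejected = candidates[0], candidates[1]
else:
    chosen, rejected = candidates[1], candidates[0]
preference_pair = {"prompt": instruction, "chosen": chosen, "rejected": rejected}
```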
Orca-Math: trained on synthetic data through agents
They introduce Orca-Math 🧮🐳, a Mistral-7B offshoot excelling in math word problems, with an impressive 86.81% score on GSM8k. That's about as good as Gemini Pro, while the next best 7B model (ToRA-Code) scores 72.6%. They've shared the dataset (MIT license) but don't release a model. What's interesting about this release is that it shows an agentic approach to generating fine-tuning data; a sketch of the idea follows below. Here's an X thread if you want to dig deeper.
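The paper's setup uses collaborating agents, e.g. a suggester proposing ways to make a seed problem harder and an editor rewriting it accordingly. This is a hypothetical, stripped-down sketch of that loop (prompts and model name are my own stand-ins, not the paper's):

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

seed_problem = "A baker sells 12 muffins for $3 each. How much does she earn?"

# Suggester agent: propose modifications that increase difficulty
suggestions = ask(
    "Suggest two modifications that make this math word problem harder "
    f"without changing its core skill:\n{seed_problem}"
)

# Editor agent: rewrite the problem applying those suggestions
harder_problem = ask(
    "Rewrite the problem applying these modifications:\n"
    f"Problem: {seed_problem}\nModifications: {suggestions}"
)

# A solver agent (plus answer verification) would then produce the
# final (problem, solution) pair used as fine-tuning data.
print(harder_problem)
```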
📄 Papers
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Griffin outperforms Llama 2 13B averaged across various benchmarks (HellaSwag, PIQA, ARC-E, ARC-C, ...) while being trained on only 15% (!) as many tokens (300B vs. 2T).
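For intuition, the recurrent half of a Griffin block is a gated linear recurrence, roughly of the form sketched below in PyTorch. This is heavily simplified: the paper's RG-LRU has a specific parameterization of the gates and decay, and Griffin interleaves these layers with local sliding-window attention.

```python
import torch

def gated_linear_recurrence(x: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    """x, a: (batch, seq_len, dim); a in (0, 1) is a learned, input-dependent decay.
    h_t = a_t * h_{t-1} + (1 - a_t) * x_t -- no attention over the whole past,
    so inference needs O(1) state per step instead of a growing KV cache."""
    h = torch.zeros_like(x[:, 0])
    out = []
    for t in range(x.size(1)):
        h = a[:, t] * h + (1 - a[:, t]) * x[:, t]
        out.append(h)
    return torch.stack(out, dim=1)

x = torch.randn(2, 16, 64)
a = torch.sigmoid(torch.randn(2, 16, 64))  # stand-in for the learned gate
y = gated_linear_recurrence(x, a)
```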
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
Improves on QA-LoRA and QLoRA in their benchmarks.
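Their information-retention additions aren't in mainstream libraries as far as I know, but for context, this is the vanilla QLoRA baseline the paper improves on, using the standard `transformers` + `peft` + `bitsandbytes` stack (the model name is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # example model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Trainable low-rank adapters on top of the quantized weights (the "LoRA")
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```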
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Adding attention to Mamba/SSM models improves their in-context learning abilities on non-standard retrieval tasks.
Stable Diffusion 3 paper is out
The generated images are extremely impressive. If you're more into blog posts than papers, check out the blog post here (contains image examples).
🛠️ Products
endoftext: an AI powered prompt editor
The website can be found here. Cool idea!
Rime: sub-200ms Text-to-Speech API
Remarkably natural-sounding too!
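If you want to sanity-check a latency claim like sub-200ms yourself, measuring time-to-first-audio-byte on a streaming request is the way to go. The endpoint and payload below are made up purely for illustration (see Rime's docs for the real API):

```python
import time
import requests

payload = {"text": "Hello there!", "voice": "default"}  # hypothetical payload

t0 = time.perf_counter()
resp = requests.post(
    "https://api.example-tts.com/v1/synthesize",  # placeholder URL, not Rime's
    json=payload,
    stream=True,
    timeout=10,
)
next(resp.iter_content(chunk_size=4096))  # block until the first audio bytes arrive
print(f"time to first audio byte: {(time.perf_counter() - t0) * 1000:.0f} ms")
```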
📚 Resources
He makes a good point about how the abstractions introduced by libraries can get in the way of transparency and knowing what's actually being sent to the model. I'd advise using LangSmith/Langfuse over mitmproxy, though.
Essential Math for AI
A book that covers the mathematics used in modern AI. By Hala Nelson, Professor of Mathematics at James Madison University.
Blog post on the implications of the arrival of long context windows
How large context LLMs could change the status quo & some more thoughts on the direction LLMs are taking.
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
As models get better, we need harder benchmarks. The authors say the following about GPQA: "experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy".
Prompt Library by Wharton professor
Good writing = good prompting, and Ethan Mollick is a great writer. Check out his prompt examples! They're mostly focused on teaching, but I think you can learn from his style of prompting.
Want more? Follow me on Twitter! @ricklamers