Anthropic Circuits Updates: your best bet for understanding LLMs?
Week 31 of Coding with Intelligence
📰 News
Mistral Large 2: 123B with 128k context at ~Llama 3.1 405B level
A 123B-parameter, 128k-context model supporting 80+ coding languages and strong multilingual performance (including low-resource languages). It reports 84% MMLU accuracy and GPT-4/Claude-level results on code generation, math, and reasoning benchmarks, while prioritizing inference efficiency on a single node. Unfortunately it's not permissively licensed: open weights under a research license.
Self-Directed Synthetic Dialogues and Revisions dataset released by AllenAI
The most significant gaps they are looking to fill are licensing restrictions and the limited number of turns in existing dialogue datasets. The data is available on Hugging Face (allenai/sdsd-dialogues and allenai/sdsd-revisions), a great gift to the open-source AI community.
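If you want to poke at the data yourself, it's a one-liner with the Hugging Face `datasets` library. A minimal sketch; the `train` split name is an assumption, so check the dataset cards:

```python
from datasets import load_dataset

# Dataset IDs from the release above; the "train" split name is an assumption.
dialogues = load_dataset("allenai/sdsd-dialogues", split="train")
revisions = load_dataset("allenai/sdsd-revisions", split="train")

print(dialogues[0])  # inspect one multi-turn synthetic dialogue
```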
📄 Papers
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
An interesting, though relatively well-known, result from FAIR. I think the principle should be: if you can evaluate an answer fairly well and cheaply using a judge, prompting a smaller model multiple times is preferable to prompting a large model once. Latency might suffer, of course, because the inferences must be sequenced (draw samples, judge samples, return the synthesized/selected sample). Judging is easiest in code contexts, where unit tests can guide the selection, as in the sketch below.
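A minimal sketch of that loop, with hypothetical helper names: `generate` and `run_unit_tests` stand in for whatever model client and test harness you use.

```python
def best_of_k(prompt, generate, run_unit_tests, k=8):
    """Draw up to k candidates from a small/cheap model and return
    the first one that passes the unit tests, else None."""
    for _ in range(k):
        candidate = generate(prompt)   # one call to the small model
        if run_unit_tests(candidate):  # cheap, reliable judge
            return candidate
    return None  # caller can escalate to a single large-model call
```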
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
An interesting attempt to quantify the role of vocabulary size in model scaling.
They've released the model weights on Hugging Face.
Apple Intelligence Foundation Language Models paper
Surprisingly, they trained on Google Cloud TPUs, not GPUs. Alas, no size details for AFM-server, their larger, Llama-3-rivaling server-side foundation model.
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
Models on Hugging Face and code on GitHub
📱 Demos
Segment Anything 2 Demo by Meta
Super impressive web demo 👏
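If you'd rather script it than click around the demo, the repo exposes an image predictor. A minimal sketch along the lines of the repo's README; the config/checkpoint names and the example click point are assumptions, so adjust them to your download:

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Config/checkpoint names follow the release naming; adjust to what you downloaded.
predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
)

image = np.array(Image.open("photo.jpg").convert("RGB"))
point_coords = np.array([[500, 375]])  # one foreground click (x, y); example values
point_labels = np.array([1])           # 1 = foreground

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    masks, scores, logits = predictor.predict(
        point_coords=point_coords, point_labels=point_labels
    )
```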
🛠️ Products
Meshy AI: text-2-3D by MIT PhD
Really impressive generation results, although generation can take a while.
📚 Resources
Anthropic interpretability work update - Circuits Updates - July 2024
ICML 2024 Tutorial: Physics of Language Models
By a Meta AI researcher. UPDATE: the video had to be set to private because of ICML conference agreements; it should be back up August 20th. In the interim, check out https://physics.allen-zhu.com/home and his other videos: https://www.youtube.com/@zhuzeyuan/videos
How fast can grammar-structured generation be?
By .txt, the creators of the Outlines library.
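For context, grammar-structured generation constrains decoding so that every sampled token keeps the output inside a formal grammar. A minimal sketch using Outlines' 0.x-era API; the model choice and the toy arithmetic grammar are illustrative, see the post for the actual performance analysis:

```python
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# A tiny Lark-style arithmetic grammar; during decoding, invalid tokens
# are masked out so the output always conforms to this grammar.
arithmetic_grammar = """
    ?start: expression
    ?expression: term (("+" | "-") term)*
    ?term: factor (("*" | "/") factor)*
    ?factor: NUMBER | "-" factor | "(" expression ")"
    %import common.NUMBER
"""

generator = outlines.generate.cfg(model, arithmetic_grammar)
print(generator("Write an arithmetic expression for 'a dozen and a half': "))
```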
AppWorld Engine: a high-fidelity execution environment simulating 457 APIs
This is a great contribution toward more robust evaluation of autonomous agent systems.
Want more? Follow me on X! @ricklamers