📰 News
Canadian government committing $1.75B USD to GenAI initiatives
I think we can officially say AI is a national concern now.
Llama 3 preview release slated for next week
LeCun still saying JEPA is better than generative AI. Time will tell, but it's exciting to see people trying other ideas.
CodeGemma and RecurrentGemma released by Google
CodeGemma is still somewhat below DeepSeek-Coder; RecurrentGemma (based on the Griffin architecture, see paper) is much faster as input token length increases (e.g. 5x Gemma's tokens/s on an 8K input sequence).
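If you want to poke at RecurrentGemma yourself, here's a minimal sketch of loading it through Hugging Face transformers. The checkpoint id and the availability of the integration in your transformers version are assumptions on my part.

```python
# Sketch: loading RecurrentGemma via Hugging Face transformers.
# Assumes the "google/recurrentgemma-2b" checkpoint id and a transformers
# version that includes the RecurrentGemma integration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The Griffin architecture mixes", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```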
Groq lands support for function calling/tool use
I've led the implementation - let me know what you think! We're working on some exciting new features related to this that you haven't yet seen at any other LLM API provider. Stay tuned.
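For the curious, here's a rough sketch of what tool use looks like against Groq's OpenAI-style chat completions API. The groq Python SDK mirrors the OpenAI client shape; the model id and the get_weather function are placeholders I made up.

```python
# Sketch of tool use against Groq's OpenAI-style chat completions API.
# The model id and get_weather tool are illustrative assumptions.
import json
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical local function
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mixtral-8x7b-32768",  # assumed tool-use-capable model on Groq
    messages=[{"role": "user", "content": "What's the weather in Toronto?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model decided to call the tool, inspect the structured call.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```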
Seems like it's not instruction tuned, so no chat yet.
$10K A::B LLM challenge solved
A::B is an example of a formal system of string rewriting. See problem definition here. It's cool to see that problem-solving ability on problems like these can be unlocked by prompting.
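To make the task concrete, here's a small reference solver for an A::B-style rewriting system. The specific rules below are my paraphrase of the challenge (two token pairs annihilate, two swap places), so treat them as an assumption and check the original problem definition.

```python
# Reference solver for an A::B-style rewriting system (rules are assumed).
RULES = {
    ("A#", "#A"): [],            # annihilate
    ("B#", "#B"): [],            # annihilate
    ("A#", "#B"): ["#B", "A#"],  # swap
    ("B#", "#A"): ["#A", "B#"],  # swap
}

def normalize(tokens: list[str]) -> list[str]:
    """Apply rewrite rules until no adjacent pair matches (the normal form)."""
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(tokens):
            pair = tuple(tokens[i:i + 2])
            if pair in RULES:
                out.extend(RULES[pair])
                i += 2
                changed = True
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

print(normalize(["B#", "A#", "#B", "#A", "B#"]))  # ['B#']
```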
DeepMind files patent for sequence to sequence models (LLMs) paired with Monte Carlo tree search
Discussion on Reddit: is this Q*?
Hugging Face TGI inference server reverts to Apache 2.0
Permissive licensing is back for TGI. Makes the choice between vLLM and TGI less obvious.
Intel Gaudi 3 looks competitive to NVIDIA H100s for LLM training
"Apply edits" and "reasoning about the entire codebase" seem like killer features. Curious to see Cursor's response. Extension looks very barebones to be honest.
104B, 128k context window, good citation support, tool use, multi-lingual, about half GPT-4 Turbo pricing, try on LMSYS Chat.
Gemini 1.5 Pro now widely available
1M context-window, text, image, audio & video support. Also, new embedding model with good MTEB performance.
Combines GPT-4 Vision with Function Calling, not all reviews are positive.
tl;dr Building your own GPTs but need more features & control? Readers of CoWI get early access by signing up here.
📦 Repos
llm.c: a minimal C/CUDA based training library by Andrej Karpathy
Educational value = high. Up next: direct CUDA, CPU version, Llama2/Gemma arch.
JetMoE: 2B active params MoE, outperforms 7B Llama-2 (<$100k)
Trained on public data only. MoE architecture variant.
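For context on what "2B active params" means mechanically, here's an illustrative top-k expert routing layer in PyTorch: each token only runs through k of the N expert MLPs, so per-token compute touches a fraction of the total parameters. This is a generic sketch, not JetMoE's actual architecture.

```python
# Generic top-k mixture-of-experts routing sketch (not the JetMoE code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.k = k

    def forward(self, x):                        # x: (tokens, d_model)
        logits = self.router(x)                  # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):               # send each token to its k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```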
📄 Papers
Or as Gary Marcus put it: "Breaking news: Scaling will never get us to AGI". I don't think the paper's result ("multimodal models require exponentially more data to achieve linear improvements in downstream 'zero-shot' performance") comes as a surprise. Most folks know LLMs approximately require a doubling of data for a linear increase in accuracy metrics; a quick numerical illustration of that log-linear relationship is below.
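Back-of-the-envelope sketch of the claim; the coefficients are made up, only the shape of the relationship matters.

```python
# If accuracy grows roughly linearly in log2(data), every extra point costs
# another doubling of data. Coefficients below are purely illustrative.
import math

def accuracy(tokens, base=50.0, points_per_doubling=2.0, ref=1e9):
    return base + points_per_doubling * math.log2(tokens / ref)

for tokens in [1e9, 2e9, 4e9, 8e9, 16e9]:
    print(f"{tokens:>8.0e} tokens -> {accuracy(tokens):.1f}% (hypothetical)")
```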
Striped Attention: Faster Ring Attention for Causal Transformers
Training LLMs over Neurally Compressed Text
Interesting idea to reduce sequence lengths.
Advancing LLM Reasoning Generalists with Preference Trees
Paper. Interesting application of the KTO fine-tuning algorithm.
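If you want to try KTO yourself, Hugging Face TRL ships a KTOTrainer. The sketch below is hedged: the exact argument names and the ("prompt", "completion", boolean "label") dataset columns reflect my understanding of TRL around v0.8, so double-check the TRL docs before copying.

```python
# Rough sketch of KTO fine-tuning with TRL's KTOTrainer (argument names and
# dataset columns are assumptions based on TRL ~0.8; consult the docs).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_name = "gpt2"  # stand-in model for the sketch
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# KTO only needs per-example thumbs up/down, not paired preferences.
train_dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?", "What is 2 + 2?"],
    "completion": ["4", "22"],
    "label": [True, False],
})

trainer = KTOTrainer(
    model=model,
    args=KTOConfig(output_dir="kto-sketch", per_device_train_batch_size=2),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```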
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Interesting: "Prepending training data with domain names (e.g., wikipedia.org) significantly increases a model's knowledge capacity."
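Concretely, the trick is just tagging each training document with its source before tokenization; a tiny illustration (field names made up):

```python
# Prepend a source/domain tag to each training document so the model can
# learn which sources are knowledge-dense. Field names are illustrative.
docs = [
    {"domain": "wikipedia.org", "text": "The Eiffel Tower is in Paris."},
    {"domain": "randomblog.example", "text": "My cat did something funny today."},
]

train_texts = [f"{d['domain']}\n{d['text']}" for d in docs]
for t in train_texts:
    print(t, end="\n\n")
```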
Visual AutoRegressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction
Higher quality image generation with 20x faster inference. Feels nearly too good to be true, but the demo delivers. Very impressive work by Bytedance.
Long-context LLMs Struggle with Long In-context Learning
Important shift in emphasis: not finding the needle in the haystack (context window) but actually utilizing it (in-context learning).
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Apple getting more into AI: "Ferret-UI excels not only beyond most open-source UI MLLMs, but also surpasses GPT-4V on all the elementary UI tasks." Interesting implications for agent products that deal with UIs.
Linear Attention Sequence Parallelism
The authors remark on capabilities beyond the sequence parallelism found in the Megatron-LM and DeepSpeed libraries: mainly that it is independent of attention head partitioning, enabling it to support varying numbers and styles of attention heads, such as multi-head, multi-query, and grouped-query attention.
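As a refresher on what makes linear attention amenable to sequence parallelism in the first place, here's a minimal (deliberately sequential) causal linear attention in PyTorch: with a feature map phi, attention reduces to a running state of phi(k) v outer products instead of an N x N score matrix. Illustrative only, not the paper's implementation.

```python
# Minimal causal linear attention with a running KV state (illustrative).
import torch

def causal_linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (seq_len, d). Feature map elu(x) + 1 keeps values positive.
    phi_q = torch.nn.functional.elu(q) + 1
    phi_k = torch.nn.functional.elu(k) + 1
    d = q.shape[-1]
    kv_state = torch.zeros(d, v.shape[-1])   # running sum of phi(k_i) v_i^T
    k_state = torch.zeros(d)                 # running sum of phi(k_i)
    out = torch.zeros_like(v)
    for t in range(q.shape[0]):
        kv_state += torch.outer(phi_k[t], v[t])
        k_state += phi_k[t]
        out[t] = (phi_q[t] @ kv_state) / (phi_q[t] @ k_state + eps)
    return out

q, k, v = (torch.randn(16, 8) for _ in range(3))
print(causal_linear_attention(q, k, v).shape)  # torch.Size([16, 8])
```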
ReALM: Reference Resolution As Language Modeling
Apple Research does original LLM research on reference resolution and in doing so outperforms GPT-4 on this task.
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
By Google DeepMind
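As I understand the paper, the core idea is a per-block router that gives only the top-k tokens the full attention+MLP treatment while the rest ride the residual stream unchanged. A simplified sketch (the capacity fraction, stand-in block, and sigmoid gating are my simplifications, not their code):

```python
# Simplified Mixture-of-Depths-style routing: only top-k tokens get compute.
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    def __init__(self, d_model=256, capacity=0.25):
        super().__init__()
        self.router = nn.Linear(d_model, 1)
        self.block = nn.Sequential(  # stand-in for attention + MLP
            nn.LayerNorm(d_model), nn.Linear(d_model, d_model), nn.GELU(),
        )
        self.capacity = capacity

    def forward(self, x):                          # x: (seq_len, d_model)
        scores = self.router(x).squeeze(-1)        # (seq_len,) router scores
        k = max(1, int(self.capacity * x.shape[0]))
        top = scores.topk(k).indices               # tokens that get compute
        out = x.clone()                            # everyone keeps the residual
        # Scale by the router score so routing stays differentiable.
        out[top] = x[top] + torch.sigmoid(scores[top]).unsqueeze(-1) * self.block(x[top])
        return out

block = MoDBlock()
print(block(torch.randn(32, 256)).shape)  # torch.Size([32, 256])
```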
🛠️ Products
Krea: generative AI image studio
Doodle to image on steroids.
📚 Resources
Attention in Transformers explained by 3Blue1Brown
Great video to build more intuition, even if you've seen the math already.
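The core computation from the video fits in a few lines of NumPy, if you want to play with it directly:

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (n_q, n_k) similarity scores
    return softmax(scores) @ V                # weighted average of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 4)), rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
print(attention(Q, K, V).shape)  # (3, 4)
```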
Ring Attention by Andreas Köpf
Hypothesized to be how Gemini gets to 1M/10M tokens in its context window.
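The piece that makes it work is blockwise attention with an online softmax: each query block folds in one key/value block at a time while keeping a running max and denominator, so no device ever materializes the full attention matrix. A single-process sketch of that accumulation (in Ring Attention proper the KV blocks hop between devices arranged in a ring; here they just come from a list):

```python
# Blockwise attention with online-softmax accumulation (single process).
import numpy as np

def blockwise_attention(q, kv_blocks):
    # q: (nq, d); kv_blocks: list of (k_block, v_block) pairs.
    d = q.shape[-1]
    m = np.full(q.shape[0], -np.inf)          # running max of scores
    denom = np.zeros(q.shape[0])              # running softmax denominator
    out = np.zeros_like(q)                    # running unnormalized output
    for k, v in kv_blocks:
        s = q @ k.T / np.sqrt(d)              # scores against this block
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)             # rescale previous accumulators
        p = np.exp(s - m_new[:, None])
        denom = denom * scale + p.sum(axis=-1)
        out = out * scale[:, None] + p @ v
        m = m_new
    return out / denom[:, None]

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
kv = [(rng.normal(size=(16, 8)), rng.normal(size=(16, 8))) for _ in range(3)]
print(blockwise_attention(q, kv).shape)  # (4, 8)
```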
Mamba architecture explained in a blog post
Or a YouTube video if that's more your style (different source, same topic).
Unsloth: automatic RoPE scaling for fine-tuning Mistral v2 7B (16-bit LoRA or 4-bit QLoRA)
Uses cleaned Alpaca data.
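A hedged sketch of what a QLoRA fine-tune with Unsloth looks like; the checkpoint id and some argument values are assumptions on my part, so check the Unsloth README before copying.

```python
# Sketch of a QLoRA setup with Unsloth (checkpoint id is an assumption).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-v0.2-bnb-4bit",  # assumed Mistral v2 7B checkpoint
    max_seq_length=8192,      # RoPE scaling handled automatically per the release notes
    load_in_4bit=True,        # QLoRA path
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# From here, train with your usual SFT loop (e.g. TRL's SFTTrainer) on the
# cleaned Alpaca data mentioned above.
```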
Want more? Follow me on X! @ricklamers