📰 News
Launch of industry consortium: AI Alliance
Notable participants: Meta, IBM, AMD, ServiceNow, Hugging Face, Dell and Intel. The k8s moment for AI?
Beijing Academy of Artificial Intelligence releases Aquila2-70B
They also released `bge`, the highest-scoring embedding model on the MTEB leaderboard. A minimal sketch below of computing sentence embeddings with one of the published `bge` checkpoints via `sentence-transformers` (the exact model id is illustrative and may differ from the release referenced here).
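```python
from sentence_transformers import SentenceTransformer

# Model id is one of the published bge checkpoints; adjust as needed.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

sentences = [
    "Aquila2-70B was released by BAAI.",
    "BAAI released a 70B-parameter language model.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# With normalized vectors, cosine similarity is just a dot product.
print(embeddings.shape, embeddings[0] @ embeddings[1])
```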
📦 Repos
80% faster QLoRA LLM fine-tuning using custom Triton kernels
Created by an ex-NVIDIA intern. How many VCs have emailed Daniel is left as an exercise for the reader. For context, the sketch below shows the vanilla QLoRA setup (4-bit base weights via `bitsandbytes`, LoRA adapters via `peft`) that the repo's custom Triton kernels accelerate; the model id and hyperparameters are illustrative, not taken from the repo.
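```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, per the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # illustrative model id
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # adapt only attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA adapters train
```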
Apple launches Apple Silicon-specific PyTorch/JAX alternative
If you're building specifically for Apple platforms, there are probably gains to be had from using their framework. Assuming this refers to MLX, here is a tiny sketch of its NumPy-like, JAX-flavored API: arrays live in Apple Silicon's unified memory, and computation is lazy until forced.
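```python
import mlx.core as mx

def loss(w, x, y):
    # mean squared error of a linear model
    return mx.mean((x @ w - y) ** 2)

x = mx.random.normal((64, 8))
w = mx.random.normal((8,))
y = mx.random.normal((64,))

grad_fn = mx.grad(loss)  # JAX-style function transformation
g = grad_fn(w, x, y)     # builds a lazy compute graph
mx.eval(g)               # materializes the result
print(g.shape)
```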
📄 Papers
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
A structured state space model (SSM) approach that outperforms Transformers on various downstream tasks (language benchmarks like HellaSwag, DNA sequence modeling, audio generation). One of the co-authors also invented FlashAttention. Importantly, this model scales linearly in sequence length. To make that concrete, see the toy selective SSM recurrence below: a single O(L) scan where the B/C projections depend on the input (the "selection" mechanism). It omits the paper's discretization step and hardware-aware scan, so it's purely illustrative.
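```python
import numpy as np

def selective_ssm(x, A, W_B, W_C):
    """Single O(L) scan; h is a (d, state_dim) per-channel diagonal state."""
    L, d = x.shape
    state_dim = W_B.shape[1]
    h = np.zeros((d, state_dim))
    ys = np.empty_like(x)
    for t in range(L):                    # one linear pass over the sequence
        B_t = x[t] @ W_B                  # selection: B depends on the input
        C_t = x[t] @ W_C                  # selection: C depends on the input
        h = A * h + np.outer(x[t], B_t)   # decaying diagonal state update
        ys[t] = h @ C_t                   # input-dependent readout
    return ys

x = np.random.randn(1024, 32)
A = 0.9 * np.ones(16)                     # stable diagonal transition
W_B = 0.01 * np.random.randn(32, 16)
W_C = 0.01 * np.random.randn(32, 16)
print(selective_ssm(x, A, W_B, W_C).shape)  # (1024, 32)
```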
📱 Demos
3D LLM visualization in your browser
This is really cool! And educational if you're not too familiar with the Transformer architecture.
📚 Resources
Thought piece about Agents using tools through a marketplace of tools
Reminds me of https://agentprotocol.ai/
Extensive Vector DB Feature Matrix in a Google Sheet
From this LinkedIn thread
Overview of recent progress in Instruction Tuning
From a PhD student at Princeton. Expect a bunch of pointers for diving deeper and a general survey of what's considered SotA. For reference, a minimal sketch below of the (instruction, input, output) record format commonly used in instruction tuning, rendered into a single training string; the template is illustrative, not taken from the survey.
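```python
# Hypothetical record; the field names mirror the common Alpaca-style format.
example = {
    "instruction": "Summarize the text in one sentence.",
    "input": "Mamba is a selective state space model with linear-time inference.",
    "output": "Mamba is a linear-time selective SSM.",
}

TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

# The model is trained to continue the prompt with the target output.
training_text = TEMPLATE.format(**example) + example["output"]
print(training_text)
```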
Making Llama inference fast with PyTorch: 25 tok/s to 107 tok/s
If you allow for some quality degradation (int4 weights), you can even get to 244 tok/s. Cool post about using `torch.compile` to maximum advantage. Blog post. The core trick, sketched below with Hugging Face `transformers` as a stand-in (the post itself uses a pure-PyTorch implementation): compile the forward pass so the latency-bound decode loop runs as fused kernels with CUDA graphs.
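```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative; the post uses Llama-7B
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).cuda().eval()

# "reduce-overhead" enables CUDA graphs, which matter most when decoding
# one token at a time; expect some warm-up recompiles on dynamic shapes.
model.forward = torch.compile(model.forward, mode="reduce-overhead")

inputs = tok("PyTorch can be fast:", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```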
Want more? Follow me on Twitter! @ricklamers