Subquadratic LLMs: will they unseat Attention Transformers?
Week 51 of Coding with Intelligence
The blog post contains some interesting details about the performance of various SSM-based models (e.g. Mamba).
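For readers new to the idea, here's a toy sketch of why SSM-style models sidestep quadratic cost: each token updates a fixed-size recurrent state instead of attending over all previous tokens. This is a generic linear SSM recurrence for illustration only, not Mamba's actual selective, hardware-aware implementation.

```python
# Toy illustration of why SSM-style models scale linearly with sequence length:
# each token updates a fixed-size hidden state, so no L x L attention matrix is built.
# Generic (non-selective) SSM recurrence, not Mamba's actual kernels.
import numpy as np

def ssm_scan(x, A, B, C):
    """x: (L, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state)."""
    L = x.shape[0]
    h = np.zeros(A.shape[0])      # fixed-size state, independent of L
    ys = []
    for t in range(L):            # O(L) total work, vs O(L^2) for full attention
        h = A @ h + B @ x[t]      # state update
        ys.append(C @ h)          # readout
    return np.stack(ys)

# Example: 1,000 tokens, 8-dim inputs, 16-dim state, 4-dim outputs.
rng = np.random.default_rng(0)
y = ssm_scan(rng.normal(size=(1000, 8)),
             A=0.9 * np.eye(16),
             B=0.1 * rng.normal(size=(16, 8)),
             C=0.1 * rng.normal(size=(4, 16)))
print(y.shape)  # (1000, 4)
```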
Well-deserved recipient of an a16z Open Source grant. Follow what he's working on; much of it is very interesting. In particular, this repo that summarizes innovations for the Transformer architecture is very cool: https://github.com/lucidrains/x-transformers
Seems to jailbreak out of alignment, suggesting that current alignment techniques are probably brittle.
By Nathan Lambert
The excellent Zoology blog post series by Chris Ré's lab at Stanford continues.
Extremely useful addition to "just" hosting Llama/Mistral models. This is a huge unlock for building on Open Source LLMs.
By Microsoft. As the saying goes (usually attributed to Pascal): if I had more time, I would have written a shorter letter.
Podcast interview with the author: https://www.latent.space/p/axolotl. The author also received an a16z grant.
There's also a paper: https://ipads.se.sjtu.edu.cn/_media/publications/powerinfer-20231219.pdf
I think it's interesting to revisit more exotic retrieval ideas to see how models can be improved through tighter synergy between training-time and inference-time retrieval. This is an older paper, but it's worth reviewing.
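As a rough illustration of what inference-time retrieval can look like, here's a kNN-LM-style sketch that interpolates a model's next-token distribution with one built from nearest neighbors in a datastore of (context embedding, next token) pairs. This shows the general idea only, not necessarily the method of the linked paper.

```python
# kNN-LM-style inference-time retrieval: mix the model's next-token distribution
# with a distribution formed from nearest neighbors in a datastore.
# Illustrative sketch only; names and parameters are hypothetical.
import numpy as np

def knn_lm_probs(p_model, query, keys, values, vocab_size, k=8, lam=0.25, temp=1.0):
    """p_model: (V,) model distribution; keys: (N, d) context embeddings; values: (N,) token ids."""
    d2 = ((keys - query) ** 2).sum(axis=1)    # squared distances to datastore keys
    nn = np.argsort(d2)[:k]                   # k nearest neighbors
    w = np.exp(-d2[nn] / temp)
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nn], w)           # put mass on the neighbors' next tokens
    return (1 - lam) * p_model + lam * p_knn  # interpolate the two distributions

rng = np.random.default_rng(0)
V, N, d = 100, 10_000, 32
p = knn_lm_probs(np.full(V, 1.0 / V), rng.normal(size=d),
                 rng.normal(size=(N, d)), rng.integers(0, V, size=N), V)
print(round(p.sum(), 6))  # ~1.0
```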
Promises Transformer-level performance without quadratic scaling in sequence length. Because this is much more similar to the original Transformer architecture, adoption could be much faster than for SSM- or Hyena/Monarch-based architectures. One of the authors is Jürgen Schmidhuber, controversial yet widely regarded as a deep expert in neural networks.
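For context, here's a minimal sketch of the kernelized linear-attention trick that many subquadratic proposals build on: causal attention rewritten as a running sum of key-value outer products, so cost grows linearly with sequence length. This illustrates the general class of method, not necessarily this paper's exact formulation.

```python
# Minimal causal linear attention: instead of materializing the L x L score matrix,
# keep running sums of phi(k) v^T and phi(k), giving O(L) time with fixed-size state.
# Generic sketch of kernelized attention, not the linked paper's exact method.
import numpy as np

def elu_feature(x):
    return np.where(x > 0, x + 1.0, np.exp(x))  # phi(x) = elu(x) + 1 keeps features positive

def causal_linear_attention(q, k, v):
    """q, k: (L, d_k); v: (L, d_v) -> (L, d_v)."""
    L, d_v = v.shape
    S = np.zeros((q.shape[1], d_v))   # running sum of phi(k_t) v_t^T
    z = np.zeros(q.shape[1])          # running sum of phi(k_t) for normalization
    out = np.zeros((L, d_v))
    for t in range(L):
        qt, kt = elu_feature(q[t]), elu_feature(k[t])
        S += np.outer(kt, v[t])
        z += kt
        out[t] = (qt @ S) / (qt @ z + 1e-6)
    return out

rng = np.random.default_rng(0)
out = causal_linear_attention(rng.normal(size=(512, 32)),
                              rng.normal(size=(512, 32)),
                              rng.normal(size=(512, 64)))
print(out.shape)  # (512, 64)
```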
In some cases this results in a 20-25x performance increase for GPU inference.
Mentioned by the Cursor founder as a way to perform retrieval on codebases.
Want more? Follow me on Twitter! @ricklamers