Jamba: are hybrid Attention/Structured State Space models the future?
Week 14 of Coding with Intelligence
📰 News
AI21 Labs releases Jamba: a hybrid SSM (Mamba)/Transformer model with claimed Transformer-level quality
Base model available on Hugging Face. Because Mamba isn't quadratic in sequence length, it delivers roughly 3x the throughput of Mixtral on long contexts. Context window up to 256K tokens. LoRA support is also available, along with a paper.
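If you want to poke at the base checkpoint yourself, here's a minimal loading sketch (assuming the Hub id is ai21labs/Jamba-v0.1 and a transformers version with Jamba support; otherwise pass trust_remote_code=True):

```python
# Minimal sketch: loading the Jamba base model with Hugging Face transformers.
# Assumption: the checkpoint id is "ai21labs/Jamba-v0.1".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # large MoE model: expect multi-GPU or quantization
    device_map="auto",
)

inputs = tokenizer("State space models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```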
Replit creates a dedicated Code Repair model
Detailed blog post with many interesting references and data points
Grok 1.5: 74% HumanEval, 81% MMLU, 128K context window
They stated it’s coming to X soon
tl;dr: Building your own GPTs but need more features & control? Readers of CoWI get early access by signing up here.
📦 Repos
Could be interesting for easier tasks that need high speed.
RepoAgent: never write docs again
It also has a special mode for keeping docs up-to-date! Here's a short video.
📄 Papers
Long-form factuality in large language models
By Google DeepMind & Stanford. TL;DR: LLM agents equipped with Google Search as a tool can outperform crowd-sourced human annotators at fact checking long-form responses.
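The core loop is easy to sketch: split a long answer into atomic claims, search for evidence per claim, and have the model judge each one. Below is a rough outline, not the paper's code; the helper functions are simple stubs you'd replace with real LLM and search-tool calls:

```python
# Rough sketch of a search-augmented long-form factuality check (stubs, not the paper's code).
from typing import Dict, List

def split_into_claims(answer: str) -> List[str]:
    # Stub: in the paper an LLM splits the answer into atomic, self-contained facts.
    return [s.strip() for s in answer.split(".") if s.strip()]

def google_search(query: str) -> List[str]:
    # Stub: would call a Google Search tool and return result snippets.
    return []

def llm_judge(claim: str, evidence: List[str]) -> str:
    # Stub: an LLM rates the claim against the retrieved evidence.
    return "supported" if evidence else "not_supported"

def check_long_form_answer(answer: str) -> Dict[str, int]:
    verdicts = {"supported": 0, "not_supported": 0}
    for claim in split_into_claims(answer):
        evidence = google_search(claim)  # the paper iteratively refines queries; one shot here
        verdicts[llm_judge(claim, evidence)] += 1
    return verdicts

print(check_long_form_answer("Jamba was released by AI21 Labs. It supports a 256K context."))
```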
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Lower memory footprint and better performance than LoRA; looks very promising. Waiting for wider adoption in fine-tuning libraries.
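The method, roughly: keep the embeddings and LM head trainable, freeze the intermediate layers, and periodically unfreeze a small random subset of layers during fine-tuning. A hedged sketch (not the authors' code; attribute names like model.model.layers assume a LLaMA-style Hugging Face model):

```python
# Hedged sketch of LISA-style layer sampling for fine-tuning (not the authors' code).
import random

def lisa_resample(model, n_active_layers: int = 2):
    """Freeze everything, then re-enable embeddings, the LM head, and a few random layers."""
    layers = model.model.layers  # assumption: decoder blocks of a LLaMA-style HF model
    for p in model.parameters():
        p.requires_grad = False
    for name, p in model.named_parameters():
        if "embed" in name or "lm_head" in name:
            p.requires_grad = True
    for idx in random.sample(range(len(layers)), n_active_layers):
        for p in layers[idx].parameters():
            p.requires_grad = True

# In the training loop, resample the active layers every K optimizer steps, e.g.:
# if step % 20 == 0:
#     lisa_resample(model)
#     optimizer = torch.optim.AdamW(
#         (p for p in model.parameters() if p.requires_grad), lr=1e-5)
```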
VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting
Very impressive performance relative to compute on benchmark tasks like TaxiBJ.
🛠️ Products
Browserbase: a headless browser as a service to power your AI applications
Interesting product!
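The pitch is that your agent drives a remotely hosted browser instead of you managing Chrome yourself. A hedged sketch using Playwright's CDP connection; the websocket URL and auth format here are assumptions, so check the Browserbase docs:

```python
# Hedged sketch: driving a remote headless browser from Playwright over CDP.
# The websocket URL/auth format is an assumption about Browserbase's API.
import os
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(
        f"wss://connect.browserbase.com?apiKey={os.environ['BROWSERBASE_API_KEY']}"
    )
    page = browser.contexts[0].pages[0] if browser.contexts else browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
```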
📚 Resources
Interesting and surprisingly comprehensive overview by a Chinese AI startup.
Many-shot Jailbreaking by Anthropic
Interesting exploration of how long context windows can be exploited to circumvent model alignment
Recent talk from Yann LeCun at Harvard
“Towards AI systems that can learn, remember, reason, plan, have common sense, yet are steerable and safe”
Hugging Face releases TensorRT integration into Transformers
Check it out on GitHub!
SWE-agent, an open-source Devin alternative, hits 12.29% on SWE-bench vs Devin's 13.84%
By the Princeton NLP group
Want more? Follow me on X! @ricklamers