Jamba: are hybrid Attention/Structured State Space models the future?
Week 14 of Coding with Intelligence
📰 News
AI21 Labs releases Jamba: a hybrid SSM (Mamba)/Transformer model with claimed Transformer-level quality
Base model available on Hugging Face. Because Mamba isn't quadratic in sequence length, it delivers roughly 3x the throughput of Mixtral on long contexts. Context window up to 256K tokens. LoRA support is also available, along with a paper.
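If you want to poke at the base checkpoint yourself, here's a minimal loading sketch (assuming the Hub id is ai21labs/Jamba-v0.1 and a transformers version with Jamba support; otherwise pass trust_remote_code=True):

```python
# Minimal sketch: loading the Jamba base model with Hugging Face transformers.
# Assumption: the checkpoint id is "ai21labs/Jamba-v0.1".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # large MoE model: expect multi-GPU or quantization
    device_map="auto",
)

inputs = tokenizer("State space models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```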
Replit creates a dedicated Code Repair model
Detailed blog post with many interesting references and data points
Grok 1.5: 74% HumanEval, 81% MMLU, 128K context window
They stated it’s coming to X soon
tl;dr: Building your own GPTs but need more features & control? Readers of CoWI get early access by signing up here.
📦 Repos
Could be interesting for easier tasks that need high speed.
RepoAgent: never write docs again
It also has a special mode for keeping docs up-to-date! Here's a short video.
📄 Papers
Long-form factuality in large language models
By Google DeepMind & Stanford. TL;DR: LLM agents equipped with Google Search as a tool can outperform crowd-sourced human annotators at fact checking long-form responses.
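The core loop is easy to sketch: split a long answer into atomic claims, search for evidence per claim, and have the model judge each one. Below is a rough outline, not the paper's code; the helper functions are simple stubs you'd replace with real LLM and search-tool calls:

```python
# Rough sketch of a search-augmented long-form factuality check (stubs, not the paper's code).
from typing import Dict, List

def split_into_claims(answer: str) -> List[str]:
    # Stub: in the paper an LLM splits the answer into atomic, self-contained facts.
    return [s.strip() for s in answer.split(".") if s.strip()]

def google_search(query: str) -> List[str]:
    # Stub: would call a Google Search tool and return result snippets.
    return []

def llm_judge(claim: str, evidence: List[str]) -> str:
    # Stub: an LLM rates the claim against the retrieved evidence.
    return "supported" if evidence else "not_supported"

def check_long_form_answer(answer: str) -> Dict[str, int]:
    verdicts = {"supported": 0, "not_supported": 0}
    for claim in split_into_claims(answer):
        evidence = google_search(claim)  # the paper iteratively refines queries; one shot here
        verdicts[llm_judge(claim, evidence)] += 1
    return verdicts

print(check_long_form_answer("Jamba was released by AI21 Labs. It supports a 256K context."))
```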
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Lower memory footprint and better performance than LoRA; looks very promising. Waiting for wider adoption in fine-tuning libraries.
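The method, roughly: keep the embeddings and LM head trainable, freeze the intermediate layers, and periodically unfreeze a small random subset of layers during fine-tuning. A hedged sketch (not the authors' code; attribute names like model.model.layers assume a LLaMA-style Hugging Face model):

```python
# Hedged sketch of LISA-style layer sampling for fine-tuning (not the authors' code).
import random

def lisa_resample(model, n_active_layers: int = 2):
    """Freeze everything, then re-enable embeddings, the LM head, and a few random layers."""
    layers = model.model.layers  # assumption: decoder blocks of a LLaMA-style HF model
    for p in model.parameters():
        p.requires_grad = False
    for name, p in model.named_parameters():
        if "embed" in name or "lm_head" in name:
            p.requires_grad = True
    for idx in random.sample(range(len(layers)), n_active_layers):
        for p in layers[idx].parameters():
            p.requires_grad = True

# In the training loop, resample the active layers every K optimizer steps, e.g.:
# if step % 20 == 0:
#     lisa_resample(model)
#     optimizer = torch.optim.AdamW(
#         (p for p in model.parameters() if p.requires_grad), lr=1e-5)
```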
VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting
Very impressive performance relative to compute on benchmark tasks like TaxiBJ.
🛠️ Products
Browserbase: a headless browser as a service to power your AI applications
Interesting product!
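The pitch is that your agent drives a remotely hosted browser instead of you managing Chrome yourself. A hedged sketch using Playwright's CDP connection; the websocket URL and auth format here are assumptions, so check the Browserbase docs:

```python
# Hedged sketch: driving a remote headless browser from Playwright over CDP.
# The websocket URL/auth format is an assumption about Browserbase's API.
import os
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(
        f"wss://connect.browserbase.com?apiKey={os.environ['BROWSERBASE_API_KEY']}"
    )
    page = browser.contexts[0].pages[0] if browser.contexts else browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
```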
📚 Resources
Interesting and surprisingly comprehensive overview by a Chinese AI startup.
Many-shot Jailbreaking by Anthropic
Interesting exploration of how long context windows can be exploited to circumvent model alignment
Recent talk from Yann LeCun at Harvard
“Towards AI systems that can learn, remember, reason, plan, have common sense, yet are steerable and safe”
Hugging Face releases TensorRT integration into Transformers
Check it out on GitHub!
SWE-agent, an open-source Devin alternative, hits 12.29% on SWE-bench vs Devin's 13.84%
By the Princeton NLP group
Want more? Follow me on X! @ricklamers