Multimodal comes to Open Source & lossless prompt compression with Gists
Week 52 of Coding with Intelligence
📰 News
📦 Repos
Obsidian: Multimodal LLM based on Mistral 7B
A research collaboration between Nous Research & Virtual Interactive.
Hosted OSS LLM Inference leaderboard
tl;dr Anyscale & Together AI seem to top it. Not entirely unbiased, though: the leaderboard is run by the folks at Anyscale, and it's received criticism from the Together AI folks on Twitter.
Clean PyTorch Mamba implementation in a single file
Great educational value, thanks John (Zhiyao) Ma (@johnma2006).
Apple releases Ferret, a multimodal model
Research use only; it extends LLaVA.
OpenVoice: voice cloning by PhD students from MIT/Tsinghua University
📄 Papers
Learning to Compress Prompts with Gist Tokens
Interesting paper that sidesteps the usual trade-off between fine-tuning (inflexible, takes time, requires model management) and prompting (occupies many tokens in the context window, making inference long and expensive) by compressing prompts into a handful of "gist" tokens.
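The core trick can be sketched as an attention-mask change: a few gist tokens are inserted after the prompt, and later tokens are only allowed to reach the prompt through the gist activations. Below is a minimal illustrative sketch of such a mask (made-up sequence layout and sizes, not the paper's training code):

```python
import numpy as np

def gist_attention_mask(n_prompt, n_gist, n_input):
    """Boolean attention mask (True = may attend) for gist-token training.

    Assumed sequence layout: [prompt][gist][input]. Standard causal
    masking everywhere, plus: input tokens may NOT attend to the raw
    prompt tokens, only to the gist tokens (and earlier input tokens).
    This pressures the model to compress the prompt's information into
    the gist token activations.
    """
    n = n_prompt + n_gist + n_input
    mask = np.tril(np.ones((n, n), dtype=bool))  # causal lower-triangular mask
    # Block input positions from seeing the raw prompt directly.
    mask[n_prompt + n_gist:, :n_prompt] = False
    return mask

m = gist_attention_mask(n_prompt=4, n_gist=2, n_input=3)
# Input rows (6..8) have their first 4 columns masked out, so the only
# path to prompt information is via the gist tokens at positions 4-5.
```

At inference time the prompt can then be discarded entirely: only the cached gist activations need to be kept, which is where the context-window savings come from.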
Towards Understanding Mixture of Experts in Deep Learning
Good primer for understanding MoE models
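As a refresher on the basic mechanism such papers analyze, here is a minimal top-k routed MoE layer in NumPy (an illustrative sketch with arbitrary shapes and a plain linear router; real implementations add load balancing, batching, and expert parallelism):

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    """Minimal top-k mixture-of-experts forward pass (illustrative).

    x: (d,) input vector.
    expert_weights: list of (d, d) matrices, one per expert.
    gate_weights: (n_experts, d) router matrix.
    Only the top_k experts by router score are evaluated; their outputs
    are combined with softmax weights over the selected scores.
    """
    logits = gate_weights @ x                 # router score per expert
    top = np.argsort(logits)[-top_k:]         # indices of top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                      # softmax over selected experts
    # Weighted sum of the selected experts' outputs; the other experts
    # are never evaluated, which is the source of MoE's compute savings.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))
```

The sparsity is the whole point: parameter count grows with the number of experts, but per-token compute only grows with `top_k`.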
📱 Demos
Diffusion models can do text now!
The images are spectacular. Some examples: https://twitter.com/nickfloats/status/1737728299332460681
🛠️ Products
Suno: AI Audio generation with good vocals/lyrics
Check out the examples; it's truly SOTA.
Want more? Follow me on Twitter! @ricklamers