📰 News
Paris-based Holistic AI exits stealth with $220M Seed
Founded by a Stanford dropout and four ex-DeepMind employees. From their materials: "frontier action models to boost the productivity of workers". Not unlike Imbue and Adept.
Cursor announces 1000 tok/s for specialized Llama 3 70B code edit model
Likely they're using some form of constrained + speculative decoding, which is known to increase tokens per second: a cheap draft model proposes several tokens at once and the full model verifies them in a single forward pass, so far fewer expensive sequential decode steps of the full model are needed.
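To make the speculative part concrete, here is a minimal greedy-acceptance sketch (Cursor hasn't published details, so the draft/target split, the toy stand-in models, and k=4 are all assumptions; production systems typically use rejection sampling so the output distribution exactly matches the target model):

```python
import random

VOCAB_SIZE = 100

def draft_next(prefix):
    # toy "small" model: cheap, deterministic next-token guess
    return (sum(prefix) * 7 + 3) % VOCAB_SIZE

def target_next(prefix):
    # toy "large" model: usually agrees with the draft, sometimes disagrees
    guess = (sum(prefix) * 7 + 3) % VOCAB_SIZE
    return guess if random.random() < 0.8 else (guess + 1) % VOCAB_SIZE

def speculative_step(prefix, k=4):
    # 1) the draft model proposes k tokens autoregressively (cheap)
    ctx = list(prefix)
    proposal = []
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2) the target model checks every proposed position; in a real system
    #    this is a single batched forward pass over the whole proposed block
    ctx = list(prefix)
    accepted = []
    for t in proposal:
        target_t = target_next(ctx)
        if target_t == t:
            accepted.append(t)
            ctx.append(t)
        else:
            # rejected: keep the target's own token for this position and stop
            accepted.append(target_t)
            break
    return accepted

tokens = [1, 2, 3]
for _ in range(5):
    step = speculative_step(tokens)
    tokens.extend(step)
    print(f"emitted {len(step)} token(s) in one verification pass")
```

The speedup comes from the accepted-length being greater than one on average, so the expensive model is invoked far fewer times per generated token.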
Beats GPT-4V/Gemini Pro on TextVQA, DocVQA, and ChartQA by a decent margin; 19B params, Llama 3 8B (Instruct) text backbone, 8K context length, 1344×1344 resolution supported, commercial use allowed.
📦 Repos
📄 Papers
LoRA Learns Less and Forgets Less
By MosaicML/Databricks researchers. Essentially, LoRA is a tradeoff: it forgets less of the pre-trained data at the cost of fitting the new data less well. To be expected, but nice to see it investigated.
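As a reminder of why that tradeoff arises, a LoRA linear layer looks roughly like the sketch below (illustrative only, not the paper's code; rank and scaling values are arbitrary). The frozen base weight is why it forgets less; the rank-limited update is why it learns less.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: starts identical to base
        self.scale = alpha / rank

    def forward(self, x):
        # y = x W^T + scale * x A^T B^T  (only A and B receive gradients)
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(1024, 1024, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # ~16K of ~1.06M
```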
What matters when building vision-language models?
Details about the Idefics2 model and general design considerations when developing vision-language models.
Meta's multi-modal LLM, Chameleon: Mixed-Modal Early-Fusion Foundation Models
Supports image generation: "performs non-trivial image generation" and "exceeds the performance of much larger models, including Gemini Pro and GPT-4V".
"we can view an LM as deriving new conclusions by aggregating indirect reasoning paths seen at pre-training time"
Introduces meta tokens to alleviate patch information redundancy, achieving a 1.7× inference speedup. Repo.
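The mechanism isn't spelled out here, so the following is only a generic Perceiver-style "learned query" compressor that captures the general idea: a handful of meta tokens cross-attend to the full set of patch tokens and stand in for them downstream, so later layers process far fewer tokens. The paper's actual design may differ; all names and sizes below are made up for illustration.

```python
import torch
import torch.nn as nn

class MetaTokenCompressor(nn.Module):
    def __init__(self, dim=768, num_meta_tokens=16, num_heads=8):
        super().__init__()
        # learnable meta tokens shared across all inputs
        self.meta = nn.Parameter(torch.randn(1, num_meta_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_patches, dim), typically hundreds of patches
        meta = self.meta.expand(patch_tokens.size(0), -1, -1)
        # meta tokens query the patches; output keeps only num_meta_tokens tokens
        compressed, _ = self.attn(meta, patch_tokens, patch_tokens)
        return compressed

x = torch.randn(2, 576, 768)           # e.g. 24x24 patches from a ViT
out = MetaTokenCompressor()(x)
print(out.shape)                        # torch.Size([2, 16, 768])
```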
FIFO-Diffusion: Generating Infinite Videos from Text without Training
Pretty mind-blowing results for an approach that doesn't require training.
Layer-Condensed KV Cache for Efficient Inference of Large Language Models
📚 Resources
What are Diffusion Models? (2021)
Lilian Weng is terrific as always.
Cody (Sourcegraph coding assistant) releases OpenCtx
Standardizing rich context information for coding assistants.
Mapping the Mind of a Large Language Model
New interpretability work from Anthropic.
OpenAI's GPT-4o does really well on this long-context task, which is surprising given some of the other reported results like (link)
To InfiniBand or to Ethernet; to cluster makers that's the question
153 pages of details about the latest Gemini models. Covers both Gemini Pro and Gemini Flash.
PaliGemma fine-tuning notebook
JAX- and big_vision-based.
Want more? Follow me on X! @ricklamers