📰 News
Liquid AI launches Liquid Foundation Models
They release 1B, 3B, and 40B LFMs. The main improvement seems to be efficiency on long-context inputs, beyond Transformer and even SSM models. Their work isn't open source (yet), so the claims will be hard to verify until it is. There's more historical context at https://www.liquid.ai/blog/liquid-neural-networks-research. Try them in the Playground.
Meta AI showcases high-quality text-to-video model
No model or product release at this time, unfortunately. Quality looks as good as or better than OpenAI's Sora.
The feature is similar to Artifacts from Anthropic and the Cursor AI IDE. Responses have been mixed; try it yourself and leave your thoughts in the comments! I'd love to read them.
📦 Repos
HELMET: How to Evaluate Long-context Language Models Effectively and Thoroughly
A long-context eval from the Princeton NLP group.
xjdr of X fame open-sources a sampling technique for emulating o1-style LLM completions
Modded-NanoGPT: Keller Jordan adapts NanoGPT to showcase an improved optimizer
"The proposed optimizer has the following properties: half the memory usage of Adam, 1.43x faster training, <7% wallclock overhead."
📄 Papers
Contextual Document Embeddings
The method optimizes contrastive learning via adversarial batch construction and uses a two-stage encoder φ(d; D) to incorporate corpus statistics, enabling superior cross-domain generalization.
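As a rough reading of that description, here is a hypothetical sketch of what a corpus-conditioned encoder φ(d; D) could look like: a first stage encodes documents sampled from the corpus D, and a second stage encodes the target document d while attending to those corpus embeddings. Module names and shapes are my own assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TwoStageContextualEncoder(nn.Module):
    """Hypothetical sketch of a corpus-conditioned embedding phi(d; D).
    Stage 1 encodes a sample of corpus documents D; stage 2 encodes the
    target document d while cross-attending to those corpus embeddings."""

    def __init__(self, dim: int = 768, n_heads: int = 8):
        super().__init__()
        self.stage1 = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.stage2 = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.corpus_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, d_tokens: torch.Tensor, corpus_docs: torch.Tensor) -> torch.Tensor:
        # d_tokens:    (batch, seq_len, dim)  token embeddings of document d
        # corpus_docs: (batch, n_docs, dim)   pooled embeddings of documents sampled from D
        corpus_ctx = self.stage1(corpus_docs)           # stage 1: corpus statistics
        d_repr = self.stage2(d_tokens)                  # stage 2: document encoding
        contextual, _ = self.corpus_attn(d_repr, corpus_ctx, corpus_ctx)
        return self.proj(contextual.mean(dim=1))        # pooled embedding for contrastive loss
```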
How to Train Long-Context Language Models (Effectively)
The Princeton NLP group proposes the ProLong model, which reaches an effective context window size of 512K tokens.
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
By Meta AI. Very interesting new post-training approach.
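The core idea, as I understand it, is to turn execution results of generated code (e.g., unit-test outcomes) into a reward signal for RL fine-tuning. Below is a minimal, hypothetical sketch of such a reward function; it is not the paper's actual training setup, which is more involved, and a real pipeline would sandbox execution.

```python
import subprocess
import tempfile

def execution_feedback_reward(generated_code: str, tests: str, timeout_s: int = 5) -> float:
    """Hypothetical reward function: run the model's code against unit tests
    and return 1.0 if they pass, 0.0 otherwise. A real setup would sandbox
    execution and could also feed error output back to the model as feedback."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n\n" + tests)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
```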
📱 Demos
📚 Resources
Intuition of diffusion models in an X post
By research scientist Sander Dieleman at Google DeepMind.
Impressions of OpenAI Dev Day (10/1/24)
By your (my?) favorite LLM sommelier Simon Willison.
State Space Models (1): introduction to SSMs
Neat three-part blog post series (all three parts available) about the intuition behind SSMs.
Reverse engineering OpenAI's o1
By Nathan Lambert from Allen AI.
[Video] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Yannic Kilcher covers a recent paper from Google DeepMind about scaling test-time compute.
Want more? Follow me on X! @ricklamers