📰 News
Aria, a MoE with 3.9B active parameters: new SOTA open-source multimodal LLM
Performance looks competitive with Pixtral and Llama 3.2 11B, and in some cases even with GPT-4o/GPT-4o mini.
$10k (o1) reasoning challenge by Victor Taelin
A challenge to see if frontier LLMs can reason in a way that generalizes. The task is to invert a perfect binary tree, but he adds three criteria that, with high likelihood, make it novel enough to fall outside the pretraining corpus.
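For reference, the base task (minus Taelin's extra criteria, which are what make the challenge hard) is a classic; a minimal Python sketch:

```python
# Plain sketch of the base task: inverting (mirroring) a perfect binary
# tree. Taelin's three additional criteria are not captured here.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def invert(node: Optional[Node]) -> Optional[Node]:
    # Recursively swap the left and right subtrees.
    if node is None:
        return None
    return Node(node.value, invert(node.right), invert(node.left))

print(invert(Node(1, Node(2), Node(3))))
# Node(value=1, left=Node(value=3, ...), right=Node(value=2, ...))
```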
MatMamba: An Elastic and Efficient Neural Network Architecture
"Combining the speed of State Space Models (SSMs) like Mamba2 with the adaptability of Matryoshka-style learning." by Scaled Foundations a startup with MIT co-founders working on autonomous robotics. Implementation available on GitHub https://github.com/scaledfoundations/matmamba
OpenAI Leaders Say Microsoft Isn't Moving Fast Enough to Supply Servers
Paywalled, unfortunately. But the key message is that OpenAI is rumored to be becoming more independent from Microsoft at the datacenter level. I guess they want to move faster at the infrastructure level than Microsoft allows.
📦 Repos
Retry is all you need.
TTS with very good emotive quality: a "non-autoregressive text-to-speech system based on flow matching with Diffusion Transformer (DiT)". See the paper and repo for more details.
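For background, a minimal sketch of the conditional flow-matching objective such systems train on (generic form with placeholder shapes, not this repo's code):

```python
# Flow matching in a nutshell: interpolate between noise and data, then
# regress a network onto the constant velocity of that interpolation path.
import torch
import torch.nn as nn
import torch.nn.functional as F

v_theta = nn.Sequential(nn.Linear(81, 128), nn.ReLU(), nn.Linear(128, 80))

x1 = torch.randn(16, 80)   # data sample, e.g. a mel-spectrogram frame
x0 = torch.randn_like(x1)  # noise sample
t = torch.rand(16, 1)      # random interpolation time in [0, 1]

xt = (1 - t) * x0 + t * x1  # point on the straight path from noise to data
target_velocity = x1 - x0   # the velocity field the model should predict

pred = v_theta(torch.cat([xt, t], dim=-1))
loss = F.mse_loss(pred, target_velocity)
loss.backward()
```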
📄 Papers
An interesting architecture modification that shows better scaling properties than vanilla Transformers. It will be interesting to see whether large open-source/frontier groups adopt it.
Addition is All You Need for Energy-efficient Language Models
This paper proposes replacing multiplications with additions and shows some convincing data that there's merit to the idea. As with all architecture modifications, the jury is still out until the ideas are scaled up.
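To give the flavor of the idea, here is the classic bit-trick that approximates a float multiplication with a single integer addition (a Mitchell-style approximation, not the paper's exact L-Mul algorithm):

```python
# Adding the raw IEEE-754 bit patterns of two positive floats approximates
# their product: the exponent fields add exactly, and the mantissa fields
# approximately add in log space.
import struct

def float_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

BIAS = 127 << 23  # float32 exponent bias, shifted into its bit field

def approx_mul(a: float, b: float) -> float:
    # One integer addition stands in for a floating-point multiplication.
    return bits_to_float(float_to_bits(a) + float_to_bits(b) - BIAS)

print(approx_mul(1.5, 2.0), 1.5 * 2.0)  # 3.0 vs 3.0 (exact here)
print(approx_mul(1.5, 1.5), 1.5 * 1.5)  # 2.0 vs 2.25 (approximation error)
```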
Intelligence at the Edge of Chaos
In this paper they pretrain GPT-2 on cellular automata and show that pretraining on more complex automata increases downstream performance on tasks like chess and abstract reasoning. A fascinating result that seems to reveal something fundamental about transfer learning.
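A sketch of how that kind of pretraining data can be generated, assuming an elementary cellular automaton such as Rule 110 (the paper's exact rules and token encoding may differ):

```python
# Generate a trace of an elementary cellular automaton and flatten it into
# a 0/1 token sequence suitable for language-model pretraining.
import numpy as np

def eca_step(state: np.ndarray, rule: int) -> np.ndarray:
    # Each cell's next value is the rule bit indexed by its 3-cell neighborhood.
    left, right = np.roll(state, 1), np.roll(state, -1)
    neighborhood = (left << 2) | (state << 1) | right
    return np.right_shift(rule, neighborhood) & 1

rng = np.random.default_rng(0)
state = rng.integers(0, 2, size=64)
trace = [state]
for _ in range(32):
    state = eca_step(state, rule=110)  # Rule 110: the "complex" regime
    trace.append(state)

tokens = np.concatenate(trace)
print(tokens[:16])
```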
Generative Reward Models - A Unified Approach to RLHF and RLAIF
Interesting survey of alignment methods and a proposed technique for combining expensive human preference data with synthetically generated preference data. They emphasize OOD (out-of-distribution) performance, which is sometimes not highlighted enough when comparing alignment techniques. A co-author of the paper invented the DPO method at Stanford.
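For context, a minimal sketch of the DPO objective mentioned above, in its standard published form (values here are illustrative):

```python
# DPO: push the policy's log-probability ratio (vs. a frozen reference
# model) to favor the chosen answer over the rejected one.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

loss = dpo_loss(torch.tensor([-4.2]), torch.tensor([-5.0]),
                torch.tensor([-4.5]), torch.tensor([-4.9]))
print(loss)  # scalar loss over the preference pair
```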
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
"In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized." the challenge is getting them to reliably utilize the correct information they contain.
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
Reproducing reinforcement learning implementations is notoriously fraught: many gotchas can individually cause failure if handled incorrectly. A massive contribution by folks from Mila, Hugging Face, and others.
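One example of the kind of detail that matters: the PPO clipped objective with per-batch advantage normalization, a well-known gotcha (generic form, not the paper's implementation):

```python
import torch

def ppo_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Per-batch advantage normalization: a small detail that can make or
    # break a run if handled inconsistently.
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

loss = ppo_policy_loss(torch.randn(32), torch.randn(32), torch.randn(32))
```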
Semantic Training Signals Promote Hierarchical Syntactic Generalization in Transformers
Interesting exploration of how semantic training signals promote hierarchical syntactic biases in Transformers.
Analyzing CoT behavior in LLMs using a specific task (decoding shift ciphers). The conclusion is positive: "Overall, we conclude that CoT prompting performance reflects both memorization and a probabilistic version of genuine reasoning."
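For reference, the probed task itself is trivial to state in code; rot13 (shift 13) is the variant most common in web text, which plausibly drives the memorization component:

```python
# Decode a shift (Caesar) cipher by shifting each letter back.
def shift_decode(text: str, shift: int) -> str:
    return "".join(
        chr((ord(c) - ord("a") - shift) % 26 + ord("a")) if c.isalpha() else c
        for c in text.lower()
    )

print(shift_decode("uryyb jbeyq", 13))  # rot13 -> "hello world"
```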
Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality
Interesting work from Mila that aims to improve the metrics used to quantify video generation quality. Since quality is so hard to quantify, I think this work has a lot of potential to help researchers discover which techniques actually make a meaningful difference. I'm sure all the text-to-video players are all over this (Luma Labs, Runway, OpenAI, Kling, etc.).
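For background, FVD (the baseline these metrics aim to improve on) is a Fréchet distance between feature distributions of real and generated videos; a minimal sketch with placeholder features:

```python
# Fréchet distance between two Gaussian fits of video feature sets:
# ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrt(S1 @ S2)).
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    mu1, mu2 = feats_real.mean(0), feats_gen.mean(0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(sigma1 @ sigma2).real  # matrix square root
    return float(((mu1 - mu2) ** 2).sum() + np.trace(sigma1 + sigma2 - 2 * covmean))

rng = np.random.default_rng(0)
print(frechet_distance(rng.normal(size=(256, 16)), rng.normal(size=(256, 16))))
```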
Some welcome details on the new Pixtral multimodal model.
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
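Going by the title, the core move is keeping representations on the unit hypersphere; a hedged sketch of that normalization step (the paper's actual parameterization is more involved):

```python
# After every residual-style update, project the hidden state back onto
# the unit sphere. `alpha` stands in for a learnable step size.
import torch
import torch.nn.functional as F

h = F.normalize(torch.randn(4, 512), dim=-1)  # start on the sphere
update = torch.randn(4, 512)                  # e.g., an attention/MLP output

alpha = 0.1
h = F.normalize(h + alpha * update, dim=-1)   # move, then re-normalize
print(h.norm(dim=-1))                         # all ones
```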
📱 Demos
📚 Resources
A list of crowdsourced .cursorrules files. No need to write your own prompts for language-specific Cursor rules.
Machines of Loving Grace - long read by Anthropic founder Dario Amodei
Subtitled "How AI Could Transform the World for the Better", a positive and grounded essay about the impact of AI from on of the, if not the, best AI labs in the world.
Kling AI community short films
Check out SOTA generative AI short films from the community.
TxT360 - open source 15T corpus and processing pipeline
"We demonstrate a simple but effective upsampling recipe that creates a 15+ trillion-token corpus, outperforming FineWeb 15T on several key metrics."
O1 replication journey by GAIR (Shanghai Jiao Tong University)
Want more? Follow me on X! @ricklamers