📰 News
I highly encourage checking out the eight chain-of-thought reasoning text samples in the blog post; they are the only examples we have of the raw chain-of-thought outputs, as the API and ChatGPT hide them.
Note how the 'o1' model is different from 'o1-preview' and 'o1-mini' and performs significantly better on many tasks. AKA the most powerful model hasn't been released.
Microsoft releases GRIN 16x3.8B MoE
Beats 8x22B and comes close to Llama 3 70B with only 6.6B active parameters!
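If you're wondering how only a fraction of the parameters can be active, here's a minimal top-2 routed MoE layer in PyTorch. Toy sizes and a generic router, not GRIN's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTop2MoE(nn.Module):
    """Toy mixture-of-experts layer: 16 expert MLPs, only 2 run per token."""
    def __init__(self, d_model=64, d_ff=256, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

layer = TinyTop2MoE()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
# Per token only 2 of the 16 expert MLPs do any compute, which is the sense in
# which a 16x3.8B model ends up with only ~6.6B "active" parameters.
```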
Emotive Speech-to-Speech model Moshi weights released by Kyutai
Demo at https://moshi.chat/, plus the code repo at https://github.com/kyutai-labs/moshi
The best Qwen 2.5-Coder model appears to outperform DeepSeek V2.5, which is pretty impressive.
📦 Repos
zml: High performance AI inference stack. Built for production.
LeanRL: RL finetuning implementation
Uses torch.compile and CUDA graphs for fast RL.
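I haven't dug through LeanRL itself, but the core trick is compiling the hot update step with torch.compile in "reduce-overhead" mode, which captures it into CUDA graphs on GPU. A minimal sketch of that pattern with a toy policy and loss (not LeanRL's code):

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2))
optim = torch.optim.Adam(policy.parameters(), lr=3e-4)

# mode="reduce-overhead" captures the step into CUDA graphs on GPU, removing
# per-step Python/launch overhead -- the main win for small RL networks.
@torch.compile(mode="reduce-overhead")
def update_step(obs, actions, returns):
    logits = policy(obs)
    logp = torch.distributions.Categorical(logits=logits).log_prob(actions)
    loss = -(logp * returns).mean()   # plain REINFORCE-style loss for illustration
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss

obs = torch.randn(256, 8)
actions = torch.randint(0, 2, (256,))
returns = torch.randn(256)
for _ in range(3):
    print(update_step(obs, actions, returns).item())
```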
📄 Papers
Training Language Models to Self-Correct via Reinforcement Learning
A Strawberry-style paper by Google DeepMind
CPL: Critical Planning Step Learning Boosts LLM Generalization in Reasoning Tasks
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Interesting theory result about the power of chain-of-thought reasoning for Transformer models
Fast Forwarding Low-Rank Training
LoRA loss surface is special 👀
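For context, the low-rank setup these papers study is the standard LoRA parameterization: a frozen base weight plus a trainable rank-r update. A generic sketch (not the paper's code); note the scale factor, which is where rsLoRA differs:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():
            p.requires_grad_(False)                    # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))   # zero init => update starts at zero
        self.scale = alpha / r                         # rsLoRA instead scales by alpha / sqrt(r)

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(64, 64)
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```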
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Strawberry/o1-like techniques.
📱 Demos
📚 Resources
BFCL V3: Multi-Turn & Multi-Step Function Calling Evaluation
Multi-turn is a welcome addition that makes this benchmark better reflect real-world use cases!
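To make the multi-turn part concrete, here's roughly what such a trace looks like in an OpenAI-style chat format. The tool names and schema below are made up for illustration and are not BFCL's actual format:

```python
# Hypothetical multi-turn, multi-step function-calling trace (illustration only):
# the model must chain calls across turns and reuse earlier results rather than
# answering everything from a single call.
conversation = [
    {"role": "user", "content": "Find me a flight from AMS to SFO next Friday."},
    {"role": "assistant", "tool_call": {"name": "search_flights",  # hypothetical tool
                                        "arguments": {"from": "AMS", "to": "SFO", "date": "2024-09-27"}}},
    {"role": "tool", "name": "search_flights", "content": '[{"id": "KL605", "price": 612}]'},
    {"role": "assistant", "content": "KL605 for $612 is the cheapest direct option. Book it?"},
    # Second turn: the model has to carry state (the flight id) into a new call.
    {"role": "user", "content": "Yes, book it and add checked luggage."},
    {"role": "assistant", "tool_call": {"name": "book_flight",     # hypothetical tool
                                        "arguments": {"flight_id": "KL605", "extras": ["checked_bag"]}}},
]
```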
o1/🍓 takes RL to another level; learn more about RL in DL.
Parables on the Power of Planning in AI: From Poker to Diplomacy: Noam Brown (OpenAI)
MIT EI seminar, Hyung Won Chung from OpenAI. "Don't teach. Incentivize."
Async Tensor Parallelism implemented in PyTorch by the PyTorch team
Soumith Chintala highlights that this capability has largely been available only in proprietary codebases.
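As I understand it, the idea is to decompose a sharded matmul into per-shard chunks so that communication for one chunk can overlap with compute on the previous one. A purely sequential toy version of that decomposition (not the actual PyTorch API):

```python
import torch

# Toy illustration of the decomposition behind async tensor parallelism:
# an all-gather followed by a matmul is split into per-shard chunks so that,
# in the real implementation, fetching shard i+1 can overlap with the matmul
# on shard i. Here everything runs sequentially just to show the math.
world_size, d, n = 4, 256, 128
x_shards = [torch.randn(n, d // world_size) for _ in range(world_size)]  # column-sharded activations
w_shards = [torch.randn(d // world_size, d) for _ in range(world_size)]  # row-sharded weight

# Baseline: gather everything, then one big matmul.
x_full = torch.cat(x_shards, dim=1)
w_full = torch.cat(w_shards, dim=0)
ref = x_full @ w_full

# Chunked version: each "received" shard is consumed immediately and the
# partial products are accumulated, which is what lets comm and compute pipeline.
out = torch.zeros(n, d)
for xs, ws in zip(x_shards, w_shards):
    out += xs @ ws   # in async TP this matmul overlaps with fetching the next shard

print(torch.allclose(ref, out, atol=1e-3))  # True: same result, just decomposed into chunks
```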
Want more? Follow me on X! @ricklamers
I wonder if fast LoRA is really just capturing some of the need for a higher learning rate when doing LoRA, like rsLoRA. Will email them.