📰 News
I highly encourage checking out the eight chain-of-thought reasoning text samples in the blog post; they are the only examples we have of the raw chain-of-thought outputs, as the API and ChatGPT hide them.
Note how the 'o1' model is different from 'o1-preview' and 'o1-mini' and performs significantly better on many tasks. AKA the most powerful model hasn't been released.
Microsoft releases GRIN 16x3.8B MoE
Beats 8x22B and comes close to Llama 3 70B with only 6.6B active parameters!
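If you're wondering how only a fraction of the parameters can be active, here's a minimal top-2 routed MoE layer in PyTorch. Toy sizes and a generic router, not GRIN's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTop2MoE(nn.Module):
    """Toy mixture-of-experts layer: 16 expert MLPs, only 2 run per token."""
    def __init__(self, d_model=64, d_ff=256, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

layer = TinyTop2MoE()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
# Per token only 2 of the 16 expert MLPs do any compute, which is the sense in
# which a 16x3.8B model ends up with only ~6.6B "active" parameters.
```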
Emotive Speech-to-Speech model Moshi weights released by Kyutai
Demo at https://moshi.chat/, plus the code repo at https://github.com/kyutai-labs/moshi
The best Qwen 2.5-Coder model appears to outperform DeepSeek V2.5, which is pretty impressive.
📦 Repos
zml: High performance AI inference stack. Built for production.
LeanRL: RL finetuning implementation
Uses torch.compile and CUDA graphs for fast RL.
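I haven't dug through LeanRL itself, but the core trick is compiling the hot update step with torch.compile in "reduce-overhead" mode, which captures it into CUDA graphs on GPU. A minimal sketch of that pattern with a toy policy and loss (not LeanRL's code):

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2))
optim = torch.optim.Adam(policy.parameters(), lr=3e-4)

# mode="reduce-overhead" captures the step into CUDA graphs on GPU, removing
# per-step Python/launch overhead -- the main win for small RL networks.
@torch.compile(mode="reduce-overhead")
def update_step(obs, actions, returns):
    logits = policy(obs)
    logp = torch.distributions.Categorical(logits=logits).log_prob(actions)
    loss = -(logp * returns).mean()   # plain REINFORCE-style loss for illustration
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss

obs = torch.randn(256, 8)
actions = torch.randint(0, 2, (256,))
returns = torch.randn(256)
for _ in range(3):
    print(update_step(obs, actions, returns).item())
```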
📄 Papers
Training Language Models to Self-Correct via Reinforcement Learning
A Strawberry-style paper by Google DeepMind
CPL: Critical Planning Step Learning Boosts LLM Generalization in Reasoning Tasks
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Interesting theory result about the power of chain-of-thought reasoning for Transformer models
Fast Forwarding Low-Rank Training
LoRA loss surface is special 👀
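For context, the low-rank setup these papers study is the standard LoRA parameterization: a frozen base weight plus a trainable rank-r update. A generic sketch (not the paper's code); note the scale factor, which is where rsLoRA differs:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():
            p.requires_grad_(False)                    # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))   # zero init => update starts at zero
        self.scale = alpha / r                         # rsLoRA instead scales by alpha / sqrt(r)

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(64, 64)
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```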
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Strawberry/o1-like techniques.
📱 Demos
📚 Resources
BFCL V3: Multi-Turn & Multi-Step Function Calling Evaluation
Multi-turn is a welcome addition that makes this benchmark better reflect real-world use cases!
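To make the multi-turn part concrete, here's roughly what such a trace looks like in an OpenAI-style chat format. The tool names and schema below are made up for illustration and are not BFCL's actual format:

```python
# Hypothetical multi-turn, multi-step function-calling trace (illustration only):
# the model must chain calls across turns and reuse earlier results rather than
# answering everything from a single call.
conversation = [
    {"role": "user", "content": "Find me a flight from AMS to SFO next Friday."},
    {"role": "assistant", "tool_call": {"name": "search_flights",  # hypothetical tool
                                        "arguments": {"from": "AMS", "to": "SFO", "date": "2024-09-27"}}},
    {"role": "tool", "name": "search_flights", "content": '[{"id": "KL605", "price": 612}]'},
    {"role": "assistant", "content": "KL605 for $612 is the cheapest direct option. Book it?"},
    # Second turn: the model has to carry state (the flight id) into a new call.
    {"role": "user", "content": "Yes, book it and add checked luggage."},
    {"role": "assistant", "tool_call": {"name": "book_flight",     # hypothetical tool
                                        "arguments": {"flight_id": "KL605", "extras": ["checked_bag"]}}},
]
```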
o1/🍓 takes RL to another level; learn more about RL in DL.
Parables on the Power of Planning in AI: From Poker to Diplomacy: Noam Brown (OpenAI)
MIT EI seminar, Hyung Won Chung from OpenAI. "Don't teach. Incentivize."
Async Tensor Parallelism implemented in PyTorch by the PyTorch team
Soumith Chintala highlights that this capability has largely been available only in proprietary codebases.
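As I understand it, the idea is to decompose a sharded matmul into per-shard chunks so that communication for one chunk can overlap with compute on the previous one. A purely sequential toy version of that decomposition (not the actual PyTorch API):

```python
import torch

# Toy illustration of the decomposition behind async tensor parallelism:
# an all-gather followed by a matmul is split into per-shard chunks so that,
# in the real implementation, fetching shard i+1 can overlap with the matmul
# on shard i. Here everything runs sequentially just to show the math.
world_size, d, n = 4, 256, 128
x_shards = [torch.randn(n, d // world_size) for _ in range(world_size)]  # column-sharded activations
w_shards = [torch.randn(d // world_size, d) for _ in range(world_size)]  # row-sharded weight

# Baseline: gather everything, then one big matmul.
x_full = torch.cat(x_shards, dim=1)
w_full = torch.cat(w_shards, dim=0)
ref = x_full @ w_full

# Chunked version: each "received" shard is consumed immediately and the
# partial products are accumulated, which is what lets comm and compute pipeline.
out = torch.zeros(n, d)
for xs, ws in zip(x_shards, w_shards):
    out += xs @ ws   # in async TP this matmul overlaps with fetching the next shard

print(torch.allclose(ref, out, atol=1e-3))  # True: same result, just decomposed into chunks
```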
Want more? Follow me on X! @ricklamers
I wonder if fast LoRA is really just capturing some of the need for a higher learning rate when doing LoRA, like rsLoRA. Will email them.