Solid papers on finetuning & interpretability and some great demos 📱
Week 22 of Coding with Intelligence
📰 News
📦 Repos
Approaching ElevenLabs level
📄 Papers
Paper claims: Your Transformer is Secretly Linear
Counter-claim (I buy this more): Not All Language Model Features Are Linear
"In contrast, we explore whether some language model representations may be inherently multi-dimensional. ... We identify tasks where these exact circles are used to solve computational problems involving modular arithmetic in days of the week and months of the year."
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Significantly outperforms many vision-language models, including GPT-4V, LLaVA-NeXT, and Gemini Pro 1.0.
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
"For example, compared to a conventionally trained 7B model using 300B tokens, our Gstack model converges to the same loss with 194B tokens, resulting in a 54.6% speedup.“
Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
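As I read it, the idea is that adjacent layers share key/value projections so they can also share a single KV cache entry, roughly halving KV cache memory for each shared pair. A hedged sketch (illustrative interface, not the paper's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedKVPair(nn.Module):
    """Sketch of cross-layer KV sharing: two adjacent attention layers keep
    separate query/output projections but share one key/value projection, so
    only one KV cache entry needs to be stored for the pair."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.kv_proj = nn.Linear(d_model, 2 * d_model)  # shared by both layers
        self.q_projs = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(2)])
        self.o_projs = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(2)])

    def _split_heads(self, t: torch.Tensor) -> torch.Tensor:
        b, s, _ = t.shape
        return t.view(b, s, self.n_heads, self.d_head).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # K/V are computed (and would be cached) once; both layers reuse them.
        k, v = self.kv_proj(x).chunk(2, dim=-1)
        k, v = self._split_heads(k), self._split_heads(v)
        for q_proj, o_proj in zip(self.q_projs, self.o_projs):
            q = self._split_heads(q_proj(x))
            attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            b, h, s, d = attn.shape
            x = x + o_proj(attn.transpose(1, 2).reshape(b, s, h * d))
        return x

out = SharedKVPair(d_model=64, n_heads=4)(torch.randn(2, 16, 64))
```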
SimPO: Simple Preference Optimization with a Reference-Free Reward
Beats DPO in certain cases
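The reference-free reward is the length-normalized log-probability of a response under the policy itself, pushed apart by a target margin. A sketch of the objective as I read it (the `beta`/`gamma` values below are placeholders):

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps: torch.Tensor,
               rejected_logps: torch.Tensor,
               chosen_lengths: torch.Tensor,
               rejected_lengths: torch.Tensor,
               beta: float = 2.0,
               gamma: float = 0.5) -> torch.Tensor:
    """Sketch of the SimPO objective: the implicit reward is the
    length-normalized sequence log-probability under the policy alone
    (no reference model), and a target margin gamma separates the chosen
    response from the rejected one."""
    chosen_reward = beta * chosen_logps / chosen_lengths
    rejected_reward = beta * rejected_logps / rejected_lengths
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()
```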
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
A LoRA alternative better suited to continued pre-training and absorbing new knowledge.
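The rough idea: replace LoRA's low-rank product with one small square matrix plus parameter-free compress/decompress operators, giving a higher-rank update at a similar parameter count. A hedged sketch of one reshaping variant (my simplification, not the authors' code):

```python
import torch
import torch.nn as nn

class MoRALayer(nn.Module):
    """Sketch of the MoRA idea: instead of LoRA's low-rank product B @ A, learn
    a single small square matrix M and use parameter-free reshaping to compress
    the input to M's size and decompress its output, yielding a high-rank
    weight update with a similar number of trainable parameters."""

    def __init__(self, d_model: int, r: int):
        super().__init__()
        assert d_model % r == 0
        self.r = r
        self.M = nn.Parameter(torch.zeros(r, r))  # the only trainable parameters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compress: fold the feature dimension into chunks of size r (no parameters).
        *batch, d = x.shape
        chunks = x.view(*batch, d // self.r, self.r)
        # Apply the shared square matrix per chunk, then decompress by unfolding.
        delta = chunks @ self.M.T
        return delta.view(*batch, d)  # added to the frozen layer's output, like LoRA

# Usage sketch: add MoRALayer(d_model=4096, r=256)(x) to a frozen linear layer's output.
```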
📱 Demos
AI Draw Something guesser 🔥 Awesome work by @geeksplainer
Real-time conversing with digital clones
Very impressive demo; it looks like it's running fully locally. There's clearly some uncanny valley effect, and ethical considerations are needed here. Elon drops in midway.
📚 Resources
A Guide to Creating Neural Circuit Diagrams
By Vincent Abbott (@vtabbott_ on X)
Good comparison of fine-tunes across various open source models like Llama 3 8B and Phi 3 for a variety of downstream tasks.
What We Learned from a Year of Building with LLMs (Part I)
By Eugene Yan, Bryan Bischof, Charles Frye, Hamel Husain, Jason Liu and Shreya Shankar
Interestingly, it has a private held-out test set, so overfitting is less likely. The obvious downside is that you have to put trust in Scale, which has a pretty good track record, so that feels acceptable.
Want more? Follow me on X! @ricklamers