📰 News
Answer.AI releases toolkit for training 70B models on two 3090s
A collaboration between Tim Dettmers, Hugging Face staff, and Answer.AI (Jeremy Howard's R&D lab). Benchmarks are in the works, and there's also a repo on GitHub.
📦 Repos
Function calling optimized for Qwen
Transformer Debugger by OpenAI
Here’s a Loom intro video.
AICI: Prompts as (Wasm) Programs
Very cool project by Microsoft. It aims to standardize on a Wasm runtime for structured LLM inference, where generation follows an explicit control flow specified by the user. Check out the "generating a numbered list" example in the README. This could serve as the backbone for projects like Guidance, LMQL, SGLang, Outlines, jsonformer, LMFE, etc.
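To make the idea concrete, here is a toy "prompt as program" controller in Python. This is only a sketch of the concept, not AICI's actual API (AICI runs such controllers as Wasm modules inside the inference engine); llm_generate is a dummy stand-in for whatever backend you would call.

```python
# Toy illustration of "prompts as programs": the controller owns the control
# flow (here, building a numbered list) and only asks the model to fill gaps.
# This is NOT the AICI API; llm_generate() is a dummy stand-in for a real
# inference backend.

def llm_generate(prompt: str, stop: str) -> str:
    """Dummy completion; replace with a call to your inference server."""
    return "an example list item"

def numbered_list(topic: str, n: int = 5) -> str:
    out = f"Here are {n} ideas about {topic}:\n"
    for i in range(1, n + 1):
        out += f"{i}. "                      # fixed text is forced by the controller
        item = llm_generate(out, stop="\n")  # the model only writes the item body
        out += item.strip() + "\n"
    return out

print(numbered_list("structured decoding"))
```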
📄 Papers
Yi: Open Foundation Models by 01.AI
Lots of useful information about model pre-training
Is Cosine-Similarity of Embeddings Really About Similarity?
I’ve always felt cosine similarity is a crude proxy for actual semantic document retrieval. This paper from Netflix shines a light on some of its flaws.
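The core observation is easy to reproduce in a few lines of numpy: a matrix-factorization model is invariant to an arbitrary per-dimension rescaling of its factors, yet that rescaling can flip cosine-similarity rankings between items. The numbers below are my own toy example, not taken from the paper.

```python
# Toy numpy sketch: cosine similarity on learned embeddings can be an artifact
# of a free per-dimension scaling rather than of "semantic" closeness.
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy item embeddings V and user embeddings U, as if learned from X ~= U @ V.T
V = np.array([[1.0, 1.0],   # item 0
              [1.0, 0.4],   # item 1
              [0.3, 1.0]])  # item 2
U = np.random.default_rng(0).normal(size=(4, 2))

# An arbitrary diagonal rescaling D leaves the model's predictions unchanged:
# (U @ D) @ (V @ inv(D)).T == U @ V.T
D = np.diag([3.0, 1 / 3.0])
U2, V2 = U @ D, V @ np.linalg.inv(D)
assert np.allclose(U @ V.T, U2 @ V2.T)

# ...but it changes which item looks "most similar" under cosine similarity.
print(cos(V[0], V[1]), cos(V[0], V[2]))      # item 1 closer to item 0 (~0.92 vs ~0.88)
print(cos(V2[0], V2[1]), cos(V2[0], V2[2]))  # ranking flips (~0.987 vs ~0.997)
```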
Efficient Tool Use with Chain-of-Abstraction Reasoning
A collaboration between EPFL and Meta. Very interesting work on function calling and agentic tool use by LLMs.
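As I understand the approach, the model first writes a reasoning chain containing abstract placeholders for tool results, and external tools then fill those placeholders in. Here is a toy Python sketch of that two-stage flow; the placeholder syntax and helper names are my own illustration, not the paper's format.

```python
# Toy sketch of chain-of-abstraction: (1) the LLM emits a chain with abstract
# tool slots like [calc(...) -> y1], (2) a tool pass resolves the slots.
import re

# Step 1: an "abstract" chain the LLM might produce (hard-coded here).
abstract_chain = (
    "The trip is 3 days at 42 km/day, so the total is [calc(3*42) -> y1] km. "
    "At 6 km/h, that takes [calc(y1/6) -> y2] hours."
)

def fill_chain(chain: str) -> str:
    """Step 2: resolve placeholders left to right with a calculator tool."""
    bindings: dict[str, float] = {}

    def resolve(match: re.Match) -> str:
        expr, name = match.group(1), match.group(2)
        for var, val in bindings.items():           # substitute earlier results
            expr = expr.replace(var, str(val))
        value = eval(expr, {"__builtins__": {}})    # the "tool": a toy calculator
        bindings[name] = value
        return str(value)

    return re.sub(r"\[calc\(([^)]*)\)\s*->\s*(\w+)\]", resolve, chain)

print(fill_chain(abstract_chain))
# The trip is 3 days at 42 km/day, so the total is 126 km. At 6 km/h, that takes 21.0 hours.
```

A benefit the paper highlights is that decoupling the reasoning chain from tool execution lets decoding and tool calls proceed in parallel, which speeds up inference.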
🛠️ Products
Another cool low-latency TTS provider I came across. Check out this video demo using Groq's LLM API.
I like their chat-to-webpages (search) and chat-to-document modes. Their citation system is also nicely implemented.
Devin - an AI Software Engineer by Cognition
They score above 10% on SWE-bench, a reputable eval from a group at Princeton (submitted to ICLR 2024), while GPT-4 remains stuck below 5%.
📚 Resources
Training great LLMs entirely from ground zero in the wilderness as a startup
Genstruct-7B by NousResearch
An instruction-generation model. Here's an idea: apply it recursively. Use Genstruct-7B to create the finetuning instructions for a Genstruct-7B-v2, then continue on to v3, v4, ... vN.
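For concreteness, a hypothetical sketch of that bootstrapping loop is below; generate_instructions and finetune are made-up stubs standing in for "run the current Genstruct model over a corpus to produce instruction/response pairs" and "fine-tune the next version on them".

```python
# Hypothetical sketch of the recursive bootstrapping idea above; both helper
# functions are stubs, not real training or inference code.
def generate_instructions(model: str, corpus: list[str]) -> list[dict]:
    # In practice: run the current Genstruct model over raw documents to emit
    # instruction/response pairs.
    return [{"instruction": f"[{model}] instruction for: {doc}", "response": "..."}
            for doc in corpus]

def finetune(base: str, pairs: list[dict], name: str) -> str:
    # In practice: fine-tune `base` on `pairs` and return the new checkpoint.
    print(f"fine-tuning {base} on {len(pairs)} pairs -> {name}")
    return name

def bootstrap(corpus: list[str], rounds: int = 3) -> str:
    model = "Genstruct-7B"
    for v in range(2, rounds + 2):                    # v2, v3, ..., v(rounds+1)
        pairs = generate_instructions(model, corpus)  # current model writes the data
        model = finetune(model, pairs, name=f"Genstruct-7B-v{v}")
    return model

bootstrap(["raw doc A", "raw doc B"])
```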
New benchmark leaderboard for LLMs by the reputable Allen Institute for AI
It shows Claude 3 coming in below the latest GPT-4 (0125-preview), and Mistral Large scoring above Gemini 1 Pro.
Want more? Follow me on Twitter! @ricklamers