Falcon 180B lands & OpenAI's story in WIRED: how Alec Radford's discovery of the Transformer's potential changed everything
Week 36 of Coding with Intelligence
📰 News
Falcon 180B released by TII
It beats Llama 2 70B by about 2% on the HF Open LLM Leaderboard, albeit at nearly 2.6x the parameter budget (180B vs 70B).
What OpenAI Really Wants: long read on OpenAI's story by WIRED
Speculative Sampling in llama.cpp
Full F16 precision 34B Code Llama at >20 t/s on M2 Ultra
Yarn-Llama-2-13b-128k, a Llama 2 model trained for a 128k context length using YaRN scaling
📦 Repos
TinyLlama: an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens
Cheer on the training from the sidelines by watching the loss plot on W&B. The first 105B-token checkpoint scores 43.50 on HellaSwag (acc_norm), the ETA for wrapping up training is 2023-12-01, and it's being trained on 16 A100-40GB GPUs. Loss plot: https://wandb.ai/lance777/lightning_logs/reports/metric-train_loss-23-09-04-23-38-15---Vmlldzo1MzA4MzIw?accessToken=5eu2sndit2mo6eqls8h38sklcgfwt660ek1f2czlgtqjv2c6tida47qm1oty8ik9
📄 Papers
Accelerating Large Language Model Decoding with Speculative Sampling
This is the technique used by llama.cpp (see the sketch below).
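For intuition, here's a minimal sketch of the algorithm in Python (not llama.cpp's actual implementation); the `draft_probs` and `target_probs` callables are stand-ins for real model calls.

```python
# Minimal sketch of speculative sampling: a small draft model proposes K tokens
# cheaply, the large target model scores them, and each proposal is accepted
# with probability min(1, p_target / p_draft), which preserves the target
# model's output distribution.
import numpy as np

def speculative_step(prefix, draft_probs, target_probs, K=4, rng=None):
    rng = rng or np.random.default_rng()
    # 1) Draft model autoregressively proposes K tokens.
    drafted, q, ctx = [], [], list(prefix)
    for _ in range(K):
        dist = draft_probs(ctx)                    # draft distribution over the vocab
        tok = rng.choice(len(dist), p=dist)
        drafted.append(tok); q.append(dist); ctx.append(tok)

    # 2) Target model verifies the proposals (a single batched forward pass in
    #    practice; written as a loop here for clarity).
    accepted = []
    for i, tok in enumerate(drafted):
        p = target_probs(list(prefix) + accepted)  # target distribution at this step
        if rng.random() < min(1.0, p[tok] / q[i][tok]):
            accepted.append(tok)                   # accept the drafted token
        else:
            # Reject: resample from the residual distribution max(p - q, 0),
            # then stop. This correction is what keeps sampling exact.
            residual = np.maximum(p - q[i], 0)
            accepted.append(rng.choice(len(residual), p=residual / residual.sum()))
            return accepted
    # 3) All drafts accepted: sample one bonus token from the target model.
    p = target_probs(list(prefix) + accepted)
    accepted.append(rng.choice(len(p), p=p))
    return accepted
```

The speedup comes from the target model scoring a whole batch of drafted tokens in one forward pass instead of one pass per token.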
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Very exciting to see that models can self-improve. This provides more evidence that there is more to come in terms of SOTA performance from LLMs.
Efficient RLHF: Reducing the Memory Usage of PPO
Optimizations for RLHF, combined with synthetically generated feedback data (RLAIF), mean we might get more powerful models from released open-source base models fairly soon. Stay tuned (haha, get it?)
YaRN: Efficient Context Window Extension of Large Language Models
With YaRN (Yet another RoPE extensioN method), the authors "show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow", requiring 10x fewer tokens and 2.5x fewer training steps than previous methods.
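YaRN itself adds frequency-dependent interpolation and an attention-temperature correction, but the baseline idea it refines is RoPE position interpolation. A toy sketch of that baseline (not the full YaRN method):

```python
# Toy illustration of RoPE-based context extension (NOT the full YaRN method):
# plain position interpolation squeezes positions by a scale factor so a model
# trained on 4k positions can address 128k without the rotary angles leaving
# the range seen during pre-training.
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Return rotary angles per position; scale > 1 interpolates positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    scaled_positions = positions.float() / scale   # position interpolation
    return torch.outer(scaled_positions, inv_freq)  # shape: (seq_len, dim/2)

# Model originally trained up to 4096 positions; extend to 128k with scale = 32.
orig_ctx, target_ctx = 4096, 131072
scale = target_ctx / orig_ctx
angles = rope_angles(torch.arange(target_ctx), dim=128, scale=scale)
print(angles.shape, angles.max().item())  # max angle stays in the trained range
```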
📱 Demos
An SDXL fine-tune based on Apple Emojis
No, you don't need this; yes, it's really cool.
📚 Resources
Run Code Llama locally using MLC and Simon Willison's llm
Run it locally using the llm-mlc plugin for Simon Willison's llm CLI: https://github.com/simonw/llm-mlc
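A minimal sketch of calling such a model from Python via llm's API; the model id is a placeholder, since the exact name depends on which MLC build you download (check `llm models` after installing the plugin):

```python
# Sketch: prompt a locally installed MLC model through the llm Python API.
# "MLC_CODE_LLAMA_MODEL_ID" is a placeholder, not a real model id.
import llm

model = llm.get_model("MLC_CODE_LLAMA_MODEL_ID")  # substitute an id from `llm models`
response = model.prompt("Write a Python function that merges two sorted lists.")
print(response.text())
```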
Meta launches Belebele: a massively multilingual reading comprehension dataset
I think we should embrace every benchmark we can get; to improve, we need to measure, not guess. There's also a paper (https://arxiv.org/abs/2308.16884) in which gpt-3.5-turbo performs about 23% better than llama-2-chat.
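A quick way to poke at the benchmark with Hugging Face datasets; the repo id "facebook/belebele" and the "eng_Latn" config are assumptions based on the release, so double-check the dataset card:

```python
# Quick peek at Belebele. Repo id and config name are assumptions; verify on
# the Hugging Face dataset card before relying on them.
from datasets import load_dataset

ds = load_dataset("facebook/belebele", "eng_Latn")
print(ds)  # available splits and row counts
first_split = next(iter(ds.values()))
print(first_split[0])  # one multiple-choice reading comprehension item
```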
Guest lecture at UC Berkeley (2020) by Alec Radford, OpenAI's GPT inventor
Fine-Tuning LLMs: LoRA or Full-Parameter? An in-depth Analysis with Llama 2
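As a refresher on what the article benchmarks, here's a minimal sketch of the LoRA idea: freeze the pretrained weight and train only a low-rank update, so a 4096x4096 layer needs ~65k trainable parameters instead of ~16.8M.

```python
# Minimal LoRA sketch: W stays frozen, we learn a low-rank update B @ A scaled
# by alpha / r, so only r * (d_in + d_out) parameters are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                # freeze the pretrained layer
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable:,}")  # 65,536 vs 16,777,216 for the full weight
```

In practice you'd typically reach for a library like Hugging Face's peft rather than rolling your own, but the trainable-parameter math above is the core of the LoRA-vs-full-parameter trade-off the article analyzes.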
Want more? Follow me on Twitter! @ricklamers