Breaking: new OSS LLM record as 3B replit-code-instruct-glaive achieves 63.5% pass@1 on HumanEval at 75+ tokens/s on a 3090
Week 27 of Coding with Intelligence
📰 News
RedPajama-based INCITE-7B-Instruct model beats Falcon, MPT, Pythia and Llama
AMD commits to broader ROCm support for consumer RDNA 3 GPUs
Fine-tuned Replit 3B achieves 63.5% pass@1 on HumanEval, includes code for local reproduction
We're still waiting on dataset details, but if there is no contamination this would be a huge milestone for open-source code-generating models! While evaluating locally I measured 70 to 78 tokens per second without batching on an NVIDIA 3090 graphics card.
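A minimal sketch of how such a throughput measurement can be done with Hugging Face transformers; the model id and generation settings below are placeholders/assumptions, not the exact benchmarking setup:

```python
# Minimal sketch: measure generation throughput (tokens/s) for a local model.
# The model id below is a placeholder; swap in the checkpoint you want to
# benchmark. Replit-based models need trust_remote_code=True.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "replit/replit-code-v1-3b"  # assumption: replace with the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).to("cuda")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

torch.cuda.synchronize()
start = time.time()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s (no batching)")
```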
📦 Repos
A collection of Transformer training tricks wrapped into a convenient library. Think of it as Keras for Transformers. They claim a 7x speedup over naive approaches.
Flowise: No-code flow-based LLM app builder
Cool idea and nice execution allowing business users to create LLM apps/automations. A bit like Zapier and n8n.
LMSYS, the UC Berkeley research group that also created Vicuna, has launched projects exploring fine-tuning techniques for larger context windows. LongEval attempts to measure how well the longer context is actually utilized: not all context windows are created equal! Does the model really pick up all the provided information? It also probes themes like rule-following ability as the number of rules grows with the context window.
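To give a flavor of what such a probe looks like, here is a minimal line-retrieval-style sketch; the prompt format and the model call are illustrative assumptions, not LongEval's actual code:

```python
# Minimal sketch of a line-retrieval style long-context probe
# (illustrative only; not LongEval's actual prompt format or code).
import random

def build_probe(num_lines: int, query_idx: int) -> tuple[str, int]:
    """Build a long prompt of numbered lines and ask for one specific value."""
    values = [random.randint(0, 99999) for _ in range(num_lines)]
    lines = [f"line {i}: REGISTER_CONTENT is <{values[i]}>" for i in range(num_lines)]
    prompt = (
        "\n".join(lines)
        + f"\n\nWhat is the REGISTER_CONTENT in line {query_idx}? Answer with the number only."
    )
    return prompt, values[query_idx]

prompt, expected = build_probe(num_lines=600, query_idx=123)
# response = your_model.generate(prompt)  # hypothetical model call
# print("correct" if str(expected) in response else "wrong")
```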
📄 Papers
Language models are weak learners
The title is a nod to boosting: the authors show that prompted LLMs can serve as weak learners that you combine across multiple stages of your ML pipeline. The paper gives good hints on how to best use LLMs for your downstream tasks.
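As a rough illustration of the weak-learner idea, here is a minimal AdaBoost-style sketch where the weak learner is a prompted LLM; llm_classify is a hypothetical stub and the sampling scheme is my simplification, not the paper's implementation:

```python
# Minimal AdaBoost-style sketch with an LLM-backed weak learner
# (illustrative; llm_classify is a hypothetical stand-in for a prompted LLM,
# not code from the paper).
import math
import random

def llm_classify(example: str, few_shot: list[tuple[str, int]]) -> int:
    """Hypothetical weak learner: prompt an LLM with weighted few-shot examples
    and return a label in {-1, +1}. Stubbed with a coin flip here."""
    return random.choice([-1, 1])

def boost(data: list[tuple[str, int]], rounds: int = 5):
    n = len(data)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, few_shot_examples)
    for _ in range(rounds):
        # Sample few-shot examples proportionally to current example weights
        few_shot = random.choices(data, weights=weights, k=4)
        preds = [llm_classify(x, few_shot) for x, _ in data]
        err = sum(w for w, p, (_, y) in zip(weights, preds, data) if p != y)
        err = min(max(err, 1e-6), 1 - 1e-6)
        alpha = 0.5 * math.log((1 - err) / err)
        # Reweight: misclassified examples get more weight next round
        weights = [w * math.exp(-alpha * y * p) for w, p, (_, y) in zip(weights, preds, data)]
        total = sum(weights)
        weights = [w / total for w in weights]
        ensemble.append((alpha, few_shot))
    return ensemble
```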
Extending Context Window of Large Language Models via Positional Interpolation
It looks like larger context windows will become available for more open-source models. Meta is backing up its claims of contribution with actual applied research; this is great!
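The core trick is simple: instead of asking the model to extrapolate to positions it never saw, rescale positions in the extended window back into the pre-training range and then fine-tune briefly. A minimal sketch for RoPE (my paraphrase of the idea, not Meta's code):

```python
# Minimal sketch of the Positional Interpolation idea for RoPE:
# rescale positions so the extended window maps back into the original range.
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions, inv_freq)  # (seq_len, dim/2)

train_ctx, extended_ctx, head_dim = 2048, 8192, 128
positions = torch.arange(extended_ctx).float()

# Positional Interpolation: scale positions by train_ctx / extended_ctx so that
# position 8191 is treated like position ~2047.75, which the model has seen.
scaled_positions = positions * (train_ctx / extended_ctx)
angles = rope_angles(scaled_positions, head_dim)
# A short fine-tune at the extended length then adapts the model to the denser spacing.
```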
📱 Demos
🛠️ Products
Chat Notebooks adds LLM support to Mathematica for Wolfram Language generation
If you're working within a context that can benefit from the symbolic capabilities of the Mathematica/Wolfram ecosystem, you might want to check this out. LLMs are notoriously bad calculators, so this looks like a powerful combination.
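To illustrate why the division of labor helps, here is a minimal Python sketch with sympy standing in for the symbolic engine; Chat Notebooks actually generates Wolfram Language, so this is only an analogy of the pattern, not how the product works:

```python
# Let the LLM translate the request into an expression, and let a symbolic
# engine do the exact math (sympy used here only as a stand-in).
import sympy as sp

# Imagine the LLM translated "integrate x^2 * sin(x)" into this expression string.
llm_generated_expr = "integrate(x**2 * sin(x), x)"

x = sp.symbols("x")
result = sp.sympify(llm_generated_expr, locals={"integrate": sp.integrate, "x": x})
print(result)  # -x**2*cos(x) + 2*x*sin(x) + 2*cos(x): exact, no LLM arithmetic involved
```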
📚 Resources
On Chinchilla scaling laws and the potential for training smaller (< 10B) OSS models on more tokens
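For intuition, a quick back-of-the-envelope sketch using the common approximations from the Chinchilla paper (roughly 20 training tokens per parameter being compute-optimal, and training compute C ≈ 6·N·D FLOPs); the point of the linked post is that you can deliberately train smaller models well past this "optimal" token count to trade training compute for cheaper inference:

```python
# Back-of-the-envelope Chinchilla math: ~20 tokens per parameter is roughly
# compute-optimal, with training compute approximated as C ≈ 6 * N * D FLOPs.
def chinchilla_optimal_tokens(params: float) -> float:
    return 20 * params  # tokens

def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

for n in (3e9, 7e9, 13e9):
    d = chinchilla_optimal_tokens(n)
    print(f"{n/1e9:.0f}B params -> ~{d/1e9:.0f}B tokens, ~{training_flops(n, d):.2e} FLOPs")
```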
Training LLMs with AMD MI250 GPUs and MosaicML
A real-world deep dive on what it's like to fine-tune LLMs on AMD hardware. With LLMs based on the PyTorch stack, no code changes were necessary. The article highlights how components of the AMD stack map to their NVIDIA counterparts, like RCCL to NCCL and ROCm to CUDA.
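The "no code changes" point boils down to the fact that PyTorch's ROCm build exposes AMD GPUs through the familiar torch.cuda API; a minimal sketch of device-agnostic code that runs unchanged on either stack:

```python
# PyTorch's ROCm build maps AMD GPUs onto the same torch.cuda API, so
# device-agnostic code like this runs unchanged on an MI250 or an NVIDIA GPU.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # "cuda" also targets ROCm/HIP on AMD builds
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)
y = model(x)
print(device, y.shape)
```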
The Path to Achieve Ultra-Low Inference Latency With LLaMA 65B
If you're deploying OSS models on GCP/TPUs, this PyTorch update is a must-read. The article provides an overview of the latest "bag of tricks" available to improve inference latency, specifically at larger max_len settings.
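For context, the basic torch_xla entry point that the article's optimizations build on looks like this (a minimal sketch; the actual latency tricks are in the article, not shown here):

```python
# Minimal sketch of targeting a TPU from PyTorch via torch_xla.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()              # grabs a TPU core when running on a TPU VM
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(1, 4096, device=device)
y = model(x)
xm.mark_step()                        # cut the lazy-execution graph and run it
print(y.shape)
```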
Want more? Follow me on Twitter! @ricklamers