Breaking: new OSS LLM record: the 3B replit-code-instruct-glaive achieves 63.5% pass@1 on HumanEval at 75+ tokens/s on a 3090
Week 27 of Coding with Intelligence
We're still waiting on dataset details; if there is no contamination, this would be a huge milestone for open-source code-generation models! Evaluating locally, I measured 70 to 78 tokens per second without batching on an NVIDIA 3090 graphics card.
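For context on the headline metric: HumanEval results are usually reported with the unbiased pass@k estimator from the Codex paper, where n completions are sampled per problem and c of them pass the unit tests. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n samples were drawn and c passed the unit tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# pass@1 reduces to the fraction of passing samples:
print(pass_at_k(10, 6, 1))  # 0.6
```

For k=1 this is simply c/n, so a 63.5% pass@1 means roughly 63.5% of sampled completions passed their tests.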
A collection of Transformer training tricks wrapped into a convenient library. Think of it as Keras for Transformers. They claim a 7x speedup over naive approaches.
Cool idea and nice execution allowing business users to create LLM apps/automations. A bit like Zapier and n8n.
LMSYS, the UC Berkeley research group that also created Vicuna, has launched projects exploring fine-tuning techniques for larger context windows. LongEval attempts to measure how well the longer context is actually utilized. Not all context windows are created equal! Does the model really pick up all the provided information? It also probes themes like rule-following ability as the number of rules grows with the context window.
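To make "is the context actually utilized?" concrete, here is a minimal sketch of a LongEval-style retrieval probe: hide one key-value pair among many distractor lines and check whether the model can recall it exactly. The line format and helper names are illustrative, not LongEval's exact task definition:

```python
import random

def make_lines_prompt(n_lines: int, seed: int = 0):
    """Build a long-context retrieval test: n_lines numbered lines,
    each carrying a random value, plus a question about one line.
    Returns (prompt, expected_answer)."""
    rng = random.Random(seed)
    values = [rng.randint(10000, 99999) for _ in range(n_lines)]
    lines = [f"line {i}: REGISTER_CONTENT is <{v}>" for i, v in enumerate(values)]
    target = rng.randrange(n_lines)
    question = (f"What is the REGISTER_CONTENT in line {target}? "
                f"Answer with the number only.")
    return "\n".join(lines) + "\n" + question, str(values[target])

def exact_match(model_answer: str, expected: str) -> bool:
    # Score by exact match on the digits the model returns.
    return model_answer.strip().strip("<>") == expected
```

Sweeping `n_lines` (and thereby the prompt length) while tracking exact-match accuracy gives a curve of effective context utilization, which is the kind of signal LongEval is after.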
Combine and use LLMs in multiple stages of your ML pipeline. This paper gives good hints on how to best use LLMs for your downstream tasks.
It looks like larger context windows will become available for more open source models. Meta is backing up its claims of contribution with actual applied research; this is great!
If you're working within a context that can benefit from the symbolic capabilities of the Mathematica/Wolfram ecosystem you might want to check this out. LLMs are notoriously bad calculators so this looks like a powerful combination.
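The underlying pattern is simple: route arithmetic out of the model and into a tool that computes exactly. As a stand-in for the Mathematica/Wolfram integration (which this sketch is not), here is a minimal, safe calculator tool an LLM could call instead of computing in-context, using only Python's `ast` module:

```python
import ast
import operator

# Arithmetic AST nodes we allow; everything else is rejected,
# so arbitrary code in the expression cannot execute.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression: str) -> float:
    """Safely evaluate a plain arithmetic expression string."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

print(calculate("3 * (17 + 4) ** 2"))  # 1323
```

The LLM only has to emit the expression string; the tool returns the exact result, which is precisely where symbolic backends like Wolfram shine over in-context arithmetic.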
A real-world deep-dive on what it’s like to fine-tune LLMs on AMD hardware. For LLMs built on the PyTorch stack, no code changes were necessary. The article highlights how components of the AMD stack map to their NVIDIA counterparts (RCCL to NCCL, ROCm to CUDA).
If you're deploying OSS models on GCP/TPUs, then this PyTorch update is a must-read. The article provides an overview of the latest "bag of tricks" available to improve inference performance, in particular latency at large max_len.
Want more? Follow me on Twitter! @ricklamers