$113M in Seed funding for Mistral AI to create Open Source LLMs
Week 25 of Coding with Intelligence
This memo provides an interesting glimpse into how the LLM landscape is likely to evolve. A few things stand out to me: they’re not planning to build their own cluster, which could be a massive competitive disadvantage compared to Google (TPU pods) and OpenAI (Microsoft owns the clusters, giving OpenAI a structural cost advantage).
OpenAI has talked about open sourcing some of its non-best-in-class models. If that happens, it effectively eliminates a key hiring argument for Mistral, since they too plan to keep their best model(s) proprietary.
Content deals for proprietary training data are troublesome when combined with open weights: hackers have shown they can circumvent alignment measures, which likely means all that content can be easily extracted from the open models.
The core premise of having a leg up with EU enterprises for data privacy reasons might stand the test of time, as data increasingly becomes the differentiating factor in model performance, as evidenced by papers like “Textbooks Are All You Need”.
Nevertheless, great to see that all the LLM players are keeping each other on their toes!
Authors (UC Berkeley) claim “vLLM outperforms HuggingFace Transformers (HF) by up to 24x and Text Generation Inference (TGI) by up to 3.5x”. Check this out if you’re doing inference on OSS models!
Given that LLaMA likely outperforms Falcon, this model might be the best OSS model available for commercial use at the moment. A key limitation, however, is that it can't be used for code generation due to whitespace handling issues.
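To see why whitespace handling matters so much for code generation, here is a toy illustration (a hypothetical lossy tokenizer, not the model's actual one): any tokenizer that collapses runs of spaces cannot round-trip Python source, because indentation is significant.

```python
import re

# Toy stand-in for a tokenizer that collapses runs of spaces into one.
# Real tokenizers with this behavior destroy Python indentation.
def lossy_tokenize(text: str) -> list[str]:
    return re.sub(r" +", " ", text).split(" ")

code = "def f():\n    return 1"
roundtrip = " ".join(lossy_tokenize(code))
print(roundtrip)  # the four-space indent collapses to one space -> invalid Python
```

Detokenizing gives `def f():\n return 1`, which no longer parses as the original function body.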
Step 1: have a model write a regex that matches the output you desire. Step 2: use ReLLM to force the model, at prediction time, to generate only allowable tokens. Super cool idea; I'm curious whether this "pre-generation filtering" approach hurts output quality too much. It would be good to see benchmark evals with and without ReLLM.
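The core idea can be sketched in a few lines (everything here is hypothetical: fixed token scores instead of real logits, and a hand-written "prefix pattern"; ReLLM itself compiles the regex into an automaton and masks the model's actual logits): at each step, only accept tokens that keep the output a viable prefix of a full regex match.

```python
import re

# Toy vocabulary and a fake "model" that proposes tokens best-first.
# In ReLLM the ranking would come from the LLM's logits.
VOCAB = ["hello", "4", "2", "cat", "7", "!"]

def ranked_tokens(prefix: str) -> list[str]:
    return VOCAB  # pretend the model always prefers this order

def constrained_generate(target_len: int) -> str:
    # Target regex: exactly three digits, [0-9]{3}.
    # Hand-written "prefix pattern" accepting every prefix of its language:
    prefix_pattern = re.compile(r"[0-9]{0,3}")
    out = ""
    while len(out) < target_len:
        for token in ranked_tokens(out):  # greedy: best-scoring token first
            if prefix_pattern.fullmatch(out + token):
                out += token  # token kept: output can still grow into a match
                break
    return out

print(constrained_generate(3))  # prints "444": only digits survive the filter
```

Note the filtering happens before sampling, so invalid continuations are never even candidates, which is why it's "pre-generation" rather than reject-and-retry.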
This is especially interesting because code generation has been particularly costly to do on metered proprietary APIs.
Hosting LLMs can be a bit tricky. BentoML has a track record of providing excellent and intuitive APIs for machine learning serving. I would check this one out.
I hear a lot of folks are using this successfully. It probably won’t get you to the same scale that dedicated vector databases like Weaviate and Pinecone can handle, though.
A new gradient descent algorithm that supposedly drastically outperforms current best-in-class optimizers like AdamW and NAdamW. They specifically evaluate on language models.
Other LLMs that score above 50% on HumanEval require at least 100x the amount of data and 10x the model size.
Distributed training of LLMs is the norm, but how can one scale it to ever more nodes? This note from Eric Zelikman et al. discusses tricks that can be used to reduce the amount of data sharing required.
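One illustrative example of such a trick (local SGD, a well-known communication-reduction scheme; not necessarily the note's exact method): let each worker take k optimizer steps on its own shard and only synchronize by averaging parameters once per round, cutting communication roughly by a factor of k.

```python
import random

def grad(w: float, data_point: float) -> float:
    # Gradient of a simple quadratic loss (w - data_point)^2 / 2.
    return w - data_point

def local_sgd(worker_data, w0=0.0, lr=0.1, local_steps=5, rounds=20):
    weights = [w0] * len(worker_data)
    for _ in range(rounds):
        for i, shard in enumerate(worker_data):
            for _ in range(local_steps):           # k local steps, no comms
                weights[i] -= lr * grad(weights[i], random.choice(shard))
        avg = sum(weights) / len(weights)          # one sync per round
        weights = [avg] * len(weights)
    return weights[0]

random.seed(0)
w = local_sgd([[1.0, 1.2], [2.8, 3.0]])
print(round(w, 2))  # converges near the global data mean (~2.0)
```

With syncing every step you'd communicate `local_steps` times more often for roughly the same fixed point; the trade-off is that stale local updates can slow or destabilize convergence, which is where the cleverer tricks come in.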
With the quality of text-to-speech increasing to human levels it's easy to get fooled. Great to see vendors of text-to-speech engines helping in the fight to discern real from fake audio clips!
Good evidence that unbiased evaluations are still difficult to come by. Francis includes code to reproduce the MMLU evaluation score which is nice to see.
An approximate guide for navigating the open source language model landscape.
Want more? Follow me on Twitter! @ricklamers