Early version of Mistral Medium gifted to the community?
Week 5 of Coding with Intelligence
Looks like they're OK with people using it; see this attribution thread on Hugging Face.
Also capable of working with Lean for theorem proving. Model code and weights are permissively licensed and allow commercial use.
It comes in three base model flavors: Mistral 7B, Vicuna 7B/13B, and Hermes-Yi-34B. The authors mention their preferred serving engine is SGLang.
Ollama tags: https://ollama.ai/library/codellama/tags (It has already been quantized)
DeepMind shows you can train language models with gradient descent on hardware nodes that have limited interconnect: the nodes only occasionally communicate their gradients, yet training remains just as efficient in terms of the number of iterations to convergence. They show the technique works for models with up to 150M parameters. This makes acquiring hardware to train models significantly easier and cheaper, and it sets the stage for initiatives like Folding@Home, but for language model training.
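To make the idea concrete, here is a minimal sketch (in PyTorch; my own illustration, not DeepMind's code) of low-communication training: each worker takes many local SGD steps on its own data and only occasionally synchronizes by averaging its weights with the other workers. The model, hyperparameters, and synthetic data are placeholders, and DeepMind's actual method differs in details such as the choice of inner and outer optimizers.

```python
# Sketch: local training with infrequent synchronization.
# Workers train independently and only communicate every `local_steps` steps.
import copy
import torch
import torch.nn as nn

def make_model():
    # Placeholder model; a real run would use a language model.
    return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))

torch.manual_seed(0)
num_workers = 4
local_steps = 50      # optimizer steps between synchronizations
outer_rounds = 10     # number of synchronization rounds

global_model = make_model()

for round_idx in range(outer_rounds):
    worker_states = []
    for w in range(num_workers):
        # Each worker starts the round from the shared global weights.
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=1e-2)
        for _ in range(local_steps):
            # Toy batch; in practice each worker reads its own data shard.
            x = torch.randn(16, 32)
            y = torch.randn(16, 1)
            loss = nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        worker_states.append(model.state_dict())

    # Communication step: average worker parameters into the global model.
    with torch.no_grad():
        avg_state = {
            k: torch.stack([s[k] for s in worker_states]).mean(dim=0)
            for k in worker_states[0]
        }
        global_model.load_state_dict(avg_state)
    print(f"round {round_idx}: synchronized {num_workers} workers")
```

The key point is that communication happens once per round instead of once per step, which is what makes slow or unreliable interconnect between nodes workable.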
Research by Yoshua Bengio et al. There's already an example implementation available in Hugging Face's PEFT library; it's linked in the Tweet.
The GitHub repo can be found here.
Similar to SparseGPT, which I recently shared: if you're exploring sparsification of LLMs, I'd recommend giving this a read.
It’s more finicky than, say, GPT-4, so prompting it the right way goes a long way toward extracting its true potential.
Want more? Follow me on Twitter! @ricklamers