📰 News
An early version of Mistral Medium called “miqu” has been leaked
Looks like Mistral is OK with people using it; see the attribution thread on Hugging Face.
InternLM2-Math: GPT-4 level math model with only 20B parameters
Also capable of working with Lean for theorem proving. Model code and weights are permissively licensed and allow commercial use.
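If you haven't seen Lean before, this is the kind of formal statement-plus-proof a math model can be asked to produce. A toy illustration only (not taken from the InternLM2-Math release):

```lean
-- A math LLM is typically given the statement and asked to generate the proof.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```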
LLaVA-1.6 multimodal LLM released; outperforms Gemini Pro on some benchmarks
It comes in three base-model flavors: Mistral 7B, Vicuna 7/13B, and Hermes-Yi-34B. The authors mention their preferred serving engine is SGLang.
Code Llama is already available quantized on Ollama: https://ollama.ai/library/codellama/tags
CodeFuse-DeepSeek-33B achieves a pass@1 (greedy decoding) score of 78.65% on HumanEval
📄 Papers
Asynchronous Local-SGD Training for Language Modeling
DeepMind shows that language models can be trained with gradient descent on hardware nodes with limited interconnect: workers only occasionally communicate their updates, yet the number of iterations to convergence stays roughly the same as with fully synchronous training. They demonstrate the technique on models with up to 150M parameters. This makes it significantly easier and cheaper to acquire hardware for training, and it sets the stage for Folding@Home-style initiatives, but for language model training.
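To make the core idea concrete, here's a minimal local-SGD toy I sketched: each worker runs several local SGD steps on its own data shard, and parameters are only averaged between rounds. This is a simplified synchronous-averaging version, not DeepMind's asynchronous algorithm, and all names (`local_steps`, etc.) are mine:

```python
import numpy as np

def grad(w, X, y):
    # Gradient of mean squared error for a linear model y ≈ X @ w.
    return 2 * X.T @ (X @ w - y) / len(y)

def local_sgd(X_shards, y_shards, dim, rounds=50, local_steps=8, lr=0.05):
    """Toy local-SGD: each worker takes `local_steps` SGD steps on its shard,
    then parameters are averaged -- the only communication point per round."""
    global_w = np.zeros(dim)
    for _ in range(rounds):
        worker_ws = []
        for X, y in zip(X_shards, y_shards):
            w = global_w.copy()
            for _ in range(local_steps):        # no communication inside this loop
                w -= lr * grad(w, X, y)
            worker_ws.append(w)
        global_w = np.mean(worker_ws, axis=0)   # infrequent sync instead of a per-step all-reduce
    return global_w

# Usage: 4 workers fit the same underlying linear model on disjoint shards.
rng = np.random.default_rng(0)
true_w = rng.normal(size=5)
X_shards = [rng.normal(size=(64, 5)) for _ in range(4)]
y_shards = [X @ true_w for X in X_shards]
w = local_sgd(X_shards, y_shards, dim=5)
print(np.allclose(w, true_w, atol=1e-2))  # True: converges despite rare communication
```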
Polytropon: Combining Modular Skills in Multitask Learning
Research by Yoshua Bengio et al. There's already an example implementation available in Hugging Face's PEFT library, linked in the tweet.
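As a rough sketch of the Polytropon idea (my own simplification, not the PEFT implementation): each layer carries a shared inventory of low-rank "skill" adapters, and every task learns a routing vector that mixes them, so skills get reused and recombined across tasks:

```python
import torch
import torch.nn as nn

class PolytroponLinear(nn.Module):
    """Frozen linear layer plus a shared inventory of low-rank 'skill' adapters.
    Each task learns logits over the skills; its adapter is the soft mixture."""
    def __init__(self, base: nn.Linear, n_tasks: int, n_skills: int, rank: int = 4):
        super().__init__()
        self.base = base.requires_grad_(False)           # pretrained weights stay frozen
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(n_skills, d_in, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(n_skills, rank, d_out))
        self.task_logits = nn.Parameter(torch.zeros(n_tasks, n_skills))  # task-to-skill routing

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        route = torch.softmax(self.task_logits[task_id], dim=-1)   # how this task mixes the skills
        A = torch.einsum("s,sir->ir", route, self.A)               # task-specific low-rank A
        B = torch.einsum("s,sro->ro", route, self.B)               # task-specific low-rank B
        return self.base(x) + x @ A @ B

# Usage: two tasks share a 4-skill inventory on top of the same frozen layer.
layer = PolytroponLinear(nn.Linear(16, 16), n_tasks=2, n_skills=4)
print(layer(torch.randn(3, 16), task_id=0).shape)  # torch.Size([3, 16])
```

The paper uses a learned (and optionally discretized) task-skill allocation matrix; the soft mixture above is just the simplest way to show the modular-skill structure.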
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
The GitHub repo can be found here.
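For a hedged illustration of what multitask fine-tuning means in practice (a toy loss-balancing loop of my own, not MFTCoder's actual scheme): batches from several tasks are interleaved in every step and their losses combined with per-task weights, so no single task dominates the gradient:

```python
import torch
import torch.nn as nn

# Toy shared model with one head per task; MFTCoder fine-tunes a single code LLM
# on several instruction datasets, but the loss-mixing logic is analogous.
class SharedModel(nn.Module):
    def __init__(self, d=32, n_classes=4, tasks=("completion", "explanation")):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d, 64), nn.ReLU())
        self.heads = nn.ModuleDict({t: nn.Linear(64, n_classes) for t in tasks})

    def loss(self, task, x, y):
        return nn.functional.cross_entropy(self.heads[task](self.encoder(x)), y)

model = SharedModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
task_weights = {"completion": 1.0, "explanation": 0.5}   # hypothetical weights

for step in range(100):
    opt.zero_grad()
    total = 0.0
    for task, w in task_weights.items():                 # every step sees every task
        x, y = torch.randn(8, 32), torch.randint(0, 4, (8,))
        total = total + w * model.loss(task, x, y)
    total.backward()
    opt.step()
```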
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
This is in a similar vein to SparseGPT, which I recently shared; if you're exploring sparsification of LLMs, I'd recommend giving this a read.
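Here's a very rough sketch of the row/column-deletion idea on a single pair of linear layers (my own toy, ignoring the transformer-specific invariance tricks SliceGPT relies on): rotate the hidden dimension into a PCA basis computed from calibration activations, then drop the least important directions, which shrinks both weight matrices:

```python
import torch

torch.manual_seed(0)
d_hidden, d_out, keep = 64, 32, 48                 # slice the 64-dim hidden axis down to 48

# Toy setup: hidden activations live (by construction) in a ~40-dim subspace,
# so deleting the least important PCA directions loses almost nothing.
W1 = torch.randn(d_hidden, 40) @ torch.randn(40, d_hidden) / 40   # produces hidden activations
W2 = torch.randn(d_out, d_hidden)                                 # consumes hidden activations
X = torch.randn(256, d_hidden)                                    # calibration inputs

# PCA (via SVD) of the hidden activations on the calibration set.
H = X @ W1.T
Q = torch.linalg.svd(H, full_matrices=False).Vh.T                 # orthogonal basis, (64, 64)

# Rotate both weight matrices into the PCA basis, then delete rows/columns.
W1_sliced = (Q.T @ W1)[:keep, :]    # (48, 64): rows deleted from the producing layer
W2_sliced = (W2 @ Q)[:, :keep]      # (32, 48): columns deleted from the consuming layer

# The sliced two-layer pipeline approximates the original on fresh inputs.
X_new = torch.randn(8, d_hidden)
full = X_new @ W1.T @ W2.T
sliced = X_new @ W1_sliced.T @ W2_sliced.T
print((torch.norm(full - sliced) / torch.norm(full)).item())      # ~0 for this low-rank toy
```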
📚 Resources
Two Sigma presents a framework for Large Language Model abstractions
It’s more finicky than, say, GPT-4, so prompting it the right way goes a long way toward extracting its true potential.
Want more? Follow me on Twitter! @ricklamers