📰 News
An early version of Mistral Medium called “miqu” has been leaked
Looks like Mistral is OK with people using it; see the attribution thread on Hugging Face.
InternLM2-Math: GPT-4 level math model with only 20B parameters
Also capable of working with Lean for theorem proving. Model code and weights are permissively licensed and allow commercial use.
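If you haven't seen Lean before, this is the kind of formal statement-plus-proof a math model can be asked to produce. A toy illustration only (not taken from the InternLM2-Math release):

```lean
-- A math LLM is typically given the statement and asked to generate the proof.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```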
LLaVA-1.6 multimodal LLM released; outperforms Gemini Pro on some benchmarks
It comes in three base-model flavors: Mistral 7B, Vicuna 7/13B, and Hermes-Yi-34B. The authors mention their preferred serving engine is SGLang.
Code Llama is already available quantized on Ollama: https://ollama.ai/library/codellama/tags
CodeFuse-DeepSeek-33B achieves a pass@1 (greedy decoding) score of 78.65% on HumanEval
📄 Papers
Asynchronous Local-SGD Training for Language Modeling
DeepMind shows that language models can be trained with gradient descent on hardware nodes with limited interconnect: workers only occasionally communicate their updates, yet the number of iterations to convergence stays roughly the same as with fully synchronous training. They demonstrate the technique on models with up to 150M parameters. This makes it significantly easier and cheaper to acquire hardware for training, and it sets the stage for Folding@Home-style initiatives, but for language model training.
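To make the core idea concrete, here's a minimal local-SGD toy I sketched: each worker runs several local SGD steps on its own data shard, and parameters are only averaged between rounds. This is a simplified synchronous-averaging version, not DeepMind's asynchronous algorithm, and all names (`local_steps`, etc.) are mine:

```python
import numpy as np

def grad(w, X, y):
    # Gradient of mean squared error for a linear model y ≈ X @ w.
    return 2 * X.T @ (X @ w - y) / len(y)

def local_sgd(X_shards, y_shards, dim, rounds=50, local_steps=8, lr=0.05):
    """Toy local-SGD: each worker takes `local_steps` SGD steps on its shard,
    then parameters are averaged -- the only communication point per round."""
    global_w = np.zeros(dim)
    for _ in range(rounds):
        worker_ws = []
        for X, y in zip(X_shards, y_shards):
            w = global_w.copy()
            for _ in range(local_steps):        # no communication inside this loop
                w -= lr * grad(w, X, y)
            worker_ws.append(w)
        global_w = np.mean(worker_ws, axis=0)   # infrequent sync instead of a per-step all-reduce
    return global_w

# Usage: 4 workers fit the same underlying linear model on disjoint shards.
rng = np.random.default_rng(0)
true_w = rng.normal(size=5)
X_shards = [rng.normal(size=(64, 5)) for _ in range(4)]
y_shards = [X @ true_w for X in X_shards]
w = local_sgd(X_shards, y_shards, dim=5)
print(np.allclose(w, true_w, atol=1e-2))  # True: converges despite rare communication
```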
Polytropon: Combining Modular Skills in Multitask Learning
Research by Yoshua Bengio et al. There's already an example implementation available in Hugging Face's PEFT library, linked in the tweet.
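As a rough sketch of the Polytropon idea (my own simplification, not the PEFT implementation): each layer carries a shared inventory of low-rank "skill" adapters, and every task learns a routing vector that mixes them, so skills get reused and recombined across tasks:

```python
import torch
import torch.nn as nn

class PolytroponLinear(nn.Module):
    """Frozen linear layer plus a shared inventory of low-rank 'skill' adapters.
    Each task learns logits over the skills; its adapter is the soft mixture."""
    def __init__(self, base: nn.Linear, n_tasks: int, n_skills: int, rank: int = 4):
        super().__init__()
        self.base = base.requires_grad_(False)           # pretrained weights stay frozen
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(n_skills, d_in, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(n_skills, rank, d_out))
        self.task_logits = nn.Parameter(torch.zeros(n_tasks, n_skills))  # task-to-skill routing

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        route = torch.softmax(self.task_logits[task_id], dim=-1)   # how this task mixes the skills
        A = torch.einsum("s,sir->ir", route, self.A)               # task-specific low-rank A
        B = torch.einsum("s,sro->ro", route, self.B)               # task-specific low-rank B
        return self.base(x) + x @ A @ B

# Usage: two tasks share a 4-skill inventory on top of the same frozen layer.
layer = PolytroponLinear(nn.Linear(16, 16), n_tasks=2, n_skills=4)
print(layer(torch.randn(3, 16), task_id=0).shape)  # torch.Size([3, 16])
```

The paper uses a learned (and optionally discretized) task-skill allocation matrix; the soft mixture above is just the simplest way to show the modular-skill structure.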
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
The GitHub repo can be found here.
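For a hedged illustration of what multitask fine-tuning means in practice (a toy loss-balancing loop of my own, not MFTCoder's actual scheme): batches from several tasks are interleaved in every step and their losses combined with per-task weights, so no single task dominates the gradient:

```python
import torch
import torch.nn as nn

# Toy shared model with one head per task; MFTCoder fine-tunes a single code LLM
# on several instruction datasets, but the loss-mixing logic is analogous.
class SharedModel(nn.Module):
    def __init__(self, d=32, n_classes=4, tasks=("completion", "explanation")):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d, 64), nn.ReLU())
        self.heads = nn.ModuleDict({t: nn.Linear(64, n_classes) for t in tasks})

    def loss(self, task, x, y):
        return nn.functional.cross_entropy(self.heads[task](self.encoder(x)), y)

model = SharedModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
task_weights = {"completion": 1.0, "explanation": 0.5}   # hypothetical weights

for step in range(100):
    opt.zero_grad()
    total = 0.0
    for task, w in task_weights.items():                 # every step sees every task
        x, y = torch.randn(8, 32), torch.randint(0, 4, (8,))
        total = total + w * model.loss(task, x, y)
    total.backward()
    opt.step()
```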
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
This is in a similar vein to SparseGPT, which I recently shared; if you're exploring sparsification of LLMs, I'd recommend giving this a read.
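Here's a very rough sketch of the row/column-deletion idea on a single pair of linear layers (my own toy, ignoring the transformer-specific invariance tricks SliceGPT relies on): rotate the hidden dimension into a PCA basis computed from calibration activations, then drop the least important directions, which shrinks both weight matrices:

```python
import torch

torch.manual_seed(0)
d_hidden, d_out, keep = 64, 32, 48                 # slice the 64-dim hidden axis down to 48

# Toy setup: hidden activations live (by construction) in a ~40-dim subspace,
# so deleting the least important PCA directions loses almost nothing.
W1 = torch.randn(d_hidden, 40) @ torch.randn(40, d_hidden) / 40   # produces hidden activations
W2 = torch.randn(d_out, d_hidden)                                 # consumes hidden activations
X = torch.randn(256, d_hidden)                                    # calibration inputs

# PCA (via SVD) of the hidden activations on the calibration set.
H = X @ W1.T
Q = torch.linalg.svd(H, full_matrices=False).Vh.T                 # orthogonal basis, (64, 64)

# Rotate both weight matrices into the PCA basis, then delete rows/columns.
W1_sliced = (Q.T @ W1)[:keep, :]    # (48, 64): rows deleted from the producing layer
W2_sliced = (W2 @ Q)[:, :keep]      # (32, 48): columns deleted from the consuming layer

# The sliced two-layer pipeline approximates the original on fresh inputs.
X_new = torch.randn(8, d_hidden)
full = X_new @ W1.T @ W2.T
sliced = X_new @ W1_sliced.T @ W2_sliced.T
print((torch.norm(full - sliced) / torch.norm(full)).item())      # ~0 for this low-rank toy
```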
📚 Resources
Two Sigma presents a framework for Large Language Model abstractions
It’s more finicky than, say, GPT-4, so prompting it the right way goes a long way toward extracting its true potential.
Want more? Follow me on Twitter! @ricklamers