Discover Mistral Medium: Mistral's first proprietary model, and it outperforms Gemini Pro
Week 50 of Coding with Intelligence
📰 News
OpenAssistants: build your own reliable function calling assistants
I've worked on this with my awesome team at Definitive Intelligence. Check out the live demo and repo! Our goal is to make it 10X easier to build your own function calling assistants through powerful yet simple abstractions.
Oh, did I mention it works with any chat LLM? Even those that don't support function calling natively. It's also fully integrated with LangChain (e.g. LangChain Agent tools). Live demo: https://openassistants-next.vercel.definitivecorp.io/
La plateforme: API Endpoints for Mistral Medium & embeddings
Mistral Medium outperforms Gemini Pro on MMLU and HellaSwag. Also, they raised $415M with a16z as their lead, which probably means they'll start balancing Open Weight releases with proprietary models; Mistral Medium is the first example of that.
You can run phi-2 locally: a 2.7B model that outperforms Gemini Nano
My Tweet contains the instructions
Mistral releases Mixtral 8x7B, a sparse mixture-of-experts model
GPT-3.5-level performance. Mistral is rolling out their own hosted endpoints.
HF Transformers on a roll: AMD support, Mixtral, (Bak)LLaVa and more
Startup Contextual releases DPO alternative: KTO
The authors from Stanford claim significant performance improvements over DPO, based on a GPT-4 evaluation of their tuned Llama 30B model.
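For intuition, here's a rough sketch of what a KTO-style per-example loss can look like. Unlike DPO, KTO doesn't need paired preferences, only a binary desirable/undesirable label per example. The constant `ref_point` and the lambda weights below are simplifications (in the paper the reference point is estimated from a KL term); this is an illustration of the loss shape, not the authors' implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(logratio, desirable, beta=0.1, ref_point=0.0,
             lambda_d=1.0, lambda_u=1.0):
    """Sketch of a KTO-style loss for one example.

    logratio: log pi_theta(y|x) - log pi_ref(y|x), the policy's log-prob
    of the completion relative to the frozen reference model.
    """
    if desirable:
        # Push the policy to assign MORE probability than the reference
        return lambda_d * (1.0 - sigmoid(beta * (logratio - ref_point)))
    else:
        # Push the policy to assign LESS probability than the reference
        return lambda_u * (1.0 - sigmoid(beta * (ref_point - logratio)))
```

The sigmoid saturates, mirroring the diminishing sensitivity of the Kahneman-Tversky value function: once an example is handled well (or badly) enough, its gradient contribution shrinks.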
Run Mixtral MoE on Together AI with 100 tok/s
Great response time by Together AI
StripedHyena 7B: a Transformer architecture alternative that outperforms Mistral's 7B
📦 Repos
Chat client to concurrently chat with 30+ LLMs
Find the winning output for your task 🏆
MLX from Apple moving fast: Mixtral support
MLX is an Apple-silicon-specific PyTorch alternative. The team just shared an example showing how to run Mistral's new MoE model.
Mamba Chat: a chat LLM based on the state-space model architecture
Hyena/Monarch Mixers, state-space models: the traditional Transformer architecture is facing new competition!
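For context, the core of a state-space layer is a linear recurrence over a hidden state, which keeps inference cost constant per token instead of growing with sequence length like attention. A toy, dense sketch (real models like Mamba use structured, input-dependent parameters and a parallel scan, so this is illustrative only):

```python
def ssm_step(h, x, A, B, C):
    # One step of the linear state-space recurrence:
    #   h' = A @ h + B * x      (state update)
    #   y  = C @ h'             (output projection)
    n = len(h)
    h_new = [sum(A[i][j] * h[j] for j in range(n)) + B[i] * x
             for i in range(n)]
    y = sum(C[i] * h_new[i] for i in range(n))
    return h_new, y

def ssm_scan(xs, A, B, C, h0):
    # Sequentially apply the recurrence over an input sequence.
    h, ys = h0, []
    for x in xs:
        h, y = ssm_step(h, x, A, B, C)
        ys.append(y)
    return ys
```

With a 1-dimensional state and A = 0.5, an input impulse decays geometrically: `ssm_scan([1.0, 0.0, 0.0], [[0.5]], [1.0], [1.0], [0.0])` yields `[1.0, 0.5, 0.25]`.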
Open Source Midjourney alternative
Really cool to see OSS Stable Diffusion projects pay more attention to usability.
Agent framework, models and benchmark from Chinese social media app 'Kwai'
GPT-4 based agents performed best in their benchmark
📄 Papers
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Merging symbolic and probabilistic systems is showing more and more promise. This paper from the DeepMind team improves significantly on BIG-Bench Hard by pairing code evaluation with LLM inference.
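The core idea can be sketched as an interpreter with an LM fallback (the paper calls this an "LMulator"): run LLM-generated code line by line, and when a line isn't actually executable, ask the model to simulate its effect on the program state. `lm_emulate` below is a hypothetical stand-in for a real LLM call, not an API from the paper:

```python
def chain_of_code(code_lines, lm_emulate):
    """Sketch of Chain-of-Code style execution.

    Executable lines run in a real Python interpreter; lines that fail
    (e.g. calls to semantic pseudo-functions the LLM invented, like
    is_sarcastic(...)) are handed to the language model, which returns
    its best guess at the resulting state updates.
    """
    state = {}
    for line in code_lines:
        try:
            exec(line, {}, state)  # real interpreter first
        except Exception:
            state.update(lm_emulate(line, state))  # LM emulates the rest
    return state
```

The appeal is that exact computation (arithmetic, loops, string ops) is delegated to the interpreter, while fuzzy semantic steps stay with the LLM.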
Are Emergent Abilities of Large Language Models a Mirage?
tl;dr: abilities likely improve gradually and predictably as LLMs are scaled, rather than appearing suddenly. If this research is correct, that's good news for safety and a boon to AI development: there's less need to decelerate.
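A toy calculation illustrates the paper's central point: a smoothly improving model can look "emergent" under a discontinuous metric. If per-token accuracy rises gradually, exact-match over a long answer (assuming, for simplicity, independent tokens) stays near zero for a long time and then shoots up:

```python
def exact_match_rate(per_token_acc, answer_len):
    # Probability that ALL tokens in the answer are correct,
    # under a simplifying independence assumption.
    return per_token_acc ** answer_len

# Smooth per-token gains -> sharp-looking jump in exact match
for p in [0.80, 0.90, 0.95, 0.99]:
    print(p, exact_match_rate(p, 50))
```

Going from 90% to 99% per-token accuracy takes exact-match on a 50-token answer from well under 1% to over 60%, which reads as an "emergent" jump even though the underlying capability improved smoothly.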
📚 Resources
Purple Llama by Meta
An umbrella project for LLM security concerns. At a practical level it contains Llama Guard, a simple-to-deploy filtering model (already available on platforms like Together AI) that lets you filter/reject inputs and outputs. Safety is a crucial feature for making it feasible to deploy OS LLMs in production.
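Wiring a filtering model like Llama Guard into a chat app boils down to classifying both the user input and the model output before anything is shown. `classify` and `generate` below are hypothetical stand-ins (Llama Guard's real interface is a prompted LLM returning a safe/unsafe verdict), so treat this as a sketch of the control flow only:

```python
REFUSAL = "Sorry, I can't help with that."

def guarded_chat(user_msg, classify, generate):
    """Filter both sides of a chat turn with a safety classifier.

    classify(text) -> "safe" or "unsafe"  (e.g. a Llama Guard call)
    generate(text) -> assistant reply      (e.g. your chat LLM)
    Both callables are assumptions for this sketch, not a real API.
    """
    if classify(user_msg) != "safe":
        return REFUSAL          # reject unsafe inputs up front
    reply = generate(user_msg)
    if classify(reply) != "safe":
        return REFUSAL          # also reject unsafe model outputs
    return reply
```

Checking the output as well as the input matters: a benign-looking prompt can still elicit an unsafe completion.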
Explained: Latent Consistency Models
LCMs are a new kind of image generation model; one of their big advantages is faster generation speed.
Resource on technical tricks for speeding up Transformers 100X
By Yao Fu, a researcher at University of Edinburgh
Want more? Follow me on Twitter! @ricklamers