Coding with Intelligence

$113M in Seed funding for Mistral AI to create Open Source LLMs

Week 25 of Coding with Intelligence

Rick Lamers
Jun 21, 2023

📰 News

  • Pitch memo that raised €105m for four-week-old EU LLM startup Mistral

    This memo provides an interesting glimpse into how the LLM landscape is likely to evolve. Some things that stand out to me: they’re not planning to build their own cluster, which could be a massive competitive disadvantage compared to Google (TPU pods) and OpenAI (MSFT owns the clusters, giving OpenAI a structural cost advantage).

    OpenAI has talked about open sourcing some of their non-best-in-class models. If that happens, it effectively eliminates a key hiring argument for Mistral, since Mistral too plans to keep its best model(s) proprietary.

    Content deals for proprietary training data sit uneasily with open weights. Hackers have shown they can circumvent alignment measures, which probably means all that content can easily be extracted from the open models.

    The core premise of having a leg up with EU enterprises for data privacy reasons might stand the test of time, as data is increasingly becoming the differentiating factor in model performance, as evidenced by papers like “Textbooks Are All You Need”.

    Nevertheless, great to see that all the LLM players are keeping each other on their toes!

  • Meta introduces a foundation model for speech synthesis

  • Founders previously at Meta/DeepMind and co-authors of the Chinchilla LLM found Mistral AI and raise a monster $113M seed round at a $260M valuation to take on OpenAI

  • LangChain lands support for OpenAI GPT-3.5 and GPT-4 Functions feature

📦 Repos

  • vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention

    The authors (UC Berkeley) claim “vLLM outperforms HuggingFace Transformers (HF) by up to 24x and Text Generation Inference (TGI) by up to 3.5x”. Check this out if you’re doing inference on OSS models! A minimal usage sketch follows this list.

  • OpenLLaMA 13B trained on 1T tokens of RedPajama dataset

    Given that LLaMA likely outperforms Falcon, this model might be the best OSS model available for commercial use at the moment. A key limitation, however, is that it can't be used for code generation due to whitespace handling issues.

  • ReLLM: Exact structure out of any language model completion.

    Step 1: let a model write a regex for you that matches the output you desire. Step 2: use ReLLM to force the model at prediction time to only generate allowable tokens. Super cool idea; I'm curious whether this "pre-generation filtering" approach affects performance too much. It would be good to see benchmark evals of models using ReLLM. A rough sketch of the filtering idea follows this list.

  • WizardCoder 15B open source code LLM released: achieves 57.3 pass@1 on HumanEval

    This is especially interesting because code generation has been particularly costly to do on metered proprietary APIs.

  • OpenLLM by BentoML - An open platform for operating large language models (LLMs) in production

    Hosting LLMs can be a bit tricky. BentoML has a track record of providing excellent and intuitive APIs for machine learning serving. I would check this one out.

  • pgvector - Open Source vector similarity search for Postgres

    I hear a lot of folks are using this successfully. It probably won’t get you to the same scale that dedicated vector databases like Weaviate and Pinecone can handle. A basic usage sketch follows this list.
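
To make the vLLM entry above concrete, here is a minimal sketch of its offline Python API; the model name is just an example and the sampling settings are arbitrary:

    # Minimal vLLM usage sketch; swap in whatever HF-compatible model you use.
    from vllm import LLM, SamplingParams

    llm = LLM(model="openlm-research/open_llama_13b")  # example model
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

    outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
    for out in outputs:
        print(out.outputs[0].text)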
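
For the ReLLM entry, here is a rough sketch of what regex-constrained decoding looks like in general, using Hugging Face transformers and the third-party regex package (which supports partial matches). This illustrates the filtering idea, not ReLLM's actual API; the pattern, model, and prompt are placeholders:

    # Sketch of regex-constrained decoding (illustrative only, not ReLLM's API).
    import regex  # third-party `regex` package; supports partial matching
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    PATTERN = regex.compile(r'\s*\{"name": "[A-Za-z ]+", "age": \d+\}')  # placeholder

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Return a JSON object describing the user.\n"
    input_ids = tok(prompt, return_tensors="pt").input_ids
    generated = ""

    for _ in range(64):
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]
        # Try tokens from most to least likely; keep the first one whose addition
        # is still a valid prefix of (or completes) PATTERN. Slow but illustrative.
        for token_id in torch.argsort(logits, descending=True).tolist():
            candidate = generated + tok.decode([token_id])
            if PATTERN.fullmatch(candidate, partial=True):
                generated = candidate
                input_ids = torch.cat([input_ids, torch.tensor([[token_id]])], dim=-1)
                break
        else:
            break  # no token can extend a valid prefix; give up
        if PATTERN.fullmatch(generated):  # full match reached: stop generating
            break

    print(generated)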
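
And for pgvector, the basics look roughly like this (shown via psycopg2; the schema, data, and connection string are made up):

    # Rough pgvector sketch via psycopg2; table, column, and data are made up.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
    cur = conn.cursor()

    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, embedding vector(3));")
    cur.execute("INSERT INTO docs (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');")

    # Nearest neighbours by L2 distance via the <-> operator.
    cur.execute("SELECT id FROM docs ORDER BY embedding <-> '[2,3,4]' LIMIT 5;")
    print(cur.fetchall())
    conn.commit()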

📄 Papers

  • Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

    A new gradient-based optimizer that supposedly drastically outperforms current best-in-class algorithms like AdamW and NadamW. They specifically evaluate on language model pre-training. A rough sketch of the update rule follows this list.

  • Textbooks Are All You Need - 51% on pass@1 HumanEval with just 1.3B parameters and 7B tokens (!)

    Other LLMs that score above 50% on HumanEval require at least 100x the amount of data and 10x the model size.

  • Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness

    Distributed training of LLMs is the norm, but how can one make it scale to ever more nodes? This note from Eric Zelikman et al. describes tricks for reducing the amount of communication required between nodes. A toy sketch of the core idea follows this list.
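
For the Sophia paper, here is a rough sketch of the general shape of the update as I understand it: Adam-style momentum divided by a cheap, periodically refreshed diagonal Hessian estimate, with per-coordinate clipping to bound the step. The function name, hyperparameter values, and the hessian_diag_estimate hook are placeholders, not the authors' code:

    # Rough sketch of a Sophia-style parameter update (not the authors' code).
    import torch

    def sophia_step(param, grad, m, h, step, *, lr=1e-4, beta1=0.96, beta2=0.99,
                    gamma=0.01, eps=1e-12, k=10, hessian_diag_estimate=None):
        # EMA of gradients (momentum)
        m.mul_(beta1).add_(grad, alpha=1 - beta1)
        # Refresh the cheap diagonal Hessian estimate only every k steps
        # (the paper uses Hutchinson or Gauss-Newton-Bartlett estimators).
        if step % k == 0 and hessian_diag_estimate is not None:
            h.mul_(beta2).add_(hessian_diag_estimate(param), alpha=1 - beta2)
        # Preconditioned update with per-coordinate clipping to bound the step
        update = torch.clamp(m / torch.clamp(gamma * h, min=eps), min=-1.0, max=1.0)
        param.data.add_(update, alpha=-lr)
        return m, h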
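
For the shared-randomness note, the core trick (as I read it) is that workers agree on a random seed, regenerate the same perturbation direction locally, and then only need to exchange the scalar projection of their gradient onto that direction, which can be quantized down to about one byte. A toy sketch of that idea, with made-up names and no claim to match the paper's exact protocol:

    # Toy sketch of gradient exchange via shared randomness (illustrative only).
    import numpy as np

    DIM = 1_000_000          # number of model parameters (flattened)
    SHARED_SEED = 1234       # agreed upon by all workers ahead of time

    def local_scalar(worker_grad: np.ndarray, seed: int) -> float:
        # Every worker regenerates the *same* random direction from the shared
        # seed, so only the scalar projection needs to be communicated
        # (and could be quantized to ~1 byte in the low-bandwidth setting).
        z = np.random.default_rng(seed).standard_normal(DIM)
        return float(worker_grad @ z / DIM)

    def apply_update(params: np.ndarray, scalars: list, seed: int, lr: float) -> np.ndarray:
        # Any worker can reconstruct the averaged update from the scalars alone.
        z = np.random.default_rng(seed).standard_normal(DIM)
        return params - lr * float(np.mean(scalars)) * z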

📱 Demos

  • AI Speech Classifier by ElevenLabs

    With the quality of text-to-speech approaching human levels, it's easy to get fooled. Great to see vendors of text-to-speech engines helping in the fight to discern real from fake audio clips!

📚 Resources

  • LLaMA likely outperforms Falcon, despite what the HF OpenLLM leaderboard suggests

    Good evidence that unbiased evaluations are still difficult to come by. Francis includes code to reproduce the MMLU evaluation score, which is nice to see.

  • HF OpenLLM leaderboard

    An approximate guide for navigating the open source language model landscape.


Want more? Follow me on Twitter! @ricklamers
