Mistral AI punches above the 7B weight class: a new 7B king in town
Week 40 of Coding with Intelligence
📰 News
Mistral releases Mistral 7B LLM, Apache 2.0 licensed
Mistral reports that it outperforms Llama 2 13B on every benchmark they tried and that it is also superior to LLaMA 1 34B in code, math, and reasoning.
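If you want to try it out, here's a minimal sketch using Hugging Face transformers; the mistralai/Mistral-7B-v0.1 checkpoint name and the need for a recent transformers release (>= 4.34) are my assumptions, not from the announcement:

```python
# Minimal sketch: run Mistral 7B with Hugging Face transformers.
# Assumes the "mistralai/Mistral-7B-v0.1" checkpoint and a transformers
# version that includes Mistral support (>= 4.34 at the time of writing).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```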
OpenAI announces Residency program: retrain to be an AI Researcher
📦 Repos
Qwen-14B model by Alibaba seems to outperform Llama-2-70B
It scores best on, e.g., Reasoning/EN on the OpenCompass leaderboard. Fun fact about the name: Alibaba recently unveiled its own generative AI, called Tongyi Qianwen, which means "seeking truth by asking a thousand questions." https://opencompass.org.cn/leaderboard-llm https://github.com/QwenLM/Qwen
📄 Papers
Llama 2 Long: Effective Long-Context Scaling of Foundation Models by Meta
Large Language Models Cannot Self-Correct Reasoning Yet
Looks like self-correction only works if external feedback is provided (e.g. an error trace).
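To make that concrete, here's a minimal sketch of a correction loop driven by external feedback in the form of an error trace; `call_llm` is a hypothetical stand-in for whatever model API you use:

```python
# Sketch: self-correction driven by *external* feedback (an error trace),
# the setting where correction actually helps according to the paper.
# `call_llm` is a hypothetical stand-in for your LLM API.
import traceback

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API here")

def solve_with_feedback(task: str, max_rounds: int = 3) -> str:
    prompt = f"Write Python code that solves:\n{task}\nReturn only code."
    code = call_llm(prompt)
    for _ in range(max_rounds):
        try:
            exec(code, {})          # run the candidate solution
            return code             # no error trace -> accept
        except Exception:
            trace = traceback.format_exc()
            # Feed the concrete error trace back instead of asking the model
            # to "reflect" without any new information.
            code = call_llm(f"{prompt}\n\nPrevious attempt:\n{code}\n"
                            f"It failed with:\n{trace}\nFix the code.")
    return code
```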
Vision Transformers Need Registers
Really cool result on how optimization of standard architectures can sometimes do something weird and how mining the model activations can inspire architecture improvements. From Reddit by u/Successful-Western27: "By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes. The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image. Their hypothesis is that ViTs learn to identify unimportant patches and *recycle them as temporary storage* instead of discarding. This enables efficient processing but causes issues. Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects."
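A rough sketch of the fix as I understand it (not the authors' code): append a few learnable register tokens to the patch sequence and drop them again at the output:

```python
# Sketch of the "register tokens" idea (not the authors' implementation):
# a few extra learnable tokens are appended to the patch sequence so the
# model has dedicated scratch space, and they are discarded at the output.
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    def __init__(self, encoder: nn.Module, dim: int, num_registers: int = 4):
        super().__init__()
        self.encoder = encoder  # any transformer encoder over token embeddings
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        nn.init.trunc_normal_(self.registers, std=0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches [+ cls], dim)
        b = tokens.shape[0]
        regs = self.registers.expand(b, -1, -1)
        x = torch.cat([tokens, regs], dim=1)      # add registers
        x = self.encoder(x)
        return x[:, : tokens.shape[1]]            # drop registers at the output
```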
Keeping LLMs stable when the context window overflows (StreamingLLM)
The official title is "Efficient Streaming Language Models with Attention Sinks", but I didn't like it much. Great explanation on Reddit r/LocalLLaMA: https://www.reddit.com/r/LocalLLaMA/comments/16xzxwv/comment/k3828yu/
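The core trick, in a hedged sketch rather than the official implementation: when the KV cache overflows, keep a handful of initial "attention sink" tokens plus a sliding window of recent tokens and evict the middle:

```python
# Sketch of the StreamingLLM eviction policy (not the official code):
# keep a few initial "attention sink" tokens plus the most recent window,
# and evict everything in between when the cache overflows.
import torch

def evict_kv(keys: torch.Tensor, values: torch.Tensor,
             num_sinks: int = 4, window: int = 1020):
    # keys/values: (batch, heads, seq_len, head_dim)
    seq_len = keys.shape[2]
    if seq_len <= num_sinks + window:
        return keys, values
    keep = torch.cat([
        torch.arange(num_sinks),                  # attention sink tokens
        torch.arange(seq_len - window, seq_len),  # most recent tokens
    ])
    return keys[:, :, keep], values[:, :, keep]
```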
RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models
Reranking retrieved documents is key to better RAG performance.
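For context, a minimal sketch of listwise reranking in a RAG pipeline; `call_llm` and the prompt format are hypothetical stand-ins for RankVicuna or any instruction-tuned model:

```python
# Sketch: listwise reranking of retrieved passages before they go into the
# RAG prompt. `call_llm` is a hypothetical stand-in for RankVicuna or any
# instruction-tuned LLM; the prompt format here is illustrative only.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API here")

def listwise_rerank(query: str, passages: list[str]) -> list[str]:
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        f"Query: {query}\n\nPassages:\n{numbered}\n\n"
        "Rank the passages from most to least relevant. "
        "Answer with the identifiers only, e.g. [2] > [1] > [3]."
    )
    order = call_llm(prompt)
    ranked_ids = [int(tok.strip("[]> ")) for tok in order.split(">")]
    return [passages[i - 1] for i in ranked_ids if 1 <= i <= len(passages)]
```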
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
LoRA made efficient fine-tuning possible. With QA-LoRA, quantized models can be fine-tuned with even better performance, making it effectively a QLoRA alternative.
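For reference, here's the QLoRA-style setup that QA-LoRA improves on, sketched with Hugging Face peft + bitsandbytes; this is not QA-LoRA itself, and the Mistral checkpoint is just an assumed placeholder:

```python
# QLoRA-style baseline for comparison (NOT QA-LoRA itself): LoRA adapters
# trained on top of a 4-bit quantized base model, via peft + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # assumed checkpoint, swap in your own
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```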
Think before you speak: Training Language Models With Pause Tokens
It reminds me a bit of the ViT registers result, where patches in the image were used as global storage by the model. Also, problems that are computationally harder (e.g. knapsack) would logically benefit from more "compute" or "circuit" steps, although symbolic evaluation should probably be delegated to a tool (the Toolformer paper comes to mind: https://arxiv.org/abs/2302.04761).
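A toy sketch of the training-side idea as I read it (not the paper's code): insert <pause> tokens before the answer and mask them out of the loss, so the model gets extra forward-pass steps without extra supervision:

```python
# Toy sketch of training with pause tokens (not the paper's implementation):
# insert a run of <pause> tokens before the answer and ignore them in the
# loss, so the model gets extra "thinking" steps at those positions.
IGNORE_INDEX = -100

def add_pause_tokens(prompt_ids, answer_ids, pause_id, num_pauses=8):
    input_ids = prompt_ids + [pause_id] * num_pauses + answer_ids
    labels = (
        [IGNORE_INDEX] * len(prompt_ids)     # don't train on the prompt
        + [IGNORE_INDEX] * num_pauses        # don't train on pause outputs
        + answer_ids                         # supervise only the answer
    )
    return input_ids, labels

# Usage: add "<pause>" to the tokenizer vocabulary first, then build batches
# with add_pause_tokens(...) and train with the usual causal LM loss.
```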
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
📱 Demos
Pioneer: a Data Science sidekick that runs code interactively
Disclaimer: I've worked on this for https://definitive.io/. Let me know what you think! It also supports Q&A on your custom data (e.g. CSV files).
Use GitHub Copilot locally on your MacBook
By Daniel Gross
Impressive fine-tune by Alignment Lab AI, OpenChat and Open Access AI Collective.
🛠️ Products
Upscayl: Open Source image upscaling app
1024×1024 → 16384×16384 without effort.
Cloudflare Workers AI: edge AI inference
Useful building block!
AI Horde: distributed LLM cluster of volunteers
Both text and image generation. Join the horde!
📚 Resources
Prompt tuning on distributed Petals Llama 65B model
From the https://github.com/bigscience-workshop/petals project. Here's the prompt tuning paper in case you're not familiar: https://aclanthology.org/2021.emnlp-main.243.pdf
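To illustrate the prompt tuning idea from the paper (this is not the Petals API, just a generic PyTorch sketch): a small set of trainable soft-prompt embeddings is prepended to the input embeddings of a frozen model, and only those embeddings are updated:

```python
# Sketch of prompt tuning (not the Petals API): trainable "soft prompt"
# embeddings are prepended to the input embeddings of a frozen LM, and only
# the soft prompt receives gradient updates.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, frozen_lm: nn.Module, embed_dim: int, prompt_len: int = 16):
        super().__init__()
        self.lm = frozen_lm
        for p in self.lm.parameters():
            p.requires_grad = False            # the base model stays frozen
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor):
        # input_embeds: (batch, seq_len, embed_dim); assumes an HF-style model
        # that accepts inputs_embeds.
        b = input_embeds.shape[0]
        prompt = self.soft_prompt.unsqueeze(0).expand(b, -1, -1)
        return self.lm(inputs_embeds=torch.cat([prompt, input_embeds], dim=1))
```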
Adaptive Computation GitHub Awesome list
Adaptive Computation is the ability of a machine learning system to adjust its function and compute budget for each example.
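As a toy illustration (not taken from the list): an early-exit classifier that stops running layers once an intermediate head is confident enough, so easy examples use less compute:

```python
# Toy illustration of adaptive computation: an early-exit model that stops
# running layers once an intermediate classifier is confident enough, so
# easy examples use fewer layers than hard ones.
import torch
import torch.nn as nn

class EarlyExitMLP(nn.Module):
    def __init__(self, dim: int, num_classes: int, num_layers: int = 6,
                 threshold: float = 0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_layers)]
        )
        self.heads = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_layers)]
        )
        self.threshold = threshold

    def forward(self, x: torch.Tensor):
        # x: (dim,) — a single example, so the exit decision is per-example
        logits = None
        for layer, head in zip(self.layers, self.heads):
            x = layer(x)
            logits = head(x)
            if torch.softmax(logits, dim=-1).max() >= self.threshold:
                break                          # exit early when confident
        return logits
```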
Want more? Follow me on Twitter! @ricklamers