Mistral AI punches above the 7B weight class: a new 7B king in town
Week 40 of Coding with Intelligence
📰 News
Mistral releases Mistral 7B LLM, Apache 2.0 licensed
Mistral reports that it outperforms Llama 2 13B on every benchmark they tried and that it is also superior to LLaMA 1 34B in code, math, and reasoning.
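If you want to try it out, here's a minimal sketch using Hugging Face transformers; the mistralai/Mistral-7B-v0.1 checkpoint name and the need for a recent transformers release (>= 4.34) are my assumptions, not from the announcement:

```python
# Minimal sketch: run Mistral 7B with Hugging Face transformers.
# Assumes the "mistralai/Mistral-7B-v0.1" checkpoint and a transformers
# version that includes Mistral support (>= 4.34 at the time of writing).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```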
OpenAI announces Residency program: retrain to be an AI Researcher
📦 Repos
Qwen-14B model by Alibaba seems to outperform Llama-2-70B
It scores best on, e.g., Reasoning/EN on the OpenCompass leaderboard. Fun fact about the name: Alibaba recently unveiled its own generative AI, called Tongyi Qianwen, which means "seeking truth by asking a thousand questions." https://opencompass.org.cn/leaderboard-llm https://github.com/QwenLM/Qwen
📄 Papers
Llama 2 Long: Effective Long-Context Scaling of Foundation Models by Meta
Large Language Models Cannot Self-Correct Reasoning Yet
Looks like self-correction only works if external feedback is provided (e.g. an error trace).
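To make that concrete, here's a minimal sketch of a correction loop driven by external feedback in the form of an error trace; `call_llm` is a hypothetical stand-in for whatever model API you use:

```python
# Sketch: self-correction driven by *external* feedback (an error trace),
# the setting where correction actually helps according to the paper.
# `call_llm` is a hypothetical stand-in for your LLM API.
import traceback

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API here")

def solve_with_feedback(task: str, max_rounds: int = 3) -> str:
    prompt = f"Write Python code that solves:\n{task}\nReturn only code."
    code = call_llm(prompt)
    for _ in range(max_rounds):
        try:
            exec(code, {})          # run the candidate solution
            return code             # no error trace -> accept
        except Exception:
            trace = traceback.format_exc()
            # Feed the concrete error trace back instead of asking the model
            # to "reflect" without any new information.
            code = call_llm(f"{prompt}\n\nPrevious attempt:\n{code}\n"
                            f"It failed with:\n{trace}\nFix the code.")
    return code
```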
Vision Transformers Need Registers
Really cool result on how optimization of standard architectures can sometimes do something weird and how mining the model activations can inspire architecture improvements. From Reddit by u/Successful-Western27: "By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes. The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image. Their hypothesis is that ViTs learn to identify unimportant patches and *recycle them as temporary storage* instead of discarding. This enables efficient processing but causes issues. Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects."
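A rough sketch of the fix as I understand it (not the authors' code): append a few learnable register tokens to the patch sequence and drop them again at the output:

```python
# Sketch of the "register tokens" idea (not the authors' implementation):
# a few extra learnable tokens are appended to the patch sequence so the
# model has dedicated scratch space, and they are discarded at the output.
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    def __init__(self, encoder: nn.Module, dim: int, num_registers: int = 4):
        super().__init__()
        self.encoder = encoder  # any transformer encoder over token embeddings
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        nn.init.trunc_normal_(self.registers, std=0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches [+ cls], dim)
        b = tokens.shape[0]
        regs = self.registers.expand(b, -1, -1)
        x = torch.cat([tokens, regs], dim=1)      # add registers
        x = self.encoder(x)
        return x[:, : tokens.shape[1]]            # drop registers at the output
```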
Keeping LLMs stable when the context window overflows (StreamingLLM)
The official title is "Efficient Streaming Language Models with Attention Sinks", but I didn't like it much. Great explanation on Reddit r/LocalLLaMA: https://www.reddit.com/r/LocalLLaMA/comments/16xzxwv/comment/k3828yu/
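The core trick, in a hedged sketch rather than the official implementation: when the KV cache overflows, keep a handful of initial "attention sink" tokens plus a sliding window of recent tokens and evict the middle:

```python
# Sketch of the StreamingLLM eviction policy (not the official code):
# keep a few initial "attention sink" tokens plus the most recent window,
# and evict everything in between when the cache overflows.
import torch

def evict_kv(keys: torch.Tensor, values: torch.Tensor,
             num_sinks: int = 4, window: int = 1020):
    # keys/values: (batch, heads, seq_len, head_dim)
    seq_len = keys.shape[2]
    if seq_len <= num_sinks + window:
        return keys, values
    keep = torch.cat([
        torch.arange(num_sinks),                  # attention sink tokens
        torch.arange(seq_len - window, seq_len),  # most recent tokens
    ])
    return keys[:, :, keep], values[:, :, keep]
```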
RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models
Reranking retrieved documents is key to better RAG performance.
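For context, a minimal sketch of listwise reranking in a RAG pipeline; `call_llm` and the prompt format are hypothetical stand-ins for RankVicuna or any instruction-tuned model:

```python
# Sketch: listwise reranking of retrieved passages before they go into the
# RAG prompt. `call_llm` is a hypothetical stand-in for RankVicuna or any
# instruction-tuned LLM; the prompt format here is illustrative only.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API here")

def listwise_rerank(query: str, passages: list[str]) -> list[str]:
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        f"Query: {query}\n\nPassages:\n{numbered}\n\n"
        "Rank the passages from most to least relevant. "
        "Answer with the identifiers only, e.g. [2] > [1] > [3]."
    )
    order = call_llm(prompt)
    ranked_ids = [int(tok.strip("[]> ")) for tok in order.split(">")]
    return [passages[i - 1] for i in ranked_ids if 1 <= i <= len(passages)]
```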
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
LoRA made efficient fine-tuning possible. With QA-LoRA, quantized models can be fine-tuned with even better performance, making it effectively a QLoRA alternative.
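For reference, here's the QLoRA-style setup that QA-LoRA improves on, sketched with Hugging Face peft + bitsandbytes; this is not QA-LoRA itself, and the Mistral checkpoint is just an assumed placeholder:

```python
# QLoRA-style baseline for comparison (NOT QA-LoRA itself): LoRA adapters
# trained on top of a 4-bit quantized base model, via peft + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # assumed checkpoint, swap in your own
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```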
Think before you speak: Training Language Models With Pause Tokens
It reminds me a bit of the ViT registers result, where patches in the image were used as global storage by the model. Also, problems that are computationally harder (e.g. knapsack) would logically benefit from more "compute" or "circuit" steps, although symbolic evaluation should probably be delegated to a tool (the Toolformer paper comes to mind: https://arxiv.org/abs/2302.04761).
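A toy sketch of the training-side idea as I read it (not the paper's code): insert <pause> tokens before the answer and mask them out of the loss, so the model gets extra forward-pass steps without extra supervision:

```python
# Toy sketch of training with pause tokens (not the paper's implementation):
# insert a run of <pause> tokens before the answer and ignore them in the
# loss, so the model gets extra "thinking" steps at those positions.
IGNORE_INDEX = -100

def add_pause_tokens(prompt_ids, answer_ids, pause_id, num_pauses=8):
    input_ids = prompt_ids + [pause_id] * num_pauses + answer_ids
    labels = (
        [IGNORE_INDEX] * len(prompt_ids)     # don't train on the prompt
        + [IGNORE_INDEX] * num_pauses        # don't train on pause outputs
        + answer_ids                         # supervise only the answer
    )
    return input_ids, labels

# Usage: add "<pause>" to the tokenizer vocabulary first, then build batches
# with add_pause_tokens(...) and train with the usual causal LM loss.
```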
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
📱 Demos
Pioneer: a Data Science sidekick that runs code interactively
Disclaimer: I've worked on this for https://definitive.io/. Let me know what you think! It also supports Q&A on your custom data (e.g. CSV files).
Use GitHub Copilot locally on your MacBook
By Daniel Gross
Impressive fine-tune by Alignment Lab AI, OpenChat and Open Access AI Collective.
🛠️ Products
Upscayl: Open Source image upscaling app
1024×1024 → 16384×16384 without effort.
Cloudflare Workers AI: edge AI inference
Useful building block!
AI Horde: distributed LLM cluster of volunteers
Both text and image generation. Join the horde!
📚 Resources
Prompt tuning on distributed Petals Llama 65B model
From the https://github.com/bigscience-workshop/petals project. Here's the prompt tuning paper in case you're not familiar: https://aclanthology.org/2021.emnlp-main.243.pdf
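To illustrate the prompt tuning idea from the paper (this is not the Petals API, just a generic PyTorch sketch): a small set of trainable soft-prompt embeddings is prepended to the input embeddings of a frozen model, and only those embeddings are updated:

```python
# Sketch of prompt tuning (not the Petals API): trainable "soft prompt"
# embeddings are prepended to the input embeddings of a frozen LM, and only
# the soft prompt receives gradient updates.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, frozen_lm: nn.Module, embed_dim: int, prompt_len: int = 16):
        super().__init__()
        self.lm = frozen_lm
        for p in self.lm.parameters():
            p.requires_grad = False            # the base model stays frozen
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor):
        # input_embeds: (batch, seq_len, embed_dim); assumes an HF-style model
        # that accepts inputs_embeds.
        b = input_embeds.shape[0]
        prompt = self.soft_prompt.unsqueeze(0).expand(b, -1, -1)
        return self.lm(inputs_embeds=torch.cat([prompt, input_embeds], dim=1))
```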
Adaptive Computation GitHub Awesome list
Adaptive Computation is the ability of a machine learning system to adjust its function and compute budget for each example.
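As a toy illustration (not taken from the list): an early-exit classifier that stops running layers once an intermediate head is confident enough, so easy examples use less compute:

```python
# Toy illustration of adaptive computation: an early-exit model that stops
# running layers once an intermediate classifier is confident enough, so
# easy examples use fewer layers than hard ones.
import torch
import torch.nn as nn

class EarlyExitMLP(nn.Module):
    def __init__(self, dim: int, num_classes: int, num_layers: int = 6,
                 threshold: float = 0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_layers)]
        )
        self.heads = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_layers)]
        )
        self.threshold = threshold

    def forward(self, x: torch.Tensor):
        # x: (dim,) — a single example, so the exit decision is per-example
        logits = None
        for layer, head in zip(self.layers, self.heads):
            x = layer(x)
            logits = head(x)
            if torch.softmax(logits, dim=-1).max() >= self.threshold:
                break                          # exit early when confident
        return logits
```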
Want more? Follow me on Twitter! @ricklamers