Mistral AI punches above the 7B weight class: a new 7B king in town
Week 40 of Coding with Intelligence
It outperforms Llama 2 13B on every benchmark we tried. It is also superior to Llama 1 34B in code, math, and reasoning.
Qwen scores best in several categories, e.g. Reasoning/EN, on the OpenCompass leaderboard. Fun fact about the name: Alibaba has recently unveiled its own version of generative AI, called Tongyi Qianwen, which means “seeking truth by asking a thousand questions.” https://opencompass.org.cn/leaderboard-llm https://github.com/QwenLM/Qwen
Looks like self-correction only works if external feedback is provided (e.g. an error trace).
Really cool result on how optimization of standard architectures can sometimes do something weird and how mining the model activations can inspire architecture improvements. From Reddit by u/Successful-Western27: "By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes. The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image. Their hypothesis is that ViTs learn to identify unimportant patches and *recycle them as temporary storage* instead of discarding. This enables efficient processing but causes issues. Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects."
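The register fix described above is easy to picture in code. This is a minimal sketch of my reading of the idea, not the authors' implementation; the sizes and the position of the registers in the sequence are illustrative assumptions.

```python
import numpy as np

# Sketch of the "register tokens" fix: append a few extra learnable
# tokens to the patch sequence so the model has dedicated scratch
# space, then drop them before producing outputs.
rng = np.random.default_rng(0)

num_patches, dim, num_registers = 196, 64, 4  # illustrative sizes

patch_tokens = rng.normal(size=(num_patches, dim))       # from the patch embedder
register_tokens = rng.normal(size=(num_registers, dim))  # learnable parameters

# Input to the transformer: patches plus registers.
x = np.concatenate([patch_tokens, register_tokens], axis=0)

# ... transformer blocks would run over all tokens here ...

# At the output, the registers are simply discarded; only patch
# tokens feed downstream heads, so no patch gets "recycled" as storage.
output_tokens = x[:num_patches]
```

Because the registers are dropped at the end, they cost a handful of extra tokens of compute but free the patch tokens to keep local information.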
Official title "Efficient Streaming Language Models with Attention Sinks" but I didn't like that much. Great explanation on Reddit r/LocalLLaMA https://www.reddit.com/r/LocalLLaMA/comments/16xzxwv/comment/k3828yu/
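As I understand the paper, the core trick is a KV-cache eviction policy: always keep the first few tokens (the "attention sinks") plus a sliding window of recent tokens. A toy sketch, with token positions standing in for cached key/value entries and sizes chosen arbitrarily:

```python
def evict(cache, n_sink=4, window=1020):
    """StreamingLLM-style eviction (sketch): keep the first n_sink
    entries (the attention sinks) plus the most recent `window`
    entries; drop everything in between."""
    if len(cache) <= n_sink + window:
        return cache
    return cache[:n_sink] + cache[-window:]

# Token positions stand in for cached key/value tensors.
cache = list(range(2000))
kept = evict(cache)  # sinks 0-3 survive alongside the 1020 newest positions
```

The surprising finding is that retaining those initial sink tokens keeps perplexity stable far beyond the training context length, whereas a plain sliding window does not.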
Ranking documents is key to better RAG performance.
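The usual pattern: retrieve a broad candidate set, then rerank it with a stronger relevance model before stuffing the top hits into the prompt. A minimal sketch; the keyword-overlap scorer is a stand-in for a real reranker such as a cross-encoder:

```python
def rerank(query, docs, score_fn, top_k=3):
    """Re-order retrieved documents by relevance before they reach
    the LLM. score_fn is a placeholder for e.g. a cross-encoder."""
    ranked = sorted(docs, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:top_k]

def keyword_overlap(query, doc):
    # Trivial illustrative scorer: shared lowercase words.
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = [
    "The weather in Paris is mild.",
    "Paris is the capital of France.",
    "Mistral released a 7B model.",
]
top = rerank("capital of France", docs, keyword_overlap, top_k=1)
```

Swapping in a learned scorer changes one function; the retrieval pipeline stays the same.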
Efficient fine-tuning was made possible by LoRA. With QA-LoRA, quantized models can be fine-tuned with even better performance. This is effectively a QLoRA alternative.
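For context, the LoRA idea these methods build on is tiny: freeze the base weight and train a low-rank update. A minimal sketch (this shows plain LoRA, not the QA-LoRA quantization scheme; shapes are illustrative):

```python
import numpy as np

# LoRA sketch: the frozen weight W is augmented with a low-rank
# update B @ A; only A and B are trained. QA-LoRA additionally keeps
# W in quantized form so the merged model stays quantized.
rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 32, 32, 4, 8
W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable, zero-init

def forward(x):
    # Zero-init of B means the adapter starts as a no-op.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
```

Training touches only `A` and `B` (2 * 32 * 4 parameters here vs. 32 * 32 for `W`), which is where the memory savings come from.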
It reminds me a bit of the ViT result where patches in the image were used as global storage by the model. Also, problems that are computationally harder (e.g. knapsack) would logically benefit from more "compute" or "circuit" steps. Although symbolic evaluation should probably be delegated to a tool (the Toolformer paper comes to mind https://arxiv.org/abs/2302.04761).
Disclaimer: I've worked on this for https://definitive.io/ Let me know what you think! It also supports Q&A on your custom data (e.g. CSV files).
By Daniel Gross
Impressive fine-tune by Alignment Lab AI, OpenChat and Open Access AI Collective.
1024×1024 → 16384×16384 without effort.
Useful building block!
Both text and image generation. Join the horde!
Adaptive Computation is the ability of a machine learning system to adjust its function and compute budget for each example.
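One common instance of this is early exiting: stop running layers once the model is confident enough, so easy examples use less compute than hard ones. A toy sketch, with made-up layers and a made-up confidence function purely for illustration:

```python
def run_with_early_exit(x, layers, confidence, threshold=0.9):
    """Adaptive computation sketch: apply layers one by one and stop
    as soon as the (assumed) confidence estimate clears the
    threshold, returning the result and the depth actually used."""
    for depth, layer in enumerate(layers, start=1):
        x = layer(x)
        if confidence(x) >= threshold:
            return x, depth  # early exit: remaining layers are skipped
    return x, len(layers)

# Toy setup: each "layer" adds 1; confidence grows with the value,
# so inputs that start closer to the target exit earlier.
layers = [lambda v: v + 1] * 10
easy_out, easy_depth = run_with_early_exit(5, layers, lambda v: v / 10)
hard_out, hard_depth = run_with_early_exit(0, layers, lambda v: v / 10)
```

The per-example budget is the point: `easy_depth` comes out smaller than `hard_depth`, which is exactly the behavior the definition above describes.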
Want more? Follow me on Twitter! @ricklamers