Qwen 2: a story of Alibaba vs Meta

Week 24 of Coding with Intelligence

Rick Lamers

Jun 12, 2024

📰 News

Mistral launches fine-tuning as a service
Shows LoRA fine-tunes getting close to full fine-tuning performance.
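Why LoRA is so attractive for a hosted service: you only train two small low-rank factors per layer, while the pretrained weight stays frozen. Below is a minimal pure-Python sketch of the generic LoRA forward pass, y = W x + (alpha / r) * B (A x). It is not Mistral's implementation; the function names and the 4096/rank-8 sizes are illustrative assumptions.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """LoRA forward pass: y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; only the low-rank
    factors A (r x d_in) and B (d_out x r) would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning vs. a rank-r LoRA adapter."""
    return d_out * d_in, r * (d_in + d_out)


# Illustrative sizes (assumed, not Mistral's settings): one 4096x4096 projection, rank 8.
full, lora = param_counts(4096, 4096, 8)
print(full, lora)  # 16777216 65536

# With B initialized to zeros, the adapted layer starts out identical to the frozen one.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, 0.5]]
B = [[0.0], [0.0]]
print(lora_forward(W, A, B, [1.0, 1.0], alpha=2, r=1))  # [3.0, 7.0]
```

For that single layer the adapter is 256x smaller than the full weight, which is what makes storing and serving many customer fine-tunes cheap.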
Qwen 2 released: 128k context, beats Llama 3 70B
Beats Llama 3 70B on nearly all benchmarks and, most importantly, on context-window size (128k vs. Llama 3's 8k).
Dragonfly: A large vision-language model with multi-resolution zoom
Includes Llama 3-licensed models on Hugging Face.
$1,000,000 ARC Prize by François Chollet
His ARC benchmark is so hard that LLMs can't crack it. Two questions come to mind: 1) is ARC representative of what we can describe as 'general intelligence'? 2) will meaningful progress be made before the contest's deadline of November 10, 2024?
Nomic releases vision embedding models
Available on Hugging Face

📦 Repos

Inspectus: visualize attention activations
smol-vision: recipes for shrinking, optimizing, customizing cutting edge vision models
By 🤗 employee Merve Noyan.
Google releases Mesop, a Streamlit/Gradio alternative
Check out this pretty neat demo page.
Inspect: An open-source framework for large language model evaluations
As Hamel Husain said: "VSCode Plugins w/Viz & UI, Composability & Devex, Made by JJ Allaire (cracked eng w/a track record)"

📄 Papers

GrootVL: Tree Topology is All You Need in State Space Model
A different style of State Space Model, built on tree topologies.
On the Effects of Data Scale on Computer Control Agents
By Google DeepMind
Scalable MatMul-free Language Modeling
They show comparable performance on the ARC-Easy, ARC-Challenge, HellaSwag, OpenBookQA, PIQA, and WinoGrande benchmarks and verify energy efficiency on an FPGA. Very interesting work.
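The core trick behind "MatMul-free" dense layers is using ternary weights in {-1, 0, +1}, so a matrix-vector product collapses into additions and subtractions. Here's a minimal sketch, assuming BitNet b1.58-style absmean quantization with one shared scale per matrix; it's a toy illustration of the idea, not the paper's fused kernels or its full architecture.

```python
def ternarize(W, eps=1e-8):
    """Quantize weights to {-1, 0, +1} with a shared absmean scale (BitNet b1.58-style)."""
    n = sum(len(row) for row in W)
    scale = sum(abs(w) for row in W for w in row) / n + eps
    Q = [[max(-1, min(1, round(w / scale))) for w in row] for row in W]
    return Q, scale


def ternary_matvec(Q, scale, x):
    """'MatMul-free' matrix-vector product: the inner loop only adds or subtracts."""
    out = []
    for row in Q:
        acc = 0.0
        for q, xi in zip(row, x):
            if q == 1:
                acc += xi
            elif q == -1:
                acc -= xi
            # q == 0 contributes nothing, so it is skipped entirely
        out.append(acc * scale)  # a single rescale per output element
    return out


W = [[0.9, -1.1, 0.05], [-0.8, 0.02, 1.2]]
Q, scale = ternarize(W)
print(Q)  # [[1, -1, 0], [-1, 0, 1]]
print(ternary_matvec(Q, scale, [1.0, 2.0, 3.0]))
```

Zero weights skip work entirely and the rest are sign flips, which is exactly the kind of arithmetic that maps well onto cheap FPGA logic.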
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model
Ask visual reasoning questions like "What is the height of 1️⃣ and 2️⃣, respectively?" and get answers like "1️⃣ is 1.63 meters in height, and 2️⃣ is 2.34 meters in height." It cleverly uses image segmentation to identify objects in the image and lets you query them in relative terms. The approach should generalize to UIs, not just real-world scenes.
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
A clever application of reward models to improve text-to-image (T2I) models; it's on par with the SOTA Stable Diffusion 3 8B in user preference studies.
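The idea behind reward-based noise optimization: keep the one-step generator frozen and instead run gradient ascent on its *initial noise* so the reward model scores the output higher, with a regularizer to keep the noise plausible. The sketch below is a toy under heavy assumptions: the linear `generator`, the distance-based `reward`, the plain L2 penalty (standing in for the paper's noise regularization) and finite-difference gradients (standing in for backprop) are all hypothetical.

```python
import random


def generator(z):
    """Hypothetical toy 'one-step generator' mapping a noise vector to an output."""
    return [2.0 * zi for zi in z]


def reward(x, target):
    """Hypothetical toy reward model: higher when the output is closer to `target`."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def objective(z, target, lam):
    """Reward of the generated sample minus an L2 penalty that keeps z near the prior."""
    return reward(generator(z), target) - lam * sum(zi * zi for zi in z)


def numerical_grad(f, z, h=1e-5):
    """Central finite differences; a stand-in for backprop through the generator."""
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        g.append((f(zp) - f(zm)) / (2 * h))
    return g


def optimize_noise(z0, target, lam=0.1, lr=0.05, steps=200):
    """Gradient ascent on the initial noise; the generator itself is never updated."""
    def f(v):
        return objective(v, target, lam)

    z = list(z0)
    for _ in range(steps):
        g = numerical_grad(f, z)
        z = [zi + lr * gi for zi, gi in zip(z, g)]
    return z


random.seed(0)
target = [1.0, -0.5, 0.25]
z0 = [random.gauss(0.0, 1.0) for _ in target]
z_opt = optimize_noise(z0, target)
print(reward(generator(z0), target), reward(generator(z_opt), target))
```

Note that no model weights change: all the "improvement" lives in the choice of noise, which is why it slots onto existing one-step models.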
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
The big contribution is allowing reparameterization without requiring retraining. Perplexity stays impressively close to the original FP16 model. Could offer better performance than quantization-based approaches.
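To see why "multiplication-less" is possible at all: if a weight is approximated as a sum of signed powers of two, multiplying an integer activation by it becomes a few bit shifts plus additions. The paper's reparameterization is more involved than this; the greedy `pow2_terms` decomposition and `shift_add_mul` helper below are hypothetical, a sketch of the shift-and-add idea only.

```python
import math


def pow2_terms(w, k=2):
    """Greedily approximate w as sum(s * 2**e) with k signed powers of two."""
    terms = []
    residual = w
    for _ in range(k):
        if residual == 0:
            break
        e = round(math.log2(abs(residual)))  # nearest exponent in log space
        s = 1 if residual > 0 else -1
        terms.append((s, e))
        residual -= s * 2.0 ** e
    return terms


def shift_add_mul(x, terms):
    """Multiply an integer activation x by the approximated weight with shifts and adds only."""
    acc = 0
    for s, e in terms:
        shifted = x << e if e >= 0 else x >> -e  # a bit shift replaces the multiply
        acc += shifted if s > 0 else -shifted
    return acc


terms = pow2_terms(0.75)
print(terms)                     # [(1, 0), (-1, -2)]
print(shift_add_mul(64, terms))  # 48, i.e. 0.75 * 64
```

More terms (larger `k`) trade extra adds for a closer approximation of the original weight.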
pOps: Photo-Inspired Diffusion Operators
Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference
Exploring the tradeoffs when compiling models for constrained targets using a systematic approach. What's not to like? Oh yeah, we can't use it. But hey, at least we can learn from it!
DreamMat: High-quality PBR Material Generation
Impressive quality, especially on complex examples.

📱 Demos

Whisper running in the browser ⚡️

🛠️ Products

Apple Intelligence in 5 minutes
AI explained for ordinary consumers
KWAI, a Sora alternative, got released
Impressive visual fidelity, but it still feels like animated stills rather than full-blown motion.

📚 Resources

Building AI products by Benedict Evans
A pragmatic view on building useful products with imperfect generative AI models.

Want more? Follow me on X! @ricklamers
