📰 News
Magic.dev claims to have a 100M-token context window model that can run on < 1 H100
moondream2 breaks 80 on VQAv2 with its 2024-08-26 release
A powerful and efficient VLM by a cracked engineer (vikhyatk)
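If you want to try it, here's a minimal sketch following the model card on Hugging Face (the revision pin and the encode_image/answer_question helpers are from that card; exact names may change between releases):

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

# Revision pin and helper methods follow the moondream2 model card;
# names may differ in newer releases.
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2", revision="2024-08-26", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("vikhyatk/moondream2", revision="2024-08-26")

image = Image.open("photo.jpg")  # any local image
enc_image = model.encode_image(image)
print(model.answer_question(enc_image, "Describe this image.", tokenizer))
```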
CogVideoX-5B: an open-weights text-to-video model by Zhipu AI (Alibaba-affiliated)
It's very impressive for an open model!
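The weights run through diffusers' CogVideoXPipeline; a minimal sketch (model id and defaults taken from the release examples, so verify against the current docs):

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# bf16 plus CPU offload keeps VRAM usage manageable on a single GPU.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

frames = pipe(
    prompt="A panda playing guitar in a bamboo forest",
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(frames, "panda.mp4", fps=8)
```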
Meta invites coding agents and frontier labs to their competitive programming competition
OLMoE: Open Mixture-of-Experts Language Models
Truly open source, meaning model weights, training data, code, and logs are all released. Great work by the Allen Institute for AI democratizing AI.
📦 Repos
LLM Compressor library by Neural Magic
"before today, creating quantized checkpoints required navigating a fragmented ecosystem of bespoke compression libraries such as AutoGPTQ, AutoAWQ, AutoFP8, etc. We built LLM Compressor from the ground up as a single library for applying the latest compression best practices, including GPTQ, SmoothQuant, SparseGPT, and RTN"
📄 Papers
Beyond Preferences in AI Alignment
"Instead of alignment with the preferences of a human user, developer, or humanity-writ-large, AI systems should be aligned with normative standards appropriate to their social roles, such as the role of a general-purpose assistant." interesting take on alignment.
Memory-Efficient LLM Training with Online Subspace Descent
"In this work, we provide the first convergence guarantee for arbitrary update rules of projection matrix." that's an impressive feat for low-rank training and will reduce the memory load of LLM training for many projects.
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
ContextCite: Attributing Model Generation to Context
Directly referencing sources in the context window is one of the key ways to reduce hallucinations. This paper introduces a general technique for doing that and includes working code at https://github.com/MadryLab/context-cite
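Usage per the repo's README is compact (class name and arguments are from that README; the model, context, and query below are placeholders):

```python
from context_cite import ContextCiter

context = "Attention Is All You Need introduced the Transformer architecture in 2017."
query = "When was the Transformer introduced?"

cc = ContextCiter.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0", context, query
)
print(cc.response)                                      # the generated answer
print(cc.get_attributions(as_dataframe=True, top_k=3))  # context sources ranked by influence
```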
📱 Demos
Poolside video demo from Dec '23 ($400M raised, Cursor competitor)
TokenProbe: Text Generation with Token Probabilities
Neat demonstration of token probabilities over a longer sequence.
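The underlying computation is easy to reproduce with transformers: ask generate for per-step scores and read off each generated token's probability (model choice here is arbitrary):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The quick brown fox", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,
    output_scores=True,           # keep the logits for every generated step
    return_dict_in_generate=True,
)

prompt_len = inputs["input_ids"].shape[1]
for step, logits in enumerate(out.scores):    # one logits tensor per new token
    probs = torch.softmax(logits[0], dim=-1)
    token_id = out.sequences[0, prompt_len + step]
    print(f"{tok.decode(token_id)!r}: p={probs[token_id].item():.3f}")
```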
📚 Resources
Want more? Follow me on X! @ricklamers