Dear audience, my newsletter comes to you on Saturday instead of the regular Wednesday. I blame the sunny weather in Spain ☀️🇪🇸 — I was on a short holiday to re-energize ⚡️ — but we’re back with an action-packed newsletter, so let’s dive in!
A note on the title: the overall theme of this week’s updates is that, thanks to the ever-increasing quality of open models (Llama 3.1, DeepSeek, Mi(s|x)tral), the ecosystem that builds on top of them continues to flourish. FLUTE, a novel quantization scheme, is one of my favorite examples.
📰 News
SGLang v0.2: beating vLLM and TensorRT-LLM in Llama 3 serving
However, translating the competition problems into Lean format was still done manually.
Significantly improved scores in Arena-Hard compared to vanilla Llama 3 70B Instruct.
📦 Repos
Inference engine with good feature support, including quantization and tool use.
E5-V: Universal Embeddings with Multimodal Large Language Models
Parseltongue: Chrome extension for prompt jailbreaking
Works on GPT-4o mini.
FLUTE: Flexible Lookup Table Engine for LUT-quantized LLMs
Supported directly in e.g. vLLM.
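If you’re curious what “LUT-quantized” means in practice: instead of uniform integer levels, each weight is stored as a small index into a learned lookup table (codebook) of float values, and the engine fuses the table lookup into the matmul on GPU. Below is a minimal NumPy sketch of the reference math only; the shapes, group size, and codebook values are made up for illustration, and this is not FLUTE’s fused kernel.

```python
import numpy as np

# Toy reference for lookup-table (LUT) quantization: each weight is stored as a
# small integer index into a per-group codebook of float values. An engine like
# FLUTE fuses the table lookup into the GEMM on GPU; this NumPy version only
# illustrates the memory layout and the math (illustrative values throughout).

rng = np.random.default_rng(0)

out_features, in_features = 8, 16
group_size = 8                      # weights sharing one codebook
bits = 3                            # 3-bit indices -> 8 codebook entries
n_groups = (out_features * in_features) // group_size

# Quantized storage: uint8 indices + one float codebook per group.
indices = rng.integers(0, 2**bits, size=(n_groups, group_size), dtype=np.uint8)
codebooks = rng.normal(size=(n_groups, 2**bits)).astype(np.float32)

def dequantize(indices, codebooks, shape):
    """Gather the codebook value for every index and reshape to the weight matrix."""
    w = np.take_along_axis(codebooks, indices.astype(np.int64), axis=1)
    return w.reshape(shape)

W = dequantize(indices, codebooks, (out_features, in_features))
x = rng.normal(size=(in_features,)).astype(np.float32)
y = W @ x                           # a fused kernel would never materialize W
print(y.shape)                      # (8,)
```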
Interesting project to optimize the HTML DOM representation in Markdown for LLMs to operate on. It will be interesting to see exactly how "optimal Markdown" differs from regular Markdown. Some example differences: it preserves HTML semantic tags (<header>, <footer>, <nav>, <aside>, etc.), captures image metadata (alt text, dimensions, etc.), maintains table structures and data relationships, and preserves link destinations while optimizing for token efficiency.
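The blurb above lists intended behaviors rather than an exact algorithm, so here is a rough, hypothetical sketch (using BeautifulSoup; the function names are mine, not the project’s) of what such a converter could look like. Table handling is omitted for brevity.

```python
from bs4 import BeautifulSoup, NavigableString  # pip install beautifulsoup4

SEMANTIC = {"header", "footer", "nav", "aside", "main", "article", "section"}

def render(node) -> str:
    """Recursively render a DOM node into compact, LLM-friendly Markdown-ish text."""
    if isinstance(node, NavigableString):
        return str(node)
    name = node.name
    if name in ("script", "style", "noscript"):
        return ""                                     # drop non-content noise
    inner = "".join(render(child) for child in node.children).strip()
    if name in ("h1", "h2", "h3", "h4", "h5", "h6"):
        return "\n" + "#" * int(name[1]) + " " + inner + "\n"
    if name == "a" and node.get("href"):
        return f"[{inner}]({node['href']})"           # keep link destinations
    if name == "img":
        # Capture image metadata (alt text, dimensions) instead of the pixels.
        meta = f'{node.get("alt", "")} {node.get("width", "")}x{node.get("height", "")}'
        return f"![{meta.strip()}]({node.get('src', '')})"
    if name in SEMANTIC:
        return f"\n<{name}>\n{inner}\n</{name}>\n"    # preserve semantic regions
    if name == "p":
        return "\n" + inner + "\n"
    return inner

def html_to_llm_markdown(html: str) -> str:
    return render(BeautifulSoup(html, "html.parser")).strip()

print(html_to_llm_markdown(
    '<nav><a href="/docs">Docs</a></nav><article><h1>Hello</h1>'
    '<p>See <img src="a.png" alt="diagram" width="640" height="480"> for details.</p></article>'
))
```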
Triton kernels in vLLM for AWQ quantization
Fast quantized inference with AMD GPUs.
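For context on what such a kernel has to compute: AWQ-style formats keep 4-bit weights plus a scale and zero point per group of input channels, and the kernel dequantizes them on the fly inside the matmul. The NumPy reference below only spells out that dequantization math; it is not the vLLM Triton kernel and ignores the real bit-packing layout.

```python
import numpy as np

# Reference only (not the vLLM Triton kernel, and not the exact AWQ packing):
# 4-bit weight values with a scale and zero point shared per group of input
# channels. Shapes and values here are illustrative.

rng = np.random.default_rng(0)
in_features, out_features, group_size = 64, 8, 32
n_groups = in_features // group_size

qweight = rng.integers(0, 16, size=(in_features, out_features), dtype=np.uint8)  # 4-bit values
scales = rng.uniform(0.01, 0.1, size=(n_groups, out_features)).astype(np.float32)
zeros = rng.integers(0, 16, size=(n_groups, out_features)).astype(np.float32)

def dequantize_groupwise(qweight, scales, zeros, group_size):
    """w = (q - zero) * scale, with scale/zero shared across each input-channel group."""
    g = np.arange(qweight.shape[0]) // group_size           # group id per input channel
    return (qweight.astype(np.float32) - zeros[g]) * scales[g]

W = dequantize_groupwise(qweight, scales, zeros, group_size)  # (in, out) in float32
x = rng.normal(size=(1, in_features)).astype(np.float32)
print((x @ W).shape)  # (1, 8); a fused kernel avoids materializing W
```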
📄 Papers
Reliable Reasoning Beyond Natural Language
They use LLMs to translate problems into Prolog and then rely on fast, reliable symbolic evaluation to get the answers (a rough sketch of the idea follows below). I'm very bullish on this general approach (as long-time readers of this newsletter are surely aware by now).
The bootstrapping technique of using the model itself to create further training data is really interesting.
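To make the pipeline concrete, here is a minimal, hypothetical sketch of both ideas: translate a problem to Prolog with an LLM, execute it with a symbolic engine, and keep only translations whose answers check out as extra training data. The OpenAI client, the gpt-4o-mini model name, pyswip/SWI-Prolog, the answer/1 convention, and all function names are my own illustrative choices, not the paper’s setup.

```python
import tempfile
from openai import OpenAI   # any chat-completion client would do
from pyswip import Prolog   # Python bridge to SWI-Prolog

client = OpenAI()

def translate_to_prolog(problem: str) -> str:
    """Ask the LLM to emit a Prolog program that defines answer/1 for the problem."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # arbitrary choice for this sketch
        messages=[
            {"role": "system", "content": "Translate the word problem into a Prolog "
             "program. Define a predicate answer(X) that binds X to the solution. "
             "Output only Prolog code."},
            {"role": "user", "content": problem},
        ],
    )
    return resp.choices[0].message.content

def run_prolog(program: str):
    """Consult the generated program and query answer(X) with the symbolic engine."""
    with tempfile.NamedTemporaryFile("w", suffix=".pl", delete=False) as f:
        f.write(program)
        path = f.name
    prolog = Prolog()
    prolog.consult(path)
    results = list(prolog.query("answer(X)"))
    return results[0]["X"] if results else None

def bootstrap(problems_with_answers):
    """Keep (problem, program) pairs whose execution matches the gold answer."""
    new_training_data = []
    for problem, gold in problems_with_answers:
        program = translate_to_prolog(problem)
        try:
            if run_prolog(program) == gold:
                new_training_data.append({"problem": problem, "prolog": program})
        except Exception:
            pass  # discard programs that fail to parse or run
    return new_training_data
```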
Advancing LLM Reasoning Generalists with Preference Trees
Paper on producing Eurus-70B, a model with strong reasoning performance. They use KTO and NCA for preference learning. They’ve also produced an updated 8x22B model.
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
🛠️ Products
Reranking models by Mixedbread
Models:
mxbai-rerank-xsmall-v1, mxbai-rerank-base-v1, mxbai-rerank-large-v1
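Rerankers like these are cross-encoders: they score each (query, document) pair jointly and are typically used to reorder the top hits from a cheaper first-stage retriever. A minimal sketch, assuming the models load as standard cross-encoders via sentence-transformers (check Mixedbread’s model cards for exact usage):

```python
from sentence_transformers import CrossEncoder

# Assumption: the mxbai-rerank models work as standard cross-encoders.
model = CrossEncoder("mixedbread-ai/mxbai-rerank-base-v1")

query = "How do I quantize an LLM for faster inference?"
candidates = [  # e.g. the top hits from a BM25 or embedding search
    "FLUTE is a lookup-table engine for LUT-quantized LLMs.",
    "The weather in Spain was sunny last week.",
    "AWQ stores 4-bit weights with per-group scales and zero points.",
]

# The cross-encoder scores each (query, document) pair jointly.
scores = model.predict([(query, doc) for doc in candidates])
for score, doc in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.3f}  {doc}")
```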
📚 Resources
Llama 3.1 405B coding performance in Aider
Better than Mistral Large 2, behind DeepSeek Coder V2 and Sonnet 3.5. On a related note, some suggest Llama 3.1's coding ability might have been hurt by (incorrect) DPO preference tuning.
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
It's all in the name. By the University of Washington, Stanford, the University of Texas at Austin, UC Berkeley & Salesforce Research.
Faith and Fate: Transformers as fuzzy pattern matchers
“The paper's key idea is that while LLMs may sound like they are following a systematic procedure, they are mainly matching parts of a problem against items in their training data, and stitching together results.” @alexisgallagher
Planning for Agents by Harrison Chase
He also gave a short talk about the topic.
Want more? Follow me on X! @ricklamers