📰 News
Agentless: Demystifying LLM-based Software Engineering Agents
“Our results on the popular SWE-bench Lite benchmark show that surprisingly the simplistic Agentless is able to achieve both the highest performance (27.33%) and lowest cost ($0.34) compared with all existing open-source software agents!”
This paper opens up an interesting discussion around how we should quantify more rigorously what agentic loops actually contribute to performance, and in which cases they simply increase inference duration, inference cost and code complexity without measurably improving quality.
Kyutai demos and launches Moshi: a GPT-4o-like speech model
The demo is up, but isn't quite ready for prime time yet
Intel Shows OCI Optical I/O Chiplet Co-packaged with CPU at OFC2024, Enabling Explosive AI Scaling
📦 Repos
Apple releases 4M: Massively Multimodal Masked Modeling
Check out the video and HF demo! Developed in collaboration with EPFL.
Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
GraphRAG: graph structured RAG
The basic idea is to extract structured information from unstructured data and to use knowledge-graph querying techniques at LLM inference time to populate the context window with relevant information to answer the user's query (rough sketch below).
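For intuition, here's a minimal sketch of that retrieval step, using networkx as a stand-in graph store. The example triples and the llm_complete helper are made up, and the actual GraphRAG implementation is considerably more involved (entity extraction, community summaries, etc.):

```python
# Toy sketch: answer a query by pulling a local neighborhood out of a
# knowledge graph and pasting it into the LLM's context window.
# The graph contents and llm_complete() are hypothetical placeholders.
import networkx as nx

# Knowledge graph previously extracted from unstructured documents:
# (entity, relation, entity) triples stored as labeled edges.
kg = nx.DiGraph()
kg.add_edge("ACME Corp", "Jane Doe", relation="founded_by")
kg.add_edge("ACME Corp", "widgets", relation="produces")
kg.add_edge("Jane Doe", "Berlin", relation="based_in")

def retrieve_facts(graph: nx.DiGraph, entity: str, hops: int = 1) -> list[str]:
    """Collect all triples within `hops` of the queried entity."""
    neighborhood = nx.ego_graph(graph.to_undirected(as_view=True), entity, radius=hops)
    facts = []
    for u, v, data in graph.edges(data=True):
        if u in neighborhood and v in neighborhood:
            facts.append(f"{u} --{data['relation']}--> {v}")
    return facts

def answer(query: str, entity: str) -> str:
    # Populate the context window with graph-derived facts, then ask the LLM.
    context = "\n".join(retrieve_facts(kg, entity))
    prompt = f"Facts:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm_complete(prompt)  # hypothetical LLM call

# answer("Who founded ACME Corp?", entity="ACME Corp")
```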
📄 Papers
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
Interesting context for comparing xLSTM vs Mamba vs Transformers for vision sequence modeling tasks.
An Investigation of Incorporating Mamba for Speech Enhancement
State Space Models are finding more applications in the audio domain. Compare the noisy sample with the cleaned-up sample.
CELLO: Causal Evaluation of Large Vision-Language Models
Yann LeCun has often argued that LLMs or VLMs don’t “really” understand the world and hence fail to apply even basic physics principles. Perhaps this benchmark will help us measure to what extent scaling, architecture innovations and data quality improvements help with causal reasoning in the visual domain.
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
By evaluating on a wide range of downstream tasks, including InfiniteBench, RULER, PG-19, and Needle In A Haystack, and models including LLaMA-3-1M, Yi-200K, GLM-4-1M, Phi-3-128K, and Qwen2-128K, we demonstrate that MInference effectively reduces inference latency by up to 10x for pre-filling on an A100, while maintaining accuracy.
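As a rough mental model (not MInference's actual algorithm, which identifies per-head sparse patterns and builds the sparse indices dynamically), here's what a fixed sparse pre-fill attention pattern looks like in PyTorch. Note this dense implementation only masks scores; real kernels skip the masked blocks entirely, which is where the speedup comes from:

```python
# Dense illustration of a sparse pre-fill attention pattern:
# a causal local window plus a few "sink" tokens at the start.
# The window/sink pattern is an assumption for illustration only.
import torch

def sparse_prefill_attention(q, k, v, window: int = 128, n_sink: int = 4):
    T = q.shape[0]
    scores = (q @ k.T) / q.shape[-1] ** 0.5            # (T, T), computed densely here
    idx = torch.arange(T)
    local = (idx[:, None] - idx[None, :] < window)      # within the local window
    sink = idx[None, :] < n_sink                         # always attend to first tokens
    causal = idx[:, None] >= idx[None, :]
    mask = (local | sink) & causal
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# q = k = v = torch.randn(1024, 64); out = sparse_prefill_attention(q, k, v)
```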
“Train only task-relevant experts for LLM customization, reduces storage by up to 90% and training time by up to 30%. Customizes LLMs efficiently, nearing Full-Parameter Fine-Tuning (FFT) performance (50.2 vs 51.0), retains high performance in Math and Code tasks (39.8 vs 40.5) compared to FFT (31.5) and LoRA (28.5).”
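In other words: in a Mixture-of-Experts model, freeze everything except the experts that matter for the target task. Here's a hedged sketch of just the freezing step in PyTorch; the parameter naming scheme is an assumption, and the expert-selection criterion (the interesting part) is not reproduced here:

```python
# Sketch: disable gradients everywhere, then re-enable them only on the
# experts deemed relevant for the target task. The "experts.<idx>." naming
# convention is an assumption about the MoE module layout.
import re
import torch.nn as nn

def freeze_all_but_experts(model: nn.Module, relevant_expert_ids: set[int]) -> None:
    expert_pattern = re.compile(r"\bexperts\.(\d+)\.")  # assumed naming scheme
    for name, param in model.named_parameters():
        match = expert_pattern.search(name)
        param.requires_grad = match is not None and int(match.group(1)) in relevant_expert_ids

# freeze_all_but_experts(moe_model, relevant_expert_ids={3, 7, 12}); then fine-tune as usual
```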
Ctrl-G: Adaptable Logical Control for Large Language Models
Tight control during generation allows smaller models to reach results competitive with larger, slower, more expensive models. There's also a GitHub repo, and a UI demo of its capabilities is in the works.
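To make "tight control" concrete: the simplest form of constrained decoding masks out disallowed tokens before sampling. This is not Ctrl-G's actual method (which pairs the LLM with an HMM to enforce logical constraints), just a minimal illustration of steering generation:

```python
# Generic constrained-decoding step (NOT Ctrl-G's HMM-based approach):
# tokens that would violate the constraint are masked out before sampling.
# `allowed_token_ids` is a hypothetical hook supplied by the constraint.
import torch

def constrained_step(logits: torch.Tensor, allowed_token_ids: list[int]) -> int:
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0
    probs = torch.softmax(logits + mask, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

# next_token = constrained_step(model_logits, allowed_token_ids=ids_satisfying_constraint)
```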
📱 Demos
Florence 2 VLM running in the browser
Very cool! Which edge use cases could be built with this?
📚 Resources
Meta just dropped weights for “Better & Faster Large Language Models via Multi-token Prediction”
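The core idea: attach several output heads to a shared trunk so the model is trained to predict the next n tokens at once, with the extra heads also usable for self-speculative decoding at inference. A minimal, assumption-heavy PyTorch sketch of the head layout (if I recall correctly the paper's heads are transformer layers with a shared unembedding; plain linear heads are used here for brevity):

```python
# Minimal sketch of multi-token prediction heads: a shared trunk followed by
# n_future independent output heads predicting tokens at offsets +1..+n_future.
# Dimensions and names are assumptions for illustration.
import torch
import torch.nn as nn

class MultiTokenHeads(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, hidden: torch.Tensor) -> list[torch.Tensor]:
        # hidden: (batch, seq, d_model) from the shared transformer trunk;
        # returns one logits tensor per future-token offset.
        return [head(hidden) for head in self.heads]
```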
Gradually, then Suddenly: Upon the Threshold
Very nice opinion piece by Ethan Mollick. He outlines, in my opinion, one of the most useful frameworks for thinking about progress in generative AI: progress in capabilities can be viewed as breaking through discrete capability boundaries that, once crossed, allow an entire subclass of tasks to be delegated to AI as “completely solved.”
GPT4All: local LLMs powered by llama.cpp + integrated embeddings/local RAG
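If you'd rather poke at it from Python than the desktop app, something like this should work, assuming the gpt4all bindings (pip install gpt4all) still expose the GPT4All and Embed4All classes as documented; the model filename is just an example:

```python
# Hedged usage sketch of the gpt4all Python bindings: local generation
# (via llama.cpp) plus local embeddings for a simple RAG setup.
from gpt4all import GPT4All, Embed4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # example model file, runs locally
with model.chat_session():
    print(model.generate("Summarize what RAG is in one sentence.", max_tokens=128))

embedder = Embed4All()  # local embedding model
vector = embedder.embed("GraphRAG populates the context window with graph facts.")
print(len(vector))      # embedding dimensionality
```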
Want more? Follow me on X! @ricklamers