Sitemap - 2024 - Coding with Intelligence
Gemini 2.0: is Google finally where everyone expected it to be?
Model Context Protocol: making LLMs more useful
Open Source o1 has (almost) arrived: DeepSeek R1-Lite-Preview
Gemini Exp 1114: overfitted to benchmarks or new king?
Chinese labs drop some incredible models
Full World simulation just had its ImageNet moment
Claude Computer Use: RPA on steroids
Edge AI makes waves: Qwen 2.5 Code Interpreter in your browser
Aria MoE A3.9B: a new open source multimodal LLM
A new kind of Foundation Model: LFMs
Molmo 72B: the VLM you shouldn't sleep on
How o1 thinks: 8 full Chain-of-Thought traces
Pixtral 12B dropped: Mistral's first Vision Language Model
The 100M token context window has arrived
Simulating entire worlds using Diffusion Models: GameNGen
Native tool use: Llama 3.1 lands on Groq API
Stability AI rises from the ashes as Black Forest Labs: FLUX.1 is insane
Anthropic Circuits Updates: your best bet for understanding LLMs?
Strong Open LLMs ⇒ thriving open ecosystem
New meta: using Agents for high quality Synthetic Data
Special announcement: Llama 3 Groq Tool Use finetunes 8B and 70B
Can we patch recurrent models? TTT & read twice
When do Agentic loops add value over direct prompting?
DeepSeek-Coder-V2 dropped: GPT-4/Opus level open source coding model
Qwen 2: a story of Alibaba vs Meta
Did State Space Models find their niche? 135ms latency for text-to-speech
Solid papers on finetuning & interpretability and some great demos 📱
Vision Models are having their moment
What tradeoffs is OpenAI's GPT-4o making to achieve its speed?
Beyond vanilla Transformers: Multi-Tokens, xLSTM, KANs & more
Self-Alignment for fine-tuning takes flight with StarCoder2-Instruct
Mixtral 8x22B Instruct: the race for efficiency is on
Mistral drops 8x22B & Llama 3 imminent
Jamba: are hybrid Attention/Structured State Space models the future?
Databricks' DBRX-132B-MoE outperforms Mixtral
Are LLM-powered agents around the corner?
70B model training on consumer GPUs: FSDP + QLoRA
Did Claude 3 perfectly undercut OpenAI's pricing?
Era of 1-bit LLMs: LLaMA 3B performance with just 28% of the weights
Groq delivers Mixtral at 500+ (!) tokens per second
Did Qwen 1.5 72B just overtake closed-source Mistral-Medium & GPT-3.5 Turbo?
Early version of Mistral Medium gifted to the community?
Stanford researchers drop SGLang: outperforms vLLM with up to 5x higher throughput
Perplexity raises $73.6M & new small models: LLaVA-ϕ, a 2.7B Phi-2-based multimodal model