Sitemap - 2024 - Coding with Intelligence
Gemini 2.0: is Google finally where everyone expected it to be?
Model Context Protocol: making LLMs more useful
Open Source o1 has (almost) arrived: DeepSeek R1-Lite-Preview
Gemini Exp 1114: overfitted to benchmarks or new king?
Chinese labs drop some incredible models
Full World simulation just had its ImageNet moment
Claude Computer Use: RPA on steroids
Edge AI makes waves: Qwen 2.5 Code Interpreter in your browser
Aria MoE A3.9B: a new open source multimodal LLM
A new kind of Foundation Model: LFMs
Molmo 72B: the VLM you shouldn't sleep on
How o1 thinks: 8 full Chain-of-Thought traces
Pixtral 12B dropped: Mistral's first Vision Language Model
The 100M token context window has arrived
Simulating entire worlds using Diffusion Models: GameNGen
Native tool use: Llama 3.1 lands on Groq API
Stability AI rises from the ashes as Black Forest Labs: FLUX.1 is insane
Anthropic Circuits Updates: your best bet for understanding LLMs?
Strong Open LLMs ⇒ thriving open ecosystem
New meta: using Agents for high quality Synthetic Data
Special announcement: Llama 3 Groq Tool Use finetunes 8B and 70B
Can we patch recurrent models? TTT & read twice
When do Agentic loops add value over direct prompting?
DeepSeek-Coder-V2 dropped: GPT-4/Opus level open source coding model
Qwen 2: a story of Alibaba vs Meta
Did State Space Models find their niche? 135ms latency for text-to-speech
Solid papers on finetuning & interpretability and some great demos 📱
Vision Models are having their moment
What tradeoffs is OpenAI's GPT-4o making to achieve its speed?
Beyond vanilla Transformers: Multi-Tokens, xLSTM, KANs & more
Self-Alignment for fine-tuning takes flight with StarCoder2-Instruct
Mixtral 8x22B Instruct: the race for efficiency is on
Mistral drops 8x22B & Llama 3 imminent
Jamba: are hybrid Attention/Structured State Space models the future?
Databricks' DBRX-132B-MoE outperforms Mixtral
Are LLM-powered agents around the corner?
70B model training on consumer GPUs: FSDP + QLoRA
Did Claude 3 perfectly undercut OpenAI's pricing?
Era of 1-bit LLMs: LLaMA 3B performance with just 28% of the weights
Groq delivers Mixtral at 500+ (!) tokens per second
Did Qwen 1.5 72B just overtake closed-source Mistral-Medium & GPT-3.5 Turbo?
Early version of Mistral Medium gifted to the community?
Stanford researchers drop SGLang: outperforms vLLM with up to 5x higher throughput
Perplexity raises $73.6M & new small models: LLaVA-ϕ, a 2.7B Phi-2-based multimodal model