📰 News
📦 Repos
Rift: an AI-native language server
Their goal is to develop a standard for code transformations and code understanding, much like the Language Server Protocol (LSP) did for editor features. In other words: a standard for Copilot-style tooling.
Danswer: MIT licensed document Q&A
It even integrates with personal wiki software like https://www.bookstackapp.com/
pg_embedding: a pgvector alternative
Looks like Postgres is going to stick around for a while and keep strengthening its status as a hybrid database capable of vector, NoSQL, and SQL workloads. It's hard to beat a database so ubiquitous that an entire industry rallies around it (and builds on top of it). It's the new Linux! Here's the blog post if you prefer that format: https://neon.tech/blog/pg-embedding-extension-for-vector-search
Audiocraft: conditioning audio generation with text and LMs
The samples of MusicGen are mind-blowing, check them out here: https://ai.honu.io/papers/musicgen/
This shows how well the semantics of text can be translated across modalities, demonstrating the versatility of language models. This is audio, but Midjourney shows how well it works for vision.
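If you'd rather generate your own samples, here's a minimal sketch based on the repo's documented usage at the time; the exact API may differ across Audiocraft versions, so treat it as an approximation and check the README:

```python
# Sketch of text-conditioned music generation with Audiocraft's MusicGen.
# API details may vary by Audiocraft version; based on the repo's README at the time.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("melody")   # also: "small", "medium", "large"
model.set_generation_params(duration=8)     # seconds of audio to generate

descriptions = ["lo-fi hip hop beat with warm piano", "energetic 80s synthwave"]
wavs = model.generate(descriptions)         # tensor of shape [batch, channels, samples]

for i, wav in enumerate(wavs):
    audio_write(f"sample_{i}", wav.cpu(), model.sample_rate, strategy="loudness")
```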
📄 Papers
Learning to Retrieve In-Context Examples for Large Language Models
This is the paper I’ve been waiting for! Manually selecting the few-shot examples always felt like a hack. This paper introduces a sensible approach: selecting examples based on a reward model. When you have many few-shot examples that vary widely, fetching the ones best suited for the current task will likely yield a solid performance increase.
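For intuition, here's a minimal sketch of retrieval-based example selection. Note this uses plain embedding similarity rather than the paper's reward-model-trained retriever, and the encoder name and example pool are placeholders:

```python
# Sketch: pick few-shot examples by embedding similarity to the query.
# This is NOT the paper's reward-model-trained retriever, just the simplest
# retrieval-based selection to illustrate the idea. Encoder name is an assumption.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical pool of labeled few-shot candidates.
pool = [
    {"input": "The movie was a delight.", "output": "positive"},
    {"input": "I want my money back.", "output": "negative"},
    {"input": "Service was slow but the food was great.", "output": "mixed"},
]

def select_examples(query: str, k: int = 2):
    """Return the k pool examples most similar to the query."""
    query_emb = encoder.encode(query, convert_to_tensor=True)
    pool_embs = encoder.encode([ex["input"] for ex in pool], convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_embs)[0]
    top = scores.topk(k=min(k, len(pool))).indices.tolist()
    return [pool[i] for i in top]

examples = select_examples("Terrible acting, great soundtrack.")
prompt = "\n\n".join(f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in examples)
prompt += "\n\nInput: Terrible acting, great soundtrack.\nOutput:"
print(prompt)
```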
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Generic embeddings can harm the effectiveness of retrieval-augmented language models. Check out this recent paper to improve your embedding strategy beyond `text-embedding-ada-002`.
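The paper's INSTRUCTOR model is available on Hugging Face; here's a minimal usage sketch (the instruction strings are my own illustrative choices, not prescribed by the paper):

```python
# Sketch of task-specific embeddings with INSTRUCTOR (pip install InstructorEmbedding).
# The instruction strings below are illustrative; tailor them to your own task.
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-large")

# Each input is an [instruction, text] pair; the instruction conditions the embedding.
docs = model.encode([
    ["Represent the news article for retrieval:",
     "Postgres gains a new HNSW-based vector index."],
])
query = model.encode([
    ["Represent the question for retrieving supporting articles:",
     "Which databases support vector search?"],
])
print(docs.shape, query.shape)
```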
LoRA: Low-Rank Adaptation of Large Language Models
You might have seen the acronym LoRA pop up; check out the original paper from Microsoft Research to understand how trainable rank-decomposition matrices can reduce the total number of trainable parameters for fine-tuning. This technique has been successfully applied both to LLMs (GPTs) and to text-to-image models like Stable Diffusion/DreamBooth. For implementations, check out https://github.com/huggingface/peft
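To get a feel for what that looks like in practice, here's a minimal PEFT sketch; the base model and `target_modules` are illustrative assumptions, so match them to your architecture's attention projection names:

```python
# Minimal LoRA setup with Hugging Face PEFT.
# Model name and target_modules are illustrative assumptions; adjust for your model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

config = LoraConfig(
    r=8,                                   # rank of the decomposition matrices
    lora_alpha=16,                         # scaling factor for the adapter updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```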
“Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
Aka "gzip-classifier". There's also a repo you can check out here: https://github.com/bazingagin/npc_gzip
Imo, this is a great example of why we need to keep evaluating ideas beyond the current SotA methods to grow understanding and unlock more performance. The risk/reward equation of trying "out there" new ideas does look different, but it might be the only path to meaningful improvement beyond current results. There might be some issues with the code that slightly inflate the numbers, but it seems like the primary result of gzip + kNN still holds: https://news.ycombinator.com/item?id=36758433
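For intuition, here's a toy sketch of the compression-distance + kNN idea; the tiny dataset is made up, and the repo above has the actual implementation:

```python
# Sketch of the "gzip classifier": normalized compression distance + kNN.
# Toy training data below is illustrative only.
import gzip

def clen(s: str) -> int:
    """Length of the gzip-compressed UTF-8 bytes of s."""
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings."""
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(text: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Predict the majority label among the k nearest training examples by NCD."""
    neighbors = sorted(train, key=lambda ex: ncd(text, ex[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

train = [
    ("the team won the championship game last night", "sports"),
    ("the striker scored twice in the second half", "sports"),
    ("the central bank raised interest rates again", "finance"),
    ("markets rallied after the earnings report", "finance"),
]
print(classify("the goalkeeper made a stunning save", train, k=3))
```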
QLoRA: Efficient Finetuning of Quantized LLMs
This paper explains how to take models like LLaMA and fine-tune their quantized versions in a memory-efficient manner, e.g. on a single 48GB datacenter GPU vs. requiring 780GB of VRAM across multiple GPUs. They contribute a fine-tuned model called Guanaco that shows competitive performance with proprietary models like Bard and GPT-3.5-turbo.
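If you want to try the recipe yourself, here's a rough sketch with Transformers + bitsandbytes + PEFT; the model choice and hyperparameters are illustrative, not the authors' exact Guanaco setup:

```python
# Sketch: load a base model in 4-bit (NF4) and attach LoRA adapters for fine-tuning.
# Model name and hyperparameters are illustrative assumptions, not the paper's exact recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4 quantization from the paper
    bnb_4bit_use_double_quant=True,       # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", quantization_config=bnb_config, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

lora = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora)  # train only the adapters; base weights stay frozen in 4-bit
model.print_trainable_parameters()
```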
Symbol tuning improves in-context learning in language models by Google Research
Interesting read! Especially considering how important in-context learning abilities are for AI-powered products and features. Or, if you prefer, the blog post: https://ai.googleblog.com/2023/07/symbol-tuning-improves-in-context.html
🛠️ Products
Mutable.ai: a Copilot that operates directly on your GitHub repos
GigaBrain: Reddit search powered by LLMs
The extension has great UX: it runs side by side with your regular Google searches, so if you can't find what you're looking for in the first 10 links, you can check out the pre-generated GigaBrain page without having to wait.
📚 Resources
TL;DR it's not looking good for pgvector if you care about query performance.
Hot take on Lucene, Elasticsearch, inverted indices and vector databases
Interesting take on Lucene’s position as the incumbent inverted-index provider and the greater retrieval flexibility that many Retrieval Augmented LM use cases require.
Real-life case studies of how AI tools are changing how freelancers work
Active Prompting with Chain-of-Thought for Large Language Models
This is a clever technique: use the model's own uncertainty (e.g. disagreement across sampled answers) to discover which questions it is least sure about, and let that guide your CoT example selection process.
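A rough sketch of the uncertainty-ranking step; the answer-sampling function here is a toy stand-in for real LLM calls:

```python
# Sketch of the disagreement-based selection step behind active CoT example selection.
# `fake_sample_answers` is a toy stand-in; in practice you'd sample k completions
# from your LLM at temperature > 0 and parse the final answers.
import random

def disagreement(answers: list[str]) -> float:
    """Fraction of distinct answers among the samples; higher = more uncertain."""
    return len(set(answers)) / len(answers)

def rank_by_uncertainty(questions: list[str], sample_answers, k: int = 5) -> list[str]:
    """Sort questions from most to least uncertain based on sampled answers."""
    scored = [(q, disagreement(sample_answers(q, k))) for q in questions]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [q for q, _ in scored]

def fake_sample_answers(question: str, k: int) -> list[str]:
    """Toy stand-in: pretend the model produced k (possibly conflicting) answers."""
    return [random.choice(["A", "B", "C"]) for _ in range(k)]

questions = ["What is 17 * 24?", "How many primes are below 20?", "Is 1 a prime number?"]
# The most uncertain questions are the ones worth annotating with chain-of-thought
# rationales, which then become the few-shot exemplars in the prompt.
print(rank_by_uncertainty(questions, fake_sample_answers))
```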
8-bit Methods for Efficient Deep Learning by Tim Dettmers
This is an interesting watch if you want a deeper understanding of how it's possible to reduce the footprint of large models like OPT-65B and get them to run on consumer GPUs like the 4090. It also covers efficient fine-tuning using LoRA paired with 4-bit frozen models; that particular combination gives a 17x reduction in memory requirements for fine-tuning.
Want more? Follow me on Twitter! @ricklamers