Meta releases Llama 2: here's what we know
Week 29 of Coding with Intelligence
Their goal is to develop a standard for code transformations and code understanding, analogous to the Language Server Protocol (LSP). In other words: a standard for Copilot-style tools.
It even integrates with personal wiki software like https://www.bookstackapp.com/
Looks like Postgres is going to stick around for a while and keep improving its status as a hybrid vector/NoSQL/SQL database. It's hard to beat a database so ubiquitous that an entire industry rallies around it (and builds on top of it). It's the new Linux! Here's the blog post if you prefer that format: https://neon.tech/blog/pg-embedding-extension-for-vector-search
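To make the feature concrete: under the hood a vector-search query is just "order rows by distance to the query embedding, keep the top k" — the extension's contribution is an index (HNSW) that avoids the brute-force scan. Here's a minimal pure-Python sketch of that query semantics (cosine distance, brute force); the table data and dimensions are made up for illustration:

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def nearest(query, rows, k=2):
    """Brute-force k-NN over (id, embedding) rows — what an
    ORDER BY distance LIMIT k query computes, minus the index."""
    return sorted(rows, key=lambda r: cosine_distance(query, r[1]))[:k]

docs = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0])]
print([d[0] for d in nearest([1.0, 0.05], docs, k=2)])  # ['a', 'b']
```

An HNSW index trades a little recall for dramatically faster lookups on large tables, which is the whole pitch of pg_embedding over exact scans.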
The samples of MusicGen are mind-blowing, check them out here: https://ai.honu.io/papers/musicgen/
This shows how well the semantics of text can be translated across modalities, demonstrating the versatility of language models. This is audio, but Midjourney shows how well it works for vision.
This is the paper I’ve been waiting for! Manually selecting the few-shot examples always felt like a hack. This paper introduces a sensible approach: selecting based on a reward model. When you have many few-shot examples that vary widely, fetching the ones best suited for the current task will likely yield a significant performance increase.
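The idea can be sketched in a few lines: score every candidate example against the incoming query and keep the top-k as the prompt's few-shot block. Here the `overlap_reward` function is a toy stand-in I made up for the paper's learned reward model, and the example pool is invented:

```python
def select_few_shot(candidates, query, reward, k=2):
    """Score each candidate example against the query, keep the top-k.
    `reward` stands in for a learned reward model."""
    return sorted(candidates, key=lambda ex: reward(ex, query), reverse=True)[:k]

# Toy reward: word overlap between the example's question and the query.
def overlap_reward(example, query):
    return len(set(example["question"].split()) & set(query.split()))

pool = [
    {"question": "translate cat to french", "answer": "chat"},
    {"question": "add 2 and 3", "answer": "5"},
    {"question": "translate dog to french", "answer": "chien"},
]
shots = select_few_shot(pool, "translate bird to french", overlap_reward, k=2)
print([s["answer"] for s in shots])  # ['chat', 'chien']
```

Swapping the toy reward for a trained model is where the paper's contribution lives; the retrieval scaffolding stays this simple.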
Generic embeddings can harm the effectiveness of retrieval augmented language models. Check out this recent paper to improve your embedding strategy beyond `text-embedding-ada-002`.
You might have seen the acronym LoRA pop up; check out the original paper from Microsoft Research to understand how trainable rank decomposition matrices can reduce the total number of trainable parameters for fine-tuning. This technique has been successfully applied both in the LLM context (GPTs) and in text-to-image models like Stable Diffusion/Dreambooth. For implementations check out https://github.com/huggingface/peft
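The core trick fits in a few lines of NumPy: freeze the pretrained weight W and learn only a low-rank update B·A, so the trainable parameter count drops from d·k to r·(d+k). A minimal sketch (dimensions chosen arbitrarily for illustration):

```python
import numpy as np

d, k, r = 512, 512, 8               # full weight is d x k; adapter rank r << min(d, k)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init so the update starts at 0

def forward(x):
    # h = W x + B A x : during fine-tuning only A and B receive gradients
    return W @ x + B @ (A @ x)

full_params = d * k
lora_params = r * (d + k)
print(lora_params / full_params)     # 0.03125 -> ~3% of the parameters are trainable
```

Because B is zero-initialized, the adapted model starts out exactly equal to the pretrained one, which is part of why LoRA fine-tuning is stable.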
Aka "gzip-classifier". There's also a repo you can check out here: https://github.com/bazingagin/npc_gzip
IMO, this is a great example of why we need to keep evaluating approaches beyond the current SotA methods to grow understanding and unlock more performance. The risk/reward equation of trying "out there" new ideas does look different, but it might be the only path to meaningful improvement beyond current results. There might be some issues with the code that slightly inflate the numbers, but it seems the primary results of gzip + kNN still hold: https://news.ycombinator.com/item?id=36758433
This paper explains how to take models like LLaMA and fine-tune their quantized versions in a memory-efficient manner, e.g. on a single 48GB datacenter GPU vs. requiring 780GB of VRAM across multiple GPUs. They contribute a fine-tuned model called Guanaco that shows competitive performance with proprietary models like Bard and GPT-3.5-turbo.
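Some back-of-envelope arithmetic shows where the 780GB comes from and why quantization plus frozen weights changes the picture. Under one common accounting (fp16 weights and gradients plus fp32 Adam moment estimates — my assumption, not a figure from the paper), regular 16-bit fine-tuning costs about 12 bytes per parameter, while QLoRA keeps the base weights frozen at 4 bits:

```python
params = 65e9                          # LLaMA-65B parameter count

# Regular 16-bit fine-tuning: fp16 weights + fp16 grads + fp32 Adam m and v.
bytes_per_param_full = 2 + 2 + 4 + 4   # = 12 bytes/param
full_ft_gb = params * bytes_per_param_full / 1e9

# QLoRA: base weights frozen in 4-bit (0.5 bytes/param); only tiny adapters train.
frozen_4bit_gb = params * 0.5 / 1e9

print(full_ft_gb, frozen_4bit_gb)      # 780.0 32.5
```

The real footprint additionally includes activations, the LoRA adapters, and their optimizer states — which is how the paper lands on "fits on a single 48GB GPU" rather than exactly 32.5GB.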
Interesting read! Especially considering how important in-context learning abilities are for AI-powered products/features. Or, if you prefer the blog post: https://ai.googleblog.com/2023/07/symbol-tuning-improves-in-context.html
The extension has a great UX: it runs side by side with your regular Google searches, so if you can't find what you're looking for in the first 10 links, you can check out the pre-generated GigaBrain page without having to wait.
TL;DR it's not looking good for pgvector if you care about query performance.
Interesting take on Lucene’s position as an incumbent provider of an inverted index and the need for more flexibility in retrieval required for many Retrieval Augmented LM use cases.
This is a clever technique to use model ratings to discover areas where uncertainty is highest to guide your CoT example selection process.
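One way to sketch the uncertainty-guided selection: sample several answers per question, treat disagreement among the samples as uncertainty, and route the most uncertain questions to CoT example annotation. The sampled answers below are fabricated stand-ins for real model outputs:

```python
from collections import Counter

def disagreement(answers):
    """Uncertainty as 1 - (share of the most common answer)."""
    counts = Counter(answers)
    return 1.0 - counts.most_common(1)[0][1] / len(answers)

def pick_for_annotation(sampled, k=1):
    """sampled: {question: [answers from repeated model samples]}.
    Return the k questions where the model disagrees with itself most."""
    return sorted(sampled, key=lambda q: disagreement(sampled[q]), reverse=True)[:k]

sampled = {
    "q1": ["12", "12", "12", "12"],  # consistent -> low uncertainty
    "q2": ["7", "9", "7", "3"],      # self-disagreement -> high uncertainty
}
print(pick_for_annotation(sampled, k=1))  # ['q2']
```

Spending your hand-written CoT budget on the high-disagreement questions is exactly where a worked example is most likely to change the model's answer.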
This is an interesting watch if you want a deeper understanding of how it's possible to reduce the footprint of large models like OPT 65B and get them running on consumer GPUs like 4090s. It also covers efficient fine-tuning using LoRA paired with 4-bit frozen models. This particular case gives a 17x reduction in memory requirements for fine-tuning.
Want more? Follow me on Twitter! @ricklamers