Coding with Intelligence

Meta releases Llama 2: here's what we know

Week 29 of Coding with Intelligence

Rick Lamers
Jul 19, 2023

📰 News

  • Meta releases Llama 2: a deep dive

  • Elon Musk starts new AI division at X Corp: xAI

  • FlashAttention v2 released by author Tri Dao, 2X speedup

📦 Repos

  • Rift: an AI-native language server

    Their goal is to develop a standard for code transformations and code understanding, analogous to the Language Server Protocol (LSP). In other words: a standard for Copilot-style tools.

  • Danswer: MIT licensed document Q&A

    It even integrates with personal wiki software like BookStack (https://www.bookstackapp.com/).

  • pg_embedding: a pgvector alternative

    Looks like Postgres is going to stick around for a while and keep strengthening its status as a hybrid vector/NoSQL/SQL database. Hard to beat a database that's so ubiquitous an entire industry rallies around it (and builds on top of it). It's the new Linux! Here's the blog post if you prefer that: https://neon.tech/blog/pg-embedding-extension-for-vector-search (a usage sketch follows after this list).

  • Audiocraft: conditioning audio generation with text and LMs

    The samples of MusicGen are mind-blowing, check them out here: https://ai.honu.io/papers/musicgen/

    This shows how well the semantics of text can be translated across modalities, demonstrating the versatility of language models. This is audio, but Midjourney shows how well it works for vision. A minimal generation snippet follows after this list.
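To make the pg_embedding item concrete, here's a minimal sketch of vector search in Postgres through the extension, driven from Python via psycopg2. The SQL mirrors Neon's announcement post; the connection string and table are hypothetical, so double-check the exact syntax against their docs.

```python
# Vector search with pg_embedding, driven from Python.
# SQL mirrors Neon's announcement post; names and DSN are hypothetical.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost/db")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS embedding;")
cur.execute("CREATE TABLE documents (id INTEGER, embedding REAL[]);")
cur.execute("INSERT INTO documents (id, embedding) VALUES (1, '{0.1, 0.2, 0.3}');")

# HNSW index for approximate nearest-neighbor search.
cur.execute("CREATE INDEX ON documents USING hnsw (embedding) WITH (dims=3, m=8);")

# <-> is the extension's distance operator: fetch the row closest to the query vector.
cur.execute("SELECT id FROM documents ORDER BY embedding <-> ARRAY[0.1, 0.2, 0.25] LIMIT 1;")
print(cur.fetchone())
conn.commit()
```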
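Audiocraft's MusicGen API is similarly compact. This follows the example in the repo's README at the time of writing; verify against the repo before running.

```python
# Text-conditioned music generation with Audiocraft's MusicGen
# (mirrors the repo README at the time of writing).
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('melody')   # also: 'small', 'medium', 'large'
model.set_generation_params(duration=8)     # seconds of audio to generate

descriptions = ['an upbeat 80s synth-pop track with a driving bassline']
wav = model.generate(descriptions)          # returns a batch of waveforms

for idx, one_wav in enumerate(wav):
    # Writes {idx}.wav with loudness normalization.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")
```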

📄 Papers

  • Learning to Retrieve In-Context Examples for Large Language Models

    This is the paper I’ve been waiting for! Manually selecting the few-shot examples always felt like a hack. This paper introduces a sensible approach: selecting examples based on a reward model. When you have many few-shot examples that vary widely, fetching the ones best suited to the current task will likely yield a significant performance increase. A baseline retrieval sketch follows after this list.

  • One Embedder, Any Task: Instruction-Finetuned Text Embeddings

    Generic embeddings can harm the effectiveness of retrieval-augmented language models. Check out this recent paper to improve your embedding strategy beyond `text-embedding-ada-002`. A short usage sketch follows after this list.

  • LoRA: Low-Rank Adaptation of Large Language Models

    You might have seen the acronym LoRA pop up; check out the original paper from Microsoft Research to understand how trainable rank-decomposition matrices can reduce the total number of trainable parameters for fine-tuning. This technique has been successfully applied both in the LLM context (GPTs) and in text-to-image models like Stable Diffusion/DreamBooth. For implementations check out https://github.com/huggingface/peft (a minimal usage sketch follows after this list).

  • “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors

    Aka "gzip-classifier". There's also a repo you can check out here: https://github.com/bazingagin/npc_gzip

    IMO, this is a great example of why we need to keep evaluating ideas beyond the current SotA methods to grow understanding and unlock more performance. The risk/reward equation of trying "out there" new ideas does look different, but it might be the only path to truly improve meaningfully beyond current results. There might be some issues with the code used that slightly inflate the numbers, but it seems like the primary result of gzip + kNN still holds: https://news.ycombinator.com/item?id=36758433. The core method fits in a few lines of Python; see the sketch after this list.

  • QLoRA: Efficient Finetuning of Quantized LLMs

    This paper explains how to take models like LLaMA and fine-tune their quantized versions in a memory-efficient manner, e.g. on a single 48GB datacenter GPU vs. requiring 780GB of VRAM across multiple GPUs. They contribute a fine-tuned model called Guanaco that shows competitive performance with proprietary models like Bard and GPT-3.5-turbo. A 4-bit setup sketch follows after this list.

  • Symbol tuning improves in-context learning in language models by Google Research

    Interesting read! Especially considering how important in-context learning abilities are for AI-powered products/features. Or, if you prefer, the blog post: https://ai.googleblog.com/2023/07/symbol-tuning-improves-in-context.html
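A few code sketches for the papers above. First, in-context example retrieval: the paper trains the retriever against an LLM-based reward model, but the underlying mechanism, embedding your example pool and fetching the nearest neighbors per query, can be sketched with off-the-shelf embeddings (the encoder and example pool below are placeholders):

```python
# Baseline for retrieval-based few-shot selection: embed the example pool,
# fetch the nearest examples per query. (The paper goes further and trains
# the retriever against an LLM-based reward model.)
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

example_pool = [  # hypothetical labeled few-shot examples
    "Q: 12 * 7? A: 84",
    "Q: capital of France? A: Paris",
    "Q: 9 + 33? A: 42",
]
pool_emb = encoder.encode(example_pool, convert_to_tensor=True)

def select_examples(query: str, k: int = 2) -> list[str]:
    q_emb = encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, pool_emb, top_k=k)[0]
    return [example_pool[h["corpus_id"]] for h in hits]

print(select_examples("Q: 15 * 3?"))  # math examples should rank highest
```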
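Second, instruction-finetuned embeddings: the paper's INSTRUCTOR model takes an [instruction, text] pair per input, so one encoder can serve retrieval, classification, clustering, and more. This mirrors the authors' published usage; verify against their repo before relying on it.

```python
# Instruction-conditioned embeddings with INSTRUCTOR (from the paper above).
from InstructorEmbedding import INSTRUCTOR
import numpy as np

model = INSTRUCTOR('hkunlp/instructor-large')

# Each input is an [instruction, text] pair; the instruction states the task.
docs = model.encode([
    ['Represent the science document for retrieval:',
     'LLM weights can be quantized to 4 bits.'],
])
query = model.encode([
    ['Represent the science question for retrieving supporting documents:',
     'How small can model weights get?'],
])

# Dot-product similarity (swap in cosine similarity if embeddings aren't normalized).
print(np.dot(query, docs.T))
```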
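Third, LoRA in practice: freeze the pretrained weight matrix W and learn a low-rank update BA, so only the small A and B matrices are trained. With the PEFT library linked above that's a few lines (base model and target modules are illustrative):

```python
# LoRA with HuggingFace PEFT: freeze base weights, train low-rank adapters.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # illustrative model

config = LoraConfig(
    r=8,                                  # rank of the decomposition (W + B @ A)
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of parameters
```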
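Fourth, the gzip classifier is small enough to restate in full: compute the Normalized Compression Distance between the input and every training example, then take a kNN vote. A self-contained toy version:

```python
# "gzip classifier": Normalized Compression Distance + kNN (per the paper above).
import gzip

def ncd(x: str, y: str) -> float:
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

train = [  # toy labeled data
    ("the team won the championship game", "sports"),
    ("the striker scored a late goal", "sports"),
    ("parliament passed the new budget bill", "politics"),
    ("the senator announced her campaign", "politics"),
]

def classify(text: str, k: int = 1) -> str:
    neighbors = sorted(train, key=lambda ex: ncd(text, ex[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)  # majority vote

print(classify("the goalkeeper made a brilliant save"))  # -> "sports"
```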
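Finally, QLoRA: load the frozen base model in 4-bit NF4 via bitsandbytes and train LoRA adapters on top. A sketch using the transformers/peft integration (base model is illustrative, and the 4-bit path is new, so check your library versions):

```python
# QLoRA-style setup: 4-bit NF4 base model + LoRA adapters on top.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, introduced by the paper
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store in 4-bit
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                  # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()
```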

🛠️ Products

  • Mutable.ai: a Copilot that operates directly on your GitHub repos

  • GigaBrain: Reddit search powered by LLMs

    The extension has a great UX: it runs side-by-side with your regular Google searches, so if you can't find what you're looking for in the first 10 links, you can check out the pre-generated GigaBrain page without having to wait.

📚 Resources

  • Comparing pgvector vs qdrant

    TL;DR it's not looking good for pgvector if you care about query performance.

  • Hot take on Lucene, Elasticsearch, inverted indices and vector databases

    Interesting take on Lucene’s position as the incumbent provider of inverted indices, and on the extra retrieval flexibility that many Retrieval Augmented LM use cases require.

  • Real-time A100/H100 availability and pricing info

  • Real-life case studies of how AI tools are changing how freelancers work

  • Active Prompting with Chain-of-Thought for Large Language Models

    This is a clever technique that uses model ratings to discover the areas where uncertainty is highest, which then guide your CoT example selection process. A minimal uncertainty-scoring sketch follows below.

  • AI job board by Scale VP (a VC)

  • 8-bit Methods for Efficient Deep Learning by Tim Dettmers

    This is an interesting watch if you want a deeper understanding of how it's possible to reduce the footprint of large models like OPT-65B and get them to run on consumer GPUs like 4090s. It also covers efficient fine-tuning using LoRA paired with 4-bit frozen models; this particular case gives a 17X reduction in memory requirements for fine-tuning. See the 8-bit loading sketch below.
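To sketch the Active Prompting idea: sample several answers per question, rank questions by how much the answers disagree, and send the most uncertain ones to humans for CoT annotation. `ask_llm` below is a hypothetical stand-in for your actual model call:

```python
# Active Prompting, step 1: rank questions by answer disagreement.
import random

def ask_llm(question: str, temperature: float = 0.7) -> str:
    # Hypothetical stand-in: replace with a real, temperature-sampled LLM call.
    return random.choice(["answer A", "answer B", "answer C"])

def disagreement(question: str, n_samples: int = 5) -> float:
    answers = [ask_llm(question) for _ in range(n_samples)]
    # Fraction of distinct answers: low = agreement, 1.0 = all different.
    return len(set(answers)) / n_samples

questions = ["Q1 ...", "Q2 ...", "Q3 ..."]  # your unlabeled question pool
ranked = sorted(questions, key=disagreement, reverse=True)
# Hand the top-k most uncertain questions to humans for CoT annotation,
# then use the annotated examples as few-shot exemplars.
print(ranked)
```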
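And the 8-bit inference trick from Dettmers' talk, LLM.int8() as implemented in bitsandbytes, is a one-flag change in transformers (model choice illustrative):

```python
# LLM.int8() inference via bitsandbytes: load weights in 8-bit to roughly
# halve memory vs fp16 (model name is illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b",
    load_in_8bit=True,   # requires bitsandbytes + a CUDA GPU
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-6.7b")

inputs = tokenizer("Quantization works because", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```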


Want more? Follow me on Twitter! @ricklamers
