OpenAI's embedding model dethroned in MTEB by BAAI general embedding model
Week 32 of Coding with Intelligence
📰 News
MTEB Embedding Benchmark: bge-small 384 dimensions outperform text-embedding-ada-002
bge stands for BAAI General Embedding. With just 384 dimensions versus the 1536 of OpenAI's text-embedding-ada-002, it is a very interesting new option for embedding generation. On top of that, OpenAI's embeddings require paid API calls, while the open source bge family of models is self-hostable. Also check out their repo: https://github.com/FlagOpen/FlagEmbedding
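For a sense of what self-hosting looks like, here is a minimal sketch assuming the BAAI/bge-small-en checkpoint on Hugging Face and the sentence-transformers package (bge recommends prefixing retrieval queries with a short instruction):

```python
from sentence_transformers import SentenceTransformer

# Assumes the BAAI/bge-small-en checkpoint (384-dimensional embeddings).
model = SentenceTransformer("BAAI/bge-small-en")

# bge suggests an instruction prefix for retrieval queries; passages are encoded as-is.
query = "Represent this sentence for searching relevant passages: how do I cache HTTP responses?"
passages = [
    "Use Cache-Control headers to let clients cache responses.",
    "HTTP/2 multiplexes multiple streams over one connection.",
]

q_emb = model.encode(query, normalize_embeddings=True)
p_embs = model.encode(passages, normalize_embeddings=True)

print(q_emb.shape)     # (384,) versus 1536 for text-embedding-ada-002
print(p_embs @ q_emb)  # cosine similarities, since embeddings are normalized
```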
Google introduces AI-powered Codespaces alternative: Project IDX
Will be interesting to see if more fabless chip companies enter the space of NVIDIA GPUs, AMD GPUs, and TPUs. Companies like Tenstorrent, SambaNova, Untether AI, Groq, Habana Labs, Graphcore and Cerebras are certainly trying!
Vicuna v1.5 released: Llama 2 based, 16k context, and licensed for commercial use
📦 Repos
DeepSpeed Chat: end-to-end three-stage InstructGPT pipeline
Use this code to produce instruct-tuned, ChatGPT-style models from open source base models using your own data (RLHF). A full tune costs only on the order of hundreds of dollars of compute time.
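As a rough mental model of those three stages (the function names below are illustrative stand-ins, not the DeepSpeed Chat API):

```python
# Conceptual outline of the InstructGPT-style pipeline DeepSpeed Chat automates.
# These are placeholder functions for illustration only.

def supervised_finetune(base_model, demonstrations):
    """Stage 1: fine-tune the base model on prompt -> response demonstrations (SFT)."""

def train_reward_model(base_model, preference_pairs):
    """Stage 2: train a reward model on (chosen, rejected) response pairs."""

def rlhf_ppo(sft_model, reward_model, prompts):
    """Stage 3: optimize the SFT model with PPO against the reward model."""

sft_model = supervised_finetune("base-llm", demonstrations=[])
reward_model = train_reward_model("base-llm", preference_pairs=[])
chat_model = rlhf_ppo(sft_model, reward_model, prompts=[])
```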
They even draw a full analogy to web design in their original blog post: https://www.cursor.so/blog/prompt-design. I'm not sure prompt engineering (or prompt design, as they prefer to call it) is really analogous to web design, but I do agree that rendering prompts for inspection and composability are key to efficient, high-quality prompt authoring. One issue I see for folks (yours truly) not in a JS context: can we port this to Python :)?
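To make that last point concrete, here is a hypothetical sketch of what component-based prompt rendering could look like in Python; the Component/render names and the priority-based trimming are my own illustration, not a port of the Cursor library:

```python
from dataclasses import dataclass

@dataclass
class Component:
    text: str
    priority: int  # higher-priority pieces survive longer when trimming to fit a budget

def render(components, budget_chars):
    """Keep the highest-priority components that fit the (character) budget."""
    kept, used = [], 0
    for c in sorted(components, key=lambda c: -c.priority):
        if used + len(c.text) <= budget_chars:
            kept.append(c)
            used += len(c.text)
    # Re-emit in the original order so the prompt still reads naturally.
    return "\n".join(c.text for c in components if c in kept)

prompt = render(
    [
        Component("System: you are a helpful assistant.", priority=10),
        Component("Context: <retrieved documents>", priority=5),
        Component("User: summarize the context.", priority=10),
    ],
    budget_chars=200,
)
print(prompt)
```

The appeal is that prompts become inspectable, composable objects instead of ad-hoc string concatenation.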
EasyLLM: swap out LLMs (open & proprietary)
An open-source Python package to streamline and unify working with open LLMs. They promise swapping models is as easy as swapping the import line (not sure if that's the right abstraction level, but let's see!)
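The advertised swap looks roughly like the snippet below; the exact module and client names are my reading of the README, so treat them as an assumption and verify against the repo:

```python
# Instead of: import openai; openai.ChatCompletion.create(model="gpt-3.5-turbo", ...)
from easyllm.clients import huggingface  # drop-in, OpenAI-style client (per the project's README)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Knock knock."},
]
response = huggingface.ChatCompletion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=messages,
)
print(response["choices"][0]["message"]["content"])
```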
OpenChat: Llama 2 based SFT tuned LLM
According to https://twitter.com/johnjnay/status/1688737571788967936, it scores highest on agent tasks among all evaluated open source models.
📄 Papers
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
An alternative to RLHF. Time to say goodbye to complex multi-stage fine-tuning pipelines?
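The heart of the method is a simple classification-style loss over preference pairs. Here is a sketch in PyTorch, assuming you already have summed per-sequence log-probabilities from the policy being trained and from a frozen reference model:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss: push the policy to prefer chosen over rejected completions,
    measured relative to a frozen reference model and scaled by beta."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```

No reward model, no PPO loop: just a standard gradient step on preference data.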
Mass-Editing Memory in a Transformer
Retraining, fine-tuning, and context windows are no longer the only paradigms for updating language models. We now have LLM surgery as another option for injecting new knowledge into a model.
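Illustratively, an "edit" in this setting is a (subject, relation, new object) fact written directly into the weights; the field names below are made up for the example, not taken from the paper's reference implementation:

```python
# Thousands of such edits can be applied in one pass with a MEMIT-style method,
# which rewrites specific MLP weights instead of retraining or fine-tuning.
edits = [
    {"subject": "Steve Jobs", "relation": "was the founder of", "new_object": "Microsoft"},
    {"subject": "The Eiffel Tower", "relation": "is located in", "new_object": "Rome"},
]
```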
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Docs > examples!
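A hedged sketch of the idea: hand the model the tools' documentation (e.g. docstrings) rather than curated usage demonstrations. The tool names and prompt template below are illustrative, not from the paper:

```python
def search(query: str) -> str:
    """search(query): return the top web result for `query`."""

def calculator(expression: str) -> str:
    """calculator(expression): evaluate an arithmetic expression and return the result."""

TOOLS = [search, calculator]

def build_zero_shot_prompt(task: str) -> str:
    docs = "\n".join(tool.__doc__ for tool in TOOLS)
    return (
        "You can call the following tools. Their documentation:\n"
        f"{docs}\n\n"
        f"Task: {task}\n"
        "Respond with the single tool call to use."
    )

print(build_zero_shot_prompt("What is 12 * 34?"))
```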
📚 Resources
Comparing inference costs of GPT-3.5 to self-hosted Llama 2
Great analysis! Kudos to Aman from the Cursor team.
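For intuition only, the skeleton of such a comparison is throughput versus rental price; every number below is a placeholder I made up, not a figure from the linked analysis:

```python
# Placeholder inputs, purely illustrative.
gpu_hour_usd = 2.0        # assumed hourly rental price of one GPU
tokens_per_second = 1000  # assumed sustained generation throughput on that GPU
api_price_per_1k = 0.002  # assumed flat API price per 1k tokens

self_hosted_per_1k = gpu_hour_usd / (tokens_per_second * 3600) * 1000
print(f"self-hosted: ${self_hosted_per_1k:.5f} per 1k tokens vs API: ${api_price_per_1k} per 1k tokens")
```

A real comparison also has to account for factors this glosses over, such as batch size, GPU utilization, and the mix of prompt versus completion tokens.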
Do Machine Learning Models Memorize or Generalize?
The moment a model shifts from regurgitating its training data to genuinely generalizing can happen quite abruptly during training. The phenomenon is called grokking and was first demonstrated in 2021. Digging into model internals to explain behavior like this falls under mechanistic interpretability, which is getting a lot of attention from LLM researchers. Learn more in this excellent interactive exploration by a team of Google researchers.
DotDict: no.more.nesting.issues = 'happy'
A simple Python library that makes chained attribute access possible. You have to wonder: why isn't it already like this?
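A minimal sketch of the idea (not necessarily the library's exact API):

```python
class DotDict(dict):
    """Attribute-style access to (nested) dictionary keys."""

    def __init__(self, data=None, **kwargs):
        super().__init__()
        for key, value in {**(data or {}), **kwargs}.items():
            self[key] = DotDict(value) if isinstance(value, dict) else value

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as err:
            raise AttributeError(key) from err

    def __setattr__(self, key, value):
        self[key] = value

no = DotDict({"more": {"nesting": {}}})
no.more.nesting.issues = "happy"
print(no.more.nesting.issues)  # -> 'happy'
```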
Simple implementation of a transformer model, with some borrowed efficiency improvements. The purpose is mainly pedagogical. Created by someone from the EleutherAI project.
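Not code from that repo, but for context, the heart of such pedagogical transformer implementations is plain scaled dot-product self-attention:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (batch, seq, d_model); w_q/w_k/w_v: (d_model, d_head) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)  # (batch, seq, seq)
    return F.softmax(scores, dim=-1) @ v                     # (batch, seq, d_head)

x = torch.randn(2, 8, 64)
w_q, w_k, w_v = (torch.randn(64, 64) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([2, 8, 64])
```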
Want more? Follow me on Twitter! @ricklamers