Big update to Google's Bard and OpenAI ships a new completion model (with logprobs!)
Week 38 of Coding with Intelligence
📰 News
Google Bard ships Extensions update
This is a very powerful idea: it integrates data from Drive, Docs, Gmail, YouTube, Maps, Flights, and Hotels. Check out how well it does on your data.
OpenAI releases gpt-3.5-turbo-instruct, completion model
Apparently it reaches 1800 Elo in chess: https://twitter.com/GrantSlatton/status/1703913578036904431 and https://gist.github.com/grantslatton/8ae9d5bfe0f9e26bb5211a32b799abd3. Note that the endpoint includes logprobs, which can sometimes be useful; read more here: https://twitter.com/alexgraveley/status/1704169124467749090
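For the curious, here's a minimal sketch of requesting token logprobs from the completions endpoint with gpt-3.5-turbo-instruct. It hits the public /v1/completions API directly via requests (rather than any particular SDK version), and the PGN-style prompt is just an illustration in the spirit of the chess Gist above.

```python
# Minimal sketch: ask /v1/completions for the top-5 logprobs per generated token.
# Assumes OPENAI_API_KEY is set in the environment.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo-instruct",
        "prompt": "1. e4 e5 2. Nf3 Nc6 3.",  # PGN-style prompting, as in the chess experiments
        "max_tokens": 5,
        "temperature": 0,
        "logprobs": 5,  # return the top-5 log probabilities for each generated token
    },
    timeout=30,
)
choice = resp.json()["choices"][0]
for token, lp in zip(choice["logprobs"]["tokens"], choice["logprobs"]["token_logprobs"]):
    print(f"{token!r}: {lp:.3f}")
```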
📦 Repos
Marvin: minimalist LangChain alternative
Some cool ideas in here for treating LLMs as function calls with structured output in a very Python-native style. For a real-world example: the gpt-3.5-turbo-instruct chess Gist linked above uses it.
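To give a flavor of the "LLM as a typed Python function" idea, here's a rough sketch. The decorator name and import path (marvin.ai_fn) are assumptions based on Marvin 1.x and may differ across versions, so treat this as illustrative rather than canonical API.

```python
# Sketch of Marvin's decorator style: the function body is just a docstring;
# the library sends signature + docstring to the LLM and parses the reply
# into the annotated return type. `ai_fn` is assumed from Marvin 1.x.
from marvin import ai_fn

@ai_fn
def extract_openings(pgn: str) -> list[str]:
    """Return the names of chess openings that match the given PGN moves."""

print(extract_openings("1. e4 e5 2. Nf3 Nc6 3. Bb5"))
```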
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
The main author is also behind the https://vllm.ai/ project, and the work was supervised by Tri Dao, the inventor of FlashAttention. There's also a blog post: https://sites.google.com/view/medusa-llm
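To make the idea concrete, here's a toy sketch (my own illustration, not Medusa code) of the accept/reject mechanic: extra heads draft the next few tokens in a single step and the base model keeps the prefix it agrees with. Real Medusa verifies a whole tree of candidates with a custom attention mask; this single-branch greedy version only conveys the core logic, and both "models" are stand-ins.

```python
# Toy illustration of multi-head drafting + verification (not the actual Medusa code).
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 100

def base_model_greedy(context):
    """Stand-in for the base LLM's greedy next-token choice given a context."""
    return hash(tuple(context)) % VOCAB

def medusa_heads(context, k=4):
    """Stand-in for the K extra heads: cheap guesses for the next k tokens."""
    guesses, ctx = [], list(context)
    for _ in range(k):
        # Pretend each head agrees with the base model ~70% of the time.
        guess = base_model_greedy(ctx) if rng.random() < 0.7 else int(rng.integers(VOCAB))
        guesses.append(guess)
        ctx.append(guess)
    return guesses

def medusa_step(context):
    """One decoding step: draft k tokens, keep the prefix the base model agrees with."""
    draft, accepted, ctx = medusa_heads(context), [], list(context)
    for tok in draft:
        if base_model_greedy(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    # The base model's own prediction at the first disagreement (or one past the
    # draft) is always kept, so every step emits at least one token.
    accepted.append(base_model_greedy(ctx))
    return accepted

print(medusa_step([1, 2, 3]))  # multiple tokens per step when the heads guess well
```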
📄 Papers
Language Modeling Is Compression
Compression Is All You Need? Also check out Ilya Sutskever's talk on unsupervised learning as compression, shared in an earlier issue: https://www.youtube.com/watch?v=AKMuA_TVz3A
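The core equivalence is easy to demo: an arithmetic coder driven by a model that assigns probability p to the next symbol spends -log2(p) bits on it, so the model's log-loss is the compressed size. A toy sketch of my own (not from the paper), using a character-level unigram model:

```python
# Toy demo: ideal code length under a model = sum of -log2(p) over the data.
# Even a trivial unigram character model beats 8 bits/char on English text;
# a strong LLM would compress far better. (Toy simplification: the "model"
# is fit on the same text it compresses.)
import math
from collections import Counter

text = "language modeling is compression " * 20
counts = Counter(text)
total = sum(counts.values())
probs = {c: n / total for c, n in counts.items()}

bits = sum(-math.log2(probs[c]) for c in text)   # ideal arithmetic-coding length
print(f"raw size: {8 * len(text)} bits")
print(f"unigram code length: {bits:.0f} bits ({bits / len(text):.2f} bits/char)")
```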
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Nougat: OCR for Academic Documents
The implementation is also available on GitHub: https://github.com/facebookresearch/nougat. Exciting how many high-quality tokens this will unlock for training/retrieval.
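If you want to try it on a paper, here's a hedged sketch of driving the Nougat CLI from Python; the package name (nougat-ocr), flags, and output file naming are assumptions based on the repo's README, so double-check `nougat --help` on your install.

```python
# Sketch: run the Nougat CLI (`pip install nougat-ocr`) on a PDF and read the
# resulting Mathpix-Markdown. Flags and the .mmd output name are assumptions.
import subprocess

subprocess.run(
    ["nougat", "paper.pdf", "-o", "parsed/"],  # assumed CLI: <pdf> -o <output dir>
    check=True,
)
with open("parsed/paper.mmd") as f:            # output assumed to mirror the PDF's name
    print(f.read()[:500])
```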
DeepMind paper on letting LLMs optimize prompts
tl;dr: up to 8% better on GSM8K and up to 50% better on Big-Bench Hard by letting the LLM find a better prompt. Paper title: "Large Language Models as Optimizers"
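The loop itself is simple enough to sketch: keep a trajectory of (prompt, score) pairs, show it to the LLM sorted by score, and ask for a better candidate. The sketch below is my paraphrase of that idea, not code from the paper; `llm` and `evaluate_on_dev` are hypothetical callables you'd supply.

```python
# Rough sketch of an OPRO-style prompt-optimization loop (illustrative only).
def optimize_prompt(llm, evaluate_on_dev, seed_prompt, steps=20):
    trajectory = [(seed_prompt, evaluate_on_dev(seed_prompt))]
    for _ in range(steps):
        history = "\n".join(
            f"prompt: {p}\nscore: {s:.1f}"
            for p, s in sorted(trajectory, key=lambda t: t[1])  # worst to best
        )
        meta_prompt = (
            "Here are previous instructions with their accuracies on a benchmark "
            f"(higher is better):\n{history}\n\n"
            "Write a new instruction that is different from all of the above and "
            "likely to score higher."
        )
        candidate = llm(meta_prompt)
        trajectory.append((candidate, evaluate_on_dev(candidate)))
    return max(trajectory, key=lambda t: t[1])[0]   # best prompt found
```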
Efficient Memory Management for Large Language Model Serving with PagedAttention
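This is the paper behind vLLM. The gist: instead of reserving one big contiguous KV-cache slab per request, the cache is split into fixed-size blocks and each sequence keeps a block table mapping logical positions to physical blocks. Here's a toy sketch of that bookkeeping (my own illustration, not vLLM code):

```python
# Toy PagedAttention-style KV-cache allocator: blocks are handed out on demand,
# so memory tracks actual sequence length instead of a worst-case reservation.
BLOCK_SIZE = 16

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # pool of physical blocks
        self.block_tables = {}                      # seq_id -> [physical block ids]
        self.lengths = {}                           # seq_id -> tokens written

    def append_token(self, seq_id):
        """Reserve room for one more token of `seq_id`, allocating a block if needed."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:                # current block full (or first token)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; a real server would preempt/swap")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[length // BLOCK_SIZE], length % BLOCK_SIZE  # where K/V would go

    def free(self, seq_id):
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(40):                                 # 40 tokens -> 3 blocks of 16
    cache.append_token("request-1")
print(cache.block_tables["request-1"], f"free blocks: {len(cache.free_blocks)}")
cache.free("request-1")
```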
📱 Demos
📚 Resources
Google's Jeff Dean and Amin Vahdat reveal some TPU/LLM scaling challenges
Generative AI Infra & Market Map by Sequoia Capital
Useful to get a "lay of the land". The infra map is very helpful if you're a builder.
Pallas: a JAX extension similar to Triton
In case you forgot about Triton, it's OpenAI's framework that tries to be in the Goldilocks zone for ML engineers working on neural network architectures.
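To show what that looks like in JAX, here's a minimal Pallas sketch based on the documented jax.experimental.pallas API (pallas_call plus Ref-style kernel arguments); treat it as illustrative rather than authoritative, since the API is experimental and may shift.

```python
# Minimal Pallas kernel sketch: kernels read/write through Refs, Triton-style,
# and pallas_call wires them into a regular JAX function.
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Elementwise add written as loads/stores on Refs rather than returned values.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.arange(8.0)
print(add(x, x))  # [ 0.  2.  4.  6.  8. 10. 12. 14.]
```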
Want more? Follow me on Twitter! @ricklamers