📚 Resources
antirez - author of Redis - on using LLMs for coding. tl;dr:
- LLMs are good at interpolating between training examples, so synthesizing novel programs that are some combination of existing code is totally feasible and quite useful.
- LLMs boost coding productivity, especially if you're operating outside of your comfort zone (e.g. when picking up Vue or Swift for the first time)
- LLMs can code simple yet useful one-off scripts in a single go (e.g. a Python plot of some training metric data; a sketch of that kind of script follows this list)
- Learning how to extract the most value out of an LLM is an acquired skill: asking in the right way, asking the right things, and knowing what to ask all help you get more value from LLMs
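To make the one-off-script point concrete, here's a minimal sketch of the kind of throwaway plotting script an LLM can typically produce in one shot. The file name and column names are hypothetical, not from antirez's post:

```python
# Hypothetical one-off script: plot training loss from a metrics CSV.
# Assumes a file "metrics.csv" with "step" and "loss" columns.
import csv

import matplotlib.pyplot as plt

steps, losses = [], []
with open("metrics.csv", newline="") as f:
    for row in csv.DictReader(f):
        steps.append(int(row["step"]))
        losses.append(float(row["loss"]))

plt.plot(steps, losses)
plt.xlabel("step")
plt.ylabel("training loss")
plt.title("Training loss over time")
plt.savefig("loss.png", dpi=150)
```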
Resources for getting started with dataset development for LLM fine-tuning
Whisper competitor by NVIDIA and Suno.ai
It achieves a better WER (word error rate) on the HF ASR Leaderboard: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
Stuff we figured out about AI in 2023
By the well-known Simon Willison
Tips for getting involved in dataset development for fine-tuning
Off-topic: Discord is a goldmine for AI insights if you know where to look ;-) This was in the Nous Research AI Discord: https://discord.gg/EK6MdmRzXX
📦 Repos
nanoGPT ported to Apple MLX framework
By a Reddit user
pykoi: UI for collecting RLHF data & preference fine-tuning
pykoi lets you easily collect real-time user feedback and continuously improve your models
Lightweight query routing for LLM/Assistant selection
from semantic_router.layer import RouteLayer
dl = RouteLayer(encoder=encoder, routes=routes)

We can now use our route layer to make super fast decisions based on user queries. Let's try a query that should trigger the politics route:

dl("don't you love politics?").name
[Out]: 'politics'
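For context, `encoder` and `routes` in the snippet above need to be defined first. Here's a minimal self-contained sketch in the style of the semantic-router docs; the route names, utterances, and the choice of OpenAIEncoder are illustrative assumptions, not part of the original snippet:

```python
from semantic_router import Route
from semantic_router.encoders import OpenAIEncoder  # assumes OPENAI_API_KEY is set
from semantic_router.layer import RouteLayer

# Illustrative routes: each route is a name plus a few example utterances.
politics = Route(
    name="politics",
    utterances=[
        "don't you love politics?",
        "what do you think about the president?",
    ],
)
chitchat = Route(
    name="chitchat",
    utterances=["how's the weather today?", "lovely day, isn't it?"],
)

routes = [politics, chitchat]
encoder = OpenAIEncoder()

dl = RouteLayer(encoder=encoder, routes=routes)
print(dl("don't you love politics?").name)  # expected: 'politics'
```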
📄 Papers
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Synthetic training data can work well for teaching LLMs to perform reasoning
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Especially useful for researchers working on a compute budget! This nugget made me chuckle: "Make your vocab size a multiple of 64 (Andrej Karpathy says so!)"
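That tip boils down to rounding the tokenizer's vocab size up to the nearest multiple of 64 when sizing the embedding and output matrices, which keeps the matmul dimensions friendly to GPU tensor cores. A trivial sketch (the BERT number is just an example):

```python
def pad_vocab_size(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the nearest multiple (e.g. 64) for GPU-friendly matmul shapes."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

print(pad_vocab_size(30522))  # BERT's vocab of 30,522 -> 30,528
```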
ETH shows approximate MatMul calculation can achieve 15x higher chip area efficiency
Proof of principle: Top-1 accuracy on CIFAR-10 of more than 92.5% using ResNet9. Will the next Transformers be trained on approximation-based accelerators?
Gemini Pro vs GPT-4 Vision on text and multimodal common sense reasoning tasks
By Stanford and Meta researchers. tl;dr: Gemini Pro seems competitive, which bodes well for Google: Gemini Pro is generally considered the GPT-3.5-level model, so Gemini Ultra might actually outperform GPT-4 Vision.
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
They duplicated Mistral 7B's layers (depth up-scaling) to build a single, simpler dense model that outperforms Mixtral MoE 8x7B. You can try it on LMSYS https://chat.lmsys.org
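Depth up-scaling is conceptually simple: take two copies of the base model, drop a few layers from the end of one and the start of the other, stack the remainder into a deeper model, and continue pretraining. A rough conceptual sketch of the layer surgery with Hugging Face transformers; the 32 -> 48 layer counts match the paper, but the code itself is an illustrative assumption (not the authors' implementation, and details like per-layer cache indices are glossed over):

```python
import copy

import torch
from torch import nn
from transformers import AutoModelForCausalLM

# Load the 32-layer base model (Mistral 7B).
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)

layers = base.model.layers  # nn.ModuleList of 32 decoder layers
n, m = len(layers), 8       # drop m layers from one end of each copy, as in the paper

# Copy A keeps layers [0, n-m); copy B keeps layers [m, n). Stacked: 2*(n-m) = 48 layers.
upscaled = list(layers[: n - m]) + [copy.deepcopy(layer) for layer in layers[m:]]

base.model.layers = nn.ModuleList(upscaled)
base.config.num_hidden_layers = len(upscaled)
# The up-scaled ~10.7B model is then continually pretrained, which is where the quality comes from.
```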
Improving Text Embeddings with Large Language Models
Frustrating result: distilling synthetic data from best-in-class proprietary models works. However, OpenAI's ToS doesn't allow you to do this, so: great result, Microsoft, but we can't use it.
Photo restoration enhanced by a few reference images; by Snap AI team
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
They show a modified Mistral-7B-Instruct-v0.1 succeeds at the passkey task (retrieving a random 5-digit number hidden in a long distractor text) at context lengths up to 24k. Mistral's sliding attention window is 4k by default.
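For readers unfamiliar with the passkey task: the prompt is a long stretch of filler text with a random number buried inside it, and the model is asked to repeat the number back. A minimal sketch of how such a prompt is typically constructed; the filler wording and lengths are illustrative, not the paper's code:

```python
import random

FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again. "

def build_passkey_prompt(n_filler_repeats: int = 300) -> tuple[str, str]:
    """Return (prompt, passkey). The passkey is hidden roughly in the middle of the filler."""
    passkey = str(random.randint(10000, 99999))  # random 5-digit number
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    half = FILLER * (n_filler_repeats // 2)
    prompt = (
        "There is important info hidden inside a lot of irrelevant text. "
        "Find it and memorize it.\n\n"
        + half + needle + half
        + "\nWhat is the pass key? The pass key is"
    )
    return prompt, passkey

prompt, passkey = build_passkey_prompt()
# Feed `prompt` to the model and check whether its completion contains `passkey`.
```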
📱 Demos
Google takes a stab at zero-shot video generation
Will Pika and Runway get some competition in the form of a product/API or is this just a tech demo to show the world Google can keep up in AI?
Want more? Follow me on Twitter! @ricklamers