Kickstart 2024 with these useful LLM resources
Week 1 of Coding with Intelligence in 2024 🎆
- LLMs are good at interpolating between training examples, so synthesizing "novel" programs that are some combination of existing code is totally feasible and quite useful.
- LLMs boost code productivity especially if you're operating outside of your comfort zone (e.g. when picking up Vue or Swift for the first time)
- LLMs can code simple yet useful one-off scripts in a single go (e.g. a Python plot of some training metric data)
- Extracting the most value out of an LLM is an acquired skill: asking in the right way, asking the right things, and knowing what to ask all make a real difference
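As an example of the one-off-script use case above, a minimal sketch of the kind of plot an LLM can produce in a single go (the synthetic metric data and file name here are mine, just for illustration):

```python
# Hypothetical one-off script: plot a training-loss curve.
# In practice you'd load the metric data from a log or CSV; here it's synthesized inline.
import math
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

steps = list(range(0, 1000, 10))
losses = [2.5 * math.exp(-s / 300) + 0.3 for s in steps]  # fake decaying loss

plt.plot(steps, losses)
plt.xlabel("training step")
plt.ylabel("loss")
plt.title("Training loss")
plt.savefig("loss.png")
```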
Better WER on HF ASR Leaderboard https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
By the infamous Simon Willison
Off-topic: Discord is a goldmine for AI insights if you know where to look ;-) This was in the Nous Research AI Discord: https://discord.gg/EK6MdmRzXX
By a Reddit user
pykoi lets you easily collect real-time user feedback and continuously improve your models
Lightweight query routing for LLM/Assistant selection
```python
# Assumes `encoder` (an embedding encoder) and `routes` (a list of Route
# objects with example utterances) are defined earlier, as in the
# semantic-router README.
from semantic_router.layer import RouteLayer

dl = RouteLayer(encoder=encoder, routes=routes)
```

We can now use our decision layer to make super fast routing decisions based on user queries. Let's try a query that should trigger one of our routes:

```python
dl("don't you love politics?").name
```
Synthetic training data can work well for LLMs to perform reasoning
Especially useful for researchers working on a compute budget! This nugget made me chuckle: "Make your vocab size as a multiple of 64 (Andrej Karpathy says so!)"
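The vocab-size tip is a one-liner in practice: pad the vocabulary up to the next multiple of 64 so the embedding/unembedding matmul shapes are hardware-friendly. A minimal sketch (the helper name is mine):

```python
def pad_vocab(vocab_size: int, multiple: int = 64) -> int:
    # Round up to the nearest multiple of 64 for tensor-core-friendly matmul shapes
    return ((vocab_size + multiple - 1) // multiple) * multiple

pad_vocab(50257)  # GPT-2's vocab size → 50304
```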
Proof of principle: Top-1 accuracy on CIFAR-10 of more than 92.5% using ResNet9. Will the next Transformers be trained on approximation based accelerators?
By Stanford and Meta researchers. tl;dr Gemini Pro seems competitive, which bodes well for Google as Gemini Pro is generally considered the GPT-3.5 level model so Gemini Ultra might actually outperform GPT-4 Vision.
They took Mixtral MoE 8x7B layers and made a single simpler model that outperforms it. You can try it on LMSYS https://chat.lmsys.org
Frustrating result: distilling synthetic data from best-in-class proprietary models works. However, OpenAI's ToS doesn't allow doing this with their models, so: great result, Microsoft, but the rest of us can't use it.
They show a modified Mistral-7B-Instruct-v0.1 succeeds at the passkey task (retrieving a random 5-digit number from the context) up to a 24k-token context window. Mistral's sliding attention window is 4k by default.
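For context, the passkey task is typically constructed by burying a random number in repetitive filler text and asking the model to retrieve it. A minimal sketch (the filler sentences and function name are my assumptions, not the paper's exact prompt):

```python
import random

def make_passkey_prompt(filler_repeats: int) -> tuple[str, str]:
    # Hide a random 5-digit passkey inside long filler text, then ask for it back.
    passkey = str(random.randint(10000, 99999))
    filler = "The grass is green. The sky is blue. The sun is yellow. " * filler_repeats
    prompt = (
        filler
        + f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
        + filler
        + "What is the pass key?"
    )
    return prompt, passkey

prompt, passkey = make_passkey_prompt(200)
```

Scaling `filler_repeats` is what pushes the prompt past the model's nominal context window, which is exactly the regime the result above probes.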
Will Pika and Runway get some competition in the form of a product/API or is this just a tech demo to show the world Google can keep up in AI?
Want more? Follow me on Twitter! @ricklamers