Model proliferation continues: new LLMs you should know about - Yi, Aurora, DeepSeek
Week 48 of Coding with Intelligence
📰 News
Stability AI releases SDXL Turbo: A Real-Time Text-to-Image Generation Model
LlamaIndex ships feature simplifying source citations
The fuzzy matching approach is probably something you could implement yourself, but LlamaIndex's RAG toolbox keeps getting better & better. Why reinvent the wheel!
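To make the fuzzy-matching idea concrete, here is a minimal sketch of attributing answer sentences to source chunks by string similarity. This is not LlamaIndex's actual implementation; the function name, threshold, and use of `difflib` are illustrative assumptions.

```python
from difflib import SequenceMatcher

def cite_sources(answer_sentences, source_chunks, threshold=0.6):
    """Attach the index of the best fuzzy-matching source chunk to each
    answer sentence, or None if no chunk clears the similarity threshold.
    (Hypothetical helper, not the LlamaIndex API.)"""
    citations = []
    for sentence in answer_sentences:
        best_idx, best_score = None, 0.0
        for idx, chunk in enumerate(source_chunks):
            score = SequenceMatcher(None, sentence.lower(), chunk.lower()).ratio()
            if score > best_score:
                best_idx, best_score = idx, score
        citations.append(best_idx if best_score >= threshold else None)
    return citations
```

A real implementation would additionally split the answer into sentences and deduplicate citations, but the core matching loop looks roughly like this.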
Try it at https://chat.lmsys.org/
GAIA benchmark for evaluating agents, by Meta and Hugging Face
DeepSeek releases 67B code model
The chat model scores 73.8% on HumanEval, and based on the extensive reporting on their GitHub repo it appears it's not overfitting to any test data. This is the real deal folks!
$10M Prize for AI model that can achieve gold medal level on Math Olympiad
Paid for by algorithmic trading firm XTX
📦 Repos
CoachLM: an automatic instruction revision approach to LLM instruction tuning
Interesting approach to making it easier to create high quality instruction tuning datasets by Huawei Research. Paper: https://arxiv.org/abs/2311.13246
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Applied to vision models, but could also be applied to LLMs across language tasks.
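For intuition on what "merging LoRAs" means: each LoRA contributes a low-rank weight update ΔW = A·B, and a naive merge just adds weighted deltas onto the base weights. Note that ZipLoRA itself learns per-column merger coefficients rather than the single scalar weights used in this simplified sketch; all function names here are illustrative.

```python
def lora_delta(A, B):
    """Compute the low-rank update ΔW = A @ B (naive matmul on nested lists)."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def merge_loras(W0, delta_style, delta_subject, w_style=0.5, w_subject=0.5):
    """Naive merge: W = W0 + w_style * ΔW_style + w_subject * ΔW_subject.
    (Simplified scalar weighting, not ZipLoRA's learned per-column scheme.)"""
    return [[W0[i][j] + w_style * delta_style[i][j] + w_subject * delta_subject[i][j]
             for j in range(len(W0[0]))] for i in range(len(W0))]
```

The insight behind ZipLoRA is that naive equal-weight merging degrades both styles; learning the merge coefficients recovers each LoRA's behavior.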
Reference implementation for DPO (Direct Preference Optimization)
By Stanford PhD student Eric Mitchell
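The DPO objective itself fits in a few lines: it pushes the policy to increase the log-probability margin of the chosen completion over the rejected one, relative to a reference model. A minimal per-example sketch in plain Python (the real reference implementation operates on batched tensors):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss:
    -log sigmoid(beta * ((chosen policy-vs-ref margin) - (rejected margin)))."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the two margins are equal the loss is log 2; raising the chosen completion's margin relative to the rejected one drives the loss toward zero, which is exactly the preference signal DPO optimizes without a separate reward model.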
📱 Demos
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
This is super cool. From pose estimation to a fully animated character.
Shinstagram: Elixir based social media network with an army of multi-modal agents
Watch the demo, super cool =)
🛠️ Products
Magnific: Impressive image Upscaler & Enhancer
By https://twitter.com/emailnicolas and https://twitter.com/javilopen
📚 Resources
Starling-7B: Increasing LLM Helpfulness & Harmlessness with RLAIF
Blog post breaking down the process of applying RLAIF on the OpenChat 3.5 model.
Conjecture of Q* OpenAI algorithm by AI Explained YouTube channel
tl;dr: it's likely referring to an OpenAI paper called 'Let's Verify Step by Step', where they use a verifier model to evaluate many generated solutions and choose the best one through self-consistency (majority vote). The idea is called "Test Time Compute", and the heuristic interpretation is that it lets the LLM 'think' for longer than the computational steps in the LLM architecture itself (network layer depth) would allow. It also builds on a well-known asymmetry result in computer science/mathematics: generating an answer can be much more difficult than verifying one, which potentially makes the verifier model easier to build.
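The sampling-and-voting idea above can be sketched in a few lines. Plain self-consistency is a majority vote over final answers; the verifier variant weights each sampled answer by a verifier score instead. Function names and score values here are illustrative, not from any OpenAI code.

```python
from collections import Counter

def self_consistency(sampled_answers):
    """Plain self-consistency: majority vote over the final answers
    extracted from N sampled solutions."""
    return Counter(sampled_answers).most_common(1)[0][0]

def verifier_best_of_n(sampled_answers, verifier_scores):
    """Verifier-weighted variant: sum the verifier's score for each
    distinct answer and return the answer with the highest total."""
    totals = {}
    for answer, score in zip(sampled_answers, verifier_scores):
        totals[answer] = totals.get(answer, 0.0) + score
    return max(totals, key=totals.get)
```

Sampling N solutions and aggregating like this is what "spending more compute at test time" means in practice: the per-token compute is fixed by the architecture, but the aggregate can reflect many independent reasoning attempts.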
HIP: AMD's answer to NVIDIA's CUDA
HIP stands for Heterogeneous-Compute Interface for Portability. See, for example, the tinygrad runtime backend: https://github.com/tinygrad/tinygrad/blob/master/tinygrad/runtime/ops_hip.py
Want more? Follow me on Twitter! @ricklamers