Model proliferation continues: new LLMs you should know about - Yi, Aurora, DeepSeek
Week 48 of Coding with Intelligence
📰 News
Stability AI releases SDXL Turbo: A Real-Time Text-to-Image Generation Model
LlamaIndex ships feature simplifying source citations
The fuzzy matching approach is probably something you could implement yourself, but LlamaIndex's RAG toolbox keeps getting better & better. Why reinvent the wheel!
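To make the fuzzy-matching idea concrete, here is a minimal sketch of attributing answer sentences to source chunks by string similarity. This is not LlamaIndex's actual implementation; the function name, threshold, and use of `difflib` are illustrative assumptions.

```python
from difflib import SequenceMatcher

def cite_sources(answer_sentences, source_chunks, threshold=0.6):
    """Attach the index of the best fuzzy-matching source chunk to each
    answer sentence, or None if no chunk clears the similarity threshold.
    (Hypothetical helper, not the LlamaIndex API.)"""
    citations = []
    for sentence in answer_sentences:
        best_idx, best_score = None, 0.0
        for idx, chunk in enumerate(source_chunks):
            score = SequenceMatcher(None, sentence.lower(), chunk.lower()).ratio()
            if score > best_score:
                best_idx, best_score = idx, score
        citations.append(best_idx if best_score >= threshold else None)
    return citations
```

A real implementation would additionally split the answer into sentences and deduplicate citations, but the core matching loop looks roughly like this.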
Try it at https://chat.lmsys.org/
GAIA benchmark for evaluating agents, by Meta and Hugging Face
DeepSeek releases 67B code model
The chat model scores 73.8% on HumanEval, and based on the extensive reporting on their GitHub repo it appears it's not overfitting to any test data. This is the real deal folks!
$10M Prize for AI model that can achieve gold medal level on Math Olympiad
Paid for by algorithmic trading firm XTX
📦 Repos
CoachLM: an automatic instruction revision approach to LLM instruction tuning
Interesting approach to making it easier to create high quality instruction tuning datasets by Huawei Research. Paper: https://arxiv.org/abs/2311.13246
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Applied to vision models, but could also be applied to LLMs across language tasks.
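For intuition on what "merging LoRAs" means: each LoRA contributes a low-rank weight update ΔW = A·B, and a naive merge just adds weighted deltas onto the base weights. Note that ZipLoRA itself learns per-column merger coefficients rather than the single scalar weights used in this simplified sketch; all function names here are illustrative.

```python
def lora_delta(A, B):
    """Compute the low-rank update ΔW = A @ B (naive matmul on nested lists)."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def merge_loras(W0, delta_style, delta_subject, w_style=0.5, w_subject=0.5):
    """Naive merge: W = W0 + w_style * ΔW_style + w_subject * ΔW_subject.
    (Simplified scalar weighting, not ZipLoRA's learned per-column scheme.)"""
    return [[W0[i][j] + w_style * delta_style[i][j] + w_subject * delta_subject[i][j]
             for j in range(len(W0[0]))] for i in range(len(W0))]
```

The insight behind ZipLoRA is that naive equal-weight merging degrades both styles; learning the merge coefficients recovers each LoRA's behavior.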
Reference implementation for DPO (Direct Preference Optimization)
By Stanford PhD student Eric Mitchell
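The DPO objective itself fits in a few lines: it pushes the policy to increase the log-probability margin of the chosen completion over the rejected one, relative to a reference model. A minimal per-example sketch in plain Python (the real reference implementation operates on batched tensors):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss:
    -log sigmoid(beta * ((chosen policy-vs-ref margin) - (rejected margin)))."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the two margins are equal the loss is log 2; raising the chosen completion's margin relative to the rejected one drives the loss toward zero, which is exactly the preference signal DPO optimizes without a separate reward model.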
📱 Demos
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
This is super cool. From pose estimation to a fully animated character.
Shinstagram: Elixir based social media network with an army of multi-modal agents
Watch the demo, super cool =)
🛠️ Products
Magnific: Impressive image Upscaler & Enhancer
By https://twitter.com/emailnicolas and https://twitter.com/javilopen
📚 Resources
Starling-7B: Increasing LLM Helpfulness & Harmlessness with RLAIF
Blog post breaking down the process of applying RLAIF on the OpenChat 3.5 model.
Conjecture of Q* OpenAI algorithm by AI Explained YouTube channel
tl;dr: it's likely referring to an OpenAI paper called 'Let's Verify Step by Step', where they use a verifier model to evaluate many generated solutions and choose the best one through self-consistency (majority vote). The idea is called "Test Time Compute", and the heuristic interpretation is that it lets the LLM 'think' for longer than the computational steps in the LLM architecture itself (network layer depth) would allow. It also builds on a well-known asymmetry result in computer science/mathematics: generating an answer can be much more difficult than verifying one, which potentially makes the verifier model easier to build.
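The sampling-and-voting idea above can be sketched in a few lines. Plain self-consistency is a majority vote over final answers; the verifier variant weights each sampled answer by a verifier score instead. Function names and score values here are illustrative, not from any OpenAI code.

```python
from collections import Counter

def self_consistency(sampled_answers):
    """Plain self-consistency: majority vote over the final answers
    extracted from N sampled solutions."""
    return Counter(sampled_answers).most_common(1)[0][0]

def verifier_best_of_n(sampled_answers, verifier_scores):
    """Verifier-weighted variant: sum the verifier's score for each
    distinct answer and return the answer with the highest total."""
    totals = {}
    for answer, score in zip(sampled_answers, verifier_scores):
        totals[answer] = totals.get(answer, 0.0) + score
    return max(totals, key=totals.get)
```

Sampling N solutions and aggregating like this is what "spending more compute at test time" means in practice: the per-token compute is fixed by the architecture, but the aggregate can reflect many independent reasoning attempts.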
HIP: AMD's answer to NVIDIA's CUDA
HIP stands for Heterogeneous-Compute Interface for Portability. See, for example, the tinygrad runtime backend: https://github.com/tinygrad/tinygrad/blob/master/tinygrad/runtime/ops_hip.py
Want more? Follow me on Twitter! @ricklamers