Model proliferation continues: new LLMs you should know about - Yi, Aurora, DeepSeek
Week 48 of Coding with Intelligence
The fuzzy matching approach is probably something you could implement yourself, but LlamaIndex's toolbox for RAG keeps getting better and better. Why reinvent the wheel?
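If you did want to roll your own, here's a minimal sketch of what fuzzy matching retrieved chunks against a query could look like, using Python's standard-library `difflib` (the `fuzzy_match` helper and threshold are illustrative assumptions, not LlamaIndex's actual implementation):

```python
from difflib import SequenceMatcher

def fuzzy_match(query: str, chunks: list[str], threshold: float = 0.6) -> list[str]:
    """Return the chunks whose character-level similarity to the query
    meets the threshold. SequenceMatcher.ratio() is 2*M / (len(a) + len(b)),
    where M is the total length of matching blocks."""
    def score(chunk: str) -> float:
        return SequenceMatcher(None, query.lower(), chunk.lower()).ratio()
    return [c for c in chunks if score(c) >= threshold]

chunks = [
    "LlamaIndex is a data framework for LLM applications",
    "Bananas are rich in potassium",
]
print(fuzzy_match("llamaindex framework for llms", chunks))
```

A real RAG pipeline would typically combine this kind of lexical matching with embedding-based retrieval rather than rely on it alone.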
Try it at https://chat.lmsys.org/
The chat model scores 73.8% on HumanEval, and based on the extensive reporting in their GitHub repo, it doesn't appear to be overfitting to any test data. This is the real deal, folks!
Paid for by algorithmic trading firm XTX
Interesting approach by Huawei Research to making it easier to create high-quality instruction tuning datasets. Paper: https://arxiv.org/abs/2311.13246
Applied to vision models, but could also be applied to LLMs across language tasks.
By Stanford PhD student Eric Mitchell
This is super cool. From pose estimation to a fully animated character.
Watch the demo, super cool =)
Blog post breaking down the process of applying RLAIF on the OpenChat 3.5 model.
tl;dr: it's likely referring to the OpenAI paper 'Let's Verify Step by Step', where they use a verifier model to evaluate many generated solutions and pick the best one via self-consistency (majority vote). The idea is called "test-time compute", and the heuristic interpretation is that it lets the LLM 'think' for longer than the computational steps of the architecture itself (network layer depth) would allow. It also builds on a well-known asymmetry result from computer science/mathematics: generating an answer can be much harder than verifying one, which makes the verifier model potentially easier to build.
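The majority-vote half of this is easy to sketch. Assuming final answers have already been extracted from each sampled chain-of-thought solution (the `self_consistency` helper below is illustrative, not code from the paper):

```python
from collections import Counter

def self_consistency(final_answers: list[str]) -> str:
    """Majority vote: return the most frequent final answer
    across independently sampled solutions."""
    counts = Counter(final_answers)
    best_answer, _count = counts.most_common(1)[0]
    return best_answer

# Final answers extracted from 5 sampled solutions to the same problem
samples = ["42", "41", "42", "42", "17"]
print(self_consistency(samples))
```

A verifier-based variant would instead score each sampled solution with a trained verifier model and return the highest-scoring one; the voting trick above needs no extra model at all.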
HIP stands for Heterogeneous-Compute Interface for Portability. See for example the tinygrad runtime backend: https://github.com/tinygrad/tinygrad/blob/master/tinygrad/runtime/ops_hip.py
Want more? Follow me on Twitter! @ricklamers