Discover more from Coding with Intelligence
Meet the offspring of Llama-2: a thriving ecosystem of commercial-use derivative models
Week 31 of Coding with Intelligence
We'll have to see if the results hold up to scrutiny; it might be another case of overfitting. But the path (Llama-2-13B as a base plus high-quality data for fine-tuning) is promising! Furthermore, this is commercially usable because of the Llama-2 commercial-use license. Overall, it might be the best commercial-use OSS coding LLM available today! HF link: https://huggingface.co/SLAM-group/NewHope
It also includes RoPE scaling to allow for 10k input tokens. Impressive work! OSS models continue to inch closer to proprietary models like GPT-3.5 et al.
They make use of the position interpolation paper we reported on earlier. It's available on Hugging Face too, and it seems to be licensed under the original Llama-2 license. https://huggingface.co/togethercomputer/LLaMA-2-7B-32K
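The core idea behind position interpolation is simple: instead of extrapolating rotary position embeddings (RoPE) past the training length, you rescale position indices so the longer context maps back into the range the model was trained on. A minimal NumPy sketch of the concept (function name and defaults are illustrative, not the actual LLaMA code):

```python
import numpy as np

def rope_angles(positions, dim=64, base=10000.0, scale=1.0):
    """Rotary-embedding angles for the given positions.
    `scale` < 1 compresses positions (position interpolation):
    scale = trained_context_len / target_context_len."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # (dim/2,) per-pair frequencies
    pos = np.asarray(positions, dtype=np.float64) * scale
    return np.outer(pos, inv_freq)                    # (len(positions), dim/2)

# Extending a model trained on 2048 tokens to an 8192-token context:
angles_8k = rope_angles(range(8192), scale=2048 / 8192)
```

With this scaling, position 8191 in the long context produces the same angles as position ~2048 did during training, so the model never sees rotation angles outside its trained range; fine-tuning then teaches it to work with the compressed positions.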
If you're not yet running evaluations for your LLM apps, PromptTools might come in handy, especially if you're keen on using an existing framework for defining evals.
Preventing hallucination is difficult. Post-generation, retrieval-based verification strategies are interesting, and this locally hosted plug-in lets you use SERPs and webpage scraping to verify GPT's claims. A great step in the right direction!
Extracting maximum performance from LLMs on complex reasoning tasks is an open problem. This framework from researchers at UC San Diego, the University of Florida, and the Mohamed bin Zayed University of AI uses an explicit world model and reward model to optimally navigate reasoning tasks via MCTS. They also write up their findings in a paper: https://arxiv.org/abs/2305.14992
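To make the world-model/reward-model framing concrete, here's a minimal MCTS sketch in that spirit: a world model predicts successor states, a reward model scores leaves, and UCT balances exploration and exploitation. All names and the toy task are illustrative, not the paper's actual API; in the paper both models are the LLM itself.

```python
import math

def mcts(root, actions, world_model, reward_model, n_sims=200, c=1.4):
    """Tiny MCTS: select with UCT, expand via the world model,
    evaluate with the reward model, backpropagate a running mean."""
    N, Q, children = {}, {}, {}  # visit counts, value estimates, expanded nodes

    def select(state, path):
        # Walk down already-expanded nodes, picking actions by UCT
        while state in children:
            total = sum(N.get((state, a), 0) for a in actions) + 1
            a = max(actions, key=lambda a: Q.get((state, a), 0.0)
                    + c * math.sqrt(math.log(total) / (N.get((state, a), 0) + 1)))
            path.append((state, a))
            state = children[state][a]
        return state

    for _ in range(n_sims):
        path = []
        leaf = select(root, path)
        # Expansion: the world model proposes a successor per action
        children[leaf] = {a: world_model(leaf, a) for a in actions}
        # Evaluation: the reward model scores the leaf state
        r = reward_model(leaf)
        for s, a in path:  # backpropagate as an incremental mean
            N[(s, a)] = N.get((s, a), 0) + 1
            Q[(s, a)] = Q.get((s, a), 0.0) + (r - Q.get((s, a), 0.0)) / N[(s, a)]

    return max(actions, key=lambda a: N.get((root, a), 0))  # most-visited action

# Toy task: reach 10 from 0; "+3" gets there faster, so it should win
best = mcts(0, ["+1", "+3"],
            world_model=lambda s, a: s + int(a),
            reward_model=lambda s: -abs(10 - s))
```

In the actual framework, `world_model` and `reward_model` would both be LLM calls (predicting the next reasoning state and scoring its plausibility), which is what distinguishes this from classic game-tree MCTS.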
Great to see options in TypeScript for not just inference but also eval pipelines. Check out the `llmRubric` in the README, simple & efficient prompt evaluation!
It even shows you code diffs when replacing selected code!
Check out their abstractions and see which one you like more!
Interesting research on the inner workings of LLMs: "language model layers are typically relatively loosely coupled (ablations to one layer only affect a small number of downstream layers)"
Interesting idea for speeding up inference! However, it only works for specific queries (those whose answers fit a 'skeleton of thought' structure), and the impact on output quality is tricky to evaluate.
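The speedup comes from a two-stage decode: one short sequential call produces a skeleton of points, then each point is expanded in parallel since the expansions are independent. A sketch with a stand-in `llm` function (hypothetical; swap in your actual API client):

```python
from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    """Stand-in for a real LLM call -- purely illustrative."""
    if prompt.startswith("Outline"):
        return "1. Install\n2. Configure\n3. Run"
    return f"[expanded: {prompt}]"

def skeleton_of_thought(question: str) -> str:
    # Stage 1: one sequential call yields a short skeleton (the points)
    skeleton = [p.strip() for p in llm(f"Outline: {question}").splitlines()]
    # Stage 2: expand every point concurrently -- this is where the
    # latency win comes from, since each expansion is an independent call
    with ThreadPoolExecutor() as pool:
        bodies = list(pool.map(
            lambda p: llm(f"Expand point '{p}' of '{question}'"), skeleton))
    return "\n\n".join(bodies)

answer = skeleton_of_thought("How do I set up a web server?")
```

The catch, as noted above: answers that need sequential reasoning (where point 3 depends on what point 2 concluded) don't fit this structure, so it can't be applied blindly.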
Benchmarking and evaluating web agents that take intelligent autonomous actions is tricky if you're not operating in a sandbox: the consequences can cause real-world damage or introduce liabilities, and it's hard to run controlled experiments that separate signal from noise. This is an OpenAI Gym of sorts for web agents, and the approach is likely to be adopted by all teams building in this area.
Install it with `pip install gorilla-cli` and use it like `gorilla "list all files that start with hello"`. Really cool project from a UC Berkeley team, an extension of the https://gorilla.cs.berkeley.edu/ paper. Be careful: the commands are sent unencrypted to a GCP-hosted endpoint (presumably where the LLM runs/is called from).
A vision for LLM use I very much agree with. Our anthropomorphism kicks in when using LLMs directly (we treat the system as another human), and it limits our thinking on how best to utilize these systems. nostalgebraist is onto something here, I believe.
Apparently someone loaned them a cool $20M to purchase 512 H100s. Supposedly they offer better prices than Lambda and low commitments. Startups only it seems.
By Eugene Yan, Senior Applied Scientist at Amazon. I largely agree with the taxonomy of the seven key patterns he's identified. It's a 65-minute read, so be ready for a deep dive. His treatment of evals is comprehensive and important, as evals are crucial both for building good LLM applications and for building better models (did trick X or Y yield more improvement?).
Want more? Follow me on Twitter! @ricklamers