The LLM build phase: these tools will help you build faster
Week 33 of Coding with Intelligence
YALCC: Yet Another LangChain Clone. Some interesting ideas around the API for constrained predictions though!
generate.regex(model, r"\s*([Yy]es|[Nn]o|[Nn]ever|[Aa]lways)", max_tokens=30)(prompt)
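To see what the one-liner above buys you, here's a naive alternative: rejection sampling, where you resample until the output happens to match the regex. Real constrained decoders instead mask invalid tokens at every step, so they never waste a sample. This is an illustrative sketch with a stubbed model, not the library's implementation.

```python
import re

def constrained_generate(model, pattern, prompt, max_attempts=10):
    """Rejection-sampling fallback: resample until the output matches the regex.

    Proper regex-constrained decoding masks disallowed tokens during
    generation; this loop only illustrates the contract of the API above."""
    regex = re.compile(pattern)
    for _ in range(max_attempts):
        candidate = model(prompt)
        match = regex.fullmatch(candidate.strip())
        if match:
            return match.group(0)
    raise ValueError("no conforming sample within budget")

# Stub standing in for an LLM call (hypothetical outputs).
outputs = iter(["Maybe", "I think yes", "Yes"])
stub_model = lambda prompt: next(outputs)

answer = constrained_generate(
    stub_model, r"\s*([Yy]es|[Nn]o|[Nn]ever|[Aa]lways)", "Is water wet?"
)  # "Yes" on the third attempt
```

The constrained-decoding approach is strictly better for structured output: it guarantees a conforming result in one pass instead of burning attempts.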
A framework for composing retrieval and language models for knowledge-intensive NLP. Good resource (paper & repo) to mine for ideas when building your own LLM apps.
Their stated goal for the framework: "making it 100x easier to experiment and ship AI applications using the LLMs with minimum boilerplate"
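The retrieval-plus-LM composition pattern the paper explores boils down to a retrieve-then-read pipeline. A minimal sketch, with word-overlap scoring and an echo function standing in for a real retriever and LLM (all names here are illustrative, not the framework's API):

```python
def score(query, doc):
    # Toy relevance: word-overlap count (real systems use dense embeddings or BM25).
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve_then_read(query, corpus, llm, k=2):
    # 1) Retrieve the k most relevant passages.
    top = sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]
    # 2) Compose them into the LM prompt (the "read" step).
    context = "\n".join(top)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt), top

corpus = [
    "The Transformer architecture was introduced in 2017.",
    "Bananas are rich in potassium.",
    "Transformers use self-attention over the input sequence.",
]
echo_llm = lambda prompt: prompt  # stand-in for a real LLM call
answer, passages = retrieve_then_read(
    "How do Transformers process the input?", corpus, echo_llm
)
```

The framework's value-add is composing and optimizing many such retrieve/generate stages instead of hand-writing one pipeline like this.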
AdaTape presents an alternative to Transformers' traditional "fixed cost for every input" design. A dynamic approach makes it possible to tune the cost of inference to the complexity of the input. This makes intuitive sense: not all problems are equally difficult, so allocating a fixed compute budget to each input is quite counter-intuitive. Compare, for example, "I hate this movie! Sentiment:" with "An alternative strategy to modern economic policy should at least include the following ideas:".
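The dynamic-compute intuition can be illustrated with a simple early-exit loop, where easy inputs stop refining sooner. Note this is not AdaTape's actual mechanism (AdaTape adapts compute by appending a variable number of "tape tokens" to the input); it's just a sketch of input-dependent compute budgets.

```python
def adaptive_refine(x, layers, confidence, threshold=3):
    """Run layers until a confidence estimate clears the threshold.

    Illustrative early-exit loop: easy inputs exit after few layers,
    hard inputs consume more of the available compute."""
    steps = 0
    for layer in layers:
        x = layer(x)
        steps += 1
        if confidence(x) >= threshold:
            break  # easy input: stop early, spend less compute
    return x, steps

# Toy setup: each "layer" nudges an integer state toward the threshold,
# and confidence is just the state itself.
layers = [lambda v: v + 1] * 5
_, easy_steps = adaptive_refine(2, layers, confidence=lambda v: v)  # 1 layer
_, hard_steps = adaptive_refine(0, layers, confidence=lambda v: v)  # 3 layers
```

A fixed-cost Transformer would run all 5 layers on both inputs; the adaptive version spends compute where it is needed.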
This paper suggests that knowledge distillation through example chat conversations does not perform as well as initial experiments seemed to show.
Good to see research labs like Salesforce contribute to better evaluation of LLM Augmented Agents (LAA, new term?). Based on the results of the paper, it looks like a "many agent" approach with specialized agents could work!
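The "many specialized agents" idea amounts to routing each task to the agent best suited for it. A minimal sketch, with keyword routing and lambdas standing in for real LAAs (in practice both the router and the agents would be LLM-backed; everything here is hypothetical):

```python
def route(task, agents, classify):
    """Dispatch a task to the specialist whose domain it matches,
    falling back to a generalist agent for unrecognized domains."""
    domain = classify(task)
    agent = agents.get(domain, agents["generalist"])
    return agent(task)

# Hypothetical specialists; real LAAs would each wrap an LLM with tools.
agents = {
    "code": lambda t: f"[code agent] handling: {t}",
    "web": lambda t: f"[web agent] handling: {t}",
    "generalist": lambda t: f"[generalist] handling: {t}",
}

# Keyword classifier standing in for an LLM-based router.
def classify(task):
    if "python" in task.lower() or "bug" in task.lower():
        return "code"
    if "search" in task.lower():
        return "web"
    return "general"

result = route("Fix this Python bug", agents, classify)
```

The paper's benchmark results suggest this kind of specialization can beat a single do-everything agent.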
If you're more into videos, you can also check out Andrej Karpathy's "Let's build GPT": https://www.youtube.com/watch?v=kCc8FmEb1nY — and if you're more interested in fine-tuning Llama, check out this fine-tuning notebook: https://github.com/mshumer/gpt-llm-trainer/blob/main/One_Prompt___Fine_Tuned_LLaMA_2.ipynb
It's good to know there are options beyond NVIDIA. The chip and its associated instances are available on AWS today. Note that the Neuron SDK also supports fine-tuning, in addition to pre-training from scratch.
AMD GPUs are quickly becoming more common on both the training and inference side. This article gives a short overview of performance data gathered on an RX 7900 XTX using ROCm. It reaches 80% of the tokens/sec of an RTX 4090 for Llama2-7B/13B. Since an RTX 4090 is about twice as expensive as the 7900 XTX at current retail prices, the AMD card comes out ahead on performance per dollar by a large margin! Note these are relatively small language models; multi-GPU inference setups with datacenter GPUs (A100, H100, MI300) might not show similar results.
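The performance-per-dollar claim follows directly from the two ratios in the article. A quick back-of-the-envelope, using the ~80% throughput and ~2x price figures as stated (not exact retail quotes):

```python
# Ratios taken from the article's claims; exact prices vary by retailer.
amd_rel_throughput = 0.80   # 7900 XTX tokens/sec relative to RTX 4090
nvidia_rel_price = 2.0      # RTX 4090 price relative to 7900 XTX

# Performance per dollar of the AMD card relative to the NVIDIA card:
# (0.8x the speed) / (0.5x the price) = 1.6x the tokens/sec per dollar.
amd_perf_per_dollar_advantage = amd_rel_throughput * nvidia_rel_price
print(amd_perf_per_dollar_advantage)  # 1.6
```

So even at 20% lower raw throughput, the AMD card delivers roughly 60% more tokens/sec per dollar spent under these assumptions.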
Want more? Follow me on Twitter! @ricklamers