Will Modelfile be the new Dockerfile for OSS LLMs?
Week 30 of Coding with Intelligence
Ollama describes the Modelfile as "the blueprint to create and share models with Ollama." See https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md for details. Interestingly, the author previously created the Dockerfile spec and is now building Ollama (note: jmorganca is simply his GitHub handle, not a JP Morgan affiliation).
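To give a flavor of the format, a minimal Modelfile might look like this (the base model and parameter values are illustrative):

```
FROM llama2
PARAMETER temperature 0.7
SYSTEM """You are a concise coding assistant."""
```

Much like a Dockerfile layers on a base image, it starts from a base model and layers parameters and a system prompt on top.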
An npm tool released by the inventor of TypeScript. The core idea: use types in your prompt to specify the expected output format, then validate the model's output against that same type definition. Very neat! Who's building the Python version?
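Here's a minimal sketch of that idea in Python. Everything is hypothetical (the class, the prompt rendering, and the canned model reply standing in for a real LLM call) — it just illustrates the type-in-prompt, validate-on-output loop:

```python
import json
from typing import get_type_hints

class SentimentResponse:
    # The typed schema we both show to the model and validate against.
    sentiment: str
    confidence: float

def schema_prompt(cls) -> str:
    """Render the type as text so the model knows the expected output shape."""
    fields = ", ".join(f'"{name}": {tp.__name__}'
                       for name, tp in get_type_hints(cls).items())
    return f"Answer with a JSON object matching {{{fields}}}."

def validate(cls, raw: str) -> dict:
    """Parse the model's reply and check it against the type definition."""
    data = json.loads(raw)
    for name, tp in get_type_hints(cls).items():
        if name not in data or not isinstance(data[name], tp):
            raise TypeError(f"field {name!r} missing or not {tp.__name__}")
    return data

# Simulated model reply standing in for a real LLM call:
reply = '{"sentiment": "positive", "confidence": 0.93}'
print(schema_prompt(SentimentResponse))
print(validate(SentimentResponse, reply))
```

If validation fails, the error message can be fed back to the model for a repair attempt — that's the retry loop the npm tool automates.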
This is a really interesting approach to more guided language model sampling. CFGs are flexible enough to encode everything from full programming-language syntax like TypeScript or Python down to simple YAML and JSON. ggml on fire? Thanks to Evan Jones from Palo Alto Networks for contributing his work & ideas.
Interesting detail: it uses GGML under the hood.
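A toy sketch of the underlying mechanism: at each decoding step, mask out tokens the grammar forbids and sample only from the remainder. The "grammar" here is a trivial two-string language and the sampling is uniform rather than logit-weighted, purely for illustration:

```python
import random

# Grammar-constrained sampling sketch. The grammar accepts exactly "yes" or "no";
# a real implementation compiles a CFG into a parser state machine instead.
GRAMMAR = {"yes", "no"}

def allowed_next(prefix: str, vocab) -> list:
    """Tokens that keep the output a prefix of some string in the grammar."""
    return [t for t in vocab if any(s.startswith(prefix + t) for s in GRAMMAR)]

def constrained_sample(vocab, rng) -> str:
    out = ""
    while out not in GRAMMAR:
        choices = allowed_next(out, vocab)
        # A real implementation would weight choices by the model's logits;
        # we pick uniformly to keep the sketch self-contained.
        out += rng.choice(choices)
    return out

vocab = list("abcdefghijklmnopqrstuvwxyz")
print(constrained_sample(vocab, random.Random(0)))
```

The key property: invalid outputs are impossible by construction, rather than detected and retried after the fact.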
SimPer introduces a self-supervised representation learning technique that addresses the shortcomings of previous self-supervised approaches. Specifically, previous methods overlook the intrinsic periodicity (i.e., the ability to identify if a frame is part of a periodic process) in data and fail to learn robust representations that capture periodic or frequency attributes. As a self-supervised method it has the ability to train on large amounts of unlabeled data, opening up the possibility of learning complex patterns in real-world data.
There's no shortage of ideas for accelerating inference. This paper from CMU introduces a neat tree-based inference approach that combines multiple smaller models.
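The core loop behind such approaches can be sketched as plain speculative decoding (the paper generalizes this to a tree of drafts from multiple small models): a cheap draft model proposes a few tokens, the large model verifies them, and the longest agreed prefix is kept. Both "models" below are made-up stand-in functions:

```python
WORDS = "the quick brown fox jumps".split()

def draft_model(prefix, k=2):
    # Hypothetical small model: fast but sometimes wrong.
    guess = WORDS[len(prefix):len(prefix) + k]
    if len(prefix) == 2:
        guess = ["slow"] + guess[1:]  # inject a deliberate draft mistake
    return guess

def target_model(prefix):
    # Hypothetical large model: its verification step yields the true token.
    return WORDS[len(prefix)]

def speculative_decode(steps=5):
    out = []
    while len(out) < steps:
        for tok in draft_model(out):            # verify draft tokens in order
            if len(out) >= steps:
                break
            if target_model(out) == tok:        # accept matching draft tokens
                out.append(tok)
            else:                               # first mismatch: keep the
                out.append(target_model(out))   # target's token, re-draft
                break
    return " ".join(out)

print(speculative_decode())
```

The speedup comes from verifying several draft tokens per expensive target-model pass; the tree variant raises the acceptance rate by verifying multiple candidate branches at once.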
“To curb hallucination, on the teacher side, we use contrastive decoding, which ensures that the rationales generated for true assertions differ as much as possible from the rationales generated for false assertions.” Hopefully using techniques like these we can welcome a future where hallucinations in LLMs are a thing of the past. https://arxiv.org/abs/2305.01879
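The paper applies this on the teacher side during distillation; the generic contrastive-decoding scoring rule it builds on looks roughly like the following, with entirely made-up probability tables standing in for the two models:

```python
import math

# Contrastive decoding sketch: score each candidate token by the gap between
# an expert model's log-prob and an amateur model's, so generic tokens that
# both models favor are down-weighted.
expert = {"Paris": 0.6, "London": 0.25, "the": 0.15}
amateur = {"Paris": 0.2, "London": 0.2, "the": 0.6}

def contrastive_pick(expert_p, amateur_p, alpha=0.1):
    # Plausibility cutoff: only consider tokens the expert assigns at least
    # alpha times its own top probability.
    cutoff = alpha * max(expert_p.values())
    candidates = [t for t, p in expert_p.items() if p >= cutoff]
    return max(candidates,
               key=lambda t: math.log(expert_p[t]) - math.log(amateur_p[t]))

print(contrastive_pick(expert, amateur))
```

Here "Paris" wins because the expert is much more confident in it than the amateur, while "the" is penalized despite the amateur liking it most.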
Some of these are an eerily futuristic glimpse into what's to come for content production. Hollywood should probably pay attention to this Silicon powered wave.
You might be familiar with ONNX as a format for storing neural network weights and architecture. ONNX Runtime standardizes inference APIs and is able to dynamically adapt to an impressive list of environments. Currently handling over 1 trillion inferences a day, it is clearly a production-ready project.
Want more? Follow me on Twitter! @ricklamers