📰 News
Modelfile: a path to a "Docker Hub" for OSS models?
Ollama describes the Modelfile as "the blueprint to create and share models with Ollama." See https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md for details. Interestingly, the format is clearly modeled on the Dockerfile, and the project's creator (jmorganca on GitHub) previously worked at Docker.
TypeChat: all you need is types
An npm library released by the creator of TypeScript. The core idea is to put type definitions in your prompt to specify the output format you expect, and then validate the model's output against those same type definitions. Very neat! Who's building the Python version?
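For anyone picturing that Python version, here is a minimal sketch of what the idea could look like, assuming a hypothetical `complete(prompt)` function that calls whatever LLM you prefer; the type definition is rendered into the prompt and then used to validate the JSON that comes back. This is a hypothetical design, not an existing library.

```python
import json
from dataclasses import dataclass

# Hypothetical sketch of the TypeChat idea in Python: the type definition both
# documents the expected output format in the prompt and validates the reply.

@dataclass
class SentimentResponse:
    sentiment: str    # expected: "negative" | "neutral" | "positive"
    confidence: float

def build_prompt(user_text: str) -> str:
    schema = "\n".join(
        f'  "{name}": {tp.__name__}'
        for name, tp in SentimentResponse.__annotations__.items()
    )
    return (
        "Reply with a single JSON object matching this schema:\n"
        f"{{\n{schema}\n}}\n\n"
        f"Classify the sentiment of: {user_text!r}"
    )

def parse_response(raw: str) -> SentimentResponse:
    data = json.loads(raw)           # fails if the output isn't valid JSON
    obj = SentimentResponse(**data)  # fails on missing or unexpected keys
    if not isinstance(obj.sentiment, str) or not isinstance(obj.confidence, (int, float)):
        raise TypeError("model output does not match the declared types")
    return obj

# Usage, with `complete(prompt)` being your favorite LLM call:
# raw = complete(build_prompt("I love this newsletter!"))
# result = parse_response(raw)
```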
📦 Repos
Grammar-based sampling for Llama in ggml
This is a really interesting approach to more tightly guided language model sampling. Context-free grammars (CFGs) are flexible enough to encode everything from full programming-language syntax like TypeScript or Python down to simple YAML and JSON. ggml on fire? Thank you Evan Jones from Palo Alto Networks for contributing his work & ideas.
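To make the mechanism concrete, here is a toy Python sketch of the general idea, not the ggml implementation (which works with a GBNF grammar file and the real tokenizer): at each step the grammar determines which tokens are legal, everything else gets its probability masked to zero, and sampling proceeds as usual.

```python
import math
import random

# Toy illustration of grammar-constrained sampling (not the ggml code itself):
# the "grammar" here only allows the strings "yes" or "no", built character by
# character from a tiny vocabulary.

VOCAB = list("abcdefghijklmnopqrstuvwxyz")
ALLOWED = {"yes", "no"}

def legal_next_chars(prefix: str) -> set:
    """Characters that keep the prefix consistent with some allowed string."""
    return {s[len(prefix)] for s in ALLOWED
            if s.startswith(prefix) and len(s) > len(prefix)}

def sample_constrained(logits_fn, max_len: int = 3) -> str:
    out = ""
    for _ in range(max_len):
        legal = legal_next_chars(out)
        if not legal:            # prefix is already a complete allowed string
            break
        logits = logits_fn(out)  # one logit per vocab entry, from the LM
        # Mask every character the grammar forbids, then softmax + sample.
        masked = [l if c in legal else -math.inf
                  for c, l in zip(VOCAB, logits)]
        probs = [math.exp(l) for l in masked]
        total = sum(probs)
        out += random.choices(VOCAB, weights=[p / total for p in probs])[0]
    return out

# Even a dummy "model" with no preference at all produces grammatical output:
# print(sample_constrained(lambda prefix: [0.0] * len(VOCAB)))  # "yes" or "no"
```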
Ollama: Get up and running with large language models locally
Interesting detail: it uses GGML under the hood.
SimPer: Simple Self-Supervised Learning of Periodic Targets
SimPer introduces a self-supervised representation learning technique that addresses a shortcoming of previous self-supervised approaches: they overlook the intrinsic periodicity in data (i.e., whether a frame belongs to a repeating process) and fail to learn robust representations that capture periodic or frequency attributes. As a self-supervised method, it can be trained on large amounts of unlabeled data, opening up the possibility of learning complex periodic patterns in real-world data.
📄 Papers
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference
There's no shortage of ideas for accelerating inference. This paper from CMU introduces a neat tree-based approach: multiple smaller speculative models propose candidate continuations, which are organized into a token tree and verified by the large model in parallel.
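The full SpecInfer system verifies an entire tree of speculated tokens in one pass of the large model; the hedged Python sketch below only shows the simpler sequential, greedy variant of speculative decoding with toy stand-in models (`small_model`, `large_model` are hypothetical callables), to give a feel for the accept/reject core of the idea.

```python
# Minimal sketch of (greedy) speculative decoding: a small draft model proposes
# a few tokens cheaply, the large model checks them, and we keep the longest
# prefix on which both agree. SpecInfer extends this idea to a whole *tree* of
# draft sequences verified in parallel.

def speculative_step(prompt, small_model, large_model, k=4):
    # 1. Draft: the small model greedily proposes k tokens.
    draft = []
    ctx = list(prompt)
    for _ in range(k):
        tok = small_model(ctx)          # returns one next token
        draft.append(tok)
        ctx.append(tok)

    # 2. Verify: the large model checks each drafted position (in the real
    #    method this verification is a single batched forward pass).
    accepted = []
    ctx = list(prompt)
    for tok in draft:
        expected = large_model(ctx)     # large model's own greedy choice here
        if expected == tok:
            accepted.append(tok)        # agreement: keep the draft token
            ctx.append(tok)
        else:
            accepted.append(expected)   # disagreement: take the large model's
            break                       # token and stop accepting the draft
    return accepted

# Toy usage with character-level "models":
# small = lambda ctx: "a"
# large = lambda ctx: "a" if len(ctx) % 2 == 0 else "b"
# print(speculative_step(list("hi"), small, large))
```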
Amazon Research introduces SCOTT: Self-consistent chain-of-thought distillation
"To curb hallucination, on the teacher side, we use contrastive decoding, which ensures that the rationales generated for true assertions differ as much as possible from the rationales generated for false assertions." Hopefully, with techniques like these, we can move toward a future where hallucinations in LLMs are a thing of the past. https://arxiv.org/abs/2305.01879
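As a rough illustration of the quoted idea (my paraphrase in code, not the paper's exact formulation), contrastive decoding here amounts to scoring each candidate rationale token by how much more likely it is under the true answer than under a perturbed one; `logprob_fn` and the other names below are hypothetical.

```python
# Hedged sketch of the contrastive-decoding intuition behind SCOTT (not the
# paper's exact objective): prefer rationale tokens whose likelihood depends on
# the *correct* answer, i.e. tokens that become much less likely if the answer
# is swapped for a wrong one.

def contrastive_scores(logprob_fn, context, true_answer, false_answer, candidates):
    """logprob_fn(token, context, answer) -> log p(token | context, answer)."""
    scores = {}
    for tok in candidates:
        grounded = logprob_fn(tok, context, true_answer)
        ungrounded = logprob_fn(tok, context, false_answer)
        scores[tok] = grounded - ungrounded  # high = token is answer-grounded
    return scores

# Usage: greedily pick the next rationale token with the highest score.
# scores = contrastive_scores(lp, ctx, true_answer="yes", false_answer="no",
#                             candidates=vocab)
# next_tok = max(scores, key=scores.get)
```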
📱 Demos
Some of these are an eerily futuristic glimpse into what's to come for content production. Hollywood should probably pay attention to this silicon-powered wave.
📚 Resources
ONNX Runtime
You might be familiar with ONNX as a format for storing neural network weights and architectures. ONNX Runtime standardizes the inference API on top of that format and adapts to an impressive list of hardware and deployment environments. Currently handling over 1 trillion inferences a day, this is clearly a production-ready project.
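For a feel of how small the inference API surface is, here is a minimal Python example; it assumes you have some `model.onnx` on disk with a single image-like input, plus the `onnxruntime` and `numpy` packages installed.

```python
import numpy as np
import onnxruntime as ort

# Load an exported ONNX model; ONNX Runtime picks an execution provider
# (CPU by default, GPUs/accelerators if the matching packages and hardware
# are available).
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Inspect the graph's declared inputs to build the feed dictionary.
input_meta = session.get_inputs()[0]
print(input_meta.name, input_meta.shape, input_meta.type)

# Dummy input: adjust the shape/dtype to whatever your model expects.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Run inference; passing None returns every declared output.
outputs = session.run(None, {input_meta.name: x})
print(outputs[0].shape)
```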
Want more? Follow me on Twitter! @ricklamers