Amazon working on 2T AI model & Modular Deep Learning to the rescue for OS?
Week 46 of Coding with Intelligence
📰 News
LlamaIndex releases 'create-llama', the equivalent of 'create-react-app'
Amazon dedicates team to train 2T AI model codenamed 'Olympus'
📦 Repos
📄 Papers
An interesting stream of research: this paper provides a survey of modular architectures. "In this framework, units of computation are often implemented as autonomous parameter-efficient modules. Information is conditionally routed to a subset of modules and subsequently aggregated." Could be a great way for multiple smaller, specialized OS models to become powerful in union (for example, for Agent AI type workloads that depend on specialized knowledge throughout their action taking).
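To make the "route to a subset of modules, then aggregate" idea from the quote concrete, here is a minimal sketch in plain Python. It is not taken from the survey: the top-k selection, the softmax weighting, and the names `route_and_aggregate`, `modules`, and `router_scores` are all illustrative assumptions, and real systems would learn the router scores rather than hard-code them.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_and_aggregate(x, modules, router_scores, k=2):
    """Send x to the k highest-scoring modules and return the
    softmax-weighted sum of their outputs plus the chosen indices."""
    ranked = sorted(range(len(modules)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_scores[i] for i in chosen])
    output = sum(w * modules[i](x) for w, i in zip(weights, chosen))
    return output, chosen


# Hypothetical "specialist" modules; in practice these would be small models or adapters.
modules = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: -x]

# Hard-coded router scores for illustration; the third module is effectively ignored.
output, chosen = route_and_aggregate(3.0, modules, router_scores=[2.0, 2.0, -5.0], k=2)
print(output, chosen)
```

Only the selected modules run, which is why this kind of setup could let several small specialized models cooperate without paying the cost of all of them on every input.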
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
Better decontamination is key to trusting benchmark results. With training datasets getting larger and larger, and the training data itself not always being accessible, we need more sophisticated tools to discover contamination issues. Improved benchmarks help us understand whether models are actually improving when changes are made, in addition to aiding LLM selection. Great accompanying blog post: https://lmsys.org/blog/2023-11-14-llm-decontaminator/
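A toy illustration of why rephrased samples call for more sophisticated tools: a simple word n-gram overlap check flags a verbatim copy of a benchmark question but scores a paraphrase at zero. This is a sketch of the naive baseline only, not the method from the paper or the blog post, and the example sentences and the `ngram_overlap` helper are made up for illustration.

```python
def ngrams(text, n=3):
    """Return the set of lowercase word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(candidate, benchmark, n=3):
    """Fraction of the benchmark's n-grams that also appear in candidate."""
    reference = ngrams(benchmark, n)
    if not reference:
        return 0.0
    return len(reference & ngrams(candidate, n)) / len(reference)


# Made-up benchmark question and two hypothetical training samples.
benchmark = "what is the sum of the first ten positive integers"
verbatim = "what is the sum of the first ten positive integers"
rephrased = "add up every whole number from one through ten"

print(ngram_overlap(verbatim, benchmark))   # exact copy is caught
print(ngram_overlap(rephrased, benchmark))  # paraphrase slips through
```

The paraphrase asks the same question yet shares no trigram with the original, so a string-matching filter would let it into the training set unnoticed.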
Language Models can be Logical Solvers
Interesting idea of fine-tuning on traces generated by a symbolic (logic evaluation) system. LLMs might be more flexible than commonly believed.
📱 Demos
AudioSR - Audio Super-resolution: demo + paper
Very impressive, especially for muffled audio clips. This could be great for enhancing old content.
📚 Resources
OWASP LLM Top 10 security guidelines
You better check your prompt injections :)
Benchmarking GPT-4 Turbo - A Cautionary Tale
Shows that subtly depending on information stored in the LLM weights can cause performance drops, even though the underlying LLM might be no less competent at the task when given sufficient and unambiguous task information.
by Lilian Weng who works on AI safety at OpenAI
Scaling multimodal understanding to long videos by Google AI
At 3B, the model is significantly smaller than previous attempts. From the blog post: "At 3B parameters, Mirasol3B is compact compared to prior Flamingo (80B) and PaLI-X (55B) models. Finally, Mirasol3B outperforms the state-of-the-art approaches on video question answering (video QA), long video QA, and audio-video-text benchmarks."
Talk: SkyATC, Rethinking LLM Serving Stack (Berkeley, 10/20/2023)
Really interesting ideas, e.g. Conex, an alternative to Docker containers that is more suitable for the "fat" images needed for model serving.
Want more? Follow me on Twitter! @ricklamers