Perplexity raises $73.6M & new small models: LLaVA-ϕ, a 2.7B Phi-2-based multimodal model
Week 2 of Coding with Intelligence
📰 News
Perplexity raises $73.6M from IVP
Does Google Search have some real competition on its hands? Try Perplexity's RAG-powered search and judge for yourself: https://www.perplexity.ai/
📦 Repos
TACO: HumanEval alternative by BAAI
GPT-4 reportedly scores 31.5 pass@1 on the easy problems and just 2 pass@1 on the very hard ones. Work by Shandong Normal University and Peking University.
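For reference, pass@k in benchmarks like this is usually the unbiased estimator from the HumanEval paper. A minimal sketch (the sample counts below are purely illustrative):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n = samples generated, c = samples that pass the tests, k = budget."""
    if n - c < k:
        return 1.0
    # 1 - probability that all k drawn samples are incorrect
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative: 200 samples per problem, 63 pass the unit tests -> pass@1 of 0.315
print(pass_at_k(n=200, c=63, k=1))
```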
More details on 1.6T Switch Transformer from Google
Updated HF model page and source code: https://github.com/google-research/t5x
AutoAWQ: implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference
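A minimal quantization sketch using the AutoAWQ API (the base model path and quantization config are illustrative; check the repo README for current options):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"   # illustrative base model
quant_path = "mistral-7b-awq"

# Typical AWQ settings: 4-bit weights, group size 128
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)   # runs AWQ calibration
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```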
📄 Papers
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Bootstrapping better models from the model's own generations!
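The core loop, roughly: at each iteration the current model plays "opponent" by generating responses to the SFT prompts, and the next iterate is trained to prefer the human response over the self-generated one with a DPO-style objective. A hedged pseudocode sketch, not the authors' implementation (`preference_finetune` is a hypothetical helper):

```python
# Conceptual sketch of Self-Play Fine-Tuning (SPIN)
def spin(model, sft_dataset, num_iterations=3):
    for t in range(num_iterations):
        pairs = []
        for prompt, human_response in sft_dataset:
            # The frozen current model acts as the "opponent"
            synthetic_response = model.generate(prompt)
            # Human data is "chosen", self-generated data is "rejected"
            pairs.append((prompt, human_response, synthetic_response))
        # Train the next iterate to distinguish human from self-generated responses
        # via a DPO-like preference loss against the frozen previous model
        model = preference_finetune(model, pairs)
    return model
```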
DeepSeek LLM paper: Scaling Open-Source Language Models with Longtermism
Interesting section about tool-integrated reasoning performance on several benchmarks like GSM8K, MATH, MGSM-zh, and CMath.
Paper from the Mistral team about their Mixtral 8x7B model
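The headline design choice: each MoE layer has 8 expert FFNs and a router activates only the top 2 per token, so a fraction of the parameters is used per forward pass. A simplified sketch of that routing (shapes and details reduced; not the Mistral implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Simplified top-2-of-8 mixture-of-experts layer in the spirit of Mixtral."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: layer = SparseMoELayer(); y = layer(torch.randn(16, 512))
```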
LLaVA-ϕ: Efficient Multi-Modal Assistant with Small Language Model
A LLaVA-style model built on Phi-2, extra relevant given Microsoft's recent MIT licensing of Phi-2.
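The recipe is the usual LLaVA wiring with a smaller backbone: a vision encoder, a projector that maps image features into the LM embedding space, and Phi-2 as the language model. A rough conceptual sketch, with module choices and dimensions as assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class TinyLLaVA(nn.Module):
    """Conceptual LLaVA-style wiring: vision encoder -> projector -> small LM.
    vision_encoder and language_model stand in for e.g. a CLIP ViT and Phi-2."""

    def __init__(self, vision_encoder, language_model, vision_dim=1024, lm_dim=2560):
        super().__init__()
        self.vision_encoder = vision_encoder           # image-feature extractor
        self.projector = nn.Sequential(                # maps image features into LM space
            nn.Linear(vision_dim, lm_dim), nn.GELU(), nn.Linear(lm_dim, lm_dim)
        )
        self.language_model = language_model           # ~2.7B Phi-2-style decoder

    def forward(self, image, text_embeds):
        image_feats = self.vision_encoder(image)               # (B, patches, vision_dim)
        image_tokens = self.projector(image_feats)              # (B, patches, lm_dim)
        inputs = torch.cat([image_tokens, text_embeds], dim=1)  # prepend visual tokens
        return self.language_model(inputs_embeds=inputs)
```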
Tencent trains LLaMA Pro 8.3B: outperforms WizardCoder/WizardMath
Code & weights at https://github.com/TencentARC/LLaMA-Pro
Google AI releases Instruct-Imagen paper: instruction tuning for image generation
📱 Demos
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation
Remarkably useful for content creators 🙌🏻
📚 Resources
Chess-GPT's Internal World Model
Really cool interpretability work, probing for the world model/decision model of an LLM trained on chess games
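The probing setup is the classic one: collect hidden activations at each move and fit a small linear classifier to predict properties like board state or which side is winning; if the probe succeeds, that information is linearly encoded in the activations. A minimal sketch with scikit-learn (the activation and label arrays below are placeholders; collecting them from the chess model is elided):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: residual-stream activations per position and a binary label,
# e.g. "is there a white piece on e4?" (real collection from the model is elided).
activations = np.random.randn(10_000, 512)      # (positions, hidden_dim) -- placeholder
labels = np.random.randint(0, 2, size=10_000)   # (positions,)            -- placeholder

X_train, X_test, y_train, y_test = train_test_split(activations, labels, test_size=0.2)

probe = LogisticRegression(max_iter=1000)       # a linear probe: no hidden layers
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
# High accuracy on real activations would suggest the board state is linearly decodable.
```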
LLM Training and Inference with Intel Gaudi 2 AI Accelerators
Good post by Databricks with lots of interesting tables & figures.
Want more? Follow me on Twitter! @ricklamers