Anthropic's Claude 2: 4X cheaper than GPT-4-32k and up-to-date world knowledge
Week 28 of Coding with Intelligence
71.2% on HumanEval, a 100k-token context window, and an available API. This is starting to look competitive with GPT-4. Check Jim Fan’s take in this tweet.
Last week I opened the newsletter with replit-code-instruct-glaive's eye-popping HumanEval score of 63.5%. I warned "if there is no contamination", and alas, it turns out the model was overfitting: HumanEval had leaked into its training data. Now who's running the due diligence to verify there's no similar data leakage behind Anthropic's Claude 2 HumanEval score of 71.2% ...
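As a toy illustration of what that due diligence could look like (a hypothetical sketch, not how any of these models were actually audited), a naive contamination check flags training documents that share long token n-grams with benchmark prompts:

```python
# Hypothetical contamination check: any long n-gram shared between a
# training document and a benchmark prompt is a potential leak.
def ngrams(text: str, n: int = 8) -> set:
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated(train_doc: str, benchmark_prompt: str, n: int = 8) -> bool:
    # shared n-grams of n tokens suggest the prompt leaked into training data
    return bool(ngrams(train_doc, n) & ngrams(benchmark_prompt, n))

prompt = "def has_close_elements ( numbers , threshold ) :"
dump = "training dump snippet def has_close_elements ( numbers , threshold ) : for idx in range"
print(contaminated(dump, prompt, n=5))
```

Real audits need to be fuzzier than exact token matches (whitespace, renamed variables, paraphrased docstrings), but the principle is the same.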
They are hiring additional ML engineers for the effort and committing 20% of their compute resources. The team is headed by Ilya Sutskever.
OpenRouter: route LLM calls to whatever model you prefer, dynamically
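To build intuition for dynamic routing, here's a minimal sketch of the idea (model names and prices below are illustrative placeholders, not OpenRouter's actual API or rates): pick the cheapest model that clears a quality bar.

```python
# Illustrative model table -- names, prices, and quality scores are made up.
MODELS = {
    "gpt-4": {"cost_per_1k": 0.06, "quality": 0.95},
    "claude-2": {"cost_per_1k": 0.03, "quality": 0.90},
    "gpt-3.5-turbo": {"cost_per_1k": 0.002, "quality": 0.75},
}

def route(min_quality: float) -> str:
    """Pick the cheapest model meeting a quality threshold."""
    candidates = [(m["cost_per_1k"], name)
                  for name, m in MODELS.items() if m["quality"] >= min_quality]
    if not candidates:
        raise ValueError("no model meets the quality bar")
    return min(candidates)[1]

print(route(0.9))  # hard task: cheapest model with quality >= 0.9
print(route(0.7))  # easy task: cheapest model overall qualifies
```

A real router would also weigh latency, context-window size, and per-request availability.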
Check out their Streamlit app for intuition around the idea. Don’t forget to check the docs if you want to learn more.
Since GitHub Copilot Chat is in private preview, we need a smarter way to provide IDE context to ChatGPT's GPT-4 to discuss our project code. Unless you like copy-pasting a lot! Smolex uses a neat feature of ChatGPT plugin development: locally hosted plugins. It uses an AST-based indexer and a vector database for retrieval. One thing I dislike is the required roundtrip of ChatGPT plugins, which makes it notably slower than asking a direct question. Maybe a ChatGPT plugin design flaw? I can see how direct plugin execution could provide value through speedier prompt construction over LLM-based "tool invocation", although the latter is more general and flexible. Maybe when LLM inference no longer takes multiple seconds...
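For a feel of what an AST-based indexer involves (a minimal sketch in the spirit of Smolex, not its actual implementation), Python's `ast` module can map symbol names to their source snippets, ready to be embedded and stored in a vector database:

```python
import ast

# Toy input standing in for a real repo file.
SOURCE = '''
def greet(name):
    """Say hello."""
    return f"Hello, {name}!"

class Greeter:
    def shout(self, name):
        return greet(name).upper()
'''

def index_symbols(source: str) -> dict:
    """Map every function/class name to its exact source segment."""
    tree = ast.parse(source)
    symbols = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            symbols[node.name] = ast.get_source_segment(source, node)
    return symbols

index = index_symbols(SOURCE)
print(sorted(index))  # symbol names found in the snippet
```

Each snippet can then be embedded and retrieved by semantic similarity when the assistant needs context about a symbol.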
Choice is very welcome, but I’m not sure I’m happy with the DX so far: the docs default to C#, favor videos over code snippets, and it’s kind of all over the place. This planner example is interesting: https://learn.microsoft.com/en-us/semantic-kernel/get-started/quick-start-guide/using-the-planner
This repo is interesting to me because LLMs aren't magic, they're "just" models that have compressed the training data in a useful way. Typically deep learning models don't do well on "out-of-distribution" inputs. Can prompt optimization be done by aligning prompts with the data distribution the model was trained on? I think this is a very interesting area to explore further! Consider this a call for papers. Found anything? Please let me know.
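As a toy analogue of the idea (my own illustration, not the repo's method), you can score candidate prompts by their likelihood under a model of the training corpus; here a word-bigram model stands in for the LLM's learned distribution:

```python
from collections import Counter
import math

# A tiny stand-in "training corpus"; in practice you'd use the LLM itself
# (e.g. its perplexity on the prompt) rather than a bigram model.
CORPUS = "translate the following text to french . translate the text below".split()

bigrams = Counter(zip(CORPUS, CORPUS[1:]))
unigrams = Counter(CORPUS)

def log_likelihood(prompt: str) -> float:
    """Higher score = prompt looks more like the training distribution."""
    words = prompt.lower().split()
    score = 0.0
    for a, b in zip(words, words[1:]):
        # add-one smoothing: unseen bigrams are penalized, not impossible
        score += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + len(unigrams)))
    return score

in_dist = log_likelihood("translate the text")
out_dist = log_likelihood("kindly convert prose")
print(in_dist > out_dist)  # in-distribution phrasing scores higher
```

The hypothesis being: phrasings the model has "seen" more of should elicit better completions than equally-meaningful but out-of-distribution phrasings.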
Microsoft Research introduces the concept of dilated attention, achieving computational cost that scales linearly with input sequence length. An interesting scan for ideas for model developers, but benchmarks are limited and no trained model is introduced. The trade-off is invariably a less complete attention mechanism than the standard, hard-to-scale O(N^2) method, where every input token is correlated with every other input token. How good their heuristic is will have to be shown through meaningful (ahem, perplexity) performance benchmarks.
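The core sparsity idea can be sketched with indices alone (my simplified reading, not the paper's exact scheme): each query attends to a dense local window plus lookback positions whose gaps double with distance, so the total number of attended pairs grows far slower than N^2:

```python
def attended_keys(i: int, window: int = 4) -> list:
    """Positions query i attends to: dense local window + dilated lookback."""
    keys = set(range(max(0, i - window + 1), i + 1))  # local window
    step = window
    pos = i - window
    while pos >= 0:
        keys.add(pos)       # one representative position per dilated band
        pos -= step
        step *= 2           # dilation: gaps double as we look further back
    return sorted(keys)

n = 1024
total = sum(len(attended_keys(i)) for i in range(n))
print(total, "attended pairs vs dense", n * n)  # far fewer than n^2
```

Per query this is roughly `window + log2(n)` keys instead of `n`, which is where the (near-)linear scaling comes from; the open question the blurb raises is how much model quality that sparsification costs.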
This paper is likely an exploration of what GPT-4 did. The main result is that MoE can give better performance at a third of the FLOPs, which makes MoE mainly interesting from a compute-efficiency perspective. They don’t show data for an MoE model larger than 32B.
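For intuition, here's a toy top-1 mixture-of-experts layer (an illustrative sketch, not the paper's architecture): only one expert runs per input, so compute per token stays roughly constant while total parameters grow with the number of experts — hence the FLOPs savings.

```python
import math

# Toy "experts": in a real MoE these are full feed-forward sub-networks.
EXPERTS = [lambda x: 2 * x, lambda x: x + 10, lambda x: -x]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def gate(x):
    # hypothetical learned router; here a fixed linear scorer per expert
    weights = [0.5, -0.2, 0.1]
    return softmax([w * x for w in weights])

def moe_forward(x):
    probs = gate(x)
    k = max(range(len(EXPERTS)), key=probs.__getitem__)  # top-1 routing
    return probs[k] * EXPERTS[k](x)  # only one expert executes
```

With E experts you pay for one expert's FLOPs per token but carry E experts' worth of parameters — the compute/capacity trade the paper quantifies.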
The easiest way to run open-source GPT models on your local device.
Supports Chats and Completions. Explore the variety of “brushes”. A lot of interesting ideas. It makes heavy use of symbolic scanning of the repo and embeddings to make their assistant understand your code. Noteworthy is their support for multi-repo codebases.
This is a very useful tool for giving the model access to the relevant context in a much smarter way than copying and pasting. Until vector-based or other retrieval methods become smart enough, I don’t think a developer who knows their codebase and uses something like symbex can be beaten in terms of single-shot accuracy.
The first informative resource I've found on AI alignment. Sohl-Dickstein works at Google Brain and pioneered diffusion models, the technique behind Stable Diffusion. According to his own theory he's definitely very incoherent: read the article to understand what I mean :)
Want more? Follow me on Twitter! @ricklamers