8x faster inference with Flash Decoding - open-source models are getting faster
Week 42 of Coding with Intelligence
Flash Decoding speeds up inference on long sequences by up to 8x. It comes from Tri Dao, the same author who brought us Flash Attention.
Want more? Follow me on Twitter! @ricklamers