RetNet

From The Age of Machine Intelligence

Paper

https://arxiv.org/abs/2307.08621


Articles

https://medium.com/ai-fusion-labs/retentive-networks-retnet-explained-the-much-awaited-transformers-killer-is-here-6c17e3e8add8

This has really nice worked examples (using 2 vectors of embedding dimension 3). Take a look at these before you do anything else, and refer back if you get bogged down with algebra later.

https://medium.com/@choisehyun98/the-rise-of-rnn-review-of-retentive-network-a080a9a1ad1d

^ Much more compact, but still rigorous. Sehyun Choi has also written a GitHub implementation (syncdoth/RetNet, see below).

  • I'd watch Manish Gupta's vid before this one.


YouTube

45 mins https://www.youtube.com/watch?v=C6Hi5UkXJhs&ab_channel=DataScienceGems

This is by Manish Gupta https://sites.google.com/view/manishg

Very good maths breakdown. Starting here you probably won't grasp the big picture, but you'll be able to join some of the internal maths dots and grok the rest later.


https://www.youtube.com/watch?v=B_iGSeG04qo&ab_channel=GabrielMongaras

Maybe in light of the previous video, this will make some sense. This whole YT channel looks like a hidden gem.


https://www.youtube.com/watch?v=ec56a8wmfRk&t=9s&ab_channel=YannicKilcher

Need to watch this again now that I've watched the other two.


GitHub

https://github.com/microsoft/unilm/tree/master/retnet -- official Microsoft repo (⭐️15k, but it's an umbrella repo covering many architectures)

https://github.com/Jamie-Stirling/RetNet (⭐️750 ~4 days) -- this seems to have the most buzz (judging by Google searches)

https://github.com/syncdoth/RetNet (⭐️90 ~2weeks)

https://github.com/fkodom/yet-another-retnet (⭐️43 ~2 months) -- good README.md; says to use the PARALLEL formulation for training and the RECURRENT formulation for inference.

https://github.com/microsoft/torchscale/blob/main/torchscale/architecture/retnet.py -- another umbrella repo; RetNet is a single retnet.py among many other architectures.
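The parallel-vs-recurrent point from fkodom's README can be checked numerically. Below is a minimal NumPy sketch (my own, not taken from any of the repos above) of the two equivalent retention formulations from the paper: the parallel form `O = (QK^T ⊙ D)V` with the causal decay mask `D[n,m] = γ^(n-m)` for `n ≥ m`, and the recurrent form `S_n = γ S_{n-1} + K_n^T V_n`, `O_n = Q_n S_n`. It deliberately omits the paper's xPos rotation, group normalisation, and multi-scale heads, so take it as an illustration of the core identity only.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 3          # toy sizes: sequence length 4, head dimension 3
gamma = 0.9          # per-head decay factor

Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))

# Parallel form (used for training): O = (Q K^T ⊙ D) V,
# where D[n, m] = gamma^(n - m) if n >= m, else 0 (causal decay mask).
n = np.arange(T)[:, None]
m = np.arange(T)[None, :]
D = np.where(n >= m, gamma ** (n - m), 0.0)
O_parallel = (Q @ K.T * D) @ V

# Recurrent form (used for inference): carry a single d x d state S,
#   S_t = gamma * S_{t-1} + K_t^T V_t
#   O_t = Q_t S_t
S = np.zeros((d, d))
O_recurrent = np.zeros((T, d))
for t in range(T):
    S = gamma * S + np.outer(K[t], V[t])
    O_recurrent[t] = Q[t] @ S

# The two formulations produce identical outputs.
assert np.allclose(O_parallel, O_recurrent)
```

Unrolling the recurrence gives `S_n = Σ_{m≤n} γ^(n-m) K_m^T V_m`, so `Q_n S_n` reproduces exactly the masked-attention sum of the parallel form; that's why you can train with the GPU-friendly matrix version and serve with the O(1)-memory recurrent version.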