RetNet
Paper
https://arxiv.org/abs/2307.08621
Articles
This has really nice worked examples (using 2 vectors of embedding dimension 3). Take a look at these before you do anything else, and refer back if you get bogged down with algebra later.
https://medium.com/@choisehyun98/the-rise-of-rnn-review-of-retentive-network-a080a9a1ad1d
^ Much more compact, but rigorous. Sehyun Choi has also written a GitHub implementation (syncdoth/RetNet, see below).
- I'd watch Manish Gupta's vid before this one.
YouTube
45 mins https://www.youtube.com/watch?v=C6Hi5UkXJhs&ab_channel=DataScienceGems
This is by Manish Gupta https://sites.google.com/view/manishg
Very good maths breakdown. If you start here you probably won't get the big picture, but you'll be able to join some of the internal maths dots and grok the rest later.
https://www.youtube.com/watch?v=B_iGSeG04qo&ab_channel=GabrielMongaras
Maybe in light of the previous video, this will make some sense. This whole YT channel looks like a hidden gem.
https://www.youtube.com/watch?v=ec56a8wmfRk&t=9s&ab_channel=YannicKilcher
Need to watch this again now that I've watched the other two.
GitHub
https://github.com/microsoft/unilm/tree/master/retnet -- official MS repo (⭐️15k but it's an umbrella repo with many archs)
https://github.com/Jamie-Stirling/RetNet (⭐️750 ~4days) -- this seems to have the most buzz, judging by Google search results
https://github.com/syncdoth/RetNet (⭐️90 ~2weeks)
https://github.com/fkodom/yet-another-retnet (⭐️43 ~2months) -- good README.md, says you use the PARALLEL formulation for training and the RECURRENT formulation for inference.
https://github.com/microsoft/torchscale/blob/main/torchscale/architecture/retnet.py -- Another umbrella repo, seems to be a single retnet.py among many other archs
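To make the PARALLEL vs RECURRENT point from the yet-another-retnet README concrete, here is a minimal sketch of single-head retention in plain Python, using 2 vectors of dimension 3 like the worked-examples article. It is not taken from any of the repos above: the function names are my own, and it deliberately leaves out the rotation (xPos), scaling, group norm, gating and multi-scale decay from the paper. It only shows that the two formulations give the same output.

```python
# Simplified single-head retention (assumption: no xPos rotation, no scaling,
# no group norm, no gating). Q, K, V are lists of row vectors, one per token.

def retention_parallel(Q, K, V, gamma):
    """Training-style form: O = (Q K^T * D) V, with D[n][m] = gamma^(n-m) for n >= m."""
    d_v = len(V[0])
    out = []
    for n in range(len(Q)):
        row = [0.0] * d_v
        for m in range(n + 1):  # causal mask: token n only sees tokens m <= n
            score = sum(q * k for q, k in zip(Q[n], K[m])) * gamma ** (n - m)
            for c in range(d_v):
                row[c] += score * V[m][c]
        out.append(row)
    return out


def retention_recurrent(Q, K, V, gamma):
    """Inference-style form: S_n = gamma * S_{n-1} + K_n^T V_n, then O_n = Q_n S_n."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # fixed-size state, independent of sequence length
    out = []
    for q, k, v in zip(Q, K, V):
        S = [[gamma * S[a][b] + k[a] * v[b] for b in range(d_v)] for a in range(d_k)]
        out.append([sum(q[a] * S[a][b] for a in range(d_k)) for b in range(d_v)])
    return out


# Toy inputs (made up for illustration): 2 tokens, dimension 3.
Q = [[1, 0, 2], [0, 1, 1]]
K = [[1, 1, 0], [2, 0, 1]]
V = [[1, 2, 3], [4, 5, 6]]
gamma = 0.5

parallel_out = retention_parallel(Q, K, V, gamma)
recurrent_out = retention_recurrent(Q, K, V, gamma)
print(parallel_out)   # [[1.0, 2.0, 3.0], [4.5, 6.0, 7.5]]
print(recurrent_out)  # [[1.0, 2.0, 3.0], [4.5, 6.0, 7.5]]
```

The parallel form handles every token pair in one pass, which suits training on full sequences. The recurrent form carries only the d_k x d_v state S from token to token, which is why it is the one used for inference.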