This has really nice worked examples (using 2 vectors of embedding dimension 3). Take a look at these before you do anything else, and refer back if you get bogged down with algebra later.
^ Much more compact, but rigorous. Sehyun Choi has actually made a GitHub impl (syncdoth/RetNet, see below)
- I'd watch Manish Gupta's vid before this one.
This is by Manish Gupta https://sites.google.com/view/manishg
Very good maths breakdown. If you start here you probably won't get the big picture, but you'll be able to join up some of the internal math dots and grok the rest later.
Maybe in light of the previous video, this will make some sense. This whole YT channel looks like a hidden gem.
Need to watch this again now I've watched the other two.
https://github.com/microsoft/unilm/tree/master/retnet -- official MS repo (⭐️15k but it's an umbrella repo with many archs)
https://github.com/Jamie-Stirling/RetNet (⭐️750 ~4days) -- this seems to have the most buzz (going by Google search results)
https://github.com/syncdoth/RetNet (⭐️90 ~2weeks)
https://github.com/fkodom/yet-another-retnet (⭐️43 ~2months) -- good README.md, says you use the PARALLEL formulation for training and the RECURRENT formulation for inference.
https://github.com/microsoft/torchscale/blob/main/torchscale/architecture/retnet.py -- another umbrella repo; RetNet seems to be a single retnet.py among many other archs
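
To make the PARALLEL vs RECURRENT point from the yet-another-retnet README concrete, here is a minimal NumPy sketch. It is my own toy version, not code from any of the repos above. It assumes a single head, real-valued q/k/v and one scalar decay `gamma`, and it leaves out everything else a real retention layer has (position rotation, scaling, normalization, gating, multi-head). The function names are made up for illustration.

```python
import numpy as np

def retention_parallel(q, k, v, gamma):
    """Parallel form: every position at once, using a causal decay mask."""
    n = q.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]  # diff[i, j] = i - j
    # decay[i, j] = gamma^(i - j) when j <= i, else 0 (no looking ahead)
    decay = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)
    return ((q @ k.T) * decay) @ v

def retention_recurrent(q, k, v, gamma):
    """Recurrent form: one token at a time, carrying a (d_k, d_v) state."""
    state = np.zeros((k.shape[1], v.shape[1]))
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        # Decay the old state, then add this token's outer product k_n^T v_n
        state = gamma * state + np.outer(k_n, v_n)
        outputs.append(q_n @ state)
    return np.stack(outputs)

# Toy input (assumed sizes): 5 tokens, embedding dimension 3
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((5, 3)) for _ in range(3))

out_par = retention_parallel(q, k, v, gamma=0.9)
out_rec = retention_recurrent(q, k, v, gamma=0.9)
print(np.allclose(out_par, out_rec))
```

The two outputs should agree up to floating-point rounding, which is the whole trick: the parallel form does one big masked matmul over the full sequence (good for training), while the recurrent form only ever needs the previous state and the current token (good for token-by-token inference).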