We used XNMT as our sequence-to-sequence toolkit. The code for the pure and interleaved models is located in xnmt/specialized_encoders/self_attentional_am.py.