Tree Transformers: A step towards generalizing the transformer architecture
After a billion architectures and a trillion variations, I finally found a transformer architecture that intrigued me. And this essay is step one towards the theory and the sub-field on which it was built [1].

I.

The representation of any dataset that we wish to approximate/predict/learn the distribution of, has converge