Tree Transformers: A step towards generalizing the transformer architecture

After a billion architectures and a trillion variations, I finally found a transformer architecture that intrigued me. And this essay is step one towards the theory and the sub-field on which it was built [1].

I.

The representation of any dataset that we wish to approximate/predict/learn the distribution of, has converge