Folding networks, a generalisation of recurrent neural networks to tree structured inputs, are investigated as a mechanism to learn regularities on classical symbolic data, for example. The architecture, the training mechanism, and several applications in different areas are explained. Afterwards a theoretical foundation, proving that the approach is appropriate as a learning mechanism in principle, is presented: Their universal approximation ability is investigated- including several new results for standard recurrent neural networks such as explicit bounds on the required number of neurons and the super Turing capability of sigmoidal recurrent networks. The information theoretical learnability is examined - including several contribution to distribution dependent learnability, an answer to an open question posed by Vidyasagar, and a generalisation of the recent luckiness framework to function classes. Finally, the complexity of training is considered - including new results on the loading problem for standard feedforward networks with an arbitrary multilayered architecture, a correlated number of neurons and training set size, a varying number of hidden neurons but fixed input dimension, or the sigmoidal activation function, respectively.