The recent popularization of machine learning as a new paradigm in computer science provides interesting opportunities for explaining phenomena of collective motion in living systems, such as flocks of birds or schools of fish. In this thesis we develop a model for collective motion using multi-agent reinforcement learning with orientation-based rewards, a new type of reward system that has not yet appeared in the literature. While the developed model is in principle applicable to all forms of collective motion observed in nature, we use the flocking behaviour of birds as a particular example to frame our model. The birds have the option to either fly in an instinctive direction or act based on a Vicsek-type interaction with their neighbours, and are rewarded maximally when the resulting direction of movement is some predetermined preferred direction. The model distinguishes between leaders, which instinctively move in this direction, and followers, which do not. We show that collective motion in this preferred direction emerges from this model, but only with a minimum of 1.23 encounters with neighbours per timestep on average, of which a minimal fraction of 0.2 should be leaders; on average this roughly corresponds to at least one encounter with a leader every four timesteps. These lower bounds are rudimentary estimates, as the present study serves mainly as a proof of concept that collective motion can emerge from this new type of model. Additionally, it is suggested that, using deep reinforcement learning, this model can be viewed as a reinforcement learning extension of the Vicsek model.
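
The "one leader encounter every four timesteps" figure follows from the two lower bounds by a short calculation. The sketch below assumes that the 1.23 neighbour encounters are counted per timestep; the symbols n_enc, f_lead and r_lead are introduced here only for illustration and are not taken from the thesis.

```latex
\[
  r_{\text{lead}} = n_{\text{enc}} \, f_{\text{lead}}
                  = 1.23 \times 0.2
                  = 0.246 \ \text{leader encounters per timestep},
  \qquad
  \frac{1}{r_{\text{lead}}} = \frac{1}{0.246} \approx 4.07 \ \text{timesteps per leader encounter}.
\]
```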