Humans use inferred statistical properties of sequential events to smooth subsequent actions through anticipatory movements. These anticipatory movements have been studied in the serial reaction time (SRT) task, in which participants anticipate the target stimuli in learned sequences. Under uncertainty, however, participants seem to adhere to a centering strategy. It remains unclear whether this centering behavior is a statistically inferred way to compensate for the absence of sequence knowledge, using the center as an optimal anticipatory position. In this study, two state-of-the-art Deep Reinforcement Learning (Deep RL) algorithms, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), are compared and used to train artificial agents in order to investigate the scope of centering behavior by manipulating the frequency distributions of target stimuli. While SAC clearly outperformed PPO in performance and stability, both algorithms displayed an effect of frequency distribution on centering position: a proportional shift toward more probable target stimuli. This suggests that centering behavior is indeed anticipatory behavior that compensates for the absence of explicit sequence knowledge.
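The idea of the center as an optimal anticipatory position, and of a proportional shift toward more probable targets, can be illustrated with a small calculation. The sketch below is not the Deep RL setup used in the study. It assumes a hypothetical layout of four targets at the corners of a unit square and a squared-distance movement cost, under which the best waiting position is the probability-weighted mean of the target locations. The names `TARGETS` and `optimal_anticipatory_position` are illustrative only.

```python
import numpy as np

# Hypothetical layout (an assumption, not taken from the study):
# four targets at the corners of a unit square.
TARGETS = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])


def optimal_anticipatory_position(probs):
    """Return the position minimizing expected squared distance to the next target.

    Under a squared-distance cost (an assumption), the minimizer is the
    probability-weighted mean of the target coordinates.
    """
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()  # normalize so the weights sum to 1
    return probs @ TARGETS


# Uniform target frequencies: the optimum is the geometric center.
uniform = optimal_anticipatory_position([0.25, 0.25, 0.25, 0.25])

# Skewed frequencies: the optimum moves toward the most probable target.
skewed = optimal_anticipatory_position([0.10, 0.10, 0.10, 0.70])

print("uniform:", uniform)
print("skewed: ", skewed)
```

With equal target frequencies the result is the center of the square. With 70% of the probability on the corner at (1, 1), the result moves to approximately (0.8, 0.8), a shift in proportion to the probability mass. A different cost, such as plain Euclidean distance, would give a different optimum, so this is only one way to formalize the claim.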