This project employs reinforcement learning (RL) techniques to explore novel decoding strategies for quantum error correction, focusing on the toric code, to address the challenge of maintaining stable quantum states for fault-tolerant quantum computing. Two game frameworks are established, including a novel dynamic game framework that can be used to train and evaluate RL agents and that has potential applications in multi-agent scenarios. The RL agents use Stable Baselines 3's Proximal Policy Optimization and are shown to reach the performance of Minimum Weight Perfect Matching on 3 × 3 toric code lattices in both the static and dynamic game frameworks.
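As a rough illustration of what a decoder on a 3 × 3 toric code lattice has to work with, the sketch below computes the plaquette syndrome produced by a set of single-qubit errors on the lattice edges. It is not taken from the project: the edge indexing `(r, c, d)` and the function name `plaquette_syndrome` are assumptions made for this example, and only one error type with its matching plaquette checks is modelled.

```python
L = 3  # lattice size mentioned in the abstract (3 x 3 toric code)


def plaquette_syndrome(errors, size=L):
    """Return the set of plaquettes (r, c) that see an odd number of errors.

    Hypothetical indexing, chosen for this sketch only:
    an edge is (r, c, d), where d = 0 is the horizontal edge leaving
    vertex (r, c) and d = 1 is the vertical edge leaving vertex (r, c).
    All indices wrap around, which gives the lattice its torus topology.
    """
    defects = set()
    for r, c, d in errors:
        if d == 0:
            # A horizontal edge borders the plaquette below it and the one above it.
            touched = [(r, c), ((r - 1) % size, c)]
        else:
            # A vertical edge borders the plaquette to its right and the one to its left.
            touched = [(r, c), (r, (c - 1) % size)]
        for plaquette in touched:
            defects ^= {plaquette}  # toggle: two errors on one plaquette cancel
    return defects


single = plaquette_syndrome({(0, 0, 0)})
loop = plaquette_syndrome({(0, 0, 0), (1, 0, 0), (2, 0, 0)})
```

In this sketch a single error lights up exactly two plaquettes, while the three errors in `loop` wrap around the torus and leave an empty syndrome even though errors are present. A decoder, whether an RL agent or a matching algorithm, only ever sees the defects, not the errors themselves.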
Humans use inferred statistical properties of sequential events to smooth subsequent actions through anticipatory movements. These anticipatory movements have been studied in the serial reaction time (SRT) task, in which participants anticipate the target stimuli in learned sequences. Under uncertainty, however, participants seem to adhere to a centering strategy. It remains unclear whether this centering behavior is a statistically inferred way to compensate for the absence of sequence knowledge, using the center as an optimal anticipatory position. In this study, two state-of-the-art Deep Reinforcement Learning (Deep RL) algorithms, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), are compared and used to train artificial agents in order to investigate the scope of centering behavior by manipulating the frequency distributions of target stimuli. While SAC clearly outperformed PPO in terms of performance and stability, both algorithms displayed an effect of frequency distribution on centering position: a proportional shift toward more probable target stimuli. This suggests that centering behavior is indeed anticipatory behavior that compensates for the absence of explicit sequence knowledge.
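The idea of the center as an optimal anticipatory position can be made concrete with a small calculation. The sketch below is an assumption-laden illustration, not the study's task or reward: it places hypothetical targets on a line, assumes the cost of a movement is the squared distance to the next target, and uses made-up frequency counts. Under that assumed cost, the best waiting position is the frequency-weighted mean of the target positions.

```python
def best_anticipatory_position(positions, counts):
    """Waiting position that minimizes the expected squared distance to the next target.

    `positions` are hypothetical 1-D target locations and `counts` are how often
    each target appears. For a squared-distance cost (an assumption of this sketch),
    the minimizer is the frequency-weighted mean of the positions.
    """
    total = sum(counts)
    return sum(x * n for x, n in zip(positions, counts)) / total


def expected_cost(start, positions, counts):
    """Expected squared distance from `start` to the next target."""
    total = sum(counts)
    return sum(n * (x - start) ** 2 for x, n in zip(positions, counts)) / total


targets = [0.0, 1.0, 2.0, 3.0]   # hypothetical target positions
uniform = [1, 1, 1, 1]           # every target equally frequent
skewed = [1, 1, 1, 7]            # the last target appears far more often

center = best_anticipatory_position(targets, uniform)
shifted = best_anticipatory_position(targets, skewed)
```

With equally frequent targets the best position is the geometric center of the targets. With the skewed counts it moves toward the more frequent target, which is the kind of proportional shift the abstract reports for the trained agents.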