This thesis research aims to improve traffic sign detection in dashcam footage by using temporal information. Essentially, a video is a sequence of images displayed at a fast rate; temporal information lies in the similarity across subsequent frames. However, current state-of-the-art object detection frameworks use only single images. To test whether temporal information can increase the performance of a Convolutional Neural Network (CNN), we train three models: YoloV5, a 3D CNN, and a 4D CNN. First, YoloV5 is used to benchmark the other models against a state-of-the-art object detection framework. Second, the existing YoloV5 architecture is adopted as a basis for the 3D CNN; after tuning its hyperparameters, the 3D CNN's performance is compared to that of YoloV5. Third, the 3D CNN is extended into a 4D CNN that processes sets of frames. By combining the frames within a set, the information in each frame is fused together, including the temporal information across frames; we call this temporal information fusion (TIF). Comparing the performance of the 3D CNN to that of the 4D CNN shows the effect of TIF. A balanced dataset of 444 sets of frames containing traffic signs from dashcam videos is used to train and test the models; the objective is to correctly classify the traffic signs in the frames. The results show that TIF alone can increase the accuracy of a CNN model by 2%. The main drawback of TIF is an increase in processing time: instead of a single image, the network must process a set of images, which naturally takes longer. These results can form a basis for further exploration of TIF in object detection.
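The core idea of grouping consecutive frames into one temporally ordered tensor can be sketched in a few lines of NumPy. This is only a minimal illustration, not the thesis's actual architecture: the shapes are invented, and the averaging step is a hypothetical stand-in for the learned fusion performed inside the 4D CNN.

```python
import numpy as np

def stack_frames(frames):
    """Stack a set of single frames (H, W, C) into one tensor (T, H, W, C).

    A single-image detector sees each frame independently; adding the
    temporal axis T lets a network also mix information across frames.
    """
    return np.stack(frames, axis=0)

def temporal_fusion(clip):
    """Toy stand-in for TIF: collapse the temporal axis by averaging.

    In the thesis, fusion happens inside the 4D CNN; a mean over T is
    just the simplest way to show T frames being combined into one.
    """
    return clip.mean(axis=0)

# Three dummy 64x64 RGB frames standing in for consecutive dashcam frames.
frames = [np.random.rand(64, 64, 3).astype(np.float32) for _ in range(3)]
clip = stack_frames(frames)    # shape (3, 64, 64, 3)
fused = temporal_fusion(clip)  # shape (64, 64, 3)
```

Because subsequent dashcam frames are highly similar, even this naive averaging preserves a stable traffic sign while smoothing frame-specific noise, which hints at why fusing a set of frames can help classification.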