Due to recent advances within the field of machine learning and computing power becoming more readily available, the use of machine learning within psychology has increased. However, potential remains for greater use of machine learning within the field. In this study, the usability and performance of three machine learning models, namely the k-nearest neighbors, random forest, and support vector machine algorithms, were assessed when predicting gender, marital status, and family size from Big 5 personality measures and the Holland Code Career Test. Repeated cross-validation was combined with grid search to ensure the accuracy of the performance measures and to optimize model accuracy and F1-score. The performance of the three models was compared to that of logistic regression to assess whether these models could outperform a model regularly used within psychology. The three models consistently outperformed logistic regression under almost all conditions and proved far superior for group sizes over 500, outperforming logistic regression by as much as 10 percentage points under some conditions. However, caution is advised due to wide confidence intervals for small group sizes (n ≤ 200). Therefore, a follow-up study was proposed with the aim of enhancing predictions for small group sizes, focusing on feedforward neural networks, which are known to capture complex relationships even with limited data. Addressing these aspects could improve the usability of machine learning in psychology settings involving small group sizes.
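As an illustration of the approach described above, the following is a minimal sketch (not the thesis code) of how such a comparison can be set up in scikit-learn: repeated stratified cross-validation combined with a per-model grid search, scored on F1. The synthetic data, hyperparameter grids, and fold counts are illustrative assumptions only.

```python
# Illustrative sketch: comparing KNN, random forest and SVM against logistic
# regression using repeated cross-validation and grid search, as in the abstract.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

# Stand-in data: 11 features as a placeholder for Big 5 + Holland Code scores
X, y = make_classification(n_samples=500, n_features=11, random_state=0)
inner_cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)

models = {
    "knn": (KNeighborsClassifier(), {"kneighborsclassifier__n_neighbors": [3, 5, 11, 21]}),
    "random forest": (RandomForestClassifier(random_state=0), {"randomforestclassifier__max_depth": [None, 5, 10]}),
    "svm": (SVC(), {"svc__C": [0.1, 1, 10]}),
    "logistic regression": (LogisticRegression(max_iter=1000), {}),
}

for name, (clf, grid) in models.items():
    pipe = make_pipeline(StandardScaler(), clf)
    search = GridSearchCV(pipe, grid, scoring="f1_macro", cv=inner_cv)
    # Outer cross-validation guards against optimistic performance estimates
    scores = cross_val_score(search, X, y, scoring="f1_macro", cv=5)
    print(f"{name}: mean F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```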
There is a demand for high-bandwidth downlinks from space to Earth. A CubeSat in GEO could function as a relatively cheap access point to a high-bandwidth communication channel with Earth. This thesis explores new ways to increase the bandwidth by identifying bottlenecks in the GEO-Earth communication channel and how to circumvent them. The diffraction limit causes large beam spreading at GEO distance, holding back advanced modulation techniques due to the inability to capture the whole wavefront. In this case, a modulation scheme using only a few bits should be chosen, allowing modulation to be as fast as possible. The low signal intensity can be detected with greater sensitivity by making use of a quantum-enhanced receiver. From GEO to Earth, data rates around 50 Gbps are possible. Additionally, the atmosphere introduces spatial incoherence. To mitigate the effects of the atmosphere, a modulation scheme should be chosen that exploits modulation vectors orthogonal to the spatial dimension, such as polarization or wavelength. This gives a modulation scheme with many degrees of freedom. To deal with the complexity, a variational auto-encoder deep neural network is used to act as the modulator and demodulator. The variational distribution is chosen to match the noise introduced by an atmospheric channel. Using this scheme, we were able to find encodings that increase the density of symbols in phase space relative to the noise. This approach is especially promising in a bandwidth-limited channel.
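As a rough illustration of the end-to-end learned modulation idea, the sketch below trains a small encoder/decoder pair over a toy additive-Gaussian channel. It is a simplified, non-variational stand-in: the thesis's variational auto-encoder, atmospheric noise model, and polarization/wavelength degrees of freedom are not reproduced here, and the message size and noise level are arbitrary assumptions.

```python
# Toy end-to-end learned modulator/demodulator; NOT the thesis's variational scheme.
import torch
import torch.nn as nn

M = 16            # number of distinct symbols (4 bits per channel use), assumed
noise_std = 0.1   # stand-in channel noise level, assumed

encoder = nn.Sequential(nn.Linear(M, 32), nn.ReLU(), nn.Linear(32, 2))   # message -> I/Q point
decoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, M))   # noisy point -> message logits
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    msgs = torch.randint(0, M, (256,))
    x = nn.functional.one_hot(msgs, M).float()
    sym = encoder(x)
    sym = sym / sym.norm(dim=1, keepdim=True).clamp(min=1e-9)   # unit energy per symbol
    y = sym + noise_std * torch.randn_like(sym)                 # simple Gaussian channel noise
    loss = loss_fn(decoder(y), msgs)
    opt.zero_grad(); loss.backward(); opt.step()
```

Trained this way, the encoder learns a symbol constellation adapted to the assumed channel noise, which is the intuition behind letting a learned model act as modulator and demodulator.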
The application of denoising machine learning to STM data has several advantages, such as improving data quality, aiding visual interpretation of the data, and speeding up measurement time. With experimental data, the absence of a ground truth poses a problem for traditional supervised learning techniques. In this work, state-of-the-art self-supervised machine learning techniques are applied to reduce noise in quasiparticle interference data of overdoped cuprates, using only the noisy measurements. The machine learning methods are shown to outperform traditional denoising methods. Further ideas to improve and generalize the denoising of quasiparticle interference data are proposed.
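One family of methods that fits the "only noisy measurements" setting is masked self-supervised training in the style of Noise2Self. The sketch below shows the core training loop with a toy network and placeholder data; it is a hedged illustration of the general idea, not the model or data used in this work.

```python
# Noise2Self-style masked training: the network only ever scores pixels it could not see.
import torch
import torch.nn as nn

net = nn.Sequential(                      # toy single-channel denoising CNN
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

noisy = torch.randn(8, 1, 64, 64)          # placeholder for noisy QPI maps

for step in range(1000):
    mask = (torch.rand_like(noisy) < 0.05).float()      # randomly hide ~5% of pixels
    inp = noisy * (1 - mask) + mask * noisy.mean()      # replace hidden pixels with a constant
    pred = net(inp)
    loss = ((pred - noisy) ** 2 * mask).sum() / mask.sum()   # loss only on the hidden pixels
    opt.zero_grad(); loss.backward(); opt.step()
```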
Machine learning algorithms are frequently deployed for predictive classification problems. However, as implied by the No Free Lunch (NFL) theorem, not every algorithm is destined to perform well on a given data set. One is often interested in whether the predictive capacity of an algorithm is significantly better than chance level (better than chance) for a given choice of the hyperparameter(s). Machine learning algorithms generally lack the intrinsic statistical framework of statistical learning algorithms for making statements about better-than-chance performance. In supervised binary classification problems, multiple methods have been proposed to do so, but these have been shown to be flawed. Arguably, this can be attributed to the idea that the NFL theorem also applies to these better-than-chance methods, or in general to any test, suggesting that the performance of the test depends on the type of signal. Therefore, in the current project, we propose novel global test (GT) based tests that are in accordance with the signal detected by their respective learning algorithms. To do so, we reformulated two popular machine learning algorithms, k-nearest neighbors (kNN) and random forest, as empirical Bayesian linear models. It turned out that we can construct tests not only for specific (combinations of) hyperparameters but also for sets of hyperparameters. The properties of these tests were explored in simulated linearly and nonlinearly separable alternatives as well as in real-world data. Results from our simulation studies indicated that our novel tests had competitive power characteristics compared to existing methods. Moreover, we demonstrated their applicability to real-world data. Our findings indicated that our novel tests for kNN and random forest can be readily used to assess better-than-chance performance. Equally important, the exploited GT framework can be applied to construct tests for other learning algorithms. Ultimately, our tests and possible future GT-based tests add to the list of existing methods that each serve a niche in the detection of a better-than-chance signal for a learning algorithm.
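To make the better-than-chance question concrete, the snippet below shows a standard permutation-test baseline using scikit-learn. This is explicitly not one of the GT-based tests proposed in this work; the kNN settings and simulated data are placeholders, and the sketch is included only to illustrate the kind of question these tests answer.

```python
# Standard permutation test for better-than-chance accuracy (baseline illustration only).
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import permutation_test_score

X, y = make_classification(n_samples=100, n_features=10, random_state=1)
score, perm_scores, p_value = permutation_test_score(
    KNeighborsClassifier(n_neighbors=5), X, y,
    scoring="accuracy", cv=5, n_permutations=200, random_state=1,
)
print(f"accuracy = {score:.3f}, permutation p-value = {p_value:.3f}")
```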
Artefact classification has been one of the main themes and an important practice since the beginnings of archaeology, while machine learning (ML) has become one of the most efficient approaches to increasing our knowledge in a number of disciplines. This thesis describes a ML model developed for the classification of pottery assemblages, identifying its benefits and limitations, focusing on the importance of artefact features for the identification of vessel shape classes, and on the extent to which this kind of knowledge can be used to replicate classifications made by experts. The research also analyses different class structures based on the ML model. The research dataset was based on an assemblage of pottery vessels representing nine shape classes and four archaeological sites from Bronze Age northeastern Syria, made available by the Arcane project. The classification methodology was based on principles of quantitative archaeology, using vessel measurements and categorical features, implemented with supervised and unsupervised ML algorithms and supporting methods from the scikit-learn and SciPy libraries. The Anaconda platform, the Jupyter notebook environment and ImageJ for image processing complete the main software used throughout the research. The research results indicate benefits and limitations in the application of ML models to the classification of pottery assemblages. The limitations are especially related to the number of samples relative to the number of target classes, the homogeneity of the vessels' context in the dataset, and the quality of data available for the samples. The results suggest that a ML model can be useful to experts, assisting in the identification of the most relevant artefact features and similarities among classes of artefacts, as well as possible misclassifications, ultimately providing new insights into the classification of pottery assemblages in archaeology.
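As an indication of how such a pipeline can look in scikit-learn, the sketch below combines scaled vessel measurements with one-hot encoded categorical features in a random forest classifier. The file name, column names, and target column are hypothetical and do not correspond to the Arcane data.

```python
# Illustrative mixed-feature pottery classification pipeline; column names are assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

df = pd.read_csv("vessels.csv")                            # hypothetical export of the assemblage
numeric = ["rim_diameter", "height", "wall_thickness"]     # assumed measurement columns
categorical = ["site", "ware", "surface_treatment"]        # assumed categorical columns

pre = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
clf = Pipeline([("pre", pre), ("rf", RandomForestClassifier(random_state=0))])

scores = cross_val_score(clf, df[numeric + categorical], df["shape_class"], cv=5)
print("mean accuracy:", scores.mean())
# After fitting, clf.named_steps["rf"].feature_importances_ indicates which vessel
# features carry the most weight for separating the shape classes.
```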
The increasing reliance on ICT within the public sector has changed the working ways of governmental bureaucracies from a paper reality to a digital one, and governments are eager to use new technologies for their business operations and reap their benefits just as the private sector does. Since technological advancement is driven by the private sector, and humans are increasingly accustomed to the speed and efficiency that technology brings, citizens expect governments to adapt and digitize as well. As such, an important trend that is being experimented with is the usage of self-learning algorithms, particularly Artificial Intelligence or AI. Since AI runs on data, it is only logical that an organization such as the government, which holds an abundance of data, would like to put this to use. Data that is collected might hold certain patterns; if such patterns can be found and one assumes that the near future will not differ much from when the data was collected, predictions can be made. However, AI systems are often deemed opaque and inscrutable, and this can collide with the judicial accountability that governments have towards their citizens in the form of transparency. Based on the assumption that the information used by AI, i.e. data and algorithms, is not similar to the documentary information that governments are accustomed to, there are added obstacles for governments to overcome in order to achieve the desired effects of transparency. The goal of this research is to explore the barriers to transparency in governmental usage of AI in decision-making by analyzing governmental motivation towards (non-)transparency and how the complex nature of AI relates to this. The question that stems from this is: What are the obstacles related to being transparent in AI-assisted governmental decision-making? In the study, a comparison is made between the obstacles to transparency for documentary information and the obstacles that experts encounter in practice related to AI, from which a contribution to the theory follows. Based on the literature, it is hypothesized that governments are limited by privacy and safety issues, lack of expertise, lack of cooperation and inadequate disclosure. The results show that the obstacles are more nuanced and that an addition to the theory is appropriate. The most important findings are: that data and algorithms should not be treated as documentary information; that the policy domain is an important determinant of the degree of transparency; that lack of cooperation causes multiple obstacles to transparency such as self-censoring, accountability issues, superficial debate, false promises, inability to explain and ill-suited systems; that more information disclosure is not always better; and that the public sector should rethink its overreliance on private sector business models. All these obstacles can be associated with losing sight of the fundamental function of government: serving citizens.
This thesis investigates two interrelated issues: the tendency of automated decision-making (ADM) systems to exacerbate gender bias, and the extent to which current European data protection legislation (the GDPR) both promises and delivers a right to explanation of decisions reached by those systems. The thesis has high philosophical and societal relevance, and engages fluently with a variety of important discourses: technical discussions of artificial intelligence, feminist scholarship, and commentaries on EU legal texts. After an introduction on machine learning and algorithms, the thesis moves on to examining those parts of the GDPR that address ADM, in order to clarify the way they are regulated. In the second and third chapters, problems such as the black box, different types of bias, technological design and neutrality are discussed. Gender bias is presented and many cases are discussed in order to account for this growing phenomenon. A central topic of investigation is that of data representativeness, or how women's data are so lacking from our daily infrastructure that discrimination routinely occurs. This thesis ultimately seeks to provide a new framework for the introduction of a new feminist ethics of technology, one that addresses bias and data collection in an intersectional way and, especially, calls for new regulations to be discussed.
With data generation becoming increasingly complex and automatized as a result of technological developments, using computers to perform data selection, preprocessing and data analysis has become indispensable in many fields of physics and astronomy. Hence, acquiring some basic knowledge of machine-learning techniques should be an essential part of the curriculum of these subjects. However, courses on the subject are mainly aimed at future computer scientists. In this study, we explore the potential of using the Emotiv EPOC+, a consumer-grade EEG device, as an educational tool in a hands-on machine learning course tailor-made for physics and astronomy students. For this, we perform various experiments with a single subject, and use elementary neural networks to perform a binary classification to identify events in the self-produced EEG data. We find that the Emotiv is capable of producing data with sufficient consistency within a single recording to detect blinks and full-arm motion with more than 90% accuracy. However, these results are not reproducible with the same neural network once the headset has been removed from the head between recordings. This means the networks have to be trained anew in order to classify events in new data. For the Emotiv to serve as an educational tool in a machine learning course, a better understanding of this difference in noise between recordings is necessary, and a standardized preprocessing procedure to reduce noise should be developed.
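As a sketch of the kind of exercise students could be asked to do with such recordings, the snippet below trains a small fully connected network to separate event epochs (e.g. blinks) from baseline epochs. The file names, window shapes, and network size are placeholder assumptions rather than the study's actual preprocessing and setup.

```python
# Minimal binary classification of epoched EEG windows with an elementary neural network.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier

# X: (n_epochs, 14 channels * samples per epoch) flattened EEG windows; y: 0/1 event labels.
# These files are hypothetical stand-ins for a preprocessed Emotiv recording.
X = np.load("epochs.npy")
y = np.load("labels.npy")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

Note that, as the abstract reports, a classifier trained on one recording may not transfer to a recording made after the headset has been removed and replaced, so the train/test split above should be drawn from a single recording session.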