Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
A stricter global sulfur regulation under the International Maritime Organization's MARPOL Annex VI took effect at the beginning of 2020, but there is no system for monitoring whether ships actually comply with the sulfur cap. This thesis devises a systematic approach to a prototype sulfur compliance monitoring system using the state-of-the-art TROPOspheric Monitoring Instrument (TROPOMI), which measures trace gases in the atmosphere. Oceanic geographical coordinates are classified by the similarity of their trace gas concentration levels using k-means clustering combined with appropriate averaging techniques. The choice of hyperparameters and the final results are formulated and verified statistically. A subsequent longitudinal analysis of temporal trends in trace gas emissions suggests that TROPOMI's sulfur dioxide measurements are dominated by measurement noise. The thesis concludes that TROPOMI's nitrogen dioxide measurements can be used effectively to trace maritime anthropogenic activity, such as regional shipping routes, which suggests they could be developed further into a global monitoring system for both land and maritime emissions.
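The clustering step described above can be illustrated with a minimal sketch. It assumes that each oceanic grid cell is summarised by a single feature, the temporal mean of its trace gas retrievals (skipping missing values), and that plain Lloyd's k-means is run on those means with a fixed k. The thesis's actual averaging techniques, feature set and hyperparameter selection are not reproduced here; the function names `time_average`, `kmeans_1d` and `cluster_cells` and the toy values are hypothetical.

```python
from statistics import mean


def time_average(series):
    """Mean of a cell's retrievals, ignoring missing values (None)."""
    valid = [v for v in series if v is not None]
    return mean(valid) if valid else None


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on scalar features with a deterministic initialisation."""
    ordered = sorted(values)
    step = (len(ordered) - 1) / max(k - 1, 1)
    # Initial centroids: evenly spaced positions in the sorted values.
    centroids = [ordered[int(i * step)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


def cluster_cells(cells, k):
    """Map each (lat, lon) cell with at least one valid retrieval to a cluster label."""
    averaged = {coord: time_average(s) for coord, s in cells.items()}
    kept = [coord for coord, avg in averaged.items() if avg is not None]
    labels, centroids = kmeans_1d([averaged[c] for c in kept], k)
    return dict(zip(kept, labels)), centroids


# Toy example (made-up values, not TROPOMI data): two low cells, two high cells, one empty cell.
cells = {
    (52.0, 3.0): [1.0, 1.2, None, 1.1],
    (52.5, 3.5): [0.9, 1.1, 1.0],
    (53.0, 4.0): [5.0, 5.2, 4.8],
    (53.5, 4.5): [4.9, None, 5.1],
    (54.0, 5.0): [None, None],
}
assignment, centroids = cluster_cells(cells, k=2)
print(assignment)
```

In this toy run the two low-concentration cells share one label, the two high-concentration cells share the other, and the cell with no valid retrievals is left unclustered. Here k is fixed by hand; in practice it would be chosen by a statistical criterion, as the abstract indicates.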
Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
In this thesis, we explore the feasibility of identifying impact data in humanitarian documents. We approach this as a sentence classification task and create a human-labelled set of over 11,000 sentences extracted from documents related to the IFRC’s Disaster Relief Emergency Fund. Using this set, we compare various classification models and feature sets and show that sentences containing impact data can be classified with good performance. Our final model, a linear Support Vector Machine trained on a document-term matrix of word bigrams, achieves a precision of 0.852 and a recall of 0.746 (F1 = 0.796) on a separate validation set of 1,114 sentences. In the second part of our research, we describe techniques that can be applied when fewer human-labelled examples are available. In brief experiments with the simplest of these techniques, we show that the same performance on the validation set can be achieved with 7,454 fewer labelled examples in the training set (approximately 75% fewer). Our work can serve as an exploratory first step towards fully automated extraction of impact data from text. The work has its limitations. For instance, we found it very difficult to define what counts as impact data when creating a labelled ground truth, which affects the generalisability of our ground-truth data set. Further work can focus on the definition of impact data. Other ideas for future work include investigating newer (e.g. neural network-based) techniques for humanitarian text processing tasks such as this one.
Continuing our investigation of techniques that can solve such problems with fewer labelled examples, specifically for text from the humanitarian domain, is also a valuable next step.
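The feature representation and the reported scores can be illustrated with a small sketch. It builds a document-term matrix of word bigram counts for a few made-up sentences and recomputes F1 as the harmonic mean of precision and recall. The tokenisation rule (lower-casing and a simple regular expression) and the names `word_bigrams`, `document_term_matrix` and `f1_score` are assumptions for illustration, not the thesis's preprocessing. The linear SVM itself is not shown; a matrix like this could be passed to a library classifier such as scikit-learn's `LinearSVC`.

```python
import re
from collections import Counter


def word_bigrams(sentence):
    """Lower-case, tokenise on letters/digits/apostrophes, and return adjacent word pairs."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def document_term_matrix(sentences):
    """Rows are sentences, columns are the sorted bigram vocabulary, cells are counts."""
    counts = [Counter(word_bigrams(s)) for s in sentences]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for bigrams absent from a sentence.
    matrix = [[row[term] for term in vocabulary] for row in counts]
    return vocabulary, matrix


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example sentences, not taken from the labelled data set.
sentences = ["The flood destroyed 200 houses.", "200 houses were damaged."]
vocabulary, matrix = document_term_matrix(sentences)
print(vocabulary)
print(matrix)
print(round(f1_score(0.852, 0.746), 3))
```

Recomputing F1 from the rounded precision and recall gives about 0.795, which agrees with the reported 0.796 to within rounding, presumably because the published precision and recall are themselves rounded.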