Abstract
The Office for National Statistics (ONS) is currently undertaking a substantial research program into using price information scraped from online retailers in the Consumer Prices Index including owner occupiers' housing costs (CPIH). To make full use of these data, we must classify them into the product types that make up the basket of goods and services used in the current collection. A common problem is that the amount of labeled training data is limited and it is either impossible or impractical to increase its size manually, as is the case with web-scraped price data. We use a semi-supervised machine learning (ML) method, Label Propagation, to develop a pipeline that increases the number of labels available for classification. In this work, we use several techniques in succession and in parallel to give higher confidence in the final, enlarged labeled dataset used to train a traditional ML classifier. We find promising results using this method on a test sample of data, achieving good precision and recall values for both the propagated labels and the classifiers trained on these labels. We show that by combining several techniques and averaging the results, we can increase the usability of a dataset with limited labeled training data, a common problem when using ML in real-world situations. In future work, we will investigate how this method can be scaled up for use in future CPIH calculations and the challenges this brings.
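The label propagation idea named in this abstract can be illustrated with a minimal sketch. This is not the ONS pipeline: the one-dimensional toy features, the RBF similarity, the `sigma` value, the 0.9 confidence cut-off and the function name `propagate_labels` are all assumptions made for illustration only.

```python
import math


def propagate_labels(points, labels, n_classes, sigma=1.0, n_iter=50):
    """Iterative label propagation with clamping of the known labels.

    points: list of feature tuples.
    labels: class index for labeled points, None for unlabeled ones.
    Returns one (predicted_class, confidence) pair per point.
    """
    n = len(points)
    # RBF similarity between every pair of points (assumed kernel choice).
    weights = [
        [math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in points]
        for p in points
    ]
    # Row-normalise similarities into transition probabilities.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total for w in row])

    def clamp(dist):
        # Reset labeled points to their known class after every step.
        for i, lab in enumerate(labels):
            if lab is not None:
                dist[i] = [1.0 if c == lab else 0.0 for c in range(n_classes)]
        return dist

    # Start from a uniform class distribution, then clamp the known labels.
    dist = clamp([[1.0 / n_classes] * n_classes for _ in range(n)])
    for _ in range(n_iter):
        spread = [
            [sum(trans[i][j] * dist[j][c] for j in range(n)) for c in range(n_classes)]
            for i in range(n)
        ]
        dist = clamp(spread)
    return [(max(range(n_classes), key=row.__getitem__), max(row)) for row in dist]


# Toy data: two well-separated clusters with one labeled point in each.
points = [(0.0,), (0.2,), (0.4,), (5.0,), (5.2,), (5.4,)]
labels = [0, None, None, 1, None, None]
results = propagate_labels(points, labels, n_classes=2)

# Keep only propagated labels above a hypothetical confidence threshold.
accepted = [
    (i, cls)
    for i, (cls, conf) in enumerate(results)
    if labels[i] is None and conf >= 0.9
]
print(accepted)
```

Only the propagated labels that clear the threshold would be added to the training set for a downstream classifier; lower-confidence items would stay unlabeled.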
In this thesis we use both observations and modelling to explore the gas content of galaxies. We use the L-Galaxies semi-analytic model to simultaneously match the Hɪ and stellar mass properties of model galaxies to observations using Markov Chain Monte Carlo methods. We add the observed Hɪ mass function as an extra model constraint and successfully match the Hɪ and stellar mass functions. However, the fit to the star formation properties is weaker than without the Hɪ constraint. We suggest that this problem may be partially resolved by forming stars out of only H₂ gas instead of the total cold gas. The environment in which a galaxy resides can affect its evolution. We use the counts-in-a-fixed-size-cylinder method to estimate three environment measures for the GAMA survey. We use density and edge corrections to calculate estimates for every galaxy out to z = 0.4 in our flux-limited sample. We then use these estimates to examine the effect of environment on the luminosity and stellar mass functions. Using Hɪ observations of the groups and galaxies in the ALFALFA and GAMA surveys, we calculate Hɪ masses using the stacking technique. Stacking has allowed us to exploit survey data that could not otherwise be used. We stack galaxies in halo mass bins and calculate the Hɪ-to-halo mass fraction as a function of halo mass. We see a steady decline in the Hɪ fraction as we move to higher mass halos. These are the highest density environments, where there is less cold gas. Combining this fraction with the halo mass function, we calculate a lower limit for ΩHɪ of (1.8 ± 0.39) × 10⁻⁴ h⁻¹.
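The counts-in-a-fixed-size-cylinder measure mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the thesis method: it assumes Cartesian toy coordinates, omits the density and edge corrections, and the function names, cylinder size and mean density are hypothetical.

```python
import math


def cylinder_counts(galaxies, radius, half_depth):
    """Count the neighbours of each galaxy inside a fixed cylinder.

    galaxies: list of (x, y, los) positions in one length unit, where x and y
    are projected on the sky and los runs along the line of sight.
    The cylinder is centred on each galaxy in turn and excludes that galaxy.
    """
    counts = []
    for i, (xi, yi, zi) in enumerate(galaxies):
        n = 0
        for j, (xj, yj, zj) in enumerate(galaxies):
            if i == j:
                continue
            inside_radius = math.hypot(xi - xj, yi - yj) <= radius
            inside_depth = abs(zi - zj) <= half_depth
            if inside_radius and inside_depth:
                n += 1
        counts.append(n)
    return counts


def overdensity(counts, radius, half_depth, mean_density):
    """Convert raw counts to an overdensity relative to an assumed mean density."""
    volume = math.pi * radius ** 2 * (2 * half_depth)
    return [n / (volume * mean_density) - 1.0 for n in counts]


# Toy catalogue: a small group of three galaxies and two isolated ones.
galaxies = [
    (0.0, 0.0, 0.0),
    (0.5, 0.0, 1.0),
    (0.0, 0.8, -2.0),
    (10.0, 10.0, 0.0),
    (10.3, 10.0, 30.0),
]
counts = cylinder_counts(galaxies, radius=1.0, half_depth=5.0)
delta = overdensity(counts, radius=1.0, half_depth=5.0, mean_density=0.01)
print(counts, delta)
```

A real flux-limited survey would also need the density and edge corrections described in the abstract, since cylinders near the survey boundary or at higher redshift sample fewer galaxies.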
Using the L-Galaxies semi-analytic model we simultaneously fit the HI mass function, stellar mass function and galaxy colours. We find good fits to all three observations at z = 0 and to the stellar mass function and galaxy colours at z = 2. Using Markov Chain Monte Carlo (MCMC) techniques we adjust the L-Galaxies parameters to best fit the constraining data. In order to fit the HI mass function we must greatly reduce the gas surface density threshold for star formation, thus lowering the number of low HI mass galaxies. A simultaneous reduction in the star formation efficiency prevents the overproduction of stellar content. A simplified model in which the surface density threshold is eliminated altogether also provides a good fit to the data. Unfortunately, these changes weaken the fit to the Kennicutt-Schmidt relation and raise the star-formation rate density at recent times, suggesting that a change to the model is required to prevent accumulation of gas onto dwarf galaxies in the local universe.
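The MCMC parameter fitting described here can be illustrated with a minimal Metropolis sampler. This sketch does not involve L-Galaxies: the one-parameter linear toy model, the synthetic data, the Gaussian likelihood, the flat prior and all names and step sizes are assumptions chosen only to show how a chain adjusts a parameter to fit constraining data.

```python
import math
import random


def log_likelihood(param, xs, ys, sigma):
    """Gaussian log-likelihood (up to a constant) of the toy model y = param * x."""
    return -0.5 * sum(((y - param * x) / sigma) ** 2 for x, y in zip(xs, ys))


def metropolis(log_post, start, step, n_steps, seed=0):
    """Random-walk Metropolis sampler for a single parameter."""
    rng = random.Random(seed)
    current = start
    current_lp = log_post(current)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        # Accept uphill moves always, downhill moves with probability
        # exp(proposal_lp - current_lp).
        if proposal_lp >= current_lp or rng.random() < math.exp(proposal_lp - current_lp):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


# Synthetic "observations" generated from a true parameter value of 2.0.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x for x in xs]

# Flat prior assumed, so the log-posterior equals the log-likelihood.
chain = metropolis(lambda p: log_likelihood(p, xs, ys, sigma=0.5),
                   start=0.0, step=0.1, n_steps=5000)

burn_in = 1000  # discard the early steps while the chain finds the mode
samples = chain[burn_in:]
estimate = sum(samples) / len(samples)
print(round(estimate, 2))
```

In a multi-constraint fit such as the one in the abstract, the log-posterior would instead sum the likelihood terms of each observational constraint, and the chain would explore several model parameters at once.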