Project title: Ensemble averaging for complex disease prediction in the large p small n domain from multiple data sources
Rationale: We hypothesise that novel ensemble averaging methods will provide a powerful engine for analysing the factors that most influence disease, and a framework for future rare disease studies.
Objectives: Using RKD registry data, develop algorithms that handle data of multiple or mixed types in a model-based framework without over-fitting to random noise, allowing identification of the data sources most relevant for prediction (Objective 2). Construct graphical descriptors that clearly represent the uncertainty associated with the estimates from these algorithms (Objective 2). Generate generic code that operates on a wide range of possible disease models and discriminates between them using evidence from the gathered data.
Expected Results: Novel statistical algorithms for clustering of mixed-type data, model averaging, and model evaluation, to enhance predictive performance with multivariate temporo-spatial data. Insights into the characteristic features of vasculitis and their relationship with environmental factors.
FI: 4 mths to feed into WP3 deliverables