Statistical Inference of AI-identified Subcellular RNA Localizations
Shortlist nominee, mtech+ prize
Shortlist nominee, Eos prize
Spatially resolved transcriptomics was named Method of the Year 2020 by Nature Methods. These novel technologies make it possible to analyze subcellular RNA localization systematically and at large scale, which should help answer fundamental biological questions across a variety of biological domains, in health and disease. However, computational methods to characterize subcellular RNA localization are still in their infancy. This master's thesis therefore aims to tackle the following questions:
How does one automatically classify whether a gene shows a subcellular localized expression pattern or not?
● Using supervised classification
○ Which classification algorithm is best suited, and how do we train it optimally?
○ What is the performance of the optimal model?
○ Can we aggregate the model's per-cell classifications of a gene across all cells, and can we build a reliable statistical test for whether that gene localizes non-randomly?
○ If we can distinguish patterns from non-patterns, can we also classify which specific pattern is present?
● Can we infer subcellular localization directly from the latent space embedding of an in-house developed neural network model, without training a classifier first?
● Do these results on simulated data generalize to real biological/experimental data?
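To make the aggregation question concrete, one possible form of such a test is sketched below. It is an illustration only, not the method developed in the thesis. It assumes that each cell yields a binary call from the classifier (1 = localized pattern, 0 = no pattern), that cells can be treated as independent, and that the classifier's false-positive rate on randomly distributed RNA is known, for example from simulated data. The function name `binomial_tail_pvalue`, the parameter `p0`, and the example calls are all hypothetical.

```python
from math import comb


def binomial_tail_pvalue(k: int, n: int, p0: float) -> float:
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p0).

    k  : number of cells in which the gene was called "patterned"
    n  : total number of cells in which the gene was evaluated
    p0 : assumed false-positive rate of the classifier on random localization
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a probability")
    # Sum the binomial probabilities of observing k or more positive calls.
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-cell classifier calls for one gene (illustrative values only).
calls = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

# Null hypothesis: the gene is randomly localized, so every positive call is a
# false positive occurring at rate p0 (assumed here to be 5%).
p_value = binomial_tail_pvalue(sum(calls), len(calls), p0=0.05)
print(f"{sum(calls)}/{len(calls)} cells called patterned, p = {p_value:.3g}")
```

A small p-value here would suggest that the gene is called "patterned" in more cells than false positives alone would explain. In real data the independence assumption may not hold (cells from the same sample or field of view can be correlated), so a sketch like this is a starting point rather than a validated test.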