Result: The role of model implementation in neuroscientific applications of machine learning

Title:

The role of model implementation in neuroscientific applications of machine learning

Authors:

Abe, Taiga

Publication Year:

2024

Collection:

Columbia University: Academic Commons

Subject Terms:

Neurosciences, Computer science, Machine learning--Industrial applications, Machine learning--Statistical methods, Machine learning--Mathematical models, Deep learning (Machine learning), Cloud computing

Document Type:

Dissertation thesis

Language:

English

DOI:

10.7916/rbd2-p250

Availability:

https://doi.org/10.7916/rbd2-p250

Accession Number:

edsbas.EB5919FC

Database:

BASE

Further Information

In modern neuroscience, large-scale machine learning models are becoming increasingly critical components of data analysis. Despite the accelerating adoption of these large-scale machine learning tools, there are fundamental challenges to their use in scientific applications that remain largely unaddressed. In this thesis, I focus on one such challenge: variability in the predictions of large-scale machine learning models relative to seemingly trivial differences in their implementation. Existing research has shown that the performance of large-scale machine learning models (more so than traditional models such as linear regression) is meaningfully entangled with design choices such as the hardware components, operating system, software dependencies, and random seed that the corresponding model depends upon. Within the bounds of current practice, there are few ways of controlling this kind of implementation variability across the broad community of neuroscience researchers (making data analysis less reproducible), and little understanding of how data analyses might be designed to mitigate these issues (making data analysis unreliable). This dissertation will present two broad research directions that address these shortcomings. First, I will describe a novel, cloud-based platform for sharing data analysis tools reproducibly and at scale. This platform, called NeuroCAAS, enables developers of novel data analyses to precisely specify an implementation of their entire data analysis, which can then be used automatically by any other user on custom-built cloud resources. I show that this approach is able to efficiently support a wide variety of existing data analysis tools, as well as novel tools which would not be feasible to build and share outside of a platform like NeuroCAAS. Second, I conduct two large-scale studies on the behavior of deep ensembles.
Deep ensembles are a class of machine learning models that use implementation variability to improve the quality of model predictions; in particular, by aggregating ...
