Cell High Throughput Imaging: correlating activation of NFKB or cell line with morphological features
by
Andreas Hadjiprocopis, Institute of Cancer Research, London (contact details: 'andreashad2' then the funny snail symbol then 'gmail.com' without the quotes)


A large number of cells, belonging to a variety of cell lines and grown in a variety of media under different treatment conditions and durations (see here for details), have been imaged, and the morphological features of each individual cell (e.g. area, area of cytoplasm, ruffliness) have been extracted.

We built neural network models trying to:

The models predicted unseen data with 5-15% error. The data set was split 60% for training and 40% for testing; the reported error is measured on the 40% test set.
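
A minimal sketch of such a 60/40 split and test-set error follows. It is an illustration only: the function names, the toy data and the stand-in model are hypothetical, and because this page does not say how the error was measured, a plain misclassification rate is assumed.

```python
import random

def split_60_40(samples, seed=0):
    """Shuffle a list of samples and split it 60% training / 40% test."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 6 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

def error_rate(predict, test_set):
    """Fraction of test samples whose predicted label differs from the true one
    (assumed error measure, not confirmed by the source)."""
    wrong = sum(1 for features, label in test_set if predict(features) != label)
    return wrong / len(test_set)

# Toy data: one feature per 'cell', label is 1 when the feature exceeds 50.
data = [((x,), int(x > 50)) for x in range(100)]
train, test = split_60_40(data)

# Stand-in for a trained model (hypothetical); a real one would be fitted on `train`.
def toy_predict(features):
    return int(features[0] > 50)

test_error = error_rate(toy_predict, test)
print(len(train), len(test), test_error)
```

The error is computed only on the held-out 40%, which the model never sees during training.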

We investigated the models' output when the value of a single morphological feature (the driver) varies across its dynamic range while all other features are drawn randomly from their estimated distributions (the distributions were built using all available data). For each driver value, 100,000 samples were drawn from the corresponding distributions and fed to the model, and a density plot of the 100,000 outputs was constructed. Each driver value therefore has its own density plot of the model output. We considered this a good way to model cell-to-cell variability.
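
The driver sweep described above can be sketched roughly as follows. This is not the original code: the model, the feature names and the samplers are hypothetical stand-ins, and the features are assumed to be drawn independently of each other, which this page does not state. The example uses 1,000 draws per driver value to stay fast; the analysis above used 100,000.

```python
import random
import statistics

def sweep_driver(model, samplers, driver, driver_values, n_samples, seed=0):
    """For each driver value, collect the model outputs obtained when every
    other feature is drawn at random from its own sampler."""
    rng = random.Random(seed)
    outputs = {}
    for value in driver_values:
        runs = []
        for _ in range(n_samples):
            # Draw all non-driver features (independently: an assumption).
            cell = {name: draw(rng) for name, draw in samplers.items()
                    if name != driver}
            cell[driver] = value  # pin the driver to the current value
            runs.append(model(cell))
        outputs[value] = runs
    return outputs

# Hypothetical estimated feature distributions.
samplers = {
    "area": lambda rng: rng.gauss(100.0, 10.0),
    "ruffliness": lambda rng: rng.uniform(0.0, 1.0),
}

# Hypothetical stand-in for a trained neural network.
def toy_model(cell):
    return 0.01 * cell["area"] + cell["ruffliness"]

outputs = sweep_driver(toy_model, samplers, "area", [80, 100, 120], n_samples=1000)
mean_by_value = {v: statistics.mean(o) for v, o in outputs.items()}
print(mean_by_value)
```

Each list in `outputs` holds the simulated cells for one driver value; plotting a density estimate or histogram of each list gives one frame per driver value.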

The task of predicting the ratio of NFKB quantity in nucleus over that in cytoplasm utilised models with three outputs:

The task of predicting the cell line required one output, the cell line ID.

There were two data sets. The first, called 'JULIA', is very general, with 22 cell lines, 8 treatment conditions and 2 treatment durations. The second, called 'JOAN', is more specific, with 5 cell lines and 2 treatment conditions. The questions asked of the latter were very specific: how do the treatment condition and duration affect the morphology of the carefully chosen cell lines?

The following animations show how the input varies (the first image on the left) and what the effect of this is on the output(s) - both input and outputs are density plots of 100,000 'simulated cells'. In the case of predicting a cell line, the output shows the histogram of predicted cell lines for each input value.
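
For the cell-line task, the per-input histogram of predicted cell lines could be computed along these lines. This is a sketch only: the function name and the example IDs are hypothetical, and normalising counts to fractions is an assumption about how the histogram was drawn.

```python
from collections import Counter

def predicted_line_histogram(predicted_ids, all_ids):
    """Fraction of simulated cells assigned to each cell line ID.
    IDs that were never predicted are reported with a fraction of 0."""
    counts = Counter(predicted_ids)
    total = len(predicted_ids)
    return {cell_line: counts.get(cell_line, 0) / total for cell_line in all_ids}

# Five simulated cells at one driver value, four possible cell lines.
histogram = predicted_line_histogram([1, 1, 2, 3, 1], all_ids=[1, 2, 3, 4])
print(histogram)
```

One such histogram per driver value corresponds to one frame of the cell-line output.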


The animations below require Adobe Flash Player; allow it to run if you are prompted. Each animation is smaller than 10 MB but might take some time to load. Right-click on an animation to display the menu, then select 'PLAY':


Now the JULIA data set: