A large number of cells from a variety of cell lines, grown in a variety of media and under different treatment conditions and durations (see here for details), have been imaged, and the morphological features of each individual cell (e.g. area, cytoplasm area, ruffliness) have been extracted.
We built neural network models for two tasks:
predict the activation of NFKB given cell morphology (i.e. our inputs are cell morphological features).
predict cell line given cell morphology.
The models achieved an error of 5-15% on previously unseen data. The data set was split 60% for training and 40% for testing, and the error is reported on the 40% test set.
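The 60/40 split and the test-set error can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the split is assumed to be a random shuffle, and the error is assumed to be a misclassification rate (which suits the cell line task; the text does not say how the error was measured for either task).

```python
import numpy as np

def split_60_40(n_samples, rng):
    """Return shuffled index arrays for a 60% training / 40% test split."""
    order = rng.permutation(n_samples)
    n_train = int(round(0.6 * n_samples))
    return order[:n_train], order[n_train:]

def error_rate(y_true, y_pred):
    """Fraction of cells whose predicted label differs from the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

# Example with 1000 hypothetical cells.
rng = np.random.default_rng(0)
train_idx, test_idx = split_60_40(1000, rng)
```

Only the cells indexed by test_idx would be used when reporting the error, so the figure reflects data the model has not seen during training.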
We investigated the models' output when a single morphological feature (the driver) is varied across its dynamic range while all other features are drawn randomly from their estimated distributions (the distributions were built using all available data). For each driver value, 100,000 samples were drawn from the corresponding distributions and fed to the model, and a density plot of the 100,000 outputs was constructed. Each driver value therefore has its own density plot of the model output. We considered this a good way to model cell-to-cell variability.
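The driver sweep described above can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: the name sweep_driver and the toy model are hypothetical, each non-driver feature is assumed to be resampled independently from its observed values (the text does not say how the distributions were estimated, or whether they were conditioned on the driver), and a normalised histogram stands in for the density plot.

```python
import numpy as np

def sweep_driver(model, feature_data, driver, n_values=20,
                 n_samples=100_000, bins=50, seed=0):
    """Vary one feature over its range and summarise the model output.

    model:        callable mapping an (n, n_features) array to n outputs
                  (hypothetical stand-in for the trained network).
    feature_data: (n_cells, n_features) array of observed features.
    driver:       column index of the feature to vary.
    Returns a list of (driver_value, density, bin_edges) tuples.
    """
    feature_data = np.asarray(feature_data, dtype=float)
    rng = np.random.default_rng(seed)
    n_cells, n_features = feature_data.shape
    low = feature_data[:, driver].min()
    high = feature_data[:, driver].max()
    results = []
    for value in np.linspace(low, high, n_values):
        # Assumption: draw every feature independently from its
        # empirical distribution by resampling observed values.
        rows = rng.integers(0, n_cells, size=(n_samples, n_features))
        samples = feature_data[rows, np.arange(n_features)]
        # Pin the driver to the current value for all simulated cells.
        samples[:, driver] = value
        outputs = np.asarray(model(samples))
        density, edges = np.histogram(outputs, bins=bins, density=True)
        results.append((float(value), density, edges))
    return results

def toy_model(x):
    """Toy stand-in for the network: output depends mostly on feature 0."""
    return x[:, 0] + 0.1 * x[:, 1]

toy_features = np.random.default_rng(1).normal(size=(500, 3))
results = sweep_driver(toy_model, toy_features, driver=0,
                       n_values=5, n_samples=2000)
```

Each entry of results corresponds to one frame of an animation: the density of the simulated outputs for one driver value. The example uses 2,000 samples per value to keep it quick; the default matches the 100,000 used in the text.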
The task of predicting NFKB activation used models with three outputs: the quantity of NFKB in the nucleus, the quantity of NFKB in the cytoplasm, and the ratio of NFKB in the nucleus over that in the cytoplasm.
The task of predicting the cell line required one output, the cell line ID.
There were two data sets. The first, called 'JULIA', is very general, with 22 cell lines, 8 treatment conditions and 2 treatment durations. The second, called 'JOAN', is more specific, with 5 cell lines and 2 treatment conditions. The question asked of the latter was narrow: how do treatment condition and duration affect the morphology of the carefully chosen cell lines?
The following animations show how the input varies (the first image, on the left) and the effect this has on the output(s). Both the input and the outputs are density plots of 100,000 'simulated cells'. When predicting a cell line, the output is a histogram of the predicted cell lines for each input value.
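For the cell line task, the histogram shown in each frame can be sketched as a count of predicted cell line IDs over the simulated cells. The function names below are hypothetical, and the sketch assumes that the model output has already been converted to integer cell line IDs, which need not be contiguous.

```python
import numpy as np

def predicted_line_fractions(predicted_ids):
    """Map each predicted cell line ID to its share of the simulated cells."""
    ids, counts = np.unique(np.asarray(predicted_ids), return_counts=True)
    fractions = counts / counts.sum()
    return {int(i): float(f) for i, f in zip(ids, fractions)}

def most_likely_line(predicted_ids):
    """Return the cell line ID predicted for the largest share of cells."""
    fractions = predicted_line_fractions(predicted_ids)
    return max(fractions, key=fractions.get)

# Six hypothetical predictions for one driver value.
example_ids = [1, 4, 1, 6, 1, 4]
example_fractions = predicted_line_fractions(example_ids)
example_winner = most_likely_line(example_ids)
```

Computing these fractions for every driver value gives the sequence of histograms that an animation steps through, and the winning ID per frame corresponds to the "most likely cell line" mentioned in the captions.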
The animations below require Adobe Flash Player; your browser will ask you to allow it if it is not already enabled. Each animation is less than 10MB in size but might take some time to load. To start an animation, right-click on it to display the menu and then select 'PLAY'.
JOAN dataset: the model's prediction for cell line is shown on the right as a histogram of the predicted cell line over all 100,000 'simulated cells' as the value of the morphological feature 'area' (left) varies from minimum to maximum. At the beginning, when the area is small, the most likely predicted cell lines are #1 and #4. As the cell area increases, the most likely predicted cell line becomes #6.
JOAN dataset: the model's prediction for cell line is shown on the right as a histogram of the predicted cell line over all 100,000 'simulated cells' as the value of the morphological feature 'ruffliness' (left) varies from minimum to maximum. At the beginning, when cell ruffliness is small, the most likely predicted cell line is #6 (with #4 a distant second). As the cell ruffliness increases, the most likely predicted cell line becomes #1.
The model's prediction for the quantity of NFKB (second image: NFKB in nucleus; third image: NFKB in cytoplasm; fourth image: ratio of NFKB in nucleus over that in cytoplasm) for a given value of the morphological feature 'area' (left). JOAN dataset.
The model's prediction for the quantity of NFKB (second image: NFKB in nucleus; third image: NFKB in cytoplasm; fourth image: ratio of NFKB in nucleus over that in cytoplasm) for a given value of the morphological feature 'SERedgecell', a texture feature (left). JOAN dataset.
The model's prediction for cell line is shown on the right as a histogram of the predictions for all 100,000 'simulated cells' for a given value of the morphological feature 'ruffliness' (left). JULIA dataset.
The model's prediction for cell line is shown on the right as a histogram of the predictions for all 100,000 'simulated cells' for a given value of the morphological feature 'skel_intens_border' (left). JULIA dataset.
The model's prediction for the quantity of NFKB (second image: NFKB in nucleus; third image: NFKB in cytoplasm; fourth image: ratio of NFKB in nucleus over that in cytoplasm) for a given value of the morphological feature 'cytoplasm_area' (left). JULIA dataset.
The model's prediction for the quantity of NFKB (second image: NFKB in nucleus; third image: NFKB in cytoplasm; fourth image: ratio of NFKB in nucleus over that in cytoplasm) for a given value of the morphological feature 'SERedgecyto', a texture feature (left). JULIA dataset.
The model's prediction for the quantity of NFKB (second image: NFKB in nucleus; third image: NFKB in cytoplasm; fourth image: ratio of NFKB in nucleus over that in cytoplasm) for a given value of the morphological feature 'SERedgecell', a texture feature (left). JULIA dataset.
The model's prediction for the quantity of NFKB (second image: NFKB in nucleus; third image: NFKB in cytoplasm; fourth image: ratio of NFKB in nucleus over that in cytoplasm) for a given value of the morphological feature 'area' (left). JULIA dataset.