## Cell High Throughput Imaging: are certain ranges of values for pairs of features more likely than expected?

by Andreas Hadjiprocopis, Institute of Cancer Research, London (contact details: 'andreashad2' then the funny snail symbol then 'gmail.com' without the quotes)

This page describes my work analysing large populations of cells observed using High Throughput Screening equipment (e.g. the Opera from Perkin Elmer). This work was done during my time at the Institute of Cancer Research, London, in association with Dr Chris Bakal, Dr Julia Serro, Dr Rune Linding and Dr Janine Erler.

A large number of cells belonging to a variety of cell lines, grown in a variety of media and under different treatment conditions and durations (see here for details), have been imaged, and their individual morphological features (e.g. area, area of cytoplasm, ruffliness, etc.) have been extracted.

We investigate how often certain ranges of values of two features co-occur, compared with what the individual distributions of the two features would predict if there were no hidden relationship between them. In the graphs below, the orange bar charts show the individual distributions of two features, e.g. area and NucbyCytoArea or WidthToLength. The range of each feature's values is divided into 10 equal sub-ranges (horizontal axis, 1 to 10), and the number of cells falling in each sub-range is recorded (vertical axis). These sub-ranges are loosely called "percentiles" on this page, although each one is a tenth of the value range rather than a true percentile. For example, in the first graph, the number of cells with area values in the 1st percentile (horizontal axis, first bar), i.e. below 10% of the maximum area observed for all these cells, is 1559, while the number of cells with area between 10% and 20% of the maximum area observed in that population is 1142.
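The binning step can be sketched as follows. This is a minimal illustration rather than the code used for the plots: the function name `bin_feature` is hypothetical, and it assumes the 10 equal sub-ranges span the observed minimum to the observed maximum of the feature.

```python
import numpy as np

def bin_feature(values, n_bins=10):
    """Assign each value to one of n_bins equal-width sub-ranges, labelled 1..n_bins.

    Assumption: the sub-ranges span the observed minimum to the observed maximum.
    """
    values = np.asarray(values, dtype=float)
    # Equal-width bin edges covering the observed range of the feature.
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Use only the interior edges so that the maximum value falls in the last bin.
    labels = np.digitize(values, edges[1:-1]) + 1
    # Number of cells in each sub-range (the heights of the orange bars).
    counts = np.bincount(labels, minlength=n_bins + 1)[1:]
    return labels, counts

# Toy data standing in for one feature (e.g. area) measured on 100 cells.
labels, counts = bin_feature(np.arange(100))
```

Here `labels` holds the sub-range (1 to 10) of every cell and `counts` holds the number of cells per sub-range.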

From these individual distributions, the expected number of cells for any given pair of sub-ranges can be calculated. This is what we would expect to observe if there were no hidden relationship between the two features. The expected value is the second number in the second row of each square in the central plot. For example, for this specific cell population, 208 cells are expected to have both area below 10% of the maximum (1st percentile) and NucbyCytoArea (nuclear area to cytoplasm area ratio) between 10% and 20% of the maximum (2nd percentile). This is the box at coordinates (x=1, y=2).
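One common way to obtain such expected counts is the independence assumption: the expected number of cells in box (x, y) is the count in sub-range x of the first feature, times the count in sub-range y of the second, divided by the total number of cells. The page does not spell out the exact formula, so the sketch below is an assumption. The name `observed_and_expected` is hypothetical, and each cell is assumed to already carry a sub-range label from 1 to 10 for each feature.

```python
import numpy as np

def observed_and_expected(labels_x, labels_y, n_bins=10):
    """Return the observed and expected count tables for two binned features.

    Assumption: 'expected' means the count predicted if the two features
    were independent, i.e. row total * column total / number of cells.
    """
    labels_x = np.asarray(labels_x)
    labels_y = np.asarray(labels_y)
    # Observed table: entry [i, j] counts cells in sub-range i+1 of x and j+1 of y.
    observed = np.zeros((n_bins, n_bins), dtype=int)
    np.add.at(observed, (labels_x - 1, labels_y - 1), 1)
    n_cells = observed.sum()
    counts_x = observed.sum(axis=1)  # individual distribution of feature x
    counts_y = observed.sum(axis=0)  # individual distribution of feature y
    expected = np.outer(counts_x, counts_y) / n_cells
    return observed, expected

# Toy example with two sub-ranges: the features always fall in the same sub-range.
observed, expected = observed_and_expected([1, 1, 2, 2], [1, 1, 2, 2], n_bins=2)
```

In this toy example every box has an expected count of 1, while the observed counts are 2 on the diagonal and 0 elsewhere, which is the kind of deviation the central plot is meant to reveal.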

The expected value is then compared with what was actually observed. The color of each box indicates the deviation (blue: observed is less than expected; red: observed is more than expected). The second row of numbers gives observed/expected, and the top row is a metric of fold change. Fold change on its own is less reliable because it does not take into account the number of cells, and so carries no measure of confidence.

In our example, small-area cells (1st percentile) are observed together with NucbyCytoArea (nuclear area to cytoplasm area ratio) in the 2nd percentile 62 times, whereas, according to the individual distributions, they are expected to occur together 208 times. This is the box at coordinates (x=1, y=2) of the first plot.
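To make the comparison concrete, the sketch below computes three deviation measures for the example box (62 observed, 208 expected). The exact metric shown in the plots is not stated on this page, so all three are illustrative assumptions: the plain ratio, a log2 fold change, and a Pearson residual. The residual is one standard way of taking the number of cells into account. The name `deviation_metrics` is hypothetical.

```python
import math

def deviation_metrics(observed, expected):
    """Compare an observed count with its expected count.

    These are illustrative choices, not necessarily the metric used in the plots.
    """
    ratio = observed / expected
    # Log2 fold change: negative when observed < expected (blue boxes).
    log2_fold = math.log2(ratio) if ratio > 0 else float("-inf")
    # Pearson residual: scales the difference by sqrt(expected), so the same
    # fold change counts for more when it is based on many cells.
    pearson_residual = (observed - expected) / math.sqrt(expected)
    return ratio, log2_fold, pearson_residual

ratio, log2_fold, residual = deviation_metrics(62, 208)
print(f"ratio={ratio:.2f}, log2 fold change={log2_fold:.2f}, residual={residual:.1f}")
```

The ratio is about 0.3, meaning this combination occurs less than a third as often as independence would predict.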

This method of statistical analysis lets us see whether a pair of cell features in a given population is related: whether particular ranges of their values tend not to occur together, or occur together more often than they would if the two features were unrelated.

This method requires a large number of observations, which can only be achieved using High Throughput Screening techniques that observe thousands of individual cells.

Sadly, western blots have no place here ...
