DiscoverySpace Scatter Plot

Scatter Plot is currently best suited to comparing two sets of SAGE libraries. Below are some frequently asked questions about the DiscoverySpace Scatter Plot widget.

What do the contour curves on the graph represent?

The contours are drawn using the Audic-Claverie formula at three cut-off values: 95%, 99%, and 99.9%. The graph legend gives the colour code for each contour curve.

For more information on Audic-Claverie computations, please see "The Significance of Digital Gene Expression Profiles", which is available online:

  • PubMed: Abstract.
  • Genome Research: Full document in HTML.
  • Genome Research: Full document in PDF.
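For quick reference, the conditional probability at the centre of that paper is usually written as below, where x and y are the counts of one tag in two libraries and N_1 and N_2 are the total tag counts of those libraries. The symbol names follow the common presentation of the paper and are not taken from DiscoverySpace itself; exactly how the widget turns this probability into contour curves is not described on this page.

```latex
p(y \mid x) = \left(\frac{N_2}{N_1}\right)^{y}
              \frac{(x + y)!}{x!\, y!\, \left(1 + \frac{N_2}{N_1}\right)^{x + y + 1}}
```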

What exactly is represented on an axis when there is more than one library on it?

Each axis can represent one or more libraries. When there is more than one library on an axis, the libraries are simply added together to create a single meta library. For example, suppose the X axis holds two libraries, A and B. Library A has 4 tags with the sequence GGGGGGGGGG and library B has 5 of the same tag. The meta library generated from A and B will then have 9 tags with the sequence GGGGGGGGGG. The same holds for the total number of tags in the library:

total_new = total_A + total_B
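The pooling described above can be sketched in a few lines of Python. This is only an illustration, not DiscoverySpace code: the function name `pool_libraries` is hypothetical, and the second tag sequence in each library is made up so that the totals are not trivial.

```python
from collections import Counter

# Each library maps a tag sequence to its count.
# GGGGGGGGGG comes from the example above; the other tags are invented.
library_a = Counter({"GGGGGGGGGG": 4, "ACGTACGTAC": 2})
library_b = Counter({"GGGGGGGGGG": 5, "TTTTTTTTTT": 1})


def pool_libraries(*libraries):
    """Add the per-tag counts of several libraries into one meta library."""
    meta = Counter()
    for library in libraries:
        meta.update(library)  # Counter.update adds counts tag by tag
    return meta


meta_library = pool_libraries(library_a, library_b)

# The total tag count of the meta library is the sum of the library totals.
total_a = sum(library_a.values())
total_b = sum(library_b.values())
total_new = sum(meta_library.values())

print(meta_library["GGGGGGGGGG"], total_new)
```

Tags that occur in only one of the libraries simply keep their original count in the meta library.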

Is normalization taken into account when the total number of tags for the two axes are not the same?

The Audic-Claverie p-statistic takes the size of the libraries into account through a normalization factor that is built into the algorithm. This means that you can compare two libraries, one with 10,000 tags and one with 500,000 tags, without worrying about the size difference.

This is also true when there is more than one library on one or both of the axes. The meta library generated by pooling several libraries together can become large, but the Audic-Claverie formula takes this into account.

Thus no normalization is done, or needed, in Scatter Plot other than what is already built into the Audic-Claverie formula.
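To make the role of library size concrete, here is a minimal Python sketch of the Audic-Claverie conditional probability as it is usually stated: the chance of seeing a tag y times in a library of n2 tags, given that it was seen x times in a library of n1 tags. The function names are hypothetical and this is not DiscoverySpace's implementation, which is not shown on this page. The library sizes enter only through the ratio n2 / n1, which is the normalization factor referred to above.

```python
import math


def ac_pmf(x, y, n1, n2):
    """Audic-Claverie probability of count y in library 2 given count x in library 1.

    Computed in log space so that large counts do not overflow.
    """
    ratio = n2 / n1  # the only place the library sizes are used
    log_p = (
        y * math.log(ratio)
        + math.lgamma(x + y + 1)  # log((x + y)!)
        - math.lgamma(x + 1)      # log(x!)
        - math.lgamma(y + 1)      # log(y!)
        - (x + y + 1) * math.log1p(ratio)
    )
    return math.exp(log_p)


def ac_lower_tail(x, y, n1, n2):
    """Probability of seeing y or fewer tags in library 2, given x in library 1."""
    return sum(ac_pmf(x, k, n1, n2) for k in range(y + 1))


n1, n2 = 10_000, 500_000

# 10 tags in the small library and 500 in the large one is the same rate,
# so the lower-tail probability is unremarkable.
same_rate = ac_lower_tail(10, 500, n1, n2)

# 10 tags in each library is a 50-fold drop in rate, so the lower-tail
# probability is tiny even though the raw counts are equal.
equal_counts = ac_lower_tail(10, 10, n1, n2)

print(same_rate, equal_counts)
```

Equal raw counts are therefore not treated as equal expression when the library sizes differ, which is why no separate normalization step is required.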

What does a tag count of 0.5 mean?

On the scatter plot, the first row of points has a Y value of 0.5, and the first column of points has an X value of 0.5. The 0.5 value is a stand-in for zero. That is, if the graph reads 0.5 for the X or Y of any point, you should interpret that X or Y value as zero.

This is simply because the graph is in log scale, and log(0) is undefined. Therefore, for any point whose X or Y value is zero, the zero is replaced with 0.5 on the graph.
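The substitution can be sketched as follows. The names `plot_coordinate` and `read_back` are hypothetical and only illustrate the idea; they are not part of DiscoverySpace.

```python
import math

ZERO_PLACEHOLDER = 0.5  # value drawn on the log-scaled axis in place of zero


def plot_coordinate(count):
    """Return the value to draw on a log-scaled axis for a tag count."""
    return ZERO_PLACEHOLDER if count == 0 else count


def read_back(coordinate):
    """Interpret a value read off the graph: 0.5 means a tag count of zero."""
    return 0 if coordinate == ZERO_PLACEHOLDER else coordinate


counts = [0, 1, 4, 9]

# math.log10(0) would raise an error, so zero counts are mapped to 0.5 first.
log_positions = [math.log10(plot_coordinate(c)) for c in counts]

print(log_positions)
```

Because real tag counts are whole numbers, 0.5 cannot be mistaken for a genuine count, and on a log scale it sits just below 1, so the zero-count points form the first row and first column of the plot.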

Page last modified Jun 04, 2010