Package Bio :: Package Cluster :: Class Record

Class Record

A Record stores the gene expression data and related information
     contained in a data file following the file format defined for
     Michael Eisen's Cluster/TreeView program. A Record
     has the following members:
data:     a matrix containing the gene expression data
mask:     a matrix containing only 1's and 0's, denoting which values
          are present (1) or missing (0). If all elements of mask are
          one (no missing data), then mask is set to None.
geneid:   a list containing a unique identifier for each gene
          (e.g., ORF name)
genename: a list containing an additional description for each gene
          (e.g., gene name)
gweight:  the weight to be used for each gene when calculating the
          distance
gorder:   an array of real numbers indicating the preferred order of the
          genes in the output file
expid:    a list containing a unique identifier for each experimental
          condition
eweight:  the weight to be used for each experimental condition when
          calculating the distance
eorder:   an array of real numbers indicating the preferred order of the
          experimental conditions in the output file
uniqid:   the string that was used instead of UNIQID in the input file.
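
The mask and weight conventions described above can be sketched in plain Python. This is only an illustration of the conventions, not the Bio.Cluster implementation: the helper names (build_mask, weighted_sq_distance), the use of None to mark a missing value, and the normalization by the sum of the weights are all assumptions made for this example.

```python
def build_mask(data):
    """Return a 1/0 mask for `data`, or None when nothing is missing.

    Missing values are represented here by None (an assumption of this
    sketch, not the way the Cluster file parser marks them).
    """
    mask = [[0 if value is None else 1 for value in row] for row in data]
    if all(flag == 1 for row in mask for flag in row):
        return None  # no missing data, so the mask is dropped
    return mask


def weighted_sq_distance(row1, row2, mask1, mask2, weights):
    """Weighted mean squared difference over positions present in both rows.

    The normalization by the total weight is an assumption of this sketch.
    """
    total = 0.0
    total_weight = 0.0
    for x, y, m1, m2, w in zip(row1, row2, mask1, mask2, weights):
        if m1 and m2:  # skip positions missing in either row
            total += w * (x - y) ** 2
            total_weight += w
    if total_weight == 0:
        raise ValueError("rows share no present values")
    return total / total_weight


# Two rows of three values each; the last value of the first row is missing.
data = [[1.0, 2.0, None], [3.0, 2.0, 5.0]]
mask = build_mask(data)
weights = [1.0, 1.0, 1.0]  # hypothetical per-column weights
d = weighted_sq_distance(data[0], data[1], mask[0], mask[1], weights)
```

Here the third column is ignored because it is missing in the first row, so only the first two columns contribute to the distance.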

Instance Methods
 
__init__(self, handle=None)
Reads a data file in the format corresponding to Michael Eisen's Cluster/TreeView program, and stores the data in a Record object.
 
treecluster(self, transpose=0, method='m', dist='e')
 
kcluster(self, nclusters=2, transpose=0, npass=1, method='a', dist='e', initialid=None)
 
somcluster(self, transpose=0, nxgrid=2, nygrid=1, inittau=0.02, niter=1, dist='e')
 
clustercentroids(self, clusterid=None, method='a', transpose=0)
 
clusterdistance(self, index1=[0], index2=[0], method='a', dist='e', transpose=0)
 
distancematrix(self, transpose=0, dist='e')
 
save(self, jobname, geneclusters=None, expclusters=None)
Saves the clustering results.
 
_savekmeans(self, filename, clusterids, order, transpose)
 
_savedata(self, jobname, gid, aid, geneindex, expindex)
Method Details

save(self, jobname, geneclusters=None, expclusters=None)

Saves the clustering results. The saved files follow the convention
of the Java TreeView program, which can therefore be used to view the
clustering result.
Arguments:
jobname:   The base name of the files to be saved. The filenames are
           jobname.cdt, jobname.gtr, and jobname.atr for
           hierarchical clustering, and jobname-K*.cdt,
           jobname-K*.kgg, jobname-K*.kag for k-means clustering
           results.
geneclusters=None:  For hierarchical clustering results,
           geneclusters is an (ngenes-1 x 2) array that describes
           the hierarchical clustering result for genes. This array
           can be calculated by the hierarchical clustering methods
           implemented in treecluster.
           For k-means clustering results, geneclusters is a vector
           containing ngenes integers, describing to which cluster a
           given gene belongs. This vector can be calculated by
           kcluster.
expclusters=None:  For hierarchical clustering results, expclusters
           is an (nexps-1 x 2) array that describes the hierarchical
           clustering result for experimental conditions. This array
           can be calculated by the hierarchical clustering methods
           implemented in treecluster.
           For k-means clustering results, expclusters is a vector
           containing nexps integers, describing to which cluster a
           given experimental condition belongs. This vector can be
           calculated by kcluster.
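
The argument descriptions above imply that a hierarchical result and a k-means result can be told apart by shape alone: (n-1) rows of 2 entries versus a flat vector of n integers. The following is a minimal sketch of that distinction and of the file names listed under jobname. It does not reproduce the save implementation: clustering_kind and expected_outputs are hypothetical helpers, the pairing of .gtr/.kgg with genes and .atr/.kag with experimental conditions is an assumption based on the TreeView naming convention, and the * in the k-means names is kept as a literal pattern because the text above does not expand it.

```python
def clustering_kind(clusters, nitems):
    """Classify a clustering result by its shape.

    Returns "hierarchical" for nitems-1 rows of 2 entries each, "kmeans"
    for a flat vector of nitems integers, and None when `clusters` is None.
    Only plain Python lists/tuples and ints are handled in this sketch.
    """
    if clusters is None:
        return None
    rows = list(clusters)
    if len(rows) == nitems - 1 and all(
        hasattr(row, "__len__") and len(row) == 2 for row in rows
    ):
        return "hierarchical"
    if len(rows) == nitems and all(isinstance(c, int) for c in rows):
        return "kmeans"
    raise ValueError("unrecognized clustering result shape")


def expected_outputs(jobname, geneclusters, expclusters, ngenes, nexps):
    """List the file names (or patterns) described for `jobname`."""
    gene_kind = clustering_kind(geneclusters, ngenes)
    exp_kind = clustering_kind(expclusters, nexps)
    names = []
    if "hierarchical" in (gene_kind, exp_kind):
        names.append(jobname + ".cdt")
        if gene_kind == "hierarchical":
            names.append(jobname + ".gtr")  # assumed: gene tree
        if exp_kind == "hierarchical":
            names.append(jobname + ".atr")  # assumed: experiment tree
    if "kmeans" in (gene_kind, exp_kind):
        names.append(jobname + "-K*.cdt")
        if gene_kind == "kmeans":
            names.append(jobname + "-K*.kgg")  # assumed: gene clusters
        if exp_kind == "kmeans":
            names.append(jobname + "-K*.kag")  # assumed: experiment clusters
    return names


# Hierarchical result for 4 genes: an (ngenes-1 x 2) array, no experiment tree.
tree_files = expected_outputs("job", [[0, 1], [2, 3], [-1, -2]], None, 4, 3)

# k-means result: one cluster number per gene and per experimental condition.
kmeans_files = expected_outputs("job", [0, 1, 0, 1], [0, 0, 1], 4, 3)
```

In this sketch, tree_files lists job.cdt and job.gtr, while kmeans_files lists the three job-K* patterns.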