An efficient way to perform clustering is to first select a subset of
important features. Removing unimportant features that mostly contribute noise
reduces the size of the data and makes the clustering more useful. One approach
is to use an unbiased method suited to unlabeled data, such as Principal
Component Analysis (PCA), to extract important features and reduce the
dimensionality. As we covered in A0.9, PCA gives us a set of uncorrelated
directions ordered by their variance. If we assume that the high-variance
directions are the most important for robust clustering, then we can discard
variables that load mainly on low-variance directions. The choice of PCA is also
strategic because PCA gives a low-dimensional summary of the data, helps detect
outliers, and can be used to visualize the data in an interpretable manner.
Using the first few PCs that capture most of the variation in the data, we can
then run k-means clustering in the PCA subspace. k-means minimizes the
within-cluster variance and is computationally inexpensive.
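
The pipeline described above (center the data, keep the first few PCs, cluster in that subspace) can be sketched roughly as follows. This is a minimal illustration using NumPy only, not a definitive implementation: the data are synthetic, and the function names `pca_project` and `kmeans`, the choice of two components, and `k=2` are all assumptions made for the example.

```python
import numpy as np

def pca_project(X, n_components):
    """Center X and project its rows onto the top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of total variance per PC
    return Xc @ Vt[:n_components].T, explained

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct data points.
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if empty).
        updated = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

# Synthetic data (assumption): two groups that differ in 2 of 10 features,
# with the remaining 8 features being low-variance noise.
rng = np.random.default_rng(42)
signal = np.vstack([rng.normal(-5, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
noise = rng.normal(0, 0.5, (100, 8))
X = np.hstack([signal, noise])

Z, explained = pca_project(X, n_components=2)
labels, centroids = kmeans(Z, k=2)
```

In practice the number of PCs to keep would be chosen by inspecting `explained` (for example, keeping enough components to cover most of the variance), and k-means would usually be restarted from several seeds, since Lloyd's algorithm only finds a local optimum.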

If we instead wanted to use a set of pre-determined features based on
biological knowledge, we could consider variables that are relatively easy to
measure, such as (1) the length of the hairpin stem, (2) the ratio of purines to
pyrimidines in the hairpin loop, (3) the distance from the 3′-end of the mRNA,
and (4) the distance from the 5′-end of the mRNA, and then run k-means
clustering on these four features. We use k-means rather than a hierarchical
clustering algorithm because we have little basis to believe that the underlying
data has a hierarchical structure, nor do we necessarily want to recover a
hierarchy. A possible pitfall is that this method of feature selection can lead
to a high clustering error, since we may lose valuable information by throwing
away potentially useful features. Even so, k-means clustering will generate
centroids that are easy to understand and to use in subsequent biological
studies.
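
As a rough illustration of how the four features listed above might be computed before clustering, here is a small sketch. Everything about the input representation is an assumption made for the example: the `Hairpin` record and its field names are hypothetical, the structure is taken to be a single simple hairpin written in dot-bracket notation, stem length is counted in base pairs, and distances are counted in nucleotides. Because the four features have very different scales, the sketch also z-scores each column, which is a common precaution for distance-based methods such as k-means.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class Hairpin:
    # Hypothetical record; field names are illustrative only.
    mrna_length: int  # total mRNA length in nucleotides
    start: int        # 0-based index of the hairpin's first nucleotide
    sequence: str     # hairpin sequence, 5' to 3'
    structure: str    # dot-bracket string of the same length

def hairpin_features(h):
    """Return [stem length, purine/pyrimidine ratio, 3' distance, 5' distance]."""
    stem_length = h.structure.count("(")  # number of base pairs
    # The loop is the unpaired stretch between the last "(" and the first ")".
    loop = h.sequence[h.structure.rfind("(") + 1 : h.structure.find(")")].upper()
    purines = sum(loop.count(base) for base in "AG")
    pyrimidines = sum(loop.count(base) for base in "CU")
    # Loops without pyrimidines give an infinite ratio and need special
    # handling before clustering; this sketch does not attempt that.
    ratio = purines / pyrimidines if pyrimidines else float("inf")
    dist_5p = h.start
    dist_3p = h.mrna_length - (h.start + len(h.sequence))
    return [stem_length, ratio, dist_3p, dist_5p]

def zscore_columns(rows):
    """Scale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]

# Made-up example hairpins (not real data).
hairpins = [
    Hairpin(mrna_length=1200, start=100,
            sequence="GGGCUUCGGCCC", structure="((((....))))"),
    Hairpin(mrna_length=900, start=700,
            sequence="CCGAUACGG", structure="(((...)))"),
]
features = [hairpin_features(h) for h in hairpins]
scaled = zscore_columns(features)
```

The rows of `scaled` are what would be passed to k-means. Centroids found in the scaled space can be mapped back to the original units by reversing the z-score, which keeps them interpretable as a typical stem length, loop composition, and position along the mRNA.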
