Biochemical characterization of non-canonical DNAs has become increasingly important because of their biological relevance. X-ray crystallography (Chandrasekhar et al., 2010; Neidle et al., 2008; Wang et al., 1981), nuclear magnetic resonance (NMR) spectroscopy (Patel et al., 2007) and circular dichroism (Kypr et al., 2009) are the methods most widely used to study the structure of non-canonical DNAs. However, these biophysical methods cannot achieve large-scale annotation because of their cost in money, time and effort. Computational prediction and mapping of non-canonical DNA, based on biological and biophysical properties, are therefore in high demand for genome-wide annotation and the subsequent biological interpretation of non-canonical DNA. Such genome-wide annotation has enabled researchers to estimate the distribution and abundance of non-canonical DNA elements in the genome and has provided insight into their functions in DNA metabolism. Genome-wide analyses have shown that DNA structure-induced genomic flexibility is a source of both disease and evolution. Many studies have followed this methodology to analyze large-scale genome evolution and the regulation of the transition from B-DNA to non-B DNA (Huppert & Balasubramanian, 2005; Khuu et al., 2007; Moyzis et al., 1989; Schroth & Ho, 1995; Stallings et al., 1990; Todd et al., 2005; G. Wang et al., 2006).
Computational approaches for predicting non-canonical DNAs in the genome
Many strategies for identifying non-canonical DNA motifs are ab initio predictions based on biophysical and biochemical properties. These pattern discovery approaches include (i) word-based methods using regular expressions (RE), (ii) probabilistic sequence models based on position weight matrices (PWM), and (iii) entropy-based approaches. Word-based (string-based) methods depend on exhaustive enumeration, comparing and counting oligonucleotide frequencies using regular expressions (van Helden, André, & Collado-Vides, 1998). Word-based methods are exhaustive and globally optimal. Probabilistic methods rely on probabilistic models of the regions of interest. In these methods, the motif is represented by a position probability matrix, while the remaining sequence is described by a background model. The parameters of probabilistic methods are estimated by maximum likelihood. These methods are not guaranteed to be globally optimal and may converge to a locally optimal solution. Entropy-based approaches apply the principles of thermodynamics to calculate the difference in free energy (ΔG) between the B-DNA and the non-canonical conformation of a sequence motif, and thereby evaluate the stability of the non-canonical DNA. In the following sections, the known prediction methods for four representative non-canonical DNAs (cruciform, triplex, quadruplex and Z-conformation DNAs) are described.
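
To make the first two strategies concrete, the following is a minimal Python sketch, not an implementation of any published tool. The word-based part scans a sequence with a regular expression; the pattern used (four runs of at least three guanines separated by loops of 1 to 7 nucleotides) is an illustrative assumption for a quadruplex-like motif and is not specified in the text above. The probabilistic part scores each window of the sequence as a log-odds ratio between a position probability matrix and a background model. The matrix `TOY_PPM`, the uniform background, and all function names are hypothetical and chosen only for illustration.

```python
import math
import re

# Assumed, illustrative pattern: four G-runs (>= 3 G) separated by loops of 1-7 nt.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

# Hypothetical 3-column position probability matrix with consensus "AGC".
# No entry is zero, so every log-odds term is finite.
TOY_PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

# Assumed background model: all four bases equally likely.
UNIFORM_BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def find_word_matches(sequence, pattern=G4_PATTERN):
    """Word-based search: return (start, end, text) for each non-overlapping hit."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]


def pwm_log_odds(window, ppm, background):
    """Sum over positions of log2(P(base | motif column) / P(base | background)).

    Assumes the window contains only A, C, G and T.
    """
    if len(window) != len(ppm):
        raise ValueError("window length must equal the number of matrix columns")
    return sum(
        math.log2(column[base] / background[base])
        for column, base in zip(ppm, window)
    )


def scan_pwm(sequence, ppm, background=UNIFORM_BACKGROUND):
    """Score every window of the sequence; return a list of (start, score)."""
    seq = sequence.upper()
    width = len(ppm)
    return [
        (i, pwm_log_odds(seq[i:i + width], ppm, background))
        for i in range(len(seq) - width + 1)
    ]


if __name__ == "__main__":
    print(find_word_matches("TTGGGAGGGTCGGGATGGGTT"))
    print(max(scan_pwm("TTAGCTT", TOY_PPM), key=lambda hit: hit[1]))
```

The regular expression either matches a window or not, which reflects the exhaustive, all-or-nothing character of word-based methods, whereas the log-odds score ranks every window and requires a threshold to call a motif. In practice the matrix would be estimated from data, for example by maximum likelihood, rather than written by hand as here.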