ARTADE (ARabidopsis Tiling-Array-based Detection of Exons) is a standard tool for the annotation of genome-wide tiling-array data in Arabidopsis. ARTADE was originally written by Dr. Tetsuro Toyoda, Genomic Sciences Center, RIKEN, Japan. ARTADE (current version 1.1.7) and its C++ source code can be freely downloaded and used by academic users under the license agreement.
>> Application and source code are available from Sourceforge
>> ARTADE Manual (pdf)
>> ARTADE Tutorial
>> Search database (HTML only)
Tiling array-driven elucidation of transcriptional structures based on maximum-likelihood and Markov models.
Tetsuro Toyoda and Kazuo Shinozaki
The Plant Journal. Volume 43 Issue 4 Page 611-621, August 2005
The figure shows the predicted gene structures and array-intensity data: a snapshot of the OmicBrowse window in which the predicted gene structures are drawn in orange above and below the intensity graphs for flower, root, suspension cell culture, and light-grown seedling data. A graph is drawn in red where the value is higher than the median of overall intensities and in green where it is lower. The top row shows the MIPS annotations of Arabidopsis genes.
Tiling arrays of high-density oligonucleotide probes spanning the entire genome are powerful tools for the discovery of new genes. However, it is very difficult to determine the structure of the spliced product of a structurally unknown gene from noisy array signals. ARTADE estimates the precise splicing points and the exon/intron structure of a structurally unknown gene by maximizing the likelihood of observed intensities and sequences based on the combined model of a threshold-based likelihood and a bi-directional Markov model. It was confirmed that gene-expression values predicted by ARTADE without reference to structural information were highly correlated with those calculated in the conventional fashion from known gene structures. ARTADE therefore opens up new possibilities for combined analysis of structurally known and unknown genes and for the discovery of new coding and non-coding transcripts, and offers the first tool to annotate tiling-array data.
To better link gene prediction to expression data, a new method is needed that estimates exon and intron regions from a sequence of expression intensities mapped onto the genomic sequence, together with a statistical model that “converts” expression intensity signals to exon/intron signals. The proposed statistical model predicts exons based on expression intensities and the underlying genomic sequences. In this model, the observed intensity at each position is treated as a stochastic variable whose probability distribution depends on a hidden state indicating whether the position belongs to an exon, an intron, or an inter-genic region. The statistical significance of expression is evaluated for each genomic region by a binomial distribution test in order to identify all significantly expressed putative exons. Subsequently, the prediction of the exon/intron structure proceeds bi-directionally toward the 5’ and 3’ ends by maximizing the likelihood of observed intensities and sequences based on a Markov model, a type of model that has been applied to biological sequence analysis but not previously to tiling-array data analysis. ARTADE was applied to whole-genome array data in Arabidopsis thaliana. The result is shown in the OmicBrowse database.
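The binomial test for significantly expressed regions can be illustrated with a small sketch. This is not ARTADE's implementation: the function names, the sliding window of five probes, the median as the intensity threshold, the null probability of 0.5 that a probe exceeds that threshold, and the significance level of 0.05 are all assumptions chosen for illustration.

```python
from math import comb
from statistics import median


def binomial_tail(k: int, n: int, p: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def significant_windows(intensities, threshold, window=5, p_above=0.5, alpha=0.05):
    """Return (start, end, p_value) for each window of probes in which the
    number of intensities above `threshold` is unlikely under the null
    hypothesis that each probe exceeds it with probability `p_above`.

    All parameter defaults are illustrative assumptions, not ARTADE settings.
    """
    hits = []
    for start in range(len(intensities) - window + 1):
        probes = intensities[start:start + window]
        k = sum(1 for x in probes if x > threshold)  # probes above threshold
        p_value = binomial_tail(k, window, p_above)
        if p_value < alpha:
            hits.append((start, start + window, p_value))
    return hits


# Toy probe intensities along one strand (hypothetical values).
signal = [1, 1, 1, 9, 9, 9, 9, 9, 1, 1]
candidates = significant_windows(signal, threshold=median(signal))
```

In this toy input only the window covering the five high-intensity probes passes the test. In a full pipeline such windows would serve as seeds from which the exon/intron structure is extended toward the 5’ and 3’ ends; that extension step is not shown here.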