ChIP-chip has become a popular technique for identifying genome-wide in vivo protein-DNA interactions. With genome tiling microarrays commercially available from Affymetrix, NimbleGen and Agilent, more and more academic laboratories are adopting this technology to detect cis-regulatory elements in mammalian genomes. Despite the importance of ChIP-chip, few web servers integrate the necessary downstream analysis functions with the capacity to process genome-scale sets of ChIP-regions. So far, all the major ChIP-chip papers in mammalian systems have been made possible by strong bioinformatics support (e.g. Rick Young with David Gifford, Mike Snyder with Mark Gerstein, Kevin Struhl with Tom Gingeras, and Myles Brown with X. Shirley Liu), which is not available to smaller labs.
The Cis-regulatory Element Annotation System (CEAS) integrates many useful tools to simplify ChIP-chip analysis for biologists. It can handle hundreds or thousands of regions from high-throughput ChIP-chip experiments. Given genome-scale ChIP-regions in UCSC Genome Browser BED (.bed) format, our CEAS server retrieves information from different sources to help with downstream analysis. Specifically, it provides the following information:
1. Fully repeat-masked genomic DNA sequences of the ChIP-regions, for qPCR validation and transcription factor motif finding. The DNA retrieval function of the current UCSC Genome Browser does not remove segmental duplications and simple repeats, which can complicate qPCR primer design and sequence motif finding.
2. The GC content and evolutionary conservation of each ChIP-region, and their averages over all ChIP-regions. CEAS uses PhastCons conservation scores from UCSC Genome Bioinformatics, which are based on multiz alignments of human, chimp, mouse, rat, dog, chicken, fugu, and zebrafish genomic DNA. CEAS generates a thumbnail conservation plot for each ChIP-region and an average conservation plot over all the ChIP-regions. Both can be used directly in biologists' ChIP-chip manuscripts.
3. ChIP-region nearby gene mapping. CEAS examines sequences both upstream and downstream, on both strands, to map the nearest RefSeq and miRNA genes up to 300 kb away. In each direction, CEAS reports the distance between a ChIP-region and its nearest gene. When a ChIP-region lies within a gene, CEAS reports whether it maps to the 5'UTR, 3'UTR, a coding exon, or an intron. CEAS also provides summary statistics on the locations of all the ChIP-regions based on this gene mapping.
4. Transcription factor motif finding on the fully repeat-masked ChIP sequences. CEAS finds TRANSFAC and JASPAR motifs that are enriched in the ChIP-regions. These are putative binding motifs for the transcription factor of interest (the one against which the ChIP-chip experiment is conducted) and for its cooperative binding partners. For each enriched motif, CEAS provides a sequence logo, the enrichment fold change, and a p-value, and it combines redundant enriched motifs. CEAS pre-computes all motif occurrence information and stores it in the database, whereas current TRANSFAC motif-matching programs cannot handle thousands of input sequences.
In summary, CEAS does three things. It retrieves useful information (e.g. sequence retrieval) for validating ChIP-chip experiments. It assembles important knowledge (e.g. conservation plots, nearby gene mapping, and motif logos) for inclusion in biologists' publications. It also generates useful hypotheses (e.g. cooperative transcription factor partners) for further study.
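As a rough illustration of items 1 and 2, the Python sketch below parses a BED line, hard-masks a sequence, and computes GC content. It makes two assumptions. First, the input sequence is soft-masked, so that lowercase bases mark repeats, as in UCSC genome downloads. Second, additional repeat intervals, such as segmental duplications, are supplied separately as offsets. The helper names (`parse_bed_line`, `hard_mask`, `gc_content`, `average_gc`) are hypothetical and are not part of CEAS.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass
class Region:
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive
    name: str = ""

def parse_bed_line(line: str) -> Region:
    """Parse the first three (or four) whitespace-separated BED columns."""
    fields = line.split()
    name = fields[3] if len(fields) > 3 else ""
    return Region(fields[0], int(fields[1]), int(fields[2]), name)

def hard_mask(seq: str, extra_masks: Sequence[Tuple[int, int]] = ()) -> str:
    """Turn soft-masked (lowercase) bases into 'N', then also mask each
    half-open (start, end) interval in extra_masks (offsets within seq).
    extra_masks is an assumed stand-in for e.g. segmental duplications."""
    chars = ["N" if c.islower() else c for c in seq]
    for s, e in extra_masks:
        for i in range(max(s, 0), min(e, len(chars))):
            chars[i] = "N"
    return "".join(chars)

def gc_content(seq: str) -> Optional[float]:
    """GC fraction over A/C/G/T bases, ignoring N (masked) positions.
    Meant for hard-masked input; returns None if no A/C/G/T bases remain."""
    acgt = [c for c in seq.upper() if c in "ACGT"]
    if not acgt:
        return None
    return sum(c in "GC" for c in acgt) / len(acgt)

def average_gc(seqs: List[str]) -> Optional[float]:
    """Mean of per-region GC fractions, skipping fully masked regions."""
    values = [v for v in (gc_content(s) for s in seqs) if v is not None]
    return sum(values) / len(values) if values else None

region = parse_bed_line("chr1\t100\t112\tpeak1")
masked = hard_mask("ACGTacgtGGCC", [(10, 12)])
print(region, masked, gc_content(masked))
```

Masking with `N` rather than deleting repeat bases keeps every position aligned with the original BED interval. This alignment matters when primer or motif positions are mapped back to the genome.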
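For the conservation summaries in item 2, the sketch below shows one plausible way to compute a per-region mean score and an average profile across regions. It assumes that per-base PhastCons scores have already been loaded into memory as one list per chromosome, with `None` wherever no score exists. The real UCSC scores are distributed as wiggle files. Aligning regions on their midpoints is also an assumption, and the function names are hypothetical illustrations rather than CEAS's own code.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Scores = Sequence[Optional[float]]

def mean_score(scores: Scores, start: int, end: int) -> Optional[float]:
    """Mean conservation score over [start, end), skipping missing bases."""
    vals = [s for s in scores[start:end] if s is not None]
    return sum(vals) / len(vals) if vals else None

def average_profile(regions: List[Tuple[str, int, int]],
                    scores_by_chrom: Dict[str, Scores],
                    half_width: int) -> List[Optional[float]]:
    """Average score at each offset -half_width..+half_width from the
    midpoint of every region (the assumed alignment point)."""
    width = 2 * half_width + 1
    sums = [0.0] * width
    counts = [0] * width
    for chrom, start, end in regions:
        scores = scores_by_chrom.get(chrom, [])
        center = (start + end) // 2
        for k in range(width):
            pos = center - half_width + k
            if 0 <= pos < len(scores) and scores[pos] is not None:
                sums[k] += scores[pos]
                counts[k] += 1
    # Offsets with no data at all are reported as None rather than 0.
    return [s / c if c else None for s, c in zip(sums, counts)]

scores = {"chr1": [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, None, 0.5, 0.5]}
regions = [("chr1", 2, 5), ("chr1", 6, 9)]
print([mean_score(scores["chr1"], s, e) for _, s, e in regions])
print(average_profile(regions, scores, half_width=1))
```

Each offset is averaged only over the regions that have a score at that position. As a result, a few unscored bases do not drag the profile toward zero.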
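Item 3 can be illustrated with a simplified classifier. The sketch below uses gene records whose fields loosely follow the UCSC refGene table: transcript start and end, CDS start and end, and exon intervals, all 0-based and half-open. It classifies each region by its midpoint, then reports the nearest gene edge on each genomic side within 300 kb. The source does not specify how CEAS measures these quantities, so the midpoint rule, the edge-to-edge distance convention, the "intergenic" label, and all names here are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Gene:
    # Fields loosely modeled on UCSC refGene columns; 0-based, half-open.
    name: str
    chrom: str
    strand: str                     # "+" or "-"
    tx_start: int
    tx_end: int
    cds_start: int                  # equal to cds_end for non-coding genes
    cds_end: int
    exons: List[Tuple[int, int]]

def feature_at(gene: Gene, pos: int) -> str:
    """Classify a position that lies inside the gene's transcribed span."""
    if not any(s <= pos < e for s, e in gene.exons):
        return "intron"
    if gene.cds_start == gene.cds_end:
        return "noncoding exon"
    if gene.cds_start <= pos < gene.cds_end:
        return "coding exon"
    left_of_cds = pos < gene.cds_start  # genomic orientation
    # On the minus strand, the genomically left UTR is the 3'UTR.
    if gene.strand == "+":
        return "5'UTR" if left_of_cds else "3'UTR"
    return "3'UTR" if left_of_cds else "5'UTR"

def map_region(chrom: str, start: int, end: int, genes: List[Gene],
               max_dist: int = 300_000) -> Dict[str, object]:
    """Genes containing the region midpoint, plus the nearest gene on each
    genomic side within max_dist, given as (name, distance) pairs."""
    mid = (start + end) // 2
    inside: List[Tuple[str, str]] = []
    left: Optional[Tuple[str, int]] = None
    right: Optional[Tuple[str, int]] = None
    for g in genes:
        if g.chrom != chrom:
            continue
        if g.tx_start <= mid < g.tx_end:
            inside.append((g.name, feature_at(g, mid)))
        elif g.tx_end <= mid:   # gene lies left of the midpoint
            d = max(0, start - g.tx_end)
            if d <= max_dist and (left is None or d < left[1]):
                left = (g.name, d)
        else:                   # gene lies right of the midpoint
            d = max(0, g.tx_start - end)
            if d <= max_dist and (right is None or d < right[1]):
                right = (g.name, d)
    return {"inside": inside, "left": left, "right": right}

def summarize(mappings: List[Dict[str, object]]) -> Counter:
    """Count regions per location class (first overlapping gene wins)."""
    counts: Counter = Counter()
    for m in mappings:
        inside = m["inside"]
        counts[inside[0][1] if inside else "intergenic"] += 1
    return counts

genes = [
    Gene("G1", "chr1", "+", 1000, 5000, 1200, 4500, [(1000, 1500), (3000, 5000)]),
    Gene("G2", "chr1", "-", 10000, 12000, 10500, 11500, [(10000, 12000)]),
]
mapping = [map_region("chr1", s, e, genes)
           for s, e in [(1100, 1120), (2000, 2020), (7000, 7100)]]
print(summarize(mapping))
```

Scanning every gene is fine for a sketch. For thousands of regions genome-wide, sorting genes per chromosome and using binary search (e.g. `bisect`) would be the natural next step.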
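Item 4 mentions a fold change and a p-value for each motif, but the source does not say which statistical test CEAS uses. The sketch below assumes that each scanned position in the ChIP-regions is a Bernoulli trial whose hit rate equals the rate observed in a background set. Under that assumption, it computes the fold change as observed hits over expected hits, and the p-value as a one-sided binomial upper tail. Motif hit counts (for example, from pre-computed TRANSFAC or JASPAR matches) are taken as given, and the function names are hypothetical.

```python
import math

def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    # log pmf(k) = log C(n, k) + k log p + (n - k) log(1 - p)
    log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * log_p + (n - k) * log_q)
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_term)
        total += term
        # Past the mean, terms only shrink; stop once they are negligible.
        if i > n * p and term <= total * 1e-16:
            break
        if i < n:
            # pmf(i + 1) / pmf(i) = (n - i) / (i + 1) * p / (1 - p)
            log_term += math.log(n - i) - math.log(i + 1) + log_p - log_q
    return min(total, 1.0)

def motif_enrichment(chip_hits: int, chip_sites: int,
                     bg_hits: int, bg_sites: int) -> tuple:
    """Fold change (observed / expected hits) and one-sided binomial p-value."""
    bg_rate = bg_hits / bg_sites
    expected = bg_rate * chip_sites
    fold = chip_hits / expected if expected > 0 else math.inf
    return fold, binom_sf(chip_hits, chip_sites, bg_rate)

fold, pval = motif_enrichment(chip_hits=30, chip_sites=10000,
                              bg_hits=100, bg_sites=100000)
print(fold, pval)
```

Thousands of ChIP-regions yield millions of scanned positions. Working with `math.lgamma` rather than exact binomial coefficients avoids overflowing a float at that scale, and the early stop keeps the tail sum short.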