This year we will have two challenge datasets. The first was already used in CAMDA 2006, but given the interest it raised we propose it again this year. This dataset, from the CDC Chronic Fatigue Syndrome Research Group, contains gene expression, proteomic, SNP, and clinical data. We hope it will foster integrative, biologically goal-oriented analysis. We also propose a second dataset composed of almost 6000 arrays of diseased and normal human samples and cell lines collected from ArrayExpress and GEO. This second dataset poses challenging questions on large-scale data analysis and visualization.
CDC Chronic Fatigue Syndrome Research Group
New additional data for the 2007 contest. Contains proteomics, SNP and medication data:
Contest data 2006. Contains microarray, proteomics, SNP, and clinical data:
| Data | *Subjects | Description | Format |
| microarray | 177 | Single-channel, gold labelling | CSV, TIFF |
| proteomics | 60 | SELDI, 6 fractions per sample | CSV, XML |
| SNPs | 50 | Single nucleotide polymorphisms | CSV |
| clinical | 227 | Blood profile, urine metabolite, physical, demographic | CSV |
*Data on all subjects will be released if it becomes available.
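Because the four data types cover different numbers of subjects, an integrative analysis usually starts by finding the subjects present in every table. The following is a minimal sketch of that step, not part of the official contest material. The column name `subject_id`, the helper functions, and the toy CSV contents are all assumptions made for illustration; the real files may use different identifiers and layouts.

```python
import csv
import io


def subject_ids(csv_text, id_column="subject_id"):
    """Return the set of subject identifiers found in one CSV table.

    `id_column` is a hypothetical column name; adjust it to the real files.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column] for row in reader}


def shared_subjects(tables, id_column="subject_id"):
    """Return the sorted identifiers that appear in every table of `tables`."""
    id_sets = [subject_ids(text, id_column) for text in tables.values()]
    return sorted(set.intersection(*id_sets)) if id_sets else []


# Toy stand-ins for the real files; all values are invented for illustration.
toy_tables = {
    "clinical": "subject_id,age\nS1,34\nS2,51\nS3,47\n",
    "microarray": "subject_id,probe_1\nS1,7.2\nS3,6.9\n",
    "snp": "subject_id,rs_example\nS3,AG\nS1,AA\nS4,GG\n",
}

common = shared_subjects(toy_tables)
print(common)  # ['S1', 'S3']
```

With the real data, the same functions could be fed the text of each downloaded CSV file once the actual identifier column is known.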
Downloadable data (zip format and gzip/tar format) (ftp.camda.duke.edu/CAMDA06_DATASETS/):
Some additional information can be found at the CAMDA06 site: http://www.camda.duke.edu/camda06/datasets/index.html
If you experience problems downloading these files, try the alternative location at ftp.camda.duke.edu/CAMDA06_DATASETS/
META-analysis dataset
This dataset contains almost 6000 arrays of diseased and normal human samples and cell lines collected from ArrayExpress and GEO. All the samples were hybridized to the Affymetrix GeneChip Human Genome HG-U133A array.
To download the samples, go to the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) and query the experiment accession E-TABM-185.