Contest datasets

As is traditional in CAMDA contests, neither we nor the producers of the data can advise individuals on the datasets, as dealing with the files forms part of the analysis challenge. There is, however, an open forum where participants can freely discuss the contest data sets, and you are encouraged to take part.

We look forward to a lively contest!

Dataset

This year's conference will focus on the promise of gaining better insight from the integration of heterogeneous large-scale data. As the contest data set, we have identified the Glioblastoma multiforme subset of The Cancer Genome Atlas (TCGA) as a particularly interesting challenge.

This repository is unusual in that it provides publicly, for several hundred patients, profiles of

complemented by a variety of clinical parameters and survival outcomes. Sometimes, additional results are available from alternative technologies / platforms. Note that the data can be downloaded at different abstraction levels, from raw (Level 1) via normalized (Level 2) to processed (Level 3), also facilitating integration by non-domain experts.
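To make the integration idea concrete, here is a minimal sketch of joining per-sample measurements from different platforms to clinical records by patient. It assumes sample identifiers whose first three dash-separated fields identify the patient, as in TCGA barcodes. The function names, the platform names, and all values are hypothetical and chosen only for illustration. Averaging replicate samples per patient is likewise just one possible choice, not a recommendation.

```python
from collections import defaultdict


def patient_id(barcode):
    """Return the patient part of a barcode: its first three dash-separated fields."""
    return "-".join(barcode.split("-")[:3])


def integrate(clinical, platforms):
    """Attach one value per platform to each clinical record.

    clinical:  dict mapping patient id -> dict of clinical fields
    platforms: dict mapping platform name -> dict of sample barcode -> value
    Replicate samples of a patient are averaged (an illustrative choice).
    Patients without a clinical record are ignored.
    """
    merged = {pid: dict(record) for pid, record in clinical.items()}
    for name, samples in platforms.items():
        per_patient = defaultdict(list)
        for barcode, value in samples.items():
            per_patient[patient_id(barcode)].append(value)
        for pid, values in per_patient.items():
            if pid in merged:
                merged[pid][name] = sum(values) / len(values)
    return merged


# Toy data: every identifier and value below is made up.
clinical = {
    "TCGA-00-0001": {"survival_days": 300},
    "TCGA-00-0002": {"survival_days": 450},
}
platforms = {
    "expression": {
        "TCGA-00-0001-01A": 2.0,
        "TCGA-00-0001-01B": 4.0,  # replicate sample of the same patient
        "TCGA-00-0002-01A": 5.0,
    },
    "copy_number": {
        "TCGA-00-0001-01A": 1.0,
        "TCGA-00-0003-01A": 2.0,  # no clinical record, so it is dropped
    },
}

merged = integrate(clinical, platforms)
```

Patients missing from a platform simply lack that key in the merged record, so downstream analyses have to decide explicitly how to handle incomplete profiles.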

A large number of interesting questions can be addressed with this data collection, and analyses may of course focus on subsets of the data. The outline below is meant as inspiration and makes no claim to being comprehensive!

Practical challenges / insight
Statistical challenges
Software engineering challenges

Data download

The data is available online from the TCGA portal for this collection:

Feel free to browse the site; all publicly available data relating to this collection can be used as part of the contest. Some relevant pointers:

References

This selection of papers provides a first impression of typical analyses already performed on the contest data set.

(1) The Cancer Genome Atlas Research Network. (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455(7216):1061-1068.

(2) Freire, P., Vilela, M., Deus, H., Kim, Y.W., Koul, D., Colman, H., Aldape, K.D., Bogler, O., Yung, W.K.A., Coombes, K., et al. (2008) Exploratory analysis of the copy number alterations in glioblastoma multiforme. PLoS One 3(12):e4076.

(3) Cerami, E., Demir, E., Schultz, N., Taylor, B.S. and Sander, C. (2010) Automated network analysis identifies core pathways in glioblastoma. PLoS One 5(2):e8918.

(4) Bredel, M., Scholtens, D.M., Harsh, G.R., Bredel, C., Chandler, J.P., Renfrow, J.J., Yadav, A.K., Vogel, H., Scheck, A.C., Tibshirani, R., et al. (2009) A network model of a cooperative genetic landscape in brain tumors. JAMA 302(3):261-275.

Other relevant article references can be found in the TCGA list of publications.