GeneSigDB provides several options for downloading data. You may individually select search results in order to create your own custom compressed file containing data files for your chosen gene signatures.

If you wish to download a larger set of our curated data, it may be easier to download all of the GeneSigDB data. We have therefore provided several files here that encompass all of our public data. Please feel free to download them and use the data for any analysis you require. If you use GeneSigDB, we would be grateful if you would cite our publications in the Database issue of Nucleic Acids Research.

Note that the ALL_SIGS files below are in a processed GSEA MSigDB (gmt) format. The first column contains the GeneSigDB signature identifier (signature id), which is normally composed of a PubMed ID along with a table or supplement name. The second column is a comment column that contains both the organism and a signature name. All subsequent columns, starting with the third, contain gene symbols identifying the members of the signature.
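
The column layout described above can be read into R with a few lines of base code. The following is a minimal sketch, not part of GeneSigDB itself: the function name read_gmt, the demo file and the signature ids, comments and gene lists in it are all invented for illustration, and it assumes the fields are tab-separated as in the standard gmt format.

```r
# Parse a gmt file: one signature per line, tab-separated fields.
# Field 1 = signature id, field 2 = comment (organism and signature name),
# fields 3 onwards = gene symbols.
read_gmt <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]                      # skip empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sigs <- lapply(fields, function(f) {
    genes <- if (length(f) > 2) f[-(1:2)] else character(0)
    list(id = f[1],
         comment = if (length(f) >= 2) f[2] else NA_character_,
         genes = genes[nzchar(genes)])               # drop empty padding cells
  })
  names(sigs) <- vapply(sigs, function(s) s$id, character(1))
  sigs
}

# Demo on a two-line made-up file (ids, comments and genes are invented).
demo_file <- tempfile(fileext = ".gmt")
writeLines(c("12345678-Table1\tHuman Example_signature\tTP53\tBRCA1\tMYC",
             "87654321-SuppTable2\tMouse Another_signature\tTrp53\tMyc"),
           demo_file)
sigs <- read_gmt(demo_file)
print(names(sigs))
print(sigs[["12345678-Table1"]]$genes)
```

The result is a named list keyed by signature id, so a single signature can be pulled out by name and its gene vector passed to other tools.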

Current Release

The current data release of GeneSigDB is Release 4, released September 2011.

GeneSigDB Release 4

Release 4 was released September 2011. It was annotated using EnsEMBL 63.0. GeneSigDB release 4 contains 3,515 gene signatures curated from publications that focus on gene signatures of cancer, lung disease, viral, immune cells, development and stem cell biology.

GeneSigDB v4 Release Notes


More Resources for GSA in R


Archives of GeneSigDB (previous releases of GeneSigDB)

These are archives of previous GeneSigDB releases. The gmt files contain the mapped EnsEMBL identifiers and gene symbols. If you wish to regenerate the mapping between any signature in an archived GeneSigDB release and any gene identifiers, download the compressed archive below and use BioMart with the corresponding archive version of EnsEMBL.

GeneSigDB Release 3

Release 3 was released December 2010. It was annotated using EnsEMBL 59.0. GeneSigDB release 3 contains 2,142 gene signatures curated from publications that focus on gene signatures of cancer, lung disease, viral and stem cell biology.

GeneSigDB v3 Release Notes

GeneSigDB Release 2 (March 2010)

Release 2 was released March 2010. It was annotated using EnsEMBL 56.0. GeneSigDB release 2 contains nearly 1,000 gene signatures curated from publications that focus on gene signatures of cancer, viral and stem cell biology.

GeneSigDB v2 Release Notes

GeneSigDB Release 1 (August 2009)

Release 1 was released August 2009. It was annotated using EnsEMBL 55.0. GeneSigDB release 1 contained 560 gene signatures from cancer and stem cell articles covering human, mouse and rat.

GeneSigDB v1 Release Notes

To regenerate the mapping between any signature in GeneSigDB release 1.0 and gene identifiers other than EnsEMBL gene identifiers or gene symbols, download the compressed archive below. Then map the EnsEMBL gene identifiers in the xxx-standardized.txt files, where xxx is the GeneSigDB ID, using EnsEMBL 55.0.


Mapping EnsEMBL ids to other Gene Identifiers using Archives of EnsEMBL

Using biomaRt and Bioconductor

You can connect to the ensembl.org archive marts by specifying the URL of the archive version you would like to use:

	##--------
	# Load Libraries
	##--------
	library(biomaRt)

	##------------
	# Sample of genes for demo purposes
	##------------
	genes <- c("ENSG00000221968","ENSG00000170954","ENSG00000136146","ENSG00000132670")


	##------------
	# EnsEMBL archive URLs
	# (the jul2009 archive serves EnsEMBL 55, as listMarts reports)
	##------------
	urls = list(EnsEMBL55= "jul2009.archive.ensembl.org",
		EnsEMBL57= "mar2010.archive.ensembl.org",
		EnsEMBL58= "may2010.archive.ensembl.org",
		EnsEMBL59= "aug2010.archive.ensembl.org")

	# Select the archive to use
	EnsEMBLarchiveURL = urls[1]

	## List the marts available on that archive
	print(EnsEMBLarchiveURL)
	print(listMarts(host=EnsEMBLarchiveURL,path="/biomart/martservice",archive=FALSE)[1:2,])

	##-------------
	# Short example using biomaRt
	##-------------
	mart = useMart("ENSEMBL_MART_ENSEMBL", host=EnsEMBLarchiveURL,path="/biomart/martservice",archive=FALSE)
	datasets <- listDatasets(mart)
	mart<-useDataset("hsapiens_gene_ensembl",mart)
	getBM(attributes=c("ensembl_gene_id", "affy_hg_u133a","hgnc_symbol","chromosome_name","band"),filters="ensembl_gene_id",values=genes, mart=mart)

Here are the results if you use different EnsEMBL archives. Note that although the gene information is mostly the same, updates to the genome mean that the mapping of probes may change.

	> # Select the Archive to use
	> EnsEMBLarchiveURL = urls[1]
	> print(EnsEMBLarchiveURL)
	$EnsEMBL55
	[1] "jul2009.archive.ensembl.org"
	
	> print(listMarts(host=EnsEMBLarchiveURL,path="/biomart/martservice",archive=FALSE)[1:2,])
		biomart              version
	1 ENSEMBL_MART_ENSEMBL           Ensembl 55
	2     ENSEMBL_MART_SNP Ensembl variation 55
	
	
	> mart = useMart("ENSEMBL_MART_ENSEMBL", host=EnsEMBLarchiveURL,path="/biomart/martservice",archive=FALSE)
	> datasets <- listDatasets(mart)
	> mart<-useDataset("hsapiens_gene_ensembl",mart)
	> getBM(attributes=c("ensembl_gene_id","affy_hg_u133a","hgnc_symbol","chromosome_name","band"),filters="ensembl_gene_id",genes, mart=mart)
	
	ensembl_gene_id affy_hg_u133a hgnc_symbol chromosome_name   band
	1 ENSG00000132670   213795_s_at       PTPRA              20    p13
	2 ENSG00000132670   213799_s_at       PTPRA              20    p13
	3 ENSG00000136146   217843_s_at        MED4              13  q14.2
	4 ENSG00000170954     205514_at      ZNF415              19 q13.42
	5 ENSG00000221968     204257_at       FADS3              11  q12.2
	6 ENSG00000221968   216080_s_at       FADS3              11  q12.2
	7 ENSG00000221968                     FADS3              11  q12.2

	
	

	> EnsEMBLarchiveURL = urls[4]
	> print(EnsEMBLarchiveURL)
	$EnsEMBL59
	[1] "aug2010.archive.ensembl.org"
	
	> print(listMarts(host=EnsEMBLarchiveURL,path="/biomart/martservice",archive=FALSE)[1:2,])
		biomart              version
	1 ENSEMBL_MART_ENSEMBL     Ensembl Genes 59
	2     ENSEMBL_MART_SNP Ensembl Variation 59
	
	> mart = useMart("ENSEMBL_MART_ENSEMBL", host=EnsEMBLarchiveURL,path="/biomart/martservice",archive=FALSE)
	> datasets <- listDatasets(mart)
	> mart<-useDataset("hsapiens_gene_ensembl",mart)
	> getBM(attributes=c("ensembl_gene_id","affy_hg_u133a","hgnc_symbol","chromosome_name","band"),filters="ensembl_gene_id",genes, mart=mart)
	
	ensembl_gene_id affy_hg_u133a hgnc_symbol chromosome_name   band
	1 ENSG00000132670   213795_s_at       PTPRA              20    p13
	2 ENSG00000132670   213799_s_at       PTPRA              20    p13
	3 ENSG00000136146                      MED4              13  q14.2
	4 ENSG00000136146   217843_s_at        MED4              13  q14.2
	5 ENSG00000170954     205514_at      ZNF415              19 q13.42
	6 ENSG00000221968     204257_at       FADS3              11  q12.2
	7 ENSG00000221968   216080_s_at       FADS3              11  q12.2
	8 ENSG00000221968                     FADS3              11  q12.2
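
The difference between the two result tables above can also be found programmatically rather than by eye. The sketch below is one possible way to do it: it re-enters the gene id, probe and symbol columns from the jul2009 and aug2010 outputs by hand, so that it runs without a network connection, and lists the rows that appear in only one of them. The variable names and the row_key helper are illustrative assumptions; in practice you would pass in the data frames returned by getBM.

```r
# Rows copied from the jul2009 archive output shown above.
res_jul2009 <- data.frame(
  ensembl_gene_id = c("ENSG00000132670", "ENSG00000132670", "ENSG00000136146",
                      "ENSG00000170954", "ENSG00000221968", "ENSG00000221968",
                      "ENSG00000221968"),
  affy_hg_u133a   = c("213795_s_at", "213799_s_at", "217843_s_at",
                      "205514_at", "204257_at", "216080_s_at", ""),
  hgnc_symbol     = c("PTPRA", "PTPRA", "MED4", "ZNF415",
                      "FADS3", "FADS3", "FADS3"),
  stringsAsFactors = FALSE)

# Rows copied from the aug2010 archive output shown above.
res_aug2010 <- data.frame(
  ensembl_gene_id = c("ENSG00000132670", "ENSG00000132670", "ENSG00000136146",
                      "ENSG00000136146", "ENSG00000170954", "ENSG00000221968",
                      "ENSG00000221968", "ENSG00000221968"),
  affy_hg_u133a   = c("213795_s_at", "213799_s_at", "", "217843_s_at",
                      "205514_at", "204257_at", "216080_s_at", ""),
  hgnc_symbol     = c("PTPRA", "PTPRA", "MED4", "MED4", "ZNF415",
                      "FADS3", "FADS3", "FADS3"),
  stringsAsFactors = FALSE)

# Build a gene|probe key for each row so rows can be compared across tables.
row_key <- function(df) paste(df$ensembl_gene_id, df$affy_hg_u133a, sep = "|")

only_in_aug2010 <- res_aug2010[!(row_key(res_aug2010) %in% row_key(res_jul2009)), ]
only_in_jul2009 <- res_jul2009[!(row_key(res_jul2009) %in% row_key(res_aug2010)), ]

print(only_in_aug2010)
print(nrow(only_in_jul2009))
```

For these four genes the only change is an additional MED4 row without a probe id in the later archive; larger gene lists may show probes that are gained or lost between releases.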

	
	

To find the correct URL to use, click on the biomart link in the archive version of Ensembl that you are interested in. The list of archive versions can be found here: http://www.ensembl.org/info/website/archives/index.html.
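
Once you have collected the archive URLs you need, it can help to keep them in a small lookup table keyed by EnsEMBL version. The following is a minimal sketch under stated assumptions: the table and the archive_host function are hypothetical helpers, the hosts are the ones used in the example above, and the version numbers follow the urls list and the listMarts output (the jul2009 archive reports itself as Ensembl 55). Recent biomaRt releases also provide listEnsemblArchives(), which should return a comparable table of versions and URLs; check that your installed version includes it.

```r
# Hypothetical lookup table of EnsEMBL versions and archive hosts.
archives <- data.frame(
  version = c(55, 57, 58, 59),
  host    = c("jul2009.archive.ensembl.org",
              "mar2010.archive.ensembl.org",
              "may2010.archive.ensembl.org",
              "aug2010.archive.ensembl.org"),
  stringsAsFactors = FALSE)

# Return the archive host for a given EnsEMBL version,
# or stop with a clear message if the version is not in the table.
archive_host <- function(version, table = archives) {
  hit <- table$host[table$version == version]
  if (length(hit) == 0) {
    stop("No archive host recorded for EnsEMBL ", version)
  }
  hit[[1]]
}

# GeneSigDB release 3 was annotated with EnsEMBL 59.
print(archive_host(59))
```

The returned string can be passed as the host argument of useMart, as in the example above.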