- Strains: 550
- WGS strains: 410
- Isotypes: 246
- Genome version: WS258
Wild isolate genomes are aligned and stored using the BAM format. BAMs are available in the table below.
Downloading All Alignment Data
You can download all alignment data using the script below. The script requires wget; we recommend installing it with Homebrew (Unix/Mac OS) or Cygwin (Windows). See the FAQ for details on installing wget.
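If installing wget is inconvenient, the same bulk download can be approximated with the Python standard library alone. This is a minimal sketch, not the CeNDR script: it assumes you already have a list of BAM (and, if wanted, index) URLs, for example copied from the table above, and `download_all` is a hypothetical helper name.

```python
import os
import shutil
import urllib.parse
import urllib.request


def download_all(urls, dest_dir="alignments"):
    """Download each URL into dest_dir, skipping files that already exist.

    Returns the local file paths in the same order as `urls`.
    """
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in urls:
        # Name the local file after the last component of the URL path.
        name = os.path.basename(urllib.parse.urlparse(url).path)
        path = os.path.join(dest_dir, name)
        if not os.path.exists(path):
            partial = path + ".part"
            # Stream to a temporary name so an interrupted download is not
            # mistaken for a finished one on the next run.
            with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
                shutil.copyfileobj(response, out)
            os.replace(partial, path)
        paths.append(path)
    return paths
```

Call it as `download_all(urls)` with your list of URLs; files land in `alignments/` by default. Writing to a `.part` file first and renaming at the end means a re-run skips only downloads that actually completed.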
Information regarding alignment, variant calling, and annotation is available in the Methods / Pipelines section below.
We used samtools to identify single-nucleotide variant (SNV) sites as compared to the N2 reference genome. Variant data are provided as VCF or tab-delimited files.
VCFs generated from the variant calling pipeline are provided below.
- Soft-filter - Includes all variants and annotations. The QC status of variants is included.
- Hard-filter - Variants and genotypes that fail QC are removed.
- Imputed - An imputed dataset generated from the hard-filter VCF.
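To make the relationship between the soft- and hard-filtered files concrete, here is a minimal sketch that derives a hard-filtered VCF line from a soft-filtered one. It assumes, following the VCF specification rather than anything stated on this page, that site-level QC results are recorded in the FILTER column and genotype-level results in a FORMAT field named `FT`; failing genotypes are set to missing (`./.`), which is how a genotype is typically "removed" from a multi-sample VCF. `hard_filter_line` is a hypothetical helper, not part of the CeNDR pipeline.

```python
def hard_filter_line(line):
    """Return a hard-filtered version of one VCF line, or None if the site fails.

    Assumes site-level QC is in the FILTER column (7th) and per-genotype QC
    is in a FORMAT field named FT, as described in the VCF specification.
    """
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if fields[6] != "PASS":
        return None  # drop sites that failed any site-level filter
    if len(fields) > 9:  # at least one sample column is present
        keys = fields[8].split(":")
        if "FT" in keys and "GT" in keys:
            ft_i, gt_i = keys.index("FT"), keys.index("GT")
            for s in range(9, len(fields)):
                values = fields[s].split(":")
                # Trailing FORMAT fields may be omitted; treat them as missing.
                ft = values[ft_i] if ft_i < len(values) else "."
                if ft not in ("PASS", "."):
                    values[gt_i] = "./."  # mask the failed genotype
                    fields[s] = ":".join(values)
    return "\t".join(fields) + "\n"
```

Applied to every line of a soft-filtered file, keeping only non-`None` results, this yields a file in the spirit of the hard-filter VCF; the actual pipeline may differ in details such as dropping sites whose genotypes all end up missing.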
You can programmatically access specific regions of VCF files (rather than the entire file) from the command line.
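Region queries are usually written as `CHROM:START-END` with 1-based, inclusive coordinates (for example `II:1000000-2000000`), the convention used by samtools-family tools such as `tabix` and `bcftools view`. The stdlib sketch below illustrates that convention on a local VCF. Unlike an index-aware tool, it reads every line and matches on POS only (ignoring how far a REF allele extends), so treat it as an illustration of the semantics, not an efficient query. The function names are hypothetical.

```python
import gzip
import re


def parse_region(region):
    """Parse 'CHROM:START-END' (1-based, inclusive) into (chrom, start, end).

    Commas in the numbers are accepted as a convenience.
    """
    m = re.fullmatch(r"([^:]+):([\d,]+)-([\d,]+)", region)
    if m is None:
        raise ValueError(f"bad region: {region!r}")
    chrom = m.group(1)
    start, end = (int(g.replace(",", "")) for g in m.group(2, 3))
    if start > end:
        raise ValueError(f"start > end in {region!r}")
    return chrom, start, end


def records_in_region(lines, region):
    """Yield VCF data lines whose CHROM matches and whose POS lies in the region."""
    chrom, start, end = parse_region(region)
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t", 2)  # only CHROM and POS are needed
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (bgzip output is valid gzip)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)
```

For example, `list(records_in_region(open_vcf(path), "II:1000000-2000000"))`, where `path` points to a VCF you have downloaded.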
So far, we have performed SNV calling across all wild isolates. We are working to add other variant classes, including insertions/deletions, structural variants, and transposons.
We have recently performed an analysis characterizing transposon variation in C. elegans. The dataset will be further integrated with the site resources over time; for now, the raw data are available below.
The following statistics were generated with bcftools stats. In the soft-filtered VCF for this release, records and genotypes have been annotated with filters, but no data have been removed. The hard-filtered VCF removes the records and genotypes that were annotated with filters.
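Among much else, bcftools stats reports the number of SNPs and the transition/transversion (Ts/Tv) ratio. As a rough cross-check when comparing the soft- and hard-filtered files, the sketch below computes those two quantities in pure Python. It is a simplification, not a reimplementation of bcftools stats: it counts biallelic SNVs only, ignores genotypes, and `snv_stats` is a hypothetical name.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def snv_stats(lines):
    """Count records, biallelic SNVs, transitions and transversions in VCF lines."""
    counts = {"records": 0, "snvs": 0, "ts": 0, "tv": 0}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        counts["records"] += 1
        ref, alt = line.rstrip("\n").split("\t")[3:5]
        ref, alt = ref.upper(), alt.upper()
        # Single-base REF and ALT that differ; multiallelic ALTs such as "T,G"
        # and indels fail the length test and are skipped.
        if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT" and ref != alt:
            counts["snvs"] += 1
            if frozenset((ref, alt)) in TRANSITIONS:
                counts["ts"] += 1
            else:
                counts["tv"] += 1
    counts["ts_tv"] = counts["ts"] / counts["tv"] if counts["tv"] else float("nan")
    return counts
```

Running it on the soft- and hard-filtered files and comparing `snvs` gives a quick sense of how many sites the filters removed; for the full set of statistics, use bcftools stats itself.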
Methods / Pipelines
Note: These methods operated on sequence data at the isotype level.
Sequences were aligned to WS245 using BWA (version 0.7.8-r455). Optical/PCR duplicates were marked with PICARD (version 1.111).
SNV calling was performed using bcftools (version 1.3).
Sites with more than 10% missing calls or more than 90% heterozygous calls across all isotypes were removed. Individual calls meeting any of the following criteria were removed:
- Depth of coverage (DP) <= 10
- Quality (QUAL) < 30
- Mapping Quality (MQ) < 40. Only applied to ALT calls.
- Number of high-quality non-reference bases (DV) / Depth of Coverage (DP) < 0.5. Applied only to ALT calls.
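The thresholds above translate directly into code. The sketch below encodes them for a single site. The `Call` record and helper names are hypothetical, and two details the text leaves open are assumptions: call-level filters are applied before the site-level thresholds are checked, and both site-level fractions use the total number of isotypes as the denominator. A call counts as ALT if its genotype carries any non-reference allele.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    """One isotype's call at a site; field names follow the VCF tags."""
    gt: Optional[str]  # e.g. "0/0", "0/1", "1/1"; None if missing
    dp: int            # depth of coverage (DP)
    dv: int            # high-quality non-reference bases (DV)


def is_alt(gt):
    """True if the genotype contains any non-reference allele."""
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))


def is_het(gt):
    """True if the genotype has two or more distinct, non-missing alleles."""
    alleles = gt.replace("|", "/").split("/")
    return "." not in alleles and len(set(alleles)) > 1


def call_passes(call, qual, mq):
    """Apply the call-level filters; qual and mq are the site's QUAL and MQ."""
    if call.gt is None:
        return False
    if call.dp <= 10 or qual < 30:
        return False
    if is_alt(call.gt):
        # MQ and DV/DP filters apply only to ALT calls; dp > 10 here, so no
        # division by zero.
        if mq < 40 or call.dv / call.dp < 0.5:
            return False
    return True


def filter_site(calls, qual, mq):
    """Return the calls with failures set to missing, or None if the site is removed."""
    if not calls:
        return None
    kept = [c if call_passes(c, qual, mq) else Call(None, c.dp, c.dv) for c in calls]
    n = len(kept)
    missing = sum(c.gt is None for c in kept)
    het = sum(c.gt is not None and is_het(c.gt) for c in kept)
    if missing / n > 0.10 or het / n > 0.90:
        return None
    return kept
```

Note that "greater than 10%" is a strict inequality: a site with exactly 1 missing call out of 10 isotypes is kept.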
Variants were annotated with SnpEff (version 4.1g) using the WS241 database.
The C. elegans Natural Diversity Resource has three Git repositories that contain the software used to run the site.
A set of functions to process phenotype data, perform GWAS, and perform post-mapping data processing for C. elegans.
cegwas-worker: a Python daemon that handles mapping jobs submitted from base. It runs on Google Compute Engine.
The software responsible for this website, which is run using Google App Engine.