Genetic Data: FASTQ, BAM and VCF

Dante Labs raw data

With Dante Labs whole genomes, you always get your raw data. We provide it because it represents your DNA, it belongs to you, and it is an asset for life: in the coming months and years you will be able to use your raw data with new tools from Dante Labs and from other organizations.

Rather than keeping the raw data and making you come back to us, we give it to you.

Raw data can be confusing: there are many files, and some of them are very large and hard to understand.

In a nutshell:

  • the VCF SNP is the most commonly used file (e.g., on third-party websites), followed by the VCF INDEL

  • if you don't know what the FASTQ or BAM files are, they will be very hard to read (they are 100 GB each and require bioinformatics knowledge)

The table below gives a short description of each file you will receive from Dante Labs. We hope you find it useful.

When you sequence your genome with Dante Labs (Whole Genome, Whole Genome Z, Whole Genome L), you will receive the following data:


VCF stands for Variant Call Format. It is a standardized text file format for representing SNP, INDEL, SV and CNV variation calls.
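For readers who want to look inside a VCF, here is a minimal Python sketch that splits one data line into the eight fixed, tab-separated columns defined by the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function name `parse_vcf_line` and the example line are hypothetical illustrations, not data from a Dante Labs file, and header lines starting with `#` are assumed to be skipped beforehand.

```python
# Fixed columns defined by the VCF specification (tab-separated).
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Split one VCF data line (not a '#' header line) into a dict.

    parse_vcf_line is a hypothetical helper written for illustration.
    """
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # 1-based position on the chromosome
    record["ALT"] = record["ALT"].split(",")  # a site can have several ALT alleles
    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info = {}
    if record["INFO"] != ".":
        for item in record["INFO"].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True
    record["INFO"] = info
    # Columns 9 and later (FORMAT plus per-sample data) are kept as raw strings.
    record["SAMPLES"] = fields[8:]
    return record


# Hypothetical example line, not taken from a real genome.
rec = parse_vcf_line("chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30;AF=0.5\tGT\t0/1")
print(rec["POS"], rec["REF"], rec["ALT"], rec["INFO"])
# -> 12345 A ['G'] {'DP': '30', 'AF': '0.5'}
```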

SNPs (Single Nucleotide Polymorphisms, pronounced “snips”) are the most common type of genetic variation among people. Each SNP represents a difference in a single DNA building block, called a nucleotide.

This is the most commonly used VCF (e.g., on third-party tools).


(Not available for Whole Genome L)
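As an illustration, a SNP shows up in a VCF as a record whose REF and ALT alleles are both a single base. This Python sketch counts such records; `is_snp` and `count_snps` are hypothetical helper names, and the rule used (a one-base REF and every ALT a different single base) is a simplifying assumption, not Dante Labs' own definition.

```python
VALID_BASES = set("ACGT")


def is_snp(ref, alt):
    """True if REF is one base and every ALT allele is a different single base.

    Hypothetical helper; a simplified rule for illustration only.
    """
    ref = ref.upper()
    alts = [a.upper() for a in alt.split(",")]
    return ref in VALID_BASES and all(a in VALID_BASES and a != ref for a in alts)


def count_snps(vcf_lines):
    """Count SNP records in VCF text lines, skipping '#' header lines."""
    count = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if is_snp(fields[3], fields[4]):  # columns 4 and 5 are REF and ALT
            count += 1
    return count


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",    # SNP
    "chr1\t200\t.\tA\tAT\t50\tPASS\t.\n",   # insertion (an INDEL, not a SNP)
    "chr1\t300\t.\tC\tT,G\t50\tPASS\t.\n",  # SNP with two ALT alleles
]
print(count_snps(example))  # -> 2
```

For a real file, pass an open file handle instead of a list; if the VCF is gzip-compressed, `gzip.open(path, "rt")` yields lines in the same way.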


Indel is a molecular biology term for insertions or deletions in your DNA. The number of INDELs in human genomes is second only to the number of SNPs. They have a key role in your genetics.


(Not available for Whole Genome L)
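In a VCF, an INDEL is usually written with one shared reference base in front of the inserted or deleted sequence, so REF and ALT have different lengths (for example, REF=A and ALT=ATG is an insertion of two bases). The sketch below, using the hypothetical helper `describe_indel`, assumes a single ALT allele and infers the type and size from the length difference alone.

```python
def describe_indel(ref, alt):
    """Return (kind, size) for a single-ALT VCF record.

    Hypothetical helper: relies only on the REF/ALT length difference,
    which is how VCF-style left-padded indels are usually written.
    """
    diff = len(alt) - len(ref)
    if diff > 0:
        return ("insertion", diff)
    if diff < 0:
        return ("deletion", -diff)
    return ("not an indel", 0)  # same length: a SNP or multi-base substitution


print(describe_indel("A", "ATG"))  # -> ('insertion', 2)
print(describe_indel("ACT", "A"))  # -> ('deletion', 2)
```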


SVs, or Structural Variants, are large DNA sequences that are inserted, inverted, deleted or duplicated within genomes.


(Not available for Whole Genome L)
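Structural variants are commonly stored in VCF with a symbolic ALT allele such as `<DEL>`, `<DUP>` or `<INV>`, together with INFO keys such as SVTYPE, SVLEN and END. The Python sketch below assumes the variant caller follows those VCF 4.x conventions (callers differ, so check the `##INFO` lines in the file header); `sv_summary` is a hypothetical name, and `info` is assumed to be a dict of INFO keys to string values.

```python
def sv_summary(pos, alt, info):
    """Return (sv_type, size) for a structural-variant VCF record.

    Hypothetical helper. Assumes VCF 4.x conventions: SVTYPE in INFO or a
    symbolic ALT like <DEL>, plus SVLEN and/or END to give the size.
    """
    sv_type = info.get("SVTYPE") or alt.strip("<>").split(":")[0]
    svlen = info.get("SVLEN")
    if isinstance(svlen, str) and svlen not in ("", "."):
        size = abs(int(svlen.split(",")[0]))  # SVLEN is negative for deletions
    elif "END" in info:
        size = int(info["END"]) - pos         # span from POS to END
    else:
        size = None                           # size not recorded
    return sv_type, size


print(sv_summary(1000, "<DEL>", {"SVTYPE": "DEL", "END": "6000", "SVLEN": "-5000"}))
# -> ('DEL', 5000)
```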


A CNV (copy number variation) is when the number of copies of a particular gene varies from one individual to the next. Some cancers are believed to be associated with elevated copy numbers of particular genes.


(Not available for Whole Genome L)
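One common way to detect copy number changes from whole genome data is to compare sequencing depth in a region with the genome-wide average: autosomal regions normally have two copies (one from each parent), so a region with 1.5 times the average depth suggests about three copies. The sketch below shows only this simplified read-depth arithmetic; `estimate_copy_number` is hypothetical, and real CNV callers also correct for GC content, mappability and noise.

```python
def estimate_copy_number(region_depth, genome_mean_depth, baseline_copies=2):
    """Rough copy-number estimate from sequencing depth (hypothetical helper).

    Simplified read-depth heuristic: depth is assumed to scale linearly
    with copy number, relative to a diploid (two-copy) baseline.
    """
    if genome_mean_depth <= 0:
        raise ValueError("genome_mean_depth must be positive")
    ratio = region_depth / genome_mean_depth
    return round(baseline_copies * ratio)


print(estimate_copy_number(45, 30))  # -> 3 (elevated copy number)
print(estimate_copy_number(15, 30))  # -> 1 (one copy lost)
```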


Binary Alignment Map (BAM) is the comprehensive raw data of genome sequencing: it is the lossless, compressed binary representation of a Sequence Alignment Map (SAM). BAM files are 90-100 gigabytes in size. They are generated by aligning the FASTQ files to the reference genome.


(Not available for Whole Genome L)
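To show what "compressed binary" means in practice, the Python sketch below reads only the header of a BAM file, following the layout in the SAM/BAM format specification: a `BAM\1` magic string, the header text, then the list of reference sequences with their lengths. It relies on the fact that BAM's BGZF compression is a series of gzip blocks, which Python's `gzip` module can read sequentially. `read_bam_header` is a hypothetical name; in practice, `samtools view -H` or the pysam library are the usual tools.

```python
import gzip
import struct


def read_bam_header(path):
    """Return (header_text, [(ref_name, ref_length), ...]) from a BAM file.

    Hypothetical helper. BGZF is gzip-compatible, so gzip.open can
    decompress it; all integers in BAM are little-endian int32.
    """
    with gzip.open(path, "rb") as f:
        if f.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<i", f.read(4))
        header_text = f.read(l_text).decode("ascii", errors="replace").rstrip("\x00")
        (n_ref,) = struct.unpack("<i", f.read(4))
        references = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", f.read(4))
            name = f.read(l_name).rstrip(b"\x00").decode("ascii")  # NUL-terminated
            (l_ref,) = struct.unpack("<i", f.read(4))
            references.append((name, l_ref))
    return header_text, references
```

Only the header is read, so this stays fast even on a 100 GB file; the aligned reads that follow it are never decompressed.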


FASTQ files contain billions of entries and are about 90-100 gigabytes in size, making them too large to open in a normal text editor. FASTQ files are the ultimate raw data.


(The only file available for Whole Genome L)
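Because FASTQ files are too large to open in an editor, they are normally processed as a stream, one record at a time. Each standard record has four lines: an `@` identifier line, the DNA sequence, a `+` separator line, and a quality string with one character per base. The Python sketch below assumes four-line records, the widely used Phred+33 quality encoding (score = ASCII code minus 33), and plain or gzip-compressed input; `open_fastq` and `read_fastq` are hypothetical names.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) one record at a time.

    Assumes standard four-line records and Phred+33 quality encoding.
    """
    while True:
        header = handle.readline()
        if not header:
            return                           # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()                    # '+' separator line
        qual = handle.readline().rstrip("\n")
        scores = [ord(c) - 33 for c in qual]  # e.g. 'I' -> 40, '#' -> 2
        yield header[1:].rstrip("\n"), seq, scores
```

For example, `with open_fastq("reads_1.fastq.gz") as fh:` followed by `for read_id, seq, scores in read_fastq(fh):` walks through the whole file without loading it into memory; the file name here is hypothetical.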


If you are interested in learning more, we suggest: