Assortment of VCF processing scripts


  • Ensure you have Python 3 installed on your system

  • Clone or download the script from this repository

    git clone

Merge VCF Headers

This script merges the headers of multiple VCF (Variant Call Format) files into a single output header file. It is useful when you have VCF files with similar headers and want to consolidate them.

The merged header can be applied either before or after concatenating the files with bcftools concat --naive or --naive-force (or with another concatenation tool). --naive-force ignores differences between the input headers, so re-heading afterwards gives the concatenated VCF the header information from all input files.

Input can be gzipped or not. The output filename is optional, but recommended. If it is not given, the script creates merged_headers.txt in the working directory.

python -vcfs file1.vcf.gz file2.vcf.gz [...] -o output_merged_header.txt

Optionally, with -fullheader, you can include header lines other than the mandatory ##fileformat, ##contig, ##FILTER, ##INFO, and ##FORMAT. These are usually the commands used to process the VCF files (e.g. bcftools, vcflib), file creation dates, tool versions, etc.

python -vcfs file1.vcf.gz file2.vcf.gz -o output_merged_header.txt -fullheader
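The merging described above can be sketched roughly as follows. This is an illustrative sketch, not the repository's script: the names `read_header_lines`, `merge_headers` and `MANDATORY_PREFIXES` are hypothetical. It assumes that duplicates are removed by exact string comparison, that lines keep their first-seen order, and that a single ##fileformat line is placed first.

```python
import gzip

# Header categories kept by default (the "mandatory" ones named in the text).
MANDATORY_PREFIXES = ("##fileformat", "##contig", "##FILTER", "##INFO", "##FORMAT")


def read_header_lines(path):
    """Return the '##' meta-information lines of a plain or gzipped VCF."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lines = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # the '#CHROM' line ends the meta-information block
            lines.append(line.rstrip("\n"))
    return lines


def merge_headers(paths, full_header=False):
    """Merge '##' lines from several VCFs, dropping exact duplicates.

    Assumption: first-seen order is preserved and only one ##fileformat
    line is kept, placed at the top.
    """
    seen = {}  # dicts preserve insertion order
    for path in paths:
        for line in read_header_lines(path):
            if not full_header and not line.startswith(MANDATORY_PREFIXES):
                continue  # skip optional lines unless the full header is wanted
            seen.setdefault(line, None)
    merged = list(seen)
    fileformat = [l for l in merged if l.startswith("##fileformat")]
    rest = [l for l in merged if not l.startswith("##fileformat")]
    return fileformat[:1] + rest
```

Comparing whole lines is the simplest choice, but it treats two ##INFO lines with the same ID and different descriptions as distinct entries. Merging by ID would need a further step.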

Example usage with bcftools concat

  • Re-head VCF files before concatenation:
python -vcfs file1.vcf.gz file2.vcf.gz -o output_merged_header.txt
(cat output_merged_header.txt; zcat file1.vcf.gz | grep -v '^##') | bgzip -f -c > reheaded_file1.vcf.gz
(cat output_merged_header.txt; zcat file2.vcf.gz | grep -v '^##') | bgzip -f -c > reheaded_file2.vcf.gz
tabix -f -p vcf reheaded_file1.vcf.gz
tabix -f -p vcf reheaded_file2.vcf.gz
bcftools concat --naive reheaded_file1.vcf.gz reheaded_file2.vcf.gz > reheaded_concatenated.bcf
  • Re-head VCF file after concatenation:
bcftools concat --naive-force file1.vcf.gz file2.vcf.gz > concatenated.bcf
bcftools view concatenated.bcf -Oz -o concatenated.vcf.gz
python -vcfs file1.vcf.gz file2.vcf.gz -o output_merged_header.txt -fullheader
(cat output_merged_header.txt; zcat concatenated.vcf.gz | grep -v '^##') | bgzip -f -c > reheaded_concatenated.vcf.gz
tabix -f -p vcf reheaded_concatenated.vcf.gz
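For readers who prefer to see the re-heading step outside the shell, here is a minimal Python sketch of what the `cat ...; zcat ... | grep -v` pipeline does. The function name `rehead` is hypothetical and not part of this repository. It emits the merged header lines first, then every line of the VCF that does not start with '##' (so the '#CHROM' line and all records are kept). It produces plain text only; the result still needs to be compressed with bgzip before tabix can index it.

```python
def rehead(merged_header_lines, vcf_lines):
    """Yield merged '##' lines, then the VCF without its own '##' lines."""
    for line in merged_header_lines:
        yield line.rstrip("\n")
    for line in vcf_lines:
        # Drop only the original meta-information lines; '#CHROM' and
        # variant records pass through unchanged.
        if not line.startswith("##"):
            yield line.rstrip("\n")
```

Matching '##' only at the start of a line avoids discarding a record that happens to contain '##' somewhere in a field.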

Parse and Calculate VAF from Strelka2 somatic VCF files

This script parses Strelka2 somatic VCF (Variant Call Format) files and calculates the Variant Allele Frequency (VAF) for INDELs or SNVs. It follows the recommendations from the Strelka2 developers.

Input can be gzipped or not. The output is an uncompressed VCF. Both the output filename and the variant type (snv or indel, matching the Strelka2 somatic output file used as input) are mandatory. Information found here

python --input somatic.snvs.vcf.gz --output somatic.snvs.VAF.vcf --variant snv
python --input somatic.indels.vcf.gz --output somatic.indels.VAF.vcf --variant indel
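As a rough illustration of the calculation, the sketch below follows the allele-frequency recipe in the Strelka2 user guide: for SNVs, the tier1 (first comma-separated) value of the FORMAT field named after the REF base plus "U" and of the one named after the ALT base plus "U"; for indels, the tier1 values of TAR and TIR. The function names are hypothetical and this is not the repository's code. It assumes a single ALT allele, that the tumor sample is the last column, and that a zero denominator should give 0.0.

```python
def tier1(value):
    """Return the tier1 count, i.e. the first comma-separated integer."""
    return int(value.split(",")[0])


def tumor_fields(vcf_line):
    """Map FORMAT keys to the tumor sample's values.

    Assumption: the tumor sample is the last column of the record.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")
    values = cols[-1].split(":")
    return dict(zip(keys, values))


def strelka2_vaf(ref, alt, fields, variant):
    """Compute VAF = tier1 alt / (tier1 alt + tier1 ref)."""
    if variant == "snv":
        ref_count = tier1(fields[ref + "U"])  # e.g. REF "A" -> FORMAT/AU
        alt_count = tier1(fields[alt + "U"])
    elif variant == "indel":
        ref_count = tier1(fields["TAR"])
        alt_count = tier1(fields["TIR"])
    else:
        raise ValueError("variant must be 'snv' or 'indel'")
    total = ref_count + alt_count
    # Assumption: report 0.0 when no tier1 reads support either allele.
    return alt_count / total if total else 0.0


record = ("chr1\t100\t.\tA\tT\t.\tPASS\tSOMATIC\tDP:AU:CU:GU:TU\t"
          "30:30,30:0,0:0,0:0,0\t40:30,31:0,0:0,0:10,11")
vaf = strelka2_vaf("A", "T", tumor_fields(record), "snv")  # 10 / (10 + 30)
```

The record above is made up for the example and shows only a subset of the FORMAT fields that Strelka2 writes.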
