Checking the quality of resolved genomes
The sequences of the resolved genomic paths can be found in resolved_paths.fasta. Each entry in this FASTA file is a resolved genome (not a contig) and can be evaluated directly with a dedicated viral genome quality assessment tool such as CheckV. The following sections walk you through how to set up and run CheckV.
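Because every FASTA entry corresponds to one resolved genome, counting the header lines gives the number of genomes that will be evaluated. The snippet below is a minimal sketch; the function name count_fasta_entries is a hypothetical helper and not part of any tool mentioned here.

```shell
# Hypothetical helper: count FASTA records by counting header lines,
# i.e. lines that start with ">".
count_fasta_entries() {
    grep -c '^>' "$1"
}
```

For example, `count_fasta_entries resolved_paths.fasta` prints the number of resolved genomes in the file.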
Installing CheckV
The recommended way to install CheckV is using conda.
# Create a new conda environment and install checkv
conda create -n checkv -c conda-forge -c bioconda checkv
# Activate checkv conda environment
conda activate checkv
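You can also install CheckV using pip. In that case, the external dependencies that conda would otherwise pull in (such as DIAMOND, HMMER and Prodigal-gv) must be installed separately.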
pip install checkv
Download the CheckV database
checkv download_database ./
Now set the CHECKVDB environment variable to the location of the downloaded database.
export CHECKVDB=/path/to/checkv-db
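Before starting a long run, it can help to confirm that CHECKVDB is set and points to an existing directory. The following is a minimal sketch; check_checkvdb is a hypothetical helper, and it only tests that the directory exists, not that its contents are a valid CheckV database.

```shell
# Hypothetical helper: check that CHECKVDB is set and is a directory.
# It does not validate the database contents.
check_checkvdb() {
    if [ -z "${CHECKVDB:-}" ]; then
        echo "CHECKVDB is not set" >&2
        return 1
    fi
    if [ ! -d "$CHECKVDB" ]; then
        echo "CHECKVDB is not a directory: $CHECKVDB" >&2
        return 1
    fi
    echo "CHECKVDB looks usable: $CHECKVDB"
}
```

Calling `check_checkvdb` after the export prints a confirmation, or an error message on stderr with a non-zero exit status.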
Running CheckV
Here is an example command to run CheckV on the resolved genomes.
checkv end_to_end resolved_paths.fasta checkv_resolved_paths -t 16
The end_to_end option runs the full pipeline, and -t sets the number of threads.
You can also run individual commands for each step in the pipeline as follows.
checkv contamination resolved_paths.fasta checkv_resolved_paths -t 16
checkv completeness resolved_paths.fasta checkv_resolved_paths -t 16
checkv complete_genomes resolved_paths.fasta checkv_resolved_paths
checkv quality_summary resolved_paths.fasta checkv_resolved_paths
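If you run the steps individually, you may want to stop as soon as one of them fails, since the final summary is built from the outputs of the earlier steps. The sketch below chains the four commands shown above with `&&`. The function name run_checkv_steps and the CHECKV_BIN variable are hypothetical; CHECKV_BIN is only there so that the sequence can be previewed with echo instead of being executed.

```shell
# Hypothetical wrapper: run the four CheckV steps in order and stop at
# the first failure. CHECKV_BIN is an assumed override (default: checkv)
# that lets you preview the commands, e.g. with CHECKV_BIN=echo.
run_checkv_steps() {
    local input=$1
    local outdir=$2
    local threads=${3:-16}
    local bin=${CHECKV_BIN:-checkv}

    "$bin" contamination "$input" "$outdir" -t "$threads" &&
    "$bin" completeness "$input" "$outdir" -t "$threads" &&
    "$bin" complete_genomes "$input" "$outdir" &&
    "$bin" quality_summary "$input" "$outdir"
}
```

For example, `run_checkv_steps resolved_paths.fasta checkv_resolved_paths 16` runs the same four commands as above, and `CHECKV_BIN=echo run_checkv_steps resolved_paths.fasta checkv_resolved_paths 16` only prints their arguments.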
CheckV outputs
CheckV will produce the following .tsv files.
complete_genomes.tsv - overview of putative complete genomes identified
completeness.tsv - overview of how completeness was estimated
contamination.tsv - overview of how contamination was estimated
quality_summary.tsv - integrated quality results
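A quick way to get an overview of the results is to count how many resolved genomes fall into each quality tier. The sketch below assumes that quality_summary.tsv is tab-separated with a header row containing a checkv_quality column, as described in the CheckV documentation; summarise_checkv_quality is a hypothetical helper. The column is located by name rather than by position, so the snippet does not depend on the exact column order.

```shell
# Hypothetical helper: count rows per value of the checkv_quality column.
# Assumes a tab-separated file whose first line is a header.
summarise_checkv_quality() {
    awk -F'\t' -v col="checkv_quality" '
        NR == 1 {
            # Find the index of the quality column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { count[$idx]++ }
        END { for (q in count) print q "\t" count[q] }
    ' "$1" | sort
}
```

For example, `summarise_checkv_quality checkv_resolved_paths/quality_summary.tsv` prints one line per quality tier with the number of genomes in that tier, sorted by tier name.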