Phage Genome Annotation

Once you have identified the high-quality and complete genomes from the CheckV results, you can annotate them using a tool such as pharokka. The following sections will walk you through how to setup and run pharokka.

Installing pharokka

The recommended way to install pharokka is using conda.

# Create a new conda environment and install pharokka
conda create -n pharokka -c bioconda pharokka

# Activate pharokka conda environment
conda activate pharokka

Download and install the pharokka databases

install_databases.py -o <path/to/databse_dir>

Running pharokka

Here is an example command to run pharokka on the complete and high-quality resolved genomes.

pharokka.py -i complete_hq_genomes.fasta -o pharokka_output  -t 16 -d <path/to/database_dir>

Circular genome plot

You can use the pharokka_plotter.py implementation from pharokka to create circular genome plots with annotations.

Let's assume that you have already run pharokka on all of the complete and high-quality resolved genomes and the output is available in pharokka_output. You can pick one genome to plot. For example, let's consider the genome phage_comp_280_cycle_1.fasta which is phiX174.

We start by reorienting the genome to start from the terminase large subunit. You can look up the starting position and strand of the terminase large subunit from the output file pharokka_output/pharokka_cds_final_merged_output.tsv. For example, let's take the starting position as 617 on the positive strand. You can run pharokka again for this genome with reorientation as follows.

pharokka.py -i resolved_phages/phage_comp_280_cycle_1.fasta -o pharokka_output_phage_comp_280_cycle_1 -d <path/to/databse_dir> -t 16 --terminase --terminase_strand 'pos' --terminase_start 617

Then you can run the plotting command as follows.

pharokka_plotter.py -i resolved_phages/phage_comp_280_cycle_1.fasta -n phage_comp_280_cycle_1_plot -o pharokka_output_phage_comp_250_cycle_1 -t "Escherichia phage phiX174"