Phables Usage
Phables run options can be found using the phables run -h command.
Usage: phables run [OPTIONS] [SNAKE_ARGS]...
Run Phables
Options:
--output PATH Output directory [default: phables.out]
--configfile TEXT Custom config file [default:
(outputDir)/config.yaml]
--threads INTEGER Number of threads to use [default: 1]
--use-conda / --no-use-conda Use conda for Snakemake rules [default: use-
conda]
--conda-prefix PATH Custom conda env directory
--profile TEXT Snakemake profile
--snake-default TEXT Customise Snakemake runtime args [default:
--rerun-incomplete, --printshellcmds,
--nolock, --show-failed-logs]
--input PATH Path to assembly graph file in .GFA format
[required]
--reads PATH Path to directory containing paired-end reads
[required]
--minlength INTEGER minimum length of circular unitigs to consider
[default: 2000]
--mincov INTEGER minimum coverage of paths to output [default:
10]
--compcount INTEGER maximum unitig count to consider a component
[default: 200]
--maxpaths INTEGER maximum number of paths to resolve for a
component [default: 10]
--mgfrac FLOAT length threshold to consider single copy
marker genes [default: 0.2]
--evalue FLOAT maximum e-value for phrog annotations
[default: 1e-10]
--seqidentity FLOAT minimum sequence identity for phrog
annotations [default: 0.3]
--covtol INTEGER coverage tolerance for extending subpaths
[default: 100]
--alpha FLOAT coverage multiplier for flow interval
modelling [default: 1.2]
--longreads provide long reads as input (else defaults to
short reads)
--prefix TEXT prefix for genome identifier
-h, --help Show this message and exit.
If you use Phables in your work, please cite Phables as,
Vijini Mallawaarachchi, Michael J Roach, Przemyslaw Decewicz,
Bhavya Papudeshi, Sarah K Giles, Susanna R Grigson, George Bouras,
Ryan D Hesse, Laura K Inglis, Abbey L K Hutton, Elizabeth A Dinsdale,
Robert A Edwards, Phables: from fragmented assemblies to high-quality
bacteriophage genomes, Bioinformatics, Volume 39, Issue 10,
October 2023, btad586, https://doi.org/10.1093/bioinformatics/btad586
For more information on Phables please visit:
https://phables.readthedocs.io/
CLUSTER EXECUTION:
phables run ... --profile [profile]
For information on Snakemake profiles see:
https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles
RUN EXAMPLES:
Required: phables run --input [assembly graph file]
Specify threads: phables run ... --threads [threads]
Disable conda: phables run ... --no-use-conda
Change defaults: phables run ... --snake-default="-k --nolock"
Add Snakemake args: phables run ... --dry-run --keep-going --touch
Specify targets: phables run ... print_stages
Available targets:
all Run everything (default)
preprocess Run preprocessing only
phables Run phables (and preprocessing if needed)
postprocess Run postprocessing (with preprocessing and phables if needed)
print_stages List available stages
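The CLUSTER EXECUTION line in the help output passes a Snakemake profile through phables run. A Snakemake profile is a directory holding a config.yaml whose keys are Snakemake long options without the leading dashes. The sketch below writes one. The directory name my_profile and the option values are illustrative assumptions. Scheduler-specific settings (cluster submission or executor plugins) depend on your Snakemake version and are not shown.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical profile directory name; any path can be given to --profile
profile_dir="my_profile"
mkdir -p "$profile_dir"

# Each key mirrors a Snakemake command-line option without the leading "--".
# Values are illustrative, not recommendations.
cat > "$profile_dir/config.yaml" <<'EOF'
jobs: 20
latency-wait: 60
keep-going: true
EOF

echo "Wrote $profile_dir/config.yaml"
# Then, for example (an absolute path avoids any ambiguity about the profile location):
#   phables run --input assembly_graph.gfa --reads fastq --profile "$PWD/my_profile"
```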
Run options explained
--input
- assembly graph file in .GFA format
--reads
- folder containing the sequencing read files (paired-end short reads by default, or long reads when --longreads is set)
--minlength
- minimum length of circular unitigs to consider [default: 2000]
--mincov
- minimum coverage of paths to output [default: 10]
--compcount
- maximum unitig count to consider a component [default: 200]
--maxpaths
- maximum number of paths to resolve for a component [default: 10]
--mgfrac
- length threshold to consider single-copy marker genes [default: 0.2]
--evalue
- maximum e-value for PHROG annotations [default: 1e-10]
--seqidentity
- minimum sequence identity for PHROG annotations [default: 0.3]
--covtol
- coverage tolerance for extending subpaths [default: 100]
--alpha
- coverage multiplier for flow interval modelling [default: 1.2]
--longreads
- provide long reads as input. If this flag is not provided, Phables defaults to short reads
--prefix
- prefix for genome identifier [default: None]
--output
- path to the output directory [default: phables.out]
--configfile
- custom config file [default: (outputDir)/config.yaml]
--threads
- number of threads to use [default: 1]
--use-conda / --no-use-conda
- use conda for Snakemake rules [default: use-conda]
--conda-prefix
- custom conda env directory
--snake-default
- customise Snakemake runtime args [default: --rerun-incomplete, --printshellcmds, --nolock, --show-failed-logs]
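To change several of the tuning options above in one go, it can help to keep the command in a bash array and preview it before starting a long run. This is a minimal sketch. The non-default values (--minlength 3000, --mincov 20), the output folder phables_strict and the prefix sampleA are illustrative assumptions, not recommended settings.

```shell
#!/usr/bin/env bash

# Build the command as a bash array so each option value stays a single word,
# even if a path contains spaces. All values below are illustrative only.
phables_cmd=(
  phables run
  --input assembly_graph.gfa   # assembly graph in GFA format
  --reads fastq                # folder with the sequencing reads
  --output phables_strict      # hypothetical output folder name
  --threads 8
  --minlength 3000             # only consider circular unitigs of >= 3000 bp
  --mincov 20                  # only output paths with coverage >= 20
  --prefix sampleA             # hypothetical genome identifier prefix
)

# Preview the exact command line (shell-quoted) before launching it
printf '%q ' "${phables_cmd[@]}"
printf '\n'

# When it looks right, run it:
# "${phables_cmd[@]}"
```

Any option left out of the array keeps the default listed above.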
Example usage
Assuming your assembly graph file is assembly_graph.gfa and your reads folder is fastq, you can run phables as follows.
Using short reads
# Run the full pipeline using 8 threads (default is 1 thread)
phables run --input assembly_graph.gfa --reads fastq --threads 8
Using long reads
# Run the full pipeline on long reads using 8 threads (default is 1 thread)
phables run --input assembly_graph.gfa --reads fastq --threads 8 --longreads
Note that you should provide the path to the GFA file to the --input parameter and the folder containing your sequencing reads to the --reads parameter.
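Because a full run can take a long time, it may be worth confirming that both paths are valid before starting. The function below is a hypothetical pre-flight check, not part of Phables. It only tests that the GFA file exists, that its name ends in .gfa, and that the reads folder is a non-empty directory. It does not validate the file contents.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Phables): check the --input and --reads
# arguments before launching a run. Returns non-zero on a fatal problem.
check_phables_inputs() {
  local gfa="$1" reads="$2" n
  if [[ ! -f "$gfa" ]]; then
    echo "error: assembly graph '$gfa' not found" >&2
    return 1
  fi
  if [[ "$gfa" != *.gfa ]]; then
    echo "warning: '$gfa' does not end in .gfa" >&2
  fi
  if [[ ! -d "$reads" ]]; then
    echo "error: reads folder '$reads' is not a directory" >&2
    return 1
  fi
  # Count regular files at the top level of the reads folder
  n=$(find "$reads" -maxdepth 1 -type f | wc -l | tr -d ' ')
  if (( n == 0 )); then
    echo "error: reads folder '$reads' contains no files" >&2
    return 1
  fi
  echo "ok: $gfa and $n read file(s) in $reads"
}

# Usage:
# check_phables_inputs assembly_graph.gfa fastq && \
#   phables run --input assembly_graph.gfa --reads fastq --threads 8
```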
The output of Phables is set by default to phables.out. You can update the output path using the --output parameter for phables run as follows.
# Run the full pipeline using 8 threads, writing results to my_output_folder
phables run --input assembly_graph.gfa --reads fastq --output my_output_folder --threads 8
The phables run command will run the preprocessing steps, perform genome resolution and then perform the postprocessing steps.
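If you have several samples, giving each run its own --output folder keeps the results from overwriting one another. This is a sketch that assumes a hypothetical layout of one folder per sample, each containing assembly_graph.gfa and a fastq folder. The PHABLES_BIN variable is also hypothetical. Setting it to echo prints the commands instead of running them.

```shell
#!/usr/bin/env bash

# Assumed (hypothetical) layout, one folder per sample:
#   samples/sampleA/assembly_graph.gfa
#   samples/sampleA/fastq/
# Set PHABLES_BIN=echo to preview the commands without running Phables.
run_phables_per_sample() {
  local samples_dir="$1" threads="${2:-8}"
  local bin="${PHABLES_BIN:-phables}" sample_path name
  for sample_path in "$samples_dir"/*/; do
    # Skip the literal pattern if no sample folders exist
    [[ -d "$sample_path" ]] || continue
    name=$(basename "$sample_path")
    # One output folder per sample so results do not overwrite each other
    "$bin" run \
      --input "${sample_path}assembly_graph.gfa" \
      --reads "${sample_path}fastq" \
      --output "phables_${name}" \
      --threads "$threads" || return 1
  done
}

# Usage:
# PHABLES_BIN=echo run_phables_per_sample samples 8   # preview
# run_phables_per_sample samples 8                    # run
```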
Output
The following is the folder structure after a complete Phables run.
phables.out
├── config.yaml # config file
├── logs # all log files
├── phables # final phables results
├── phables.log # phables master log
├── postprocess # postprocessing results
└── preprocess # preprocessing results
Phables will create 3 main folders, preprocess, phables and postprocess, for the different stages of execution.
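After a run finishes, a quick check that the expected top-level entries are present can reveal an interrupted run early. This is a minimal sketch. The function name check_phables_output is hypothetical, and the entry names are taken from the folder structure listed above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report which expected top-level entries exist in a
# Phables output folder (default: phables.out). Exits non-zero if any are missing.
check_phables_output() {
  local out="${1:-phables.out}" missing=0 entry
  for entry in config.yaml logs phables phables.log postprocess preprocess; do
    if [[ -e "$out/$entry" ]]; then
      echo "found   $entry"
    else
      echo "missing $entry"
      missing=$((missing + 1))
    fi
  done
  # Status 0 only when nothing is missing
  (( missing == 0 ))
}

# Usage:
# check_phables_output phables.out
```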
1. preprocess - preprocessing results
The following preprocessing steps will be run and their corresponding files and folders can be found in the preprocess folder.
- Obtain unitig sequences from the assembly graph - edges.fasta
- Map reads to unitig sequences and get BAM files - temp/*.bam and temp/*.bai
- Calculate coverage of unitig sequences - coverage.tsv
- Scan unitig sequences for single-copy marker genes - edges.fasta.hmmout
- Scan unitig sequences for Prokaryotic Virus Remote Homologous Groups (PHROGs) - phrogs_annotations.tsv
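To sanity-check the preprocessing stage, you can count the unitigs written to edges.fasta and the BAM files in temp/. This is a sketch that assumes the default layout described above (output folder, then preprocess). The function name summarise_preprocess is hypothetical. If temp/ is not present, the BAM count is reported as 0.

```shell
#!/usr/bin/env bash

# Hypothetical helper: summarise the preprocessing outputs of a Phables run.
# Argument: the output folder (default: phables.out).
summarise_preprocess() {
  local pre="${1:-phables.out}/preprocess" n_unitigs n_bam
  # Every FASTA record starts with a '>' header line
  n_unitigs=$(grep -c '^>' "$pre/edges.fasta")
  echo "unitigs in edges.fasta: $n_unitigs"
  # Count BAM files (not their .bai indexes) in temp/, if the folder exists
  n_bam=$(find "$pre/temp" -maxdepth 1 -name '*.bam' 2>/dev/null | wc -l | tr -d ' ')
  echo "BAM files in temp/: $n_bam"
}

# Usage:
# summarise_preprocess phables.out
```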
2. phables - genome resolution results
The following files and folders, which are the main outputs of Phables, can be found inside the phables folder.
- resolved_paths.fasta containing the resolved genomes
- resolved_phages folder containing the resolved genomes in individual FASTA files
- resolved_genome_info.txt containing the path name, coverage, length, GC content and unitig order of the resolved genomes
- resolved_edges.fasta containing the unitigs that make up the resolved genomes
- unresolved_phage_like_edges.fasta containing all the unresolved phage-like unitigs
- all_phage_like_edges.fasta containing sequences from all the phage-like components (both resolved and unresolved)
- resolved_component_info.txt containing the details of the phage bubbles resolved
- component_phrogs.txt containing PHROGs found in each component
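resolved_genome_info.txt already reports genome lengths. If you are working only with a FASTA file, such as resolved_paths.fasta or a single file from resolved_phages, the lengths can be recomputed with a short awk script. This is a sketch, and the function name genome_lengths is hypothetical. It uses the first word of each header as the name and handles sequences wrapped over several lines.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "<name><TAB><length in bp>" for each record in a
# FASTA file. Sequence lines of a record are summed, so wrapped FASTA works.
genome_lengths() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len   # emit the previous record
      name = substr($1, 2); len = 0; next   # first header word, without ">"
    }
    { len += length($0) }
    END { if (name != "") print name "\t" len }
  ' "$1"
}

# Usage:
# genome_lengths phables.out/phables/resolved_paths.fasta
```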
3. postprocess - postprocessing results
The following postprocessing steps will be run and their corresponding files and folders can be found in the postprocess folder.
- Combine resolved genomes and unresolved edges - genomes_and_unresolved_edges.fasta
- Obtain read counts for resolved genomes and unresolved edges - sample_genome_read_counts.tsv
- Obtain mean coverage of resolved genomes and unresolved edges - sample_genome_mean_coverage.tsv
- Obtain RPKM coverage of resolved genomes and unresolved edges - sample_genome_rpkm.tsv
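For reference, RPKM (reads per kilobase per million mapped reads) is conventionally defined as shown below. Here r_i is the number of reads mapped to sequence i, L_i is its length in bp, and N is the total number of mapped reads in the sample. This is the standard textbook definition. Exactly how Phables normalises its values, for example which read total it uses as N, is an assumption not stated here.

```latex
\mathrm{RPKM}_i
  = \frac{r_i}{\left(L_i / 10^{3}\right)\left(N / 10^{6}\right)}
  = \frac{10^{9}\, r_i}{L_i \, N}

% Worked example: r_i = 500 reads, L_i = 50{,}000 bp, N = 10^{7} mapped reads
\mathrm{RPKM}_i = \frac{10^{9} \times 500}{5 \times 10^{4} \times 10^{7}}
               = \frac{5 \times 10^{11}}{5 \times 10^{11}} = 1
```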
Step-wise usage
You can also execute each of the preprocessing, phables and postprocessing steps individually, as follows.
Preprocessing only
You can use the following command to only run the preprocessing steps.
# Only preprocess data
phables run --input assembly_graph.gfa --reads fastq --threads 8 preprocess
Genome resolution only
You can use the following command to only run the genome resolution steps. Please make sure to have the preprocessing results in the output folder.
# Only run phables core using short reads
phables run --input assembly_graph.gfa --reads fastq --threads 8 phables
# Only run phables core using long reads
phables run --input assembly_graph.gfa --reads fastq --threads 8 --longreads phables
Postprocessing only
You can use the following command to only run the postprocessing steps.
# Only run postprocessing
phables run --input assembly_graph.gfa --reads fastq --threads 8 postprocess
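The three targets can also be chained so that the run stops at the first failing stage. This makes it easier to see which stage broke. According to the help text, the phables and postprocess targets already run earlier stages if needed, so chaining them mainly adds clearer progress reporting. This is a sketch. The function name and the PHABLES_BIN and LONGREADS variables are hypothetical. Passing --longreads to every stage (rather than only to the phables stage) is a choice made here for consistency.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run preprocess, phables and postprocess in order,
# stopping at the first stage that fails.
#   PHABLES_BIN=echo  -> print the commands instead of running them
#   LONGREADS=1       -> add the --longreads flag to every stage
run_phables_stages() {
  local gfa="$1" reads="$2" threads="${3:-8}" bin="${PHABLES_BIN:-phables}"
  local extra=() stage
  if [[ "${LONGREADS:-0}" == 1 ]]; then
    extra+=(--longreads)
  fi
  for stage in preprocess phables postprocess; do
    echo "== stage: $stage" >&2
    "$bin" run --input "$gfa" --reads "$reads" --threads "$threads" \
      "${extra[@]}" "$stage" || return 1
  done
}

# Usage:
# run_phables_stages assembly_graph.gfa fastq 8
# LONGREADS=1 run_phables_stages assembly_graph.gfa fastq 8
```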