AHA Functional Specification

Version 2.0

Introduction

This document describes the interface, input/output, and performance characteristics of pbaha.py, a scaffolding and gap-filling tool provided by the pbaha package from Pacific Biosciences. pbaha.py provides access to the AHA scaffolding algorithm outside of the PacBio pipeline controller, smrtpipe.py. In the 2.1.0 release of SMRT Analysis, it is the only way to execute AHA on PacBio reads in FASTA files.

Software Overview

The pbaha package provides a command-line tool pbaha.py.

Functional Requirements

Command-line interface

pbaha.py is invoked from the command line. For example, a simple invocation is:

pbaha.py reads.fasta contigs.fasta --outputDir aha/

which requests that AHA scaffold the contigs in contigs.fasta using the reads in reads.fasta and write the resulting scaffold to the directory aha/.
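
When pbaha.py is driven from a script rather than typed by hand, the same invocation can be assembled programmatically. The following is a minimal sketch, not part of pbaha itself: the helper names build_pbaha_command and run_pbaha are hypothetical, and the sketch assumes that pbaha.py is on the PATH and signals failure with a non-zero exit status. It uses only the arguments shown in the example above.

```python
import subprocess


def build_pbaha_command(reads, contigs, output_dir):
    """Return the argument list for the simple invocation shown above."""
    return ["pbaha.py", reads, contigs, "--outputDir", output_dir]


def run_pbaha(reads, contigs, output_dir):
    """Run pbaha.py; raises CalledProcessError on a non-zero exit status.

    Assumption: pbaha.py is on the PATH and reports failure through its
    exit status. Neither is stated in this specification.
    """
    subprocess.run(build_pbaha_command(reads, contigs, output_dir), check=True)


# Show the command that would be executed, without running it.
print(" ".join(build_pbaha_command("reads.fasta", "contigs.fasta", "aha/")))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.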

Invoking

pbaha.py --help

will provide a help message explaining all available options; they will be documented here shortly.

Input and output

pbaha.py requires two input files:

  • A reads file of PacBio reads in FASTA format with PacBio-formatted IDs (e.g. m110909_014825_sidney_c100172072554400000315046510191162_s2_p0/5/0_865)
  • A contigs file of high-confidence contigs in FASTA format. These contigs should be the output of an assembler or an assembly-like process, meaning they should have limited redundant sequence information.
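
Because the reads file must carry PacBio-formatted IDs, it can be useful to check the FASTA headers before starting a long run. The sketch below is illustrative only: it assumes the ID layout is movie name, hole number, and a start_end interval separated by slashes, as inferred from the example ID above, and the function names parse_pacbio_id and malformed_ids are hypothetical. pbaha.py may accept or reject IDs differently.

```python
import re

# Assumed layout, inferred from the example ID above:
#   <movie name>/<hole number>/<start>_<end>
PACBIO_ID = re.compile(
    r"^(?P<movie>[^/\s]+)/(?P<hole>\d+)/(?P<start>\d+)_(?P<end>\d+)$"
)


def parse_pacbio_id(read_id):
    """Return (movie, hole, start, end), or None if the ID does not match."""
    match = PACBIO_ID.match(read_id)
    if match is None:
        return None
    return (
        match.group("movie"),
        int(match.group("hole")),
        int(match.group("start")),
        int(match.group("end")),
    )


def malformed_ids(fasta_lines):
    """Yield IDs from FASTA header lines that do not match the assumed layout."""
    for line in fasta_lines:
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            read_id = fields[0] if fields else ""
            if parse_pacbio_id(read_id) is None:
                yield read_id
```

For example, list(malformed_ids(open("reads.fasta"))) would collect every header ID that fails the check.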

Software interfaces

The pbaha module currently depends on the SMRT Analysis software package for bioinformatics tools such as the BLASR, gapFiller.py and nucmer executables. This dependency will be eliminated in favor of a reduced set of dependencies in a future release.
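
Since pbaha relies on external executables, a quick pre-flight check can report which of them are not visible on the PATH. This is a minimal sketch under stated assumptions: the executable names blasr, gapFiller.py, and nucmer are taken from the tools named above and may differ in a given installation, and the helper missing_tools is hypothetical rather than part of the pbaha package.

```python
import shutil

# Assumption: these are the executable names as installed by SMRT Analysis.
REQUIRED_TOOLS = ("blasr", "gapFiller.py", "nucmer")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be located on the PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

The which parameter exists only so that the lookup can be replaced in tests; in normal use the default shutil.which is sufficient.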

Performance Requirements

pbaha.py should finish in tens of core-hours on a bacterial genome with reasonable (~60X) coverage of PacBio data.
