Cap3 Sequence Assembly Program Windows Apps

Reference: CAP3: A DNA sequence assembly program. Genome Res., 9, 868-877 (1999). For more advanced usage of CAP3, it is recommended to install the original software on your local computer.

Assembly CAP3

This mode of assembly uses the global assembly program CAP3, developed by Xiaoqiu Huang (Huang, X. DNA Sequence Assembly under Forward-Reverse Constraints. In preparation, 1998).

The CAP3 program can be accessed via the Gap4 interface through the 'Assembly' menu or as a stand-alone program.

The CAP3 files for use with Gap4 must be obtained via ftp from the author, Xiaoqiu Huang.

Email Xiaoqiu Huang (huang@mtu.edu) stating that you want CAP3 for use with Gap4 and the operating system for which you need the program (one of: Solaris 2; Digital Unix; SGI Irix; Linux x86). He will then contact you to arrange for the retrieval of the binary files. The binary files are called cap3_s and cap3_create_exp_constraints. Make these executable (e.g. chmod a+x cap3_s) and move them to the directory $STADENROOT/$MACHINE-bin. The CAP3 options on the 'Assembly' menu should now be available.

Perform CAP3 assembly

The assembly works on either a file or list of reading names in experiment file format (see section Experiment File). CAP3 assembles the readings and the alignments are written to the output window. New reading files are written in the destination directory in experiment file format. If the destination directory does not already exist, then it is created. These new files contain the additional information required to recreate the same assembly within Gap4. This is done by the addition of an AP line. See section Directed Assembly.

CAP3 uses forward-reverse constraints to correct errors in assembly of reads. The constraints file is generated automatically from the information in the experiment files when the 'Use constraint file' radiobutton is set to 'Yes'. The constraints file is named after the input file with the addition of '.con', i.e. if the input file is called fofn, the constraint file is called fofn.con. Note that if 'Use constraint file' is set to 'No', then any files of the format input_file.con will be deleted from the current directory. For further details, see section Further details about CAP3.

CAP3 can also use quality values to determine the consensus sequence. If the quality values are present in the experiment files, then they are automatically used. For further details, see section Further details about CAP3.

Import CAP3 assembly

This mode imports the aligned sequences produced after CAP3 assembly into Gap4 and maintains the same alignment. Importing the files requires the directory containing the newly aligned readings, i.e. the destination directory used in 'Perform CAP3 assembly'. Readings which are not entered are written to a 'list' or 'file' specified in the 'Save failures' entry box. This mode is functionally equivalent to 'Directed assembly'. See section Directed Assembly.

Perform and import CAP3 assembly

This mode performs both the assembly (see section Perform CAP3 assembly) and the import (see section Import CAP3 assembly) together. The assembled readings are written to the destination directory and are then automatically imported from this directory into Gap4.

Stand alone CAP3 assembly

Alternatively, the program can be run as a stand-alone program with the following command line arguments:

cap3_s -format file_of_filenames [-out destination_directory]

format is the format of the file of filenames, either experiment file format or fasta format. Legal inputs are exp, EXP, fasta or FASTA.

file_of_filenames is the name of the file containing the reading names to be assembled for experiment files or a single file of readings in fasta format.

destination_directory is the name of the directory to which the new experiment files are written. The default directory is 'assemble'.
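As an illustration of the argument layout above, the following Python sketch builds the argument list for cap3_s. It assumes that the word `format` in the synopsis is replaced by one of the legal format words after the dash (for example `-exp`); that reading of the synopsis, and the helper name `build_cap3_s_command`, are assumptions and not part of the CAP3 distribution.

```python
from typing import List, Optional

# Format words listed as legal in the text above.
LEGAL_FORMATS = {"exp", "EXP", "fasta", "FASTA"}


def build_cap3_s_command(fmt: str, file_of_filenames: str,
                         destination_directory: Optional[str] = None) -> List[str]:
    """Build the argument list for: cap3_s -format file_of_filenames [-out dir].

    Assumption: the format word directly follows the dash (e.g. "-exp").
    """
    if fmt not in LEGAL_FORMATS:
        raise ValueError(f"format must be one of {sorted(LEGAL_FORMATS)}, got {fmt!r}")
    command = ["cap3_s", f"-{fmt}", file_of_filenames]
    # Without -out, the text says the default directory 'assemble' is used.
    if destination_directory is not None:
        command += ["-out", destination_directory]
    return command
```

Where the binaries are installed, the returned list could be handed to `subprocess.run`; the sketch itself only constructs the arguments and does not run anything.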

To use forward-reverse reading constraints, an appropriate file_of_filenames.con file must exist in the current directory. This file can be created from experiment files using the program:

cap3_create_exp_constraints file_of_filenames

where file_of_filenames is the same file as used for cap3_s. For fasta files, the constraint file is created using the program:

formcon File_of_Reads Min_Distance Max_Distance

See below for more information.

If quality values are present in the experiment files, then these will be used automatically. For fasta files, the quality values must be in a separate file of the type file_of_filenames.qual. See below for more information.

Further details about CAP3

The comments provided with CAP3 by Huang are detailed below.

CONTIG ASSEMBLY PROGRAM Version 3 (CAP3)

Copyright (c) 1998 Michigan Technological University. No part of this program may be distributed without prior written permission of the author.

Proper attribution of the author as the source of the software would be appreciated:

CAP3 uses forward-reverse constraints to correct errors in assembly of reads. CAP3 works better if more constraints are used. If the file of sequence reads in FASTA format is named 'xyz', then the file of forward-reverse constraints must be named 'xyz.con'. Each line of the constraint file specifies one forward-reverse constraint of the form:

ReadA ReadB MinimumDistance MaximumDistance

where ReadA and ReadB are names of two reads, and MinimumDistance and MaximumDistance are distances (integers) in base pairs. The constraint is satisfied if ReadA in forward orientation occurs in a contig before ReadB in reverse orientation, or ReadB in forward orientation occurs in a contig before ReadA in reverse orientation, and their distance is between MinimumDistance and MaximumDistance. We have a separate program to generate a constraint file from the sequence file.
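The satisfaction rule described above can be sketched in Python as follows. This is only an illustration of the rule as stated, not CAP3's own code: the `Placement` type, the use of a single coordinate per read, and measuring the distance as the difference between those two coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical description of where a read sits in an assembly."""
    contig: str      # name of the contig containing the read
    position: int    # single coordinate of the read within the contig (assumed)
    forward: bool    # True for forward orientation, False for reverse


def constraint_satisfied(a: Placement, b: Placement,
                         min_distance: int, max_distance: int) -> bool:
    """Return True if one read is forward and placed before the other read,
    which is reverse, in the same contig, within the distance range."""
    if a.contig != b.contig:
        return False
    # Try both roles: (a forward before b reverse) or (b forward before a reverse).
    for first, second in ((a, b), (b, a)):
        if first.forward and not second.forward and first.position < second.position:
            distance = second.position - first.position
            return min_distance <= distance <= max_distance
    return False
```

For example, a forward read at position 100 and a reverse read at position 1600 in the same contig satisfy a constraint of 500 to 3000 base pairs under these assumptions, but not one of 500 to 1000.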

The program reports whether each constraint is satisfied or not. The report is in file `xyz.con.results'. A sample report file is given here:

The first four columns are simply taken from the constraint file.

Line 1 indicates that the constraint is satisfied, where the actual distance between the two reads is given in the fifth column.

Line 2 indicates that the constraint is not satisfied in distance, that is, the two reads in opposite orientation occur in the same contig, but their distance (given in the fifth column) is out of the given range.

Line 3 indicates that the constraint is not satisfied.

Line 4 indicates that this constraint is the 10th one that links two contigs, where the 3' read of one contig is CPBKI23F in plus orientation and the 5' read of the other is CPBKT37R in minus orientation. The information suggests that the two contigs should go together in the gap closure phase. Information about corrections made using constraints is reported in file named `.info'.

A feature to use quality values in determination of consensus sequences has been added. The file of quality values must be named `xyz.qual', where `xyz' is the name of the sequence file. Only the sequence file is given as an argument to the program. All the other input files must be in the same directory. CAP3 uses the same format of a quality file as Phrap. The quality values of contig consensuses are given in file `xyz.contigs.qual'. The results of CAP3 go to the standard output.
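The naming conventions described in this section (the `.con`, `.con.results`, `.qual` and `.contigs.qual` suffixes added to the sequence file name) can be collected in a small Python helper. The function names and dictionary keys below are hypothetical and chosen only for this sketch.

```python
import os
from typing import Dict, List


def cap3_companion_files(sequence_file: str) -> Dict[str, str]:
    """Derive the companion file names from the sequence file name,
    following the suffixes named in the text."""
    return {
        "constraints": sequence_file + ".con",                # input, optional
        "constraint_report": sequence_file + ".con.results",  # output
        "quality": sequence_file + ".qual",                   # input, optional
        "consensus_quality": sequence_file + ".contigs.qual", # output
    }


def present_optional_inputs(sequence_file: str) -> List[str]:
    """List the optional input files that exist next to the sequence file."""
    names = cap3_companion_files(sequence_file)
    return [names[key] for key in ("constraints", "quality")
            if os.path.isfile(names[key])]
```

Because only the sequence file is passed to the program, a check like `present_optional_inputs("xyz")` before a run is one way to see whether constraints and quality values are in place to be picked up.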

CAP3 also uses a more effective filter to speed up overlap computation.

CAP3 assumes that the low-quality ends of sequence reads have been trimmed. Otherwise, CAP3 may not work well. We have a separate program to trim low-quality ends and to produce a corresponding Phred quality file. If you need this program, please let us know. We plan to remove this assumption in the future.

The CAP3 program consists of two C source files: `cap3.c' and `filter.c'. To produce the executable code named cap3, use the command:

The usage is:

The file `output' contains the output of CAP3.

The features given above are new in CAP3. Below is for CAP2.

The CAP2 program assembles short DNA fragments into long sequences. CAP2 contains a number of improvements to the original version described in Genomics 14, pages 18-25, 1992. These improvements are:

  • Use of a more efficient filter for quickly detecting pairs of fragments that could not overlap.
  • Accurate evaluation of overlap strengths through the use of internally generated fragment-specific confidence vectors.
  • Identification of fragments from repetitive sequences and resolution of ambiguities in assembly of those fragments.
  • Identification of chimeric fragments.
  • Automated refinement of poorly aligned regions of fragment alignments.

A chimeric fragment is made of two short pieces from non-adjacent regions of the DNA molecule. CAP2 may report a repeat structure like:

where F1, F2, I1, I2, I3, T1 and T2 are fragment names. The structure means that I1, I2 and I3 are from two copies of a repetitive element, F1 and F2 flank the two copies at their 5' end, and T1 and T2 flank them at their 3' end. CAP2 produces the two copies in the final sequence by resolving the ambiguities in the repeat structure.

CAP2 is efficient in computer memory: a large number of DNA fragments can be assembled. The time requirement is acceptable; for example, CAP2 took 1.5 hours to assemble 829 fragments of a total of 393 kb nucleotides into a single contig on a Sun SPARC 5. The program is written in C and runs on Sun workstations.

The CAP2 program can be run with the -r option. If this option is specified, then the program identifies chimeric fragments, reports repeat structures and resolves them. Otherwise, these tasks are not performed.

Large integer values should be used for MATCH, MISMAT, EXTEND.

The comments given above are for CAP2. Written on Feb. 11, 95.

Below is a description of the parameters in the #define section of CAP. Two specially chosen sets of substitution scores and indel penalties are used by the dynamic programming algorithm: a heavy set for regions of low sequencing error rates and a light set for fragment ends of high sequencing error rates. (Use integers only.)

In the initial assembly, any overlap must be of length at least OVERLEN, and any overlap/containment must be of identity percentage at least PERCENT. After the initial assembly, the program attempts to join contigs together using weak overlaps. Two contigs are merged if the score of the overlapping alignment is at least CUTOFF. The value for CUTOFF is chosen according to the value for MATCH.
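The two acceptance tests described above reduce to simple threshold comparisons. The Python sketch below restates them under that reading; the function names are hypothetical, and no default values are given for OVERLEN, PERCENT or CUTOFF because the text does not state them.

```python
def accept_initial_overlap(overlap_length: int, identity_percent: float,
                           overlen: int, percent: float) -> bool:
    """Initial assembly: the overlap must be at least OVERLEN long and
    at least PERCENT percent identical."""
    return overlap_length >= overlen and identity_percent >= percent


def merge_contigs(alignment_score: int, cutoff: int) -> bool:
    """Later joining step: two contigs are merged if the score of the
    overlapping alignment is at least CUTOFF."""
    return alignment_score >= cutoff
```

Since CUTOFF is chosen according to MATCH, any change to the match score would normally be accompanied by a corresponding change to the cutoff passed to `merge_contigs`.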

POS5 and POS3 are fragment positions such that the 5' end between base 1 and base POS5, and the 3' end after base POS3 are of high sequencing error rates, say more than 5%. For mismatches and indels occurring in the two ends, light penalties are used.
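How POS5 and POS3 divide a fragment into regions can be sketched as follows. The sketch assumes 1-based base numbering and that base POS5 itself belongs to the 5' end while base POS3 does not belong to the 3' end; both boundary choices are one reading of the text, and the function names are hypothetical.

```python
def in_high_error_end(base_index: int, pos5: int, pos3: int) -> bool:
    """True if the 1-based base index lies in the 5' end (bases 1..POS5)
    or in the 3' end (bases after POS3)."""
    return base_index <= pos5 or base_index > pos3


def penalty_set(base_index: int, pos5: int, pos3: int) -> str:
    """Name the set of scores that applies at this base:
    'light' in the two high-error ends, 'heavy' elsewhere."""
    return "light" if in_high_error_end(base_index, pos5, pos3) else "heavy"
```

With illustrative values POS5 = 30 and POS3 = 500, base 30 falls under the light set, bases 31 through 500 under the heavy set, and base 501 onwards under the light set again.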

A file of input fragments looks like:

A string after '>' is the name of the following fragment. Only the five upper-case letters A, C, G, T and N are allowed to appear in fragment data. No other characters are allowed. A common mistake is the use of lower case letters in a fragment.
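Because lower-case letters are a common mistake, it can help to check an input file before running the assembler. The following Python sketch reports every character outside A, C, G, T and N in the fragment data; the function name and the shape of the returned tuples are choices made for this example, not part of CAP.

```python
from typing import Iterable, List, Optional, Tuple

ALLOWED = set("ACGTN")  # the five upper-case letters permitted in fragment data


def find_invalid_characters(lines: Iterable[str]) -> List[Tuple[Optional[str], int, str]]:
    """Return (fragment_name, line_number, character) for each disallowed
    character found in the fragment data."""
    problems = []
    name = None  # name of the fragment currently being read
    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # The string after '>' names the fragment that follows.
            name = line[1:].strip()
            continue
        for ch in line:
            if ch not in ALLOWED:
                problems.append((name, line_number, ch))
    return problems
```

An open file can be passed directly, for example `find_invalid_characters(open("fragments"))`, where `fragments` is a placeholder file name; an empty result means every data line uses only the permitted letters.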

To run the program, type a command of form

The output goes to the terminal screen, so redirection of the output into a file is necessary. The output consists of three parts: an overview of contigs at fragment level, a detailed display of contigs at nucleotide level, and consensus sequences. The output of CAP on the sample input data looks like:

'+' = direct orientation; '-' = reverse complement

This page is maintained by staden-package. Last generated on 22 October 2002.

URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_unix_88.html