To display SegMantX modules and syntax:
SegMantX -h
The following will be displayed:
Options: SegMantX.py [ chain_self_alignments | chain_alignments | visualize_chains | fetch_nucleotide_chains | generate_alignments | test_modules | help | version | citation | start_app ]
SegMantX generate_alignments -h
The following will be displayed:
usage: SegMantX [-h] -q QUERY_FILE [-s SUBJECT_FILE] [-b BLAST_OUTPUT_FILE] [-a ALIGNMENT_HITS_FILE] [-Q] [-S] [-SA] [-e EVALUE]
[-i MIN_IDENTITY_PERCENTAGE] [-T NUMBER_OF_THREADS] [-W WORD_SIZE]
module
Computes alignments for chaining modules.
positional arguments:
module Name of the module being executed.
options:
-h, --help show this help message and exit
-q, --query_file QUERY_FILE
Path to the query nucleotide FASTA file (required).
-s, --subject_file SUBJECT_FILE
Path to the subject nucleotide FASTA file (required for sequence comparison).
-b, --blast_output_file BLAST_OUTPUT_FILE
Path to the output file for BLASTn results (Default: blast_output.txt).
-a, --alignment_hits_file ALIGNMENT_HITS_FILE
Path to the output file containing (main) alignment hit data for the chaining process (Default: alignment_hits.tsv).
-Q, --is_query_circular
Indicates if the query sequence is circular (Default: False).
-S, --is_subject_circular
Indicates if the subject sequence is circular (Default: False).
-SA, --self_sequence_alignment
Indicates to perform a self-sequence alignment (Default: False).
-e, --evalue EVALUE E-value threshold for BLASTn (Default: 1e-9).
-i, --min_identity_percentage MIN_IDENTITY_PERCENTAGE
Minimum percentage identity for BLASTn matches (Default: 60).
-T, --number_of_threads NUMBER_OF_THREADS
Number of threads for BLASTn search (Default: 1).
-W, --word_size WORD_SIZE
Word size for BLASTn search (increase it for large genomes) (Default: 11).
SegMantX chain_self_alignments -h
The following will be displayed:
usage: SegMantX [-h] -i INPUT_FILE [-B] [-G MAX_GAP] [-SG SCALED_GAP] [-Q] [-L SEQUENCE_LENGTH] [-f FASTA_FILE] [-o OUTPUT_FILE] [-ml MIN_LENGTH]
module
Chains local alignments from self-sequence alignment (e.g., duplication detection).
positional arguments:
module Name of the module being executed.
options:
-h, --help show this help message and exit
-i, --input_file INPUT_FILE
Input file received from 'generate_alignments' (required, i.e., tab-delimited five columns: q.start, q.end, s.start, s.end, identity).
Alternatively, provide BLAST output format 7 and use --blast_outfmt7 flag).
-B, --blast_outfmt7 Indicates if the input file is BLAST output format 7 (Default: False).
-G, --max_gap MAX_GAP
Maximum gap size between alignment hits for chaining (default: 5000).
-SG, --scaled_gap SCALED_GAP
Minimum scaled gap between alignment hits for chaining (Default: 1.0).
-Q, --is_query_circular
Indicates if the query sequence has a circular sequence topology (Default: False).
-L, --sequence_length SEQUENCE_LENGTH
Size of the sequence (is required with circular sequence topology, or, otherwise, provide fasta file (i.e., using
--fasta_file)) (Default: None).
-f, --fasta_file FASTA_FILE
Fasta file to read out the sequence length. Required if the sequence topology is circular and --sequence_length is not
provided manually.
-o, --output_file OUTPUT_FILE
Filename of the chaining output file (Default: chaining_output.tsv).
-ml, --min_length MIN_LENGTH
Minium length of alignment hits for chaining (default: 200).
SegMantX chain_alignments -h
The following will be displayed:
usage: SegMantX [-h] -i INPUT_FILE [-B] [-G MAX_GAP] [-SG SCALED_GAP] [-Q] [-S] [-LQ SEQUENCE_LENGTH_QUERY] [-LS SEQUENCE_LENGTH_SUBJECT]
[-o OUTPUT_FILE] [-ml MIN_LENGTH] [-fq FASTA_FILE_QUERY] [-fs FASTA_FILE_SUBJECT]
module
Chains local alignments from sequence alignment.
positional arguments:
module Name of the module being executed.
options:
-h, --help show this help message and exit
-i, --input_file INPUT_FILE
Input file received from 'generate_alignments' (required, i.e., tab-delimited five columns: q.start, q.end, s.start, s.end, identity).
Alternatively, provide BLAST output format 7 and indicate --blast_outfmt7 flag).
-B, --blast_outfmt7 Indicates if the input file is BLAST output format 7 (Default: False).
-G, --max_gap MAX_GAP
Maximum gap size between alignment hits for chaining (default: 5000).
-SG, --scaled_gap SCALED_GAP
Minimum scaled gap between alignment hits for chaining (Default: 1.0).
-Q, --is_query_circular
Indicates if the query sequence has a circular sequence topology (Default: False).
-S, --is_subject_circular
Indicates if the subject sequence has a circular sequence topology (Default: False).
-LQ, --sequence_length_query SEQUENCE_LENGTH_QUERY
Size of the query sequence (is required with circular sequence topology, or, otherwise, provide fasta file (i.e., using
--fasta_file_query)) (Default: None).
-LS, --sequence_length_subject SEQUENCE_LENGTH_SUBJECT
Size of the subject sequence (is required with circular sequence topology, or, otherwise, provide fasta file (i.e., using
--fasta_file_subject)) (Default: None).
-o, --output_file OUTPUT_FILE
Filename of the chaining output file (Default: chaining_output.tsv).
-ml, --min_length MIN_LENGTH
Minium length of alignment hits for chaining (default: 200).
-fq, --fasta_file_query FASTA_FILE_QUERY
Fasta file to read out the sequence length of query. Required if the sequence topology of query is circular and --sequence_length_query is not provided manually.
-fs, --fasta_file_subject FASTA_FILE_SUBJECT
Fasta file to read out the sequence length of subject. Required if the sequence topology of subject is circular and --sequence_length_subject is not provided manually.
SegMantX visualize_chains -h
The following will be displayed:
usage: SegMantX [-h] -i INPUT_FILE [-LQ SEQUENCE_LENGTH_QUERY] [-LS SEQUENCE_LENGTH_SUBJECT] [-gf GENBANK_FILE] [-o OUTPUT_FILE] [-S SCALE]
[-fq FASTA_FILE_QUERY] [-fs FASTA_FILE_SUBJECT] [-QIS] [-W WIDTH] [-H HEIGHT]
module
Generates a segmentplot (i.e., similar to dotplot) to visualize yielded chains for a sequence.
positional arguments:
module Name of the module being executed.
options:
-h, --help show this help message and exit
-i, --input_file INPUT_FILE
Output file from chaining results as input (required).
-LQ, --sequence_length_query SEQUENCE_LENGTH_QUERY
Size of the query sequence (is required with circular sequence topology, or, otherwise, provide fasta file (i.e., using
--fasta_file_query)) (Default: None).
-LS, --sequence_length_subject SEQUENCE_LENGTH_SUBJECT
Size of the subject sequence (is required with circular sequence topology, or, otherwise, provide fasta file (i.e., using
--fasta_file_subject)) (Default: None).
-gf, --genbank_file GENBANK_FILE
Genbank file to visualize features (i.e., CDS & pseudogenes as rectangles).
-o, --output_file OUTPUT_FILE
Output file: Interactive plot (i.e., html file).
-S, --scale SCALE Scaling the plot to bp, kbp (default), or mbp options:[bp, kbp, mbp].
-fq, --fasta_file_query FASTA_FILE_QUERY
Fasta file to read out the sequence length of query. Required if the sequence topology of query is circular and --sequence_length_query is not provided manually.
-fs, --fasta_file_subject FASTA_FILE_SUBJECT
Fasta file to read out the sequence length of subject. Required if the sequence topology of subject is circular and --sequence_length_subject is not provided manually.
-QIS, --query_is_subject
Specify this flag if the query sequence is identical to the subject sequence (i.e., chaining result from self-sequence alignment).
-W, --width WIDTH Specifies the figure width.
-H, --height HEIGHT Specifies the figure height
SegMantX fetch_nucleotide_chains -h
The following will be displayed:
usage: SegMantX [-h] -i INPUT_FILE [-fq FASTA_FILE_QUERY] [-o OUTPUT_FILE] [-fs FASTA_FILE_SUBJECT] module
Extracts nucleotide sequences in FASTA-format from chained alignments using the output table from one of SegMantX chaining modules.
positional arguments:
module Name of the module being executed.
options:
-h, --help show this help message and exit
-i, --input_file INPUT_FILE
Tab-delimited output file from SegMantX chaining module as input file (required).
-fq, --fasta_file_query FASTA_FILE_QUERY
Fasta file to extract nucleotide chains from (i.e., query chains).
-o, --output_file OUTPUT_FILE
Output file: Fasta file containing nucleotide chains.
-fs, --fasta_file_subject FASTA_FILE_SUBJECT
Fasta file to extract nucleotide chains from (i.e., subject chains).