Below you’ll find concise sections and descriptions to get started with SegMantX.
Use the links below to visit the full manual of SegMantX’s modules:
Core modules:
Optional modules (and help):
SegMantX is a bioinformatics tool initially designed for chaining local alignments from a self-sequence alignment to detect DNA duplications in genomic sequences. SegMantX’s main applications are:
SegMantX’s suggested workflow requires only a (genomic) nucleotide sequence in FASTA format. The workflow integrates BLASTn to compute local alignments that serve as seeds for the chaining process.
Generating local alignments with BLASTn is optional, however: the chaining modules accept any input (i.e., seed or alignment coordinates) that provides the following data, shown here by example:
Query start | Query end | Subject start | Subject end | Percent sequence identity |
---|---|---|---|---|
133470 | 147930 | 64534 | 78969 | 95.1 |
… | … | … | … | … |
329875 | 330416 | 326586 | 327127 | 93 |
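If you compute local alignments yourself instead of with `generate_alignments`, BLAST tabular output can be reduced to the five columns in the table above. The sketch below assumes the default BLAST+ tabular column set (`-outfmt 6`, or `-outfmt 7`, which adds `#` comment lines) and assumes that the chaining modules accept tab-separated coordinates without a header row; please check the module manual for the exact input format. The helper name `blast6_to_coords` is hypothetical.

```shell
# Hypothetical helper (not part of SegMantX): reduce default BLAST+ tabular
# output (-outfmt 6 or 7) to query start, query end, subject start,
# subject end and percent identity, in that order.
# Default columns: qseqid sseqid pident length mismatch gapopen
#                  qstart qend sstart send evalue bitscore
# Assumption: SegMantX accepts tab-separated rows without a header line.
blast6_to_coords() {
  awk 'BEGIN { FS = OFS = "\t" }
       !/^#/ { print $7, $8, $9, $10, $3 }   # skip -outfmt 7 comment lines
      ' "$1"
}
```

For example: `blast6_to_coords hits.blast.tsv > hits.alignment_coordinates.tsv`. BLAST+ can also write these five columns directly, in this order, with `-outfmt "6 qstart qend sstart send pident"`, which makes the conversion step unnecessary.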
Hint: Chaining local alignments on sequences with a circular topology requires alignment (coordinate) input computed from an alignment in which each circular sequence was concatenated with itself in FASTA format.
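When you compute the alignments yourself, such a concatenated sequence can be prepared with a few lines of awk. The sketch below writes a single FASTA record whose sequence is the input sequence followed by itself, so that alignments spanning the position where the circular sequence was linearized can be found. It assumes exactly one record per input file; the helper name `double_circular_fasta` is hypothetical.

```shell
# Hypothetical helper (not part of SegMantX): write a FASTA record whose
# sequence is the input sequence concatenated with itself, as needed for
# aligning circular sequences (e.g., plasmids).
# Assumption: the input file contains exactly one FASTA record; with more
# records, all sequences would be merged under the last header.
double_circular_fasta() {
  awk '
    /^>/ { header = $0; next }             # remember the header line
    { gsub(/[ \t\r]/, ""); seq = seq $0 }  # join wrapped sequence lines
    END { print header; print seq seq }    # sequence followed by itself
  ' "$1"
}
```

For example: `double_circular_fasta plasmid.fasta > plasmid.doubled.fasta`. The doubled sequence is written on a single, unwrapped line.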
Please clone the SegMantX repository:
# Clone the repository
git clone https://github.com/DMH-biodatasci/SegMantX.git
cd SegMantX
Afterwards, choose an installation procedure that works for your machine:
Hint: The platform-independent installation may be required for older Miniconda versions.
Check if the installation was successful by running:
cd SegMantX #Navigate to the SegMantX directory as this module requires 'test_commands.txt'
SegMantX test_modules
Below are two examples of running SegMantX on the test dataset: (I.) duplication detection by chaining local alignments from a self-sequence alignment, and (II.) sequence comparison by chaining local alignments from an alignment between two sequences.
The following workflow demonstrates duplication detection by computing a self-alignment of a (circular) plasmid sequence. The hits in the self-alignment are then processed by the chaining module. Finally, an interactive plot visualizing the resulting segments (or chains) and a FASTA file containing the nucleotide chains are created.
Hint: Use --is_query_circular only if your sequence has a circular topology (e.g., circular plasmids).
#Compute a self-sequence alignment:
SegMantX generate_alignments --query_file tests/NZ_AP022172.1.fasta --blast_output_file tests/NZ_AP022172.1.blast.x7 --alignment_hits_file tests/NZ_AP022172.1.alignment_coordinates.tsv --is_query_circular --self_sequence_alignment
#Chain the local alignments from the self-sequence alignment:
SegMantX chain_self_alignments --input_file tests/NZ_AP022172.1.alignment_coordinates.tsv --max_gap 5000 --scaled_gap 1 --fasta_file tests/NZ_AP022172.1.fasta --is_query_circular --output_file tests/NZ_AP022172.1.chains.tsv
#Visualize chaining results of one sequence (i.e., towards duplication detection)
SegMantX visualize_chains --input_file tests/NZ_AP022172.1.chains.tsv --scale kbp --output_file tests/NZ_AP022172.1.html --fasta_file_query tests/NZ_AP022172.1.fasta --query_is_subject --genbank_file tests/NZ_AP022172.1.gbk
#Get the chain sequences for downstream duplication analysis:
SegMantX fetch_nucleotide_chains --input_file tests/NZ_AP022172.1.chains.tsv --fasta_file_query tests/NZ_AP022172.1.fasta --output_file tests/NZ_AP022172.1.chains.fasta
The following workflow demonstrates sequence comparison by computing an alignment between two (circular) plasmid sequences. The hits in the alignment are then processed by the chaining module. Finally, an interactive plot visualizing the resulting segments (or chains) between the two plasmid sequences and a FASTA file containing the nucleotide chains are created.
#Compute a sequence alignment between two sequences:
SegMantX generate_alignments --query_file tests/NZ_CP018634.1.fasta --subject_file tests/NZ_CP022004.1.fasta --blast_output_file tests/NZ_CP018634.1_vs_NZ_CP022004.1.blast.x7 --alignment_hits_file tests/NZ_CP018634.1_vs_NZ_CP022004.1.alignment_coordinates.tsv --is_query_circular --is_subject_circular
#Chain the local alignments between the two sequences:
SegMantX chain_alignments --input_file tests/NZ_CP018634.1_vs_NZ_CP022004.1.alignment_coordinates.tsv --fasta_file_query tests/NZ_CP018634.1.fasta --fasta_file_subject tests/NZ_CP022004.1.fasta --max_gap 5000 --scaled_gap 1 --is_query_circular --is_subject_circular --min_length 100 -o tests/NZ_CP018634.1_vs_NZ_CP022004.1.chains.tsv
#Visualize chaining results of two sequences (i.e., towards sequence comparison)
SegMantX visualize_chains --input_file tests/NZ_CP018634.1_vs_NZ_CP022004.1.chains.tsv --scale kbp --output_file tests/NZ_CP018634.1_vs_NZ_CP022004.1.html --fasta_file_query tests/NZ_CP018634.1.fasta --fasta_file_subject tests/NZ_CP022004.1.fasta
#Get the chain sequences for downstream sequence comparison analysis:
SegMantX fetch_nucleotide_chains --input_file tests/NZ_CP018634.1_vs_NZ_CP022004.1.chains.tsv --fasta_file_query tests/NZ_CP018634.1.fasta --fasta_file_subject tests/NZ_CP022004.1.fasta --output_file tests/NZ_CP018634.1_vs_NZ_CP022004.1.chains.fasta
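For a first look at the downstream data, the lengths of the nucleotide chains written by `fetch_nucleotide_chains` in either workflow can be summarized directly from the FASTA output. A minimal sketch with a hypothetical helper `fasta_lengths`, which makes no assumption about how SegMantX names the records:

```shell
# Hypothetical helper (not part of SegMantX): print the name and length (bp)
# of every record in a FASTA file, one tab-separated line per record.
# Handles wrapped and unwrapped sequence lines.
fasta_lengths() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len  # flush the previous record
      name = substr($0, 2); len = 0; next  # start a new record
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { if (name != "") print name "\t" len }
  ' "$1"
}
```

For example, `fasta_lengths tests/NZ_AP022172.1.chains.fasta` lists each chain with its length in bp, which can then be sorted or filtered with standard tools.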