pRESTO NEBNext Immune Sequencing Kit Workflow v3.2.0

Annotation:
README: Example workflow for processing NEBNext Immune Sequencing data with pRESTO.
CHANGES:
v3.1.1: Try to fix workflow issue where it stops after pRESTO FilterSeq without errors in UI. Change the 2 pRESTO FilterSeq tools right after seqtk to: generate detailed log = yes.
v3.1.2: Try to fix issue with all pRESTO ParseLog tools failing. Add missing values for the -f option to all pRESTO ParseLog tools, using PrestoV5.3_AbSeqV3_html.sh as a template.
v3.1.3: Try to fix workflow issue where it stops after pRESTO MaskPrimers without errors in UI. Change MaskPrimers, BuildConsensus, AssemblePairs, and mask primer sequences tools to: generate detailed log = yes.
v3.1.4: Try to fix workflow issue where it stops after pRESTO FilterSeq without errors in UI. Change AssemblePairs and mask low quality bases tools to: generate detailed log = yes (unexpectedly, they were not changed in the previous version).
v3.1.5: Change MiGMAP Receptor and Chain from IGH to all available (IGH-TRD, a total of 7) to fit the current experimental design. For pRESTO MaskPrimers with the R2 primer fasta, change max error rate from 0.2 to 0.5 to match the output of PrestoV5.3_AbSeqV3_html.sh more closely (this should increase the output size).
v3.1.6: Try to make the pRESTO CollapseSeq fastq output size match that obtained using PrestoV5.3_AbSeqV3_html.sh on the command line. Add workflow parameter Minimum Quality to the pRESTO FilterSeq tool after pRESTO AssemblePairs.
v3.1.7: Flip the sense of the final partition to display those with 2 or more, not 2 or less. Revert the Min qual filter param (it only needs to be changed for debugging).
v3.1.8: Add a second MiGMAP tool to produce both IG and TCR reports (it can't run both at once). Add missing filters by error rate in BuildConsensus (fixes report).
v3.1.9: Loosen MaskPrimers after AssemblePairs to use a 0.4 error rate.
v3.2.0: Add annotations and formal workflow parameters to comport with current best practices.

Step    Annotation
Step 1: Input dataset collection
select at runtime
Paired Fastq Dataset Collection
Step 2: Input dataset
select at runtime
Read 1 Primer Fasta
Step 3: Input dataset
select at runtime
Read 2 Primer Fasta
Step 4: Input dataset
select at runtime
C-Region Fasta
Step 5: Input dataset
select at runtime
Fasta containing known immune sequences (used to assemble read pairs that do not overlap)
Step 6: Input parameter
Not available.
decimal 0-0.9999 = fraction of total reads; integer 1-N = number of reads
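The two meanings of the Step 6 parameter can be sketched as follows. This is an illustration only: `resolve_sample_size` is a hypothetical helper, not part of seqtk or Galaxy, and for a fractional value it returns the expected read count rather than the exact number a random sampler would keep.

```python
def resolve_sample_size(value, total_reads):
    """Return the (expected) number of reads kept for a subsample setting.

    Hypothetical helper: a decimal below 1 is read as a fraction of the
    total reads, an integer of 1 or more as an absolute number of reads.
    """
    if value <= 0:
        raise ValueError("subsample value must be positive")
    if value < 1:
        # Decimal 0-0.9999: fraction of total reads.
        return int(total_reads * value)
    if value != int(value):
        raise ValueError("values of 1 or more must be whole numbers")
    # Integer 1-N: number of reads, capped at what is available.
    return min(int(value), total_reads)
```

For example, with 1000 input read pairs a value of 0.25 corresponds to about 250 pairs, while a value of 500 requests 500 pairs.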
Step 7: Unzip collection
Output dataset 'output' from step 1
Step 8: seqtk_sample
Output dataset 'forward' from step 7
4
Not available.
Step 9: seqtk_sample
Output dataset 'reverse' from step 7
4
Not available.
Step 10: pRESTO FilterSeq
Output dataset 'default' from step 8
Filters reads by quality score (quality)
20
False
True
Use default job resource parameters
Step 11: pRESTO FilterSeq
Output dataset 'default' from step 9
Filters reads by quality score (quality)
20
False
True
Use default job resource parameters
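The quality filter in Steps 10 and 11 can be pictured with the minimal sketch below. It assumes Phred+33 encoded FASTQ quality strings and assumes that a read passes when its mean quality is at or above the threshold (20 here); the function names are hypothetical and this is not pRESTO's own code.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_quality(quality_string, min_quality=20):
    """True if the read's mean quality meets the threshold (assumed >=)."""
    return mean_phred(quality_string) >= min_quality
```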
Step 12: pRESTO MaskPrimers
Output dataset 'fastq_out' from step 10
Output dataset 'output' from step 2
Remove primer and preceding sequence (cut)
Find primer matches by scoring primers at a fixed position (score)
0
False
False
0.2
True
Use default job resource parameters
Step 13: pRESTO ParseLog
Output dataset 'log_out' from step 10
ID QUALITY
Step 14: pRESTO MaskPrimers
Output dataset 'fastq_out' from step 11
Output dataset 'output' from step 3
Remove primer and preceding sequence (cut)
Find primer matches by scoring primers at a fixed position (score)
17
False
True
0.5
True
Use default job resource parameters
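A rough sketch of fixed-position primer scoring with a maximum error rate, as configured in Steps 12 and 14, is shown below. It reads the values in Step 14 as a start position of 17 and a max error rate of 0.5, and treats the bases before the primer as the barcode; these readings of the parameter list are assumptions. The helper `match_primer_at` is hypothetical, counts plain mismatches only, and ignores the IUPAC ambiguity handling a real primer matcher would need.

```python
def match_primer_at(read, primer, start, max_error):
    """Score a primer at a fixed offset; return None if it fails.

    Error is mismatches divided by primer length. On success the primer
    and the preceding sequence are cut, and the preceding sequence is
    reported as the barcode.
    """
    window = read[start:start + len(primer)]
    if len(window) < len(primer):
        return None  # read too short to contain the primer at this offset
    mismatches = sum(a != b for a, b in zip(window, primer))
    error = mismatches / len(primer)
    if error > max_error:
        return None
    return {
        "barcode": read[:start],
        "error": error,
        "trimmed": read[start + len(primer):],
    }
```

Raising the max error rate from 0.2 to 0.5 lets more reads through this check, which is consistent with the changelog note that the change should increase the output size.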
Step 15: pRESTO ParseLog
Output dataset 'log_out' from step 11
ID QUALITY
Step 16: pRESTO ParseLog
Output dataset 'log_out' from step 12
ID BARCODE PRIMER ERROR
Step 17: pRESTO PairSeq
Output dataset 'fastq_out' from step 12
Output dataset 'fastq_out' from step 14
Empty.
BARCODE
Illumina
Step 18: pRESTO ParseLog
Output dataset 'log_out' from step 14
ID BARCODE PRIMER ERROR
Step 19: pRESTO BuildConsensus
Output dataset 'r1_out' from step 17
1
BARCODE
0
0.6
0.5
PRIMER
0.6
Empty.
Empty.
False
True
Filters by error rate of input reads vs. consensus.
0.1
Use default job resource parameters
Step 20: pRESTO BuildConsensus
Output dataset 'r2_out' from step 17
1
BARCODE
0
0.6
0.5
PRIMER
0.6
Empty.
Empty.
False
True
Filters by error rate of input reads vs. consensus.
0.1
Use default job resource parameters
Step 21: pRESTO ParseLog
Output dataset 'log_out' from step 19
BARCODE SEQCOUNT CONSCOUNT PRIMER PRCONS PRCOUNT PRFREQ ERROR
Step 22: pRESTO PairSeq
Output dataset 'fastq_out' from step 19
Output dataset 'fastq_out' from step 20
Empty.
Empty.
pRESTO
Step 23: pRESTO ParseLog
Output dataset 'log_out' from step 20
BARCODE SEQCOUNT CONSCOUNT PRIMER PRCONS PRCOUNT PRFREQ ERROR
Step 24: pRESTO AssemblePairs
Output dataset 'r1_out' from step 22
Output dataset 'r2_out' from step 22
Read 2 Only
CONSCOUNT
PRCONS CONSCOUNT
Attempt assembly via alignment, then reference guided assembly (sequential)
1e-05
0.3
8
1000
True
Output dataset 'output' from step 5
0.5
1e-05
100
False
Blast
pRESTO
True
Use default job resource parameters
Step 25: pRESTO FilterSeq
Output dataset 'fastq_out' from step 24
Masks low quality positions (maskqual)
0
True
Use default job resource parameters
Step 26: pRESTO ParseLog
Output dataset 'log_out' from step 24
ID REFID LENGTH OVERLAP GAP ERROR PVALUE EVALUE1 EVALUE2 IDENTITY FIELDS1 FIELDS2
Step 27: pRESTO MaskPrimers
Output dataset 'fastq_out' from step 25
Output dataset 'output' from step 4
Remove primer and preceding sequence (cut)
Find primer matches using pairwise local alignment (align)
50
False
1
1
False
False
0.2
True
Use default job resource parameters
Step 28: pRESTO ParseLog
Output dataset 'log_out' from step 25
ID MASKED
Step 29: pRESTO ParseLog
Output dataset 'log_out' from step 27
ID PRIMER ERROR
Step 30: pRESTO ParseHeaders
Output dataset 'fastq_out' from step 27
Rename header annotation fields (rename)
PRIMER
CREGION
first
Step 31: pRESTOr AbSeq3 Report
_
Output dataset 'output' from step 13
Output dataset 'output' from step 15
Output dataset 'output' from step 16
Output dataset 'output' from step 18
Output dataset 'output' from step 21
Output dataset 'output' from step 23
Output dataset 'output' from step 26
Output dataset 'output' from step 28
Output dataset 'output' from step 29
Step 32: pRESTO ParseHeaders
Output dataset 'output' from step 30
Collapse header annotations with multiple entries (collapse)
CONSCOUNT
min
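The collapse in Step 32 can be illustrated with the sketch below. It assumes pRESTO-style headers of the form `ID|FIELD=value|FIELD=value`, with multiple entries in one field separated by commas; `collapse_min` is a hypothetical helper and the example header is made up.

```python
def collapse_min(header, field):
    """Replace a multi-entry numeric annotation with its minimum value."""
    seq_id, *pairs = header.split("|")
    out = [seq_id]
    for pair in pairs:
        key, value = pair.split("=", 1)
        if key == field:
            # e.g. "3,5" (one count per mate) becomes "3"
            value = str(min(int(v) for v in value.split(",")))
        out.append(f"{key}={value}")
    return "|".join(out)
```

With this sketch, `collapse_min("SEQ1|PRCONS=IGHG|CONSCOUNT=3,5", "CONSCOUNT")` gives `"SEQ1|PRCONS=IGHG|CONSCOUNT=3"`.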
Step 33: pRESTO ParseHeaders
Output dataset 'output' from step 32
Write sequence headers to a table (table)
ID PRCONS CREGION CONSCOUNT
Step 34: pRESTO CollapseSeq
Output dataset 'output' from step 32
0
PRCONS CREGION
CONSCOUNT
sum
True
True
First Sequence
False
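The duplicate collapsing in Step 34 can be sketched roughly as follows, assuming that sequences count as duplicates only when the sequence, PRCONS, and CREGION all match, that CONSCOUNT is summed across each group, and that DUPCOUNT records the group size. The function and record layout are hypothetical, and the sketch ignores missing or ambiguous bases, which a real collapse step has to handle.

```python
def collapse_duplicates(records):
    """Collapse identical sequences that share PRCONS and CREGION.

    records: list of (sequence, annotations) pairs, annotations as a dict.
    The first record of each group supplies the remaining annotations.
    """
    groups = {}
    for seq, ann in records:
        key = (seq, ann["PRCONS"], ann["CREGION"])
        if key not in groups:
            groups[key] = {"seq": seq, "ann": dict(ann, CONSCOUNT=0, DUPCOUNT=0)}
        merged = groups[key]["ann"]
        merged["CONSCOUNT"] += int(ann["CONSCOUNT"])  # action: sum
        merged["DUPCOUNT"] += 1                       # members in this group
    return [(g["seq"], g["ann"]) for g in groups.values()]
```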
Step 35: pRESTO ParseHeaders
Output dataset 'fastq_out' from step 34
Write sequence headers to a table (table)
ID PRCONS CREGION CONSCOUNT DUPCOUNT
Step 36: pRESTO Partition
Output dataset 'fastq_out' from step 34
CONSCOUNT
2
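The partition in Step 36 can be pictured with the sketch below. It assumes the threshold of 2 on CONSCOUNT sends records with a count of 2 or more to the upper output (matching the v3.1.7 changelog entry) and everything else to the lower output; `partition_by_count` is a hypothetical helper.

```python
def partition_by_count(records, field="CONSCOUNT", threshold=2):
    """Split annotation dicts into (lower, upper) by a numeric field."""
    lower, upper = [], []
    for ann in records:
        if int(ann[field]) >= threshold:
            upper.append(ann)  # at least `threshold`: kept for reporting
        else:
            lower.append(ann)
    return lower, upper
```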
Step 37: pRESTO ParseHeaders
Output dataset 'upper_out' from step 36
Write sequence headers to a table (table)
ID PRCONS CREGION CONSCOUNT DUPCOUNT