name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
targetFile | file | absolute | | 1 | path to a character-separated table file with a header | 0 | 0 |
targetSep | string | | \t | 1- | separating character in the target file | 0 | 0 |
outputFile | file | absolute | | 1 | path to the annotated output file | 0 | 0 |
targetIDcolumn | string | | | 1- | name of the column in the target file that is used to merge the table with the annotation file(s) | 0 | 0 |
annotationIDcolumn | string | | | 1- | name(s) of the column(s) in the annotation file(s) that are used to merge the table with the annotation file(s) | 0 | 0 |
annotationFile | file | absolute | | 1- | path(s) to the annotation table file(s) that should be attached | 0 | 0 |
annotationSep | string | | \t | 1- | separating character in the annotation file(s) | 0 | 0 |
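The merge performed by this module can be pictured as a join on the ID columns. The Python sketch below assumes a left join: every row of the target file is kept, and annotation columns are appended when the ID matches and left empty otherwise. The join type, the handling of duplicated IDs and the function names `read_table` and `annotate` are illustrative assumptions, not the module's documented implementation.

```python
import csv
from typing import Dict, Iterable, List


def read_table(lines: Iterable[str], sep: str = "\t") -> List[Dict[str, str]]:
    """Parse a character-separated table whose first line is the header."""
    return list(csv.DictReader(lines, delimiter=sep))


def annotate(target: List[Dict[str, str]], target_id: str,
             annotation: List[Dict[str, str]], annotation_id: str) -> List[Dict[str, str]]:
    """Left join (assumption): keep every target row and append annotation columns."""
    # Index annotation rows by their ID; if an ID occurs twice, the last row wins.
    index = {row[annotation_id]: row for row in annotation}
    extra_cols = [c for c in (annotation[0] if annotation else {}) if c != annotation_id]
    merged = []
    for row in target:
        hit = index.get(row[target_id], {})
        out = dict(row)
        for col in extra_cols:
            out[col] = hit.get(col, "")  # empty cell when no annotation row matches
        merged.append(out)
    return merged


if __name__ == "__main__":
    target = read_table(["id\tcount", "g1\t5", "g2\t7"])
    anno = read_table(["gene\tname", "g1\tTP53"])
    for row in annotate(target, "id", anno, "gene"):
        print(row)
```

With several annotation files, `annotate` would be applied once per file. Each file would be paired with its own entry in `annotationIDcolumn` and its own separator.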
ChIPSeeker can be used to visualize called peaks in ChIP-seq data
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bedFiles | file | absolute | | 1- | path to *.bed or *.narrowPeak files that contain called peaks | 0 | 0 |
annoDb | string | | | 1 | name of the R genome annotation database (e.g. org.Hs.eg.db) | 0 | 0 |
txdb | string | | | 1 | file or name of the R library containing transcript-related features of a particular genome (e.g. TxDb.Hsapiens.UCSC.hg38.knownGene) | 0 | 0 |
outputDir | file | absolute | | 1 | path to an output folder in which the plots will be stored | 0 | 0 |
promotorUpstream | integer | | 3000 | * | size in bp used to define the promoter region upstream of the annotated TSS (transcription start site) | 0 | 0 |
promotorDownstream | integer | | 3000 | * | size in bp used to define the promoter region downstream of the annotated TSS (transcription start site) | 0 | 0 |
resample | integer | | 1000 | * | number of resampling iterations for confidence interval estimation | 0 | 0 |
conf | string | | 0.95 | * | confidence level of the interval to be estimated | 0 | 0 |
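`promotorUpstream` and `promotorDownstream` define a strand-aware window around each TSS. The following Python sketch shows how a single peak position could be classified against that window. Using one representative position per peak and treating the window bounds as inclusive are simplifying assumptions. The function names are hypothetical and are not part of ChIPSeeker.

```python
def distance_to_tss(position: int, tss: int, strand: str) -> int:
    """Signed distance to the TSS: negative = upstream, positive = downstream."""
    if strand == "+":
        return position - tss
    if strand == "-":
        # On the minus strand, "upstream" lies at higher genomic coordinates.
        return tss - position
    raise ValueError(f"unknown strand: {strand!r}")


def in_promoter(position: int, tss: int, strand: str,
                upstream: int = 3000, downstream: int = 3000) -> bool:
    """True if the position falls inside [-upstream, +downstream] around the TSS."""
    distance = distance_to_tss(position, tss, strand)
    return -upstream <= distance <= downstream
```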
name | type | description | minV | maxV |
---|---|---|---|---|
ChIPSeekerOutputFolder | string | path to the output folder containing the plots | 0 | 0 |
Pubmed references: 25765347,
performs differential gene expression tests based on count tables
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
controlCondition | string | | | 1 | name of the control condition | 0 | 0 |
testCondition | string | | | 1 | name of the test condition | 0 | 0 |
countFile | file | absolute | | 1 | count file with features in rows and samples in columns | 0 | 0 |
sampleAnnotation | file | absolute | | 1 | annotation file with sample names in the first column and sample conditions in the second column (header: sample\tcondition) | 0 | 0 |
featureAnnotation | file | absolute | | * | annotation file that is joined with the count file | 0 | 0 |
featureAnnotationID | string | | FeatureID | * | name of the column used for joining | 0 | 0 |
featureAnnotationType | string | | type | * | name of the column in the annotation file for which a distribution plot is created | 0 | 0 |
excludeSamples | string | | | 0-null | names of samples that should be excluded from the analysis | 0 | 0 |
pValueCutoff | double | [0,1] | 0.01 | * | p-value cutoff for significant results | 0 | 0 |
minKeepReads | integer | >=0 | 25 | * | average number of reads per sample that a feature must have to pass the filtering step before the DE test is performed | 0 | 0 |
foldchangeCutoff | integer | | 0.0,0.415,1.0 | 0-null | log2 fold-change cutoffs, each of which gets its own result file; applied in both directions (+/-) | 0 | 0 |
foldchangeCutoffNames | string | | significant,0.33-fold,2-fold | 0-null | names corresponding to the fold-change cutoffs | 0 | 0 |
foldchangeCutoff | double | | 1 | * | log2 fold-change cutoff for the two-colored volcano plot; applied in both directions (+/-) | 0 | 0 |
downregColor | string | | red | * | color for down-regulated genes in the two-colored volcano plot | 0 | 0 |
upregColor | string | | blue | * | color for up-regulated genes in the two-colored volcano plot | 0 | 0 |
output | file | absolute | | 1 | path to the output folder | 0 | 0 |
method | string | | all | * | method that should be applied; one of: limma, DESeq, DESeq2, edgeR, all | 0 | 0 |
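The filtering steps described by `minKeepReads`, `pValueCutoff` and the `foldchangeCutoff`/`foldchangeCutoffNames` pair can be sketched in Python as follows. The table does not say whether the p-value is adjusted or how values exactly at a cutoff are treated. The choices below (p < cutoff, |log2FC| >= cutoff) and the function names are assumptions for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def passes_read_filter(counts: Sequence[int], min_keep_reads: int = 25) -> bool:
    """minKeepReads: the mean count per sample must reach the threshold."""
    return sum(counts) / len(counts) >= min_keep_reads


def split_by_cutoffs(results: Dict[str, Tuple[float, float]],
                     cutoffs: Sequence[float] = (0.0, 0.415, 1.0),
                     names: Sequence[str] = ("significant", "0.33-fold", "2-fold"),
                     p_cutoff: float = 0.01) -> Dict[str, List[str]]:
    """Assign significant features (feature -> (log2FC, p-value)) to one list per cutoff.

    Each cutoff is applied in both directions via |log2FC|, so a feature can
    appear in several result lists.
    """
    if len(cutoffs) != len(names):
        raise ValueError("foldchangeCutoff and foldchangeCutoffNames must have the same length")
    selected = {}
    for name, cutoff in zip(names, cutoffs):
        selected[name] = sorted(
            feature for feature, (log2fc, p) in results.items()
            if p < p_cutoff and abs(log2fc) >= cutoff
        )
    return selected
```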
Differential gene expression analysis was performed using %method% (%SOFTWARE_VERSION%).
tests RNA-seq data for differential exon usage
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
controlCondition | string | | | 1 | name of the control condition | 0 | 0 |
testCondition | string | | | 1 | name of the test condition | 0 | 0 |
countFile | file | absolute | | 1 | count file with features in rows and samples in columns | 0 | 0 |
flattedGTFAnnotation | file | absolute | | 1 | flattened GTF file that was used to create the count file; created by dexseq_prepare_annotation.py, which ships with DEXSeq | 0 | 0 |
sampleAnnotation | file | absolute | | 1 | annotation file with sample names in the first column and sample conditions in the second column (header: sample\tcondition) | 0 | 0 |
featureAnnotation | file | absolute | | * | annotation file that is joined with the count file | 0 | 0 |
featureAnnotationID | string | | Geneid | * | name of the column that is used for joining | 0 | 0 |
featureAnnotationName | string | | name | * | name of the column in the annotation file that contains the name of the feature | 0 | 0 |
excludeSamples | string | | | 0-null | names of samples that should be excluded from the analysis | 0 | 0 |
pValueCutoff | double | [0,1] | 0.01 | * | p-value cutoff for significant results | 0 | 0 |
minKeepReads | integer | [1,] | 25 | * | average number of reads per sample that a feature must have to pass the filtering step before the test is performed | 0 | 0 |
output | file | absolute | | 1 | path to the output folder | 0 | 0 |
threads | integer | [1,] | 1 | * | number of threads to use for testing | 0 | 0 |
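The differential-testing modules read the same two-column sample annotation. A minimal Python sketch of parsing it and applying `excludeSamples` might look like this. Ignoring samples of other conditions and failing when a condition ends up empty are assumed behaviors, and `read_sample_annotation` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, List


def read_sample_annotation(lines: Iterable[str], control: str, test: str,
                           exclude: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Group the samples of a 'sample<TAB>condition' file by condition."""
    excluded = set(exclude)
    groups: Dict[str, List[str]] = {control: [], test: []}
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    if [h.strip() for h in header[:2]] != ["sample", "condition"]:
        raise ValueError(f"unexpected header: {header}")
    for row in reader:
        if len(row) < 2:
            continue  # skip empty or malformed lines
        sample, condition = row[0], row[1]
        # Excluded samples and samples of other conditions are ignored (assumption).
        if sample in excluded or condition not in groups:
            continue
        groups[condition].append(sample)
    for condition, samples in groups.items():
        if not samples:
            raise ValueError(f"no samples left for condition {condition!r}")
    return groups
```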
Differential exon usage was determined using DEXSeq (%SOFTWARE_VERSION%).
Pubmed references: 22722343,
dynamic analysis of alternative polyadenylation from RNA-seq
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
controlCondition | string | | | 1 | name of the control condition | 0 | 0 |
testCondition | string | | | 1 | name of the test condition | 0 | 0 |
sampleAnnotation | file | absolute | | 1 | annotation file with sample names in the first column and sample conditions in the second column (header: sample\tcondition) | 0 | 0 |
excludeSamples | string | | | 1- | names of samples that should be excluded from the analysis | 0 | 0 |
wigFolder | file | | | 1 | folder containing the wig files (format: folder/samplename.bedgraph) | 0 | 0 |
wigEnding | string | | bedgraph | * | file extension of the wig files | 0 | 0 |
annotated3UTR | file | absolute | | 1 | path to the annotated 3' UTR regions created with DaPars_Extract_Anno.py | 0 | 0 |
outputFile | file | absolute | | 1 | path to the output file | 0 | 0 |
coverageCutoff | integer | [1,] | 30 | * | coverage threshold | 0 | 0 |
FDRCutoff | double | [0,1] | 0.01 | * | FDR cutoff | 0 | 0 |
PDUICutoff | double | [0,100] | 0.5 | * | degree of difference in APA usage in percent | 0 | 0 |
FoldChangeCutoff | double | | 0.5 | * | log2 fold-change cutoff between the two conditions | 0 | 0 |
numberOfCondASamplesReachingCutoff | integer | [1,] | | * | number of samples from condition A that must pass the coverage cutoff; default: all samples | 0 | 0 |
numberOfCondBSamplesReachingCutoff | integer | [1,] | | * | number of samples from condition B that must pass the coverage cutoff; default: all samples | 0 | 0 |
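`numberOfCondASamplesReachingCutoff` and `numberOfCondBSamplesReachingCutoff` combine with `coverageCutoff` to form a per-region filter. The Python sketch below assumes that a sample "reaches" the cutoff when its coverage is greater than or equal to `coverageCutoff`. It also assumes that an unset count means all samples of that condition, as the table states. `passes_coverage` is a hypothetical name.

```python
from typing import Optional, Sequence


def passes_coverage(cov_a: Sequence[float], cov_b: Sequence[float], cutoff: float = 30,
                    min_a: Optional[int] = None, min_b: Optional[int] = None) -> bool:
    """Check that enough samples of each condition reach the coverage cutoff.

    min_a/min_b mirror the two numberOf...SamplesReachingCutoff parameters;
    None means that all samples of the condition must reach the cutoff.
    """
    need_a = len(cov_a) if min_a is None else min_a
    need_b = len(cov_b) if min_b is None else min_b
    reached_a = sum(1 for c in cov_a if c >= cutoff)
    reached_b = sum(1 for c in cov_b if c >= cutoff)
    return reached_a >= need_a and reached_b >= need_b
```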
name | type | description | minV | maxV |
---|---|---|---|---|
wiggleFile | string | path to the output file in WIG format | 0 | 0 |
DaPars (%SOFTWARE_VERSION%) was used to identify alternative polyadenylation.
Pubmed references: 25409906,
gene set enrichment analysis on GO and KEGG
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
backgroundFile | file | absolute | | 1 | path to a file with a header that contains a list of ENSEMBL or GENCODE identifiers to be used as background | 0 | 0 |
testFiles | string | absolute | | 1- | path(s) to file(s) with a header, each containing a list of ENSEMBL or GENCODE identifiers to be used for enrichment testing | 0 | 0 |
orgDB | string | | | 1 | name of the organism database (orgDB) that should be used as GO annotation; if the package is missing, it is installed via biocLite | 0 | 0 |
keggDBName | string | | | * | organism code for KEGG (e.g. mmu / hsa); http://www.genome.jp/kegg/catalog/org_list.html; if the code is not supported by KEGGREST, the parameter is ignored | 0 | 0 |
pValueCutoff | double | | 0.01 | * | p-value cutoff for significant results | 0 | 0 |
plotKegg | boolean | | true | * | if enabled, plots are created for KEGG pathways | 0 | 0 |
output | file | absolute | | 1 | path to the output basename; the folder is created if it does not exist | 0 | 0 |
suffix | string | | | * | suffix that is inserted before the basename of the output; if an absolute path is given, only its basename is applied | 0 | 0 |
foldchangeCol | string | | | * | name of the column that contains the log2FC | 0 | 0 |
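The enrichment is run on up- and down-regulated genes. One plausible way to derive those two gene sets from `foldchangeCol` is sketched below in Python. Several details are assumptions: splitting by the sign of the log2FC, discarding genes whose log2FC is exactly 0, and stripping GENCODE version suffixes (e.g. `.16`) before mapping via the orgDB. The `id_col` argument naming the identifier column is hypothetical as well.

```python
import csv
from typing import Dict, Iterable, List


def strip_version(gene_id: str) -> str:
    """Drop a GENCODE-style version suffix, e.g. ENSG00000141510.16 -> ENSG00000141510."""
    return gene_id.split(".", 1)[0]


def split_by_direction(lines: Iterable[str], id_col: str, fc_col: str,
                       sep: str = "\t") -> Dict[str, List[str]]:
    """Split the IDs of a test file into up- and down-regulated genes by the sign of log2FC."""
    groups: Dict[str, List[str]] = {"up": [], "down": []}
    for row in csv.DictReader(lines, delimiter=sep):
        log2fc = float(row[fc_col])
        if log2fc > 0:
            groups["up"].append(strip_version(row[id_col]))
        elif log2fc < 0:
            groups["down"].append(strip_version(row[id_col]))
        # genes with log2FC == 0 are assigned to neither set
    return groups
```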
Afterwards, gene set enrichment analysis of the up- and down-regulated genes was performed on gene sets defined by GO (%orgDB%) and KEGG (%keggDBName%) using clusterProfiler (%SOFTWARE_VERSION%).
Pubmed references: 22455463,
identifies protein-DNA interaction at high resolution in ChIP-seq data
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
jarPath | file | absolute | | 1 | path to the GEM jar file | 0 | 0 |
expt | file | absolute | | 1 | aligned read file | 0 | 0 |
readDistribution | file | absolute | | 1 | read spatial distribution file | 0 | 0 |
gpsOnly | boolean | | true | * | run in GPS-only mode | 0 | 0 |
k | integer | | 8 | * | length of the k-mer for motif finding; use --k or (--kmin & --kmax); GEM parameter | 0 | 0 |
kMin | integer | | 6 | * | minimum value of k, e.g. 6; GEM parameter | 0 | 0 |
kMax | integer | | 13 | * | maximum value of k, e.g. 13; GEM parameter | 0 | 0 |
seed | string | | | * | exact k-mer string to jump-start k-mer set motif discovery; GEM parameter | 0 | 0 |
genome | folder | absolute | | * | path to the genome sequence directory, used for motif finding; GEM parameter | 0 | 0 |
outputPrefix | file | absolute | | 1 | output folder name and file name prefix | 0 | 0 |
control | file | absolute | | * | aligned read file for the control | 0 | 0 |
chrSize | file | absolute | | * | genome chrom.sizes file with chr name/length pairs | 0 | 0 |
format | string | | BED | * | read file format: BED/SAM/BOWTIE/ELAND/NOVO | 0 | 0 |
sizeInBp | integer | | | * | size of the mappable genome in bp (default is estimated from the genome chrom sizes) | 0 | 0 |
alphaValue | double | | | * | minimum alpha value for the sparse prior (default is estimated from the whole dataset coverage) | 0 | 0 |
qValue | double | | 2 | * | significance level for the q-value, specified as -log10(q-value) (default=2, i.e. q-value=0.01) | 0 | 0 |
threads | integer | | #CPU | * | maximum number of threads to run GEM in parallel | 0 | 0 |
kSeqs | integer | | 5000 | * | number of binding events to use for motif discovery; GEM parameter | 0 | 0 |
memoryPerThread | integer | | 2048 | * | memory per thread in MB when running on the local host; otherwise, the memory limit of the executor may be set | 0 | 0 |
useFixedAlpha | boolean | | false | * | use a fixed user-specified alpha value for all regions | 0 | 0 |
JASPAROutput | boolean | | true | * | output motif PFM in JASPAR format; GEM parameter | 0 | 0 |
MEMEOutput | boolean | | true | * | output motif PFM in MEME format; GEM parameter | 0 | 0 |
HOMEROutput | boolean | | true | * | output motif PFM in HOMER format; GEM parameter | 0 | 0 |
BEDOutput | boolean | | true | * | output binding events in BED format for the UCSC Genome Browser | 0 | 0 |
NarrowPeakOutput | boolean | | true | * | output binding events in ENCODE NarrowPeak format | 0 | 0 |
workingDir | folder path | absolute | /usr/local/storage/ | * | path to working directory | 0 | 0 |
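Because `qValue` is given on a -log10 scale, converting between the two representations takes one line each. The Python sketch below also estimates an upper bound on memory as `threads` times `memoryPerThread`. That multiplication is an assumption about how GEM's threads consume memory, and the function names are hypothetical.

```python
import math


def q_to_gem(q_value: float) -> float:
    """Convert a q-value threshold to GEM's -log10 scale (0.01 -> 2)."""
    return -math.log10(q_value)


def gem_to_q(neg_log10_q: float) -> float:
    """Convert GEM's -log10 value back to a q-value (2 -> 0.01)."""
    return 10 ** (-neg_log10_q)


def total_memory_mb(threads: int, memory_per_thread: int = 2048) -> int:
    """Rough upper bound if every thread uses memoryPerThread MB (assumption)."""
    return threads * memory_per_thread
```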
GEM (%SOFTWARE_VERSION%) was used to call peaks in the ChIP-seq data [Y. Guo, S. Mahony, D.K. Gifford, High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Computational Biology, (2012) 8(8): e1002638].
Pubmed references: 22912568,
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
unpaired | file | | | * | Files with unpaired reads. Can be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2). | 0 | 0 |
s | integer | | | * | skip the first <int> reads/pairs in the input (none) | 0 | 0 |
u | integer | | | * | stop after first <int> reads/pairs (no limit) | 0 | 0 |
trim5 | string | | | * | trim <int> bases from 5'/left end of reads (0) | 0 | 0 |
trim3 | string | | | * | trim <int> bases from 3'/right end of reads (0) | 0 | 0 |
nceil | string | | | * | func for max # non-A/C/G/Ts permitted in aln (L,0,0.15) | 0 | 0 |
pencansplice | integer | | | * | penalty for a canonical splice site (0) | 0 | 0 |
pennoncansplice | integer | | | * | penalty for a non-canonical splice site (12) | 0 | 0 |
pencanintronlen | string | | | * | penalty for long introns (G,-8,1) with canonical splice sites | 0 | 0 |
pennoncanintronlen | string | | | * | penalty for long introns (G,-8,1) with noncanonical splice sites | 0 | 0 |
minintronlen | integer | | | * | minimum intron length (20) | 0 | 0 |
maxintronlen | integer | | | * | maximum intron length (500000) | 0 | 0 |
knownsplicesiteinfile | file | | | * | provide a list of known splice sites | 0 | 0 |
novelsplicesiteoutfile | file | | | * | report a list of splice sites | 0 | 0 |
rnastrandness | string | | | * | Specify strand-specific information (unstranded) | 0 | 0 |
ma | integer | | | * | match bonus (0 for --end-to-end, 2 for --local) | 0 | 0 |
mp | string | | | * | max and min penalties for mismatch; lower qual = lower penalty <6,2> | 0 | 0 |
sp | string | | | * | max and min penalties for soft-clipping; lower qual = lower penalty <2,1> | 0 | 0 |
np | integer | | | * | penalty for non-A/C/G/Ts in read/ref (1) | 0 | 0 |
rdg | string | | | * | read gap open, extend penalties (5,3) | 0 | 0 |
rfg | string | | | * | reference gap open, extend penalties (5,3) | 0 | 0 |
scoremin | string | | | * | min acceptable alignment score w/r/t read length (L,0.0,-0.2) | 0 | 0 |
k | integer | | | * | report up to <int> alns per read; MAPQ not meaningful | 0 | 0 |
a | integer | | | * | report all alignments; very slow, MAPQ not meaningful | 0 | 0 |
un | file | | | * | write unpaired reads that didn't align to <path> | 0 | 0 |
al | file | | | * | write unpaired reads that aligned at least once to <path> | 0 | 0 |
unconc | file | | | * | write pairs that didn't align concordantly to <path> | 0 | 0 |
alconc | file | | | * | write pairs that aligned concordantly at least once to <path> | 0 | 0 |
metfile | file | | | * | send metrics to file at <path> (off) | 0 | 0 |
met | integer | | | * | report internal counters & metrics every <int> secs (1) | 0 | 0 |
rgid | string | | | * | set read group id, reflected in @RG line and RG:Z: opt field | 0 | 0 |
rg | string | | | * | add <text> ("lab:value") to @RG line of SAM header. | 0 | 0 |
offrate | integer | | | * | override offrate of index; must be >= index's offrate | 0 | 0 |
threads | integer | | | * | number of alignment threads to launch (1) | 0 | 0 |
seed | integer | | | * | seed for random number generator (0) | 0 | 0 |
index | string | | | 1 | Index filename prefix (minus trailing .X.ht2) | 0 | 0 |
paired1 | file | | | * | Files with #1 mates, paired with files in paired2. Can be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2). | 0 | 0 |
paired2 | file | | | * | Files with #2 mates, paired with files in paired1. Can be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2). | 0 | 0 |
output | file | | | 1 | File for SAM output | 0 | 0 |
fastq | boolean | | false | * | query input files are FASTQ .fq/.fastq (default) | 0 | 0 |
qseq | boolean | | false | * | query input files are in Illumina's qseq format | 0 | 0 |
fasta | boolean | | false | * | query input files are (multi-)FASTA .fa/.mfa | 0 | 0 |
raw | boolean | | false | * | query input files are raw one-sequence-per-line | 0 | 0 |
c | boolean | | false | * | paired1, paired2, unpaired are sequences themselves, not files | 0 | 0 |
phred33 | boolean | | false | * | qualities are Phred+33 (default) | 0 | 0 |
phred64 | boolean | | false | * | qualities are Phred+64 | 0 | 0 |
intquals | boolean | | false | * | qualities encoded as space-delimited integers | 0 | 0 |
ignorequals | boolean | | false | * | treat all quality values as 30 on Phred scale (off) | 0 | 0 |
nofw | boolean | | false | * | do not align forward (original) version of read (off) | 0 | 0 |
norc | boolean | | false | * | do not align reverse-complement version of read (off) | 0 | 0 |
novelsplicesiteinfile | boolean | | false | * | provide a list of novel splice sites | 0 | 0 |
notempsplicesite | boolean | | false | * | disable the use of splice sites found | 0 | 0 |
nosplicedalignment | boolean | | false | * | disable spliced alignment | 0 | 0 |
tmo | boolean | | false | * | Reports only those alignments within known transcriptome | 0 | 0 |
dta | boolean | | false | * | Reports alignments tailored for transcript assemblers | 0 | 0 |
dtacufflinks | boolean | | false | * | Reports alignments tailored specifically for cufflinks | 0 | 0 |
fr | boolean | | false | * | -1, -2 mates align fw/rev | 0 | 0 |
nomixed | boolean | | false | * | suppress unpaired alignments for paired reads | 0 | 0 |
nodiscordant | boolean | | false | * | suppress discordant alignments for paired reads | 0 | 0 |
t | boolean | | false | * | print wall-clock time taken by search phases | 0 | 0 |
quiet | boolean | | false | * | print nothing to stderr except serious errors | 0 | 0 |
metstderr | boolean | | false | * | send metrics to stderr (off) | 0 | 0 |
nohead | boolean | | false | * | suppress header lines, i.e. lines starting with @ | 0 | 0 |
nosq | boolean | | false | * | suppress @SQ header lines | 0 | 0 |
omitsecseq | boolean | | false | * | put '*' in SEQ and QUAL fields for secondary alignments. | 0 | 0 |
reorder | boolean | | false | * | force SAM output order to match order of input reads | 0 | 0 |
mm | boolean | | false | * | use memory-mapped I/O for index; many 'bowtie's can share | 0 | 0 |
qcfilter | boolean | | false | * | filter out reads that are bad according to QSEQ filter | 0 | 0 |
nondeterministic | boolean | | false | * | seed rand. gen. arbitrarily instead of using read attributes | 0 | 0 |
removechrname | boolean | | false | * | remove 'chr' from reference names in alignment | 0 | 0 |
addchrname | boolean | | false | * | add 'chr' to reference names in alignment | 0 | 0 |
rf | boolean | | false | * | -1, -2 mates align rev/fw | 0 | 0 |
ff | boolean | | false | * | -1, -2 mates align fw/fw | 0 | 0 |
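The module parameters correspond to HISAT2 command-line options. As a sketch, a Python helper could assemble a paired-end or single-end call from a few of them using the HISAT2 flags `-x`, `-1`/`-2`, `-U`, `-S`, `-p`, `--rna-strandness` and `--dta`. The helper name and the subset of parameters it covers are illustrative assumptions, not the module's actual wrapper.

```python
from typing import List, Optional, Sequence


def build_hisat2_command(index: str, output: str,
                         paired1: Sequence[str] = (), paired2: Sequence[str] = (),
                         unpaired: Sequence[str] = (), threads: int = 1,
                         rna_strandness: Optional[str] = None,
                         dta: bool = False) -> List[str]:
    """Assemble a HISAT2 argument list from a subset of the module parameters."""
    if len(paired1) != len(paired2):
        raise ValueError("paired1 and paired2 must list the same number of files")
    if not paired1 and not unpaired:
        raise ValueError("at least one read file is required")
    cmd = ["hisat2", "-x", index, "-S", output, "-p", str(threads)]
    if paired1:
        # HISAT2 accepts comma-separated lists of mate files
        cmd += ["-1", ",".join(paired1), "-2", ",".join(paired2)]
    if unpaired:
        cmd += ["-U", ",".join(unpaired)]
    if rna_strandness:
        cmd += ["--rna-strandness", rna_strandness]  # e.g. "RF" or "FR" for paired-end data
    if dta:
        cmd.append("--dta")
    return cmd
```

The resulting list could then be passed to `subprocess.run(cmd, check=True)`.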
name | type | description | minV | maxV |
---|---|---|---|---|
SAMFile | string | output SAM file (= value for parameter output) | 0 | 0 |
Sequencing reads were mapped using HISAT2 (%SOFTWARE_VERSION%) [Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015 Apr;12(4):357-60].
Pubmed references: 25751142,
assembles transcript sequences of a sample using RNA-seq reads.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
forward | file | | | 1 | Path to a FASTQ or FASTQ.gz file containing the forward reads | 0 | 0 |
reverse | file | | | 1 | Path to a FASTQ or FASTQ.gz file containing the reverse reads | 0 | 0 |
cons_path | file | | | 1 | Path to the file containing consensus sequences (from svCaller) | 0 | 0 |
outFolder | file | | | 1 | Path to the output folder where SPAdes stores all its resulting files | 0 | 0 |
memory | integer | | 40 | * | [optional] RAM limit | 0 | 0 |
ignoreConsensusExistence | boolean | | false | * | do not throw an error if the file containing consensus sequences does not exist | 0 | 0 |
SPAdes was used to assemble transcript sequences from the forward and reverse RNA-seq reads of a sample.
Pubmed references: 32559359,
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
runThreadN | integer | | | * | [optional] int: number of threads to run STAR | 0 | 0 |
genomeDir | file | | | 1 | string: path to the directory where genome files will be generated | 0 | 0 |
genomeFastaFiles | file | | | 1- | string(s): path(s) to the FASTA files with the genome sequences, separated by spaces. These files should be plain-text FASTA files; they *cannot* be zipped. | 0 | 0 |
sjdbGTFfile | file | | | * | [optional] string: path to the GTF file with annotations | 0 | 0 |
sjdbOverhang | integer | | 100 | * | [optional] int>0: length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1) | 0 | 0 |
sjdbGTFtagExonParentTranscript | string | | transcript_id | * | [optional] string: GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files) | 0 | 0 |
sjdbFileChrStartEnd | file | | | 0-null | [optional] string(s): path to the files with genomic coordinates (chr <tab> start <tab> end <tab> strand) for the splice junction introns. | 0 | 0 |
genomeSAindexNbases | integer | | | * | [optional] int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1). | 0 | 0 |
genomeChrBinNbits | integer | | | * | [optional] int: =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins. For a genome with a large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]). | 0 | 0 |
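The two sizing rules quoted for `genomeSAindexNbases` and `genomeChrBinNbits` can be evaluated directly. The Python sketch below rounds both results down to integers. That rounding is an assumption, chosen because larger values of either parameter cost more memory. The function names are hypothetical.

```python
import math


def genome_sa_index_nbases(genome_length: int) -> int:
    """min(14, log2(GenomeLength)/2 - 1), rounded down."""
    return min(14, math.floor(math.log2(genome_length) / 2 - 1))


def genome_chr_bin_nbits(genome_length: int, n_references: int, read_length: int) -> int:
    """min(18, log2[max(GenomeLength/NumberOfReferences, ReadLength)]), rounded down."""
    return min(18, math.floor(math.log2(max(genome_length / n_references, read_length))))
```

For a genome of 2^20 bp, for example, the first rule gives 20/2 - 1 = 9.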
STAR indices were created for the XXX genome using STAR (%SOFTWARE_VERSION%).
Pubmed references: 23104886,
sequences (and qualities) of FASTQ files can be added to SAM files
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
sam | file path | absolute | | 1 | path to the SAM file | 0 | 0 |
fastq | file path | absolute | | 1- | path(s) to the FASTQ file(s) | 0 | 0 |
output | file path | absolute | | 1 | path to the output SAM file to which the sequences are added | 0 | 0 |
unmapped | file path | absolute | | * | path to a FASTQ file to which the unmapped sequences are written; mutually exclusive with the --preread flag | 0 | 0 |
noquality | boolean | | false | * | does not add the read quality values | 0 | 0 |
update | boolean | | false | * | overwrites already existing output files | 0 | 0 |
preread | boolean | | false | * | indexes only those reads in the FASTQ file that are part of the SAM file; mutually exclusive with the --unmapped parameter | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
SAMFileWithSequences | string | absolute path to the SAM file with added sequences and, if enabled, qualities | 0 | 0 |
UnmappedReadFile | string | absolute path to file containing all unmapped reads in FASTQ format | 0 | 0 |
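Adding sequences to a SAM file amounts to looking up each record's QNAME among the FASTQ reads and filling the SEQ and QUAL columns. The Python sketch below assumes single-end reads whose FASTQ names (the first token after `@`) match the SAM QNAMEs. As the SAM specification requires, it reverse-complements reads aligned to the minus strand (FLAG 0x10). It leaves hard-clipped records untouched. The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fastq(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map read name (first token of the header) to (sequence, quality)."""
    records: Dict[str, Tuple[str, str]] = {}
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        records[header[1:].split()[0]] = (seq, qual)
    return records


def add_sequences(sam_lines: Iterable[str], reads: Dict[str, Tuple[str, str]],
                  add_quality: bool = True) -> Iterator[str]:
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            yield line  # header lines are passed through unchanged
            continue
        fields = line.split("\t")
        name, flag, cigar = fields[0], int(fields[1]), fields[5]
        # Only fill records whose SEQ is missing; skip hard-clipped alignments,
        # whose SEQ would have to be trimmed to the aligned part.
        if name in reads and fields[9] == "*" and "H" not in cigar:
            seq, qual = reads[name]
            if flag & 0x10:  # SAM stores minus-strand reads reverse-complemented
                seq = seq.translate(COMPLEMENT)[::-1]
                qual = qual[::-1]
            fields[9] = seq
            if add_quality:
                fields[10] = qual
        yield "\t".join(fields)
```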
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inputregs | file | | | 1 | file with the genomic regions to analyze | 0 | 0 |
bams | file | | | 1 | path to the BAM files | 0 | 0 |
pattern | string | | | 1 | pattern to grep for BAM files | 0 | 0 |
strandness | string | | | 1 | strandedness of the experiment | 0 | 0 |
out | file | | | 1 | output directory | 0 | 0 |
sampleAnnotation | file | | | 1 | file specifying two conditions | 0 | 0 |
pseudocount | integer | | | * | pseudocount to subtract from counts | 0 | 0 |
numrandomizations | integer | | | * | number of randomizations | 0 | 0 |
everyPos | string | | | * | every position of a read is counted | 0 | 0 |
determines the best match(es) of insertion consensus sequences and sequences of a sequence assembly to extract the insertion sequences.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
fasta | file | | | 1 | FASTA file generated by the SPAdes assembler | 0 | 0 |
sam | file | | | 1 | SAM file containing consensus sequences mapped to the FASTA file | 0 | 0 |
out | file | | | 1 | Output FASTA file containing the detected insertion sequences | 0 | 0 |
maxSize | integer | | 8000 | * | [optional] Maximum length of an insertion. Longer detected insertions are discarded. | 0 | 0 |
ignoreFastaExistence | boolean | | false | * | Do not throw an error if the FASTA file does not exist | 0 | 0 |
assemblyAnalyzer was used to extract the sequences of insertions that had previously been predicted with the SV caller. To this end, it identified the best-matching pair(s) of consensus sequences and assembled sequences.
Pubmed references:
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bam | file | absolute | | 1 | path to the position-sorted BAM file | 0 | 0 |
output | file | absolute | | 1 | path to the BEDGRAPH file | 0 | 0 |
contigSizes | file | absolute | | * | file containing the sizes of the contigs used in the BAM file; used if ranges should be extended (format: <chrName><TAB><SIZE>) | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
wiggleFile | string | absolute path to the converted output file | 0 | 0 |
Bedtools (%SOFTWARE_VERSION%) was used to convert BAM files to WIG files.
Pubmed references: 20110278,
creates plots based on statistics of BAM files
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inBam | string | valid file path, bam format | -1- | Path to the bam file that will be converted into bed format. An index of the bam file is not required. | 0 | 0 | |
outBed | string | | | -1- | Path for saving the resulting BED file. | 0 | 0 |
bedtoolsPath | string | valid file path to executable | bedtools | -1- | Path to the bedtools executable. By default, bedtools is assumed to be in the PATH. | 0 | 0 |
split | boolean | | true | -1- | Defines how split alignments (CIGAR string containing N) are handled. If true, the skipped region is not included in the BED regions. If false, the skipped region is included in the BED region, i.e. there is only one interval from alignment start to alignment end. | 0 | 0 |
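The effect of `split` can be illustrated by turning a CIGAR string into reference intervals. In the Python sketch below, an `N` operation ends a block when `split` is true and is absorbed into a single interval otherwise, while deletions (`D`) stay inside a block. This is an illustrative reimplementation of the behavior described in the table, not bedtools code. `pos0` is the 0-based start (SAM POS minus 1).

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_blocks(pos0: int, cigar: str, split: bool = True) -> List[Tuple[int, int]]:
    """Return 0-based, half-open reference intervals covered by an alignment."""
    blocks = []
    start = end = pos0
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "MD=X":
            end += length  # operations that consume the reference inside a block
        elif op == "N":
            if split:
                if end > start:
                    blocks.append((start, end))
                start = end = end + length  # skip the intron
            else:
                end += length  # keep one interval spanning the intron
        # I, S, H and P do not consume the reference
    if end > start:
        blocks.append((start, end))
    return blocks
```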
name | type | description | minV | maxV |
---|---|---|---|---|
bedFile | string | path to the bed file that is created (same value as outBed parameter) | 0 | 0 |
Bed files were created from the bam files using bedtools bamtobed.
Pubmed references: 20110278,
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inBam | string | valid file path, file ending .bam, indexed | -1- | Path to the bam file that will be converted into bigWig format. The bam file has to be indexed. | 0 | 0 | |
outBw | string | | | -1- | Path for saving the resulting bigWig file. | 0 | 0 |
bamCoveragePath | string | valid file path | bamCoverage | -1- | Path to the executable bamCoverage, which is part of deepTools. By default, bamCoverage is assumed to be in the PATH. | 0 | 0 |
binSize | integer | positive, not zero | 1 | -1- | Resolution of the bigWig file. Increasing the binSize causes loss of information but decreases the size of the bigWig file. Highest resolution (at single basepair level) is achieved for binSize=1 (default). | 0 | 0 |
numberOfProcessors | integer | positive, not zero | 1 | -1- | Number of processors to use (parallelization) | 0 | 0 |
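To illustrate the resolution trade-off described for `binSize`, the Python sketch below averages per-base coverage values into fixed-size bins. This is a simplified model for illustration only: deepTools computes bin values from the reads overlapping each bin. `bin_coverage` is a hypothetical helper.

```python
from typing import List, Sequence


def bin_coverage(per_base: Sequence[float], bin_size: int = 1) -> List[float]:
    """Average per-base coverage over consecutive bins; the last bin may be shorter."""
    if bin_size < 1:
        raise ValueError("binSize must be a positive integer")
    bins = []
    for start in range(0, len(per_base), bin_size):
        chunk = per_base[start:start + bin_size]
        bins.append(sum(chunk) / len(chunk))
    return bins
```

With `bin_size=4`, a single-base spike such as `[4, 0, 0, 0]` becomes one bin with value 1.0. This shows how a larger bin size trades positional detail for a smaller file.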
BigWig files were created from the bam files using the tool bamCoverage from the deepTools tool suite.
Pubmed references: 27079975,
creates various statistics on BAM files using RSeQC and samtools, which can be used for quality assessment
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bam | file path | absolute | | 1- | path to one or more BAM file(s) | 0 | 0 |
outdir | folder path | absolute | | 1 | path to the output folder; individual files are stored in a subfolder named after the basename of the BAM file | 0 | 0 |
readLength | integer | | | 1 | maximum read length | 0 | 0 |
sampleDepth | integer | | 100000 | * | number of reads used for sampling | 0 | 0 |
annotation | file path | absolute | | * | gene annotation in BED format | 0 | 0 |
geneBodyAnnotation | file path | absolute | | * | genes that are used to calculate the gene body coverage; should contain housekeeping genes | 0 | 0 |
idxstats | boolean | | true | * | enables calculation of the number of reads mapped to each chromosome | 0 | 0 |
flagstat | boolean | | true | * | enables calculation of flag statistics of the mapped reads | 0 | 0 |
count | boolean | | false | * | enables calculation of raw and RPKM count tables for exons, introns and mRNAs | 0 | 0 |
saturation | boolean | | true | * | enables down-sampling of the mapped reads to infer the sequencing depth | 0 | 0 |
statistics | boolean | | true | * | calculates read mapping statistics | 0 | 0 |
clipping | boolean | | true | * | enables clipping statistics of the mapped reads | 0 | 0 |
insertion | boolean | | true | * | enables insertion statistics of the mapped reads | 0 | 0 |
deletion | boolean | | true | * | enables deletion statistics of the mapped reads | 0 | 0 |
inferExperiment | boolean | | true | * | tries to infer whether the sequencing was strand-specific | 0 | 0 |
junctionAnnotation | boolean | | true | * | enables checking how many of the splice junctions are novel or annotated | 0 | 0 |
junctionSaturation | boolean | | true | * | enables down-sampling of the spliced reads to infer whether the sequencing depth is sufficient for splicing analyses | 0 | 0 |
distribution | boolean | | true | * | calculates how mapped reads are distributed among different genomic features | 0 | 0 |
duplication | boolean | | true | * | calculates sequence duplication levels | 0 | 0 |
gc | boolean | | true | * | calculates the GC content of the mapped reads | 0 | 0 |
nvc | boolean | | true | * | checks whether a nucleotide composition bias exists | 0 | 0 |
insertSize | boolean | | true | * | calculates the insert size between two paired RNA reads | 0 | 0 |
fragmentSize | boolean | | true | * | calculates the fragment size for each transcript | 0 | 0 |
tin | boolean | | true | * | calculates the transcript integrity number, which is similar to the RNA integrity number | 0 | 0 |
paired | boolean | | false | * | must be set if paired-end data is analyzed | 0 | 0 |
stranded | boolean | | false | * | must be set if strand-specific data is analyzed | 0 | 0 |
disableAllDefault | boolean | | false | * | disables all options that are not explicitly activated | 0 | 0 |
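How `disableAllDefault` interacts with the individual switches can be sketched as follows, using the defaults listed in this parameter table. Two rules are assumptions drawn from the description: all defaults are reset to false when `disableAllDefault` is set, and explicitly set values always take precedence. `resolve_options` is a hypothetical name.

```python
from typing import Dict

# Defaults of the analysis switches as listed in the parameter table.
DEFAULTS: Dict[str, bool] = {
    "idxstats": True, "flagstat": True, "count": False, "saturation": True,
    "statistics": True, "clipping": True, "insertion": True, "deletion": True,
    "inferExperiment": True, "junctionAnnotation": True, "junctionSaturation": True,
    "distribution": True, "duplication": True, "gc": True, "nvc": True,
    "insertSize": True, "fragmentSize": True, "tin": True,
}


def resolve_options(explicit: Dict[str, bool], disable_all_default: bool = False) -> Dict[str, bool]:
    """Combine defaults, the disableAllDefault switch and explicitly set options."""
    unknown = set(explicit) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown options: {sorted(unknown)}")
    resolved = {name: (False if disable_all_default else default)
                for name, default in DEFAULTS.items()}
    resolved.update(explicit)  # explicitly set values always win
    return resolved
```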
Quality of the resulting mappings was assessed using RSeQC [Liguo Wang, Shengqin Wang, Wei Li; RSeQC: quality control of RNA-seq experiments, Bioinformatics, Volume 28, Issue 16, 15 August 2012, Pages 2184–2185].
Pubmed references: 22743226,
calls SNPs and small indels using the variant caller bcftools.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
reference | file | | | 1 | Path to the file containing the reference genome. | 0 | 0 |
bamfile | file | | | 1 | Path to the input BAM file. | 0 | 0 |
vcf | file | | | 1 | Path to the output VCF file. | 0 | 0 |
maxdepth | integer | | 100000 | * | Maximum number of reads per position. | 0 | 0 |
bcftoolsVariantCalling was used to call variants, in particular SNPs, with bcftools.
Pubmed references: 21903627
combines expression of biological or technical replicates; all replicates are scaled to the same number of reads and averaged afterwards
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bedgraphFiles | file path | absolute | | 2- | path to sorted BEDGRAPH files; at least two files must be given; all files must contain the same chromosomes in the same order | 0 | 0 |
outputFile | file path | absolute | | 1 | path to the output file | 0 | 0 |
mergedIdxstatsFile | file path | absolute | | 1 | path to a tab-separated file that contains the output generated by samtools idxstats for all samples (columns 1-4) and the sample name in the 5th column; used columns: 1 -> chromosome name; 3 -> number of mapped reads; 5 -> sample name | 0 | 0 |
notSkipHead | boolean | | false | * | disables skipping of the first line of the idxstats file (--mergedIdxstatsFile); by default, the first line is skipped | 0 | 0 |
numberOfDigits | integer | | 5 | * | number of decimal places to which the calculated values are rounded | 0 | 0 |
normByReadCount | integer | | 1000000 | * | number of reads to which each replicate is normalized (based on the idxstats output) before the values are averaged | 0 | 0 |
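The scaling and averaging step can be illustrated with a small Python sketch. It is not the module's implementation: it assumes that the coverage values of all replicates have already been read into aligned lists over the same ranges, and that the mapped-read total of a sample is the sum of column 3 over all of its idxstats lines. The function names are hypothetical.

```python
# Hypothetical sketch of replicate scaling and averaging; not the module's code.


def mapped_reads_per_sample(idxstats_lines, skip_head=True):
    """Sum mapped reads (column 3) per sample name (column 5) of a merged idxstats table."""
    totals = {}
    rows = idxstats_lines[1:] if skip_head else idxstats_lines
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        mapped, sample = int(fields[2]), fields[4]
        totals[sample] = totals.get(sample, 0) + mapped
    return totals


def combine_replicates(coverages, mapped_reads, norm_by=1_000_000, digits=5):
    """Scale each replicate to `norm_by` mapped reads, then average range by range.

    `coverages` maps sample name -> list of values for the same ranges in the same order.
    """
    scaled = []
    for sample, values in coverages.items():
        factor = norm_by / mapped_reads[sample]
        scaled.append([value * factor for value in values])
    # Average the scaled replicates position by position and round the result.
    return [round(sum(column) / len(scaled), digits) for column in zip(*scaled)]
```

Setting notSkipHead would correspond to calling `mapped_reads_per_sample(..., skip_head=False)` in this sketch.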
shrinks regions with the same score to one region in a BEDGRAPH file or expands the file to a region size of one base pair
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bedgraphFile | file | absolute | | 1 | path to a sorted, non-overlapping BEDGRAPH file | 0 | 0 |
outputFile | file | absolute | | 1 | path to the output file | 0 | 0 |
genomeSize | file | absolute | | * | path to a file containing the sizes of the contigs | 0 | 0 |
expand | boolean | | false | * | expands the ranges instead of shrinking them | 0 | 0 |
addZeroRanges | boolean | | false | * | adds missing ranges with a value of zero | 0 | 0 |
omitZeroRanges | boolean | | false | * | suppresses the output of ranges with a value of zero | 0 | 0 |
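Shrinking and expanding can be sketched in Python on BEDGRAPH records held as `(chrom, start, end, score)` tuples with 0-based, half-open coordinates. The sketch assumes that only directly adjacent ranges on the same chromosome are merged; the zero-range options and the genome size file are not modeled, and the function names are hypothetical.

```python
# Hypothetical sketch of shrinking/expanding BEDGRAPH ranges; not the module's code.


def shrink(ranges):
    """Merge adjacent ranges (same chromosome, touching, equal score) into one range."""
    merged = []
    for chrom, start, end, score in ranges:
        if merged:
            prev_chrom, prev_start, prev_end, prev_score = merged[-1]
            if prev_chrom == chrom and prev_end == start and prev_score == score:
                # Extend the previous range instead of starting a new one.
                merged[-1] = (prev_chrom, prev_start, end, prev_score)
                continue
        merged.append((chrom, start, end, score))
    return merged


def expand(ranges):
    """Split every range into 1 bp ranges that carry the same score."""
    return [
        (chrom, pos, pos + 1, score)
        for chrom, start, end, score in ranges
        for pos in range(start, end)
    ]
```

Because the two operations are inverse to each other on sorted, non-overlapping input, shrinking an expanded file should reproduce the shrunk version of the original.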
partitions regions into a fixed number of bins and calculates the coverage in each bin
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bedgraph | file | absolute | | 0-null | BEDGRAPH or bigWig file(s) | 0 | 0 |
bedgraphPos | file | absolute | | 0-null | BEDGRAPH or bigWig file(s) for the positive strand | 0 | 0 |
bedgraphNeg | file | absolute | | 0-null | BEDGRAPH or bigWig file(s) for the negative strand | 0 | 0 |
annotation | file | absolute | | 1- | region annotation file(s) (see writeGRangesToBed() in R/binGenome.lib.R for format info) | 0 | 0 |
bedgraphNames | string | | | 0-null | sample names used to generate the output file names | 0 | 0 |
annotationNames | string | | | 0-null | annotation names used to generate the output file names | 0 | 0 |
bins | integer | >0 | | 0-null | number of bins into which each region is partitioned | 0 | 0 |
quantiles | integer | [0-100] | | 0-null | determines the positions at which expression exceeds specific quantiles (in percent) | 0 | 0 |
outputDir | file | absolute | | 1 | path to the output folder; files are named automatically based on the parameters used | 0 | 0 |
cores | string | >0 | 1 | * | number of cores to use in parallel | 0 | 0 |
normalize | boolean | | true | 0-null | additionally writes a per-gene normalized version of the data | 0 | 0 |
fixedBinSizeUpstream | string | | | * | creates bins with a fixed size upstream of the region; format: 'binsize:binnumber' | 0 | 0 |
fixedBinSizeDownstream | string | | | * | creates bins with a fixed size downstream of the region; format: 'binsize:binnumber' | 0 | 0 |
tmpDir | string | | | * | path to the tmp folder | 0 | 0 |
Each region was binned into a fixed number of bins (x/x/x), and average coverage for each bin was calculated for each transcript in each sample.
Pubmed references:
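The fixed-number binning can be illustrated with a small Python sketch. It assumes per-base coverage is available as a dict from position to value for a single plus-strand region, counts uncovered positions as zero, and spreads any remainder of the region length across the bins; the function names are hypothetical, and strand orientation and the fixedBinSize options are not modeled.

```python
# Hypothetical sketch of fixed-number binning; not the module's R implementation.


def bin_edges(start, end, bins):
    """Split the half-open region [start, end) into `bins` nearly equal parts."""
    length = end - start
    return [start + (i * length) // bins for i in range(bins + 1)]


def binned_coverage(coverage, start, end, bins):
    """Average per-base coverage in each bin; `coverage` maps position -> value."""
    edges = bin_edges(start, end, bins)
    result = []
    for lo, hi in zip(edges, edges[1:]):
        width = hi - lo
        # Positions without an entry are treated as zero coverage.
        total = sum(coverage.get(pos, 0.0) for pos in range(lo, hi))
        result.append(total / width if width else 0.0)
    return result
```

For a 10 bp region split into 3 bins, `bin_edges(0, 10, 3)` gives the boundaries `[0, 3, 6, 10]`, so the last bin absorbs the extra base.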
technical demo showing how Docker containers can be used in combination with Watchdog; basic bowtie mapper
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
genome | file path | absolute | | 1 | path to the indexed reference genome (without the trailing .X.bt2 ending) | 0 | 0 |
reads | file path | absolute | | * | path to reads in FASTQ format; for mapping of paired-end data, two files are required | 0 | 0 |
outfile | file path | absolute | | 1 | path to the output file, which is written in SAM format; a log file with a .log suffix is also written | 0 | 0 |
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inReads | file | file exists, fastq format, ending .fq or .fastq | | 1 | fastq file with the sequenced reads | 0 | 0 |
bwaIndex | string | | | 1 | common prefix of the bwa index files for the reference genome | 0 | 0 |
outSai | string | | | 1 | file for writing mapped reads in bwa's SAI format | 0 | 0 |
bwaPath | file | file exists, bwa executable | bwa | * | path to the BWA executable (default: use the executable from PATH) | 0 | 0 |
threads | integer | >0 | 1 | * | number of threads to use for bwa aln (-t option of bwa) | 0 | 0 |
stopIfMoreThanBestHits | integer | >0 | | * | stop searching when there are more than this many best hits (default: use the bwa default) | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
bwaSaiFile | string | *.sai file created by the module (same value as given by the parameter outSai) | 0 | 0 |
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
fasta | file | | | 1 | path to the fasta file to be indexed; the index files are written to the same folder | 0 | 0 |
ignoreFastaExistence | boolean | | false | * | do not throw an error if the fasta file does not exist | 0 | 0 |
bwaIndex was used to index the FASTA file so that sequences could subsequently be aligned against it.
Pubmed references: 19451168
creates a sam file with bwa sampe from mappings of bwa aln for paired reads
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inReads1 | file | file exists, fastq format, ending .fq or .fastq | | 1 | uncompressed fastq (.fq, .fastq) file with the sequenced reads | 0 | 0 |
inReads2 | file | file exists, fastq format, ending .fq or .fastq | | 1 | uncompressed fastq (.fq, .fastq) file with the sequenced reads (mates) | 0 | 0 |
inSai1 | file | file exists | | 1 | output of bwa aln for the file given by inReads1 | 0 | 0 |
inSai2 | file | file exists | | 1 | output of bwa aln for the file given by inReads2 | 0 | 0 |
bwaIndex | string | | | 1 | common prefix of the bwa index files for the reference genome | 0 | 0 |
outSam | string | | | 1 | file for writing mapped reads in SAM format | 0 | 0 |
bwaPath | file | file exists, bwa executable | bwa | * | path to the BWA executable (default: use the executable from PATH) | 0 | 0 |
indexInRam | boolean | | false | * | loads the complete index into main memory | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
bwaPairedSamFile | string | sam file created by the module (same value as given by the parameter outSam) | 0 | 0 |
Calculates dOCR lengths for genes from open chromatin regions in BED format.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
input | file | | | 1 | input file (in BED format) | 0 | 0 |
name | string | | | 1 | sample name used for output files | 0 | 0 |
output | string | | | 1 | output directory | 0 | 0 |
annotation | file | | | 1 | genome annotation file (in GTF format) | 0 | 0 |
d1 | integer | | 10000 | * | [optional] maximum distance between an OCR and the gene end for the OCR to be added to the gene in the first step | 0 | 0 |
d2 | integer | | 5000 | * | [optional] maximum distance between an OCR and the last OCR added to a gene for the OCR to be added in the second step | 0 | 0 |
gene | string | | | * | [optional] reports the total length of OCRs within the gene (in_gene_length) and the fraction of the gene body covered by OCRs; default: false | 0 | 0 |
dOCR lengths were calculated as previously described in Hennig T et al., 2018, PLOS Pathogens 14(3): e1006954.
Pubmed references: 29579120
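The two-step assignment controlled by d1 and d2 can be sketched as follows. This is an illustrative reading of the parameter descriptions, not the module's code: it handles a single plus-strand gene, treats OCRs as `(start, end)` tuples, and assumes that the dOCR length is the distance from the gene end to the end of the last OCR added. The function name is hypothetical.

```python
# Hypothetical sketch of the two-step dOCR assignment for one plus-strand gene.


def docr_length(gene_end, ocrs, d1=10000, d2=5000):
    """Return the assumed dOCR length (0 if no OCR is assigned in the first step)."""
    # Only OCRs that reach beyond the gene end are candidates, sorted by start.
    downstream = sorted(o for o in ocrs if o[1] > gene_end)
    # Step 1: OCRs starting within d1 of the gene end.
    added = [o for o in downstream if o[0] - gene_end <= d1]
    if not added:
        return 0
    last_end = max(o[1] for o in added)
    # Step 2: keep adding OCRs that start within d2 of the last added OCR.
    for start, end in downstream:
        if start - last_end > d2:
            break
        last_end = max(last_end, end)
    return last_end - gene_end
```

In this reading, an OCR that is too far from the gene end for the first step can still be added in the second step if it lies close enough to an OCR that was already assigned.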
Calculates the downsampling rate for each sample such that all samples will have approximately the same number of reads after downsampling with this rate.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
idxstats | file | | | 1 | idxstats file | 0 | 0 |
exclude | string | | | * | [optional] chromosomes to be excluded, comma-separated | 0 | 0 |
samples | string | | | * | [optional] samples to be used, comma-separated | 0 | 0 |
output | string | | | 1 | output table file | 0 | 0 |
Downsampling rates were determined such that all included samples will have approximately the same number of reads after downsampling with this rate.
Pubmed references:
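The rate computation can be sketched by dividing the smallest per-sample read count by each sample's count, so that every sample is downsampled to the size of the smallest one. The sketch assumes that the idxstats data have already been parsed into mapped-read counts per sample and chromosome; the function and argument names are hypothetical.

```python
# Hypothetical sketch of the downsampling rate calculation; not the module's code.


def downsampling_rates(counts, exclude=(), samples=None):
    """counts: {sample: {chromosome: mapped_reads}} -> {sample: rate in (0, 1]}."""
    selected = samples if samples is not None else list(counts)
    # Total mapped reads per selected sample, ignoring excluded chromosomes.
    totals = {
        sample: sum(n for chrom, n in counts[sample].items() if chrom not in exclude)
        for sample in selected
    }
    target = min(totals.values())
    return {sample: target / total for sample, total in totals.items()}
```

The smallest sample receives a rate of 1.0; a rate like this is the kind of value that could be passed as the probability of keeping a read when downsampling with Picard DownsampleSam.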
creates an MD5 checksum of a file or verifies file integrity based on an MD5 checksum using md5sum
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
input | file path | absolute | | 1 | absolute path to the file for which a checksum should be calculated or which should be verified | 0 | 0 |
oldChecksumName | file path | absolute | | * | absolute path to a (non-existent) file used to identify the correct checksum line if the file was renamed or moved after checksum creation; can only be used in verify mode | 0 | 0 |
checksum | file path | absolute | .checksum.md5 | * | absolute path to the checksum file; by default '.checksum.md5' located in the same directory as the input file | 0 | 0 |
verify | boolean | | false | * | flag to verify the integrity of a file based on the checksum file | 0 | 0 |
update | boolean | | false | * | flag to update an already existing checksum in the checksum file | 0 | 0 |
absolutePath | boolean | | false | * | flag to store an absolute path in the checksum file instead of a relative one | 0 | 0 |
ignorePath | boolean | | false | * | flag to use only the name of the file to identify the corresponding checksum line (ignores the location of the file); can only be used in verify mode | 0 | 0 |
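The create/verify cycle can be sketched with Python's hashlib instead of the md5sum command line tool. The sketch assumes the md5sum text format (`<hex digest>  <path>`) with paths stored relative to the folder of the checksum file; update, absolutePath and oldChecksumName are not modeled, and the function names are hypothetical.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_line(path, checksum_file):
    """Build an md5sum-style line with a path relative to the checksum file's folder."""
    relative = os.path.relpath(path, os.path.dirname(checksum_file))
    return f"{md5_of(path)}  {relative}\n"


def verify(path, checksum_file, ignore_path=False):
    """Return True if the stored checksum for `path` matches its current content."""
    relative = os.path.relpath(path, os.path.dirname(checksum_file))
    with open(checksum_file) as handle:
        for line in handle:
            expected, name = line.rstrip("\n").split("  ", 1)
            if ignore_path:
                # Identify the line by file name only (cf. ignorePath).
                matches = os.path.basename(name) == os.path.basename(path)
            else:
                matches = name == relative
            if matches:
                return expected == md5_of(path)
    return False
```

Reading the file in chunks keeps memory usage constant even for large sequencing files.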
combines the predictions of circular RNAs made with the modules for CIRI2 and circRNA_finder.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inCircs1 | file | file exists | | 1 | First prediction file with circRNAs and junction reads (tab-separated, 5 columns: chromosome, start, end, strand, list of reads) | 0 | 0 |
inCircs2 | file | file exists | | 1 | Second prediction file with circRNAs and junction reads (tab-separated, 5 columns: chromosome, start, end, strand, list of reads) | 0 | 0 |
outUnion | file | | | 1 | Output path for the union of the predictions (coordinates and reads) | 0 | 0 |
outIntersection | file | | | 1 | Output path for the intersection of the predictions (coordinates and reads) | 0 | 0 |
outIntersectedUnion | file | | | 1 | Output path for the intersected union of the predictions (intersection of coordinates, union of reads) | 0 | 0 |
minReads | integer | >=1 | 2 | * | Minimum number of predicted junction reads required for writing a circRNA into the output files. The cutoff is applied independently to the intersection, union and intersected union of the predictions. | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
circUnion | file | Output path for the union of the predictions (same as input parameter outUnion) | 0 | 0 |
circIntersection | file | Output path for the intersection of the predictions (same as input parameter outIntersection) | 0 | 0 |
circIntersectedUnion | file | Output path for the intersected union of the predictions (same as input parameter outIntersectedUnion) | 0 | 0 |
Predictions of circular RNAs were combined by forming the union/intersection of the individual predicted coordinates, and the circular reads supporting them were combined by forming the union/intersection of the corresponding read sets.
Pubmed references:
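The three outputs and the minReads cutoff can be sketched with Python sets. The sketch assumes that two predictions match only if chromosome, start, end and strand are identical, and that each prediction file has already been read into a mapping from these coordinates to a set of read IDs; the function name is hypothetical.

```python
# Hypothetical sketch of combining two circRNA predictions; not the module's code.


def combine(circs1, circs2, min_reads=2):
    """circs: {(chrom, start, end, strand): set_of_read_ids} -> three filtered dicts."""
    union, intersection, intersected_union = {}, {}, {}
    for key in circs1.keys() | circs2.keys():
        reads1, reads2 = circs1.get(key, set()), circs2.get(key, set())
        union[key] = reads1 | reads2
        if key in circs1 and key in circs2:
            intersection[key] = reads1 & reads2
            intersected_union[key] = reads1 | reads2

    def keep(table):
        # The read cutoff is applied to each output independently.
        return {k: v for k, v in table.items() if len(v) >= min_reads}

    return keep(union), keep(intersection), keep(intersected_union)
```

Because the cutoff is applied after combining, a circRNA can pass in the union or intersected union but be dropped from the intersection when the two tools report different reads for it.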
runs circRNA_finder to detect circular RNAs in single-end or paired-end sequencing data.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inReads1 | file | file exists, fastq format | | * | path to a single-end fastq file or to the first fastq file with paired reads | 0 | 0 |
inReads2 | file | file exists, fastq format | | * | path to the second fastq file with paired reads (paired-end data only) | 0 | 0 |
strandedLibrary | integer | allowed values: 0,1,2 | 0 | * | indicates whether the library is strand-specific: 0 = unstranded/unknown, 1 = stranded (first read), 2 = stranded (second read); if the library type is unstranded/unknown, the strand is guessed from the strand of the AG-GT splice site | 0 | 0 |
reference | file | file exists, fasta format | | * | path to a (multi-)fasta file with the reference genome (not required if a STAR index or a STAR result is provided) | 0 | 0 |
inSTAR | string | | | * | output prefix of a STAR mapping that was created by a STAR run with chimeric segment detection | 0 | 0 |
outPrefix | string | | | 1 | path and file name prefix for all files produced by this module; the final file is named out/prefixcfCirc.txt | 0 | 0 |
outCirc | string | | | * | final output of predicted circRNAs (can be used to save the final prediction in a different place than given in outPrefix) | 0 | 0 |
starPath | file | | STAR | * | path to the STAR executable if STAR is not part of your PATH variable | 0 | 0 |
starIndex | file | | | * | STAR index for the reference genome; if no index is provided, it is automatically created by the module using the file given by --reference | 0 | 0 |
starThreads | integer | >=1 | 1 | * | number of threads to use with STAR | 0 | 0 |
cfPath | file | | postProcessStarAlignment.pl | * | path to the circRNA_finder perl script postProcessStarAlignment.pl | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
cfCircs | string | path to file with predicted circRNAs, it corresponds to the value of the parameter outCirc if it is set, otherwise the file path is derived from outPrefix | 0 | 0 |
runs CIRI2 to detect circular RNAs in single-end or paired-end sequencing data.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inReads1 | file | file exists, fastq format | | * | path to the first fastq file with reads (single-end reads, or the first reads of paired-end data) | 0 | 0 |
inReads2 | file | file exists, fastq format | | * | path to the second fastq file with reads (second reads of paired-end data only) | 0 | 0 |
inSAM | file | file exists, SAM format | | * | path to a SAM file that was created with BWA-MEM (can be used as input instead of fastq files) | 0 | 0 |
reference | file | file exists, fasta format | | 1 | path to a (multi-)fasta file with the reference genome | 0 | 0 |
outPrefix | string | | | 1 | path and file name prefix for all files produced by this module; the final file is named out/prefixciriCirc.txt | 0 | 0 |
outCirc | string | | | * | final output of predicted circRNAs (can be used to save the final prediction in a different place than given in outPrefix) | 0 | 0 |
bwaPath | file | | bwa | * | path to the BWA executable if bwa is not part of your PATH variable | 0 | 0 |
bwaThreads | integer | >=1 | 1 | * | number of threads to use with BWA | 0 | 0 |
bwaIndex | string | valid bwa index | | * | BWA index for the reference genome provided by the --reference option; if no index is provided, it is automatically created by the module | 0 | 0 |
bwaSeedSize | integer | >=1 | 19 | * | BWA -k parameter for the minimum seed length | 0 | 0 |
bwaScoreThreshold | integer | >=1 | 30 | * | BWA -T parameter for the minimum alignment score; the default is 30, but 19 is recommended for CIRI2 | 0 | 0 |
ciriPath | file | | CIRI2.pl | * | path to the CIRI2 perl script | 0 | 0 |
ciriThreads | integer | >=1 | 1 | * | number of threads to use for CIRI2 | 0 | 0 |
ciriAnnotation | file | file exists, GTF format | | * | GTF file with gene annotations for the genome given in the --reference option; if a GTF file is passed to this module, CIRI2 annotates all circRNAs with the corresponding gene | 0 | 0 |
ciriStringency | string | 3 allowed values: high, medium or low | high | * | controls how stringently CIRI2 filters the circRNAs based on circular reads, CIGAR strings and false positive reads | 0 | 0 |
ciriKeepTmpFiles | boolean | | false | * | if this flag is set, CIRI2 does not delete the temporary files at the end | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
ciriCircs | string | path to file with predicted circRNAs, it corresponds to the value of the parameter outCirc if it is set, otherwise the file path is derived from outPrefix | 0 | 0 |
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
outdir | file | | | 1 | output directory | 0 | 0 |
genelist | file | | | 1 | list of genes to consider | 0 | 0 |
coveragefiles | file | | | 1 | path to coverage files | 0 | 0 |
exp | string | | | 1 | type of experiment | 0 | 0 |
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bedgraphTable | file | | | 1 | path to the bedgraph table | 0 | 0 |
cluster | integer | | | 1 | number of clusters | 0 | 0 |
factor | string | | | 0-null | factor to consider | 0 | 0 |
coverageFiles | file | | | 1 | path to coverage files | 0 | 0 |
bedname | string | | | 1 | name of the BED file | 0 | 0 |
aggregateFUN | string | | | 1 | aggregation function | 0 | 0 |
normShapeSum | boolean | | | 1 | how to normalize the shape | 0 | 0 |
normLibSize | boolean | | | 1 | how to normalize by library size | 0 | 0 |
normBinLength | boolean | | | 1 | how to normalize by bin length | 0 | 0 |
bins | integer | | | 1 | number of bins | 0 | 0 |
cpm | file | | | 1 | path to the CPM file | 0 | 0 |
plotname | string | | | * | name of the plot | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
coverageFiles | string | path to coverage files | 0 | 0 |
bedname | string | name of bed file | 0 | 0 |
clusterfiles | string | path to cluster files | 0 | 0 |
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inFile | file | file exists | | * | input files given in the order of concatenation; files ending in .gz are interpreted as compressed files and are extracted | 0 | 0 |
outFile | file | | | 1 | path to save the concatenated files | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
concatenatedFile | string | path of the concatenated file, this is the same value as given by the parameter outFile | 0 | 0 |
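The concatenation behavior described for inFile can be sketched with Python's standard gzip and shutil modules. This is an illustration under the assumption that the output is written uncompressed; the function name is hypothetical.

```python
import gzip
import shutil


def concatenate(in_files, out_file):
    """Concatenate files in the given order; inputs ending in .gz are decompressed first."""
    with open(out_file, "wb") as out:
        for path in in_files:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rb") as src:
                # Stream the content instead of loading whole files into memory.
                shutil.copyfileobj(src, out)
    return out_file
```

Returning the output path mirrors the module's return value concatenatedFile, which equals the parameter outFile.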
identifies consistent SNPs of a sample, i.e. SNPs that were called by both bcftools and VarScan in all replicates of the sample.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bcftool_rep | string | | | 1 | The .vcf file created by bcftools for all replicates of the same sample. If you have multiple replicates, separate them with commas: bcftools_rep1.vcf,bcftools_rep2.vcf etc. | 0 | 0 |
varscan_rep | string | | | 1 | The .vcf file created by VarScan for all replicates of the same sample. If you have multiple replicates, separate them with commas: varscan_rep1.vcf,varscan_rep2.vcf etc. | 0 | 0 |
output | file | | | 1 | Path to your desired output file. | 0 | 0 |
The consistentSNPs module was used to identify all SNPs of a sample that were called by both bcftools and VarScan in all replicates of the sample; these are the consistent SNPs of the sample.
Pubmed references:
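The consistency criterion amounts to an intersection of variant sets across both callers and all replicates. The sketch assumes that variants are matched on chromosome, position, reference and alternative allele (the first, second, fourth and fifth VCF columns) and that the VCF files are passed in as lists of lines; the function names are hypothetical.

```python
# Hypothetical sketch of the consistent-SNP intersection; not the module's code.


def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from the data lines of a VCF file."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and empty lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def consistent_snps(bcftools_reps, varscan_reps):
    """Keep only variants present in every replicate of both callers."""
    sets = [variant_keys(vcf) for vcf in bcftools_reps + varscan_reps]
    return set.intersection(*sets)
```

A single replicate or caller that misses a variant is enough to exclude it, which makes the result conservative.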
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
jarPath | file | absolute | | * | path to the ContextMap jar file; if not given, the internal version is used | 0 | 0 |
reads | file | absolute | | * | path to reads in fasta or fastq format | 0 | 0 |
alignerName | enum | | | 1 | name of the short-read alignment tool; supported values: 'bwa', 'bowtie1' or 'bowtie2' | 0 | 0 |
alignerBin | file | absolute | | * | path to the executable of the chosen aligner | 0 | 0 |
indexerBin | file | absolute | | * | path to the executable of the aligner's indexing tool (not needed for BWA) | 0 | 0 |
indices | file | absolute | | 1- | comma-separated list of paths to the basenames of indices that can be used by the chosen aligner | 0 | 0 |
genome | file | absolute | | 1 | path to a directory with genome sequences in fasta format (each chromosome in a separate file) | 0 | 0 |
output | file | absolute | | 1 | path to the output directory | 0 | 0 |
skipsplit | string | | | * | comma-separated list of booleans, each element referring to a given aligner index (same order); 'true' for no split detection, 'false' otherwise (required in mining mode) | 0 | 0 |
skipmultisplit | string | | | * | comma-separated list of booleans, each element referring to a given aligner index (same order); 'true' for no multisplit detection, 'false' otherwise (required in mining mode) | 0 | 0 |
speciesindex | string | | | * | path to a directory containing index files created with the 'indexer' tool (required in mining mode) | 0 | 0 |
alignerTmp | string | | | * | path to a directory for temporary alignment files | 0 | 0 |
seed | integer | >0 | | * | seed length for the alignment (default: Bwt1: 30, BWA/Bwt2: 20) | 0 | 0 |
splitseedsizes | integer | >0 | 15 | * | seed size for the split search seed | 0 | 0 |
mismatches | integer | >=0 | 4 | * | allowed mismatches in the whole read | 0 | 0 |
seedmismatches | integer | >=0 | | * | allowed mismatches in the seed region (default: Bwt1: 1, BWA/Bwt2: 0) | 0 | 0 |
splitseedmismatches | integer | >=0 | 0 | * | allowed mismatches for the split seed search | 0 | 0 |
mmdiff | integer | >=1 | 0 | * | maximum allowed mismatch difference between the best and second-best alignment of the same read | 0 | 0 |
maxhits | integer | >=1 | | * | maximum number of candidate alignments per read; reads with more hits are skipped (bwa/bwt1) or the hits found so far are reported (bwt2) (default: bwa/bwt1: 10, bwt2: 3) | 0 | 0 |
minsize | integer | >=1 | 10 | * | minimum number of reads a genomic region has to contain to be regarded as a local context | 0 | 0 |
maxindelsize | integer | >=0 | 10 | * | maximum allowed size of insertions or deletions | 0 | 0 |
gtf | file | absolute | | * | path to an annotation file in GTF format | 0 | 0 |
threads | string | | | * | number of threads used for mapping | 0 | 0 |
localTmpFolder | folder | absolute | /usr/local/storage/ | * | path to a local storage that is used for temporary data | 0 | 0 |
mining | boolean | | false | * | enables mining for infections or contaminations | 0 | 0 |
noclipping | boolean | | false | * | disables the calculation of clipped alignments | 0 | 0 |
noncanonicaljunctions | boolean | | false | * | enables the prediction of non-canonical splice sites | 0 | 0 |
strandspecific | boolean | | false | * | enables strand-specific mapping | 0 | 0 |
pairedend | boolean | | false | * | enables mapping of paired-end reads; nomenclature for mates from the same fragment: base_name/1 and base_name/2, respectively; only valid for versions lower than 2.7.2 | 0 | 0 |
polyA | boolean | | false | * | enables the search for polyA tails (mutually exclusive with --noclipping) | 0 | 0 |
verbose | boolean | | false | * | verbose mode | 0 | 0 |
keeptmp | boolean | | false | * | does not delete some temporary files | 0 | 0 |
sequenceDB | boolean | | false | * | sequence mapping to disk; recommended for very large data sets | 0 | 0 |
memoryScaleFactor | integer | [0,100] | 75 | * | scale factor in percent that defines the proportion of the memory that is used for Java; default memory: 3GB*threads*(scaleFactor/100) | 0 | 0 |
memoryPerThread | integer | | 3072 | * | total memory per thread in MB if running on the local host; otherwise, the memory limit of the Watchdog executor might be set | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
contextMapSAMFile | string | path to mapped SAM file | 0 | 0 |
contextMapPolyAFile | string | path to detected polyA tails | 0 | 0 |
RNA-seq reads were mapped against the XXX genome using ContextMap (%SOFTWARE_VERSION%) with BWA as short read aligner and default parameters.
Pubmed references: 25928589
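The default-memory formula quoted for memoryScaleFactor can be made concrete with a small calculation. This sketch assumes that the "3GB" term corresponds to memoryPerThread (3072 MB) and that the result is rounded down to whole megabytes; the function name is hypothetical.

```python
# Hypothetical sketch of memory = memoryPerThread * threads * (scaleFactor / 100).


def java_memory_mb(threads, memory_per_thread_mb=3072, scale_factor=75):
    """Return the memory in MB assumed to be handed to the Java process."""
    if not 0 <= scale_factor <= 100:
        raise ValueError("memoryScaleFactor must be within [0, 100]")
    # Integer arithmetic avoids floating-point rounding surprises.
    return memory_per_thread_mb * threads * scale_factor // 100
```

With 4 threads and the defaults, 3072 × 4 × 0.75 = 9216 MB would be available to Java.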
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
sourcePath | file | file exists | | 1 | path of the file to copy | 0 | 0 |
targetPath | file | | | 1 | path of the new location of the file; all non-existing parent directories of the file are created | 0 | 0 |
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
gtf | file | | | 1 | path to the GTF file | 0 | 0 |
tss | file | | | 1 | path to the TSS file | 0 | 0 |
outdir | file | | | 1 | path to the output directory | 0 | 0 |
name | string | | | 1 | name | 0 | 0 |
info | boolean | | | * | whether an info file should be written | 0 | 0 |
bed | boolean | | | * | whether a BED file should be written | 0 | 0 |
saf | boolean | | | * | whether a SAF file should be written | 0 | 0 |
bedwindow | boolean | | | * | whether a bedwindow file should be written for scaled metagenes | 0 | 0 |
antisense | boolean | | | * | whether the experiment is antisense | 0 | 0 |
filterDist | integer | | | * | limits the distance to the annotated TSS | 0 | 0 |
noMapping | boolean | | | * | whether mapping should be avoided | 0 | 0 |
minDist | boolean | | | * | minimum distance | 0 | 0 |
genelist | file | | | * | list of genes | 0 | 0 |
sequence adapters can be removed and sequences can be trimmed based on length or base-call quality scores
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
fastq | file path | absolute | | 1 | path to one FASTQ file | 0 | 0 |
prim3 | string | | | * | adapter that was ligated to the 3' end; a '$' at the end anchors the adapter at the end of the read | 0 | 0 |
prim5 | string | | | * | adapter that was ligated to the 5' end; a '^' at the start anchors the adapter at the beginning of the read | 0 | 0 |
adapter | string | | | * | adapter that can be located at the 3' or 5' end | 0 | 0 |
errorRate | double | [0, 1] | 0.05 | * | maximum allowed error rate | 0 | 0 |
repeat | integer | [1, 100] | 1 | * | try to remove adapters at most N times | 0 | 0 |
minOverlap | integer | >0 | 6 | * | minimum overlap length | 0 | 0 |
minLength | integer | [1, 100000] | 40 | * | minimum read length after trimming | 0 | 0 |
maxLength | integer | [1, 100000] | -1 | * | maximum read length after trimming | 0 | 0 |
outfile | file path | absolute | | 1 | path to an output file | 0 | 0 |
infofile | file path | absolute | | * | path to a file that will contain trimming statistics | 0 | 0 |
shortenReads | integer | | 0 | * | shortens reads to a maximal length after trimming; positive values keep the beginning of reads, negative ones the ends (available from cutadapt version 1.17) | 0 | 0 |
cutFixedLength | integer | [-1000000, 1000000] | 0 | * | trims a fixed length from the beginning (positive numbers) or the end of the reads (negative numbers) | 0 | 0 |
qualityCutoff | double | | 0 | * | trims reads at the ends using a sliding window approach | 0 | 0 |
qualityBase | integer | | 33 | * | base (offset) of the quality values | 0 | 0 |
noIndels | boolean | | false | * | does not allow indels between read and adapter | 0 | 0 |
discardTrimmed | boolean | | false | * | discards sequences that were trimmed | 0 | 0 |
discardUntrimmed | boolean | | false | * | discards sequences that were not trimmed | 0 | 0 |
maskAdapters | boolean | | false | * | does not cut the adapters but replaces the corresponding regions with N | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
cutadaptTrimFile | string | absolute path to the trimmed output file | 0 | 0 |
cutadaptInfoFile | string | absolute path to a file containing statistical values | 0 | 0 |
Cutadapt (%SOFTWARE_VERSION%) was used to remove adapters and trim sequences [Martin, Marcel. "Cutadapt removes adapter sequences from high-throughput sequencing reads." EMBnet.journal [Online], 17.1 (2011): pp. 10-12. Web. 14 Mar. 2019].
Pubmed references:
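How cutFixedLength, minLength and maxLength act on a single read can be sketched as follows. The sketch is not cutadapt's implementation; it assumes that a maxLength of -1 means "no upper limit", and the function names are hypothetical.

```python
# Hypothetical sketch of fixed-length cutting and length filtering for one read.


def cut_fixed(seq, qual, length):
    """Positive `length` removes bases from the start, negative from the end."""
    if length >= 0:
        return seq[length:], qual[length:]
    return seq[:length], qual[:length]


def passes_length_filter(seq, min_length=40, max_length=-1):
    """Check the read length after trimming; max_length < 0 is treated as no limit."""
    if len(seq) < min_length:
        return False
    return max_length < 0 or len(seq) <= max_length
```

Sequence and quality string are always cut together so that they keep the same length, as the FASTQ format requires.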
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
input | file | | | 1 | input file | 0 | 0 |
probability | double | | | 1 | probability of keeping a read (pair) | 0 | 0 |
output | string | | | 1 | output file | 0 | 0 |
pathToPicard | string | | | 1 | path to the Picard jar file | 0 | 0 |
Downsampling of reads was performed with the DownsampleSam command line tool of the Picard library.
Pubmed references:
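Keeping each read (pair) with a given probability can be sketched by deriving a pseudo-random number from the read name, so that both mates of a pair, which share the same name, are kept or dropped together. This illustrates the idea only and is not Picard's implementation; the seed handling and the function name are assumptions.

```python
import hashlib


def keep_read(read_name, probability, seed=42):
    """Deterministically keep a read name with the given probability.

    Both mates share a read name, so a pair is kept or dropped together.
    """
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    # Map the first 6 bytes of the hash to a number in [0, 1).
    value = int.from_bytes(digest[:6], "big") / 2**48
    return value < probability
```

Changing the seed selects a different, but equally sized on average, subset of reads.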
prints the currently set environment variables to the standard output stream
extracts clipped reads from a BAM file into a new BAM file.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bam | file | | | 1 | Path to the BAM file from which clipped reads should be extracted. | 0 | 0 |
out | file | | | 1 | Path to the output BAM file, where only the extracted clipped reads are stored. | 0 | 0 |
The extractClippedReads module was used to extract clipped reads (both soft- and hard-clipped) from the BAM file of a sample and write them to a new BAM file.
Pubmed references:
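Whether a read is clipped can be read off its CIGAR string: soft clips are encoded as S and hard clips as H. The sketch works on SAM text lines for simplicity (reading BAM directly would require a library such as pysam or a tool such as samtools, which is not shown); the function names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_clipped(cigar):
    """True if the CIGAR string contains a soft (S) or hard (H) clip."""
    if cigar == "*":
        return False  # unavailable CIGAR, e.g. for unmapped reads
    return any(op in "SH" for _, op in CIGAR_OP.findall(cigar))


def clipped_sam_lines(sam_lines):
    """Yield header lines and alignments whose CIGAR (column 6) contains a clip."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
        elif is_clipped(line.split("\t")[5]):
            yield line
```

Header lines are passed through so that the filtered output remains a valid SAM file.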
generates quality reports for sequencing data using fastQC
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
contaminants | file path | absolute | | * | absolute path to a file containing non-default contaminants to screen for overrepresented sequences; format: name[TAB]sequence | 0 | 0 |
adapters | file path | absolute | | * | absolute path to a file containing non-default adapters to screen against the library; format: name[TAB]sequence | 0 | 0 |
threads | integer | [1,128] | 1 | * | number of threads to use; each will consume about 256 MB of memory | 0 | 0 |
fastq | file path | absolute | | 1 | absolute path to the fastq file to be analyzed | 0 | 0 |
limits | file path | absolute | | * | absolute path to a file containing non-default limits for warnings/errors; must be in the same format as the limits.txt shipped with FastQC | 0 | 0 |
outdir | folder path | absolute | | 1 | absolute path to the output folder | 0 | 0 |
Quality of the sequencing data was checked using FastQC (%SOFTWARE_VERSION%) [Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc].
Pubmed references:
Downloads fastq files from the NCBI Sequence Read Archive (SRA) using the SRA toolkit. First runs prefetch and then fastq-dump. Optionally, the Aspera client ascp can be used for much faster downloads (the Aspera client must be installed).
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
sraId | string |  |  | 1 | SRA ID | 0 | 0 |
outputFolder | file |  |  | 1 | folder to which the fastq files should be extracted | 0 | 0 |
pathToAspera | file |  |  | * | [optional] path to the Aspera client; Aspera is then used to speed up the download | 0 | 0 |
checkPresent | boolean |  | false | * | [optional] checks whether the files are already present in the output folder from a previous successful download: tests whether the output fastq files exist, whether the log file from a previous download is present, whether the fastq files were created no later than the log file, and whether the log file shows a successful download | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
isPairedEnd | boolean | Indicates whether paired-end (two fastq files) or single-end (one fastq file) sequencing data was downloaded | 0 | 0 |
readFile1 | string | path to first fastq file | 0 | 0 |
readFile2 | string | path to second fastq file (identical to first fastq file for single-end sequencing data) | 0 | 0 |
Sequencing data was downloaded from the NCBI Sequence Read Archive (SRA) using the SRA toolkit (%SOFTWARE_VERSION%) [Leinonen R, Sugawara H, Shumway M; International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 2011 Jan;39(Database issue):D19-21].
Pubmed references: 21062823,
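The checkPresent test from the parameter table can be sketched as follows. The sketch makes two assumptions. First, file modification times stand in for creation times. Second, the hypothetical `success_marker` string stands in for whatever the module's log actually records after a successful download. The function name is also hypothetical.

```python
from pathlib import Path


def download_already_present(fastq_files, log_file, success_marker="download successful"):
    """Sketch of the checkPresent test; success_marker is a hypothetical log text."""
    log = Path(log_file)
    if not log.is_file():  # log file from a previous download must exist
        return False
    fastqs = [Path(f) for f in fastq_files]
    if not fastqs or not all(f.is_file() for f in fastqs):
        return False
    log_time = log.stat().st_mtime
    # fastq files must not be newer than the log written when the download finished
    if any(f.stat().st_mtime > log_time for f in fastqs):
        return False
    return success_marker in log.read_text()
```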
counts reads or fragments per gene, exon or any other feature using featureCounts
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
annotation | file path | absolute |  | 1 | feature annotation in GTF or SAF format | 0 | 0 |
input | file path | absolute |  | 1 | indexed BAM file that should be used for counting | 0 | 0 |
output | file path | absolute |  | 1 | path to the output file | 0 | 0 |
annotationType | enum | GTF, SAF |  | * | disables automatic format detection based on the file ending of the annotation file; valid values: GTF or SAF | 0 | 0 |
featureType | string |  | exon | * | feature type (e.g. exon or intron) that is used for counting in GTF mode | 0 | 0 |
groupType | string |  | gene_id | * | attribute that is used for summarization in GTF mode | 0 | 0 |
stranded | integer |  | 0 | * | strand-specific read counting; possible values: 0 (unstranded), 1 (stranded) and 2 (reversely stranded) | 0 | 0 |
threads | integer |  | 1 | * | number of threads used for counting | 0 | 0 |
disableGroupSummarization | boolean |  | false | * | flag that turns off summarization by groupType | 0 | 0 |
multiMapping | boolean |  | false | * | flag that enables counting of multi-mapped reads | 0 | 0 |
primary | boolean |  | true | * | when enabled, only alignments flagged as primary alignments are counted | 0 | 0 |
countFragments | boolean |  | false | * | counts fragments instead of reads; only for paired-end data | 0 | 0 |
multiCountMetaFeatures | boolean |  | false | * | allows a read to be counted for more than one meta-feature | 0 | 0 |
detailedReadAssignments | boolean |  | false | * | saves for each read whether it was assigned or not; filename: {input_file_name}.featureCounts; format: read name<TAB>status<TAB>feature name<TAB>number of counts for that read | 0 | 0 |
minOverlap | integer |  | 1 | * | minimum number of overlapping bases required to assign a read to a feature; negative values are also allowed | 2 | 2 |
minReadOverlap | integer |  | 1 | * | minimum number of overlapping bases required to assign a read to a feature; negative values are also allowed | 1 | 1 |
minFracOverlap | double |  | 0 | * | minimum fraction of overlapping bases in a read that is required to assign it to a meta-feature/feature; value between 0 and 1 | 2 | 2 |
readExtension5 | integer |  | 0 | * | number of bases by which reads are extended at the 5' end | 2 | 2 |
readExtension3 | integer |  | 0 | * | number of bases by which reads are extended at the 3' end | 2 | 2 |
fraction | boolean |  | false | * | count fractionally; only in combination with the --assignToAllOverlappingFeatures and/or --multiMapping flag(s) | 2 | 2 |
largestOverlap | boolean |  | false | * | assign reads to the meta-feature/feature that has the largest number of overlapping bases | 2 | 2 |
longReads | boolean |  | false | * | mode for long-read counting (e.g. Nanopore or PacBio) | 2 | 2 |
name | type | description | minV | maxV |
---|---|---|---|---|
FeatureCountSummaryFile | string | absolute file path to the summary file | 0 | 0 |
FeatureCountCountFile | string | absolute file path to the count file | 0 | 0 |
FeatureCounts (%SOFTWARE_VERSION%) was used to count reads/fragments per gene/exon/other feature according to the %annotation§N% annotation [Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30, 2014].
Pubmed references: 24227677,
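In SAF mode, featureCounts expects a tab-separated annotation with the columns GeneID, Chr, Start, End and Strand. The sketch below shows how GTF records could be converted into that format, mirroring the featureType and groupType parameters. `gtf_to_saf` is a hypothetical helper, and the attribute parsing assumes the usual `key "value";` GTF layout.

```python
import re

# GTF attribute pairs such as: gene_id "G1"; transcript_id "T1";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def gtf_to_saf(gtf_lines, feature_type="exon", group_attribute="gene_id"):
    """Convert GTF records of one feature type into SAF rows (header included)."""
    rows = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        if group_attribute in attributes:
            # GTF and SAF both use 1-based, inclusive coordinates
            rows.append("\t".join([attributes[group_attribute], fields[0],
                                   fields[3], fields[4], fields[6]]))
    return rows
```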
removes read pairs from sam/bam files created by bwa sampe
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inSamBam | file | file exists, ending .sam or .bam |  | 1 | path to mapped paired reads in SAM or BAM format (recognized by file ending) created by bwa sampe | 0 | 0 |
outSamBam | string |  |  | 1 | path for writing the remaining paired reads in SAM or BAM format (recognized by file ending) | 0 | 0 |
removeUnmapped | boolean |  | True | * | use this flag to remove pairs with at least one unmapped read | 0 | 0 |
removeImproperPairs | boolean |  | True | * | use this flag to remove pairs that are not properly paired according to bwa sampe | 0 | 0 |
removeMapqBelow | integer | >=0 | 20 | * | remove all read pairs with at least one mate whose mapping quality (taken from the "MAPQ" field of the SAM file) is smaller than this value; setting the option to 0 deactivates filtering based on mapping quality | 0 | 0 |
removeMoreThanOptimalHits | integer | >=0 | 1 | * | remove all read pairs in which at least one mate has more than this number of optimal alignment positions (based on the bwa aln-specific tag "X0"); setting the option to 0 deactivates filtering based on hit number | 0 | 0 |
isSingleEnd | boolean |  | False | * | use this flag to indicate that single-end data should be filtered | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
filteredPairs | file | path of the sam/bam file with the remaining read pairs (same value as given in parameter outSamBam) | 0 | 0 |
We removed read pairs with unmapped reads / improper pair classification / low mapping quality / multi-mappings (adjust to the options used).
Pubmed references:
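The filter criteria can be sketched for a single pair of SAM records. The sketch uses the FLAG bits 0x4 (segment unmapped) and 0x2 (properly aligned) from the SAM specification, the MAPQ column, and the bwa aln tag `X0` (number of best hits). The keyword arguments mirror removeUnmapped, removeImproperPairs, removeMapqBelow and removeMoreThanOptimalHits. `keep_pair` is a hypothetical name, and this is not the module's code.

```python
def keep_pair(mate1: str, mate2: str, remove_unmapped=True, remove_improper=True,
              min_mapq=20, max_optimal_hits=1) -> bool:
    """Return True if a read pair (two SAM lines) passes all enabled filters."""
    for record in (mate1, mate2):
        fields = record.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if remove_unmapped and flag & 0x4:  # this mate is unmapped
            return False
        if remove_improper and not flag & 0x2:  # pair not properly aligned
            return False
        if min_mapq > 0 and mapq < min_mapq:  # 0 disables the MAPQ filter
            return False
        if max_optimal_hits > 0:  # 0 disables the hit-number filter
            x0 = next((t for t in fields[11:] if t.startswith("X0:i:")), None)
            if x0 is not None and int(x0[5:]) > max_optimal_hits:
                return False
    return True
```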
Identifies open chromatin regions from BAM files using F-Seq. For this purpose, BAM files are first converted to BED input format for F-Seq using bedtools.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bam | file |  |  | 1 | BAM file | 0 | 0 |
name | string |  |  | 1 | sample name used in output files | 0 | 0 |
dir | file |  |  | 1 | output directory | 0 | 0 |
pathToFseq | string |  |  | 1 | path to the F-Seq JAR file | 0 | 0 |
mergeDist | integer |  | 0 | * | [optional] distance for merging | 0 | 0 |
heapSize | integer |  | -Xmx32000M | * | [optional] adjusts the Java heap size (JAVA_OPTS) | 0 | 0 |
Open chromatin regions were determined using F-Seq [Boyle AP, Guinney J, Crawford GE, Furey TS. F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008 Nov 1;24(21):2537-8].
Pubmed references: 18784119,
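The BAM-to-BED conversion step can be sketched with `bedtools bamtobed`, which writes BED records to standard output. The helper names and the output naming (`<name>.bed` inside `dir`) are assumptions. The subsequent F-Seq call is omitted because its options are not documented here.

```python
import subprocess
from pathlib import Path


def bamtobed_command(bam: str, bedtools: str = "bedtools") -> list:
    """Build the bedtools call that converts a BAM file to BED (written to stdout)."""
    return [bedtools, "bamtobed", "-i", bam]


def convert_bam_to_bed(bam: str, out_dir: str, name: str) -> Path:
    """Run the conversion and store the result as <out_dir>/<name>.bed (hypothetical naming)."""
    bed = Path(out_dir) / f"{name}.bed"
    with open(bed, "w") as handle:
        subprocess.run(bamtobed_command(bam), stdout=handle, check=True)
    return bed
```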
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
outputDir | file |  |  | 1 | path to the output folder | 0 | 0 |
bedgraphTable | file |  |  | 1 | path to the table with bedgraph paths | 0 | 0 |
bedfile | file |  |  | 1 | path to the BED file | 0 | 0 |
bins | integer |  |  | 1 | number of bins into which each region is divided | 0 | 0 |
fixedBinSizeUpstream | string |  |  | * | [optional] can be used to create fixed-size bins upstream; format: 'binsize:binnumber' | 0 | 0 |
fixedBinSizeDownstream | string |  |  | * | [optional] can be used to create fixed-size bins downstream; format: 'binsize:binnumber' | 0 | 0 |
factor | string |  |  | 0-null | [optional] generate files only for this factor | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
coverageFiles | string | path to coverage files | 0 | 0 |
bedname | string | name of bed file | 0 | 0 |
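The binning options can be illustrated with a small sketch. It parses the 'binsize:binnumber' format and divides a region into `bins` parts. The following interpretations are assumptions, not confirmed by the module:

- coordinates are 0-based and half-open;
- upstream bins end at the region start and downstream bins begin at the region end;
- strand is ignored.

All function names are hypothetical.

```python
def parse_fixed_bins(spec: str):
    """Parse a 'binsize:binnumber' string into two positive integers."""
    size_text, number_text = spec.split(":")
    size, number = int(size_text), int(number_text)
    if size <= 0 or number <= 0:
        raise ValueError("bin size and bin number must be positive")
    return size, number


def region_bins(start: int, end: int, bins: int):
    """Divide [start, end) into `bins` bins of (nearly) equal length."""
    edges = [start + round(i * (end - start) / bins) for i in range(bins + 1)]
    return list(zip(edges[:-1], edges[1:]))


def upstream_bins(start: int, spec: str):
    """Fixed-size bins directly upstream of `start`, ordered left to right."""
    size, number = parse_fixed_bins(spec)
    return [(start - (i + 1) * size, start - i * size) for i in reversed(range(number))]


def downstream_bins(end: int, spec: str):
    """Fixed-size bins directly downstream of `end`, ordered left to right."""
    size, number = parse_fixed_bins(spec)
    return [(end + i * size, end + (i + 1) * size) for i in range(number)]
```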
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bedgraphTable | file |  |  | 1 | path to the bedgraph table | 0 | 0 |
genelist | string |  |  | * | list of genes to consider | 0 | 0 |
experiment | string |  |  | * | type of experiment | 0 | 0 |
metaFrame | integer |  |  | 1 | frame to plot | 0 | 0 |
bins | integer |  |  | 1 | number of bins | 0 | 0 |
aggregateFUN | string |  |  | 1 | function used for aggregation | 0 | 0 |
normShapeSum | boolean |  |  | 1 | whether to apply shape-sum normalization | 0 | 0 |
normLibSize | boolean |  |  | 1 | whether to normalize by library size | 0 | 0 |
normBinLength | boolean |  |  | 1 | whether to normalize by bin length | 0 | 0 |
wilcox | boolean |  |  | * | whether a Wilcoxon test should be performed | 0 | 0 |
factor | string |  |  | 0-null | factor to consider | 0 | 0 |
coverageFiles | file |  |  | 1 | path to the coverage files | 0 | 0 |
bedname | string |  |  | 1 | name of the BED file | 0 | 0 |
plotname | string |  |  | * | name of the plot | 0 | 0 |
config | file |  |  | 1 | path to the config file | 0 | 0 |
clusterPositions | file |  |  | * | positions at which lines are drawn | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
generateMetagenePlotsOutputFolder | string | path to the folder containing the metagene plots | 0 | 0 |
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bam | file | file exists, BAM format, reads sorted by coordinates |  | 1 | path to the BAM file whose genome coverage should be analyzed | 0 | 0 |
genome | file | file exists, ending *.genome for an IGV genome file or ending *.chrom.sizes for a simple text file with chromosome sizes |  | * | genome file or file with chromosome sizes for the genome that was used to create the BAM file; required only if the tdf option is set | 0 | 0 |
outPrefix | string |  |  | 1 | file name prefix for saving the bedgraph file (outPrefix.bedgraph) and the TDF file (outPrefix.bedgraph.tdf) | 0 | 0 |
tdf | boolean |  | true | * | transform the bedgraph file into TDF format using igvtools | 0 | 0 |
bedtoolsPath | file | existing executable | bedtools (in PATH) | * | path to the bedtools executable; use if bedtools is not in PATH | 0 | 0 |
igvtoolsPath | file | existing executable | igvtools (in PATH) | * | path to the igvtools executable; use if igvtools is not in PATH | 0 | 0 |
We created files for visualizing mapped reads with bedtools and igvtools.
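The two steps can be sketched as command lines. `bedtools genomecov -ibam ... -bg` writes coverage in bedgraph format to standard output, and `igvtools toTDF <input> <output> <genome>` converts the bedgraph into TDF. The output names follow the outPrefix convention from the table. The helper names are hypothetical, and the module may pass additional options.

```python
import subprocess


def coverage_commands(bam, out_prefix, genome=None, tdf=True,
                      bedtools="bedtools", igvtools="igvtools"):
    """Return (command, stdout_file) pairs for the bedgraph and optional TDF step."""
    bedgraph = f"{out_prefix}.bedgraph"
    # genomecov -bg reports coverage as a bedgraph on stdout
    steps = [([bedtools, "genomecov", "-ibam", bam, "-bg"], bedgraph)]
    if tdf:
        if genome is None:
            raise ValueError("a genome or chrom.sizes file is required when tdf is set")
        steps.append(([igvtools, "toTDF", bedgraph, f"{bedgraph}.tdf", genome], None))
    return steps


def run_steps(steps):
    """Execute the steps in order, redirecting stdout to a file where requested."""
    for command, stdout_file in steps:
        if stdout_file is None:
            subprocess.run(command, check=True)
        else:
            with open(stdout_file, "w") as handle:
                subprocess.run(command, stdout=handle, check=True)
```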
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
outputDir | file |  |  | 1 | path to the output folder | 0 | 0 |
table | file |  |  | 1 | line entry of the bedgraph table | 0 | 0 |
for | string |  |  | 1 | type into which the table is split: coverage or metagenes | 0 | 0 |
factor | string |  |  | 0-null | [optional] generate files only for this factor | 0 | 0 |
extracts information from text files using the exact or regex-based search of grep
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
outputFile | file path | absolute |  | 1 | absolute path to the file to which the output of grep is written | 0 | 0 |
file | file path | absolute |  | 1 | absolute path to the file used as search input | 0 | 0 |
options | string |  |  | * | additional flags or parameters that are passed directly to grep | 0 | 0 |
pattern | string |  |  | 1 | pattern to search for; can also be a regex if parameter -P is set | 0 | 0 |
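For readers who want to reproduce the module's behaviour without grep, a rough Python analogue looks as follows. It is a sketch, not the module's implementation, and `grep_to_file` is a hypothetical name. Literal matching stands in for the exact search. Python's `re` syntax is close to, but not identical with, the Perl-compatible syntax grep uses with `-P`.

```python
import re


def grep_to_file(pattern, in_path, out_path, regex=False, ignore_case=False):
    """Write all lines of in_path that match pattern to out_path; return the match count."""
    flags = re.IGNORECASE if ignore_case else 0
    # Without regex=True the pattern is matched literally (similar to grep -F).
    matcher = re.compile(pattern if regex else re.escape(pattern), flags)
    matches = 0
    with open(in_path) as source, open(out_path, "w") as target:
        for line in source:
            if matcher.search(line):
                target.write(line)
                matches += 1
    return matches
```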
performs gene set enrichment analysis with GSEAPreranked
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
gseaJar | file |  |  | 1 | path to the GSEA JAR file | 0 | 0 |
label | string |  |  | 1 | name of the analysis, e.g. sample name | 0 | 0 |
outdir | string |  |  | 1 | directory to store the results of GSEA | 0 | 0 |
geneTab | file | file exists |  | 1 | tab-separated table of genes with expression values/changes | 0 | 0 |
hasHeader | boolean |  | False | * | indicates whether the first line of geneTab should be interpreted as a header | 0 | 0 |
geneCol | integer | >=0 | 0 | * | 0-based position of the column with gene names | 0 | 0 |
rankCol | integer | >=0 | 1 | * | 0-based position of the column with the values used to rank the genes, e.g. fold changes | 0 | 0 |
geneset | string | allowed values: go, hallmark, transcription_factor, oncogenic_signatures, immunologic_signatures | hallmark | * | gene sets to test for enrichment | 0 | 0 |
genesetVersion | string |  | 6.1 | * | version of MSigDB to use | 0 | 0 |
scoring | string | allowed values: weighted, unweighted | unweighted | * | unweighted: classic score based on ranks; weighted: score includes the values used for ranking | 0 | 0 |
plotNr | integer | >0 | 50 | * | create plots for the plotNr top-scoring gene sets | 0 | 0 |
We performed gene set enrichment analysis with GSEAPreranked.
Pubmed references: 16199517,
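GSEAPreranked reads a ranked gene list (`.rnk`): a tab-separated file with a gene name and a ranking value per line. The sketch below shows how geneTab could be turned into such a file using geneCol, rankCol and hasHeader. `write_rnk` is a hypothetical name, and sorting by descending value is an assumption made for readability.

```python
import csv


def write_rnk(gene_tab, rnk_path, gene_col=0, rank_col=1, has_header=False):
    """Write a two-column, tab-separated ranked list (gene, value), highest value first."""
    with open(gene_tab, newline="") as source:
        reader = csv.reader(source, delimiter="\t")
        if has_header:
            next(reader, None)  # skip the header line
        ranked = [(row[gene_col], float(row[rank_col])) for row in reader if row]
    ranked.sort(key=lambda item: item[1], reverse=True)
    with open(rnk_path, "w") as target:
        for gene, value in ranked:
            target.write(f"{gene}\t{value}\n")
    return len(ranked)
```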
extracts information on genes and exons from GTF files and stores it in CSV format
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
gtf | file | absolute |  | 1 | path to the GTF file | 0 | 0 |
output | file | absolute |  | 1 | path to the output file; for exons, the suffix '.exons' is added | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
geneInfoFile | string | absolute path to the resulting CSV file | 0 | 0 |
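A typical piece of gene information derived from a GTF file is the exon length of a gene. Here this means the total length of the union of its exons, so bases shared by overlapping transcripts are counted once. Whether the module computes exon lengths this way is an assumption. The sketch groups exons by `gene_id` and merges overlapping intervals in 1-based, inclusive GTF coordinates. `exon_lengths` is a hypothetical name.

```python
import re
from collections import defaultdict

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def exon_lengths(gtf_lines):
    """Length of the merged (non-overlapping) exons of each gene_id."""
    exons = defaultdict(list)
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        match = GENE_ID.search(fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    lengths = {}
    for gene, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:  # overlapping or adjacent: extend the block
                cur_end = max(cur_end, end)
            else:  # gap: close the current block
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene] = total
    return lengths
```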
matches detected variants (SNPs and indels) to genomic features of a GTF file.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
gtf | file |  |  | 1 | Path to the GTF file containing annotated genomic features. | 0 | 0 |
infile | file |  |  | 1 | Path to the file containing variants. | 0 | 0 |
out | file |  |  | 1 | Path to the output file in which the variants matched to features are stored. | 0 | 0 |
mode | string |  |  | 1 | Variant type that should be matched to the GTF file; one of SNP, INSERTION or DELETION (written in capital letters). | 0 | 0 |
gtfMatcher was used to match detected variants, including SNPs, deletions and insertions, to genomic features of a GTF file.
Pubmed references:
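The core of such a matching step is an interval-overlap query. The sketch assumes 1-based inclusive coordinates for both features and variants and treats a SNP as a one-base interval. How insertions and deletions are represented (for example, which positions a deletion spans) is left to the caller as an assumption. `build_index` and `overlapping` are hypothetical names, not gtfMatcher's API.

```python
def build_index(features):
    """Group (chrom, start, end, name) features by chromosome, sorted by start."""
    index = {}
    for chrom, start, end, name in sorted(features):
        index.setdefault(chrom, []).append((start, end, name))
    return index


def overlapping(index, chrom, start, end):
    """Names of all features on chrom that overlap the interval [start, end]."""
    hits = []
    for f_start, f_end, name in index.get(chrom, []):
        if f_start > end:
            break  # features are sorted by start; none further can overlap
        if f_end >= start:
            hits.append(name)
    return hits
```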
compresses and decompresses files using gzip; can verify file integrity using an MD5 checksum file
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
output | file path | absolute | ${input}.gz | * | path to the output file | 0 | 0 |
input | file path | absolute |  | 1 | path to the input file | 0 | 0 |
decompress | boolean |  | false | * | decompress the input file instead of compressing it | 0 | 0 |
verify | boolean |  | true | * | verify file integrity after decompression using the MD5 checksum file | 0 | 0 |
oldPathMd5 | file path | absolute |  | * | path where the file was stored when the MD5 checksum was created | 0 | 0 |
limitLines | integer | [1,] |  | * | extract only the first N lines | 0 | 0 |
delete | boolean |  | false | * | delete the file after compression was performed; enforces the integrity check | 0 | 0 |
md5 | file path | absolute |  | * | path to the MD5 checksum file used to verify file integrity after decompression | 0 | 0 |
quality | integer | [1,9] | 9 | * | compression quality ranging from 1 to 9; 9 being the slowest but best compression | 0 | 0 |
binaryName | enum | gzip, pigz | gzip | * | name of the gzip binary; possible values: 'gzip' or 'pigz' | 0 | 0 |
threads | integer | [1,128] | 1 | * | number of cores to use; only possible if 'pigz' is used as binary | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
processedGzipFile | string | path to the input file | 0 | 0 |
createdGzipFile | string | path to the output file | 0 | 0 |
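The compress/decompress-and-verify cycle can be sketched with Python's standard `gzip` and `hashlib` modules in place of the gzip/pigz binaries. The sketch assumes that the checksum file has the `md5sum` output layout (`<hash>  <filename>`). All function names are hypothetical, and the limitLines and delete options are not covered.

```python
import gzip
import hashlib
import shutil


def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def read_md5_file(path):
    """First token of an md5sum-style checksum file."""
    with open(path) as handle:
        return handle.read().split()[0]


def compress(src, dst=None, level=9):
    """Compress src to dst (default: src + '.gz') with the given level (1-9)."""
    dst = dst or f"{src}.gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def decompress_and_verify(src, dst, expected_md5):
    """Decompress src to dst and raise ValueError if the MD5 checksum differs."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    if md5sum(dst) != expected_md5:
        raise ValueError(f"MD5 mismatch for {dst}")
    return dst
```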
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
input | file |  |  | 1 | Path to the file containing the consistent SNPs | 0 | 0 |
reference | file |  |  | 1 | Path to the file containing reference SNPs | 0 | 0 |
output | file |  |  | 1 | Path to the output file containing the strain prediction | 0 | 0 |
config | file |  |  | 1 | Path to the config file that assigns reference samples to virus strains | 0 | 0 |
The identifyStrain module was used to predict the virus strain of a sample from its consistent SNPs.
Pubmed references:
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bam | file path | absolute |  | 1 | path to the BAM file | 0 | 0 |
link | boolean |  | true | * | creates a link called NAME.bam.bai because some tools expect the index under that name; use --nolink to disable it | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
BAMFile | string | path to the BAM file for which the index was created | 0 | 0 |
Samtools (%SOFTWARE_VERSION%) was used to index the BAM files [Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup, The Sequence alignment/map (SAM) format and SAMtools, Bioinformatics (2009) 25(16) 2078-9].
Pubmed references: 19505943,
This tool provides useful metrics for validating library construction, including the insert size distribution and read orientation of paired-end libraries. The expected proportions of these metrics vary depending on the type of library preparation used, resulting from technical differences between paired-end libraries and mate-pair libraries. For a brief primer on paired-end sequencing and mate-pair reads, see the GATK Dictionary. The CollectInsertSizeMetrics tool outputs the percentages of read pairs in each of the three orientations (FR, RF, and TANDEM) as a histogram. In addition, the insert size distribution is output as both a histogram (.insert_size_Histogram.pdf) and as a data table (.insert_size_metrics.txt). Note: Metrics labeled as percentages are actually expressed as fractions!
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
Histogram_FILE | file |  |  | 1 | File to write the insert size histogram chart to. Required. | 0 | 0 |
INPUT | file |  |  | 1 | Input SAM/BAM/CRAM file. Required. | 0 | 0 |
OUTPUT | file |  |  | 1 | The file to write the output to. Required. | 0 | 0 |
arguments_file | file |  |  | 0-null | [optional] read one or more arguments files and add them to the command line. This argument may be specified 0 or more times. Default value: null. | 0 | 0 |
COMPRESSION_LEVEL | integer |  |  | * | [optional] Compression level for all compressed files created (e.g. BAM and VCF). Default value: 5. | 0 | 0 |
DEVIATIONS | double |  |  | * | [optional] Generate mean, sd and plots by trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and sd grossly misleading regarding the real distribution. Default value: 10.0. | 0 | 0 |
GA4GH_CLIENT_SECRETS | string |  |  | * | [optional] Google Genomics API client_secrets.json file path. Default value: client_secrets.json. | 0 | 0 |
HISTOGRAM_WIDTHW | integer |  |  | * | [optional] Explicitly sets the histogram width, overriding automatic truncation of the histogram tail. Default value: null. | 0 | 0 |
MAX_RECORDS_IN_RAM | integer |  |  | * | [optional] When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed. Default value: 500000. | 0 | 0 |
METRIC_ACCUMULATION_LEVEL | string |  |  | * | [optional] The level(s) at which to accumulate metrics. This argument may be specified 0 or more times. Default value: [ALL_READS]. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} | 0 | 0 |
MIN_HISTOGRAM_WIDTH | integer |  |  | * | [optional] Minimum width of histogram plots. In the case when the histogram would otherwise be truncated to a shorter range of sizes, the MIN_HISTOGRAM_WIDTH will enforce a minimum range. Default value: null. | 0 | 0 |
MINIMUM_PCT | double |  |  | * | [optional] When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this percentage of overall reads. (Range: 0 to 1). Default value: 0.05. | 0 | 0 |
REFERENCE_SEQUENCE | file |  |  | * | [optional] Reference sequence file. Default value: null. | 0 | 0 |
STOP_AFTER | integer |  |  | * | [optional] Stop after processing N reads, mainly for debugging. Default value: 0. | 0 | 0 |
TMP_DIR | file |  |  | 0-null | [optional] One or more directories with space available to be used by this program for temporary storage of working files. This argument may be specified 0 or more times. Default value: null. | 0 | 0 |
VALIDATION_STRINGENCY | string |  | STRICT | * | [optional] Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. Possible values: {STRICT, LENIENT, SILENT} | 0 | 0 |
VERBOSITY | string |  |  | * | [optional] Control verbosity of logging. Default value: INFO. Possible values: {ERROR, WARNING, INFO, DEBUG} | 0 | 0 |
ASSUME_SORTED | boolean |  | true | * | [optional] If true (default), then the sort order in the header file will be ignored. Default value: true. Possible values: {true, false} | 0 | 0 |
CREATE_INDEX | boolean |  | false | * | [optional] Whether to create an index when writing VCF or coordinate sorted BAM output. Default value: false. Possible values: {true, false} | 0 | 0 |
CREATE_MD5_FILE | boolean |  | false | * | [optional] Whether to create an MD5 digest for any BAM or FASTQ files created. Default value: false. Possible values: {true, false} | 0 | 0 |
INCLUDE_DUPLICATES | boolean |  | false | * | [optional] If true, also include reads marked as duplicates in the insert size histogram. Default value: false. Possible values: {true, false} | 0 | 0 |
QUIET | boolean |  | false | * | [optional] Whether to suppress job-summary info on System.err. Default value: false. Possible values: {true, false} | 0 | 0 |
USE_JDK_DEFLATER | boolean |  | false | * | [optional] Use the JDK Deflater instead of the Intel Deflater for writing compressed output. Default value: false. Possible values: {true, false} | 0 | 0 |
USE_JDK_INFLATER | boolean |  | false | * | [optional] Use the JDK Inflater instead of the Intel Inflater for reading compressed input. Default value: false. Possible values: {true, false} | 0 | 0 |
version | boolean |  | false | * | [optional] display the version number for this tool | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
outputHistogramFile | string | output file containing the histogram of insert sizes | 0 | 0 |
outputBamFile | string | txt file containing insert size metrics | 0 | 0 |
Insert size metrics were calculated with the Picard library (%SOFTWARE_VERSION%).
Pubmed references:
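The trimming rule from the DEVIATIONS description can be worked through in a few lines. Mean and standard deviation are computed only on insert sizes up to MEDIAN + DEVIATIONS × MAD, where MAD is the median absolute deviation. The sketch follows the parameter description (upper cutoff only). Picard's own implementation may differ in details such as histogram binning, and `trimmed_insert_stats` is a hypothetical name.

```python
import statistics


def trimmed_insert_stats(sizes, deviations=10.0):
    """Return (cutoff, mean, sd) after trimming sizes above MEDIAN + DEVIATIONS * MAD."""
    median = statistics.median(sizes)
    mad = statistics.median(abs(s - median) for s in sizes)
    cutoff = median + deviations * mad
    kept = [s for s in sizes if s <= cutoff]  # drop chimeras and other outliers
    return cutoff, statistics.mean(kept), statistics.stdev(kept)
```

For the insert sizes 100, 110, 120, 130, 140 and 5000, the median is 125 and the MAD is 15. With the default of 10.0 the cutoff is therefore 275, the value 5000 is excluded, and the mean of the remaining sizes is 120.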
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
input | string |  |  | 1- | multiple input files (or input folders) in the order in which they should be joined; in pattern mode (--pattern), folder path(s) are expected | 0 | 0 |
output | file path | absolute |  | 1 | path to the output file | 0 | 0 |
convertPairedEnd | boolean |  | false | * | special flag for joining FASTQ files; adds /1 and /2 at the end of read names if CASAVA format 1.8 or later is used; default: disabled | 0 | 0 |
pattern | string |  |  | 0-null | one or more Unix file patterns (e.g. *.txt) used to find matching files; one pattern corresponds to one input folder path; the order of the files to join cannot be influenced | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
joinedFile | string | absolute file path to the joined file | 0 | 0 |
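The effect of convertPairedEnd can be sketched as follows. CASAVA 1.8+ headers carry the mate number at the start of the comment field (`@name 1:N:0:ATCACG`). The sketch appends it to the read name as `/1` or `/2` and concatenates the files in the given order. Taking the mate number from the comment field is an assumption about how the module decides between /1 and /2, and both function names are hypothetical.

```python
def add_mate_suffix(header: str) -> str:
    """'@name 1:N:0:ATCACG' -> '@name/1 1:N:0:ATCACG'; other headers stay unchanged."""
    name, sep, comment = header.rstrip("\n").partition(" ")
    mate = comment.split(":", 1)[0]
    if mate in ("1", "2") and not name.endswith(("/1", "/2")):
        name = f"{name}/{mate}"
    return f"{name}{sep}{comment}"


def join_fastq(paths, out_path, convert_paired_end=False):
    """Concatenate FASTQ files in the given order, optionally rewriting read names."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for i, line in enumerate(handle):
                    # every 4th line (starting at 0) is a FASTQ header
                    if convert_paired_end and i % 4 == 0:
                        line = add_mate_suffix(line) + "\n"
                    out.write(line)
```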
LEON is a reference-free method to compress high-throughput sequencing data
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
input | file path | absolute |  | 1 | absolute path to the input file; supported file formats: compress: *.fastq or *.fq; decompress: *.leon.tar | 0 | 0 |
threads | integer |  | 1 | * | number of cores to use | 0 | 0 |
kmerSize | integer |  | 31 | * | k-mer size used for compression | 0 | 0 |
outputFolder | folder path | absolute |  | 1 | path to the folder in which the compressed or decompressed file is stored; the resulting file has a *.leon.tar or *.fastq ending | 0 | 0 |
workingDir | folder path | absolute | /usr/local/storage/ | * | path to the working directory | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
createdFile | string | path to the compressed or decompressed file | 0 | 0 |
Sequencing data was (de-)compressed using LEON (%SOFTWARE_VERSION%) [G. Benoit, C. Lemaitre, D. Lavenier, E. Drezen, T. Dayris, R. Uricaru, G. Rizk. (2015) Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinformatics, 2015, 16:288].
Pubmed references: 26370285,
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
folder | folder | absolute |  | 1- | one or more input folders; one for each pattern | 0 | 0 |
output | file path | absolute |  | * | write results to a file; one line per found file | 0 | 0 |
sep | string |  | , | * | separator between entries | 0 | 0 |
maxdepth | integer |  | 0 | * | descend at most n levels of folders | 0 | 0 |
pattern | string |  |  | 1- | one or more Unix file patterns (e.g. *.txt) used to find matching files; one pattern corresponds to one input folder path | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
foundFiles | string | found files joined with the separator | 0 | 0 |
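The search can be sketched with `os.walk` and `fnmatch`, pairing one pattern with each folder and joining the results with the separator. The depth semantics are an assumption: maxdepth=0 searches only the folder itself, 1 also searches its direct subfolders, and None imposes no limit. `find_files` is a hypothetical name.

```python
import fnmatch
import os


def find_files(folders, patterns, maxdepth=0, sep=","):
    """Find files matching one pattern per folder and join their paths with sep."""
    if len(folders) != len(patterns):
        raise ValueError("exactly one pattern per folder is expected")
    found = []
    for folder, pattern in zip(folders, patterns):
        for root, dirs, files in os.walk(folder):
            rel = os.path.relpath(root, folder)
            depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
            if maxdepth is not None and depth >= maxdepth:
                dirs[:] = []  # prune: do not descend any further
            dirs.sort()  # deterministic traversal order
            found.extend(os.path.join(root, name)
                         for name in sorted(files) if fnmatch.fnmatch(name, pattern))
    return sep.join(found)
```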
summarizes read counts remaining after different analysis steps of sequencing data
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
basicStatsSummary | file | file exists, output of mergeStatistics module |  | * | Output of the Watchdog module mergeStatistics applied to the Basic Statistics reported by FastQC (tab-separated table, column 0: type of count, column 1: read count, column 2: file name) | 0 | 0 |
rawRegex | string | valid regular expression in Python re |  | * | regular expression with one group that extracts the sample name from the name of a fastq file with untrimmed reads | 0 | 0 |
trimRegex | string | valid regular expression in Python re |  | * | regular expression with one group that extracts the sample name from the name of a fastq file with trimmed reads | 0 | 0 |
idxstatsSummary | file | file exists, output of mergeStatistics module |  | * | Output of the Watchdog module mergeStatistics applied to the Idxstatistics reported by the bamstats module (tab-separated table, column 0: chromosome, column 2: read count, column 4: file name) | 0 | 0 |
bamRegex | string | valid regular expression in Python re |  | * | regular expression with one group that extracts the sample name from the name of a BAM file with mapped reads | 0 | 0 |
chromosomeGroupingTable | file |  |  | * | tab-separated table with a header, with chromosome names in column 0 and groups in column 1 | 0 | 0 |
countTable | string |  |  | 1 | path for writing a table with all extracted read counts | 0 | 0 |
countPlot | string |  |  | * | path for saving a summary plot of total, trimmed and mapped reads; the format is identified by the file ending, and all formats supported by pyplot are allowed | 0 | 0 |
groupPlot | string |  |  | * | path for saving a summary plot of the fraction of mapped reads for given groups of chromosomes; the format is identified by the file ending, and all formats supported by pyplot are allowed | 0 | 0 |
We created figures summarizing the number of reads in our sequencing experiments before and after adapter removal and mapping.
Pubmed references:
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
infile | file |  |  | 2- | input BAM file(s) | 0 | 0 |
outfile | file |  |  | 1 | output BAM file | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
mergedBamFile | string | output bam file (= value for parameter outfile) | 0 | 0 |
BAM files were merged using samtools [Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9].
Pubmed references: 19505943,
combines the output of multiple featureCounts runs in one CSV file
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
searchFolder | file | absolute |  | 1 | path to the folder in which the *.counts files are located | 0 | 0 |
output | file | absolute |  | 1 | path to the output file | 0 | 0 |
statsFolder | file | absolute |  | * | path to the merged statistics folder required for plotting | 0 | 0 |
featureAnnotation | file | absolute |  | * | annotation file that is joined with the count file | 0 | 0 |
featureAnnotationID | string |  | Geneid | * | name of the column that is used for joining | 0 | 0 |
featureAnnotationType | string |  | type | * | name of the column in the annotation file for which a distribution plot is created | 0 | 0 |
featureAnnotationExonLength | string |  | exon_length | * | name of the column that contains the exon length of the features | 0 | 0 |
noPlotting | boolean |  | false | * | disables the execution of R scripts | 0 | 0 |
prefixNames | boolean |  | false | * | prefixes the names of the features with continuous numbers | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
mergedCountFile | string | absolute path to the merged count file in CSV format | 0 | 0 |
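The merge can be pictured as a column-wise join of the per-sample featureCounts tables on Geneid. The Python sketch below assumes the standard featureCounts output layout (an optional `#` comment line, a header starting with `Geneid`, the count in the last column) and fills genes that are missing from a sample with 0. It leaves out the annotation join and the plotting, and the `<number>_<gene>` format used for prefixNames is a hypothetical choice, not the module's documented format.

```python
import csv
import io


def parse_featurecounts(text):
    """Return (sample_name, {gene_id: count}) for one featureCounts output table."""
    counts, sample = {}, None
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the '# Program:featureCounts' comment
        if row[0] == "Geneid":
            sample = row[-1]  # the last header column names the BAM file/sample
            continue
        counts[row[0]] = int(row[-1])  # the count is in the last column
    return sample, counts


def merge_counts(texts, prefix_names=False):
    """Merge featureCounts tables into CSV text with one count column per sample."""
    per_sample = dict(parse_featurecounts(text) for text in texts)
    samples = list(per_sample)
    genes = sorted({gene for counts in per_sample.values() for gene in counts})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Geneid"] + samples)
    for number, gene in enumerate(genes, start=1):
        # hypothetical '<number>_<gene>' format for the prefixNames option
        name = f"{number}_{gene}" if prefix_names else gene
        writer.writerow([name] + [per_sample[s].get(gene, 0) for s in samples])
    return out.getvalue()
```

Sorting gene IDs keeps the output deterministic regardless of the order in which the *.counts files are found.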
takes a folder containing BAM statistics generated by the bamstats module and generates table-formatted files
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
type | string |  |  | 1 | type of the statistic merger that should be called; allowed values: FastQC, Star, BamstatsMerger, CutadaptMerger, FeatureCounts, FlagstatMerger | 0 | 0 |
inputDir | folder path | absolute |  | 1 | path to input folder | 0 | 0 |
outputDir | folder path | absolute |  | 1 | path to output folder | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
mergedFile | string | absolute path to the merged file | 0 | 0 |
mergedType | string | type of the merger (parameter: type) | 0 | 0 |
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
outputDir | file |  |  | 1 | path to output folder | 0 | 0 |
gtf | file |  |  | 1 | path to GTF file | 0 | 0 |
bam | file |  |  | 1 | path to BAM file | 0 | 0 |
promStart | integer |  |  | * | start position of promoter window | 0 | 0 |
promEnd | integer |  |  | * | end position of promoter window | 0 | 0 |
bodyStart | integer |  |  | * | start position of body window | 0 | 0 |
bodyLength | integer |  |  | * | end position of body window | 0 | 0 |
genelist | string |  |  | 1 | list of genes to consider | 0 | 0 |
tss | file |  |  | 1 | path to TSS file | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
pausingindices | string | directory in which the pausing indices are computed | 0 | 0 |
analyzes strand cross-correlation in mapped reads from ChIP-seq experiments
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inBam | string | valid file path, BAM format, ending *.bam |  | 1 | Path to the BAM file with mapped ChIP-seq reads. An index of the BAM file is not required. | 0 | 0 |
outPrefix | string |  |  | 1 | Common prefix of all output files. The module produces 3 files: outPrefix.txt (summary file), outPrefix.pdf (cross-correlation plot) and outPrefix.Rdata (R session of the analysis). | 0 | 0 |
sppPath | string | valid file path to the script run_spp.R |  | 1 | Path to the executable (R script) of phantompeakqualtools, which is usually called run_spp.R | 0 | 0 |
rscriptPath | string | valid file path, executable | Rscript in PATH variable | * | Path to the Rscript executable if it is not in the PATH variable | 0 | 0 |
tmpdir | string | path to existing folder | return value of the tempdir() function of R | * | Folder for writing temporary files. The tool copies the whole bam file to this location. All temporary files are extended with a random suffix. | 0 | 0 |
threads | integer | >=1 | 1 | * | Number of threads used for the calculations | 0 | 0 |
Phantompeakqualtools was used to perform quality control of the mapped ChIP-seq reads.
Pubmed references: 22955991,
collects single amss files and creates annotation files for featureCounts and DEXSeq
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
indir | file |  |  | 1 | input directory | 0 | 0 |
annot | file |  |  | 1 | name of the annotation file to write to | 0 | 0 |
annot_fc | file |  |  | 1 | annotation file to write to for featureCounts | 0 | 0 |
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
controlCondition | string |  |  | 1 | name of the control condition | 0 | 0 |
testCondition | string |  |  | 1 | name of the test condition | 0 | 0 |
sampleAnnotation | string |  |  | 1 | path to the sample annotation file with conditions | 0 | 0 |
out | file |  |  | 1 | output directory | 0 | 0 |
Calculates readthrough and readin values and optionally downstream FPKM and expression in dOCR regions
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
annotation | file |  |  | 1 | annotation file path | 0 | 0 |
genecounts | file |  |  | 1 | gene read count file | 0 | 0 |
input | file |  |  | 1 | input file | 0 | 0 |
output | file |  |  | 1 | output file | 0 | 0 |
readthroughLength | integer |  | 5000 | * | [optional] length of the downstream window in which read-through is calculated | 0 | 0 |
readinLength | integer |  | 5000 | * | [optional] length of the upstream window in which read-in is calculated | 0 | 0 |
strandedness | integer |  | 0 | * | strandedness: 0 = not strand-specific, 1 = first read indicates strand, 2 = second read indicates strand | 0 | 0 |
overlap | integer |  | 25 | * | [optional] minimum overlap of a read with the read-through/read-in window for the read to be counted | 0 | 0 |
idxstats | file |  |  | * | [optional] idxstats file with numbers of mapped reads per chromosome; necessary for calculating downstream FPKM and transcription in dOCR regions | 0 | 0 |
normFactor | string |  |  | * | [optional] factor for normalizing by mapped reads and gene length for downstream FPKM calculation | 0 | 0 |
exclude | string |  |  | * | [optional] chromosomes to exclude from calculating total mapped reads, separated by ',' | 0 | 0 |
excludeType | string |  |  | * | [optional] gene types to exclude when determining genes with no other genes up- or downstream, separated by ',' | 0 | 0 |
dOCRFile | string |  |  | * | [optional] file containing dOCR lengths | 0 | 0 |
windowLength | integer |  | 1000 | * | [optional] number of steps for evaluating transcription on dOCRs | 0 | 0 |
Read-through was calculated as previously described in Hennig T et al., 2018, PLOS Pathogens 14(3): e1006954.
Pubmed references: 29579120,
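A minimal sketch of the read-through arithmetic, assuming read-through is expressed as the FPKM of the downstream window (readthroughLength, 5000 bp by default) relative to the gene FPKM, in percent, and that total mapped reads are taken from the third column of `samtools idxstats` output after dropping the chromosomes given in exclude. The function names are hypothetical, and the module's own normalization (normFactor) may differ from this plain FPKM.

```python
def total_mapped_reads(idxstats_text, exclude=()):
    """Sum column 3 (mapped reads) of `samtools idxstats` output, skipping excluded chromosomes."""
    total = 0
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name not in exclude:
            total += int(mapped)
    return total


def fpkm(reads, length_bp, total_reads):
    """Reads per kilobase of region per million mapped reads."""
    return reads * 1e9 / (length_bp * total_reads)


def read_through_percent(downstream_reads, gene_reads, gene_length_bp,
                         total_reads, window_bp=5000):
    """Assumed definition: 100 * FPKM(downstream window) / FPKM(gene)."""
    gene = fpkm(gene_reads, gene_length_bp, total_reads)
    if gene == 0:
        return float("nan")  # undefined for genes without reads
    return 100.0 * fpkm(downstream_reads, window_bp, total_reads) / gene
```

The exclude value would be split first, e.g. `set("chrM,chrY".split(","))`. Read-in follows the same pattern with the upstream window (readinLength).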
calculates readout for every sample in a project from recount2.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
projectID | string | mutually exclusive with projectFile |  | * | ID of an SRA project indexed in recount2; several project IDs can be passed, separated by ',' | 0 | 0 |
projectFile | file | file exists, mutually exclusive with projectID |  | * | file with one line giving project IDs (file content = all allowed values for projectID) | 0 | 0 |
geneTSV | file | file exists |  | 1 | tab-separated file with genes, coordinates, exonic base pairs and upstream and downstream regions (requires a line with column names chr, geneid, exonic_bps, upstream_start, upstream_end, downstream_start and downstream_end) | 0 | 0 |
outfolder | folder |  |  | 1 | folder for saving final results; creates a subfolder for the project with a table of coverage values for every sample in the project | 0 | 0 |
tmpfolder | folder |  |  | 1 | folder for saving temporary data; creates a subfolder for the project (named projectID) | 0 | 0 |
Rscript | file | executable |  | * | path to Rscript executable (preferably version 5.3) | 0 | 0 |
removeTmpSampleData | boolean |  | true | * | if this flag is set, temporary files for samples are deleted at the end (default behaviour) | 0 | 0 |
removeTmpProjectData | boolean |  | true | * | if this flag is set, temporary files for projects are deleted at the end (default behaviour) | 0 | 0 |
threads | integer | >=1 | 1 | * | number of threads to use, equivalent to the number of samples processed in parallel | 0 | 0 |
downloadParallel | boolean |  | false | * | if this flag is set, bigWig files are downloaded in parallel (default: not set) | 0 | 0 |
localRecountFolder | folder | absolute |  | * | folder that can contain locally processed or already downloaded recount data; structure: projectID/rse_gene.Rdata and projectID/bw/sampleID.bw | 0 | 0 |
Normalized read counts for genes, upstream regions and downstream regions were calculated from the bigWig files provided by the recount2 project.
Pubmed references: 28398307,
removes linearly mappable reads from a circRNA prediction.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
mapping | file | SAM or BAM format | 1 | path to a SAM or BAM file with mapped reads from the sample for which circRNAs were predicted (file ending is used to decide if it is SAM or BAM format) | 0 | 0 | |
circRNAPrediction | file | file exists | 1 | predicted circRNAs from the CIRI2, circRNAfinder or the circCombination module (tab-separated, 5 columns: chromosome, start, end, strand, list of reads) | 0 | 0 | |
circOut | file |  |  | 1 | all circRNAs from the input file with at least minReads remaining circular reads after removing all linearly mappable reads from the lists of circular junction reads | 0 | 0 |
minReads | integer | >=1 | 2 | * | Minimum number of predicted junction reads required for writing a circRNA to the output file; default: 2 | 0 | 0 |
paired | string | 'yes' or 'no' | yes | 1 | indicates if SAM or BAM input file contains paired-end (yes) or single-end (no) data | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
filteredCircs | file | path to circRNA predictions with the filtered lists of circular reads (same as input parameter circOut) | 0 | 0 |
We filtered the predicted circular reads by removing those reads that can be mapped elsewhere in a linear way.
Pubmed references:
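The filtering described above amounts to a set difference per circRNA followed by the minReads threshold. In this sketch, the read list in column 5 is assumed to be comma-separated and the set of linearly mappable read IDs is assumed to have been collected from the mapping file beforehand; both are assumptions, not the module's documented internals.

```python
def filter_circrnas(predictions, linear_read_ids, min_reads=2):
    """Remove linearly mappable reads from each circRNA and apply the minReads cutoff.

    predictions: iterable of (chrom, start, end, strand, reads) tuples, where
    reads is the read list from column 5 (comma separation is an assumption).
    """
    kept = []
    for chrom, start, end, strand, reads in predictions:
        remaining = [r for r in reads.split(",") if r and r not in linear_read_ids]
        if len(remaining) >= min_reads:
            kept.append((chrom, start, end, strand, ",".join(remaining)))
    return kept
```

Using a set for the linear read IDs keeps each membership test constant-time, which matters when junction lists are checked against millions of mapped reads.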
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
in1 | file | file exists, allowed file endings: fastq, fq, fq.gz | 1 | first (gzipped) fastQ file with the sequenced reads | 0 | 0 | |
in2 | file | file exists, allowed file endings: fastq, fq, fq.gz | * | second (gzipped) fastQ file with the sequenced reads (for paired-end data only) | 0 | 0 | |
rrnaIndex | string | filename prefix for a bwa index |  | 1 | Common prefix of bwa index files for the rRNA sequence | 0 | 0 |
out1 | string | file path with file ending fastq, fq, fa or fq.gz | 1 | file for writing non-rRNA reads from in1 in fasta or (gzipped) fastq format | 0 | 0 | |
out2 | string | file path with file ending fastq, fq, fa or fq.gz | * | file for writing non-rRNA reads from in2 (for paired-end data) in fasta or (gzipped) fastq format | 0 | 0 | |
sam | string | file path with file ending sam | 1 | sam file for writing rRNA reads from in1 and in2 | 0 | 0 | |
workdir | folder | folder exists | os.getcwd() | * | path to directory for writing large temporary files (content is deleted at the end of execution), default: current directory | 0 | 0 |
keepTmp | boolean |  | False | * | option to keep temporary files | 0 | 0 |
maxEditDistance | integer | >=0 | infinity | * | maximum allowed edit distance for a read alignment against rRNA | 0 | 0 |
maxMismatches | integer | >=0 | infinity | * | maximum allowed number of mismatches for a read alignment against rRNA | 0 | 0 |
maxIndels | integer | >=0 | infinity | * | maximum allowed number of indels for a read alignment against rRNA | 0 | 0 |
pairFiltering | integer | 1 or 2 | 2 | * | Number of reads of a pair required to fulfil the options above (maxEditDistance, maxMismatches, maxIndels) | 0 | 0 |
bwaPath | executable |  | bwa | * | path to bwa executable | 0 | 0 |
seedSize | integer | >=1 | 25 | * | size of initial seed for bwa (-k option of bwa) | 0 | 0 |
threads | integer | >=1 | 1 | * | number of threads to use for bwa (-t option of bwa) | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
rrnaSAMFile | string | path to rRNA reads in SAM format (same value as given by the sam parameter) | 0 | 0 |
filteredFQ1 | string | path to non rRNA reads in FASTQ format (same value as given by the out1 parameter) | 0 | 0 |
filteredFQ2 | string | path to non rRNA reads in FASTQ format (same value as given by the out2 parameter), for single-end data the value of the return variable is set to "not_defined_for_single_end" | 0 | 0 |
Before mapping the reads to the reference genome, we removed reads originating from rRNAs.
Pubmed references:
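How the alignment thresholds and pairFiltering could interact is sketched below. It assumes the edit distance is taken from the NM tag and the indels from the I/D operations of the CIGAR string; since NM counts mismatches plus inserted and deleted bases, mismatches = NM minus the indel bases. Counting indel events rather than indel bases for maxIndels is also an assumption, and all function names are hypothetical.

```python
import re


def alignment_stats(cigar, nm):
    """Return (edit distance, mismatches, indel events) for one alignment."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    indel_bases = sum(int(length) for length, op in ops if op in "ID")
    indel_events = sum(1 for _length, op in ops if op in "ID")
    return nm, nm - indel_bases, indel_events


def passes(stats, max_edit=float("inf"), max_mismatches=float("inf"),
           max_indels=float("inf")):
    """Check maxEditDistance, maxMismatches and maxIndels (default: unlimited)."""
    edit, mismatches, indels = stats
    return edit <= max_edit and mismatches <= max_mismatches and indels <= max_indels


def is_rrna_pair(mate_passes, pair_filtering=2):
    """A pair counts as rRNA if at least pair_filtering mates pass the thresholds."""
    return sum(mate_passes) >= pair_filtering
```

With the default pairFiltering of 2, both mates must align acceptably to rRNA before the pair is moved to the SAM file.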
converts SAM files into compressed BAM format using samtools sort
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
sam | file path | absolute |  | 1 | path to SAM file that should be compressed | 0 | 0 |
bam | file path | absolute |  | 1 | path to output BAM file | 0 | 0 |
threads | integer |  | 1 | * | number of threads to use for compression | 0 | 0 |
quality | integer | [1, 9] | 9 | * | compression level; 1 is the worst/fastest and 9 is the best/slowest compression | 0 | 0 |
memory | string |  | 768M | * | maximal memory that can be used per thread; this is only an estimate and might be exceeded | 0 | 0 |
tmpFolder | folder path | absolute |  | * | write temporary files to that folder | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
BAMFile | string | absolute path to the resulting BAM file | 0 | 0 |
Samtools (%SOFTWARE_VERSION%) was used to convert SAM to BAM files [Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup, The Sequence alignment/map (SAM) format and SAMtools, Bioinformatics (2009) 25(16) 2078-9].
Pubmed references: 19505943,
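The parameters map roughly onto `samtools sort` options: `-@` threads, `-l` compression level (quality), `-m` memory per thread, `-T` prefix for temporary files (tmpFolder) and `-o` output file. The helper below only builds the argument list; its name and the choice of naming the temporary prefix after the output file are assumptions.

```python
import os


def samtools_sort_command(sam, bam, threads=1, quality=9, memory="768M",
                          tmp_folder=None, samtools="samtools"):
    """Build a `samtools sort` call that writes a compressed, sorted BAM file."""
    if not 1 <= quality <= 9:
        raise ValueError("quality (compression level) must be in [1, 9]")
    cmd = [samtools, "sort",
           "-@", str(threads),   # sorting/compression threads
           "-l", str(quality),   # BAM compression level
           "-m", memory,         # approximate memory per thread
           "-o", bam]
    if tmp_folder:
        # samtools writes temporary files as <prefix>.NNNN.bam; the prefix name is an assumption
        cmd += ["-T", os.path.join(tmp_folder, os.path.basename(bam) + ".tmp")]
    cmd.append(sam)
    return cmd
```

The list can be executed with `subprocess.run(cmd, check=True)`, which raises if samtools exits with a non-zero status.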
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bamoutput | boolean |  | false | * | output BAM | 0 | 0 |
cramoutput | boolean |  | false | * | output CRAM (requires reference sequence) | 0 | 0 |
fastCompression | boolean |  | false | * | use fast BAM compression (implies bamoutput) | 0 | 0 |
uncompressedBam | boolean |  | false | * | uncompressed BAM output (implies bamoutput) | 0 | 0 |
includeHeader | boolean |  | false | * | include header in SAM output | 0 | 0 |
printOnlyHeader | boolean |  | false | * | print SAM header only (no alignments) | 0 | 0 |
printCounts | boolean |  | false | * | print only the count of matching records | 0 | 0 |
output | file |  | stdout | * | output file name | 0 | 0 |
outputReadsNotSelected | file |  |  | * | output reads not selected by filters to FILE | 0 | 0 |
referenceLengths | file |  |  | * | FILE listing reference names and lengths (see long help) | 0 | 0 |
bedfile | file |  |  | * | only include reads overlapping this BED FILE | 0 | 0 |
readgroup | string |  |  | * | only include reads in read group STR | 0 | 0 |
readgroupFile | file |  |  | * | only include reads with read group listed in FILE | 0 | 0 |
mappingquality | integer |  | 0 | * | only include reads with mapping quality at least INT | 0 | 0 |
library | string |  |  | * | only include reads in library STR | 0 | 0 |
minquerylength | integer |  |  | * | only include reads with number of CIGAR operations consuming query sequence at least INT | 0 | 0 |
bitsset | integer |  | 0 | * | only include reads with all bits set in INT set in FLAG | 0 | 0 |
bitsnotset | integer |  | 0 | * | only include reads with none of the bits set in INT set in FLAG | 0 | 0 |
readTagToStrip | string |  |  | * | read tag to strip (repeatable) | 0 | 0 |
collapseCIGAROperation | string |  |  | * | collapse the backward CIGAR operation | 0 | 0 |
seed | double |  | 0 | * | integer part sets seed of random number generator, rest sets fraction of templates to subsample | 0 | 0 |
threads | string |  |  | * | number of BAM/CRAM compression threads | 0 | 0 |
printLongHelp | string |  |  | * | print long help, including note about region specification | 0 | 0 |
inputfmtoption | string |  |  | * | Specify a single input file format option in the form of OPTION or OPTION=VALUE | 0 | 0 |
outputfmt | string |  |  | * | Specify output format (SAM, BAM, CRAM) | 0 | 0 |
outputfmtoption | string |  |  | * | Specify a single output file format option in the form of OPTION or OPTION=VALUE | 0 | 0 |
reference | string |  |  | * | Reference sequence FASTA FILE | 0 | 0 |
inbam | file |  |  | * | input bam file | 0 | 0 |
insam | file |  |  | * | input sam file | 0 | 0 |
incram | file |  |  | * | input cram file | 0 | 0 |
region | string |  |  | * | region selected | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
outputFile | string | output file (= value for parameter output) | 0 | 0 |
Samtools was used to convert BAM/SAM/CRAM to BAM/SAM/CRAM [Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9]
Pubmed references: 19505943,
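Two of the parameters above pack their meaning into a single number. bitsset/bitsnotset correspond to `samtools view -f`/`-F`: a read is kept only if all bits of bitsset are set in its FLAG and none of the bits of bitsnotset are. seed corresponds to `-s`, where e.g. 42.1 means random seed 42 and a subsampling fraction of 0.1. A small Python sketch of both rules (helper names are hypothetical):

```python
def passes_flag_filter(flag, bits_set=0, bits_not_set=0):
    """samtools view -f/-F semantics: all bits_set present, no bits_not_set present."""
    return (flag & bits_set) == bits_set and (flag & bits_not_set) == 0


def split_seed(seed):
    """Split a -s value such as '42.1' into (seed=42, fraction=0.1)."""
    integer, _, fraction = str(seed).partition(".")
    return int(integer or 0), (float("0." + fraction) if fraction else 0.0)
```

For example, FLAG 99 (paired, proper pair, mate reverse, first in pair) passes `bits_set=1, bits_not_set=4`, whereas the same read marked as a duplicate (FLAG 1123) fails `bits_not_set=1024`.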
Performs visualization of splicing events across multiple samples using ggsashimi.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
help | string |  |  | * | show this help message and exit | 0 | 0 |
bam | file |  |  | * | Individual BAM file or file with a list of BAM files. For a list, the format is TSV: column 1: ID of the BAM file, column 2: path of the BAM file, columns 3+: additional columns | 0 | 0 |
coordinates | string |  |  | * | Genomic region. Format: chr:start-end (1-based) | 0 | 0 |
outprefix | string |  | sashimi | * | Prefix for plot file name | 0 | 0 |
outstrand | string |  | both | * | Only for --strand other than 'NONE'. Choose which signal strand to plot: both, plus, minus | 0 | 0 |
mincoverage | integer |  | 1 | * | Minimum number of reads supporting a junction to be drawn | 0 | 0 |
junctionsbed | file |  |  | * | Junction BED file name | 0 | 0 |
gtf | file |  |  | * | GTF file with annotation (exons alone are sufficient) | 0 | 0 |
strand | string |  | NONE | * | Strand specificity: NONE, SENSE, ANTISENSE, MATE1_SENSE, MATE2_SENSE | 0 | 0 |
overlay | integer |  |  | * | Index of column with overlay levels (1-based) | 0 | 0 |
aggr | string |  |  | * | Aggregate function for overlay: mean, median, mean_j, median_j. Use mean_j or median_j to keep density overlay but aggregate junction counts | 0 | 0 |
colorfactor | integer |  |  | * | Index of column with color levels (1-based) | 0 | 0 |
alpha | double |  | 0.5 | * | Transparency level for density histogram | 0 | 0 |
palette | file |  |  | * | Color palette file: TSV file with at least 1 column, where the color is the first column | 0 | 0 |
labels | integer |  |  | * | Index of column with labels (1-based) | 0 | 0 |
height | double |  | 2 | * | Height of the individual signal plot in inches | 0 | 0 |
annheight | double |  | 1.5 | * | Height of annotation plot in inches | 0 | 0 |
width | double |  | 10 | * | Width of the plot in inches | 0 | 0 |
basesize | integer |  | 14 | * | Base font size of the plot in pch | 0 | 0 |
outformat | string |  |  | * | Output file format: pdf, svg, png, jpeg, tiff | 0 | 0 |
outresolution | integer |  | 300 | * | Output file resolution in PPI (pixels per inch). Applies only to raster output formats | 0 | 0 |
shrink | boolean |  | false | * | Shrink the junctions by a factor for nicer display | 0 | 0 |
Sashimi plots were created using ggsashimi [Garrido-Martín D, Palumbo E, Guigó R, Breschi A. ggsashimi: Sashimi plot revised for browser- and annotation-independent splicing visualization. PLoS Comput Biol. 2018 Aug 17;14(8):e1006360.]
Pubmed references: 30118475,
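The coordinates parameter is 1-based and inclusive (chr:start-end). When the same region is needed by 0-based, half-open tools (BED files, pysam), a conversion like the following sketch applies; the function name and error handling are illustrative only.

```python
import re

# greedy (.+) lets chromosome names that contain ':' still match
_REGION = re.compile(r"^(.+):(\d+)-(\d+)$")


def parse_region(region):
    """Convert 'chr:start-end' (1-based, inclusive) to (chrom, start0, end), half-open."""
    match = _REGION.match(region.strip())
    if match is None:
        raise ValueError(f"expected chr:start-end, got {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start < 1 or end < start:
        raise ValueError(f"invalid interval {start}-{end}")
    # 1-based inclusive [start, end] equals 0-based half-open [start - 1, end)
    return chrom, start - 1, end
```

Only the start shifts; the end coordinate is numerically identical in both conventions.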
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bedgraphTable | file |  |  | 1 | table with paths to bedGraph files and conditions/replicates | 0 | 0 |
genelist | string |  |  | * | list of genes to consider | 0 | 0 |
experiment | string |  |  | * | type of experiment | 0 | 0 |
metaFrame | integer |  |  | 1 | frame to plot | 0 | 0 |
bins | integer |  |  | 1 | number of fixed bins to scale | 0 | 0 |
aggregateFUN | string |  |  | 1 | function for aggregation | 0 | 0 |
normShapeSum | boolean |  |  | 1 | whether to normalize the shape | 0 | 0 |
normLibSize | boolean |  |  | 1 | whether to normalize by library size | 0 | 0 |
normBinLength | boolean |  |  | 1 | whether to normalize by bin length | 0 | 0 |
factor | string |  |  | 0-null | factor to consider | 0 | 0 |
coverageFiles | file |  |  | 1 | path to the coverage files | 0 | 0 |
bedname | string |  |  | 1 | name of the BED file | 0 | 0 |
plotname | string |  |  | * | name of the plot | 0 | 0 |
config | file |  |  | 1 | path to the config file | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
scaledMetashapeOutputFolder | string | folder in which the plot is stored | 0 | 0 |
SPRING is a reference-free method to compress high throughput sequencing data
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
fastq | file |  |  | * | path to one or two (PE datasets) FASTQ files; possible endings: *.fastq, *.fq, *.fastq.gz or *.fq.gz | 0 | 0 |
spring | file |  |  | 1 | path to the compressed SPRING file; possible endings: *.spring or *.tar | 0 | 0 |
compress | boolean |  | true | * | if true, the FASTQ files are compressed; otherwise, the SPRING file is decompressed | 0 | 0 |
preserveOrder | boolean |  | true | * | preserve read order | 0 | 0 |
quality | boolean |  | true | * | retain quality values during compression | 0 | 0 |
ids | boolean |  | true | * | retain read identifiers during compression | 0 | 0 |
qualityMode | enum |  | lossless | * | possible values: 'lossless', 'qvz qv_ratio', 'ill_bin' or 'binary thr high low' | 0 | 0 |
long | boolean |  | false | * | use for compression of arbitrarily long reads | 0 | 0 |
decompressRange | string |  |  | * | decompress only reads (or read pairs for PE datasets) from start to end (both inclusive); e.g. '1 100' | 0 | 0 |
workingDir | file |  | /usr/local/storage/ | * | path to working directory | 0 | 0 |
threads | integer |  | 1 | * | number of cores to use | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
createdFile | string | path to the compressed or decompressed file (separated by ',' in case of PE datasets) | 0 | 0 |
isPairedEnd | boolean | true if paired-end data was processed | 0 | 0 |
downloads and extracts FASTQ files from the Sequence Read Archive (SRA)
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
sraFile | file path | absolute |  | 0-null | path to the *.sra file(s); cannot be used in combination with --sraID | 0 | 0 |
sraID | string |  |  | 0-null | one or more SRA ID(s); cannot be used in combination with --sraFile | 0 | 0 |
rename | string |  |  | * | new basename for the resulting FASTQ files | 0 | 0 |
outputFolder | folder path | absolute |  | 1 | path to the folder into which the files should be extracted | 0 | 0 |
tmpFolder | folder path | absolute | /usr/local/storage | * | temporary folder; default: /usr/local/storage | 0 | 0 |
deleteOnSuccess | boolean |  | false | * | deletes the SRA file when extraction was successful | 0 | 0 |
disablePrefetch | boolean |  | false | * | disables prefetching of the SRA files | 0 | 0 |
binaryName | enum |  | fastq-dump | * | name of the sra-toolkit binary; possible values: 'fastq-dump' or 'fasterq-dump' | 0 | 0 |
threads | integer | [1,128] | 1 | * | number of cores to use; only possible if 'fasterq-dump' is used as binary | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
isPairedEnd | boolean | true if paired-end data was downloaded from SRA | 0 | 0 |
baseName | string | absolute base name path to the created files | 0 | 0 |
createdFiles | string | absolute path to all files that were downloaded separated by ',' | 0 | 0 |
Public samples were downloaded from the SRA (accession number: TODO %sraID%) [Leinonen R, Sugawara H, Shumway M, International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 2011;39(Database issue):D19-21.].
Pubmed references: 21062823,
calls deletions and insertions. Deletions are also verified and consensus sequences of insertions are extracted.
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
help | string |  |  | * | show this help message and exit | 0 | 0 |
bed | file |  |  | * | Path to bedGraph file | 0 | 0 |
min_cld | integer |  | 100 | * | Minimum distance between two clusters at which they are still combined | 0 | 0 |
min_size | integer |  | 2 | * | Minimum size of a deletion | 0 | 0 |
max_z | double |  | 0.0 | * | Maximum z-score threshold for coverage analysis | 0 | 0 |
max_direct | double |  | -2.5 | * | Maximum direct z-score threshold for coverage analysis | 0 | 0 |
max_local | double |  | -6.0 | * | Maximum local z-score threshold for coverage analysis | 0 | 0 |
range | integer |  | 500 | * | Size of the range/region before a certain position, used to determine the local z-score parameters | 0 | 0 |
pc | integer |  | 1 | * | Pseudo count for coverages over positions | 0 | 0 |
tol | double |  | 0.8 | * | Tolerance of insertion positions mapped to deletions | 0 | 0 |
bam | file |  |  | * | Path to BAM file used for clipping pattern analysis | 0 | 0 |
out_del | file |  |  | * | Path to output TXT file containing deletions | 0 | 0 |
out_ins | file |  |  | * | Path to output TXT file containing insertions | 0 | 0 |
max_patt_diff | integer |  | 10 | * | Maximum distance of peaks of clipped reads to count them as an insertion | 0 | 0 |
min_sur_z | double |  | 50.0 | * | Minimum local z-score for clipping pattern analysis | 0 | 0 |
ws | integer |  | 20 | * | Size of the window whose positions are checked for being significantly low | 0 | 0 |
min_z | double |  | 10.0 | * | Minimum z-score for clipping pattern analysis | 0 | 0 |
get_clp_file | file |  |  | * | Set this parameter to a path to get a file containing the number of clipped reads for each position | 0 | 0 |
min_reads | integer |  | 10 | * | Minimum number of reads at which a position is permitted to be a peak | 0 | 0 |
gen_prop | integer |  | 1000 | * | Number of propagations to determine genome start/end | 0 | 0 |
gap | integer |  | 5 | * | Maximum number of permitted consecutive gaps/0-coverage positions during the determination of the genome start/end | 0 | 0 |
ref | file |  |  | * | Path to reference genome | 0 | 0 |
fir_ws | double |  | 0.0 | * | Primary threshold for the score used to verify deletions with clipped sequences | 0 | 0 |
sec_ws | double |  | 1.0 | * | Secondary, more stringent threshold for the score used to verify deletions with clipped sequences | 0 | 0 |
con_path | file |  |  | * | Path to the file containing the consensus sequences | 0 | 0 |
mpc | integer |  | 1 | * | Small pseudo count for the logarithm used to compute the PWMs | 0 | 0 |
min_length | integer |  | 10 | * | Minimum length of a consensus sequence | 0 | 0 |
clp_ver_range | integer |  | 100 | * | Range of clipped positions of deletions in which matching of consensus sequences is attempted | 0 | 0 |
svCaller was used to call deletions, insertions as well as consensus sequences of insertions and to verify the predicted deletions.
Pubmed references:
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
inReads1 | file | file exists, FASTQ format (read names without read numbers such as /1) |  | * | path to the first FASTQ file with reads | 0 | 0 |
inReads2 | file | file exists, FASTQ format (read names without read numbers such as /2) |  | * | path to the second FASTQ file with reads | 0 | 0 |
inPrefix | string | prefix1.[fastq/fq] and prefix2.[fastq/fq] exist and meet the restrictions of inReads1 and inReads2 |  | * | reads in two FASTQ files, prefix1.[fastq/fq] and prefix2.[fastq/fq]; can be used instead of inReads1 and inReads2 | 0 | 0 |
outReads1 | string |  |  | * | output file for first reads of paired data | 0 | 0 |
outReads2 | string |  |  | * | output file for second reads of paired data | 0 | 0 |
outSingletons | string |  |  | * | output file for singleton reads without a mate | 0 | 0 |
outPrefix | string |  |  | * | writes output to three files: prefix1.fastq, prefix2.fastq and prefixsingleton.fastq; can be used instead of outReads1, outReads2 and outSingletons | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
pairedReads1 | string | output file for first reads of paired data given in the parameters via outReads1 or outPrefix | 0 | 0 |
singletonReads | string | output file for singleton reads without a mate given in the parameters via outSingletons or outPrefix | 0 | 0 |
pairedReads2 | string | output file for second reads of paired data given in the parameters via outReads2 or outPrefix | 0 | 0 |
We removed all reads with missing mates from the paired-end fastq files.
Pubmed references:
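The mate check can be sketched as a name lookup between the two files. The sketch works on already parsed (name, sequence, quality) records and relies on the restriction above that read names carry no /1 or /2 suffix; keeping the order of the first file, and listing singletons from the first file before those from the second, are assumptions about the module's output order.

```python
def split_pairs(reads1, reads2):
    """Separate reads with a mate in the other file from singletons.

    reads1/reads2: lists of (name, sequence, quality) records.
    Returns (paired1, paired2, singletons); paired2 follows the order of paired1.
    """
    by_name2 = {rec[0]: rec for rec in reads2}
    names1 = {rec[0] for rec in reads1}
    paired1 = [rec for rec in reads1 if rec[0] in by_name2]
    paired2 = [by_name2[rec[0]] for rec in paired1]
    singletons = [rec for rec in reads1 if rec[0] not in by_name2]
    singletons += [rec for rec in reads2 if rec[0] not in names1]
    return paired1, paired2, singletons
```

Reordering the second file by the first keeps mates on the same record index, which downstream aligners expect for paired-end input.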
unique molecular identifiers (UMIs) can be used to remove PCR duplicates
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
bamFile | file path | absolute |  | 1 | path to the BAM file; the UMI must be a suffix of the FASTQ ID, separated by '_' | 0 | 0 |
outputFile | file path | absolute |  | 1 | path to the de-duplicated BAM file | 0 | 0 |
deleteOnSuccess | boolean |  | false | * | deletes the BAM file when deduplication was successful | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
deduplicatedFile | string | absolute path to the de-duplicated BAM file | 0 | 0 |
UMI-tools was used to remove PCR duplicates from the raw sequencing data based on UMIs [Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 2017;27(3):491-499.].
Pubmed references: 28100584,
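The bamFile restriction means the UMI is read from the end of each read name, after the last '_' (the default UMI separator of UMI-tools). The sketch below shows that extraction together with the simplest deduplication rule, exact UMI match at the same position and strand. UMI-tools itself defaults to its error-aware "directional" method, so this is a simplified stand-in, not what the module runs; the helper names are hypothetical.

```python
def umi_from_read_name(read_name, separator="_"):
    """Return the UMI appended to the read name after the last separator."""
    _name, sep, umi = read_name.rpartition(separator)
    if not sep or not umi:
        raise ValueError(f"no UMI found in read name {read_name!r}")
    return umi


def dedup_unique(alignments):
    """Keep the first read per (chrom, pos, strand, UMI).

    alignments: iterable of (read_name, chrom, pos, strand) tuples.
    """
    seen, kept = set(), []
    for read_name, chrom, pos, strand in alignments:
        key = (chrom, pos, strand, umi_from_read_name(read_name))
        if key not in seen:
            seen.add(key)
            kept.append(read_name)
    return kept
```

Unlike this exact-match rule, the directional method also merges UMIs that differ by a single sequencing error.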
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
infile | file |  |  | 1 | input file; must be *.tar, *.tar.gz or *.tar.bz2 | 0 | 0 |
outputdir | file |  |  | * | [optional] output directory for extracting the archive | 0 | 0 |
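A hedged sketch of the extraction with Python's tarfile module: the allowed endings map to read modes, and member paths are checked so that nothing is written outside the output directory. Falling back to the archive's own folder when outputdir is omitted is an assumption, and the sketch does not inspect symlink targets.

```python
import os
import tarfile

MODES = {".tar.gz": "r:gz", ".tar.bz2": "r:bz2", ".tar": "r:"}


def tar_mode(infile):
    """Map an allowed archive name to a tarfile read mode."""
    for suffix, mode in MODES.items():
        if infile.endswith(suffix):
            return mode
    raise ValueError(f"{infile} must end in .tar, .tar.gz or .tar.bz2")


def is_within(target, member_name):
    """True if extracting member_name below target stays inside target."""
    root = os.path.realpath(target)
    path = os.path.realpath(os.path.join(root, member_name))
    return os.path.commonpath([root, path]) == root


def extract_archive(infile, outputdir=None):
    """Extract infile and return the member names; refuses path traversal."""
    target = outputdir or os.path.dirname(os.path.abspath(infile))
    os.makedirs(target, exist_ok=True)
    with tarfile.open(infile, tar_mode(infile)) as archive:
        members = archive.getmembers()
        unsafe = [m.name for m in members if not is_within(target, m.name)]
        if unsafe:
            raise ValueError(f"refusing to extract paths outside {target}: {unsafe}")
        archive.extractall(target, members=members)
    return [m.name for m in members]
```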
name | type | restrictions | default | occurrence | description | minV | maxV |
---|---|---|---|---|---|---|---|
uri | string |  |  | 1- | one or more URI(s) pointing to the resource(s) to download | 0 | 0 |
output | folder path | absolute |  | 1 | path to a folder in which the downloaded files should be stored; the filename remains untouched | 0 | 0 |
rename | string |  |  | 0-null | renames the file to that name; multiple names must be provided in case of multiple URIs | 0 | 0 |
disableSizeCheck | boolean |  | false | * | disables the size check, which verifies that a downloaded file is larger than 1KB | 0 | 0 |
name | type | description | minV | maxV |
---|---|---|---|---|
downloadedFolder | string | path to the folder in which the files were stored | 0 | 0 |
numberOfFiles | integer | number of files that were downloaded | 0 | 0 |
downloadedFiles | string | absolute path to the downloaded file(s) separated by ',' | 0 | 0 |
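A sketch of the download step with the Python standard library, assuming the size check means a file must be larger than 1024 bytes and that rename supplies one name per URI. The helper names are hypothetical; the return value mirrors downloadedFiles (absolute paths separated by ',').

```python
import os
import urllib.request
from urllib.parse import urlparse

MIN_SIZE_BYTES = 1024  # assumption: "greater than 1KB" read as > 1024 bytes


def target_path(uri, output, rename=None):
    """Keep the file name from the URI unless a new name is given."""
    name = rename or os.path.basename(urlparse(uri).path)
    if not name:
        raise ValueError(f"cannot derive a file name from {uri!r}")
    return os.path.join(output, name)


def download(uris, output, renames=None, disable_size_check=False):
    """Download each URI into output and return absolute paths joined by ','."""
    if renames is not None and len(renames) != len(uris):
        raise ValueError("rename needs one name per URI")
    os.makedirs(output, exist_ok=True)
    paths = []
    for i, uri in enumerate(uris):
        path = target_path(uri, output, renames[i] if renames else None)
        urllib.request.urlretrieve(uri, path)
        if not disable_size_check and os.path.getsize(path) <= MIN_SIZE_BYTES:
            raise RuntimeError(f"{path} is not larger than 1KB; the download probably failed")
        paths.append(os.path.abspath(path))
    return ",".join(paths)
```

The size check catches the common failure mode of a server returning a short HTML error page instead of the requested file.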