title author category description

About

GENERAL

2019-08-09

AttachAnno

attaches annotations to a CSV file

Michael Kluge

AttachAnno

by Michael Kluge - version 1

attaches annotations to a CSV file

Dependencies

GNU core utilities
GNU R
packages: getopt, stringi

Parameter

name	type	restrictions	default	occurrence	description
targetFile	file	absolute		1	path to char-separated table file with header
targetSep	string		\t	1-	separating char in the target file
outputFile	file	absolute		1	path to the annotated output file
targetIDcolumn	string			1-	name of the column of the target file that should be used to merge the table with the annotation file(s)
annotationIDcolumn	string			1-	name(s) of the column(s) of the annotation file(s) that should be used to merge the table with the annotation file(s)
annotationFile	file	absolute		1-	path(s) to annotation table file(s) that should be attached
annotationSep	string		\t	1-	separating char in the annotation file(s)
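
The merge described by these parameters can be pictured as a left join on the ID columns. The following is a minimal Python sketch, not the module's R implementation; the function name `attach_annotation` and the handling of unmatched rows (empty values) are assumptions made for illustration.

```python
import csv
import io


def attach_annotation(target_text, anno_text, target_id, anno_id,
                      target_sep="\t", anno_sep="\t"):
    """Left-join an annotation table onto a target table by ID column.

    Hypothetical helper for illustration only.
    """
    target_rows = list(csv.DictReader(io.StringIO(target_text), delimiter=target_sep))
    anno_reader = csv.DictReader(io.StringIO(anno_text), delimiter=anno_sep)
    # Columns contributed by the annotation file (everything except its ID column).
    extra_cols = [c for c in anno_reader.fieldnames if c != anno_id]
    anno_by_id = {row[anno_id]: row for row in anno_reader}

    merged = []
    for row in target_rows:
        hit = anno_by_id.get(row[target_id], {})
        out = dict(row)
        for col in extra_cols:
            # Assumption: rows without a matching annotation get an empty value.
            out[col] = hit.get(col, "")
        merged.append(out)
    return merged


target = "gene\tcount\nA\t10\nB\t3\n"
anno = "id\tbiotype\nA\tprotein_coding\n"
rows = attach_annotation(target, anno, target_id="gene", anno_id="id")
```

Here `target_id` and `anno_id` play the roles of targetIDcolumn and annotationIDcolumn; with several annotation files the join would simply be repeated per file.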

Return values

name	type	description	minV	maxV

BWA

by Caroline Friedel - version 1

runs BWA mem alignment on sequencing reads

Parameter

name	type	default	occurrence	description
index	file		1	genome index
in1	file		1	in1.fastq
in2	file		*	[optional] in2.fastq
out	file		1	outfile
all	boolean		*	output all alignments for SE or unpaired PE [default false]
ignoreIndexExistence	boolean		*	do not throw an error if index does not exist [default false]
numberOfThreads	integer	1	*	[optional] number of threads [default 1]
minimumSeedLength	integer	19	*	[optional] minimum seed length [default 19]
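
As a rough illustration of how these parameters could translate into a `bwa mem` call, the sketch below assembles an argument list. The mapping of module parameters to the `-t`, `-k` and `-a` options of `bwa mem` is an assumption about the wrapper, and `build_bwa_mem_command` is a hypothetical helper, not part of the module.

```python
def build_bwa_mem_command(index, in1, out, in2=None, threads=1,
                          min_seed_length=19, output_all=False):
    """Assemble a `bwa mem` call as an argument list (hypothetical helper)."""
    cmd = ["bwa", "mem", "-t", str(threads), "-k", str(min_seed_length)]
    if output_all:
        cmd.append("-a")  # output all alignments for SE or unpaired PE
    cmd.append(index)
    cmd.append(in1)
    if in2 is not None:
        cmd.append(in2)  # second mate file is optional
    # bwa mem writes SAM to stdout; the caller redirects it to `out`.
    return cmd, out
```

The returned list can be passed to `subprocess.run` with stdout redirected to the output file.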

Return values

name	type	description	minV	maxV

Citation info

Sequencing reads were aligned using BWA mem (version (%SOFTWARE_VERSION%)) [Li, H., Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England), 25(14), 1754–1760].

Pubmed references: 19451168,

ChIPSeeker

by Michael Kluge - version 1

ChIPSeeker can be used to visualize called peaks in ChIP-seq data

Dependencies

GNU R
packages: GenomicFeatures, ChIPseeker, getopt

Parameter

name	type	restrictions	default	occurrence	description
bedFiles	file	absolute		1-	path to .bed or .narrowPeak files that contain called peaks
annoDb	string			1	name of the R genome annotation database (e.g. org.Hs.eg.db)
txdb	string			1	file or name of R library containing transcript-related features of a particular genome (e.g. TxDb.Hsapiens.UCSC.hg38.knownGene)
outputDir	file	absolute		1	path to an output folder in which the plots will be stored
promotorUpstream	integer		3000	*	size in bp used to define the promoter region upstream of the annotated TSS (transcription start site)
promotorDownstream	integer		3000	*	size in bp used to define the promoter region downstream of the annotated TSS (transcription start site)
resample	integer		1000	*	number of resample iterations for confidence interval estimation
conf	string		0.95	*	confidence interval to be estimated
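
To make the role of promotorUpstream and promotorDownstream concrete, the sketch below computes a strand-aware promoter window around a TSS. It is an illustration only; the function name `promoter_region` and the 1-based inclusive coordinate convention are assumptions, and ChIPseeker's own implementation may differ in detail.

```python
def promoter_region(tss, strand, upstream=3000, downstream=3000):
    """Return (start, end) of the promoter window around a TSS.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the start of the chromosome.
    return max(1, start), end
```

For a TSS at position 10000 with 3000 bp upstream and 1000 bp downstream, the window covers 7000 to 11000 on the plus strand and 9000 to 13000 on the minus strand.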

Return values

name	type	description	minV	maxV
ChIPSeekerOutputFolder	string	path to the output folder containing the plots	0	0

Citation info

Pubmed references: 25765347,

DETest

by Michael Kluge - version 1

performs differential gene expression tests based on count tables

Dependencies

GNU R
packages: edgeR, DESeq, DESeq2, limma, Biobase, RColorBrewer, gplots, getopt, genefilter, lattice

Parameter

name	type	restrictions	default	occurrence	description
controlCondition	string			1	name of the control condition
testCondition	string			1	name of the test condition
countFile	file	absolute		1	count file with features in rows and samples in columns
sampleAnnotation	file	absolute		1	annotation file with sample names in the first column and sample condition in the second column (header: sample\tcondition)
featureAnnotation	file	absolute		*	annotation file which is joined with the count file
featureAnnotationID	string		FeatureID	*	name of the column used for joining
featureAnnotationType	string		type	*	name of the column in the annotation file for which a distribution plot is created
excludeSamples	string			0-null	names of samples that should be excluded from the analysis
pValueCutoff	double	[0-1]	0.01	*	p-Value cutoff for significant results
minKeepReads	integer	>=0	25	*	average number of reads per sample a feature must have to pass the filtering step before the DE test is performed
foldchangeCutoff	integer		0.0,0.415,1.0	0-null	log2 foldchange cutoffs for each of which a separate result file will be created; will be used for both directions (+/-)
foldchangeCutoffNames	string		significant,0.33-fold,2-fold	0-null	corresponding names to the foldchange cutoffs
foldchangeCutoff	double		1	*	log2 foldchange cutoff for the two-colored volcano plot; will be used for both directions (+/-)
downregColor	string		red	*	color for down-regulated genes in the two-colored volcano plot
upregColor	string		blue	*	color for up-regulated genes in the two-colored volcano plot
output	file	absolute		1	path to output folder
method	string		all	*	method that should be applied; one of: limma, DESeq, DESeq2, edgeR, all
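
The interplay of pValueCutoff and the log2 foldchange cutoffs can be sketched as follows. This is an illustrative Python example, not the module's R code; the function name `classify_by_log2fc` and the boundary handling (`>=` and `<=`) are assumptions.

```python
def classify_by_log2fc(log2fc, pvalue, log2fc_cutoff=1.0, pvalue_cutoff=0.01):
    """Label a gene as 'up', 'down' or 'ns' (not significant)."""
    if pvalue > pvalue_cutoff:
        return "ns"
    # The same cutoff is applied in both directions (+/-).
    if log2fc >= log2fc_cutoff:
        return "up"
    if log2fc <= -log2fc_cutoff:
        return "down"
    return "ns"


# The default log2 cutoffs expressed as linear fold changes.
linear = {cutoff: round(2 ** cutoff, 2) for cutoff in (0.0, 0.415, 1.0)}
```

On the linear scale the default cutoffs 0.0, 0.415 and 1.0 correspond to fold changes of 1, about 1.33 and 2.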

Return values

name	type	description	minV	maxV

Citation info

Differential gene expression analysis was performed using %method% (%SOFTWARE_VERSION%).

Pubmed references: 20979621, 19910308, 25605792, 25516281,

Links

http://bioconductor.org/packages/release/bioc/html/DESeq.html
http://bioconductor.org/packages/release/bioc/html/DESeq2.html
http://bioconductor.org/packages/release/bioc/html/limma.html
http://bioconductor.org/packages/release/bioc/html/edgeR.html

SEQUENCING

2019-08-08

DEXSeq

tests RNA-seq data for differential exon usage

Michael Kluge

DEXSeq

by Michael Kluge - version 1

tests RNA-seq data for differential exon usage

Dependencies

GNU core utils
GNU R
packages: getopt, DEXSeq, GenomicFeatures, BiocParallel, GenomicRanges

Parameter

name	type	restrictions	default	occurrence	description
controlCondition	string			1	name of the control condition
testCondition	string			1	name of the test condition
countFile	file	absolute		1	count file with features in rows and samples in columns
flattedGTFAnnotation	file	absolute		1	flattened GTF file that was used to create the count file; created by dexseq_prepare_annotation.py, which comes with DEXSeq
sampleAnnotation	file	absolute		1	annotation file with sample names in the first column and sample condition in the second column (header: sample\tcondition)
featureAnnotation	file	absolute		*	annotation file which is joined with the count file
featureAnnotationID	string		Geneid	*	name of the column which is used for joining
featureAnnotationName	string		name	*	name of the column in the annotation file that contains the name of the feature
excludeSamples	string			0-null	names of samples that should be excluded from the analysis
pValueCutoff	double	[0,1]	0.01	*	p-Value cutoff for significant results
minKeepReads	integer	[1,]	25	*	average number of reads per sample a feature must have to pass the filtering step before the DE test is performed
output	file	absolute		1	output folder
threads	integer	[1,]	1	*	number of threads to use for testing
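
The pre-filtering controlled by excludeSamples and minKeepReads can be sketched as below. This is a minimal Python illustration under the assumption that the filter compares the mean count over the remaining samples against the threshold; the helper names and the dictionary layout are hypothetical.

```python
def passes_min_reads(counts, min_keep_reads=25):
    """True if the mean read count per sample reaches min_keep_reads."""
    if not counts:
        return False
    return sum(counts) / len(counts) >= min_keep_reads


def filter_features(count_table, exclude_samples=(), min_keep_reads=25):
    """Drop excluded samples, then keep features passing the read filter.

    count_table maps feature ID -> {sample name: read count}.
    """
    kept = {}
    for feature, per_sample in count_table.items():
        used = [n for s, n in per_sample.items() if s not in exclude_samples]
        if passes_min_reads(used, min_keep_reads):
            kept[feature] = used
    return kept
```

Note that excluding a sample changes the mean, so a feature can pass or fail the filter depending on which samples are kept.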

Return values

name	type	description	minV	maxV

Citation info

Differential exon usage was determined using DEXSeq (%SOFTWARE_VERSION%).

Pubmed references: 22722343,

DaPars

by Michael Kluge - version 1

dynamic analysis of alternative polyadenylation from RNA-seq

Dependencies

DaPars
python
GNU core utilities

Parameter

name	type	restrictions	default	occurrence	description
controlCondition	string			1	name of the control condition
testCondition	string			1	name of the test condition
sampleAnnotation	file	absolute		1	annotation file with sample names in the first column and sample condition in the second column (header: sample\tcondition)
excludeSamples	string			1-	names of samples that should be excluded from the analysis
wigFolder	file			1	folder containing the wig files (format: folder/samplename.bedgraph)
wigEnding	string		bedgraph	*	ending of the wig files
annotated3UTR	file	absolute		1	path to annotated 3' UTR regions created with DaPars_Extract_Anno.py
outputFile	file	absolute		1	path to the output file
coverageCutoff	integer	[1,]	30	*	coverage threshold
FDRCutoff	double	[0,1]	0.01	*	FDR cutoff
PDUICutoff	double	[0,100]	0.5	*	degree of difference in APA usage in percent
FoldChangeCutoff	double		0.5	*	log2 foldchange cutoff between the two conditions
numberOfCondASamplesReachingCutoff	integer	[1,]		*	number of samples from condition A that must pass the coverage cutoff; default: all samples
numberOfCondBSamplesReachingCutoff	integer	[1,]		*	number of samples from condition B that must pass the coverage cutoff; default: all samples

Return values

name	type	description	minV	maxV
wiggleFile	string	path to the output file in WIG format	0	0

Citation info

DaPars (%SOFTWARE_VERSION%) was used to identify alternative polyadenylation.

Pubmed references: 25409906,

EnrichAnno

by Michael Kluge - version 1

gene set enrichment analysis on GO and KEGG

Dependencies

GNU R
packages: getopt, clusterProfiler, pathview, KEGGREST

Parameter

name	type	restrictions	default	occurrence	description
backgroundFile	file	absolute		1	path to file with header, which contains a list of ENSEMBL or GENCODE identifiers that should be used as background
testFiles	string	absolute		1-	path to file(s) with header, which contain a list of ENSEMBL or GENCODE identifiers that should be used for enrichment testing
orgDB	string			1	name of the organism database (orgDB) that should be used as GO annotation; if package is missing it is installed via biocLite
keggDBName	string			*	organism code for KEGG (e.g. mmu / hsa); http://www.genome.jp/kegg/catalog/org_list.html; if not supported by KEGGREST parameter will be ignored
pValueCutoff	double		0.01	*	p-Value cutoff for significant results
plotKegg	boolean		true	*	if enabled, plots are created for KEGG pathways
output	file	absolute		1	path to output basename; folder is created if not existent
suffix	string			*	suffix that is inserted before the basename of output; if an absolute path is given, its basename is used
foldchangeCol	string			*	name of the column that contains the log2FC

Return values

name	type	description	minV	maxV

Citation info

Afterwards, gene set enrichment analysis was performed on up-/down-regulated genes using gene sets defined by GO (%orgDB%) and KEGG (%keggDBName%) with clusterProfiler (%SOFTWARE_VERSION%).

Pubmed references: 22455463,

GEM

by Michael Kluge - version 1

identifies protein-DNA interaction at high resolution in ChIP-seq data

Dependencies

GEM
java
GNU core utilities

Parameter

name	type	restrictions	default	occurrence	description
jarPath	file	absolute		1	path to GEM jar file
expt	file	absolute		1	aligned read file
readDistribution	file	absolute		1	read spatial distribution file
gpsOnly	boolean		true	*	run in GPS only mode
k	integer		8	*	length of the k-mer for motif finding, use --k or (--kmin & --kmax); GEM parameter
kMin	integer		6	*	min value of k, e.g. 6; GEM parameter
kMax	integer		13	*	max value of k, e.g. 13; GEM parameter
seed	string			*	exact k-mer string to jump start k-mer set motif discovery; GEM parameter
genome	folder	absolute		*	the path to the genome sequence directory, for motif finding; GEM parameter
outputPrefix	file	absolute		1	output folder name and file name prefix
control	file	absolute		*	aligned reads file for control
chrSize	file	absolute		*	genome chrom.sizes file with chr name/length pairs
format	string		BED	*	read file format: BED/SAM/BOWTIE/ELAND/NOVO
sizeInBp	integer			*	size of mappable genome in bp (default is estimated from genome chrom sizes)
alphaValue	double			*	minimum alpha value for sparse prior (default is estimated from the whole dataset coverage)
qValue	double		2	*	significance level for q-value, specify as -log10(q-value) (default=2, q-value=0.01)
threads	integer		#CPU	*	maximum number of threads to run GEM in parallel
kSeqs	integer		5000	*	number of binding events to use for motif discovery; GEM parameter
memoryPerThread	integer		2048	*	total memory per thread in MB if running on local host; otherwise memory limit of executor might be set
useFixedAlpha	boolean		false	*	use a fixed user-specified alpha value for all the regions
JASPAROutput	boolean		true	*	output motif PFM in JASPAR format; GEM parameter
MEMEOutput	boolean		true	*	output motif PFM in MEME format; GEM parameter
HOMEROutput	boolean		true	*	output motif PFM in HOMER format; GEM parameter
BEDOutput	boolean		true	*	output binding events in BED format for UCSC Genome Browser
NarrowPeakOutput	boolean		true	*	output binding events in ENCODE NarrowPeak format
workingDir	folder path	absolute	/usr/local/storage/	*	path to working directory
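
Because qValue is given as -log10(q-value), a short conversion sketch may help when choosing a threshold. The two helper names are hypothetical and only illustrate the arithmetic stated in the parameter description.

```python
import math


def q_to_gem_threshold(q_value):
    """Convert a q-value (e.g. 0.01) into the -log10 form expected by qValue."""
    return -math.log10(q_value)


def gem_threshold_to_q(threshold):
    """Convert a qValue setting (e.g. 2) back into the plain q-value."""
    return 10 ** (-threshold)
```

The default of 2 therefore corresponds to a q-value of 0.01, and a stricter q-value of 0.001 would be specified as 3.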

Return values

name	type	description	minV	maxV

Citation info

GEM (%SOFTWARE_VERSION%) was used to call peaks in the ChIP-seq data [Y. Guo, S. Mahony, D.K. Gifford, High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Computational Biology, (2012) 8(8): e1002638].

Pubmed references: 22912568,

HISAT2

by Daniel Strobl - version 1

Performs spliced RNA-seq read mapping using HISAT2.

Dependencies

HISAT2

Parameter

name	type	default	occurrence	description
unpaired	file		*	Files with unpaired reads. Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
s	integer		*	skip the first <int> reads/pairs in the input (none)
u	integer		*	stop after first <int> reads/pairs (no limit)
trim5	string		*	trim <int> bases from 5'/left end of reads (0)
trim3	string		*	trim <int> bases from 3'/right end of reads (0)
nceil	string		*	func for max # non-A/C/G/Ts permitted in aln (L,0,0.15)
pencansplice	integer		*	penalty for a canonical splice site (0)
pennoncansplice	integer		*	penalty for a non-canonical splice site (12)
pencanintronlen	string		*	penalty for long introns (G,-8,1) with canonical splice sites
pennoncanintronlen	string		*	penalty for long introns (G,-8,1) with noncanonical splice sites
minintronlen	integer		*	minimum intron length (20)
maxintronlen	integer		*	maximum intron length (500000)
knownsplicesiteinfile	file		*	provide a list of known splice sites
novelsplicesiteoutfile	file		*	report a list of splice sites
rnastrandness	string		*	Specify strand-specific information (unstranded)
ma	integer		*	match bonus (0 for --end-to-end, 2 for --local)
mp	string		*	max and min penalties for mismatch; lower qual = lower penalty <6,2>
sp	string		*	max and min penalties for soft-clipping; lower qual = lower penalty <2,1>
np	integer		*	penalty for non-A/C/G/Ts in read/ref (1)
rdg	string		*	read gap open, extend penalties (5,3)
rfg	string		*	reference gap open, extend penalties (5,3)
scoremin	string		*	min acceptable alignment score w/r/t read length (L,0.0,-0.2)
k	integer		*	report up to <int> alns per read; MAPQ not meaningful
a	integer		*	report all alignments; very slow, MAPQ not meaningful
un	file		*	write unpaired reads that didn't align to <path>
al	file		*	write unpaired reads that aligned at least once to <path>
unconc	file		*	write pairs that didn't align concordantly to <path>
alconc	file		*	write pairs that aligned concordantly at least once to <path>
metfile	file		*	send metrics to file at <path> (off)
met	integer		*	report internal counters & metrics every <int> secs (1)
rgid	string		*	set read group id, reflected in @RG line and RG:Z: opt field
rg	string		*	add <text> ("lab:value") to @RG line of SAM header.
offrate	integer		*	override offrate of index; must be >= index's offrate
threads	integer		*	number of alignment threads to launch (1)
seed	integer		*	seed for random number generator (0)
index	string		1	Index filename prefix (minus trailing .X.ht2)
paired1	file		*	Files with #1 mates, paired with files in <m2>. Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
paired2	file		*	Files with #2 mates, paired with files in <m1>. Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
output	file		1	File for SAM output
fastq	boolean	false	*	query input files are FASTQ .fq/.fastq (default)
qseq	boolean	false	*	query input files are in Illumina's qseq format
fasta	boolean	false	*	query input files are (multi-)FASTA .fa/.mfa
raw	boolean	false	*	query input files are raw one-sequence-per-line
c	boolean	false	*	paired1, paired2, unpaired are sequences themselves, not files
phred33	boolean	false	*	qualities are Phred+33 (default)
phred64	boolean	false	*	qualities are Phred+64
intquals	boolean	false	*	qualities encoded as space-delimited integers
ignorequals	boolean	false	*	treat all quality values as 30 on Phred scale (off)
nofw	boolean	false	*	do not align forward (original) version of read (off)
norc	boolean	false	*	do not align reverse-complement version of read (off)
novelsplicesiteinfile	boolean	false	*	provide a list of novel splice sites
notempsplicesite	boolean	false	*	disable the use of splice sites found
nosplicedalignment	boolean	false	*	disable spliced alignment
tmo	boolean	false	*	Reports only those alignments within known transcriptome
dta	boolean	false	*	Reports alignments tailored for transcript assemblers
dtacufflinks	boolean	false	*	Reports alignments tailored specifically for cufflinks
fr	boolean	false	*	-1, -2 mates align fw/rev
nomixed	boolean	false	*	suppress unpaired alignments for paired reads
nodiscordant	boolean	false	*	suppress discordant alignments for paired reads
t	boolean	false	*	print wall-clock time taken by search phases
quiet	boolean	false	*	print nothing to stderr except serious errors
metstderr	boolean	false	*	send metrics to stderr (off)
nohead	boolean	false	*	suppress header lines, i.e. lines starting with @
nosq	boolean	false	*	suppress @SQ header lines
omitsecseq	boolean	false	*	put '*' in SEQ and QUAL fields for secondary alignments.
reorder	boolean	false	*	force SAM output order to match order of input reads
mm	boolean	false	*	use memory-mapped I/O for index; many 'bowtie's can share
qcfilter	boolean	false	*	filter out reads that are bad according to QSEQ filter
nondeterministic	boolean	false	*	seed rand. gen. arbitrarily instead of using read attributes
removechrname	boolean	false	*	remove 'chr' from reference names in alignment
addchrname	boolean	false	*	add 'chr' to reference names in alignment
rf	boolean	false	*	-1, -2 mates align rev/fw
ff	boolean	false	*	-1, -2 mates align fw/fw
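
Several parameters above (nceil, scoremin, pencanintronlen, pennoncanintronlen) take a function specification such as L,0.0,-0.2. In the HISAT2 and Bowtie 2 convention the first field selects the function type (C constant, L linear, S square root, G natural logarithm) and the remaining fields are the constant term and the coefficient. The sketch below evaluates such a specification; `eval_function_option` is a hypothetical helper name.

```python
import math


def eval_function_option(spec, x):
    """Evaluate a function option like 'L,0.0,-0.2' at x (e.g. the read length)."""
    parts = spec.split(",")
    kind = parts[0]
    const = float(parts[1])
    # A constant function may be given without a coefficient.
    coeff = float(parts[2]) if len(parts) > 2 else 0.0
    if kind == "C":
        return const
    if kind == "L":
        return const + coeff * x
    if kind == "S":
        return const + coeff * math.sqrt(x)
    if kind == "G":
        return const + coeff * math.log(x)
    raise ValueError("unknown function type: " + kind)
```

With the defaults listed in the table, a 100 bp read needs an alignment score of at least -20 (scoremin L,0.0,-0.2) and may contain up to 15 ambiguous bases (nceil L,0,0.15).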

Return values

name	type	description	minV	maxV
SAMFile	string	output SAM file (= value for parameter output)	0	0

Citation info

Sequencing reads were mapped using HISAT2 (version (%SOFTWARE_VERSION%)) [Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015 Apr;12(4):357-60].

Pubmed references: 25751142,

SPades

by Florian Röckl - version 1

assembles transcript sequences of a sample using RNA-seq reads.

Dependencies

python3

Parameter

name	type	default	occurrence	description
forward	file		1	Path to FastQ or FastQ.gz file containing the forward reads
reverse	file		1	Path to FastQ or FastQ.gz file containing the reverse reads
cons_path	file		1	Path to file containing consensus sequences (from svCaller)
outFolder	file		1	Path to output folder, where SPAdes stores all its resulting files.
memory	integer	40	*	[optional] RAM limit.
ignoreConsensusExistence	boolean	false	*	do not throw an error if file containing consensus sequences does not exist

Return values

name	type	description	minV	maxV

Citation info

SPAdes was used to assemble transcript sequences from the forward and reverse RNA-seq reads of a sample.

Pubmed references: 32559359,

STARgenomeGenerate

by Caroline Friedel - version 1

Generation of genome indices for STAR

Dependencies

STAR

Parameter

name	type	default	occurrence	description
runThreadN	integer		*	[optional] int: number of threads to run STAR
genomeDir	file		1	string: path to the directory where genome files will be generated
genomeFastaFiles	file		1-	string(s): path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they cannot be zipped.
sjdbGTFfile	file		*	[optional] string: path to the GTF file with annotations
sjdbOverhang	integer	100	*	[optional] int>0: length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1)
sjdbGTFtagExonParentTranscript	string	transcript_id	*	[optional] string: GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files)
sjdbFileChrStartEnd	file		0-null	[optional] string(s): path to the files with genomic coordinates (chr <tab> start <tab> end <tab> strand) for the splice junction introns.
genomeSAindexNbases	integer		*	[optional] int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1).
genomeChrBinNbits	integer		*	[optional] int: =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins. For a genome with large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]).
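
The scaling rules quoted for sjdbOverhang, genomeSAindexNbases and genomeChrBinNbits can be evaluated with a few lines of Python. This is a sketch of the formulas as written above; the helper names are hypothetical, and rounding down to an integer is an assumption made because the parameters are integers.

```python
import math


def sjdb_overhang(mate_length):
    """Ideal sjdbOverhang: mate_length - 1."""
    return mate_length - 1


def genome_sa_index_nbases(genome_length):
    """min(14, log2(GenomeLength)/2 - 1), rounded down (assumption)."""
    return min(14, int(math.log2(genome_length) / 2 - 1))


def genome_chr_bin_nbits(genome_length, number_of_references, read_length):
    """min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))), rounded down."""
    per_reference = genome_length / number_of_references
    return min(18, int(math.log2(max(per_reference, read_length))))
```

For example, a 100 kb genome gives genomeSAindexNbases = 7, while a 3 Gb assembly split into 100,000 contigs gives genomeChrBinNbits = 14.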

Return values

name	type	description	minV	maxV

Citation info

STAR indices were created for the XXX genome using STAR (%SOFTWARE_VERSION%).

Pubmed references: 23104886,

addSequence2Sam

by Michael Kluge - version 1

sequences (and qualities) of FASTQ files can be added to SAM files

Dependencies

perl

Parameter

name	type	restrictions	default	occurrence	description
sam	file path	absolute		1	path to the SAM file
fastq	file path	absolute		1-	path to the FASTQ file(s)
output	file path	absolute		1	path to the output SAM file in which the sequences are added
unmapped	file path	absolute		*	path to a FASTQ file in which the unmapped sequences will be written to; exclusive with --preread flag
noquality	boolean		false	*	does not add the read quality values
update	boolean		false	*	overrides already existing output files
preread	boolean		false	*	indexes only those reads stored in the FASTQ file that are part of the SAM file; exclusive with --unmapped parameter

Return values

name	type	description	minV	maxV
SAMFileWithSequences	string	absolute path to the SAM file with added sequences and, if enabled, qualities	0	0
UnmappedReadFile	string	absolute path to file containing all unmapped reads in FASTQ format	0	0

amss

by Elena Weiß - version 1

computes AMSS per input window

Dependencies

sharedUtils

Parameter

name	type	occurrence	description
inputregs	file	1	file with specified genomic regions to analyze
bams	file	1	path to bam files
pattern	string	1	pattern to grep for bam files
strandness	string	1	strandness of experiment
out	file	1	output directory
sampleAnnotation	file	1	file specifying two conditions
pseudocount	integer	*	pseudocount to subtract from counts
numrandomizations	integer	*	number of randomizations
everyPos	string	*	every position of read is counted

Return values

name	type	description	minV	maxV
out	string	output directory containing the computed AMSS	0	0

assemblyAnalyzer

by Florian Roeckl - version 1

determines the best match(es) of insertion consensus sequences and sequences of a sequence assembly to extract the insertion sequences.

Dependencies

python3

Parameter

name	type	default	occurrence	description
fasta	file		1	Fasta file generated by SPAdes assembler.
sam	file		1	SAM file containing consensus sequences mapped to fasta file.
out	file		1	Output fasta file containing detected insertion sequences
maxSize	integer	8000	*	[optional] Maximum length for an insertion. Detected insertions with greater length are discarded.
ignoreFastaExistence	boolean	false	*	Do not throw an error if fasta file does not exist

Return values

name	type	description	minV	maxV

Citation info

assemblyAnalyzer was used to extract the sequence of insertions that were previously predicted with the SV caller. It does so by identifying the best pair(s) of consensus sequences and assembled sequences.

Pubmed references:

bam2wiggle

by Michael Kluge - version 1

converts BAM files to WIG files

Dependencies

bedtools
GNU core utilities

Parameter

name	type	restrictions	occurrence	description
bam	file	absolute	1	path to the position-based-sorted BAM file
output	file	absolute	1	path to BEDGRAPH file
contigSizes	file	absolute	*	file containing the sizes of the contigs used in the BAM file if ranges should be extended (format: <chrName><TAB><SIZE>)

Return values

name	type	description	minV	maxV
wiggleFile	string	absolute path to the converted output file	0	0

Citation info

Bedtools (%SOFTWARE_VERSION%) was used to convert BAM files to WIG files.

Pubmed references: 20110278,

bamContigtDistribution

by Michael Kluge - version 1

creates plots based on statistics of BAM files

Dependencies

GNU R
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
bamMergedStats	file	absolute		1	path to the merged bam stats file	0	0
outputFile	file	absolute		1	path to an output PDF file	0	0

Return values

name	type	description	minV	maxV

bamToBed

by Sophie Friedl - version 1

converts bam format into bed format

Dependencies

python3
bedtools (version 2.x)

Parameter

name	type	restrictions	default	occurrence	description
inBam	string	valid file path, bam format		-1-	Path to the bam file that will be converted into bed format. An index of the bam file is not required.
outBed	string			-1-	Path for saving the resulting bed file.
bedtoolsPath	string	valid file path to executable	bedtools	-1-	Path to the bedtools executable. Per default, it is assumed that bedtools is in the PATH variable.
split	boolean		true	-1-	Defines how split alignments (cigar string that contains N) are handled. If true, the skipped region is not included in the bed regions. If false, the skipped region is included in the bed region, i.e. there is only one interval from alignment start to alignment end.
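
The effect of the split parameter can be illustrated by deriving BED intervals from a CIGAR string. This is a simplified Python sketch, not the bedtools implementation; `alignment_intervals` is a hypothetical helper, and the 0-based, half-open coordinates follow the BED convention.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_intervals(start, cigar, split=True):
    """Return BED-style (start, end) intervals for one alignment.

    start is the 0-based leftmost reference position of the alignment.
    """
    intervals = []
    block_start = pos = start
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "MDX=":
            pos += length  # these operations consume reference bases
        elif op == "N":
            if split:
                # Close the current block and skip the N region.
                if pos > block_start:
                    intervals.append((block_start, pos))
                block_start = pos + length
            pos += length
        # I, S, H and P do not consume reference bases.
    if pos > block_start:
        intervals.append((block_start, pos))
    return intervals
```

For an alignment starting at position 100 with CIGAR 50M1000N50M, split=true yields the two intervals 100-150 and 1150-1200, whereas split=false yields the single interval 100-1200.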

Return values

name	type	description	minV	maxV
bedFile	string	path to the bed file that is created (same value as outBed parameter)	0	0

Citation info

Bed files were created from the bam files using bedtools bamtobed.

Pubmed references: 20110278,

bamToBigWig

by Sophie Friedl - version 1

converts bam format into bigwig format

Dependencies

python3
deeptools >=2.0

Parameter

name	type	restrictions	default	occurrence	description
inBam	string	valid file path, file ending .bam, indexed		-1-	Path to the bam file that will be converted into bigWig format. The bam file has to be indexed.
outBw	string			-1-	Path for saving the resulting bigWig file.
bamCoveragePath	string	valid file path	bamCoverage	-1-	Path to the executable bamCoverage which is part of deepTools. Per default, it is assumed that bamCoverage is in the PATH variable.
binSize	integer	positive, not zero	1	-1-	Resolution of the bigWig file. Increasing the binSize causes loss of information but decreases the size of the bigWig file. Highest resolution (at single basepair level) is achieved for binSize=1 (default).
numberOfProcessors	integer	positive, not zero	1	-1-	Number of processors to use (parallelization)

Return values

name	type	description	minV	maxV
bigWigFile	string	Path to the output bigWig file	0	0

Citation info

BigWig files were created from the bam files using the tool bamCoverage from the deepTools tool suite.

Pubmed references: 27079975,

bamstats

by Michael Kluge - version 1

creates various statistics on BAM files using RSeQC and samtools, which can be used for quality assessment

Dependencies

python3
rseqc
samtools
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description
bam	file path	absolute		1-	path to one or more BAM file(s)
outdir	folder path	absolute		1	path to the output folder; individual files will be stored in a sub-folder (using the basename of the BAM file as folder name)
readLength	integer			1	maximal length of the reads
sampleDepth	integer		100000	*	number of reads which are used for sampling
annotation	file path	absolute		*	gene annotation in BED format
geneBodyAnnotation	file path	absolute		*	genes that are used to calculate the gene body coverage; should contain house keeping genes
idxstats	boolean		true	*	enables calculation of number of reads mapped on each chromosome
flagstat	boolean		true	*	enables calculation of flags of mapped reads
count	boolean		false	*	enables calculation of raw and rpkm count table for exons, introns and mRNAs
saturation	boolean		true	*	enables down-sampling of the mapped reads to infer the sequencing depth
statistics	boolean		true	*	calculates reads mapping statistics
clipping	boolean		true	*	enables clipping statistic of the mapped reads
insertion	boolean		true	*	enables insertion statistic of the mapped reads
deletion	boolean		true	*	enables deletion statistic of the mapped reads
inferExperiment	boolean		true	*	tries to infer if the sequencing was strand specific or not
junctionAnnotation	boolean		true	*	enables checking of how many of the splice junctions are novel or annotated
junctionSaturation	boolean		true	*	enables down-sampling of the spliced reads to infer if sequencing depth is enough for splicing analyses
distribution	boolean		true	*	calculates how mapped reads are distributed among different genomic features
duplication	boolean		true	*	calculates sequence duplication levels
gc	boolean		true	*	calculates GC-content of the mapped reads
nvc	boolean		true	*	checks if a nucleotide composition bias exists
insertSize	boolean		true	*	calculates the insert size between two paired RNA reads
fragmentSize	boolean		true	*	calculates the fragment size for each transcript
tin	boolean		true	*	calculates the transcript integrity number which is similar to the RNA integrity number
paired	boolean		false	*	must be set if paired-end data is analyzed
stranded	boolean		false	*	must be set if strand-specific data is analyzed
disableAllDefault	boolean		false	*	disables all options which are not explicitly activated

Return values

name	type	description	minV	maxV

Citation info

Quality of the resulting mappings was assessed using RSeQC [Liguo Wang, Shengqin Wang, Wei Li; RSeQC: quality control of RNA-seq experiments, Bioinformatics, Volume 28, Issue 16, 15 August 2012, Pages 2184–2185].

Pubmed references: 22743226,

bcftoolsVariantCalling

by Florian Roeckl - version 1

calls SNPs and small indels using the variant caller bcftools.

Dependencies

bcftools

Parameter

name	type	default	occurrence	description
reference	file		1	Path to the file containing the reference genome.
bamfile	file		1	Path to the input bam file.
vcf	file		1	Path of the output vcf file.
maxdepth	integer	100000	*	Maximum number of reads per position.

Return values

name	type	description	minV	maxV

Citation info

bcftoolsVariantCalling was used to call variants, in particular SNPs, with bcftools.

Pubmed references: 21903627,

bedgraphReplicateMerger

by Michael Kluge - version 1

version {@VERSION_LINKS@}

combines expression of biological or technical replicates; all replicates are scaled to the same number of reads and averaged afterwards

Dependencies

python3

Parameter

name	type	restrictions	default	occurrence	description
bedgraphFiles	file path	absolute		2-	path to sorted BEDGRAPH files; at least two files must be given; all files must contain the same chromosomes in the same order
outputFile	file path	absolute		1	path to the output file
mergedIdxstatsFile	file path	absolute		1	path to a tab-separated file that contains the output generated by samtools idxstats for all samples (columns 1-4) and in the 5th column the sample name; used columns: 1 -> chr name; 3 -> number of mapped reads; 5 -> name of the sample
notSkipHead	boolean		false	*	disables the skipping of the first line of the idxstats file (--mergedIdxstatsFile); default: first line is skipped
numberOfDigits	integer		5	*	number of decimal places to round the calculated values
normByReadCount	integer		1000000	*	number of reads to which each replicate is normed (based on the idxstats output) before values are averaged
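
The scaling and averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the function name `merge_replicates` and the in-memory data layout are hypothetical, and all replicates are assumed to list identical intervals in the same order.

```python
def merge_replicates(replicates, mapped_reads,
                     norm_by_read_count=1_000_000, number_of_digits=5):
    """Scale each replicate to a common read count, then average per interval.

    replicates:   dict sample name -> list of (chrom, start, end, value)
    mapped_reads: dict sample name -> total number of mapped reads
                  (e.g. summed from column 3 of the idxstats output)
    Assumption: every replicate contains the same intervals in the same order.
    """
    names = sorted(replicates)
    # One scaling factor per replicate, derived from its mapped read count.
    factors = {name: norm_by_read_count / mapped_reads[name] for name in names}
    merged = []
    for rows in zip(*(replicates[name] for name in names)):
        chrom, start, end, _ = rows[0]
        scaled = [row[3] * factors[name] for name, row in zip(names, rows)]
        mean = sum(scaled) / len(scaled)
        merged.append((chrom, start, end, round(mean, number_of_digits)))
    return merged
```

Scaling before averaging keeps a deeply sequenced replicate from dominating the merged signal.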

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

bedgraphShrinker

by Michael Kluge - version 1

version {@VERSION_LINKS@}

shrinks regions with the same score to one region in a bedgraph file or expands the file to a region size of one basepair

Dependencies

python3

Parameter

name	type	restrictions	default	occurrence	description
bedgraphFile	file	absolute		1	path to a sorted, not overlapping BEDGRAPH file
outputFile	file	absolute		1	path to the output file
genomeSize	file	absolute		*	path to file containing the size of the contigs
expand	boolean		false	*	expand the ranges instead of shrinking them
addZeroRanges	boolean		false	*	adds ranges that are missing with a zero value
omitZeroRanges	boolean		false	*	suppress the output of ranges with a zero value
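
A rough sketch of the two modes, shrinking and expanding, is given below. The function names `shrink` and `expand` are hypothetical and the input is assumed to be an already parsed, sorted, non-overlapping list of rows; the handling of addZeroRanges and omitZeroRanges is not covered.

```python
def shrink(rows):
    """Merge directly adjacent intervals on the same chromosome with equal score.

    rows: sorted, non-overlapping list of (chrom, start, end, score)
    """
    merged = []
    for chrom, start, end, score in rows:
        if merged:
            p_chrom, p_start, p_end, p_score = merged[-1]
            if p_chrom == chrom and p_end == start and p_score == score:
                # Extend the previous interval instead of starting a new one.
                merged[-1] = (chrom, p_start, end, score)
                continue
        merged.append((chrom, start, end, score))
    return merged


def expand(rows):
    """Inverse mode: emit one interval of one base pair per covered position."""
    return [(chrom, pos, pos + 1, score)
            for chrom, start, end, score in rows
            for pos in range(start, end)]
```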

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

binGenome

by Michael Kluge - version 1

version {@VERSION_LINKS@}

partitions regions into a fixed number of bins and calculates coverage in that bin

Dependencies

java

Parameter

name	type	restrictions	default	occurrence	description
bedgraph	file	absolute		0-null	bedgraph or bigwig file(s)
bedgraphPos	file	absolute		0-null	bedgraph or bigwig file(s) for positive strand
bedgraphNeg	file	absolute		0-null	bedgraph or bigwig file(s) for negative strand
annotation	file	absolute		1-	region annotation file(s); (see writeGRangesToBed() in R/binGenome.lib.R for format info)
bedgraphNames	string			0-null	sample names for generation of output filenames
annotationNames	string			0-null	annotation names for generation of output filenames
bins	integer	>0		0-null	number of bins to partition each region
quantiles	integer	[0-100]		0-null	determines the position at which expression exceeds specific quantiles in percent
outputDir	file	absolute		1	path to output folder; files will be named automatically based on the used parameters
cores	string	>0	1	*	number of cores to use in parallel
normalize	boolean		true	0-null	write in addition a per-gene normalized version of the data
fixedBinSizeUpstream	string			*	creates bins with a fixed size upstream of the region; format: 'binsize:binnumber'
fixedBinSizeDownstream	string			*	creates bins with a fixed size downstream of the region; format: 'binsize:binnumber'
tmpDir	string			*	path to tmp folder
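
The idea behind the bins parameter can be illustrated with a short sketch. It assumes per-base coverage for one region is already available as a list; the name `bin_coverage` and the way bin boundaries are rounded are assumptions and may differ from what binGenome does internally.

```python
def bin_coverage(coverage, bins):
    """Split a region into `bins` bins of near-equal width and average each bin.

    coverage: list of per-base coverage values for one region
    bins:     number of bins (must not exceed the region length)
    """
    n = len(coverage)
    if bins <= 0 or bins > n:
        raise ValueError("bins must be between 1 and the region length")
    result = []
    for i in range(bins):
        # Integer boundaries; every bin contains at least one position.
        lo = i * n // bins
        hi = (i + 1) * n // bins
        chunk = coverage[lo:hi]
        result.append(sum(chunk) / len(chunk))
    return result
```

Using a fixed number of bins rather than a fixed bin width makes regions of different lengths directly comparable.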

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

Each region was binned into a fixed number of bins (x/x/x), and average coverage for each bin was calculated for each transcript in each sample.

Pubmed references:

bowtie2Docker

by Michael Kluge - version 1

version {@VERSION_LINKS@}

technical demo which shows how docker containers can be used in combination with Watchdog; basic bowtie mapper

Dependencies

docker

Parameter

name	type	restrictions	occurrence	description
genome	file path	absolute	1	path to indexed reference genome (without trailing .X.bt2 ending)
reads	file path	absolute	*	path to reads in FASTQ format; for mapping of paired-end data, two files are required
outfile	file path	absolute	1	path to output file, which is written in SAM format; a log file with .log suffix will also be written

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

bwaAln

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

maps reads with bwa aln

Dependencies

python3
bwa

Parameter

name	type	restrictions	default	occurrence	description
inReads	file	file exists, fastq format, ending .fq or .fastq		1	fastq file with the sequenced reads
bwaIndex	string			1	Common prefix of bwa index files for the reference genome
outSai	string			1	file for writing mapped reads in bwa format
bwaPath	file	file exists, bwa executable	bwa	*	path to BWA executable (default: use executable from PATH)
threads	integer	>0	1	*	number of threads to use for bwa aln (-t option of bwa)
stopIfMoreThanBestHits	integer	>0		*	stop searching when there are more than that many best hits (default: use bwa default)

Return values

name	type	description	minV	maxV
bwaSaiFile	string	*.sai file created by the module (same value as given by the parameter outSai)	0	0

Citation info

We mapped the reads with bwa aln.

Pubmed references: 19451168,

bwaIndex

by Florian Röckl - version 1

version {@VERSION_LINKS@}

generates an index for a fasta file.

Dependencies

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
fasta	file			1	Path to the fasta file, which is going to be indexed. Index files will appear in the same folder!	0	0
ignoreFastaExistence	boolean		false	*	do not throw an error if fasta file does not exist	0	0

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

bwaIndex was used to index a fasta file, so that it can be subsequently utilized for searching sequences in it.

Pubmed references: 19451168,

bwaSampe

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

creates a sam file with bwa sampe from mappings of bwa aln for paired reads

Dependencies

python3
bwa

Parameter

name	type	restrictions	default	occurrence	description
inReads1	file	file exists, fastq format, ending .fq or .fastq		1	uncompressed fastq (.fq, .fastq) file with the sequenced reads
inReads2	file	file exists, fastq format, ending .fq or .fastq		1	uncompressed fastq (.fq, .fastq) file with the sequenced reads (mates)
inSai1	file	file exists		1	output of bwa aln for the file given by inReads1
inSai2	file	file exists		1	output of bwa aln for the file given by inReads2
bwaIndex	string			1	Common prefix of bwa index files for the reference genome
outSam	string			1	file for writing mapped reads in sam format
bwaPath	file	file exists, bwa executable	bwa	*	path to BWA executable (default: use executable from PATH)
indexInRam	boolean		False	*	option to load complete index into main memory (default: false)

Return values

name	type	description	minV	maxV
bwaPairedSamFile	string	sam file created by the module (same value as given by the parameter outSam)	0	0

Citation info

We created a sam file with bwa sampe.

Pubmed references: 19451168,

calcDOCRs

by Katharina Reinisch - version 1

version {@VERSION_LINKS@}

Calculates dOCR lengths for genes from open chromatin regions in BED format.

Dependencies

java
picard (jar included with module)
Apache Commons CLI library (jar included with module)

Parameter

name	type	default	occurrence	description
input	file		1	input file (in BED format)
name	string		1	sample name used for output files
output	string		1	output directory
annotation	file		1	genome annotation file (in GTF format)
d1	integer	10000	*	[optional] maximum distance of OCR to gene end for this OCR to be added to this gene in the first step
d2	integer	5000	*	[optional] maximum distance of OCR to last added OCR for a gene for this OCR to be added in the second step
gene	string		*	[optional] get total length of OCRs within gene (in_gene_length) and fraction of gene body covered by OCRs, default false

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

dOCR lengths were calculated as previously described in Hennig T et al., 2018, PLOS Pathogens 14(3): e1006954

Pubmed references: 29579120,

calcDownsampleRate

by Katharina Reinisch - version 1

version {@VERSION_LINKS@}

Calculates the downsampling rate for each sample, such that all samples will have approximately the same number of reads after downsampling with this rate.

Dependencies

Python 3

Parameter

name	type	occurrence	description
idxstats	file	1	idxstats file
exclude	string	*	[optional] chromosomes to be excluded, comma separated
samples	string	*	[optional] samples to be used, comma separated
output	string	1	output table file
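
One plausible way to derive such rates is sketched below. It assumes that the sample with the fewest reads defines the target depth, which the description does not state explicitly; the function name `downsample_rates` is hypothetical.

```python
def downsample_rates(read_counts):
    """Return a keep-probability per sample so that all samples end up
    with roughly the same number of reads.

    read_counts: dict sample name -> number of reads (after any exclusions)
    Assumption: the smallest sample is used as the common target.
    """
    if not read_counts or any(count <= 0 for count in read_counts.values()):
        raise ValueError("every sample needs a positive read count")
    target = min(read_counts.values())
    return {sample: target / count for sample, count in read_counts.items()}
```

The resulting rates can be used as the keep-probability of a downsampling tool; the smallest sample gets a rate of 1.0 and is left unchanged.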

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

Downsampling rates were determined such that all included samples will have approximately the same number of reads after downsampling with this rate.

Pubmed references:

checksum

by Michael Kluge - version 1

version {@VERSION_LINKS@}

creates a md5 checksum of a file or verifies file integrity based on a md5 checksum using md5sum

Dependencies

GNU md5sum
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description
input	file path	absolute		1	absolute path to file for which a checksum should be calculated or which should be verified
oldChecksumName	file path	absolute		*	absolute path to a (non-existent) file used to identify the correct checksum line for cases in which the file was renamed or moved after checksum creation; can only be used in verify mode
checksum	file path	absolute	.checksum.md5	*	absolute path to the checksum file; by default '.checksum.md5' located in the same directory as the input file
verify	boolean		false	*	flag to verify integrity of a file based on the checksum file
update	boolean		false	*	flag to update an already existing checksum in the checksum file
absolutePath	boolean		false	*	flag to store an absolute path in the checksum file instead of a relative one
ignorePath	boolean		false	*	flag to use only the name of the file for identification of the corresponding checksum line (ignores the location of the file); can only be used in verify mode
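
The module itself relies on md5sum. For illustration only, the same create and verify logic can be sketched with Python's standard library; the function names are hypothetical and the line format follows the usual md5sum output (hex digest, two spaces, file name).

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it fully into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_line(path, name=None):
    """Format a checksum line the way md5sum prints it."""
    return f"{md5_of_file(path)}  {name or path}"


def verify(path, line):
    """Check a file against the digest stored in a checksum line."""
    expected = line.split(maxsplit=1)[0]
    return md5_of_file(path) == expected
```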

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

circCombination

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

combines the predictions of circularRNAs made with the modules for CIRI2 and circRNA_finder.

Dependencies

Python3

Parameter

name	type	restrictions	default	occurrence	description
inCircs1	file	file exists		1	First prediction file with circRNAs and junction reads (tab-separated, 5 columns: chromosome, start, end, strand, list of reads)
inCircs2	file	file exists		1	Second prediction file with circRNAs and junction reads (tab-separated, 5 columns: chromosome, start, end, strand, list of reads)
outUnion	file			1	Output path for the union of the predictions (coordinates and reads)
outIntersection	file			1	Output path for the intersection of the predictions (coordinates and reads)
outIntersectedUnion	file			1	Output path for the intersected union of the predictions (intersection of coordinates, union of reads)
minReads	integer	>=1	2	*	Minimum number of predicted junction reads required for writing a circRNA into the output files. The cutoff is applied independently to the intersection, union and intersected union of the predictions.
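
The three outputs and the minReads cutoff can be sketched with plain set operations. This is an illustration based on the parameter descriptions above, not the module's code; the function name `combine` and the dictionary layout are assumptions.

```python
def combine(circs1, circs2, min_reads=2):
    """Combine two circRNA predictions.

    circs1, circs2: dict (chrom, start, end, strand) -> set of junction read ids
    Returns (union, intersection, intersected_union), each filtered by min_reads.
    """
    shared = set(circs1) & set(circs2)
    everything = set(circs1) | set(circs2)

    # Union of coordinates, union of reads.
    union = {k: circs1.get(k, set()) | circs2.get(k, set()) for k in everything}
    # Intersection of coordinates, intersection of reads.
    intersection = {k: circs1[k] & circs2[k] for k in shared}
    # Intersection of coordinates, union of reads.
    intersected_union = {k: circs1[k] | circs2[k] for k in shared}

    def keep(circs):
        # The cutoff is applied to each result independently.
        return {k: reads for k, reads in circs.items() if len(reads) >= min_reads}

    return keep(union), keep(intersection), keep(intersected_union)
```

Because the cutoff is applied per output, a circRNA can pass in the intersected union while being dropped from the intersection.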

Return values

name	type	description
circUnion	file	Output path for the union of the predictions (same as input parameter outUnion)
circIntersection	file	Output path for the intersection of the predictions (same as input parameter outIntersection)
circIntersectedUnion	file	Output path for the intersected union of the predictions (same as input parameter outIntersectedUnion)

Citation info

Predictions of circular RNAs were combined by forming the union/intersection of the individual predictions. The circular reads were combined by forming the union/intersection of the predictions.

Pubmed references:

circRNAfinder

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

runs circRNA_finder to detect circular RNAs in single-end or paired-end sequencing data.

Dependencies

python3
perl
awk
samtools

Parameter

name	type	restrictions	default	occurrence	description
inReads1	file	file exists, fastq format		*	path to single-end fastq file or path to first fastq file with paired reads
inReads2	file	file exists, fastq format		*	path to second fastq file with paired reads (paired-end data only)
strandedLibrary	integer	allowed values: 0,1,2	0	*	indicates if the library is strand specific: 0 = unstranded/unknown, 1 = stranded (first read), 2 = stranded (second read) (default: 0); if the library type is unstranded/unknown, the strand is guessed from the strand of the AG-GT splice site
reference	file	file exists, fasta format		*	path to (multi-)fasta file with the reference genome (not required if a STAR index or a STAR result is provided)
inSTAR	string			*	output prefix of a STAR mapping that was created with STAR run with chimeric segment detection
outPrefix	string			1	path and file name prefix for all files produced by this module; the final file is named out/prefixcfCirc.txt
outCirc	string			*	final output of predicted CircRNAs (can be used to save the final prediction in a different place than given in outPrefix)
starPath	file		STAR	*	specify a path to the STAR executable if STAR is not part of your PATH variable
starIndex	file			*	STAR index for the reference genome, if no index is provided it is automatically created by the module using the file given by --reference
starThreads	integer	>=1	1	*	number of threads to use with STAR
cfPath	file		postProcessStarAlignment.pl	*	path to circRNA_finder perl script postProcessStarAlignment.pl

Return values

name	type	description	minV	maxV
cfCircs	string	path to file with predicted circRNAs, it corresponds to the value of the parameter outCirc if it is set, otherwise the file path is derived from outPrefix	0	0

Citation info

We predicted circular RNAs using circRNA_finder.

Pubmed references: 25544350,

ciri2

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

runs CIRI2 to detect circular RNAs in single-end or paired-end sequencing data.

Dependencies

python3
perl

Parameter

name	type	restrictions	default	occurrence	description
inReads1	file	file exists, fastq format		*	path to first fastq file with reads (for single-end or paired-end data first reads)
inReads2	file	file exists, fastq format		*	path to second fastq file with reads (for paired-end data second reads only)
inSAM	file	file exists, SAM format		*	path to SAM file that was created with BWA Mem (can be used as input instead of fastq files)
reference	file	file exists, fasta format		1	path to (multi-)fasta file with the reference genome
outPrefix	string			1	path and file name prefix for all files produced by this module, the final file is named out/prefixciriCirc.txt
outCirc	string			*	final output of predicted CircRNAs (can be used to save the final prediction in a different place than given in outPrefix)
bwaPath	file		bwa	*	specify a path to the BWA executable if bwa is not part of your PATH variable
bwaThreads	integer	>=1	1	*	number of threads to use with BWA, default:1
bwaIndex	string	valid bwa index		*	BWA index for the reference genome provided by the --reference option, if no index is provided it is automatically created by the module
bwaSeedSize	integer	>=1	19	*	BWA -k parameter for the minimum seed length
bwaScoreThreshold	integer	>=1	30	*	BWA -T parameter for the minimum alignment score; default is 30, but 19 recommended for CIRI2
ciriPath	file		CIRI2.pl	*	path to CIRI2 perl script
ciriThreads	integer	>=1	1	*	number of threads to use for CIRI2
ciriAnnotation	file	file exists, GTF format		*	GTF file with gene annotations for the genome given in the --reference option, if a GTF file is passed to this module, CIRI annotates all circRNAs with the corresponding gene
ciriStringency	string	3 allowed values: high, medium or low	high	*	Controls how stringent CIRI2 filters the circRNAs based on circular reads, cigar strings and false positive reads
ciriKeepTmpFiles	boolean		False	*	if this flag is set, CIRI2 does not delete the temporary files at the end

Return values

name	type	description	minV	maxV
ciriCircs	string	path to file with predicted circRNAs, it corresponds to the value of the parameter outCirc if it is set, otherwise the file path is derived from outPrefix	0	0

Citation info

We predicted circular RNAs using CIRI2.

Pubmed references: 28334140,

classifyPeaks

by Elena Weiß - version 1

version {@VERSION_LINKS@}

classifies peaks

Dependencies

DEPENDENCY [0-]

Parameter

name	type	occurrence	description
outdir	file	1	output directory
genelist	file	1	list of genes to consider
coveragefiles	file	1	path to coverage files
exp	string	1	type of experiment

Return values

name	type	description	minV	maxV
classifyPeaksOutputFolder	string	output directory	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

clustering

by Elena Weiß - version 1

version {@VERSION_LINKS@}

clusters coverage files and creates heatmap

Dependencies

binGenome
sharedUtils

Parameter

name	type	occurrence	description
bedgraphTable	file	1	path to bedgraph table
cluster	integer	1	number of clusters
factor	string	0-null	factor to consider
coverageFiles	file	1	path to coveragefiles
bedname	string	1	name of bed file
aggregateFUN	string	1	function to aggregate
normShapeSum	boolean	1	how to norm shape
normLibSize	boolean	1	how to norm lib size
normBinLength	boolean	1	how to norm bin length
bins	integer	1	number of bins
cpm	file	1	path to cpm file
plotname	string	*	name of plot

Return values

name	type	description
coverageFiles	string	path to coveragefiles
bedname	string	name of bed file
clusterfiles	string	path to cluster files

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

concatenateFiles

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

concatenates 2 or more files.

Dependencies

Python3

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
inFile	file	file exists		*	input files given in the order of concatenation, files with ending .gz are interpreted as compressed files and are extracted	0	0
outFile	file			1	path to save the concatenated files	0	0
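
The described behavior, including the transparent extraction of .gz inputs, could look roughly like this. The function name `concatenate` is hypothetical and the sketch is not taken from the module.

```python
import gzip
import shutil


def concatenate(in_files, out_file):
    """Concatenate files in the given order; inputs ending in .gz are decompressed."""
    with open(out_file, "wb") as out:
        for path in in_files:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rb") as src:
                # Stream the content so large files are not held in memory.
                shutil.copyfileobj(src, out)
```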

Return values

name	type	description	minV	maxV
concatenatedFile	string	path of the concatenated file, this is the same value as given by the parameter outFile	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

consistentSNPs

by Florian Röckl - version 1

version {@VERSION_LINKS@}

identifies consistent SNPs of a sample, i.e. SNPs that were called by both bcftools and Varscan in all replicates of the sample.

Dependencies

python3

Parameter

name	type	occurrence	description
bcftool_rep	string	1	The .vcf-file, created by bcftools, for all replicates of the same sample. If you have multiple replicates, comma-separate them: bcftools_rep1.vcf,bcftools_rep2.vcf etc.
varscan_rep	string	1	The .vcf-file, created by varscan, for all replicates of the same sample. If you have multiple replicates, comma-separate them: varscan_rep1.vcf,varscan_rep2.vcf etc.
output	file	1	Path to your desired output file.
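
Conceptually, consistent SNPs are the intersection of all call sets. The sketch below assumes that a SNP is identified by chromosome, position, reference and alternative allele, and that only single-base substitutions count as SNPs; both choices and the function names are assumptions, not the module's documented behavior.

```python
def read_snps(vcf_lines):
    """Collect SNPs from VCF lines as (chrom, pos, ref, alt) tuples."""
    snps = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip empty lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if len(ref) == 1 and len(alt) == 1:
            snps.add((chrom, pos, ref, alt))
    return snps


def consistent_snps(call_sets):
    """Return the SNPs present in every call set (all callers, all replicates)."""
    call_sets = [set(calls) for calls in call_sets]
    if not call_sets:
        return set()
    return set.intersection(*call_sets)
```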

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

consistentSNPs module was used to identify all SNPs from a sample that were called by both bcftools and Varscan in all replicates of the sample. These are the consistent SNPs of the sample.

Pubmed references:

contextMap

by Michael Kluge - version 1

version {@VERSION_LINKS@}

context-based RNA-seq read mapping

Dependencies

java
bwa / bowtie / bowtie2
GNU core utilities

Parameter

name	type	restrictions	default	occurrence	description
jarPath	file	absolute		*	path to ContextMap jar file; if not given internal version will be used
reads	file	absolute		*	path to reads in fasta or fastq format
alignerName	enum			1	name of short-read alignment tool; supported values: 'bwa', 'bowtie1' or 'bowtie2'
alignerBin	file	absolute		*	path to the executable of the chosen aligner tool
indexerBin	file	absolute		*	path to the executable of the aligner's indexing tool (not needed for BWA)
indices	file	absolute		1-	comma separated list of paths to basenames of indices, which can be used by the chosen aligner
genome	file	absolute		1	path to a directory with genome sequences in fasta format (each chromosome in a separate file)
output	file	absolute		1	path to the output directory
skipsplit	string			*	comma separated list of booleans, each element refers to a given aligner index (same ordering); 'true' for no split detection, 'false' otherwise (req. in mining mode).
skipmultisplit	string			*	comma separated list of booleans, each element refers to a given aligner index (same ordering); 'true' for no multisplit detection, 'false' otherwise (req. in mining mode).
speciesindex	string			*	path to a directory containing index files created with the 'indexer' tool (req. in mining mode)
alignerTmp	string			*	path to a directory for temporary alignment files
seed	integer	>0		*	seed length for the alignment (default: Bwt1: 30, BWA/Bwt2: 20)
splitseedsizes	integer	>0	15	*	seed size for the split search seed (default: 15)
mismatches	integer	>=0	4	*	allowed mismatches in the whole read
seedmismatches	integer	>=0		*	allowed mismatches in the seed region (default: Bwt1: 1, BWA/Bwt2: 0)
splitseedmismatches	integer	>=0	0	*	allowed mismatches for the split seed search (default: 0)
mmdiff	integer	>=1	0	*	maximum allowed mismatch difference between the best and second best alignment of the same read
maxhits	integer	>=1		*	maximum number of candidate alignments per read; reads with more hits are skipped (bwa/bwt1) or the already found hits are reported (bwt2) (default for bwa/bwt1:10, bwt2: 3)
minsize	integer	>=1	10	*	minimum number of reads a genomic region has to contain for being regarded as a local context
maxindelsize	integer	>=0	10	*	maximum allowed size of insertions or deletions (default: 10)
gtf	file	absolute		*	path to an annotation file in gtf format
threads	string			*	number of threads used for mapping
localTmpFolder	folder	absolute	/usr/local/storage/	*	path to a local storage that is used for temporary data
mining	boolean		false	*	enables the mining for infections or contaminations
noclipping	boolean		false	*	disables the calculation of clipped alignments
noncanonicaljunctions	boolean		false	*	enables the prediction of non-canonical splice sites
strandspecific	boolean		false	*	enables strand specific mapping
pairedend	boolean		false	*	enables mapping of paired-end reads; nomenclature for mates from the same fragment: base_name/1 and base_name/2, respectively; only valid for versions smaller than 2.7.2
polyA	boolean		false	*	enables the search for polyA-tails (mutually exclusive with --noclipping)
verbose	boolean		false	*	verbose mode
keeptmp	boolean		false	*	does not delete some temporary files
sequenceDB	boolean		false	*	sequence mapping to disk; recommended for very large data sets.
memoryScaleFactor	integer	[0,100]	75	*	scale factor in percent that defines the proportion of the memory that is used for java; default memory: 3GB * threads * (scaleFactor/100)
memoryPerThread	integer		3072	*	total memory per thread in MB if running on local host; otherwise memory limit of Watchdog executor might be set; default: 3072
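
Read together, memoryPerThread and memoryScaleFactor suggest that the memory passed to Java is the per-thread memory multiplied by the number of threads and by the scale factor in percent. The small sketch below encodes that reading; it is an interpretation of the parameter descriptions, and the function name `java_memory_mb` is hypothetical.

```python
def java_memory_mb(threads, memory_per_thread_mb=3072, scale_factor=75):
    """Estimate the Java memory in MB: memoryPerThread * threads * scaleFactor / 100."""
    if threads < 1 or not 0 <= scale_factor <= 100:
        raise ValueError("threads must be >= 1 and scale_factor within [0, 100]")
    return memory_per_thread_mb * threads * scale_factor // 100
```

With the defaults and four threads this gives 9216 MB, leaving the remaining quarter of the reserved memory for the external aligner and other processes.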

Return values

name	type	description	minV	maxV
contextMapSAMFile	string	path to mapped SAM file	0	0
contextMapPolyAFile	string	path to detected polyA tails	0	0

Citation info

RNA-seq reads were mapped against the XXX genome using ContextMap (%SOFTWARE_VERSION%) with BWA as short read aligner and default parameters.

Pubmed references: 25928589,

copyFile

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

copies a given file to a new location.

Dependencies

Python3

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
sourcePath	file	file exists		1	path of the file to copy	0	0
targetPath	file			1	path of the new location of the file, all non-existing parent directories of the file are created	0	0

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

createBEDandSAF

by Elena Weiß - version 1

version {@VERSION_LINKS@}

creates bed and saf files given a tss file

Dependencies

java

Parameter

name	type	occurrence	description
gtf	file	1	path to gtf file
tss	file	1	path to tss file
outdir	file	1	path to output dir
name	string	1	name
info	boolean	*	if info should be written
bed	boolean	*	if bed file should be written
saf	boolean	*	if saf file should be written
bedwindow	boolean	*	if bedwindow should be written for scaled metagenes
antisense	boolean	*	if experiment is antisense
filterDist	integer	*	if distance to annotated tss should be limited
noMapping	boolean	*	if mapping should be avoided
minDist	boolean	*	minimum distance
genelist	file	*	list of genes

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

createFolder

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

creates a folder and its parent directories.

Dependencies

Python3

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
folderPath	folder			1	folder that will be created	0	0

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

cutadapt

by Michael Kluge - version 1

version {@VERSION_LINKS@}

sequence adapters can be removed and sequences can be trimmed based on length or base-call quality scores

Dependencies

cutadapt
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description
fastq	file path	absolute		1	path to one FASTQ file
prim3	string			*	adapter that was ligated at the 3' end; a '$' at the end anchors the adapter at the end of the read
prim5	string			*	adapter that was ligated at the 5' end; a '^' at the start anchors the adapter at the beginning of the read
adapter	string			*	adapter that can be located at the 3' and 5' end
errorRate	double	[0, 1]	0.05	*	maximum allowed error rate
repeat	integer	[1, 100]	1	*	try to remove adapters at most N times
minOverlap	integer	>0	6	*	minimum overlap length
minLength	integer	[1, 100000]	40	*	minimum read length after trimming
maxLength	integer	[1, 100000]	-1	*	maximum read length after trimming
outfile	file path	absolute		1	path to an output file
infofile	file path	absolute		*	path to a file which will contain trimming statistics
shortenReads	integer		0	*	shorten reads to a maximal length after trimming; positive values keep the beginning of reads; negative ones the ends (starting from cutadapt version 1.17)
cutFixedLength	integer	[-1000000, 1000000]	0	*	trims a fixed length from the beginning (positive numbers) or the end of the reads (negative numbers)
qualityCutoff	double		0	*	trims reads at the ends using a sliding window approach
qualityBase	integer		33	*	base quality value
noIndels	boolean		false	*	does not allow indels between read and adapter
discardTrimmed	boolean		false	*	discard sequences which were trimmed
discardUntrimmed	boolean		false	*	discard sequences which were not trimmed
maskAdapters	boolean		false	*	does not cut the adapters but replace the corresponding regions with N

Return values

name	type	description	minV	maxV
cutadaptTrimFile	string	absolute path to the trimmed output file	0	0
cutadaptInfoFile	string	absolute path to a file containing statistical values	0	0

Citation info

Cutadapt (%SOFTWARE_VERSION%) was used to remove adapters and trim sequences [Martin, Marcel. "Cutadapt removes adapter sequences from high-throughput sequencing reads." EMBnet.journal [Online], 17.1 (2011): pp. 10-12. Web. 14 Mar. 2019].

Pubmed references:

deleteFolder

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

deletes a folder and all its content.

Dependencies

Python3

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
folder	folder	path to existing folder		1	path to the folder that will be deleted	0	0

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

downsampleSam

by Katharina Reinisch - version 1

version {@VERSION_LINKS@}

Performs downsampling of reads

Dependencies

Java
Picard

Parameter

name	type	occurrence	description
input	file	1	input file
probability	double	1	probability of keeping a read (pair)
output	string	1	output file
pathToPicard	string	1	path to picard jar-file
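
The meaning of the probability parameter can be illustrated with a small sketch. It is not Picard's implementation: the function name `downsample`, the fixed seed and the use of read names to keep mates together are assumptions made for illustration.

```python
import random


def downsample(read_names, probability, seed=1):
    """Keep each read (pair) with the given probability.

    read_names: iterable of read names; mates of a pair share the same name
    The decision is made once per name, so both mates are kept or dropped together.
    """
    rng = random.Random(seed)
    decisions = {}
    kept = []
    for name in read_names:
        if name not in decisions:
            decisions[name] = rng.random() < probability
        if decisions[name]:
            kept.append(name)
    return kept
```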

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

Downsampling of reads was performed with the DownsampleSam command line tool of the Picard library.

Pubmed references:

env

by Michael Kluge - version 1

version {@VERSION_LINKS@}

prints the currently set environment variables to the standard output stream

Dependencies

GNU env

Parameter

{@PARAMETER@}

name	type	restrictions	default	occurrence	description	minV	maxV

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

extractClippedReads

by Florian Röckl - version 1

version {@VERSION_LINKS@}

extracts clipped reads from a BAM file into a new BAM file.

Dependencies

samtools
awk

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
bam	file			1	Path to bam file, where clipped reads should be extracted from.	0	0
out	file			1	Path to output bam file, where only the extracted clipped reads are stored.	0	0
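
The module uses samtools and awk. The underlying test, whether an alignment's CIGAR string contains a soft-clip (S) or hard-clip (H) operation, can be sketched in Python on SAM text lines; the function names are hypothetical and the sketch is not the module's code.

```python
import re

# A clipping operation is a length followed by S (soft clip) or H (hard clip).
CLIP = re.compile(r"\d+[SH]")


def is_clipped(cigar):
    """Return True if the CIGAR string contains a soft or hard clip."""
    return bool(CLIP.search(cigar))


def filter_clipped(sam_lines):
    """Yield header lines and alignments whose CIGAR (6th column) is clipped."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output stays a valid SAM file
        elif is_clipped(line.split("\t")[5]):
            yield line
```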

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

extractClippedReads module was used to extract clipped reads, including soft- and hard-clippings, from the BAM file of a sample and pipe them into a new BAM file.

Pubmed references:

fastQC

by Michael Kluge - version 1

version {@VERSION_LINKS@}

generates quality reports for sequencing data using fastQC

Dependencies

fastQC (tested with 0.11.3)
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description
contaminants	file path	absolute		*	absolute path to a file containing non-default contaminants to screen for overrepresented sequences; format: name[TAB]sequence
adapters	file path	absolute		*	absolute path to a file containing non-default adapters to screen against the library; format: name[TAB]sequence
threads	integer	[1,128]	1	*	number of threads to use; each will consume about 256 megabyte of memory
fastq	file path	absolute		1	absolute path to fastq file which should be analyzed
limits	file path	absolute		*	absolute path to a file containing non-default limits for warnings/errors; must be in the same format as the limits.txt shipped with fastQC
outdir	folder path	absolute		1	absolute path to output folder

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

Quality of the sequencing data was checked using FastQC (%SOFTWARE_VERSION%) [Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc].

Pubmed references:

Links

https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

sequencing

2019-10-16

fastqDump

Downloads fastq files from the NCBI Sequence Read Archive (SRA) using the SRA toolkit. First performs prefetch and then fastq-dump. Can optionally use Aspera client ascp for much faster download (Aspera client should be installed).

Caroline Friedel

fastqDump

by Caroline Friedel - version 1

version {@VERSION_LINKS@}

Dependencies

SRA toolkit

Parameter

name	type	default	occurrence	description
sraId	string		1	SRA id
outputFolder	file		1	folder to which fastq files should be extracted
pathToAspera	file		*	[optional] path to the Aspera client; if set, Aspera is used to speed up the download
checkPresent	boolean	false	*	[optional] check whether the files are already present in the output folder and a previous download was successful. Tests that the output fastq files exist, the log file from a previous download is present, the fastq files were created not later than the log file and the log file shows a successful download.
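The four conditions tested by checkPresent can be sketched in Python as follows. This is only an illustration of the described logic: the function name, the use of modification times as a stand-in for creation times, and the `success_marker` string are assumptions, since the actual log format is not documented here.

```python
import os


def download_looks_complete(fastq_paths, log_path, success_marker="SUCCESS"):
    """Return True if a previous download appears to be complete."""
    # 1) the log file and 2) all fastq files must exist
    if not fastq_paths or not os.path.isfile(log_path):
        return False
    if not all(os.path.isfile(path) for path in fastq_paths):
        return False
    # 3) no fastq file may be newer than the log file
    log_mtime = os.path.getmtime(log_path)
    if any(os.path.getmtime(path) > log_mtime for path in fastq_paths):
        return False
    # 4) the log must report success (marker string is hypothetical)
    with open(log_path) as handle:
        return any(success_marker in line for line in handle)
```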

Return values

name	type	description
isPairedEnd	boolean	Indicates whether paired-end (two fastq files) or single-end (one fastq file) sequencing data was downloaded
readFile1	string	path to first fastq file
readFile2	string	path to second fastq file (identical to first fastq file for single-end sequencing data)

Citation info

Sequencing data was downloaded from the NCBI Sequence Read Archive (SRA) using the SRA toolkit (version (%SOFTWARE_VERSION%)) [Leinonen R, Sugawara H, Shumway M; International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 2011 Jan;39(Database issue):D19-21.]

Pubmed references: 21062823,

featureCounts

by Michael Kluge - version 2

version 1 2

reads or fragments per gene, exon or any other feature are counted using featureCounts

Dependencies

featureCounts (v. 1.4.6)
featureCounts (v. 1.6.1)
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
annotation	file path	absolute		1	feature annotation in GTF or SAF format	0	0
input	file path	absolute		1	indexed BAM file which should be used for counting	0	0
output	file path	absolute		1	path to output file	0	0
annotationType	enum	SAF|GTF		*	disables automatic type detection based on the file ending of the input file; valid values: GTF or SAF	0	0
featureType	string		exon	*	feature type (e.g. exon or intron) which is used for counting in GTF mode	0	0
groupType	string		gene_id	*	attribute which is used for summarization in GTF mode	0	0
stranded	integer		0	*	indicates strand-specific read counting; possible values: 0 (unstranded), 1 (stranded) and 2 (reversely stranded)	0	0
threads	integer		1	*	number of threads used for counting	0	0
disableGroupSummarization	boolean		false	*	flag that can be used to turn summarization on groupType off	0	0
multiMapping	boolean		false	*	flag that enables counting of multi mapped reads	0	0
primary	boolean		true	*	when enabled only alignments which are flagged as primary alignments are counted	0	0
countFragments	boolean		false	*	counts fragments instead of reads; only for paired end data	0	0
multiCountMetaFeatures	boolean		false	*	allows a read to be counted for more than one meta-feature	0	0
detailedReadAssignments	boolean		false	*	saves for each read if it was assigned or not; filename: {input_file_name}.featureCounts; format: read name<TAB>status<TAB>feature name<TAB>number of counts for that read	0	0
minOverlap	integer		1	*	minimum number of overlapping bases required to assign a read to a feature; also negative values are allowed	2	2
minReadOverlap	integer		1	*	minimum number of overlapping bases required to assign a read to a feature; also negative values are allowed	1	1
minFracOverlap	double		0	*	minimum fraction of overlapping bases in a read that is required to assign the read to a feature	2	2
readExtension5	integer		0	*	extend reads at the 5' end	2	2
readExtension3	integer		0	*	extend reads at the 3' end	2	2
fraction	boolean		false	*	count fractional; only in combination with the --assignToAllOverlappingFeatures or/and --multiMapping flag(s)	2	2
largestOverlap	boolean		false	*	assign reads to the meta-feature/feature that has the largest number of overlapping bases.	2	2
longReads	boolean		false	*	mode for long read counting (e.g. Nanopore or PacBio)	2	2
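When detailedReadAssignments is enabled, each line of the resulting file has the form read name<TAB>status<TAB>feature name<TAB>number of counts for that read. A small Python sketch for tallying the status column is given here; the function name is hypothetical and the status strings in the usage note are only examples.

```python
from collections import Counter


def summarize_assignments(lines):
    """Count how often each assignment status occurs."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # column 0: read name, column 1: status,
        # column 2: feature name, column 3: number of counts
        counts[fields[1]] += 1
    return counts
```

Called on an open file handle of a `{input_file_name}.featureCounts` file, it returns a mapping such as status to number of reads.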

Return values

name	type	description	minV	maxV
FeatureCountSummaryFile	string	absolute file path to the summary file	0	0
FeatureCountCountFile	string	absolute file path to the count file	0	0

Citation info

FeatureCounts (%SOFTWARE_VERSION%) was applied to count read/fragment counts per gene/exon/other feature according to %annotation§N% annotation [Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30, 2014].

Pubmed references: 24227677,

filterBwaSampe

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

removes read pairs from sam/bam files created by bwa sampe

Dependencies

python3
pysam package

Parameter

name	type	restrictions	default	occurrence	description
inSamBam	file	file exists, ending .sam or .bam		1	path to mapped paired reads in sam or bam format (recognized by file ending) created by bwa sampe
outSamBam	string			1	path to write remaining paired reads in sam or bam format (recognized by file ending)
removeUnmapped	boolean		True	*	use this flag to remove pairs with at least one unmapped read
removeImproperPairs	boolean		True	*	use this flag to remove pairs that are not properly paired according to bwa sampe
removeMapqBelow	integer	>=0	20	*	remove all read pairs with at least one mate of mapping quality smaller than minQuality (taken from field "MAPQ" in SAM file), setting the option to 0 deactivates filtering based on mapping quality
removeMoreThanOptimalHits	integer	>=0	1	*	remove all read pairs with more than maxHits optimal alignment positions for at least one mate (based on the bwa aln specific tag "X0"), setting the option to 0 deactivates filtering based on hit number
isSingleEnd	boolean		False	*	use this flag to indicate that single end data should be filtered
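How the filter options combine for one read pair can be sketched in Python as follows. This is an illustration of the documented behaviour, not the module's code: the `Mate` class and `keep_pair` function are hypothetical, and a real implementation would read these values from the SAM/BAM records (for example with pysam).

```python
from dataclasses import dataclass


@dataclass
class Mate:
    is_unmapped: bool
    is_proper_pair: bool
    mapq: int          # SAM field MAPQ
    optimal_hits: int  # value of the bwa aln tag X0


def keep_pair(mate1, mate2, remove_unmapped=True, remove_improper=True,
              min_mapq=20, max_hits=1):
    """Return True if the pair passes all active filters."""
    mates = (mate1, mate2)
    if remove_unmapped and any(m.is_unmapped for m in mates):
        return False
    if remove_improper and not all(m.is_proper_pair for m in mates):
        return False
    # A threshold of 0 deactivates the corresponding filter.
    if min_mapq > 0 and any(m.mapq < min_mapq for m in mates):
        return False
    if max_hits > 0 and any(m.optimal_hits > max_hits for m in mates):
        return False
    return True
```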

Return values

name	type	description	minV	maxV
filteredPairs	file	path of the sam/bam file with the remaining read pairs (same value as given in parameter outSamBam)	0	0

Citation info

We removed read pairs with unmapped reads / improper pair classification / low mapping quality / multi-mappings (adjust to options used).

Pubmed references:

fseq

by Katharina Reinisch - version 1

version {@VERSION_LINKS@}

Identifies open chromatin regions from BAM files using F-Seq. For this purpose, BAM files are first converted to BED input format for F-Seq using bedtools.

Dependencies

F-Seq
bedtools

Parameter

name	type	default	occurrence	description
bam	file		1	bam file
name	string		1	sample name used in output files
dir	file		1	output directory
pathToFseq	string		1	path to Fseq jar
mergeDist	integer	0	*	[optional] distance for merging
heapSize	integer	-Xmx32000M	*	[optional] adjust JAVA OPTS heap size

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

Open chromatin regions were determined using F-Seq [Boyle AP, Guinney J, Crawford GE, Furey TS. F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008 Nov 1;24(21):2537-8].

Pubmed references: 18784119,

generateCoverageFiles

by Elena Weiß - version 1

version {@VERSION_LINKS@}

generates coverage files

Dependencies

binGenome

Parameter

name	type	occurrence	description
outputDir	file	1	path to output folder
bedgraphTable	file	1	path to table with bedgraph paths
bedfile	file	1	path to bed file
bins	integer	1	number of bins to divide region
fixedBinSizeUpstream	string	*	[optional] can be used to create fixed bins upstream; format: 'binsize:binnumber'
fixedBinSizeDownstream	string	*	[optional] can be used to create fixed bins downstream; format: 'binsize:binnumber'
factor	string	0-null	[optional] factor to generate files for only that factor
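The fixedBinSizeUpstream and fixedBinSizeDownstream values use the format 'binsize:binnumber'. A minimal Python sketch for validating such a value is shown here; the function name is hypothetical, and the requirement that both numbers are positive is an assumption.

```python
def parse_fixed_bins(spec):
    """Split a 'binsize:binnumber' string into two positive integers."""
    parts = spec.split(":")
    if len(parts) != 2:
        raise ValueError("expected the format 'binsize:binnumber'")
    bin_size, bin_number = int(parts[0]), int(parts[1])
    if bin_size <= 0 or bin_number <= 0:
        raise ValueError("bin size and bin number must be positive")
    return bin_size, bin_number
```

For example, `parse_fixed_bins("100:5")` describes five fixed bins of 100 bases each.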

Return values

name	type	description	minV	maxV
coverageFiles	string	path to coverage files	0	0
bedname	string	name of bed file	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

generateMetagenePlots

by Elena Weiß - version 1

version {@VERSION_LINKS@}

generates metagene plots

Dependencies

binGenome
sharedUtils

Parameter

name	type	occurrence	description
bedgraphTable	file	1	path to bedgraph table
genelist	string	*	list of genes to consider
experiment	string	*	type of experiment
metaFrame	integer	1	frame to plot
bins	integer	1	number of bins
aggregateFUN	string	1	function to aggregate
normShapeSum	boolean	1	how to norm shape
normLibSize	boolean	1	how to norm lib size
normBinLength	boolean	1	how to norm bin length
wilcox	boolean	*	should wilcox test be done
factor	string	0-null	which factor to consider
coverageFiles	file	1	path to coverage files
bedname	string	1	name of bed file
plotname	string	*	name of plot
config	file	1	path to config file
clusterPositions	file	*	positions to draw line

Return values

name	type	description	minV	maxV
generateMetagenePlotsOutputFolder	string	path where metagene plot is	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

genomeCoverage

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

converts bam format to bedgraph and tdf format

Dependencies

python3
bedtools
igvtools

Parameter

name	type	restrictions	default	occurrence	description
bam	file	file exists, BAM format, reads sorted by coordinates		1	path to bam file whose genome coverage should be analyzed
genome	file	file exists, ending .genome for an IGV genome file or ending .chrom.sizes for a simple text file with genome sizes		*	genome file or file with chromosome sizes for the genome that was used to create the bam file, the file is required only if the tdf option is set
outPrefix	string			1	file name prefix for saving the bedgraph file (outPrefix.bedgraph) and the tdf file (outPrefix.bedgraph.tdf)
tdf	boolean		true	*	transform bedgraph file into tdf format using igvtools
bedtoolsPath	file	existing executable	bedtools (in PATH)	*	path to bedtools executable, use if bedtools is not in PATH
igvtoolsPath	file	existing executable	igvtools (in PATH)	*	path to igvtools executable, use if igvtools is not in PATH

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

We created files for visualizing mapped reads with bedtools and igvtools.

Pubmed references: 20110278, 21221095,

getSplittedTables

by Elena Weiß - version 1

version {@VERSION_LINKS@}

splits the bedgraph table so that its entries can be processed in parallel

Dependencies

{@DEPENDENCIES@}

Parameter

name	type	occurrence	description
outputDir	file	1	path to output folder
table	file	1	line entry of bedgraph table
for	string	1	gives type coverage or metagenes to split table into
factor	string	0-null	[optional] factor to generate files for only that factor

Return values

name	type	description	minV	maxV
list	string	dir where tables are written	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

grep

by Michael Kluge - version 1

version {@VERSION_LINKS@}

extracts information from text files using the exact or regex-based search of grep

Dependencies

GNU grep
GNU Core Utilities

Parameter

name	type	restrictions	occurrence	description
outputFile	file path	absolute	1	absolute path to file in which the output of grep is written
file	file path	absolute	1	absolute path to file to use as search input
options	string		*	additional flags or parameters that are directly delivered to grep
pattern	string		1	pattern to search for; can also be a regex if parameter -P is set

Return values

name	type	description	minV	maxV
grepResultFile	string	path to the output file	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

gseaPreranked

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

performs gene set enrichment analysis with GSEAPreranked

Dependencies

python3
java8
GSEA

Parameter

name	type	restrictions	default	occurrence	description
gseaJar	file			1	Path of the GSEA jar file
label	string			1	name of the analysis, e.g. sample name
outdir	string			1	directory to store the results of GSEA
geneTab	file	file exists		1	tab-separated table of genes with expression values/changes
hasHeader	boolean		False	*	indicates if the first line of the geneTab should be interpreted as header
geneCol	integer	>=0	0	*	0-based position of the column with gene names
rankCol	integer	>=0	1	*	0-based position of the column with values to rank the genes, e.g. fold changes
geneset	string	allowed values: go, hallmark, transcription_factor, oncogenic_signatures, immunologic_signatures	hallmark	*	gene sets to test for enrichment
genesetVersion	string		6.1	*	version of MSigDB to use
scoring	string	allowed values: weighted, unweighted	unweighted	*	unweighted: classic score based on ranks, weighted: score includes values used for ranking
plotNr	integer	>0	50	*	create plots for "plot_nr" top scoring genes
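The interplay of hasHeader, geneCol and rankCol can be illustrated with a short Python sketch that reads a tab-separated gene table into a ranked list. The function name is hypothetical, and sorting by descending value is an assumption made for the illustration.

```python
import csv


def read_ranked_genes(handle, has_header=False, gene_col=0, rank_col=1):
    """Read (gene, value) pairs using 0-based column positions."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header line
    ranked = [(row[gene_col], float(row[rank_col])) for row in reader if row]
    # Assumption: highest values are ranked first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```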

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

We performed gene set enrichment analysis with GSEAPreranked.

Pubmed references: 16199517,

gtf2info

by Michael Kluge - version 1

version {@VERSION_LINKS@}

extracts information on genes and exons from GTF files and stores it in CSV format

Dependencies

perl

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
gtf	file	absolute		1	path to the GTF file	0	0
output	file	absolute		1	path to the output file; for exons suffix '.exons' is added	0	0

Return values

name	type	description	minV	maxV
geneInfoFile	string	absolute path to the resulting CSV file	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

gtfMatcher

by Florian Röckl - version 1

version {@VERSION_LINKS@}

matches detected variants (SNPs and indels) to genomic features of a GTF file.

Dependencies

python3

Parameter

name	type	occurrence	description
gtf	file	1	Path to GTF file containing annotated genomic features.
infile	file	1	Path to file containing variants.
out	file	1	Path to output file, where results of variants matched on features are stored.
mode	string	1	Select variant mode/type, which should get matched on GTF file. Modes are written in capital letters: SNP, INSERTION or DELETION.
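Matching a variant position to GTF features amounts to an interval lookup. The Python sketch below shows this for the SNP case under the standard GTF conventions (nine tab-separated columns, 1-based inclusive coordinates); the function names are hypothetical, and the module's own input and output formats are not documented here.

```python
def parse_gtf_line(line):
    """Extract the fields needed for matching from one GTF line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "feature": fields[2],
        "start": int(fields[3]),  # 1-based, inclusive
        "end": int(fields[4]),    # 1-based, inclusive
        "strand": fields[6],
        "attributes": fields[8],
    }


def features_at(features, chrom, position):
    """Return all features that contain the given 1-based position."""
    return [f for f in features
            if f["chrom"] == chrom and f["start"] <= position <= f["end"]]
```

A linear scan like this is sufficient for small annotation files; larger ones would typically use a sorted or indexed structure instead.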

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

gtfMatcher was used to match detected variants, including SNPs, deletions and insertions, to genomic features of a GTF file.

Pubmed references:

gzip

by Michael Kluge - version 1

version {@VERSION_LINKS@}

compresses and decompresses files using gzip; is able to verify file integrity using a md5 checksum file

Dependencies

GNU gzip or pigz
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description
output	absolute file path		${input}.gz	*	path to output file
input	file path	absolute		1	path to input file
decompress	boolean		false	*	decompress the input file instead of compressing it
verify	boolean		true	*	verify file integrity after decompression using the md5 checksum file
oldPathMd5	file path	absolute		*	path where the file was stored when the md5 checksum was created
limitLines	integer	[1,]		*	extract only the first N lines
delete	boolean		false	*	delete the file after compression was performed; enforces integrity check
md5	file path	absolute		*	path to md5 checksum file to verify file integrity after decompression
quality	integer	[1,9]	9	*	compression quality ranging from 1 to 9; 9 being the slowest but best compression
binaryName	enum		gzip	*	name of the gzip binary; possible values: 'gzip' or 'pigz'
threads	integer	[1,128]	1	*	number of cores to use; only possible if 'pigz' is used as binary
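The integrity check behind the verify and md5 parameters can be sketched in Python as follows. The function names are hypothetical, and the layout of the checksum file (hex digest as the first whitespace-separated token, as written by md5sum) is an assumption.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(path, md5_path):
    """Compare a file against the digest stored in a checksum file."""
    with open(md5_path) as handle:
        # Assumed layout: '<hex digest>  <file name>' on the first line.
        expected = handle.readline().split()[0]
    return md5_of_file(path) == expected.lower()
```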

Return values

name	type	description	minV	maxV
processedGzipFile	string	path to the input file	0	0
createdGzipFile	string	path to the output file	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

identifyStrain

by Florian Röckl - version 1

version {@VERSION_LINKS@}

identifies the virus strain of a sample.

Dependencies

python3

Parameter

name	type	occurrence	description
input	file	1	Path to file containing the consistent SNPs
reference	file	1	Path to file containing reference SNPs
output	file	1	Path to output file containing strain prediction
config	file	1	Path to config file containing an affiliation of reference samples and virus strain.

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

identifyStrain module was used to predict the virus strain of a sample using its consistent SNPs.

Pubmed references:

indexBam

by Michael Kluge - version 1

version {@VERSION_LINKS@}

creates an index for a BAM file using samtools index

Dependencies

samtools
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
bam	file path	absolute		1	path to the BAM file	0	0
link	boolean		true	*	creates a link called NAME.bam.bai because some tools expect the index under that name; use --nolink to disable it	0	0

Return values

name	type	description	minV	maxV
BAMFile	string	path to the BAM file for which the index was created	0	0

Citation info

Samtools (%SOFTWARE_VERSION%) was used to index the BAM files [Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup, The Sequence alignment/map (SAM) format and SAMtools, Bioinformatics (2009) 25(16) 2078-9].

Pubmed references: 19505943,

Links

https://www.htslib.org/doc/samtools.html

Sequencing

2022-03-23

insertSizeMetrics

This tool provides useful metrics for validating library construction including the insert size distribution and read orientation of paired-end libraries. The expected proportions of these metrics vary depending on the type of library preparation used, resulting from technical differences between paired-end libraries and mate-pair libraries. For a brief primer on paired-end sequencing and mate-pair reads, see the GATK Dictionary. The CollectInsertSizeMetrics tool outputs the percentages of read pairs in each of the three orientations (FR, RF, and TANDEM) as a histogram. In addition, the insert size distribution is output as both a histogram (.insert_size_Histogram.pdf) and as a data table (.insert_size_metrics.txt). Note: Metrics labeled as percentages are actually expressed as fractions!

Caroline Friedel

insertSizeMetrics

by Caroline Friedel - version 1

version {@VERSION_LINKS@}

Dependencies

picard
java

Parameter

name	type	default	occurrence	description
Histogram_FILE	file		1	File to write insert size Histogram chart to. Required.
INPUT	file		1	Input SAM/BAM/CRAM file. Required.
OUTPUT	file		1	The file to write the output to. Required.
arguments_file	file		0-null	[optional] read one or more arguments files and add them to the command line. This argument may be specified 0 or more times. Default value: null.
COMPRESSION_LEVEL	integer		*	[optional] Compression level for all compressed files created (e.g. BAM and VCF). Default value: 5.
DEVIATIONS	double		*	[optional] Generate mean, sd and plots by trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and sd grossly misleading regarding the real distribution. Default value: 10.0.
GA4GH_CLIENT_SECRETS	string		*	[optional] Google Genomics API client_secrets.json file path. Default value: client_secrets.json.
HISTOGRAM_WIDTHW	integer		*	null
MAX_RECORDS_IN_RAM	integer		*	[optional] When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed. Default value: 500000.
METRIC_ACCUMULATION_LEVEL	string		*	[optional] The level(s) at which to accumulate metrics. This argument may be specified 0 or more times. Default value: [ALL_READS]. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP}
MIN_HISTOGRAM_WIDTH	integer		*	[optional] Minimum width of histogram plots. In the case when the histogram would otherwise be truncated to a shorter range of sizes, the MIN_HISTOGRAM_WIDTH will enforce a minimum range. Default value: null.
MINIMUM_PCT	double		*	[optional] When generating the Histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this percentage of overall reads. (Range: 0 to 1). Default value: 0.05.
REFERENCE_SEQUENCE	file		*	[optional] Reference sequence file. Default value: null.
STOP_AFTER	integer		*	[optional] Stop after processing N reads, mainly for debugging. Default value: 0.
TMP_DIR	file		0-null	[optional] One or more directories with space available to be used by this program for temporary storage of working files. This argument may be specified 0 or more times. Default value: null.
VALIDATION_STRINGENCY	string	STRICT	*	[optional] Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. Possible values: {STRICT, LENIENT, SILENT}
VERBOSITY	string		*	[optional] Control verbosity of logging. Default value: INFO. Possible values: {ERROR, WARNING, INFO, DEBUG}
ASSUME_SORTED	boolean	true	*	[optional] If true (default), then the sort order in the header file will be ignored. Default value: true. Possible values: {true, false}
CREATE_INDEX	boolean	false	*	[optional] Whether to create an index when writing VCF or coordinate sorted BAM output. Default value: false. Possible values: {true, false}
CREATE_MD5_FILE	boolean	false	*	[optional] Whether to create an MD5 digest for any BAM or FASTQ files created. Default value: false. Possible values: {true, false}
INCLUDE_DUPLICATES	boolean	false	*	[optional] If true, also include reads marked as duplicates in the insert size histogram. Default value: false. Possible values: {true, false}
QUIET	boolean	false	*	[optional] Whether to suppress job-summary info on System.err. Default value: false. Possible values: {true, false}
USE_JDK_DEFLATER	boolean	false	*	[optional] Use the JDK Deflater instead of the Intel Deflater for writing compressed output. Default value: false. Possible values: {true, false}
USE_JDK_INFLATER	boolean	false	*	[optional] Use the JDK Inflater instead of the Intel Inflater for reading compressed input. Default value: false. Possible values: {true, false}
version	boolean	false	*	[optional] display the version number for this tool
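The trimming rule described for DEVIATIONS (keep values up to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION) can be worked through with a short Python sketch. It only illustrates the formula; the function name is hypothetical and this is not how picard itself is implemented.

```python
import statistics


def trim_insert_sizes(sizes, deviations=10.0):
    """Drop insert sizes above median + deviations * MAD."""
    median = statistics.median(sizes)
    # median absolute deviation from the median
    mad = statistics.median([abs(size - median) for size in sizes])
    cutoff = median + deviations * mad
    return [size for size in sizes if size <= cutoff]
```

With the sizes 100, 110, 120, 130 and 5000, the median is 120 and the median absolute deviation is 10, so the cutoff is 220 and the chimeric-looking value 5000 is excluded before mean and standard deviation are computed.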

Return values

name	type	description	minV	maxV
outputHistogramFile	string	output file containing the histogram of insert sizes	0	0
outputBamFile	string	txt file containing insert size metrics	0	0

Citation info

Insert size metrics were calculated with the picard library (%SOFTWARE_VERSION%).

Pubmed references:

joinFiles

by Michael Kluge - version 1

version {@VERSION_LINKS@}

joins two or more files together

Dependencies

GNU cat
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description
input	string			1-	multiple input files (or input folders) in the order in which they should be joined; in pattern mode (--pattern) folder path(s) are expected
output	file path	absolute		1	path to output file
convertPairedEnd	boolean		false	*	special flag for joining of FASTQ files; adds /1 and /2 at the end of read names if casava format 1.8 or greater is used; default: disabled
pattern	string			0-null	one or more Unix file patterns (e.g. *.txt) that are used to find files matching that pattern; one pattern corresponds to one input folder path; the order of the files to join cannot be influenced
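The effect of convertPairedEnd on a read name can be sketched in Python. In CASAVA 1.8 or later, the mate number is the first field of the comment that follows the read name, separated by a space. Whether the module keeps or drops the rest of the comment is not documented here; this hypothetical helper drops it.

```python
def add_mate_suffix(header):
    """Append /1 or /2 to a CASAVA >= 1.8 style FASTQ header line."""
    name, _, comment = header.partition(" ")
    # The comment looks like '1:N:0:ATCACG'; its first field is the mate.
    mate = comment.split(":", 1)[0]
    if mate in ("1", "2") and not name.endswith("/" + mate):
        return f"{name}/{mate}"
    # Headers in other formats are returned unchanged.
    return header
```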

Return values

name	type	description	minV	maxV
joinedFile	string	absolute file path to the joined file	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

leon

by Michael Kluge - version 1

version {@VERSION_LINKS@}

LEON is a reference-free method to compress high throughput sequencing data

Dependencies

LEON (tested with 1.0.0)

Parameter

name	type	restrictions	default	occurrence	description
input	file path	absolute		1	absolute path to input file; supported file formats: compress: .fastq or .fq; decompress: *.leon.tar
threads	integer		1	*	number of cores to use
kmerSize	integer		31	*	k-mer size that is used for compression
outputFolder	folder path	absolute		1	path to folder in which the compressed file is stored; resulting file will have .leon.tar or .fastq ending
workingDir	folder path	absolute	/usr/local/storage/	*	path to working directory

Return values

name	type	description	minV	maxV
createdFile	string	path to the compressed or decompressed file	0	0

Citation info

Sequencing data was (de-)compressed using LEON (%SOFTWARE_VERSION%) [G. Benoit, C. Lemaitre, D. Lavenier, E. Drezen, T. Dayris, R. Uricaru, G. Rizk. (2015) Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinformatics, 2015, 16:288.].

Pubmed references: 26370285,

listFiles

by Michael Kluge - version 1

version {@VERSION_LINKS@}

lists files in directories based on pattern

Dependencies

GNU Core Utilities
GNU findutils

Parameter

name	type	restrictions	default	occurrence	description
folder	folder	absolute		1-	one or more input folders; one for each pattern
output	file path	absolute		*	write results to a file; one line per found file
sep	string		,	*	separator between entries
maxdepth	integer		0	*	descend at most n levels of folders
pattern	string			1-	one or more Unix file patterns (e.g. *.txt) that are used to find files matching that pattern; one pattern corresponds to one input folder path
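The pairing of folders with patterns and the joining of the results with sep can be sketched in Python. This hypothetical helper only looks at files directly inside each folder and does not model the maxdepth parameter.

```python
import fnmatch
import os


def list_matching(folders, patterns, sep=","):
    """Join the files matching each folder's pattern with a separator."""
    if len(folders) != len(patterns):
        raise ValueError("one pattern is expected per folder")
    found = []
    for folder, pattern in zip(folders, patterns):
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isfile(path) and fnmatch.fnmatch(name, pattern):
                found.append(path)
    return sep.join(found)
```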

Return values

name	type	description	minV	maxV
foundFiles	string	found files joined with the separator	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

mappingSummary

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

summarizes read counts remaining after different analysis steps of sequencing data

Dependencies

python3
matplotlib
seaborn

Parameter

name	type	restrictions	occurrence	description
basicStatsSummary	file	file exists, output of mergeStatistics module	*	Output of the Watchdog Module mergeStatistics applied on the Basic Statistics reported by FASTQC (tab-separated table, column 0: type of count, column 1: read count, column 2: file name)
rawRegex	string	valid regular expression in python re	*	regular expression with one group expression to extract the sample name from the name of a fastq file with untrimmed reads
trimRegex	string	valid regular expression in python re	*	regular expression with one group expression to extract the sample name from the name of a fastq file with trimmed reads
idxstatsSummary	file	file exists, output of mergeStatistics module	*	Output of the Watchdog Module mergeStatistics applied on the Idxstatistics reported by the bamstats module (tab-separated table, column 0: chromosome, column 2: read count, column 4: file name)
bamRegex	string	valid regular expression in python re	*	regular expression with one group expression to extract the sample name from the name of a bam file with mapped reads
chromosomeGroupingTable	file		*	tab-separated table with a header with chromosome names in column 0 and groups in column 1
countTable	string		1	path for writing a table with all extracted read counts
countPlot	string		*	path for saving a summary plot of total, trimmed and mapped reads, format is identified by file ending, all formats supported by pyplot are allowed
groupPlot	string		*	path for saving a summary plot of the fraction of mapped reads for given groups of chromosomes, format is identified by file ending, all formats supported by pyplot are allowed
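The rawRegex, trimRegex and bamRegex parameters each expect a regular expression with exactly one group that captures the sample name. A minimal Python sketch of that extraction is shown here; the function name, file name and pattern are made up for illustration.

```python
import re


def sample_name(file_name, pattern):
    """Return the first group of the pattern, or None if it does not match."""
    match = re.search(pattern, file_name)
    if match is None:
        return None
    return match.group(1)


# Hypothetical naming scheme: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
example = sample_name("sampleA_R1.fastq.gz", r"^(.+)_R[12]\.fastq")
```

Files whose names map to the same captured group are treated as belonging to the same sample, so the three patterns should yield identical names for the raw, trimmed and mapped files of one sample.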

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

We created figures summarizing the number of reads in our sequencing experiments before and after adapter removal and mapping.

Pubmed references:

mergeBam

by Caroline Friedel - version 1

version {@VERSION_LINKS@}

merges 2 or more bam files using samtools

Dependencies

samtools

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
infile	file			2-	input bam file(s)	0	0
outfile	file			1	output bam file	0	0

Return values

name	type	description	minV	maxV
mergedBamFile	string	output bam file (= value for parameter outfile)	0	0

Citation info

BAM files were merged using samtools (Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9).

Pubmed references: 19505943,

mergeFeatureCounts

by Michael Kluge - version 1

version {@VERSION_LINKS@}

combines the output of multiple featureCounts runs in one CSV file

Dependencies

GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description
searchFolder	file	absolute		1	path to the folder in which *.counts files are located
output	file	absolute		1	path to the output file
statsFolder	file	absolute		*	path to merged statistic folder required for plotting
featureAnnotation	file	absolute		*	annotation file which is joined with the count file
featureAnnotationID	string		Geneid	*	name of the column which is used for joining
featureAnnotationType	string		type	*	name of the column in the annotation file for which a distribution plot is created
featureAnnotationExonLength	string		exon_length	*	name of the column that contains the exon length of the features
noPlotting	boolean		false	*	disables the execution of R scripts
prefixNames	boolean		false	*	prefixes the names of the features with continuous numbers

Return values

name	type	description	minV	maxV
mergedCountFile	string	absolute path to the merged count file in CSV format	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

mergeStatistics

by Michael Kluge - version 1

version {@VERSION_LINKS@}

takes a folder containing BAM statistics generated by the bamstats module and generates table-formatted files

Dependencies

java11

Parameter

name	type	restrictions	occurrence	description
type	string		1	type of the statistic merger that should be called; allowed values: FastQC, Star, BamstatsMerger, CutadaptMerger, FeatureCounts, FlagstatMerger
inputDir	folder path	absolute	1	path to input folder
outputDir	folder path	absolute	1	path to output folder

Return values

name	type	description	minV	maxV
mergedFile	string	absolute path to the merged file	0	0
mergedType	string	type of the merger (parameter: type)	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

normalizeCPM

by Elena Weiß - version 1

version {@VERSION_LINKS@}

normalizes CPM

Dependencies

binGenome
sharedUtils

Parameter

name	type	occurrence	description
sums	file	1	files to sum
counts	file	1	file of counts
outputFile	file	1	path to output file

Return values

name	type	description	minV	maxV
normedCounts	string	file of normed counts	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

pausingIndex

by Elena Weiß - version 1

version {@VERSION_LINKS@}

computes pausing index for given window frames

Dependencies

createBEDandSAF
featureCounts

Parameter

name	type	occurrence	description
outputDir	file	1	path to output folder
gtf	file	1	path to gtf file
bam	file	1	path to bam file
promStart	integer	*	start position of promoter window
promEnd	integer	*	end position of promoter window
bodyStart	integer	*	start position of body window
bodyLength	integer	*	end position of body window
genelist	string	1	list of genes to consider
tss	file	1	path to tss file

Return values

name	type	description	minV	maxV
pausingindices	string	dir where pausing indices are computed	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

phantomPeak

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

analyzes strand cross-correlation in mapped reads from ChIP-seq experiments

Dependencies

python3
R >=3.1
spp (phantompeakqualtools)

Parameter

name	type	restrictions	default	occurrence	description
inBam	string	valid file path, bam format, ending *.bam		1	Path to the bam file with mapped ChIP-seq reads. An index of the bam file is not required.
outPrefix	string			1	Common prefix of all output files. The module produces 3 files: outPrefix.txt (summary file), outPrefix.pdf (cross-correlation plot) and outPrefix.Rdata (R session of the analysis).
sppPath	string	valid file path to the script run_spp.R		1	Path to executable (R script) of phantompeakqualtools which is usually called run_spp.R
rscriptPath	string	valid file path, executable	Rscript in PATH variable	*	Path to executable Rscript if not given in PATH variable
tmpdir	string	path to existing folder	return value of the tempdir() function of R	*	Folder for writing temporary files. The tool copies the whole bam file to this location. All temporary files are extended with a random suffix.
threads	integer	>=1	1	*	Number of threads used for the calculations
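
To illustrate what strand cross-correlation measures, the sketch below correlates per-position read-start counts on the plus strand with the minus-strand counts shifted by a range of offsets. It is a toy example on plain lists, not the implementation used by run_spp.R, and all names are hypothetical.

```python
def strand_cross_correlation(plus, minus, max_shift):
    """Pearson correlation of plus[i] with minus[i + shift] for each shift.

    plus and minus are equal-length lists of read-start counts per position.
    Returns a dict mapping shift -> correlation.
    """
    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x)
        vy = sum((b - my) ** 2 for b in y)
        return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

    result = {}
    for shift in range(max_shift + 1):
        x = plus[: len(plus) - shift]   # drop positions without a shifted partner
        y = minus[shift:]
        result[shift] = pearson(x, y)
    return result


# minus-strand signal is the plus-strand signal moved 3 positions downstream
plus = [0, 5, 0, 0, 0, 0, 3, 0, 0, 0]
minus = [0, 0, 0, 0, 5, 0, 0, 0, 0, 3]
cc = strand_cross_correlation(plus, minus, 5)
best_shift = max(cc, key=cc.get)
print(best_shift)
```

In real ChIP-seq data the shift with the highest correlation is expected to reflect the fragment length, while a second peak at the read length corresponds to the "phantom" peak.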

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

Phantompeakqualtools were used to perform quality control of the mapped ChIP-seq reads.

Pubmed references: 22955991,

preDexseq

by Elena Weiß - version 1

version {@VERSION_LINKS@}

collects single amss files and creates annotation files for featureCounts and DEXSeq

Dependencies

sharedUtils

Parameter

name	type	occurrence	description
indir	file	1	input directory
annot	file	1	annotation file name to write in
annot_fc	file	1	annotation file to write in for featurecounts

Return values

name	type	description	minV	maxV
out	string	path to output directory	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

quantCurveScore

by Elena Weiß - version 1

version {@VERSION_LINKS@}

computes output score table

Dependencies

sharedUtils

Parameter

name	type	occurrence	description
controlCondition	string	1	name of control condition
testCondition	string	1	name of test condition
sampleAnnotation	string	1	path to sample annotation file with conditions
out	file	1	output directory

Return values

name	type	description	minV	maxV
out	string	output directory	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

readthrough

by Caroline Friedel - version 1

version {@VERSION_LINKS@}

Calculates readthrough and readin values and optionally downstream FPKM and expression in dOCR regions

Dependencies

java
picard (jar included with module)
Apache Commons CLI library (jar included with module)

Parameter

name	type	default	occurrence	description
annotation	file		1	annotation file path
genecounts	file		1	gene read count file
input	file		1	input file
output	file		1	output file
readthroughLength	integer	5000	*	[optional] length of downstream window in which read-through is calculated
readinLength	integer	5000	*	[optional] length of upstream window in which read-in is calculated
strandedness	integer	0	*	strandedness: 0=not strandspecific, 1=first read indicates strand, 2=second read indicates strand
overlap	integer	25	*	[optional] minimum overlap of read to be counted for read-through/in window
idxstats	file		*	[optional] idxstats file with numbers of mapped reads per chromosome, necessary for calculating downstream FPKM and transcription in dOCR regions
normFactor	string		*	[optional] factor for normalizing by mapped reads and gene length for downstream FPKM calculation
exclude	string		*	[optional] chromosomes to exclude from calculating total mapped reads, separated by ,
excludeType	string		*	[optional] gene types to exclude when determining genes with no other genes up- or down-stream, separated by ,
dOCRFile	string		*	[optional] file containing dOCR lengths
windowLength	integer	1000	*	[optional] number of steps for evaluating transcription on dOCRs
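
The window parameters above can be pictured with a small sketch. It assumes 1-based inclusive coordinates, a read-through window directly downstream of the gene's 3' end and a read-in window directly upstream of its 5' end; these conventions and all function names are assumptions for illustration, not the module's code.

```python
def readthrough_window(gene_start, gene_end, strand, length=5000):
    """Window of `length` bp directly downstream of the gene's 3' end."""
    if strand == "+":
        return gene_end + 1, gene_end + length
    if strand == "-":
        return max(1, gene_start - length), gene_start - 1
    raise ValueError("strand must be '+' or '-'")


def readin_window(gene_start, gene_end, strand, length=5000):
    """Window of `length` bp directly upstream of the gene's 5' end."""
    if strand == "+":
        return max(1, gene_start - length), gene_start - 1
    if strand == "-":
        return gene_end + 1, gene_end + length
    raise ValueError("strand must be '+' or '-'")


def overlaps_enough(read_start, read_end, win_start, win_end, min_overlap=25):
    """True if the read overlaps the window by at least min_overlap bases."""
    overlap = min(read_end, win_end) - max(read_start, win_start) + 1
    return overlap >= min_overlap


print(readthrough_window(10000, 20000, "+"))
print(readin_window(10000, 20000, "+"))
```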

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

Read-through was calculated as previously described in Hennig T et al, 2018, PLOS Pathogens 14(3): e1006954

Pubmed references: 29579120,

recountReadout

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

calculates readout for every sample in a project from recount.

Dependencies

Python3
R 3.5.x
R packages recount and recount.bwtool

Parameter

name	type	restrictions	default	occurrence	description
projectID	string	mutually exclusive with projectFile		*	project id of a sra project indexed in recount2, it is possible to pass several project ids separated by ,
projectFile	file	file exists, mutually exclusive with projectID		*	file with one line giving project ids (file content = all allowed values for projectID)
geneTSV	file	file exists		1	tab-separated file with genes, coordinates, exonic basepairs and upstream and downstream regions (requires a line with column names chr, geneid, exonic_bps, upstream_start, upstream_end, downstream_start and downstream_end)
outfolder	folder			1	folder for saving final results, creates a subfolder for the project with a table of coverage values for every sample in the project
tmpfolder	folder			1	folder for saving temporary data, creates a subfolder for the project (named projectID)
Rscript	file	executable		*	path to Rscript executable (preferably version 3.5)
removeTmpSampleData	boolean		true	*	if this flag is set, temporary files for samples are deleted at the end (default behaviour)
removeTmpProjectData	boolean		true	*	if this flag is set, temporary files for projects are deleted at the end (default behaviour)
threads	integer	>=1	1	*	number of threads to use, equivalent to number of samples processed in parallel
downloadParallel	boolean		false	*	if this flag is set, bigWig files are downloaded in parallel (default: not set)
localRecountFolder	folder	absolute		*	folder that can contain locally processed or already downloaded recount data; structure: projectID/rse_gene.Rdata and projectID/bw/sampleID.bw

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

Normalized read counts for genes, upstream regions and downstream regions were calculated from the bigWig files provided by the Recount2 project.

Pubmed references: 28398307,

removeLinearReads

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

removes linearly mappable reads from a circRNA prediction.

Dependencies

python3
pysam (v0.14.1)

Parameter

name	type	restrictions	default	occurrence	description
mapping	file	SAM or BAM format		1	path to a SAM or BAM file with mapped reads from the sample for which circRNAs were predicted (file ending is used to decide if it is SAM or BAM format)
circRNAPrediction	file	file exists		1	predicted circRNAs from the CIRI2, circRNAfinder or the circCombination module (tab-separated, 5 columns: chromosome, start, end, strand, list of reads)
circOut	file			1	all circRNAs from the input file with at least minReads remaining circular reads after removing all linearly mappable reads from the lists of circular junction reads
minReads	integer	>=1	2	*	Minimum number of predicted junction reads required for writing a circRNA to the output file, default: 2
paired	string	'yes' or 'no'	yes	1	indicates if SAM or BAM input file contains paired-end (yes) or single-end (no) data

Return values

name	type	description	minV	maxV
filteredCircs	file	path to circRNA predictions with the filtered lists of circular reads (same as input parameter circOut)	0	0
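
The filtering step can be sketched as follows, assuming the set of linearly mappable read names has already been derived from the SAM or BAM file (how the module determines it is not described here). The function and variable names are hypothetical.

```python
def filter_circ_predictions(predictions, linear_reads, min_reads=2):
    """Drop linearly mappable reads and keep circRNAs with enough support.

    predictions: list of (chromosome, start, end, strand, [read names])
    linear_reads: set of read names that can be mapped linearly
    """
    kept = []
    for chrom, start, end, strand, reads in predictions:
        remaining = [r for r in reads if r not in linear_reads]
        if len(remaining) >= min_reads:
            kept.append((chrom, start, end, strand, remaining))
    return kept


predictions = [
    ("chr1", 100, 900, "+", ["r1", "r2", "r3"]),
    ("chr2", 500, 1500, "-", ["r4", "r5"]),
]
filtered = filter_circ_predictions(predictions, {"r3", "r5"}, min_reads=2)
print(filtered)
```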

Citation info

We filtered the predicted circular reads by removing those reads that can be mapped elsewhere in a linear way.

Pubmed references:

rrnaFilter

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

removes rRNA reads from sequencing data

Dependencies

python3
pysam
bwa

Parameter

name	type	restrictions	default	occurrence	description
in1	file	file exists, allowed file endings: fastq, fq, fq.gz		1	first (gzipped) fastQ file with the sequenced reads
in2	file	file exists, allowed file endings: fastq, fq, fq.gz		*	second (gzipped) fastQ file with the sequenced reads (for paired-end data only)
rrnaIndex	string	filename prefix for a bwa index		1	Common prefix of bwa index files for the rRNA sequence
out1	string	file path with file ending fastq, fq, fa or fq.gz		1	file for writing non-rRNA reads from in1 in fasta or (gzipped) fastq format
out2	string	file path with file ending fastq, fq, fa or fq.gz		*	file for writing non-rRNA reads from in2 (for paired-end data) in fasta or (gzipped) fastq format
sam	string	file path with file ending sam		1	sam file for writing rRNA reads from in1 and in2
workdir	folder	folder exists	os.getcwd()	*	path to directory for writing large temporary files (content is deleted at the end of execution), default: current directory
keepTmp	boolean		False	*	option to keep temporary files
maxEditDistance	integer	>=0	infinity	*	maximum allowed edit distance for a read alignment against rRNA
maxMismatches	integer	>=0	infinity	*	maximum allowed number of mismatches for a read alignment against rRNA
maxIndels	integer	>=0	infinity	*	maximum allowed number of indels for a read alignment against rRNA
pairFiltering	integer	1 or 2	2	*	Number of reads of a pair required to fulfil the options above (maxEditDistance, maxMismatches, maxIndels)
bwaPath	executable		bwa	*	path to bwa executable
seedSize	integer	>=1	25	*	size of initial seed for bwa (-k option of bwa)
threads	integer	>=1	1	*	number of threads to use for bwa (-t option of bwa)

Return values

name	type	description
rrnaSAMFile	string	path to rRNA reads in SAM format (same value as given by the sam parameter)
filteredFQ1	string	path to non rRNA reads in FASTQ format (same value as given by the out1 parameter)
filteredFQ2	string	path to non rRNA reads in FASTQ format (same value as given by the out2 parameter), for single-end data the value of the return variable is set to "not_defined_for_single_end"
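
One possible reading of how maxEditDistance, maxMismatches, maxIndels and pairFiltering interact is sketched below: each mate's rRNA alignment is checked against the thresholds, and the pair counts as rRNA if at least pairFiltering mates pass. This interpretation and the helper names are assumptions, not the module's code.

```python
INF = float("inf")


def alignment_passes(edit_distance, mismatches, indels,
                     max_edit_distance=INF, max_mismatches=INF, max_indels=INF):
    """True if an alignment against rRNA satisfies all three thresholds."""
    return (edit_distance <= max_edit_distance
            and mismatches <= max_mismatches
            and indels <= max_indels)


def pair_is_rrna(mates_passing, pair_filtering=2):
    """mates_passing: one boolean per mate (True = alignment passes)."""
    return sum(mates_passing) >= pair_filtering


mate1 = alignment_passes(edit_distance=1, mismatches=1, indels=0, max_edit_distance=2)
mate2 = alignment_passes(edit_distance=5, mismatches=3, indels=2, max_edit_distance=2)
print(pair_is_rrna([mate1, mate2], pair_filtering=2))
print(pair_is_rrna([mate1, mate2], pair_filtering=1))
```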

Citation info

Before mapping the reads to the reference genome we removed reads originating from rRNAs

Pubmed references:

sam2bam

by Michael Kluge - version 1

version {@VERSION_LINKS@}

converts SAM files into compressed BAM format using samtools sort

Dependencies

samtools
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description
sam	file path	absolute		1	path to SAM file that should be compressed
bam	file path	absolute		1	path to output BAM file
threads	integer		1	*	number of threads to use for compression
quality	integer	[1, 9]	9	*	compression level; 1 is the worst/fastest and 9 is the best/slowest compression
memory	string		768M	*	maximal memory that can be used per thread; only an estimation and might be exceeded!
tmpFolder	folder path	absolute		*	write temporary files to that folder

Return values

name	type	description	minV	maxV
BAMFile	string	absolute path to the resulting BAM file	0	0
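
As a rough guide to how the parameters relate to samtools sort, the sketch below builds an argument list using the samtools options -@ (threads), -l (compression level), -m (memory per thread), -T (prefix for temporary files) and -o (output file). The module's actual invocation is not shown in this documentation, so the mapping, the function name and the temporary-file prefix are assumptions.

```python
import os


def build_sort_command(sam, bam, threads=1, quality=9, memory="768M", tmp_folder=None):
    """Return an argument list for `samtools sort` (nothing is executed)."""
    if not 1 <= quality <= 9:
        raise ValueError("quality must be in [1, 9]")
    cmd = ["samtools", "sort",
           "-@", str(threads),
           "-l", str(quality),
           "-m", memory,
           "-o", bam]
    if tmp_folder is not None:
        # hypothetical prefix name inside the temporary folder
        cmd += ["-T", os.path.join(tmp_folder, "sam2bam_tmp")]
    cmd.append(sam)
    return cmd


print(" ".join(build_sort_command("/data/in.sam", "/data/out.bam", threads=4)))
```

The list could be passed to subprocess.run once samtools is available on the PATH.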

Citation info

Samtools (%SOFTWARE_VERSION%) was used to convert SAM to BAM files [Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup, The Sequence alignment/map (SAM) format and SAMtools, Bioinformatics (2009) 25(16) 2078-9].

Pubmed references: 19505943,

samtoolsView

by Caroline Friedel - version 1

version {@VERSION_LINKS@}

runs samtools view on BAM/SAM/CRAM files

Dependencies

samtools

Parameter

name	type	default	occurrence	description
bamoutput	boolean	false	*	output BAM
cramoutput	boolean	false	*	output CRAM (requires reference sequence)
fastCompression	boolean	false	*	use fast BAM compression (implies bamoutput)
uncompressedBam	boolean	false	*	uncompressed BAM output (implies bamoutput)
includeHeader	boolean	false	*	include header in SAM output
printOnlyHeader	boolean	false	*	print SAM header only (no alignments)
printCounts	boolean	false	*	print only the count of matching records
output	file	stdout	*	output file name
outputReadsNotSelected	file		*	output reads not selected by filters to FILE
referenceLengths	file		*	FILE listing reference names and lengths (see long help)
bedfile	file		*	only include reads overlapping this BED FILE
readgroup	string		*	only include reads in read group STR
readgroupFile	file		*	only include reads with read group listed in FILE
mappingquality	integer	0	*	only include reads with mapping quality at least INT
library	string		*	only include reads in library STR
minquerylength	integer		*	only include reads with number of CIGAR operations consuming query sequence at least INT
bitsset	integer	0	*	only include reads with all bits set in INT set in FLAG
bitsnotset	integer	0	*	only include reads with none of the bits set in INT set in FLAG
readTagToStrip	string		*	read tag to strip (repeatable)
collapseCIGAROperation	string		*	collapse the backward CIGAR operation
seed	double	0	*	integer part sets seed of random number generator, rest sets fraction of templates to subsample
threads	string		*	number of BAM/CRAM compression threads
printLongHelp	string		*	print long help, including note about region specification
inputfmtoption	string		*	Specify a single input file format option in the form of OPTION or OPTION=VALUE
outputfmt	string		*	Specify output format (SAM, BAM, CRAM)
outputfmtoption	string		*	Specify a single output file format option in the form of OPTION or OPTION=VALUE
reference	string		*	Reference sequence FASTA FILE
inbam	file		*	input bam file
insam	file		*	input sam file
incram	file		*	input cram file
region	string		*	region selected

Return values

name	type	description	minV	maxV
outputFile	string	output file (= value for parameter output)	0	0
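
The bitsset and bitsnotset filters operate on the SAM FLAG field, and the seed value packs two numbers into one. A minimal sketch of both ideas, with hypothetical function names:

```python
def passes_flag_filters(flag, bits_set=0, bits_not_set=0):
    """True if all bits of bits_set and none of bits_not_set are set in flag."""
    return (flag & bits_set) == bits_set and (flag & bits_not_set) == 0


def split_seed(value):
    """Split e.g. '42.25' into (seed, fraction of templates to keep)."""
    integer_part, _, fraction_part = str(value).partition(".")
    seed = int(integer_part) if integer_part else 0
    fraction = float("0." + fraction_part) if fraction_part else 0.0
    return seed, fraction


# FLAG 99 = paired (1) + proper pair (2) + mate reverse (32) + first in pair (64)
print(passes_flag_filters(99, bits_set=2, bits_not_set=4))
print(split_seed("42.25"))
```

For example, bitsnotset=4 excludes unmapped reads, because bit 4 marks a read as unmapped.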

Citation info

Samtools was used to convert BAM/SAM/CRAM to BAM/SAM/CRAM [Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9]

Pubmed references: 19505943,

sashimiPlot

by Caroline Friedel - version 1

version {@VERSION_LINKS@}

Performs visualization of splicing events across multiple samples using ggsashimi.

Dependencies

ggsashimi
python (2.7 or 3)
samtools (>=1.3)
R (>=3.3)
R package ggplot2 (>=2.2.1)
R package data.table (>=1.10.4)
R package gridExtra (>=2.2.1)
R package svglite (>=1.2.1), when generating output images in SVG format

Parameter

name	type	default	occurrence	description
help	string		*	show this help message and exit
bam	file		*	Individual bam file or file with a list of bam files. In the case of a list of files the format is tsv: 1col: id for bam file, 2col: path of bam file, 3+col: additional columns
coordinates	string		*	Genomic region. Format: chr:start-end (1-based)
outprefix	string	sashimi	*	Prefix for plot file name
outstrand	string	both	*	Only for --strand other than 'NONE'. Choose which signal strand to plot: both, plus, minus
mincoverage	integer	1	*	Minimum number of reads supporting a junction to be drawn
junctionsbed	file		*	Junction BED file name
gtf	file		*	Gtf file with annotation (only exons is enough)
strand	string	NONE	*	Strand specificity: NONE, SENSE, ANTISENSE, MATE1_SENSE, MATE2_SENSE
overlay	integer		*	Index of column with overlay levels (1-based)
aggr	string		*	Aggregate function for overlay: mean, median, mean_j, median_j. Use mean_j | median_j to keep density overlay but aggregate junction counts
colorfactor	integer		*	Index of column with color levels (1-based)
alpha	double	0.5	*	Transparency level for density histogram
palette	file		*	Color palette file. tsv file with at least 1 column, where the color is the first column
labels	integer		*	Index of column with labels (1-based)
height	double	2	*	Height of the individual signal plot in inches
annheight	double	1.5	*	Height of annotation plot in inches
width	double	10	*	Width of the plot in inches
basesize	integer	14	*	Base font size of the plot in pch
outformat	string	pdf	*	Output file format: pdf, svg, png, jpeg, tiff
outresolution	integer	300	*	Output file resolution in PPI (pixels per inch). Applies only to raster output formats
shrink	boolean	false	*	Shrink the junctions by a factor for nicer display
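
The coordinates parameter uses the format chr:start-end with 1-based positions. A small validation sketch, assuming that contig names may themselves contain a colon (hence the split at the last one); the function name is hypothetical:

```python
def parse_region(region):
    """Parse 'chr:start-end' into (chromosome, start, end)."""
    chrom, colon, span = region.rpartition(":")
    start_text, dash, end_text = span.partition("-")
    if not chrom or not colon or not dash:
        raise ValueError("expected format chr:start-end, got %r" % region)
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    return chrom, start, end


print(parse_region("chr10:27040584-27048100"))
```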

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

Sashimi plots were created using ggsashimi [Garrido-Martín D, Palumbo E, Guigó R, Breschi A. ggsashimi: Sashimi plot revised for browser- and annotation-independent splicing visualization. PLoS Comput Biol. 2018 Aug 17;14(8):e1006360]

Pubmed references: 30118475,

scaledMetashape

by Elena Weiß - version 1

version {@VERSION_LINKS@}

creates metagene over whole body

Dependencies

binGenome
sharedUtils

Parameter

name	type	occurrence	description
bedgraphTable	file	1	table with paths to bedgraph files and conditions/replicates
genelist	string	*	list of genes to consider
experiment	string	*	type of experiment
metaFrame	integer	1	frame to plot
bins	integer	1	number of fixed bins to scale
aggregateFUN	string	1	function for aggregation
normShapeSum	boolean	1	how to norm shape
normLibSize	boolean	1	how to norm lib size
normBinLength	boolean	1	how to norm bin length
factor	string	0-null	factor to consider
coverageFiles	file	1	path to where coverage files are
bedname	string	1	name of bed file
plotname	string	*	name of plot
config	file	1	file to configs

Return values

name	type	description	minV	maxV
scaledMetashapeOutputFolder	string	folder where plot is	0	0

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

spring

by Michael Kluge - version 1

version {@VERSION_LINKS@}

SPRING is a reference-free method to compress high throughput sequencing data

Dependencies

SPRING (tested with 1.0v1.0)
GNU core utilities

Parameter

name	type	default	occurrence	description
fastq	file		*	path to one or two (PE datasets) fastq files; possible endings: .fastq, .fq, .fastq.gz or .fq.gz file
spring	file		1	path to compressed spring file; possible endings: .spring or .tar
compress	boolean	true	*	if true the fastq files are compressed; otherwise the spring file is decompressed
preserveOrder	boolean	true	*	preserve read order
quality	boolean	true	*	retain quality values during compression
ids	boolean	true	*	retain read identifiers during compression
qualityMode	enum	lossless	*	possible values: 'lossless', 'qvz qv_ratio', 'ill_bin' or 'binary thr high low'
long	boolean	false	*	use for compression of arbitrarily long reads
decompressRange	string		*	decompress only reads (or read pairs for PE datasets) from start to end (both inclusive); e.g. '1 100'
workingDir	file	/usr/local/storage/	*	path to working directory
threads	integer	1	*	number of cores to use

Return values

name	type	description	minV	maxV
createdFile	string	path to the compressed or decompressed file (separated by ',' in case of PE datasets)	0	0
isPairedEnd	boolean	true if paired-end data was processed	0	0

Citation info

The FASTQ files were compressed using SPRING.

Pubmed references: 30535063,

sraDump

by Michael Kluge - version 1

version {@VERSION_LINKS@}

downloads and extracts FASTQ files from the Sequence Read Archive (SRA)

Dependencies

fastq-dump or fasterq-dump
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description
sraFile	file path	absolute		0-null	path to the *.sra file(s); can not be used in combination with --sraID
sraID	string			0-null	one or more SRA ID(s); can not be used in combination with --sraFile
rename	string			*	new basename for the resulting fastq files;
outputFolder	folder path	absolute		1	path to folder in which the files should be extracted
tmpFolder	folder path	absolute	/usr/local/storage	*	tmp folder; default: /usr/local/storage
deleteOnSuccess	boolean		false	*	deletes the SRA file when extraction was successful
disablePrefetch	boolean		false	*	disables prefetching of the sra files
binaryName	enum		fastq-dump	*	name of the sra-toolkit binary; possible values: 'fastq-dump' or 'fasterq-dump'
threads	integer	[1,128]	1	*	number of cores to use; only possible if 'fasterq-dump' is used as binary

Return values

name	type	description
isPairedEnd	boolean	true, if paired end data was downloaded from SRA
baseName	string	absolute base name path to the created files
createdFiles	string	absolute path to all files that were downloaded separated by ','
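
The restrictions in the parameter table (sraFile and sraID are mutually exclusive, threads only applies to fasterq-dump) can be expressed as a short check. This is a sketch with hypothetical names; whether threads greater than 1 with fastq-dump is rejected or silently ignored by the module is an assumption.

```python
def check_sra_options(sra_files=None, sra_ids=None, binary_name="fastq-dump", threads=1):
    """Validate a combination of sraDump-style options; returns True if valid."""
    sra_files = sra_files or []
    sra_ids = sra_ids or []
    if bool(sra_files) == bool(sra_ids):
        raise ValueError("provide either sraFile or sraID, but not both")
    if binary_name not in ("fastq-dump", "fasterq-dump"):
        raise ValueError("binaryName must be 'fastq-dump' or 'fasterq-dump'")
    if not 1 <= threads <= 128:
        raise ValueError("threads must be in [1, 128]")
    if threads > 1 and binary_name != "fasterq-dump":
        raise ValueError("multiple threads require 'fasterq-dump'")
    return True


print(check_sra_options(sra_ids=["SRR000001"], binary_name="fasterq-dump", threads=8))
```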

Citation info

Public samples were downloaded from the SRA (accession number: TODO %sraID%) [Leinonen R, Sugawara H, Shumway M, International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 2010;39(Database issue):D19-21.].

Pubmed references: 21062823,

sumIdxStat

by Elena Weiß - version 1

version {@VERSION_LINKS@}

sums up idxstats

Dependencies

{@DEPENDENCIES@}

Parameter

name	type	occurrence	description
inputFile	file	1	path to input file
outputFile	file	1	path to output file
excludeChrom	string	0-null	chromosome to exclude from sum

Return values

name	type	description	minV	maxV
samplesSum	string	sum of idxstats	0	0
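
samtools idxstats writes one tab-separated line per reference sequence: name, sequence length, number of mapped reads and number of unmapped reads. Assuming the module sums the mapped-read column (this is not stated above), the computation could look like the following sketch with hypothetical names:

```python
def sum_idxstats(lines, exclude=()):
    """Sum the mapped-read column of idxstats lines, skipping excluded names."""
    excluded = set(exclude)
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] in excluded:
            continue
        total += int(fields[2])
    return total


idxstats = [
    "chr1\t248956422\t50\t1\n",
    "chrM\t16569\t30\t0\n",
    "*\t0\t0\t7\n",
]
print(sum_idxstats(idxstats, exclude=["chrM"]))
```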

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

svCaller

by Florian Röckl - version 1

version {@VERSION_LINKS@}

calls deletions and insertions. Deletions are also verified and consensus sequences of insertions are extracted.

Dependencies

python3

Parameter

name	type	default	occurrence	description
help	string		*	show this help message and exit
bed	file		*	Path to bedgraph file
min_cld	integer	100	*	The minimum distance of two clusters, at which they still get combined
min_size	integer	2	*	Minimum size of a deletion.
max_z	double	0.0	*	Maximum z score threshold for coverage analysis
max_direct	double	-2.5	*	Maximum direct z score threshold for coverage analysis
max_local	double	-6.0	*	Maximum local z score threshold for coverage analysis
range	integer	500	*	Size of range/region before a certain position, used for the determination of local z Score parameters
pc	integer	1	*	Pseudo count for coverages over positions
tol	double	0.8	*	Tolerance of insertion positions mapped to deletions
bam	file		*	Path to bam file used for clipping pattern analysis
out_del	file		*	Path to output txt file containing deletions
out_ins	file		*	Path to output txt file containing insertions
max_patt_diff	integer	10	*	Maximum distance of peaks of clipped reads to count them as insertion
min_sur_z	double	50.0	*	Minimum local z score for clipping pattern analysis
ws	integer	20	*	Size of the window whose positions are checked for significantly low coverage
min_z	double	10.0	*	Minimum z score for clipping pattern analysis
get_clp_file	file		*	Set this parameter to a path to get a file containing the number of clipped reads for each position
min_reads	integer	10	*	Minimum number of reads at which a position is permitted to be a peak
gen_prop	integer	1000	*	Number of propagations to determine genome start/end
gap	integer	5	*	Maximum number of permitted consecutive gaps/0-coverage positions during the determination of the genome start/end
ref	file		*	Path to reference genome
fir_ws	double	0.0	*	Primary threshold for the score, which is used for the verification of deletions with clipped sequences
sec_ws	double	1.0	*	Secondary, more stringent threshold for the score, which is used for the verification of deletions with clipped sequences
con_path	file		*	Path to the file containing the consensus sequences
mpc	integer	1	*	Small pseudo count for the log used for the computation of the PWMs
min_length	integer	10	*	The minimum length of a consensus sequence
clp_ver_range	integer	100	*	The range around clipped positions of deletions in which consensus sequences are tried to be matched

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

svCaller was used to call deletions, insertions as well as consensus sequences of insertions and to verify the predicted deletions.

Pubmed references:

trimmedFastqPairFilter

by Sophie Friedl - version 1

version {@VERSION_LINKS@}

extracts paired reads from 2 fastq files.

Dependencies

python3

Parameter

name	type	restrictions	occurrence	description
inReads1	file	file exists, fastq format (read names without read numbers as /1)	*	path to first fastq file with reads
inReads2	file	file exists, fastq format (read names without read numbers as /2)	*	path to second fastq file with reads
inPrefix	string	prefix1.[fastq|fq] and prefix2.[fastq|fq] exist and meet the restrictions of inReads1 and inReads2	*	reads in two fastq files: prefix1.[fastq|fq], prefix2.[fastq|fq], can be used instead of inReads1 and inReads2
outReads1	string		*	output file for first reads of paired data
outReads2	string		*	output file for second reads of paired data
outSingletons	string		*	output file for singleton reads without a mate
outPrefix	string		*	writes output to three files: prefix1.fastq, prefix2.fastq, prefixsingleton.fastq, can be used instead of outReads1, outReads2 and outSingletons

Return values

name	type	description
pairedReads1	string	output file for first reads of paired data given in the parameters via outReads1 or outPrefix
singletonReads	string	output file for singleton reads without a mate given in the parameters via outSingletons or outPrefix
pairedReads2	string	output file for second reads of paired data given in the parameters via outReads2 or outPrefix

Citation info

We removed all reads with missing mates from the paired-end fastq files.
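
The separation into paired reads and singletons can be sketched on in-memory records. The example assumes that mates share an identical read name (no /1 or /2 suffix, as required for the input files) and that all records fit into memory; names and data structures are hypothetical and the module may work differently.

```python
def split_pairs(reads1, reads2):
    """Split two lists of (name, sequence, quality) records.

    Returns (paired1, paired2, singletons); paired1[i] and paired2[i] are mates.
    """
    by_name2 = {record[0]: record for record in reads2}
    names1 = {record[0] for record in reads1}
    paired1 = [r for r in reads1 if r[0] in by_name2]
    paired2 = [by_name2[r[0]] for r in paired1]
    singletons = [r for r in reads1 if r[0] not in by_name2]
    singletons += [r for r in reads2 if r[0] not in names1]
    return paired1, paired2, singletons


reads1 = [("a", "ACGT", "IIII"), ("b", "GGCC", "IIII")]
reads2 = [("c", "TTAA", "IIII"), ("a", "TGCA", "IIII")]
paired1, paired2, singletons = split_pairs(reads1, reads2)
print(len(paired1), len(paired2), len(singletons))
```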

Pubmed references:

umiDedup

by Michael Kluge - version 1

version {@VERSION_LINKS@}

unique molecular identifiers (UMIs) can be used to remove PCR duplicates

Dependencies

umi_tools
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description
bamFile	file path	absolute		1	path to the BAM file; UMI must be a suffix of the fastq id separated with '_'
outputFile	file path	absolute		1	path to the de-duplicated BAM file
deleteOnSuccess	boolean		false	*	deletes the BAM file when deduplication was successful

Return values

name	type	description	minV	maxV
deduplicatedFile	string	absolute path to the de-duplicated BAM file	0	0
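
The bamFile restriction means that each read name ends with an underscore followed by the UMI. A minimal sketch of how such a name can be split (the function name and the example read name are hypothetical):

```python
def split_umi(read_id):
    """Split 'readname_UMI' at the last underscore into (readname, UMI)."""
    name, separator, umi = read_id.rpartition("_")
    if not separator or not name or not umi:
        raise ValueError("read id %r does not end with '_<UMI>'" % read_id)
    return name, umi


print(split_umi("SRR000001.42_ACGTAC"))
```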

Citation info

UMI-tools was used to remove PCR duplicates from the raw sequencing data based on UMIs [Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 2017;27(3):491-499.].

Pubmed references: 28100584,

untar

by Caroline Friedel - version 1

version {@VERSION_LINKS@}

extracts *.tar, *.tar.gz and *.tar.bz2 archives

Dependencies

GNU tar

Parameter

name	type	restrictions	default	occurrence	description	minV	maxV
infile	file			1	input file, must be a *.tar, *.tar.gz or *.tar.bz2 archive	0	0
outputdir	file			*	[optional] output directory for extracting archive	0	0

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

varScanMpileup

by Caroline Friedel - version 1

version {@VERSION_LINKS@}

runs samtools mpileup followed by VarScan for multi-sample calling for variant detection.

Dependencies

samtools
VarScan

Parameter

name	type	default	occurrence	description
infile	file		1-	input file
method	string	mpileup2snp	*	method: mpileup2snp, mpileup2indel or mpileup2cns
jar	file		1	jar file
reference	file		1	reference sequence
minCoverage	integer		*	Minimum read depth at a position to make a call
minReads2	integer		*	Minimum supporting reads at a position to call variants
minAvgQual	integer		*	Minimum base quality at a position to count a read
minVarFreq	double		*	Minimum variant allele frequency threshold
minFreqForHom	double		*	Minimum frequency to call homozygote
pValue	double		*	Default p-value threshold for calling variants
strandFilter	integer		*	Ignore variants with >90% support on one strand
outputVcf	integer		*	If set to 1, outputs in VCF format
vcfSampleList	file		*	For VCF output, a list of sample names in order, one per line
variants	integer		*	Report only variant (SNP/indel) positions
output	file		1	output file

Return values

{@RETURN_VALUES@}

name	type	description	minV	maxV

Citation info

Variant calling was performed with Varscan ((%SOFTWARE_VERSION%)) [Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, Weinstock GM, Wilson RK, Ding L. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics. 2009 Sep 1;25(17):2283-5]

Pubmed references: 19542151,

wget

by Michael Kluge - version 1

version {@VERSION_LINKS@}

wget is used to locate and download URI resources

Dependencies

GNU wget
GNU Core Utilities

Parameter

name	type	restrictions	default	occurrence	description
uri	string			1-	one or more URI(s) pointing to the resource(s) to download
output	folder path	absolute		1	path to a folder in which the downloaded files should be stored; filename remains untouched
rename	string			0-null	renames the file to that name; multiple names must be provided in case of multiple URIs
disableSizeCheck	boolean		false	*	flag that can be used to disable the size check that checks if a file is greater than 1KB

Return values

name	type	description
downloadedFolder	string	path to the folder in which the files were stored
numberOfFiles	integer	number of files that were downloaded
downloadedFiles	string	absolute path to the downloaded file(s) separated by ','
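
The interplay of uri, rename and disableSizeCheck can be sketched as below. The 1024-byte threshold is an assumption for "greater than 1KB", and the helper names are hypothetical; no download is performed.

```python
import os
import tempfile

MIN_SIZE_BYTES = 1024  # assumed meaning of "greater than 1KB"


def plan_downloads(uris, names=None):
    """Pair each URI with its target file name (one name per URI if renaming)."""
    if names:
        if len(names) != len(uris):
            raise ValueError("one name is required per URI")
        return list(zip(uris, names))
    # without rename, keep the last path component of the URI
    return [(uri, uri.rstrip("/").rsplit("/", 1)[-1]) for uri in uris]


def passes_size_check(path, disable_size_check=False):
    """True if the check is disabled or the file is larger than the threshold."""
    return disable_size_check or os.path.getsize(path) > MIN_SIZE_BYTES


print(plan_downloads(["https://example.org/data/a.txt"]))
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"x" * 10)
    small_file = handle.name
print(passes_size_check(small_file), passes_size_check(small_file, True))
os.remove(small_file)
```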

Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

https://www.gnu.org/software/wget/

AttachAnno

by Michael Kluge - version 1 version {@VERSION_LINKS@}

BWA

by Caroline Friedel - version 1 version {@VERSION_LINKS@}

ChIPSeeker

by Michael Kluge - version 1 version {@VERSION_LINKS@}

DETest

by Michael Kluge - version 1 version {@VERSION_LINKS@}

DEXSeq

by Michael Kluge - version 1 version {@VERSION_LINKS@}

DaPars

by Michael Kluge - version 1 version {@VERSION_LINKS@}

EnrichAnno

by Michael Kluge - version 1 version {@VERSION_LINKS@}

GEM

by Michael Kluge - version 1 version {@VERSION_LINKS@}

HISAT2

by Daniel Strobl - version 1 version {@VERSION_LINKS@}

SPades

by Florian Röckl - version 1 version {@VERSION_LINKS@}

STARgenomeGenerate

by Caroline Friedel - version 1 version {@VERSION_LINKS@}

addSequence2Sam

by Michael Kluge - version 1 version {@VERSION_LINKS@}

amss

by Elena Weiß - version 1 version {@VERSION_LINKS@}

assemblyAnalyzer

by Florian Roeckl - version 1 version {@VERSION_LINKS@}

bam2wiggle

by Michael Kluge - version 1 version {@VERSION_LINKS@}

bamContigtDistribution

by Michael Kluge - version 1 version {@VERSION_LINKS@}

bamToBed

by Sophie Friedl - version 1 version {@VERSION_LINKS@}

bamToBigWig

by Sophie Friedl - version 1 version {@VERSION_LINKS@}

bamstats

by Michael Kluge - version 1 version {@VERSION_LINKS@}
