GENERAL
2019-08-09
AttachAnno

attaches annotations to a CSV file

by Michael Kluge - version 1

Dependencies

  • GNU core utilities
  • GNU R
  • packages: getopt, stringi


Parameter

name type restrictions default occurrence description minV maxV
targetFilefileabsolute1path to char-separated table file with header00
targetSepstring\t1-separating char in the target file00
outputFilefileabsolute1path to the annotated output file00
targetIDcolumnstring1-name of the column of the target file that should be used to merge the table with the annotation file(s)00
annotationIDcolumnstring1-name(s) of the column(s) of the annotation file(s) that should be used to merge the table with the annotation file(s)00
annotationFilefileabsolute1-path(s) to annotation table file(s) that should be attached00
annotationSepstring\t1-separating char in the annotation file(s)00
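
The merge described by these parameters can be pictured as a join of the target table with an annotation table on the two ID columns. The following is a minimal Python sketch of that idea, not the module's R implementation. The function name attach_annotation, the left-join behavior (target rows without a match keep empty annotation fields) and the rule that the first annotation row per ID wins are assumptions made for illustration.

```python
import csv
import io


def attach_annotation(target_text, anno_text, target_id, anno_id,
                      target_sep="\t", anno_sep="\t"):
    """Left-join one annotation table onto a target table (illustrative only)."""
    target = list(csv.DictReader(io.StringIO(target_text), delimiter=target_sep))
    anno_reader = csv.DictReader(io.StringIO(anno_text), delimiter=anno_sep)
    # Annotation columns to append, excluding the join column itself.
    extra_cols = [c for c in anno_reader.fieldnames if c != anno_id]
    # Index annotation rows by ID; assumption: the first row per ID wins.
    index = {}
    for row in anno_reader:
        index.setdefault(row[anno_id], row)
    merged = []
    for row in target:
        match = index.get(row[target_id], {})
        out = dict(row)
        for col in extra_cols:
            # Target rows without a matching annotation get empty fields.
            out[col] = match.get(col, "")
        merged.append(out)
    return merged
```

Calling the function once per annotation file, feeding the previous result back in as the target, would mimic attaching several annotation files in sequence.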


Return values

name type description minV maxV


SEQUENCING
2019-11-14
ChIPSeeker

ChIPSeeker can be used to visualize called peaks in ChIP-seq data

by Michael Kluge - version 1

Dependencies

  • GNU R
  • packages: GenomicFeatures, ChIPseeker, getopt


Parameter

name type restrictions default occurrence description minV maxV
bedFilesfileabsolute1-path to *.bed or *.narrowPeak files that contain called peaks00
annoDbstring1name of the R genome annotation database (e.g. org.Hs.eg.db)00
txdbstring1file or name of R library containing transcript-related features of a particular genome (e.g. TxDb.Hsapiens.UCSC.hg38.knownGene)00
outputDirfileabsolute1path to an output folder in which the plots will be stored00
promotorUpstreaminteger3000*size in bp used to define the promoter region upstream of the annotated TSS (transcription start site)00
promotorDownstreaminteger3000*size in bp used to define the promoter region downstream of the annotated TSS (transcription start site)00
resampleinteger1000*number of resample iterations for confidence interval estimation00
confstring0.95*confidence interval to be estimated00


Return values

name type description minV maxV
ChIPSeekerOutputFolderstringpath to the output folder containing the plots00


Citation info

Pubmed references: 25765347

SEQUENCING
2021-10-09
DETest

performs differential gene expression tests based on count tables

by Michael Kluge - version 1

Dependencies

  • GNU R
  • packages: edgeR, DESeq, DESeq2, limma, Biobase, RColorBrewer, gplots, getopt, genefilter, lattice


Parameter

name type restrictions default occurrence description minV maxV
controlConditionstring1name of the control condition00
testConditionstring1name of the test condition00
countFilefileabsolute1count file with features in rows and samples in columns00
sampleAnnotationfileabsolute1annotation file with sample names in the first column and sample condition in the second column (header: sample\tcondition)00
featureAnnotationfileabsolute*annotation file which is joined with the count file00
featureAnnotationIDstringFeatureID*name of the column used for joining00
featureAnnotationTypestringtype*name of the column in the annotation file for which a distribution plot is created00
excludeSamplesstring0-nullnames of samples that should be excluded from the analysis00
pValueCutoffdouble[0-1]0.01*p-Value cutoff for significant results00
minKeepReadsinteger>=025*number of reads a feature must have on average per sample to pass the filtering step before the DE test is performed00
foldchangeCutoffinteger0.0,0.415,1.00-nulllog2 foldchange cutoffs for each of which a separate result file will be created; will be used for both directions (+/-)00
foldchangeCutoffNamesstringsignificant,0.33-fold,2-fold0-nullnames corresponding to the foldchange cutoffs00
foldchangeCutoffdouble1*log2 foldchange cutoff for the two-colored volcano plot; will be used for both directions (+/-)00
downregColorstringred*color for down-regulated genes in the two-colored volcano plot00
upregColorstringblue*color for up-regulated genes in the two-colored volcano plot00
outputfileabsolute1path to output folder00
methodstringall*method that should be applied; one of: limma, DESeq, DESeq2, edgeR, all00
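
Because the foldchange cutoffs are given on the log2 scale, it can help to convert them back to linear fold changes. The sketch below does this for the default cutoffs 0.0, 0.415 and 1.0. The helper names and the inclusive comparison in passes_cutoff are assumptions for illustration and are not taken from the module.

```python
def log2fc_to_fold(log2fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2fc


def passes_cutoff(log2fc, cutoff):
    """Apply a cutoff in both directions (+/-); inclusive by assumption."""
    return abs(log2fc) >= cutoff


for cutoff in (0.0, 0.415, 1.0):
    # Print each default cutoff next to its linear fold change.
    print(cutoff, round(log2fc_to_fold(cutoff), 3))
```

A log2 cutoff of 0.415 corresponds to a linear fold change of about 1.33, which presumably explains the default name 0.33-fold, and a cutoff of 1.0 corresponds to a 2-fold change.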


Return values

name type description minV maxV


Citation info

Differential gene expression analysis was performed using %method% (%SOFTWARE_VERSION%).

Pubmed references: 20979621, 19910308, 25605792, 25516281

SEQUENCING
2019-08-08
DEXSeq

tests RNA-seq data for differential exon usage

by Michael Kluge - version 1

Dependencies

  • GNU core utilities
  • GNU R
  • packages: getopt, DEXSeq, GenomicFeatures, BiocParallel, GenomicRanges


Parameter

name type restrictions default occurrence description minV maxV
controlConditionstring1name of the control condition00
testConditionstring1name of the test condition00
countFilefileabsolute1count file with features in rows and samples in columns00
flattedGTFAnnotationfileabsolute1flatted GTF file which was used to create the count file; created by dexseq_prepare_annotation.py that comes with DEXSeq00
sampleAnnotationfileabsolute1annotation file with sample names in the first column and sample condition in the second column (header: sample\tcondition)00
featureAnnotationfileabsolute*annotation file which is joined with the count file00
featureAnnotationIDstringGeneid*name of the column that is used for joining00
featureAnnotationNamestringname*name of the column in the annotation file that contains the name of the feature00
excludeSamplesstring0-nullnames of samples that should be excluded from the analysis00
pValueCutoffdouble[0,1]0.01*p-Value cutoff for significant results00
minKeepReadsinteger[1,]25*number of reads a feature must have on average per sample to pass the filtering step before the DE test is performed00
outputfileabsolute1output folder00
threadsinteger[1,]1*number of threads to use for testing00


Return values

name type description minV maxV


Citation info

Differential exon usage was determined using DEXSeq (%SOFTWARE_VERSION%).

Pubmed references: 22722343

SEQUENCING
2019-08-08
DaPars

dynamic analysis of alternative polyadenylation from RNA-seq

by Michael Kluge - version 1

Dependencies

  • DaPars
  • python
  • GNU core utilities


Parameter

name type restrictions default occurrence description minV maxV
controlConditionstring1name of the control condition00
testConditionstring1name of the test condition00
sampleAnnotationfileabsolute1annotation file with sample names in the first column and sample condition in the second column (header: sample\tcondition)00
excludeSamplesstring1-names of samples that should be excluded from the analysis00
wigFolderfile1folder containing the wig files (format: folder/samplename.bedgraph)00
wigEndingstringbedgraph*ending of the wig files00
annotated3UTRfileabsolute1path to annotated 3' UTR regions created with DaPars_Extract_Anno.py00
outputFilefileabsolute1path to the output file00
coverageCutoffinteger[1,]30*coverage threshold00
FDRCutoffdouble[0,1]0.01*FDR cutoff00
PDUICutoffdouble[0,100]0.5*degree of difference in APA usage in percent00
FoldChangeCutoffdouble0.5*log2 foldchange cutoff between the two conditions00
numberOfCondASamplesReachingCutoffinteger[1,]*number of samples from condition A that must pass the coverage cutoff; default: all samples00
numberOfCondBSamplesReachingCutoffinteger[1,]*number of samples from condition B that must pass the coverage cutoff; default: all samples00


Return values

name type description minV maxV
wiggleFilestringpath to the output file in WIG format00


Citation info

DaPars (%SOFTWARE_VERSION%) was used to identify alternative polyadenylation.

Pubmed references: 25409906

SEQUENCING
2019-08-08
EnrichAnno

gene set enrichment analysis on GO and KEGG

by Michael Kluge - version 1

Dependencies

  • GNU R
  • packages: getopt, clusterProfiler, pathview, KEGGREST


Parameter

name type restrictions default occurrence description minV maxV
backgroundFilefileabsolute1path to file with header, which contains a list of ENSEMBL or GENCODE identifiers that should be used as background00
testFilesstringabsolute1-path to file(s) with header, which contain a list of ENSEMBL or GENCODE identifiers that should be used for enrichment testing00
orgDBstring1name of the organism database (orgDB) that should be used as GO annotation; if package is missing it is installed via biocLite00
keggDBNamestring*organism code for KEGG (e.g. mmu / hsa); http://www.genome.jp/kegg/catalog/org_list.html; if not supported by KEGGREST parameter will be ignored00
pValueCutoffdouble0.01*p-Value cutoff for significant results00
plotKeggbooleantrue*if enabled, plots are created for KEGG pathways00
outputfileabsolute1path to output basename; folder is created if not existent00
suffixstring*suffix that is inserted before the basename of the output; if an absolute path basename is applied00
foldchangeColstring*name of the column that contains the log2FC00


Return values

name type description minV maxV


Citation info

Afterwards, gene set enrichment analysis on up-/down-regulated genes was performed for gene sets defined by GO (%orgDB%) and KEGG (%keggDBName%) using clusterProfiler (%SOFTWARE_VERSION%).

Pubmed references: 22455463

SEQUENCING
2019-11-08
GEM

identifies protein-DNA interaction at high resolution in ChIP-seq data

by Michael Kluge - version 1

Dependencies

  • GEM
  • java
  • GNU core utilities


Parameter

name type restrictions default occurrence description minV maxV
jarPathfileabsolute1path to GEM jar file00
exptfileabsolute1aligned read file00
readDistributionfileabsolute1read spatial distribution file00
gpsOnlybooleantrue*run in GPS only mode00
kinteger8*length of the k-mer for motif finding, use --k or (--kmin & --kmax); GEM parameter00
kMininteger6*min value of k, e.g. 6; GEM parameter00
kMaxinteger13*max value of k, e.g. 13; GEM parameter00
seedstring*exact k-mer string to jump start k-mer set motif discovery; GEM parameter00
genomefolderabsolute*the path to the genome sequence directory, for motif finding; GEM parameter00
outputPrefixfileabsolute1output folder name and file name prefix00
controlfileabsolute*aligned reads file for control00
chrSizefileabsolute*genome chrom.sizes file with chr name/length pairs00
formatstringBED*read file format: BED/SAM/BOWTIE/ELAND/NOVO00
sizeInBpinteger*size of mappable genome in bp (default is estimated from genome chrom sizes)00
alphaValuedouble*minimum alpha value for sparse prior (default is estimated from the whole dataset coverage)00
qValuedouble2*significance level for q-value, specify as -log10(q-value) (default=2, q-value=0.01)00
threadsinteger#CPU*maximum number of threads to run GEM in parallel00
kSeqsinteger5000*number of binding events to use for motif discovery; GEM parameter00
memoryPerThreadinteger2048*total memory per thread in MB if running on local host; otherwise memory limit of executor might be set00
useFixedAlphabooleanfalse*use a fixed user-specified alpha value for all the regions00
JASPAROutputbooleantrue*output motif PFM in JASPAR format; GEM parameter00
MEMEOutputbooleantrue*output motif PFM in MEME format; GEM parameter00
HOMEROutputbooleantrue*output motif PFM in HOMER format; GEM parameter00
BEDOutputbooleantrue*output binding events in BED format for UCSC Genome Browser00
NarrowPeakOutputbooleantrue*output binding events in ENCODE NarrowPeak format 00
workingDirfolder pathabsolute/usr/local/storage/*path to working directory00
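
The qValue parameter is specified as -log10(q-value) rather than as the q-value itself. The short sketch below converts between the two scales. Both helper names are hypothetical and are not part of GEM or of this module.

```python
import math


def q_to_gem_threshold(q_value):
    """Convert a q-value (e.g. 0.01) into the -log10 scale used by qValue."""
    return -math.log10(q_value)


def gem_threshold_to_q(threshold):
    """Convert a qValue setting (e.g. 2) back into a plain q-value."""
    return 10.0 ** (-threshold)
```

With these helpers, a desired significance level of 0.001 would be passed as qValue=3, and the default of 2 corresponds to a q-value of 0.01.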


Return values

name type description minV maxV


Citation info

GEM (%SOFTWARE_VERSION%) was used to call peaks in the ChIP-seq data [Y. Guo, S. Mahony, D.K. Gifford, High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Computational Biology, (2012) 8(8): e1002638].

Pubmed references: 22912568

sequencing
2019-10-16
HISAT2

Performs spliced RNA-seq read mapping using HISAT2.

by Daniel Strobl - version 1

Dependencies

  • HISAT2


Parameter

name type restrictions default occurrence description minV maxV
unpairedfile*Files with unpaired reads. Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).00
sinteger*skip the first <int> reads/pairs in the input (none)00
uinteger*stop after first <int> reads/pairs (no limit)00
trim5string*trim <int> bases from 5'/left end of reads (0)00
trim3string*trim <int> bases from 3'/right end of reads (0)00
nceilstring*func for max # non-A/C/G/Ts permitted in aln (L,0,0.15)00
pencanspliceinteger*penalty for a canonical splice site (0)00
pennoncanspliceinteger*penalty for a non-canonical splice site (12)00
pencanintronlenstring*penalty for long introns (G,-8,1) with canonical splice sites00
pennoncanintronlenstring*penalty for long introns (G,-8,1) with noncanonical splice sites00
minintronleninteger*minimum intron length (20)00
maxintronleninteger*maximum intron length (500000)00
knownsplicesiteinfilefile*provide a list of known splice sites00
novelsplicesiteoutfilefile*report a list of splice sites00
rnastrandnessstring*Specify strand-specific information (unstranded)00
mainteger*match bonus (0 for --end-to-end, 2 for --local)00
mpstring*max and min penalties for mismatch; lower qual = lower penalty <6,2>00
spstring*max and min penalties for soft-clipping; lower qual = lower penalty <2,1>00
npinteger*penalty for non-A/C/G/Ts in read/ref (1)00
rdgstring*read gap open, extend penalties (5,3)00
rfgstring*reference gap open, extend penalties (5,3)00
scoreminstring*min acceptable alignment score w/r/t read length (L,0.0,-0.2)00
kinteger*report up to <int> alns per read; MAPQ not meaningful00
ainteger*report all alignments; very slow, MAPQ not meaningful00
unfile*write unpaired reads that didn't align to <path>00
alfile*write unpaired reads that aligned at least once to <path>00
unconcfile*write pairs that didn't align concordantly to <path>00
alconcfile*write pairs that aligned concordantly at least once to <path>00
metfilefile*send metrics to file at <path> (off)00
metinteger*report internal counters & metrics every <int> secs (1)00
rgidstring*set read group id, reflected in @RG line and RG:Z: opt field00
rgstring*add <text> ("lab:value") to @RG line of SAM header.00
offrateinteger*override offrate of index; must be >= index's offrate00
threadsinteger*number of alignment threads to launch (1)00
seedinteger*seed for random number generator (0)00
indexstring1Index filename prefix (minus trailing .X.ht2)00
paired1file*Files with #1 mates, paired with files in <m2>. Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).00
paired2file*Files with #2 mates, paired with files in <m1>. Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).00
outputfile1File for SAM output00
fastqbooleanfalse*query input files are FASTQ .fq/.fastq (default)00
qseqbooleanfalse*query input files are in Illumina's qseq format00
fastabooleanfalse*query input files are (multi-)FASTA .fa/.mfa00
rawbooleanfalse*query input files are raw one-sequence-per-line00
cbooleanfalse*paired1, paired2, unpaired are sequences themselves, not files00
phred33booleanfalse*qualities are Phred+33 (default)00
phred64booleanfalse*qualities are Phred+6400
intqualsbooleanfalse*qualities encoded as space-delimited integers00
ignorequalsbooleanfalse*treat all quality values as 30 on Phred scale (off)00
nofwbooleanfalse*do not align forward (original) version of read (off)00
norcbooleanfalse*do not align reverse-complement version of read (off)00
novelsplicesiteinfilebooleanfalse*provide a list of novel splice sites00
notempsplicesitebooleanfalse*disable the use of splice sites found00
nosplicedalignmentbooleanfalse*disable spliced alignment00
tmobooleanfalse*Reports only those alignments within known transcriptome00
dtabooleanfalse*Reports alignments tailored for transcript assemblers00
dtacufflinksbooleanfalse*Reports alignments tailored specifically for cufflinks00
frbooleanfalse*-1, -2 mates align fw/rev00
nomixedbooleanfalse*suppress unpaired alignments for paired reads00
nodiscordantbooleanfalse*suppress discordant alignments for paired reads00
tbooleanfalse*print wall-clock time taken by search phases00
quietbooleanfalse*print nothing to stderr except serious errors00
metstderrbooleanfalse*send metrics to stderr (off)00
noheadbooleanfalse*suppress header lines, i.e. lines starting with @00
nosqbooleanfalse*suppress @SQ header lines00
omitsecseqbooleanfalse*put '*' in SEQ and QUAL fields for secondary alignments.00
reorderbooleanfalse*force SAM output order to match order of input reads00
mmbooleanfalse*use memory-mapped I/O for index; many 'bowtie's can share00
qcfilterbooleanfalse*filter out reads that are bad according to QSEQ filter00
nondeterministicbooleanfalse*seed rand. gen. arbitrarily instead of using read attributes00
removechrnamebooleanfalse*remove 'chr' from reference names in alignment00
addchrnamebooleanfalse*add 'chr' to reference names in alignment00
rfbooleanfalse*-1, -2 mates align rev/fw00
ffbooleanfalse*-1, -2 mates align fw/fw00


Return values

name type description minV maxV
SAMFilestringoutput SAM file (= value for parameter output)00


Citation info

Sequencing reads were mapped using HISAT2 (version (%SOFTWARE_VERSION%)) [Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015 Apr;12(4):357-60].

Pubmed references: 25751142

Sequencing
2022-03-23
STARgenomeGenerate

Generation of genome indices for STAR

by Caroline Friedel - version 1

Dependencies

  • STAR


Parameter

name type restrictions default occurrence description minV maxV
runThreadNinteger*[optional] int: number of threads to run STAR00
genomeDirfile1string: path to the directory where genome files will be generated00
genomeFastaFilesfile1-string(s): path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they *cannot* be zipped.00
sjdbGTFfilefile*[optional] string: path to the GTF file with annotations00
sjdbOverhanginteger100*[optional] int>0: length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1)00
sjdbGTFtagExonParentTranscriptstringtranscript_id*[optional] string: GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files)00
sjdbFileChrStartEndfile0-null[optional] string(s): path to the files with genomic coordinates (chr <tab> start <tab> end <tab> strand) for the splice junction introns.00
genomeSAindexNbasesinteger*[optional] int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1).00
genomeChrBinNbitsinteger*[optional] int: =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins. For a genome with large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]).00
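
The two scaling rules quoted for genomeSAindexNbases and genomeChrBinNbits can be evaluated directly. The following Python sketch computes both values. The function names are hypothetical, and rounding to the nearest integer is an assumption, since the formulas in the descriptions leave rounding unspecified.

```python
import math


def genome_sa_index_nbases(genome_length):
    """min(14, log2(GenomeLength)/2 - 1), rounded to an integer (assumption)."""
    return min(14, round(math.log2(genome_length) / 2 - 1))


def genome_chr_bin_nbits(genome_length, number_of_references, read_length):
    """min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength)))."""
    per_reference = genome_length / number_of_references
    # Rounding to the nearest integer is an assumption of this sketch.
    return min(18, round(math.log2(max(per_reference, read_length))))
```

For a genome of roughly three billion bases the first rule is capped at 14, while small genomes of a few hundred kilobases yield clearly smaller values.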


Return values

name type description minV maxV


Citation info

STAR indices were created for the XXX genome using STAR (%SOFTWARE_VERSION%).

Pubmed references: 23104886

Sequencing
2019-03-13
addSequence2Sam

sequences (and qualities) of FASTQ files can be added to SAM files

by Michael Kluge - version 1

Dependencies

  • perl


Parameter

name type restrictions default occurrence description minV maxV
samfile pathabsolute1path to the SAM file00
fastqfile pathabsolute1-path to the FASTQ file(s)00
outputfile pathabsolute1path to the output SAM file in which the sequences are added00
unmappedfile pathabsolute*path to a FASTQ file to which the unmapped sequences will be written; exclusive with --preread flag00
noqualitybooleanfalse*does not add the read quality values00
updatebooleanfalse*overwrites already existing output files00
prereadbooleanfalse*indexes only those reads stored in the FASTQ file that are part of the SAM file; exclusive with --unmapped parameter00


Return values

name type description minV maxV
SAMFileWithSequencesstringabsolute path to the SAM file with added sequences and, if enabled, qualities00
UnmappedReadFilestringabsolute path to file containing all unmapped reads in FASTQ format00


SEQUENCING
2023-01-09
amss

computes AMSS per input window

by Elena Weiß - version 1

Dependencies

  • sharedUtils


Parameter

name type restrictions default occurrence description minV maxV
inputregsfile1file with specified genomic regions to analyze00
bamsfile1path to bam files00
patternstring1pattern to grep for bam files00
strandnessstring1strandness of experiment00
outfile1output directory00
sampleAnnotationfile1file specifying two conditions00
pseudocountinteger*pseudocount to subtract from counts00
numrandomizationsinteger*number of randomizations00
everyPosstring*every position of read is counted00


Return values

name type description minV maxV
outstringoutput directory to computed AMSS00


SEQUENCING
2019-08-08
bam2wiggle

converts BAM files to WIG files

by Michael Kluge - version 1

Dependencies

  • bedtools
  • GNU core utilities


Parameter

name type restrictions default occurrence description minV maxV
bamfileabsolute1path to the position-based-sorted BAM file00
outputfileabsolute1path to BEDGRAPH file00
contigSizesfileabsolute*file containing the sizes of the contigs used in the BAM file if ranges should be extended (format: <chrName><TAB><SIZE>)00


Return values

name type description minV maxV
wiggleFilestringabsolute path to the converted output file00


Citation info

Bedtools (%SOFTWARE_VERSION%) was used to convert BAM files to WIG files.

Pubmed references: 20110278

SEQUENCING
2019-08-08
bamContigtDistribution

creates plots based on statistics of BAM files

by Michael Kluge - version 1

Dependencies

  • GNU R
  • GNU Core Utilities


Parameter

name type restrictions default occurrence description minV maxV
bamMergedStatsfileabsolute1path to the merged bam stats file00
outputFilefileabsolute1path to an output PDF file00


Return values

name type description minV maxV


sequencing
2019-02-11
bamToBed

converts bam format into bed format

by Sophie Friedl - version 1

Dependencies

  • python3
  • bedtools (version 2.x)


Parameter

name type restrictions default occurrence description minV maxV
inBamstringvalid file path, bam format-1-Path to the bam file that will be converted into bed format. An index of the bam file is not required.00
outBedstring-1-Path for saving the resulting bed file.00
bedtoolsPathstringvalid file path to executablebedtools-1-Path to the bedtools executable. Per default, it is assumed that bedtools is in the PATH variable.00
splitbooleantrue-1-Defines how split alignments (cigar string that contains N) are handled. If true, the skipped region is not included in the bed regions. If false, the skipped region is included in the bed region, i.e. there is only one interval from alignment start to alignment end.00
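
The effect of the split parameter can be illustrated by deriving the reference intervals of a single alignment from its CIGAR string. This is a simplified Python sketch of the behavior described above, not the code bedtools runs. The function name alignment_blocks and the use of 0-based, half-open coordinates are assumptions for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_blocks(start, cigar, split=True):
    """Return the reference intervals (0-based, half-open) of one alignment."""
    blocks = []
    block_start = start
    pos = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "MD=X":
            # These operations consume reference bases inside a block.
            pos += length
        elif op == "N":
            # Skipped region: close the current block only if split is set.
            if split and pos > block_start:
                blocks.append((block_start, pos))
            pos += length
            if split:
                block_start = pos
        # I, S, H and P do not consume reference positions.
    if pos > block_start:
        blocks.append((block_start, pos))
    return blocks
```

For an alignment starting at position 100 with CIGAR 50M200N50M, split=True gives two intervals that exclude the skipped 200 bases, while split=False gives a single interval from alignment start to alignment end.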


Return values

name type description minV maxV
bedFilestringpath to the bed file that is created (same value as outBed parameter)00


Citation info

Bed files were created from the bam files using bedtools bamtobed.

Pubmed references: 20110278

sequencing
2019-02-11
bamToBigWig

converts bam format into bigwig format

by Sophie Friedl - version 1

Dependencies

  • python3
  • deeptools >=2.0


Parameter

name type restrictions default occurrence description minV maxV
inBamstringvalid file path, file ending .bam, indexed-1-Path to the bam file that will be converted into bigWig format. The bam file has to be indexed.00
outBwstring-1-Path for saving the resulting bigWig file.00
bamCoveragePathstringvalid file pathbamCoverage-1-Path to the executable bamCoverage which is part of deepTools. Per default, it is assumed that bamCoverage is in the PATH variable.00
binSizeintegerpositive, not zero1-1-Resolution of the bigWig file. Increasing the binSize causes loss of information but decreases the size of the bigWig file. Highest resolution (at single basepair level) is achieved for binSize=1 (default).00
numberOfProcessorsintegerpositive, not zero1-1-Number of processors to use (parallelization)00


Return values

name type description minV maxV
bigWigFilestringPath to the output bigWig file00


Citation info

BigWig files were created from the bam files using the tool bamCoverage from the deepTools tool suite.

Pubmed references: 27079975

Sequencing
2019-03-15
bamstats

creates various statistics on BAM files using RSeQC and samtools, which can be used for quality assessment

by Michael Kluge - version 1

Dependencies

  • python3
  • rseqc
  • samtools
  • GNU Core Utilities


Parameter

name type restrictions default occurrence description minV maxV
bamfile pathabsolute1-path to one or more BAM file(s)00
outdirfolder pathabsolute1path to the output folder; individual files will be stored in a sub-folder (using the basename of the BAM file as folder name)00
readLengthinteger1maximal length of the reads00
sampleDepthinteger100000*number of reads which are used for sampling00
annotationfile pathabsolute*gene annotation in BED format00
geneBodyAnnotationfile pathabsolute*genes that are used to calculate the gene body coverage; should contain house keeping genes00
idxstatsbooleantrue*enables calculation of number of reads mapped on each chromosome00
flagstatbooleantrue*enables calculation of flags of mapped reads00
countbooleanfalse*enables calculation of raw and rpkm count table for exons, introns and mRNAs00
saturationbooleantrue*enables down-sampling of the mapped reads to infer the sequencing depth00
statisticsbooleantrue*calculates reads mapping statistics00
clippingbooleantrue*enables clipping statistic of the mapped reads00
insertionbooleantrue*enables insertion statistic of the mapped reads00
deletionbooleantrue*enables deletion statistic of the mapped reads00
inferExperimentbooleantrue*tries to infer if the sequencing was strand specific or not00
junctionAnnotationbooleantrue*enables checking of how many of the splice junctions are novel or annotated00
junctionSaturationbooleantrue*enables down-sampling of the spliced reads to infer if sequencing depth is enough for splicing analyses00
distributionbooleantrue*calculates how mapped reads are distributed among different genomic features00
duplicationbooleantrue*calculates sequence duplication levels00
gcbooleantrue*calculates GC-content of the mapped reads00
nvcbooleantrue*checks if a nucleotide composition bias exists00
insertSizebooleantrue*calculates the insert size between two paired RNA reads00
fragmentSizebooleantrue*calculates the fragment size for each transcript00
tinbooleantrue*calculates the transcript integrity number which is similar to the RNA integrity number00
pairedbooleanfalse*must be set if paired-end data is analyzed00
strandedbooleanfalse*must be set if strand-specific data is analyzed00
disableAllDefaultbooleanfalse*disables all options which are not explicitly activated00


Return values

name type description minV maxV


Citation info

Quality of the resulting mappings was assessed using RSeQC [Liguo Wang, Shengqin Wang, Wei Li; RSeQC: quality control of RNA-seq experiments, Bioinformatics, Volume 28, Issue 16, 15 August 2012, Pages 2184–2185].

Pubmed references: 22743226

Sequencing
2019-03-15
bedgraphReplicateMerger

combines expression of biological or technical replicates; all replicates are scaled to the same number of reads and averaged afterwards

by Michael Kluge - version 1

Dependencies

  • python3


Parameter

name type restrictions default occurrence description minV maxV
bedgraphFilesfile pathabsolute2-path to sorted BEDGRAPH files; at least two files must be given; all files must contain the same chromosomes in the same order00
outputFilefile pathabsolute1path to the output file00
mergedIdxstatsFilefile pathabsolute1path to a tab-separated file that contains the output generated by samtools idxstats for all samples (columns 1-4) and in the 5th column the sample name; used columns: 1 -> chr name; 3 -> number of mapped reads; 5 -> name of the sample00
notSkipHeadbooleanfalse*disables the skipping of the first line of the idxstats file (--mergedIdxstatsFile); default: first line is skipped00
numberOfDigitsinteger5*number of decimal places to round the calculated values00
normByReadCountinteger1000000*number of reads to which each replicate is normed (based on the idxstats output) before values are averaged00
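
The scaling and averaging step can be sketched in a few lines of Python. This is a simplified illustration, not the module's implementation: it assumes that all replicates contain identical intervals in the same order, and that each replicate is scaled by its total number of mapped reads. The function name merge_replicates and its arguments are hypothetical.

```python
def merge_replicates(replicates, mapped_reads,
                     norm_by_read_count=1_000_000, digits=5):
    """Scale each replicate to a common read count and average per interval.

    replicates:   list of replicates, each a list of (chrom, start, end, value)
    mapped_reads: total mapped reads per replicate (e.g. summed idxstats counts)
    """
    # Scaling factor that norms each replicate to norm_by_read_count reads.
    factors = [norm_by_read_count / n for n in mapped_reads]
    merged = []
    for rows in zip(*replicates):
        interval = rows[0][:3]
        # Simplification: every replicate must describe the same interval.
        if any(row[:3] != interval for row in rows):
            raise ValueError("replicates must contain identical intervals")
        mean = sum(row[3] * f for row, f in zip(rows, factors)) / len(rows)
        merged.append(interval + (round(mean, digits),))
    return merged
```

Real BEDGRAPH files from different replicates usually differ in their interval boundaries, so a full implementation would first have to split the intervals at all breakpoints before averaging.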


Return values

name type description minV maxV


SEQUENCING
2019-08-13
bedgraphShrinker

shrinks regions with the same score to one region in a bedgraph file or expands the file to a region size of one basepair

by Michael Kluge - version 1

Dependencies

  • python3


Parameter

name type restrictions default occurrence description minV maxV
bedgraphFilefileabsolute1path to a sorted, non-overlapping BEDGRAPH file00
outputFilefileabsolute1path to the output file00
genomeSizefileabsolute*path to file containing the size of the contigs00
expandbooleanfalse*expand the ranges instead of shrinking them00
addZeroRangesbooleanfalse*adds ranges that are missing with a zero value00
omitZeroRangesbooleanfalse*suppress the output of ranges with a zero value00
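
Shrinking and expanding are inverse views of the same data. The sketch below shows both operations on in-memory rows of the form (chrom, start, end, score). It is a minimal illustration under the assumption of sorted, non-overlapping input, and the function names shrink and expand are hypothetical.

```python
def shrink(rows):
    """Merge directly adjacent ranges on the same contig with the same score."""
    merged = []
    for chrom, start, end, score in rows:
        if merged:
            p_chrom, p_start, p_end, p_score = merged[-1]
            if p_chrom == chrom and p_end == start and p_score == score:
                # Extend the previous range instead of starting a new one.
                merged[-1] = (chrom, p_start, end, score)
                continue
        merged.append((chrom, start, end, score))
    return merged


def expand(rows):
    """Split every range into ranges with a size of one base pair."""
    return [(chrom, pos, pos + 1, score)
            for chrom, start, end, score in rows
            for pos in range(start, end)]
```

Ranges separated by a gap are not merged here, even if their scores match, because the missing positions carry no score in the input.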


Return values

name type description minV maxV


SEQUENCING
2019-08-08
binGenome

partitions regions into a fixed number of bins and calculates coverage in each bin

binGenome

by Michael Kluge - version 1
version {@VERSION_LINKS@}

partitions regions into a fixed number of bins and calculates the coverage in each bin

Dependencies

  • java


Parameter

name type restrictions default occurrence description minV maxV
bedgraphfileabsolute0-nullbedgraph or bigwig file(s)00
bedgraphPosfileabsolute0-nullbedgraph or bigwig file(s) for positive strand00
bedgraphNegfileabsolute0-nullbedgraph or bigwig file(s) for negative strand00
annotationfileabsolute1-region annotation file(s); (see writeGRangesToBed() in R/binGenome.lib.R for format info)00
bedgraphNamesstring0-nullsample names for generation of output filenames00
annotationNamesstring0-nullannotation names for generation of output filenames00
binsinteger>00-nullnumber of bins to partition each region00
quantilesinteger[0-100]0-nulldetermines the position at which expression exceeds specific quantiles in percent00
outputDirfileabsolute1path to output folder; files will be named automatically based on the used parameters00
coresstring>01*number of cores to use in parallel00
normalizebooleantrue0-nullwrite in addition a per-gene normalized version of the data00
fixedBinSizeUpstreamstring*creates bins with a fixed size upstream of the region; format: 'binsize:binnumber'00
fixedBinSizeDownstreamstring*creates bins with a fixed size downstream of the region; format: 'binsize:binnumber'00
tmpDirstring*path to tmp folder00
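
As a rough illustration of the binning idea (the module itself is a Java tool and its exact bin boundaries are not documented here), the following Python sketch splits a region into a fixed number of bins and averages a per-base coverage vector within each bin. The helper names `bin_edges` and `binned_coverage` and the rounding of bin boundaries are assumptions made for this example.

```python
from typing import List, Sequence


def bin_edges(start: int, end: int, bins: int) -> List[int]:
    """Return bins + 1 boundaries that split [start, end) into near-equal bins."""
    length = end - start
    return [start + (length * i) // bins for i in range(bins + 1)]


def binned_coverage(coverage: Sequence[float], start: int, end: int,
                    bins: int) -> List[float]:
    """Average the per-base coverage inside each bin of the region [start, end)."""
    edges = bin_edges(start, end, bins)
    result = []
    for left, right in zip(edges, edges[1:]):
        width = right - left
        # regions shorter than the number of bins produce empty bins
        result.append(sum(coverage[left:right]) / width if width else 0.0)
    return result
```

Because every region ends up with the same number of bins, regions of different length can be compared or averaged bin by bin.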


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

Each region was binned into a fixed number of bins (x/x/x), and average coverage for each bin was calculated for each transcript in each sample.

Pubmed references:

Links

{@LINK_LIST@}

Sequencing
2019-03-13
bowtie2Docker

technical demo which shows how docker containers can be used in combination with Watchdog; basic bowtie mapper

bowtie2Docker

by Michael Kluge - version 1
version {@VERSION_LINKS@}

technical demo which shows how docker containers can be used in combination with Watchdog; basic bowtie mapper

Dependencies

  • docker


Parameter

name type restrictions default occurrence description minV maxV
genomefile pathabsolute1path to indexed reference genome (without trailing .X.bt2 ending)00
readsfile pathabsolute*path to reads in FASTQ format; for mapping of paired-end data, two files are required00
outfilefile pathabsolute1path to output file, which is written in SAM format; a log file with .log suffix will also be written00


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

ChIP-seq
2019-02-11
bwaAln

maps reads with bwa aln

bwaAln

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

maps reads with bwa aln

Dependencies

  • python3
  • bwa


Parameter

name type restrictions default occurrence description minV maxV
inReadsfilefile exists, fastq format, ending .fq or .fastq1fastq file with the sequenced reads00
bwaIndexstring1Common prefix of bwa index files for the reference genome00
outSaistring1file for writing mapped reads in bwa format00
bwaPathfilefile exists, bwa executablebwa*path to BWA executable (default: use executable from PATH)00
threadsinteger>01*number of threads to use for bwa aln (-t option of bwa)00
stopIfMoreThanBestHitsinteger>0*stop searching when there are more than that many best hits (default: use bwa default)00


Return values

name type description minV maxV
bwaSaiFilestring*.sai file created by the module (same value as given by the parameter outSai)00


Citation info

We mapped the reads with bwa aln.

Pubmed references: 19451168,

ChIP-seq
2019-02-11
bwaSampe

creates a sam file with bwa sampe from mappings of bwa aln for paired reads

bwaSampe

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

creates a sam file with bwa sampe from mappings of bwa aln for paired reads

Dependencies

  • python3
  • bwa


Parameter

name type restrictions default occurrence description minV maxV
inReads1filefile exists, fastq format, ending .fq or .fastq1uncompressed fastq (.fq, .fastq) file with the sequenced reads00
inReads2filefile exists, fastq format, ending .fq or .fastq1uncompressed fastq (.fq, .fastq) file with the sequenced reads (mates)00
inSai1filefile exists1output of bwa aln for the file given by inReads100
inSai2filefile exists1output of bwa aln for the file given by inReads200
bwaIndexstring1Common prefix of bwa index files for the reference genome00
outSamstring1file for writing mapped reads in sam format00
bwaPathfilefile exists, bwa executablebwa*path to BWA executable (default: use executable from PATH)00
indexInRambooleanFalse*option to load complete index into main memory (default: false)00
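
The bwaAln and bwaSampe modules are typically chained: two bwa aln runs produce the .sai files that bwa sampe turns into a SAM file. The Python sketch below only assembles the command lines and does not run them. How the modules themselves call bwa is not shown in this documentation, so the mapping of module parameters to the bwa options `-t`, `-R`, `-f` and `-P` is an assumption, and the function names are hypothetical.

```python
from typing import List, Optional


def bwa_aln_command(reads: str, index: str, out_sai: str, bwa: str = "bwa",
                    threads: int = 1,
                    max_best_hits: Optional[int] = None) -> List[str]:
    """Build a 'bwa aln' call (assumed mapping of the bwaAln parameters)."""
    cmd = [bwa, "aln", "-t", str(threads)]
    if max_best_hits is not None:
        # stop searching when there are more than this many best hits
        cmd += ["-R", str(max_best_hits)]
    cmd += ["-f", out_sai, index, reads]
    return cmd


def bwa_sampe_command(index: str, sai1: str, sai2: str, reads1: str,
                      reads2: str, out_sam: str, bwa: str = "bwa",
                      index_in_ram: bool = False) -> List[str]:
    """Build a 'bwa sampe' call (assumed mapping of the bwaSampe parameters)."""
    cmd = [bwa, "sampe"]
    if index_in_ram:
        cmd.append("-P")  # preload the complete index into memory
    cmd += ["-f", out_sam, index, sai1, sai2, reads1, reads2]
    return cmd
```

The returned lists can be passed to `subprocess.run(cmd, check=True)` once bwa is available on the system.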


Return values

name type description minV maxV
bwaPairedSamFilestringsam file created by the module (same value as given by the parameter outSam)00


Citation info

We created a sam file with bwa sampe.

Pubmed references: 19451168,

sequencing
2023-03-23
calcDOCRs

Calculates dOCR lengths for genes from open chromatin regions in BED format.

calcDOCRs

by Katharina Reinisch - version 1
version {@VERSION_LINKS@}

Calculates dOCR lengths for genes from open chromatin regions in BED format.

Dependencies

  • java
  • picard (jar included with module)
  • Apache Commons CLI library (jar included with module)


Parameter

name type restrictions default occurrence description minV maxV
inputfile1input file (in BED format)00
namestring1sample name used for output files00
outputstring1output directory00
annotationfile1genome annotation file (in GTF format)00
d1integer10000*[optional] maximum distance of OCR to gene end for this OCR to be added to this gene in the first step 00
d2integer5000*[optional] maximum distance of OCR to last added OCR for a gene for this OCR to be added in the second step00
genestring*[optional] get total length of OCRs within gene (in_gene_length) and fraction of gene body covered by OCRs, default false00
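
The two-step assignment controlled by d1 and d2 can be sketched as follows. This is a simplified Python illustration, not the module's Java implementation: it handles only a gene on the plus strand, treats OCRs as `(start, end)` tuples, and assumes that the dOCR length is measured from the gene end to the end of the last added OCR. The function name `docr_length` is hypothetical.

```python
from typing import Iterable, Tuple


def docr_length(gene_end: int, ocrs: Iterable[Tuple[int, int]],
                d1: int = 10000, d2: int = 5000) -> int:
    """Length of the downstream open chromatin region of a plus-strand gene."""
    downstream = sorted(o for o in ocrs if o[1] > gene_end)
    last_end = None
    # step 1: add every OCR that starts at most d1 bp downstream of the gene end
    for start, end in downstream:
        if max(start - gene_end, 0) <= d1:
            last_end = end if last_end is None else max(last_end, end)
    if last_end is None:
        return 0
    # step 2: keep extending while the next OCR starts at most d2 bp
    # after the end of the last added OCR
    for start, end in downstream:
        if end > last_end and start - last_end <= d2:
            last_end = end
    return last_end - gene_end
```

A gene without any OCR within d1 of its end gets a length of 0 in this sketch, even if OCRs exist further downstream.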


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

dOCR lengths were calculated as previously described in Hennig T et al., 2018, PLOS Pathogens 14(3): e1006954

Pubmed references: 29579120,

Links

{@LINK_LIST@}

sequencing
2023-03-23
calcDownsampleRate

Calculates the downsampling rate for each sample, such that all samples will have approximately the same number of reads after downsampling with this rate.

calcDownsampleRate

by Katharina Reinisch - version 1
version {@VERSION_LINKS@}

Calculates the downsampling rate for each sample, such that all samples will have approximately the same number of reads after downsampling with this rate.

Dependencies

  • Python 3


Parameter

name type restrictions default occurrence description minV maxV
idxstatsfile1idxstats file00
excludestring*[optional] chromosomes to be excluded, comma separated00
samplesstring*[optional] samples to be used, comma separated00
outputstring1output table file00
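
One straightforward way to obtain such rates is to scale every sample down to the smallest sample. The Python sketch below follows that idea. Using the smallest sample as the target is an assumption, as are the input layout (sample name mapped to per-chromosome mapped read counts) and the function name `downsample_rates`.

```python
from typing import Dict, Iterable


def downsample_rates(mapped_reads: Dict[str, Dict[str, int]],
                     exclude: Iterable[str] = ()) -> Dict[str, float]:
    """Rate per sample so that all samples end up with about the same read count."""
    excluded = set(exclude)
    totals = {
        sample: sum(n for chrom, n in counts.items() if chrom not in excluded)
        for sample, counts in mapped_reads.items()
    }
    target = min(totals.values())  # assumed target: the smallest sample
    return {sample: (target / total if total else 0.0)
            for sample, total in totals.items()}
```

The resulting rate can then serve as the probability parameter of a downsampling step such as downsampleSam.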


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

Downsampling rates were determined such that all included samples will have approximately the same number of reads after downsampling with this rate.

Pubmed references:

Links

{@LINK_LIST@}

General
2018-10-31
checksum

creates an md5 checksum of a file or verifies file integrity based on an md5 checksum using md5sum

checksum

by Michael Kluge - version 1
version {@VERSION_LINKS@}

creates an md5 checksum of a file or verifies file integrity based on an md5 checksum using md5sum

Dependencies

  • GNU md5sum
  • GNU Core Utilities


Parameter

name type restrictions default occurrence description minV maxV
inputfile pathabsolute1absolute path to file for which a checksum should be calculated or which should be verified00
oldChecksumNamefile pathabsolute*absolute path to a (non-existent) file used to identify the correct checksum line for cases in which the file was renamed or moved after checksum creation; can only be used in verify mode00
checksumfile pathabsolute.checksum.md5*absolute path to the checksum file; by default '.checksum.md5' located in the same directory as the input file00
verifybooleanfalse*flag to verify integrity of a file based on the checksum file00
updatebooleanfalse*flag to update an already existing checksum in the checksum file00
absolutePathbooleanfalse*flag to store an absolute path in the checksum file instead of a relative one00
ignorePathbooleanfalse*flag to use only the name of the file for identification of the corresponding checksum line (ignores the location of the file); can only be used in verify mode00
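
The module relies on md5sum. For readers who want to reproduce the check without it, the following Python sketch computes the same kind of checksum with the standard library and compares it against a line in the usual md5sum layout (checksum, whitespace, file name). The function names are hypothetical, and the handling of renamed or moved files (oldChecksumName, ignorePath) is not covered.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal md5 checksum of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, checksum_line: str) -> bool:
    """Compare a file against one line of a checksum file ('<md5>  <name>')."""
    expected = checksum_line.split()[0]
    return md5_of_file(path) == expected
```

Reading in chunks keeps the memory footprint constant, which matters for large sequencing files.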


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

circRNA
2019-02-11
circCombination

combines the predictions of circular RNAs made with the modules for CIRI2 and circRNA_finder.

circCombination

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

combines the predictions of circular RNAs made with the modules for CIRI2 and circRNA_finder.

Dependencies

  • Python3


Parameter

name type restrictions default occurrence description minV maxV
inCircs1filefile exists1First prediction file with circRNAs and junction reads (tab-separated, 5 columns: chromosome, start, end, strand, list of reads)00
inCircs2filefile exists1Second prediction file with circRNAs and junction reads (tab-separated, 5 columns: chromosome, start, end, strand, list of reads)00
outUnionfile1Output path for the union of the predictions (coordinates and reads)00
outIntersectionfile1Output path for the intersection of the predictions (coordinates and reads)00
outIntersectedUnionfile1Output path for the intersected union of the predictions (intersection of coordinates, union of reads)00
minReadsinteger>=12* Minimum number of predicted junction reads required for writing a circRNA into the output files. The cutoff is applied independently to the intersection, union and intersected union of the predictions. 00
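
The three outputs differ in how coordinates and reads are combined. The Python sketch below illustrates the set logic under the assumption that each prediction is held as a dictionary from `(chromosome, start, end, strand)` to a set of read names. The function name `combine` and this in-memory layout are hypothetical; the module itself works on the tab-separated files described above.

```python
from typing import Dict, Set, Tuple

Coord = Tuple[str, int, int, str]  # (chromosome, start, end, strand)
Circs = Dict[Coord, Set[str]]      # circRNA coordinates -> junction read names


def combine(circs1: Circs, circs2: Circs, min_reads: int = 2):
    """Return (union, intersection, intersected union) of two predictions."""
    shared = circs1.keys() & circs2.keys()
    union = {k: circs1.get(k, set()) | circs2.get(k, set())
             for k in circs1.keys() | circs2.keys()}
    intersection = {k: circs1[k] & circs2[k] for k in shared}
    intersected_union = {k: circs1[k] | circs2[k] for k in shared}

    def keep(circs: Circs) -> Circs:
        # the read cutoff is applied to each combination independently
        return {k: v for k, v in circs.items() if len(v) >= min_reads}

    return keep(union), keep(intersection), keep(intersected_union)
```

Note that a circRNA found by both tools can still be missing from the intersection if the tools agree on fewer than min_reads reads, while it remains in the intersected union.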


Return values

name type description minV maxV
circUnionfileOutput path for the union of the predictions (same as input parameter outUnion)00
circIntersectionfileOutput path for the intersection of the predictions (same as input parameter outIntersection)00
circIntersectedUnionfileOutput path for the intersected union of the predictions (same as input parameter outIntersectedUnion)00


Citation info

Predictions of circular RNAs were combined by forming the union/intersection of the individual predictions. The circular reads were combined by forming the union/intersection of the predictions.

Pubmed references:

Links

{@LINK_LIST@}

circRNA
2019-02-11
circRNAfinder

runs circRNA_finder to detect circular RNAs in single-end or paired-end sequencing data.

circRNAfinder

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

runs circRNA_finder to detect circular RNAs in single-end or paired-end sequencing data.

Dependencies

  • python3
  • perl
  • awk
  • samtools


Parameter

name type restrictions default occurrence description minV maxV
inReads1filefile exists, fastq format*path to single-end fastq file or path to first fastq file with paired reads00
inReads2filefile exists, fastq format*path to second fastq file with paired reads (paired-end data only)00
strandedLibraryintegerallowed values: 0,1,20*indicates if the library is strand specific: 0 = unstranded/unknown, 1 = stranded (first read), 2 = stranded (second read) (default: 0); if the library type is unstranded/unknown, the strand is guessed from the strand of the AG-GT splice site00
referencefilefile exists, fasta format*path to (multi-)fasta file with the reference genome (not required if STAR index or a STAR results is provided)00
inSTARstring*output prefix of a STAR mapping that was created with STAR run with chimeric segment detection00
outPrefixstring1path and file name prefix for all files produced by this module; the final file is named out/prefixcfCirc.txt00
outCircstring*final output of predicted CircRNAs (can be used to save the final prediction in a different place than given in outPrefix)00
starPathfileSTAR*specify a path to the STAR executable if STAR is not part of your PATH variable00
starIndexfile*STAR index for the reference genome, if no index is provided it is automatically created by the module using the file given by --reference00
starThreadsinteger>=11*number of threads to use with STAR00
cfPathfilepostProcessStarAlignment.pl*path to circRNA_finder perl script postProcessStarAlignment.pl00


Return values

name type description minV maxV
cfCircsstringpath to file with predicted circRNAs, it corresponds to the value of the parameter outCirc if it is set, otherwise the file path is derived from outPrefix00


Citation info

We predicted circular RNAs using circRNA_finder.

Pubmed references: 25544350,

circRNA
2019-02-11
ciri2

runs CIRI2 to detect circular RNAs in single-end or paired-end sequencing data.

ciri2

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

runs CIRI2 to detect circular RNAs in single-end or paired-end sequencing data.

Dependencies

  • python3
  • perl


Parameter

name type restrictions default occurrence description minV maxV
inReads1filefile exists, fastq format*path to first fastq file with reads (for single-end or paired-end data first reads)00
inReads2filefile exists, fastq format*path to second fastq file with reads (for paired-end data second reads only)00
inSAMfilefile exists, SAM format*path to SAM file that was created with BWA Mem (can be used as input instead of fastq files)00
referencefilefile exists, fasta format1path to (multi-)fasta file with the reference genome00
outPrefixstring1path and file name prefix for all files produced by this module, the final file is named out/prefixciriCirc.txt00
outCircstring*final output of predicted CircRNAs (can be used to save the final prediction in a different place than given in outPrefix)00
bwaPathfilebwa*specify a path to the BWA executable if bwa is not part of your PATH variable00
bwaThreadsinteger>=11*number of threads to use with BWA, default:100
bwaIndexstringvalid bwa index*BWA index for the reference genome provided by the --reference option, if no index is provided it is automatically created by the module00
bwaSeedSizeinteger>=119*BWA -k parameter for the minimum seed length00
bwaScoreThresholdinteger>=130*BWA -T parameter for the minimum alignment score; default is 30, but 19 recommended for CIRI200
ciriPathfileCIRI2.pl*path to CIRI2 perl script00
ciriThreadsinteger>=11*number of threads to use for CIRI200
ciriAnnotationfilefile exists, GTF format*GTF file with gene annotations for the genome given in the --reference option, if a GTF file is passed to this module, CIRI annotates all circRNAs with the corresponding gene00
ciriStringencystring3 allowed values: high, medium or lowhigh*Controls how stringently CIRI2 filters the circRNAs based on circular reads, cigar strings and false positive reads00
ciriKeepTmpFilesbooleanFalse*if this flag is set, CIRI2 does not delete the temporary files at the end00


Return values

name type description minV maxV
ciriCircsstringpath to file with predicted circRNAs, it corresponds to the value of the parameter outCirc if it is set, otherwise the file path is derived from outPrefix00


Citation info

We predicted circular RNAs using CIRI2.

Pubmed references: 28334140,

General
2022-08-24
classifyPeaks

classifies peaks

classifyPeaks

by Elena Weiß - version 1
version {@VERSION_LINKS@}

classifies peaks

Dependencies

  • DEPENDENCY [0-]


Parameter

name type restrictions default occurrence description minV maxV
outdirfile1output directory00
genelistfile1list of genes to consider00
coveragefilesfile1path to coverage files00
expstring1type of experiment00


Return values

name type description minV maxV
classifyPeaksOutputFolderstringoutput directory00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

SEQUENCING
2022-08-24
clustering

clusters coverage files and creates heatmap

clustering

by Elena Weiß - version 1
version {@VERSION_LINKS@}

clusters coverage files and creates heatmap

Dependencies

  • binGenome
  • sharedUtils


Parameter

name type restrictions default occurrence description minV maxV
bedgraphTablefile1path to bedgraph table00
clusterinteger1number of clusters00
factorstring0-nullfactor to consider00
coverageFilesfile1path to coverage files00
bednamestring1name of bed file00
aggregateFUNstring1function to aggregate00
normShapeSumboolean1how to norm shape00
normLibSizeboolean1how to norm lib size00
normBinLengthboolean1how to norm bin length00
binsinteger1number of bins00
cpmfile1path to cpm file00
plotnamestring*name of plot00


Return values

name type description minV maxV
coverageFilesstringpath to coverage files00
bednamestringname of bed file00
clusterfilesstringpath to cluster files00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

File Utils
2019-02-11
concatenateFiles

concatenates 2 or more files.

concatenateFiles

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

concatenates 2 or more files.

Dependencies

  • Python3


Parameter

name type restrictions default occurrence description minV maxV
inFilefilefile exists*input files given in the order of concatenation, files with ending .gz are interpreted as compressed files and are extracted00
outFilefile1path to save the concatenated files00
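
The behaviour described for inFile, where files ending in .gz are decompressed while being appended, can be sketched in Python as follows. This is an illustration rather than the module's code, and the function name `concatenate` is hypothetical.

```python
import gzip
import shutil
from typing import Iterable


def concatenate(in_files: Iterable[str], out_file: str) -> None:
    """Append all input files to out_file in the given order."""
    with open(out_file, "wb") as out:
        for path in in_files:
            # files ending in .gz are read through gzip, i.e. decompressed
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

The output is always written uncompressed in this sketch, regardless of the input mix.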


Return values

name type description minV maxV
concatenatedFilestringpath of the concatenated file, this is the same value as given by the parameter outFile00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

SEQUENCING
2019-08-08
contextMap

context-based RNA-seq read mapping

contextMap

by Michael Kluge - version 1
version {@VERSION_LINKS@}

context-based RNA-seq read mapping

Dependencies

  • java
  • bwa / bowtie / bowtie2
  • GNU core utilities


Parameter

name type restrictions default occurrence description minV maxV
jarPathfileabsolute*path to ContextMap jar file; if not given internal version will be used00
readsfileabsolute*path to reads in fasta or fastq format00
alignerNameenum1name of short-read alignment tool; supported values: 'bwa', 'bowtie1' or 'bowtie2'00
alignerBinfileabsolute*path to the executable of the chosen aligner tool00
indexerBinfileabsolute*path to the executable of the aligner's indexing tool (not needed for BWA)00
indicesfileabsolute1-comma separated list of paths to basenames of indices, which can be used by the chosen aligner00
genomefileabsolute1path to a directory with genome sequences in fasta format (each chromosome in a separate file)00
outputfileabsolute1path to the output directory00
skipsplitstring*comma separated list of booleans, each element refers to a given aligner index (same ordering); 'true' for no split detection, 'false' otherwise (req. in mining mode).00
skipmultisplitstring*comma separated list of booleans, each element refers to a given aligner index (same ordering); 'true' for no multisplit detection, 'false' otherwise (req. in mining mode).00
speciesindexstring*path to a directory containing index files created with the 'indexer' tool (req. in mining mode)00
alignerTmpstring*path to a directory for temporary alignment files00
seedinteger>0*seed length for the alignment (default: Bwt1: 30, BWA/Bwt2: 20)00
splitseedsizesinteger>015*seed size for the split search seed (default: 15)00
mismatchesinteger>=04*allowed mismatches in the whole read00
seedmismatchesinteger>=0*allowed mismatches in the seed region (default: Bwt1: 1, BWA/Bwt2: 0)00
splitseedmismatchesinteger>=00*allowed mismatches for the split seed search (default: 0)00
mmdiffinteger>=10*maximum allowed mismatch difference between the best and second best alignment of the same read00
maxhitsinteger>=1*maximum number of candidate alignments per read; reads with more hits are skipped (bwa/bwt1) or the already found hits are reported (bwt2) (default for bwa/bwt1:10, bwt2: 3)00
minsizeinteger>=110*minimum number of reads a genomic region has to contain for being regarded as a local context00
maxindelsizeinteger>=010*maximum allowed size of insertions or deletions (default: 10)00
gtffileabsolute*path to an annotation file in gtf format00
threadsstring*number of threads used for mapping00
localTmpFolderfolderabsolute/usr/local/storage/*path to a local storage that is used for temporary data00
miningbooleanfalse*enables the mining for infections or contaminations00
noclippingbooleanfalse*disables the calculation of clipped alignments00
noncanonicaljunctionsbooleanfalse*enables the prediction of non-canonical splice sites00
strandspecificbooleanfalse*enables strand specific mapping00
pairedendbooleanfalse*enables mapping of paired-end reads; nomenclature for mates from the same fragment: base_name/1 and base_name/2, respectively; only valid for versions earlier than 2.7.200
polyAbooleanfalse*enables the search for polyA-tails (mutually exclusive with --noclipping)00
verbosebooleanfalse*verbose mode00
keeptmpbooleanfalse*does not delete some temporary files00
sequenceDBbooleanfalse*sequence mapping to disk; recommended for very large data sets.00
memoryScaleFactorinteger[0,100]75*scale factor in percent that defines the proportion of the memory that is used for java; default memory: 3GB*threads*(scaleFactor/100)00
memoryPerThreadinteger3072*total memory per thread in MB if running on local host; otherwise memory limit of Watchdog executor might be set; default: 307200
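
The memory formula given for memoryScaleFactor, memory per thread times threads times (scaleFactor/100), works out as shown in this small Python sketch. The integer rounding and the idea of passing the result to Java as an `-Xmx` option are assumptions; the function names are hypothetical.

```python
def java_memory_mb(threads: int, memory_per_thread_mb: int = 3072,
                   scale_factor: int = 75) -> int:
    """Memory in MB that is handed to java according to the documented formula."""
    if not 0 <= scale_factor <= 100:
        raise ValueError("scale_factor must be in [0, 100]")
    return memory_per_thread_mb * threads * scale_factor // 100


def java_heap_option(threads: int, memory_per_thread_mb: int = 3072,
                     scale_factor: int = 75) -> str:
    """Format the value as a JVM heap option (assumed usage)."""
    return f"-Xmx{java_memory_mb(threads, memory_per_thread_mb, scale_factor)}m"
```

With the defaults and four threads, 3072 MB * 4 * 0.75 gives 9216 MB for java, leaving the remaining quarter for the aligner and other processes.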


Return values

name type description minV maxV
contextMapSAMFilestringpath to mapped SAM file00
contextMapPolyAFilestringpath to detected polyA tails00


Citation info

RNA-seq reads were mapped against the XXX genome using ContextMap (%SOFTWARE_VERSION%) with BWA as short read aligner and default parameters.

Pubmed references: 25928589,

File Utils
2019-02-11
copyFile

copies a given file to a new location.

copyFile

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

copies a given file to a new location.

Dependencies

  • Python3


Parameter

name type restrictions default occurrence description minV maxV
sourcePathfilefile exists1path of the file to copy00
targetPathfile1path of the new location of the file, all non-existing parent directories of the file are created00


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

General
2022-08-24
createBEDandSAF

creates bed and saf files given a tss file

createBEDandSAF

by Elena Weiß - version 1
version {@VERSION_LINKS@}

creates bed and saf files given a tss file

Dependencies

  • java


Parameter

name type restrictions default occurrence description minV maxV
gtffile1path to gtf file00
tssfile1path to tss file00
outdirfile1path to output dir00
namestring1name00
infoboolean*if info should be written00
bedboolean*if bed file should be written00
safboolean*if saf file should be written00
bedwindowboolean*if bedwindow should be written for scaled metagenes00
antisenseboolean*if experiment is antisense00
filterDistinteger*if distance to annotated tss should be limited00
noMappingboolean*if mapping should be avoided00
minDistboolean*minimum distance00
genelistfile*list of genes00


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

File Utils
2019-02-11
createFolder

creates a folder and its parent directories.

createFolder

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

creates a folder and its parent directories.

Dependencies

  • Python3


Parameter

name type restrictions default occurrence description minV maxV
folderPathfolder1folder that will be created00


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

Sequencing
2019-03-13
cutadapt

sequence adapters can be removed and sequences can be trimmed based on length or base-call quality scores

cutadapt

by Michael Kluge - version 1
version {@VERSION_LINKS@}

sequence adapters can be removed and sequences can be trimmed based on length or base-call quality scores

Dependencies

  • cutadapt
  • GNU Core Utilities


Parameter

name type restrictions default occurrence description minV maxV
fastqfile pathabsolute1path to one FASTQ file00
prim3string*adapter that was ligated at the 3' end; a '$' at the end anchors the adapter at the end of the read00
prim5string*adapter that was ligated at the 5' end; a '^' at the start anchors the adapter at the beginning of the read00
adapterstring*adapter that can be located at the 3' and 5' end00
errorRatedouble[0, 1]0.05*maximum allowed error rate00
repeatinteger[1, 100]1*try to remove adapters at most N times00
minOverlapinteger>06*minimum overlap length00
minLengthinteger[1, 100000]40*minimum read length after trimming00
maxLengthinteger[1, 100000]-1*maximum read length after trimming00
outfilefile pathabsolute1path to an output file00
infofilefile pathabsolute*path to a file which will contain trimming statistics00
shortenReadsinteger0*shorten reads to a maximal length after trimming; positive values keep the beginning of reads; negative ones the ends (starting from cutadapt version 1.17)00
cutFixedLengthinteger[-1000000, 1000000]0*trims a fixed length from the beginning (positive numbers) or the end of the reads (negative numbers)00
qualityCutoffdouble0*trims reads at the ends using a sliding window approach00
qualityBaseinteger33*base quality value00
noIndelsbooleanfalse*does not allow indels between read and adapter00
discardTrimmedbooleanfalse*discard sequences which were trimmed00
discardUntrimmedbooleanfalse*discard sequences which were not trimmed00
maskAdaptersbooleanfalse*does not cut the adapters but replace the corresponding regions with N00
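
The sign conventions of cutFixedLength and shortenReads are easy to mix up, because a positive value removes the beginning of the read in one case and keeps it in the other. The Python sketch below mimics the two operations on a plain sequence string. It only illustrates the semantics described in the table and is not how cutadapt is implemented; the function names are hypothetical.

```python
def cut_fixed_length(seq: str, n: int) -> str:
    """Remove n bases from the beginning (n > 0) or the end (n < 0) of a read."""
    if n > 0:
        return seq[n:]
    if n < 0:
        return seq[:n]
    return seq


def shorten_read(seq: str, n: int) -> str:
    """Keep at most |n| bases: the beginning for n > 0, the end for n < 0."""
    if n > 0:
        return seq[:n]
    if n < 0:
        return seq[n:]
    return seq
```

For the read `ACGTACGT`, `cut_fixed_length(seq, 2)` drops the first two bases, whereas `shorten_read(seq, 2)` keeps only the first two.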


Return values

name type description minV maxV
cutadaptTrimFilestringabsolute path to the trimmed output file00
cutadaptInfoFilestringabsolute path to a file containing statistical values00


Citation info

Cutadapt (%SOFTWARE_VERSION%) was used to remove adapters and trim sequences [Martin, Marcel. "Cutadapt removes adapter sequences from high-throughput sequencing reads." EMBnet.journal [Online], 17.1 (2011): pp. 10-12. Web. 14 Mar. 2019].

Pubmed references:

File Utils
2019-02-11
deleteFolder

deletes a folder and all its content.

deleteFolder

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

deletes a folder and all its content.

Dependencies

  • Python3


Parameter

name type restrictions default occurrence description minV maxV
folderfolderpath to existing folder1path to the folder that will be deleted00


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

sequencing
2023-03-23
downsampleSam

Performs downsampling of reads

downsampleSam

by Katharina Reinisch - version 1
version {@VERSION_LINKS@}

Performs downsampling of reads

Dependencies

  • Java
  • Picard


Parameter

name type restrictions default occurrence description minV maxV
inputfile1input file00
probabilitydouble1probability of keeping a read (pair)00
outputstring1output file00
pathToPicardstring1path to picard jar-file00
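
To illustrate what "probability of keeping a read (pair)" means, the following Python sketch decides per read name whether a read is kept, so both mates of a pair always share the same fate. It is a conceptual sketch only and does not reproduce Picard's implementation; the hashing scheme, the seed parameter and the function name `keep_read` are assumptions.

```python
import hashlib


def keep_read(read_name: str, probability: float, seed: int = 1) -> bool:
    """Decide deterministically whether a read (pair) is kept."""
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    # map the first 6 bytes of the hash to a number in [0, 1)
    fraction = int.from_bytes(digest[:6], "big") / 2 ** 48
    return fraction < probability
```

Because the decision depends only on the read name and the seed, repeated runs select the same reads, and mates stored in different places of the file stay together.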


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

Downsampling of reads was performed with the DownsampleSam command line tool of the Picard library.

Pubmed references:

General
2018-10-31
env

prints the currently set environment variables to the standard output stream

env

by Michael Kluge - version 1
version {@VERSION_LINKS@}

prints the currently set environment variables to the standard output stream

Dependencies

  • GNU env


Parameter

{@PARAMETER@}
name type restrictions default occurrence description minV maxV


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Sequencing
2018-10-31
fastQC

generates quality reports for sequencing data using fastQC

fastQC

by Michael Kluge - version 1
version {@VERSION_LINKS@}

generates quality reports for sequencing data using fastQC

Dependencies

  • fastQC (tested with 0.11.3)
  • GNU Core Utilities


Parameter

name type restrictions default occurrence description minV maxV
contaminantsfile pathabsolute*absolute path to a file containing non-default contaminants to screen for overrepresented sequences; format: name[TAB]sequence00
adaptersfile pathabsolute*absolute path to a file containing non-default adapters to screen against the library; format: name[TAB]sequence00
threadsinteger[1,128]1*number of threads to use; each will consume about 256 megabyte of memory00
fastqfile pathabsolute1absolute path to fastq file which should be analyzed00
limitsfile pathabsolute*absolute path to a file containing non-default limits for warnings/errors; must be in the same format as the limits.txt shipped with fastQC00
outdirfolder pathabsolute1absolute path to output folder00


Return values

{@RETURN_VALUES@}
name type description minV maxV


Citation info

Quality of the sequencing data was checked using FastQC (%SOFTWARE_VERSION%) [Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc].

Pubmed references:

sequencing
2019-10-16
fastqDump

Downloads fastq files from the NCBI Sequence Read Archive (SRA) using the SRA toolkit. First performs prefetch and then fastq-dump. Can optionally use Aspera client ascp for much faster download (Aspera client should be installed).

fastqDump

by Caroline Friedel - version 1
version {@VERSION_LINKS@}

Downloads fastq files from the NCBI Sequence Read Archive (SRA) using the SRA toolkit. First performs prefetch and then fastq-dump. Can optionally use Aspera client ascp for much faster download (Aspera client should be installed).

Dependencies

  • SRA toolkit


Parameter

name type restrictions default occurrence description minV maxV
sraIdstring1SRA id00
outputFolderfile1folder to which fastq files should be extracted00
pathToAsperafile*[optional] path to Aspera client to use Aspera to speedup download00
checkPresentbooleanfalse*[optional] check if files are already present in the output folder and the previous download was successful. Tests if the output fastq files exist, the log file from a previous download is present, the fastq files were created no later than the log file, and the log file shows a successful download.00


Return values

name type description minV maxV
isPairedEndbooleanIndicates whether paired-end (two fastq files) or single-end (one fastq file) sequencing data was downloaded00
readFile1stringpath to first fastq file00
readFile2stringpath to second fastq file (identical to first fastq file for single-end sequencing data)00


Citation info

Sequencing data was downloaded from the NCBI Sequence Read Archive (SRA) using the SRA toolkit (version (%SOFTWARE_VERSION%)) [Leinonen R, Sugawara H, Shumway M; International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 2011 Jan;39(Database issue):D19-21.]

Pubmed references: 21062823,

Sequencing
2019-03-13
featureCounts

reads or fragments per gene, exon or any other feature are counted using featureCounts

featureCounts

by Michael Kluge - version 2
version 1 2

reads or fragments per gene, exon or any other feature are counted using featureCounts

Dependencies

  • featureCounts (v. 1.4.6)
  • featureCounts (v. 1.6.1)
  • GNU Core Utilities


Parameter

name type restrictions default occurrence description minV maxV
annotationfile pathabsolute1feature annotation in GTF or SAF format00
inputfile pathabsolute1indexed BAM file which should be used for counting00
outputfile pathabsolute1path to output file00
annotationTypeenumSAF|GTF*disables automatic type detection based on the file ending of the input file; valid values: GTF or SAF00
featureTypestringexon*feature type (e.g. exon or intron) which is used for counting in GTF mode00
groupTypestringgene_id*attribute which is used for summarization in GTF mode00
strandedinteger0*indicates strand-specific read counting; possible values: 0 (unstranded), 1 (stranded) and 2 (reversely stranded)00
threadsinteger1*number of threads used for counting00
disableGroupSummarizationbooleanfalse*flag that can be used to turn summarization on groupType off00
multiMappingbooleanfalse*flag that enables counting of multi mapped reads00
primarybooleantrue*when enabled only alignments which are flagged as primary alignments are counted00
countFragmentsbooleanfalse*counts fragments instead of reads; only for paired end data00
multiCountMetaFeaturesbooleanfalse*allows a read to be counted for more than one meta-feature00
detailedReadAssignmentsbooleanfalse*saves for each read if it was assigned or not; filename: {input_file_name}.featureCounts; format: read name<TAB>status<TAB>feature name<TAB>number of counts for that read00
minOverlapinteger1*minimum number of overlapping bases required to assign a read to a feature; also negative values are allowed22
minReadOverlapinteger1*minimum number of overlapping bases required to assign a read to a feature; also negative values are allowed11
minFracOverlapdouble0*minimum fraction of overlapping bases in a read that is required to assign the read to a feature22
readExtension5integer0*extend reads at the 5' end22
readExtension3integer0*extend reads at the 3' end22
fractionbooleanfalse*count fractional; only in combination with the --assignToAllOverlappingFeatures or/and --multiMapping flag(s)22
largestOverlapbooleanfalse*assign reads to the meta-feature/feature that has the largest number of overlapping bases.22
longReadsbooleanfalse*mode for long read counting (e.g. Nanopore or PacBio)22
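
The per-read file written by detailedReadAssignments has four tab-separated columns (read name, status, feature name, number of counts). A minimal sketch of how such a file could be summarized is given below; the function name is hypothetical, and the status strings in the usage example are only illustrative values.

```python
from collections import Counter


def summarize_assignments(lines):
    """Count reads per assignment status from detailed read assignment lines."""
    status_counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # columns: read name, status, feature name, number of counts
        status_counts[fields[1]] += 1
    return status_counts


# illustrative rows; real files are named {input_file_name}.featureCounts
rows = [
    "r1\tAssigned\tgeneA\t1\n",
    "r2\tUnassigned_NoFeatures\tNA\t-1\n",
    "r3\tAssigned\tgeneB\t1\n",
]
summary = summarize_assignments(rows)
```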


Return values

name type description minV maxV
FeatureCountSummaryFilestringabsolute file path to the summary file00
FeatureCountCountFilestringabsolute file path to the count file00


Citation info

FeatureCounts (%SOFTWARE_VERSION%) was applied to count read/fragment counts per gene/exon/other feature according to %annotation§N% annotation [Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7):923-30, 2014].

Pubmed references: 24227677,

ChIP-seq
2019-02-11
filterBwaSampe

removes read pairs from sam/bam files created by bwa sampe

filterBwaSampe

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

removes read pairs from sam/bam files created by bwa sampe

Dependencies

  • python3
  • pysam package


Parameter

name type restrictions default occurrence description minV maxV
inSamBamfilefile exists, ending .sam or .bam1path to mapped paired reads in sam or bam format (recognized by file ending) created by bwa sampe00
outSamBamstring1path to write remaining paired reads in sam or bam format (recognized by file ending)00
removeUnmappedbooleanTrue*use this flag to remove pairs with at least one unmapped read00
removeImproperPairsbooleanTrue*use this flag to remove pairs that are not properly paired according to bwa sampe00
removeMapqBelowinteger>=020*remove all read pairs with at least one mate of mapping quality smaller than minQuality (taken from field "MAPQ" in SAM file), setting the option to 0 deactivates filtering based on mapping quality00
removeMoreThanOptimalHitsinteger>=01*remove all read pairs with more than maxHits optimal alignment positions for at least one mate (based on the bwa aln specific tag "X0"), setting the option to 0 deactivates filtering based on hit number00
isSingleEndbooleanFalse*use this flag to indicate that single end data should be filtered00
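
How the four filters combine for one read pair can be sketched as below. This is not the module's code: the function name and the dictionary layout of a mate are hypothetical, and plain dictionaries are used instead of pysam records to keep the example self-contained. The flag bits 0x2 (properly paired) and 0x4 (unmapped) are those defined by the SAM format.

```python
FLAG_PROPER_PAIR = 0x2  # SAM flag bit: each segment properly aligned
FLAG_UNMAPPED = 0x4     # SAM flag bit: segment unmapped


def keep_pair(mate1, mate2, remove_unmapped=True, remove_improper=True,
              min_mapq=20, max_hits=1):
    """Decide whether a read pair survives filtering.

    Each mate is a dict with keys 'flag', 'mapq' and 'x0' (hypothetical layout).
    A value of 0 for min_mapq or max_hits switches that filter off.
    """
    for mate in (mate1, mate2):
        if remove_unmapped and mate["flag"] & FLAG_UNMAPPED:
            return False
        if remove_improper and not mate["flag"] & FLAG_PROPER_PAIR:
            return False
        if min_mapq > 0 and mate["mapq"] < min_mapq:
            return False
        if max_hits > 0 and mate["x0"] > max_hits:
            return False
    return True
```

A pair is dropped as soon as one mate fails any active filter, which mirrors the "at least one mate" wording of the parameter descriptions.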


Return values

name type description minV maxV
filteredPairsfilepath of the sam/bam file with the remaining read pairs (same value as given in parameter outSamBam)00


Citation info

We removed read pairs with unmapped reads/ improper pair classification/ low mapping quality/ multi-mappings (adjust to options used)

Pubmed references:

Links

{@LINK_LIST@}

sequencing
2023-03-23
fseq

Identifies open chromatin regions from BAM files using F-Seq. For this purpose, BAM files are first converted to BED input format for F-Seq using bedtools.

fseq

by Katharina Reinisch - version 1
version {@VERSION_LINKS@}

Identifies open chromatin regions from BAM files using F-Seq. For this purpose, BAM files are first converted to BED input format for F-Seq using bedtools.

Dependencies

  • F-Seq
  • bedtools


Parameter

name type restrictions default occurrence description minV maxV
bamfile1bam file00
namestring1sample name used in output files00
dirfile1output directory00
pathToFseqstring1path to Fseq jar00
mergeDistinteger0*[optional] distance for merging00
heapSizeinteger-Xmx32000M*[optional] adjust JAVA OPTS heap size00


Return values

name type description minV maxV


Citation info

Open chromatin regions were determined using F-Seq [Boyle AP, Guinney J, Crawford GE, Furey TS. F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008 Nov 1;24(21):2537-8].

Pubmed references: 18784119,

SEQUENCING
2022-08-24
generateCoverageFiles

generates coverage files

generateCoverageFiles

by Elena Weiß - version 1
version {@VERSION_LINKS@}

generates coverage files

Dependencies

  • binGenome


Parameter

name type restrictions default occurrence description minV maxV
outputDirfile1path to output folder00
bedgraphTablefile1path to table with bedgraph paths00
bedfilefile1path to bed file00
binsinteger1number of bins to divide region00
fixedBinSizeUpstreamstring*[optional] can be used to create fixed bins upstream; format: 'binsize:binnumber'00
fixedBinSizeDownstreamstring*[optional] can be used to create fixed bins downstream; format: 'binsize:binnumber'00
factorstring0-null[optional] factor to generate files for only that factor00
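
The 'binsize:binnumber' format of fixedBinSizeUpstream and fixedBinSizeDownstream could be parsed as in the sketch below. The helper names are hypothetical, and the assumption that the bin size is given in base pairs is not stated in the parameter description.

```python
def parse_fixed_bins(spec):
    """Split a 'binsize:binnumber' string into two positive integers."""
    size_text, number_text = spec.split(":")
    bin_size, bin_number = int(size_text), int(number_text)
    if bin_size <= 0 or bin_number <= 0:
        raise ValueError(f"bin size and bin number must be positive: {spec!r}")
    return bin_size, bin_number


def fixed_region_length(spec):
    """Total length covered by the fixed bins (bin size times bin number)."""
    bin_size, bin_number = parse_fixed_bins(spec)
    return bin_size * bin_number
```

For example, '50:20' would describe 20 bins of size 50.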


Return values

name type description minV maxV
coverageFilesstringpath to coverage files00
bednamestringname of bed file00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

SEQUENCING
2022-08-24
generateMetagenePlots

generates metagene plots

generateMetagenePlots

by Elena Weiß - version 1
version {@VERSION_LINKS@}

generates metagene plots

Dependencies

  • binGenome
  • sharedUtils


Parameter

name type restrictions default occurrence description minV maxV
bedgraphTablefile1path to bedgraph table00
geneliststring*list of genes to consider00
experimentstring*type of experiment00
metaFrameinteger1frame to plot00
binsinteger1number of bins00
aggregateFUNstring1function to aggregate00
normShapeSumboolean1how to norm shape00
normLibSizeboolean1how to norm lib size00
normBinLengthboolean1how to norm bin length00
wilcoxboolean*should wilcox test be done00
factorstring0-nullwhich factor to consider00
coverageFilesfile1path to coverage files00
bednamestring1name of bed file00
plotnamestring*name of plot00
configfile1path to config file00
clusterPositionsfile*positions to draw line00


Return values

name type description minV maxV
generateMetagenePlotsOutputFolderstringpath where metagene plot is00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

sequencing
2019-02-11
genomeCoverage

converts bam format to bedgraph and tdf format

genomeCoverage

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

converts bam format to bedgraph and tdf format

Dependencies

  • python3
  • bedtools
  • igvtools


Parameter

name type restrictions default occurrence description minV maxV
bamfilefile exists, BAM format, reads sorted by coordinates1path to bam file whose genome coverage should be analyzed00
genomefilefile exists, ending *.genome for an IGV genome file or ending *.chrom.sizes for a simple text file with genome sizes*genome file or file with chromosome sizes for the genome that was used to create the bam file, the file is required only if the tdf option is set00
outPrefixstring1file name prefix for saving the bedgraph file (outPrefix.bedgraph) and the tdf file (outPrefix.bedgraph.tdf)00
tdfbooleantrue*transform bedgraph file into tdf format using igvtools00
bedtoolsPathfileexisting executablebedtools (in PATH)*path to bedtools executable, use if bedtools is not in PATH00
igvtoolsPathfileexisting executableigvtools (in PATH)*path to igvtools executable, use if igvtools is not in PATH00


Return values

name type description minV maxV


Citation info

We created files for visualizing mapped reads with bedtools and igvtools.

Pubmed references: 20110278, 21221095,

General
2022-08-24
getSplittedTables

splits a bedgraph table so that it can be processed in parallel

getSplittedTables

by Elena Weiß - version 1
version {@VERSION_LINKS@}

splits a bedgraph table so that it can be processed in parallel

Dependencies

    {@DEPENDENCIES@}


Parameter

name type restrictions default occurrence description minV maxV
outputDirfile1path to output folder00
tablefile1line entry of bedgraph table00
forstring1gives type coverage or metagenes to split table into00
factorstring0-null[optional] factor to generate files for only that factor00


Return values

name type description minV maxV
liststringdir where tables are written00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

General
2018-10-31
grep

extracts information from text files using the exact or regex-based search of grep

grep

by Michael Kluge - version 1
version {@VERSION_LINKS@}

extracts information from text files using the exact or regex-based search of grep

Dependencies

  • GNU grep
  • GNU Core Utilities


Parameter

name type restrictions default occurrence description minV maxV
outputFilefile pathabsolute1absolute path to file in which the output of grep is written00
filefile pathabsolute1absolute path to file to use as search input00
optionsstring*additional flags or parameters that are directly delivered to grep00
patternstring1pattern to search for; can also be a regex if parameter -P is set00


Return values

name type description minV maxV
grepResultFilestringpath to the output file00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

RNA-seq
2019-02-11
gseaPreranked

performs gene set enrichment analysis with GSEAPreranked

gseaPreranked

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

performs gene set enrichment analysis with GSEAPreranked

Dependencies

  • python3
  • java8
  • GSEA


Parameter

name type restrictions default occurrence description minV maxV
gseaJarfile1Path of the GSEA jar file00
labelstring1name of the analysis, e.g. sample name00
outdirstring1directory to store the results of GSEA00
geneTabfilefile exists1tab-separated table of genes with expression values/changes00
hasHeaderbooleanFalse*indicates if the first line of the geneTab should be interpreted as header00
geneColinteger>=00*0-based position of the column with gene names00
rankColinteger>=01*0-based position of the column with values to rank the genes, e.g. fold changes00
genesetstringallowed values: go, hallmark, transcription_factor, oncogenic_signatures, immunologic_signatureshallmark*gene sets to test for enrichment00
genesetVersionstring6.1*version of MSigDB to use00
scoringstringallowed values: weighted, unweightedunweighted*unweighted: classic score based on ranks, weighted: score includes values used for ranking00
plotNrinteger>050*create plots for "plot_nr" top scoring genes00
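
The interplay of hasHeader, geneCol and rankCol (both 0-based) can be illustrated with the sketch below. The function name is hypothetical, and sorting by descending value is an assumption made for the example, not a statement about what the module passes to GSEA.

```python
import csv
import io


def read_ranked_genes(handle, gene_col=0, rank_col=1, has_header=False):
    """Return (gene, value) pairs sorted by descending value."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header line
    pairs = [(row[gene_col], float(row[rank_col])) for row in reader if row]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# small in-memory table standing in for geneTab
table = io.StringIO("gene\tlog2fc\nA\t0.5\nB\t2.0\nC\t-1.0\n")
ranked = read_ranked_genes(table, has_header=True)
```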


Return values

name type description minV maxV


Citation info

We performed gene set enrichment analysis with GSEAPreranked.

Pubmed references: 16199517,

SEQUENCING
2019-08-08
gtf2info

extracts information on genes and exons from GTF files and stores it in CSV format

gtf2info

by Michael Kluge - version 1
version {@VERSION_LINKS@}

extracts information on genes and exons from GTF files and stores it in CSV format

Dependencies

  • perl


Parameter

name type restrictions default occurrence description minV maxV
gtffileabsolute1path to the GTF file00
outputfileabsolute1path to the output file; for exons suffix '.exons' is added00


Return values

name type description minV maxV
geneInfoFilestringabsolute path to the resulting CSV file00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

General
2018-10-31
gzip

compresses and decompresses files using gzip; is able to verify file integrity using a md5 checksum file

gzip

by Michael Kluge - version 1
version {@VERSION_LINKS@}

compresses and decompresses files using gzip; is able to verify file integrity using a md5 checksum file

Dependencies

  • GNU gzip or pigz
  • GNU Core Utilities


Parameter

name type restrictions default occurrence description minV maxV
outputabsolute file path${input}.gz*path to output file00
inputfile pathabsolute1path to input file00
decompressbooleanfalse*decompress the input file instead of compressing it00
verifybooleantrue*verify file integrity after decompression using the md5 checksum file00
oldPathMd5file pathabsolute*path where the file was stored when the md5 checksum was created00
limitLinesinteger[1,]*extract only the first N lines00
deletebooleanfalse*delete the file after compression was performed; enforces integrity check00
md5file pathabsolute*path to md5 checksum file to verify file integrity after decompression00
qualityinteger[1,9]9*compression quality ranging from 1 to 9; 9 being the slowest but best compression00
binaryNameenumgzip*name of the gzip binary; possible values: 'gzip' or 'pigz'00
threadsinteger[1,128]1*number of cores to use; only possible if 'pigz' is used as binary00
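
The idea behind the verify and md5 parameters, checking a decompressed file against a stored MD5 checksum, can be sketched with the Python standard library. The function names are hypothetical, and the sketch assumes the checksum refers to the uncompressed file; it does not reproduce the module's handling of checksum files or of oldPathMd5.

```python
import gzip
import hashlib
import shutil


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compress(src, dst, quality=9):
    """Gzip-compress src to dst and return the MD5 of the uncompressed input."""
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=quality) as fout:
        shutil.copyfileobj(fin, fout)
    return md5_of(src)


def decompress_and_verify(src, dst, expected_md5):
    """Decompress src to dst and check the result against expected_md5."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return md5_of(dst) == expected_md5
```

Recording the checksum before compression is what makes deleting the input afterwards safe, which fits the note that delete enforces an integrity check.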


Return values

name type description minV maxV
processedGzipFilestringpath to the input file00
createdGzipFilestringpath to the output file00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Sequencing
2019-03-13
indexBam

creates an index for a BAM file using samtools index

indexBam

by Michael Kluge - version 1
version {@VERSION_LINKS@}

creates an index for a BAM file using samtools index

Dependencies

  • samtools
  • GNU Core Utilities


Parameter

name type restrictions default occurrence description minV maxV
bamfile pathabsolute1path to the BAM file00
linkbooleantrue*creates a link called NAME.bam.bai because some tools expect the index under that name; use --nolink to disable it00


Return values

name type description minV maxV
BAMFilestringpath to the BAM file for which the index was created00


Citation info

Samtools (%SOFTWARE_VERSION%) was used to index the BAM files [Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup, The Sequence alignment/map (SAM) format and SAMtools, Bioinformatics (2009) 25(16) 2078-9].

Pubmed references: 19505943,

Sequencing
2022-03-23
insertSizeMetrics

This tool provides useful metrics for validating library construction including the insert size distribution and read orientation of paired-end libraries. The expected proportions of these metrics vary depending on the type of library preparation used, resulting from technical differences between paired-end libraries and mate-pair libraries. For a brief primer on paired-end sequencing and mate-pair reads, see the GATK Dictionary. The CollectInsertSizeMetrics tool outputs the percentages of read pairs in each of the three orientations (FR, RF, and TANDEM) as a histogram. In addition, the insert size distribution is output as both a histogram (.insert_size_Histogram.pdf) and as a data table (.insert_size_metrics.txt). Note: Metrics labeled as percentages are actually expressed as fractions!

insertSizeMetrics

by Caroline Friedel - version 1
version {@VERSION_LINKS@}

This tool provides useful metrics for validating library construction including the insert size distribution and read orientation of paired-end libraries. The expected proportions of these metrics vary depending on the type of library preparation used, resulting from technical differences between paired-end libraries and mate-pair libraries. For a brief primer on paired-end sequencing and mate-pair reads, see the GATK Dictionary. The CollectInsertSizeMetrics tool outputs the percentages of read pairs in each of the three orientations (FR, RF, and TANDEM) as a histogram. In addition, the insert size distribution is output as both a histogram (.insert_size_Histogram.pdf) and as a data table (.insert_size_metrics.txt). Note: Metrics labeled as percentages are actually expressed as fractions!

Dependencies

  • picard
  • java


Parameter

name type restrictions default occurrence description minV maxV
Histogram_FILEfile1File to write insert size Histogram chart to. Required.00
INPUTfile1Input SAM/BAM/CRAM file. Required.00
OUTPUTfile1The file to write the output to. Required.00
arguments_filefile0-null[optional] read one or more arguments files and add them to the command line This argument may be specified 0 or more times. Default value: null.00
COMPRESSION_LEVELinteger*[optional] Compression level for all compressed files created (e.g. BAM and VCF). Default value: 5.00
DEVIATIONSdouble*[optional] Generate mean, sd and plots by trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and sd grossly misleading regarding the real distribution. Default value: 10.0.00
GA4GH_CLIENT_SECRETSstring*[optional] Google Genomics API client_secrets.json file path. Default value: client_secrets.json.00
HISTOGRAM_WIDTHWinteger*null00
MAX_RECORDS_IN_RAMinteger*[optional] When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed. Default value: 500000.00
METRIC_ACCUMULATION_LEVELstring*[optional] The level(s) at which to accumulate metrics. This argument may be specified 0 or more times. Default value: [ALL_READS]. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP}00
MIN_HISTOGRAM_WIDTHinteger*[optional] Minimum width of histogram plots. In the case when the histogram would otherwise be truncated to a shorter range of sizes, the MIN_HISTOGRAM_WIDTH will enforce a minimum range. Default value: null. 00
MINIMUM_PCTdouble*[optional] When generating the Histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this percentage of overall reads. (Range: 0 to 1). Default value: 0.05.00
REFERENCE_SEQUENCEfile*[optional] Reference sequence file. Default value: null.00
STOP_AFTERinteger*[optional] Stop after processing N reads, mainly for debugging. Default value: 0.00
TMP_DIRfile0-null[optional] One or more directories with space available to be used by this program for temporary storage of working files This argument may be specified 0 or more times. Default value: null.00
VALIDATION_STRINGENCYstringSTRICT*[optional] Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. Possible values: {STRICT, LENIENT, SILENT} 00
VERBOSITYstring*[optional] Control verbosity of logging. Default value: INFO. Possible values: {ERROR, WARNING, INFO, DEBUG}00
ASSUME_SORTEDbooleantrue*[optional] If true (default), then the sort order in the header file will be ignored. Default value: true. Possible values: {true, false}00
CREATE_INDEXbooleanfalse*[optional] Whether to create an index when writing VCF or coordinate sorted BAM output. Default value: false. Possible values: {true, false}00
CREATE_MD5_FILEbooleanfalse*[optional] Whether to create an MD5 digest for any BAM or FASTQ files created. Default value: false. Possible values: {true, false}00
INCLUDE_DUPLICATESbooleanfalse*[optional] If true, also include reads marked as duplicates in the insert size histogram. Default value: false. Possible values: {true, false}00
QUIETbooleanfalse*[optional] Whether to suppress job-summary info on System.err. Default value: false. Possible values: {true, false}00
USE_JDK_DEFLATERbooleanfalse*[optional] Use the JDK Deflater instead of the Intel Deflater for writing compressed output. Default value: false. Possible values: {true, false}00
USE_JDK_INFLATERbooleanfalse*[optional] Use the JDK Inflater instead of the Intel Inflater for reading compressed input. Default value: false. Possible values: {true, false}00
versionbooleanfalse*[optional] display the version number for this tool00
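
The trimming rule described for DEVIATIONS (keep data up to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION before computing mean and sd) can be illustrated with a small sketch. The function name is hypothetical and the code is not taken from picard, whose implementation may differ in detail.

```python
from statistics import mean, median


def trimmed_insert_sizes(sizes, deviations=10.0):
    """Keep insert sizes up to MEDIAN + deviations * MEDIAN_ABSOLUTE_DEVIATION."""
    med = median(sizes)
    mad = median(abs(size - med) for size in sizes)
    cutoff = med + deviations * mad
    return [size for size in sizes if size <= cutoff]


sizes = [190, 200, 200, 210, 5000]  # one chimeric outlier
kept = trimmed_insert_sizes(sizes)
print(mean(sizes), mean(kept))  # mean before and after trimming
```

A single anomalous pair is enough to distort the untrimmed mean, which is the motivation given in the parameter description.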


Return values

name type description minV maxV
outputHistogramFilestringoutput file containing the histogram of insert sizes00
outputBamFilestringtxt file containing insert size metrics00


Citation info

Insert size metrics were calculated with the picard library (%SOFTWARE_VERSION%).

Pubmed references:

General
2019-03-13
joinFiles

joins two or more files together

joinFiles

by Michael Kluge - version 1
version {@VERSION_LINKS@}

joins two or more files together

Dependencies

  • GNU cat
  • GNU Core Utilities


Parameter

name type restrictions default occurrence description minV maxV
inputstring1-multiple input files (or input folders) in the order in which they should be joined; in pattern mode (--pattern) folder path(s) are expected00
outputfile pathabsolute1path to output file00
convertPairedEndbooleanfalse*special flag for joining of FASTQ files; adds /1 and /2 at the end of read names if casava format 1.8 or greater is used; default: disabled00
patternstring0-nullone or more Unix file patterns (e.g. *.txt) that are used to find matching files; one pattern corresponds to one input folder path; the order of the files to join cannot be influenced00
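
The header rewrite behind convertPairedEnd (appending /1 or /2 to read names in Casava 1.8 or newer format) might look like the sketch below. The regular expression and function name are assumptions for illustration; the module's actual conversion, for example whether it keeps the remaining header fields, is not documented here.

```python
import re

# Casava >= 1.8 header: "@<read id> <mate>:<filtered>:<control>:<index>"
CASAVA_18 = re.compile(r"^(@\S+) ([12]):[YN]:\d+:\S*$")


def convert_header(line):
    """Rewrite a Casava 1.8+ FASTQ header to the older '/1' or '/2' style."""
    stripped = line.rstrip("\n")
    match = CASAVA_18.match(stripped)
    if match is None:
        return stripped  # not a Casava 1.8+ header, keep unchanged
    read_id, mate = match.groups()
    return f"{read_id}/{mate}"
```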


Return values

name type description minV maxV
joinedFilestringabsolute file path to the joined file00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Sequencing
2018-10-31
leon

LEON is a reference-free method to compress high throughput sequencing data

leon

by Michael Kluge - version 1
version {@VERSION_LINKS@}

LEON is a reference-free method to compress high throughput sequencing data

Dependencies

  • LEON (tested with 1.0.0)


Parameter

name type restrictions default occurrence description minV maxV
inputfile pathabsolute1absolute path to input file; supported file formats: compress: *.fastq or *.fq; decompress: *.leon.tar00
threadsinteger1*number of cores to use00
kmerSizeinteger31*k-mer size that is used for compression00
outputFolderfolder pathabsolute1path to folder in which the compressed file is stored; resulting file will have *.leon.tar or *.fastq ending00
workingDirfolder pathabsolute/usr/local/storage/*path to working directory00


Return values

name type description minV maxV
createdFilestringpath to the compressed or decompressed file00


Citation info

Sequencing data was (de-)compressed using LEON (%SOFTWARE_VERSION%) [G. Benoit, C. Lemaitre, D. Lavenier, E. Drezen, T. Dayris, R. Uricaru, G. Rizk. (2015) Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinformatics, 2015, 16:288.].

Pubmed references: 26370285,

General
2019-11-12
listFiles

lists files in directories based on pattern

listFiles

by Michael Kluge - version 1
version {@VERSION_LINKS@}

lists files in directories based on pattern

Dependencies

  • GNU Core Utilities
  • GNU findutils


Parameter

name type restrictions default occurrence description minV maxV
folderfolderabsolute1-one or more input folders; one for each pattern00
outputfile pathabsolute*write results to a file; one line per found file00
sepstring,*separator between entries00
maxdepthinteger0*descend at most n levels of folders00
patternstring1-one or more Unix file patterns (e.g. *.txt) that are used to find matching files; one pattern corresponds to one input folder path00
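
The pairing of folders with patterns and the joining of results with sep can be sketched in Python as follows. The function name is hypothetical, and the depth convention (1 means files directly inside the folder, None means no limit) is an assumption of this sketch; how the module interprets its default of 0 is not stated above.

```python
import fnmatch
import os


def list_files(folders, patterns, sep=",", maxdepth=None):
    """Find files matching one pattern per folder and join them with sep."""
    if len(folders) != len(patterns):
        raise ValueError("expected exactly one pattern per folder")
    found = []
    for folder, pattern in zip(folders, patterns):
        base = os.path.abspath(folder)
        for root, dirs, files in os.walk(base):
            dirs.sort()  # deterministic traversal order
            rel = os.path.relpath(root, base)
            # files directly in the folder are at level 1 (assumed convention)
            level = 1 if rel == "." else rel.count(os.sep) + 2
            if maxdepth is not None and level >= maxdepth:
                dirs[:] = []  # do not descend any further
            if maxdepth is not None and level > maxdepth:
                continue
            for name in sorted(fnmatch.filter(files, pattern)):
                found.append(os.path.join(root, name))
    return sep.join(found)
```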


Return values

name type description minV maxV
foundFilesstringfound files joined with the separator00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

sequencing
2019-02-11
mappingSummary

summarizes read counts remaining after different analysis steps of sequencing data

mappingSummary

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

summarizes read counts remaining after different analysis steps of sequencing data

Dependencies

  • python3
  • matplotlib
  • seaborn


Parameter

name type restrictions default occurrence description minV maxV
basicStatsSummaryfilefile exists, output of mergeStatistics module*Output of the Watchdog Module mergeStatistics applied on the Basic Statistics reported by FASTQC (tab-separated table, column 0: type of count, column 1: read count, column 2: file name)00
rawRegexstringvalid regular expression in python re*regular expression with one group expression to extract the sample name from the name of a fastq file with untrimmed reads00
trimRegexstringvalid regular expression in python re*regular expression with one group expression to extract the sample name from the name of a fastq file with trimmed reads00
idxstatsSummaryfilefile exists, output of mergeStatistics module*Output of the Watchdog Module mergeStatistics applied on the Idxstatistics reported by the bamstats module (tab-separated table, column 0: chromosome, column 2: read count, column 4: file name)00
bamRegexstringvalid regular expression in python re*regular expression with one group expression to extract the sample name from the name of a bam file with mapped reads00
chromosomeGroupingTablefile*tab-separated table with a header with chromosome names in column 0 and groups in column 100
countTablestring1path for writing a table with all extracted read counts00
countPlotstring*path for saving a summary plot of total, trimmed and mapped reads, format is identified by file ending, all formats supported by pyplot are allowed00
groupPlotstring*path for saving a summary plot of the fraction of mapped reads for given groups of chromosomes, format is identified by file ending, all formats supported by pyplot are allowed00
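
rawRegex, trimRegex and bamRegex each expect a regular expression with exactly one group that captures the sample name. A minimal sketch of such an extraction with the Python re module follows; the function name, the example file names and the choice to match against the base name of the path are assumptions.

```python
import os
import re


def sample_name(path, regex):
    """Extract the sample name via the single capture group of regex."""
    pattern = re.compile(regex)
    if pattern.groups != 1:
        raise ValueError("regular expression must contain exactly one group")
    match = pattern.search(os.path.basename(path))
    return match.group(1) if match else None
```

With this sketch, a pattern such as `(.+)_R1\.fastq` would map a file named S1_R1.fastq.gz to the sample S1, so that raw, trimmed and mapped counts of the same sample can be lined up.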


Return values

name type description minV maxV


Citation info

We created figures summarizing the number of reads in our sequencing experiments before and after adapter removal and mapping.

Pubmed references:

Links

{@LINK_LIST@}

sequencing
2019-10-15
mergeBam

merges 2 or more bam files using samtools

mergeBam

by Caroline Friedel - version 1
version {@VERSION_LINKS@}

merges 2 or more bam files using samtools

Dependencies

  • samtools


Parameter

name type restrictions default occurrence description minV maxV
infilefile2-input bam file(s)00
outfilefile1output bam file00


Return values

name type description minV maxV
mergedBamFilestringoutput bam file (= value for parameter outfile)00


Citation info

bam files were merged using samtools (Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9)

Pubmed references: 19505943,

SEQUENCING
2019-08-08
mergeFeatureCounts

combines the output of multiple featureCounts runs in one CSV file

mergeFeatureCounts

by Michael Kluge - version 1
version {@VERSION_LINKS@}

combines the output of multiple featureCounts runs in one CSV file

Dependencies

  • GNU Core Utilities


Parameter

name type restrictions default occurrence description minV maxV
searchFolderfileabsolute1path to the folder in which *.counts files are located00
outputfileabsolute1path to the output file00
statsFolderfileabsolute*path to merged statistic folder required for plotting00
featureAnnotationfileabsolute*annotation file which is joined with the count file00
featureAnnotationIDstringGeneid*name of the column which is used for joining00
featureAnnotationTypestringtype*name of the column in the annotation file for which a distribution plot is created00
featureAnnotationExonLengthstringexon_length*name of the column that contains the exon length of the features00
noPlottingbooleanfalse*disables the execution of R scripts00
prefixNamesbooleanfalse*prefixes the names of the features with continuous numbers00


Return values

name type description minV maxV
mergedCountFilestringabsolute path to the merged count file in CSV format00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Sequencing
2019-03-15
mergeStatistics

takes a folder containing BAM statistics generated by the bamstats module and generates table-formatted files

mergeStatistics

by Michael Kluge - version 1
version {@VERSION_LINKS@}

takes a folder containing BAM statistics generated by the bamstats module and generates table-formatted files

Dependencies

  • java11


Parameter

name type restrictions default occurrence description minV maxV
typestring1type of the statistic merger that should be called; allowed values: FastQC, Star, BamstatsMerger, CutadaptMerger, FeatureCounts, FlagstatMerger00
inputDirfolder pathabsolute1path to input folder00
outputDirfolder pathabsolute1path to output folder00


Return values

name type description minV maxV
mergedFilestringabsolute path to the merged file00
mergedTypestringtype of the merger (parameter: type)00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

SEQUENCING
2022-08-24
normalizeCPM

normalizes CPM

normalizeCPM

by Elena Weiß - version 1
version {@VERSION_LINKS@}

normalizes CPM

Dependencies

  • binGenome
  • sharedUtils


Parameter

name type restrictions default occurrence description minV maxV
sumsfile1files to sum00
countsfile1file of counts00
outputFilefile1path to output file00
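
For orientation, the usual definition of counts per million (CPM) divides each count by the library size and multiplies by one million. The sketch below shows that standard formula only; the function name and the dictionary input are hypothetical, and how the module derives the library size from the sums file is not documented here.

```python
def counts_per_million(counts, library_size):
    """Scale raw counts to counts per million (CPM)."""
    if library_size <= 0:
        raise ValueError("library size must be positive")
    return {feature: count * 1_000_000 / library_size
            for feature, count in counts.items()}
```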


Return values

name type description minV maxV
normedCountsstringfile of normed counts00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

SEQUENCING
2022-08-24
pausingIndex

computes pausing index for given window frames

pausingIndex

by Elena Weiß - version 1
version {@VERSION_LINKS@}

computes pausing index for given window frames

Dependencies

  • createBEDandSAF
  • featureCounts


Parameter

name type restrictions default occurrence description minV maxV
outputDirfile1path to output folder00
gtffile1path to gtf file00
bamfile1path to bam file00
promStartinteger*start position of promoter window00
promEndinteger*end position of promoter window00
bodyStartinteger*start position of body window00
bodyLengthinteger*length of body window00
geneliststring1list of genes to consider00
tssfile1path to tss file00
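
A common definition of the pausing index is the read density in the promoter window divided by the read density in the gene body window. The sketch below uses that length-normalized definition as an assumption; the module's exact formula and its handling of genes without gene body signal are not described above, and the function name is hypothetical.

```python
def pausing_index(prom_reads, prom_length, body_reads, body_length):
    """Ratio of promoter read density to gene body read density."""
    if prom_length <= 0 or body_length <= 0:
        raise ValueError("window lengths must be positive")
    body_density = body_reads / body_length
    if body_density == 0:
        return float("nan")  # undefined without gene body signal
    return (prom_reads / prom_length) / body_density
```

In this sketch, 100 reads in a 100 bp promoter window and 200 reads in a 2000 bp body window give densities of 1.0 and 0.1 reads per bp.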


Return values

name type description minV maxV
pausingindicesstringdir where pausing indices are computed00


Citation info

{@PAPER_DESC@}

Pubmed references: {@PMID_LIST@}

Links

{@LINK_LIST@}

ChIP-seq
2019-02-11
phantomPeak

analyzes strand cross-correlation in mapped reads from ChIP-seq experiments

phantomPeak

by Sophie Friedl - version 1
version {@VERSION_LINKS@}

analyzes strand cross-correlation in mapped reads from ChIP-seq experiments

Dependencies

  • python3
  • R >=3.1
  • spp (phantompeakqualtools)


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
inBam | string | valid file path, bam format, ending *.bam | | 1 | Path to the bam file with mapped ChIP-seq reads. An index of the bam file is not required. | 0 | 0
outPrefix | string | | | 1 | Common prefix of all output files. The module produces 3 files: outPrefix.txt (summary file), outPrefix.pdf (cross-correlation plot) and outPrefix.Rdata (R session of the analysis). | 0 | 0
sppPath | string | valid file path to the script run_spp.R | | 1 | Path to executable (R script) of phantompeakqualtools which is usually called run_spp.R | 0 | 0
rscriptPath | string | valid file path, executable | Rscript in PATH variable | * | Path to executable Rscript if not given in PATH variable | 0 | 0
tmpdir | string | path to existing folder | return value of the tempdir() function of R | * | Folder for writing temporary files. The tool copies the whole bam file to this location. All temporary files are extended with a random suffix. | 0 | 0
threads | integer | >=1 | 1 | * | Number of threads used for the calculations | 0 | 0


Return values

name type description minV maxV


Citation info

Phantompeakqualtools were used to perform quality control of the mapped ChIP-seq reads.

Pubmed references: 22955991

SEQUENCING
2023-01-09
preDexseq

collects single amss files and creates annotation files for featureCounts and DEXSeq

preDexseq

by Elena Weiß - version 1

collects single amss files and creates annotation files for featureCounts and DEXSeq

Dependencies

  • sharedUtils


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
indir | file | | | 1 | input directory | 0 | 0
annot | file | | | 1 | annotation file name to write in | 0 | 0
annot_fc | file | | | 1 | annotation file to write in for featurecounts | 0 | 0


Return values

name | type | description | minV | maxV
out | string | path to output directory | 0 | 0



SEQUENCING
2023-01-09
quantCurveScore

computes output score table

quantCurveScore

by Elena Weiß - version 1

computes output score table

Dependencies

  • sharedUtils


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
controlCondition | string | | | 1 | name of control condition | 0 | 0
testCondition | string | | | 1 | name of test condition | 0 | 0
sampleAnnotation | string | | | 1 | path to sample annotation file with conditions | 0 | 0
out | file | | | 1 | output directory | 0 | 0


Return values

name | type | description | minV | maxV
out | string | output directory | 0 | 0



sequencing
2023-03-23
readthrough

Calculates readthrough and readin values and optionally downstream FPKM and expression in dOCR regions

readthrough

by Caroline Friedel - version 1

Calculates readthrough and readin values and optionally downstream FPKM and expression in dOCR regions

Dependencies

  • java
  • picard (jar included with module)
  • Apache Commons CLI library (jar included with module)


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
annotation | file | | | 1 | annotation file path | 0 | 0
genecounts | file | | | 1 | gene read count file | 0 | 0
input | file | | | 1 | input file | 0 | 0
output | file | | | 1 | output file | 0 | 0
readthroughLength | integer | | 5000 | * | [optional] length of downstream window in which read-through is calculated | 0 | 0
readinLength | integer | | 5000 | * | [optional] length of upstream window in which read-in is calculated | 0 | 0
strandedness | integer | | 0 | * | strandedness: 0=not strandspecific, 1=first read indicates strand, 2=second read indicates strand | 0 | 0
overlap | integer | | 25 | * | [optional] minimum overlap of read to be counted for read-through/in window | 0 | 0
idxstats | file | | | * | [optional] idxstats file with numbers of mapped reads per chromosome, necessary for calculating downstream FPKM and transcription in dOCR regions | 0 | 0
normFactor | string | | | * | [optional] factor for normalizing by mapped reads and gene length for downstream FPKM calculation | 0 | 0
exclude | string | | | * | [optional] chromosomes to exclude from calculating total mapped reads, separated by , | 0 | 0
excludeType | string | | | * | [optional] gene types to exclude when determining genes with no other genes up- or down-stream, separated by , | 0 | 0
dOCRFile | string | | | * | [optional] file containing dOCR lengths | 0 | 0
windowLength | integer | | 1000 | * | [optional] number of steps for evaluating transcription on dOCRs | 0 | 0
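
To make the interplay of readthroughLength and overlap more concrete, the following Python sketch derives a downstream window for a gene and checks whether a read overlaps it sufficiently. It is an illustration only: it assumes 1-based inclusive coordinates, all function names are hypothetical, and the module's Java implementation may work differently.

```python
def downstream_window(gene_start, gene_end, strand, length=5000):
    """Window of `length` bp directly downstream of the gene's 3' end.

    Assumes 1-based inclusive coordinates; hypothetical helper.
    """
    if strand == "+":
        return gene_end + 1, gene_end + length
    # minus strand: downstream lies to the left of the gene start
    return max(1, gene_start - length), gene_start - 1


def overlap(read_start, read_end, win_start, win_end):
    """Number of bases shared by a read and a window (0 if disjoint)."""
    return max(0, min(read_end, win_end) - max(read_start, win_start) + 1)


def counts_for_window(read_start, read_end, window, min_overlap=25):
    """True if the read overlaps the window by at least `min_overlap` bp."""
    return overlap(read_start, read_end, *window) >= min_overlap
```

The read-in window would be the mirror image (upstream of the gene start on the plus strand), using readinLength instead.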


Return values

name type description minV maxV


Citation info

Read-through was calculated as previously described in Hennig T et al., 2018, PLOS Pathogens 14(3): e1006954.

Pubmed references: 29579120


SRA mining
2020-12-02
recountReadout

calculates readout for every sample in a project from recount.

recountReadout

by Sophie Friedl - version 1

calculates readout for every sample in a project from recount.

Dependencies

  • Python3
  • R 3.5.x
  • R packages recount and recount.bwtool


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
projectID | string | mutually exclusive with projectFile | | * | project ID of an SRA project indexed in recount2; several project IDs can be passed, separated by ',' | 0 | 0
projectFile | file | file exists, mutually exclusive with projectID | | * | file with one line giving project IDs (file content = all allowed values for projectID) | 0 | 0
geneTSV | file | file exists | | 1 | tab-separated file with genes, coordinates, exonic basepairs and upstream and downstream regions (requires a line with column names chr, geneid, exonic_bps, upstream_start, upstream_end, downstream_start and downstream_end) | 0 | 0
outfolder | folder | | | 1 | folder for saving final results, creates a subfolder for the project with a table of coverage values for every sample in the project | 0 | 0
tmpfolder | folder | | | 1 | folder for saving temporary data, creates a subfolder for the project (named projectID) | 0 | 0
Rscript | file | executable | | * | path to Rscript executable (preferably version 3.5) | 0 | 0
removeTmpSampleData | boolean | | true | * | if this flag is set, temporary files for samples are deleted at the end (default behaviour) | 0 | 0
removeTmpProjectData | boolean | | true | * | if this flag is set, temporary files for projects are deleted at the end (default behaviour) | 0 | 0
threads | integer | >=1 | 1 | * | number of threads to use, equivalent to number of samples processed in parallel | 0 | 0
downloadParallel | boolean | | false | * | if this flag is set, bigWig files are downloaded in parallel (default: not set) | 0 | 0
localRecountFolder | folder | absolute | | * | folder that can contain locally processed or already downloaded recount data; structure: projectID/rse_gene.Rdata and projectID/bw/sampleID.bw | 0 | 0


Return values

name type description minV maxV


Citation info

Normalized read counts for genes, upstream regions and downstream regions were calculated from the bigWig files provided by the Recount2 project.

Pubmed references: 28398307

circRNA
2019-02-11
removeLinearReads

removes linearly mappable reads from a circRNA prediction.

removeLinearReads

by Sophie Friedl - version 1

removes linearly mappable reads from a circRNA prediction.

Dependencies

  • python3
  • pysam (v0.14.1)


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
mapping | file | SAM or BAM format | | 1 | path to a SAM or BAM file with mapped reads from the sample for which circRNAs were predicted (file ending is used to decide if it is SAM or BAM format) | 0 | 0
circRNAPrediction | file | file exists | | 1 | predicted circRNAs from the CIRI2, circRNAfinder or the circCombination module (tab-separated, 5 columns: chromosome, start, end, strand, list of reads) | 0 | 0
circOut | file | | | 1 | all circRNAs from the input file with at least minReads remaining circular reads after removing all linearly mappable reads from the lists of circular junction reads | 0 | 0
minReads | integer | >=1 | 2 | * | Minimum number of predicted junction reads required for writing a circRNA to the output file, default: 2 | 0 | 0
paired | string | 'yes' or 'no' | yes | 1 | indicates if SAM or BAM input file contains paired-end (yes) or single-end (no) data | 0 | 0
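
The filtering rule described for circOut and minReads can be sketched in Python as follows. This is a minimal illustration, not the module's code: the function name is hypothetical, and the separator of the read list in the fifth column (a comma here) is an assumption, as the documentation does not state it.

```python
def filter_circ_line(line, linear_reads, min_reads=2, sep=","):
    """Drop linearly mappable reads from one prediction line.

    Returns the rewritten line, or None if fewer than `min_reads`
    junction reads remain. The read-list separator is an assumption.
    """
    chrom, start, end, strand, reads = line.rstrip("\n").split("\t")
    kept = [r for r in reads.split(sep) if r and r not in linear_reads]
    if len(kept) < min_reads:
        return None
    return "\t".join([chrom, start, end, strand, sep.join(kept)])
```

Here linear_reads would be a set of read names that were found to map linearly in the SAM or BAM file given by the mapping parameter.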


Return values

name | type | description | minV | maxV
filteredCircs | file | path to circRNA predictions with the filtered lists of circular reads (same as input parameter circOut) | 0 | 0


Citation info

We filtered the predicted circular reads by removing those reads that can be mapped elsewhere in a linear way.

Pubmed references:


RNA-seq
2019-02-11
rrnaFilter

removes rRNA reads from sequencing data

rrnaFilter

by Sophie Friedl - version 1

removes rRNA reads from sequencing data

Dependencies

  • python3
  • pysam
  • bwa


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
in1 | file | file exists, allowed file endings: fastq, fq, fq.gz | | 1 | first (gzipped) fastQ file with the sequenced reads | 0 | 0
in2 | file | file exists, allowed file endings: fastq, fq, fq.gz | | * | second (gzipped) fastQ file with the sequenced reads (for paired-end data only) | 0 | 0
rrnaIndex | string | filename prefix for a bwa index | | 1 | Common prefix of bwa index files for the rRNA sequence | 0 | 0
out1 | string | file path with file ending fastq, fq, fa or fq.gz | | 1 | file for writing non-rRNA reads from in1 in fasta or (gzipped) fastq format | 0 | 0
out2 | string | file path with file ending fastq, fq, fa or fq.gz | | * | file for writing non-rRNA reads from in2 (for paired-end data) in fasta or (gzipped) fastq format | 0 | 0
sam | string | file path with file ending sam | | 1 | sam file for writing rRNA reads from in1 and in2 | 0 | 0
workdir | folder | folder exists | os.getcwd() | * | path to directory for writing large temporary files (content is deleted at the end of execution), default: current directory | 0 | 0
keepTmp | boolean | | False | * | option to keep temporary files | 0 | 0
maxEditDistance | integer | >=0 | infinity | * | maximum allowed edit distance for a read alignment against rRNA | 0 | 0
maxMismatches | integer | >=0 | infinity | * | maximum allowed number of mismatches for a read alignment against rRNA | 0 | 0
maxIndels | integer | >=0 | infinity | * | maximum allowed number of indels for a read alignment against rRNA | 0 | 0
pairFiltering | integer | 1 or 2 | 2 | * | Number of reads of a pair required to fulfil the options above (maxEditDistance, maxMismatches, maxIndels) | 0 | 0
bwaPath | executable | | bwa | * | path to bwa executable | 0 | 0
seedSize | integer | >=1 | 25 | * | size of initial seed for bwa (-k option of bwa) | 0 | 0
threads | integer | >=1 | 1 | * | number of threads to use for bwa (-t option of bwa) | 0 | 0
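
One plausible reading of how pairFiltering combines with the three alignment limits is sketched below in Python. The function names and the tuple layout are hypothetical, and the way the module treats unaligned mates is not documented, so the sketch simply leaves them out.

```python
import math


def read_passes(edit_distance, mismatches, indels,
                max_edit=math.inf, max_mm=math.inf, max_indels=math.inf):
    """True if one alignment against rRNA stays within all three limits."""
    return (edit_distance <= max_edit
            and mismatches <= max_mm
            and indels <= max_indels)


def pair_is_rrna(mate_stats, pair_filtering=2, **limits):
    """Decide whether a read pair counts as rRNA.

    `mate_stats` holds one (edit_distance, mismatches, indels) tuple per
    aligned mate; unaligned mates are simply left out of the list.
    """
    passing = sum(read_passes(*stats, **limits) for stats in mate_stats)
    return passing >= pair_filtering
```

With pair_filtering=1 a single well-aligned mate is enough to flag the pair, which removes more reads than the default of 2.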


Return values

name | type | description | minV | maxV
rrnaSAMFile | string | path to rRNA reads in SAM format (same value as given by the sam parameter) | 0 | 0
filteredFQ1 | string | path to non rRNA reads in FASTQ format (same value as given by the out1 parameter) | 0 | 0
filteredFQ2 | string | path to non rRNA reads in FASTQ format (same value as given by the out2 parameter), for single-end data the value of the return variable is set to "not_defined_for_single_end" | 0 | 0


Citation info

Before mapping the reads to the reference genome, we removed reads originating from rRNAs.

Pubmed references:


Sequencing
2019-03-13
sam2bam

converts SAM files into compressed BAM format using samtools sort

sam2bam

by Michael Kluge - version 1

converts SAM files into compressed BAM format using samtools sort

Dependencies

  • samtools
  • GNU Core Utilities


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
sam | file path | absolute | | 1 | path to SAM file that should be compressed | 0 | 0
bam | file path | absolute | | 1 | path to output BAM file | 0 | 0
threads | integer | | 1 | * | number of threads to use for compression | 0 | 0
quality | integer | [1, 9] | 9 | * | compression level; 1 is the worst/fastest and 9 is the best/slowest compression | 0 | 0
memory | string | | 768M | * | maximal memory that can be used per thread; only an estimation and might be exceeded! | 0 | 0
tmpFolder | folder path | absolute | | * | write temporary files to that folder | 0 | 0
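
As a rough illustration, the parameters above could map onto a `samtools sort` call as in the following Python sketch. The mapping is an assumption (the module's own wrapper may build the command differently), and the function name and the temporary-file prefix are hypothetical. The options used (-@, -l, -m, -o, -T) are regular `samtools sort` options.

```python
def samtools_sort_cmd(sam, bam, threads=1, quality=9,
                      memory="768M", tmp_folder=None):
    """Build an argument list for `samtools sort` (nothing is executed)."""
    if not 1 <= quality <= 9:
        raise ValueError("quality must be in [1, 9]")
    cmd = ["samtools", "sort",
           "-@", str(threads),   # sorting and compression threads
           "-l", str(quality),   # compression level
           "-m", memory,         # memory per thread
           "-o", bam]            # output BAM file
    if tmp_folder:
        # -T takes a prefix for temporary files; the file name is arbitrary
        cmd += ["-T", f"{tmp_folder.rstrip('/')}/sam2bam_tmp"]
    return cmd + [sam]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a system where samtools is installed.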


Return values

name | type | description | minV | maxV
BAMFile | string | absolute path to the resulting BAM file | 0 | 0


Citation info

Samtools (%SOFTWARE_VERSION%) was used to convert SAM to BAM files [Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup, The Sequence alignment/map (SAM) format and SAMtools, Bioinformatics (2009) 25(16) 2078-9].

Pubmed references: 19505943

sequencing
2019-11-12
samtoolsView

runs samtools view on BAM/SAM/CRAM files

samtoolsView

by Caroline Friedel - version 1

runs samtools view on BAM/SAM/CRAM files

Dependencies

  • samtools


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
bamoutput | boolean | | false | * | output BAM | 0 | 0
cramoutput | boolean | | false | * | output CRAM (requires reference sequence) | 0 | 0
fastCompression | boolean | | false | * | use fast BAM compression (implies bamoutput) | 0 | 0
uncompressedBam | boolean | | false | * | uncompressed BAM output (implies bamoutput) | 0 | 0
includeHeader | boolean | | false | * | include header in SAM output | 0 | 0
printOnlyHeader | boolean | | false | * | print SAM header only (no alignments) | 0 | 0
printCounts | boolean | | false | * | print only the count of matching records | 0 | 0
output | file | | stdout | * | output file name | 0 | 0
outputReadsNotSelected | file | | | * | output reads not selected by filters to FILE | 0 | 0
referenceLengths | file | | | * | FILE listing reference names and lengths (see long help) | 0 | 0
bedfile | file | | | * | only include reads overlapping this BED FILE | 0 | 0
readgroup | string | | | * | only include reads in read group STR | 0 | 0
readgroupFile | file | | | * | only include reads with read group listed in FILE | 0 | 0
mappingquality | integer | | 0 | * | only include reads with mapping quality at least INT | 0 | 0
library | string | | | * | only include reads in library STR | 0 | 0
minquerylength | integer | | | * | only include reads with number of CIGAR operations consuming query sequence at least INT | 0 | 0
bitsset | integer | | 0 | * | only include reads with all bits set in INT set in FLAG | 0 | 0
bitsnotset | integer | | 0 | * | only include reads with none of the bits set in INT set in FLAG | 0 | 0
readTagToStrip | string | | | * | read tag to strip (repeatable) | 0 | 0
collapseCIGAROperation | string | | | * | collapse the backward CIGAR operation | 0 | 0
seed | double | | 0 | * | integer part sets seed of random number generator, rest sets fraction of templates to subsample | 0 | 0
threads | string | | | * | number of BAM/CRAM compression threads | 0 | 0
printLongHelp | string | | | * | print long help, including note about region specification | 0 | 0
inputfmtoption | string | | | * | Specify a single input file format option in the form of OPTION or OPTION=VALUE | 0 | 0
outputfmt | string | | | * | Specify output format (SAM, BAM, CRAM) | 0 | 0
outputfmtoption | string | | | * | Specify a single output file format option in the form of OPTION or OPTION=VALUE | 0 | 0
reference | string | | | * | Reference sequence FASTA FILE | 0 | 0
inbam | file | | | * | input bam file | 0 | 0
insam | file | | | * | input sam file | 0 | 0
incram | file | | | * | input cram file | 0 | 0
region | string | | | * | region selected | 0 | 0
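
The semantics of bitsset, bitsnotset and mappingquality can be illustrated with a small Python function. The sketch assumes these parameters correspond to the -f, -F and -q options of `samtools view`; the function name is hypothetical and the code only mimics the filter logic, it does not call samtools.

```python
def passes_filters(flag, mapq, bits_set=0, bits_not_set=0, min_mapq=0):
    """Mimic the -f / -F / -q read filters of `samtools view`."""
    if (flag & bits_set) != bits_set:   # -f: every requested bit must be set
        return False
    if flag & bits_not_set:             # -F: none of these bits may be set
        return False
    return mapq >= min_mapq             # -q: minimum mapping quality
```

For example, bits_set=2 with bits_not_set=4 keeps properly paired reads and drops unmapped ones.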


Return values

name | type | description | minV | maxV
outputFile | string | output file (= value for parameter output) | 0 | 0


Citation info

Samtools was used to convert BAM/SAM/CRAM to BAM/SAM/CRAM [Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9]

Pubmed references: 19505943

sequencing
2019-11-12
sashimiPlot

Performs visualization of splicing events across multiple samples using ggsashimi.

sashimiPlot

by Caroline Friedel - version 1

Performs visualization of splicing events across multiple samples using ggsashimi.

Dependencies

  • ggsashimi
  • python (2.7 or 3)
  • samtools (>=1.3)
  • R (>=3.3)
  • R package ggplot2 (>=2.2.1)
  • R package data.table (>=1.10.4)
  • R package gridExtra (>=2.2.1)
  • R package svglite (>=1.2.1), when generating output images in SVG format


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
help | string | | | * | show this help message and exit | 0 | 0
bam | file | | | * | Individual bam file or file with a list of bam files. In the case of a list of files the format is tsv: 1col: id for bam file, 2col: path of bam file, 3+col: additional columns | 0 | 0
coordinates | string | | | * | Genomic region. Format: chr:start-end (1-based) | 0 | 0
outprefix | string | | sashimi | * | Prefix for plot file name | 0 | 0
outstrand | string | | both | * | Only for --strand other than 'NONE'. Choose which signal strand to plot: both, plus, minus | 0 | 0
mincoverage | integer | | 1 | * | Minimum number of reads supporting a junction to be drawn | 0 | 0
junctionsbed | file | | | * | Junction BED file name | 0 | 0
gtf | file | | | * | Gtf file with annotation (only exons is enough) | 0 | 0
strand | string | | NONE | * | Strand specificity: NONE, SENSE, ANTISENSE, MATE1_SENSE, MATE2_SENSE | 0 | 0
overlay | integer | | | * | Index of column with overlay levels (1-based) | 0 | 0
aggr | string | | | * | Aggregate function for overlay: mean, median, mean_j, median_j. Use mean_j or median_j to keep density overlay but aggregate junction counts | 0 | 0
colorfactor | integer | | | * | Index of column with color levels (1-based) | 0 | 0
alpha | double | | 0.5 | * | Transparency level for density histogram | 0 | 0
palette | file | | | * | Color palette file. tsv file with at least 1 column, where the color is the first column | 0 | 0
labels | integer | | | * | Index of column with labels (1-based) | 0 | 0
height | double | | 2 | * | Height of the individual signal plot in inches | 0 | 0
annheight | double | | 1.5 | * | Height of annotation plot in inches | 0 | 0
width | double | | 10 | * | Width of the plot in inches | 0 | 0
basesize | integer | | 14 | * | Base font size of the plot in pch | 0 | 0
outformat | string | | pdf | * | Output file format: pdf, svg, png, jpeg, tiff | 0 | 0
outresolution | integer | | 300 | * | Output file resolution in PPI (pixels per inch). Applies only to raster output formats | 0 | 0
shrink | boolean | | false | * | Shrink the junctions by a factor for nicer display | 0 | 0
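
The coordinates format chr:start-end (1-based) can be validated before a plot is requested. The following Python sketch shows one way to do that; the function name is hypothetical and the checks are assumptions about what counts as a valid region, not behaviour taken from ggsashimi.

```python
def parse_region(coordinates):
    """Split 'chr:start-end' (1-based) into (chrom, start, end)."""
    # rpartition keeps contig names that themselves contain ':' intact
    chrom, sep, span = coordinates.rpartition(":")
    if not sep or not chrom:
        raise ValueError(f"expected chr:start-end, got {coordinates!r}")
    start_s, dash, end_s = span.partition("-")
    if not dash:
        raise ValueError(f"expected chr:start-end, got {coordinates!r}")
    start, end = int(start_s), int(end_s)
    if start < 1 or end < start:
        raise ValueError("start must be >= 1 and not larger than end")
    return chrom, start, end
```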


Return values

name type description minV maxV


Citation info

Sashimi plots were created using ggsashimi [Garrido-Martín D, Palumbo E, Guigó R, Breschi A. ggsashimi: Sashimi plot revised for browser- and annotation-independent splicing visualization. PLoS Comput Biol. 2018 Aug 17;14(8):e1006360.]

Pubmed references: 30118475

SEQUENCING
2022-08-24
scaledMetashape

creates metagene over whole body

scaledMetashape

by Elena Weiß - version 1

creates metagene over whole body

Dependencies

  • binGenome
  • sharedUtils


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
bedgraphTable | file | | | 1 | table with paths to bedgraph files and conditions/replicates | 0 | 0
genelist | string | | | * | list of genes to consider | 0 | 0
experiment | string | | | * | type of experiment | 0 | 0
metaFrame | integer | | | 1 | frame to plot | 0 | 0
bins | integer | | | 1 | number of fixed bins to scale | 0 | 0
aggregateFUN | string | | | 1 | function for aggregation | 0 | 0
normShapeSum | boolean | | | 1 | how to norm shape | 0 | 0
normLibSize | boolean | | | 1 | how to norm lib size | 0 | 0
normBinLength | boolean | | | 1 | how to norm bin length | 0 | 0
factor | string | | | 0-null | factor to consider | 0 | 0
coverageFiles | file | | | 1 | path to where coverage files are | 0 | 0
bedname | string | | | 1 | name of bed file | 0 | 0
plotname | string | | | * | name of plot | 0 | 0
config | file | | | 1 | file to configs | 0 | 0


Return values

name | type | description | minV | maxV
scaledMetashapeOutputFolder | string | folder where plot is | 0 | 0



SEQUENCING
2020-11-03
spring

SPRING is a reference-free method to compress high throughput sequencing data

spring

by Michael Kluge - version 1

SPRING is a reference-free method to compress high throughput sequencing data

Dependencies

  • SPRING (tested with 1.0v1.0)
  • GNU core utilities


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
fastq | file | | | * | path to one or two (PE datasets) fastq files; possible endings: *.fastq, *.fq, *.fastq.gz or *.fq.gz file | 0 | 0
spring | file | | | 1 | path to compressed spring file; possible endings: *.spring or *.tar | 0 | 0
compress | boolean | | true | * | if true the fastq files are compressed; otherwise the spring file is decompressed | 0 | 0
preserveOrder | boolean | | true | * | preserve read order | 0 | 0
quality | boolean | | true | * | retain quality values during compression | 0 | 0
ids | boolean | | true | * | retain read identifiers during compression | 0 | 0
qualityMode | enum | | lossless | * | possible values: 'lossless', 'qvz qv_ratio', 'ill_bin' or 'binary thr high low' | 0 | 0
long | boolean | | false | * | use for compression of arbitrarily long reads | 0 | 0
decompressRange | string | | | * | decompress only reads (or read pairs for PE datasets) from start to end (both inclusive); e.g. '1 100' | 0 | 0
workingDir | file | | /usr/local/storage/ | * | path to working directory | 0 | 0
threads | integer | | 1 | * | number of cores to use | 0 | 0


Return values

name | type | description | minV | maxV
createdFile | string | path to the compressed or decompressed file (separated by ',' in case of PE datasets) | 0 | 0
isPairedEnd | boolean | true if paired-end data was processed | 0 | 0


Citation info

The FASTQ files were compressed using SPRING.

Pubmed references: 30535063

Sequencing
2019-03-13
sraDump

downloads and extracts FASTQ files from the Sequence Read Archive (SRA)

sraDump

by Michael Kluge - version 1

downloads and extracts FASTQ files from the Sequence Read Archive (SRA)

Dependencies

  • fastq-dump or fasterq-dump
  • GNU Core Utilities


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
sraFile | file path | absolute | | 0-null | path to the *.sra file(s); cannot be used in combination with --sraID | 0 | 0
sraID | string | | | 0-null | one or more SRA ID(s); cannot be used in combination with --sraFile | 0 | 0
rename | string | | | * | new basename for the resulting fastq files | 0 | 0
outputFolder | folder path | absolute | | 1 | path to folder in which the files should be extracted | 0 | 0
tmpFolder | folder path | absolute | /usr/local/storage | * | tmp folder; default: /usr/local/storage | 0 | 0
deleteOnSuccess | boolean | | false | * | deletes the SRA file when extraction was successful | 0 | 0
disablePrefetch | boolean | | false | * | disables prefetching of the sra files | 0 | 0
binaryName | enum | | fastq-dump | * | name of the sra-toolkit binary; possible values: 'fastq-dump' or 'fasterq-dump' | 0 | 0
threads | integer | [1,128] | 1 | * | number of cores to use; only possible if 'fasterq-dump' is used as binary | 0 | 0


Return values

name | type | description | minV | maxV
isPairedEnd | boolean | true, if paired end data was downloaded from SRA | 0 | 0
baseName | string | absolute base name path to the created files | 0 | 0
createdFiles | string | absolute path to all files that were downloaded separated by ',' | 0 | 0


Citation info

Public samples were downloaded from the SRA (accession number: TODO %sraID%) [Leinonen R, Sugawara H, Shumway M, International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 2010;39(Database issue):D19-21.].

Pubmed references: 21062823

SEQUENCING
2022-08-24
sumIdxStat

sums up idxstats

sumIdxStat

by Elena Weiß - version 1

sums up idxstats

Dependencies



Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
inputFile | file | | | 1 | path to input file | 0 | 0
outputFile | file | | | 1 | path to output file | 0 | 0
excludeChrom | string | | | 0-null | chromosome to exclude from sum | 0 | 0
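
A minimal Python sketch of such a summation is shown below. It assumes the input has the four tab-separated columns written by `samtools idxstats` (reference name, sequence length, mapped reads, unmapped reads) and that the mapped-read column is the one being summed; the function name is hypothetical.

```python
def sum_idxstats(lines, exclude=()):
    """Sum the mapped-read column of `samtools idxstats` output.

    Each line has four tab-separated fields: reference name, sequence
    length, mapped reads, unmapped reads.
    """
    excluded = set(exclude)
    total = 0
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        name, _length, mapped, _unmapped = line.rstrip("\n").split("\t")
        if name not in excluded:
            total += int(mapped)
    return total
```

Passing an open file handle as `lines` works as well, since the function only iterates over it line by line.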


Return values

name | type | description | minV | maxV
samplesSum | string | sum of idxstats | 0 | 0



circRNA
2019-02-11
trimmedFastqPairFilter

extracts paired reads from 2 fastq files.

trimmedFastqPairFilter

by Sophie Friedl - version 1

extracts paired reads from 2 fastq files.

Dependencies

  • python3


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
inReads1 | file | file exists, fastq format (read names without read numbers as /1) | | * | path to first fastq file with reads | 0 | 0
inReads2 | file | file exists, fastq format (read names without read numbers as /2) | | * | path to second fastq file with reads | 0 | 0
inPrefix | string | prefix1.[fastq|fq] and prefix2.[fastq|fq] exist and meet the restrictions of inReads1 and inReads2 | | * | reads in two fastq files: prefix1.[fastq|fq], prefix2.[fastq|fq], can be used instead of inReads1 and inReads2 | 0 | 0
outReads1 | string | | | * | output file for first reads of paired data | 0 | 0
outReads2 | string | | | * | output file for second reads of paired data | 0 | 0
outSingletons | string | | | * | output file for singleton reads without a mate | 0 | 0
outPrefix | string | | | * | writes output to three files: prefix1.fastq, prefix2.fastq, prefixsingleton.fastq, can be used instead of outReads1, outReads2 and outSingletons | 0 | 0
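
The split into paired reads and singletons can be sketched in Python as follows. The sketch assumes both files have already been loaded into dictionaries keyed by read name (without /1 or /2 suffixes, as required above); the function name is hypothetical and the module itself may stream the files rather than hold them in memory.

```python
def split_pairs(reads1, reads2):
    """Separate mated reads from singletons.

    `reads1` and `reads2` map read name -> FASTQ record (any object).
    Returns (paired1, paired2, singletons), preserving input order.
    """
    paired1 = [rec for name, rec in reads1.items() if name in reads2]
    paired2 = [reads2[name] for name in reads1 if name in reads2]
    singletons = [rec for name, rec in reads1.items() if name not in reads2]
    singletons += [rec for name, rec in reads2.items() if name not in reads1]
    return paired1, paired2, singletons
```

Both paired lists come out in the order of the first file, so mates stay at the same index in the two outputs.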


Return values

name | type | description | minV | maxV
pairedReads1 | string | output file for first reads of paired data given in the parameters via outReads1 or outPrefix | 0 | 0
singletonReads | string | output file for singleton reads without a mate given in the parameters via outSingletons or outPrefix | 0 | 0
pairedReads2 | string | output file for second reads of paired data given in the parameters via outReads2 or outPrefix | 0 | 0


Citation info

We removed all reads with missing mates from the paired-end fastq files.

Pubmed references:


Sequencing
2019-03-13
umiDedup

unique molecular identifiers (UMIs) can be used to remove PCR duplicates

umiDedup

by Michael Kluge - version 1

unique molecular identifiers (UMIs) can be used to remove PCR duplicates

Dependencies

  • umi_tools
  • GNU Core Utilities


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
bamFile | file path | absolute | | 1 | path to the BAM file; UMI must be a suffix of the fastq id separated with '_' | 0 | 0
outputFile | file path | absolute | | 1 | path to the de-duplicated BAM file | 0 | 0
deleteOnSuccess | boolean | | false | * | deletes the BAM file when deduplication was successful | 0 | 0
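
The naming convention required for bamFile (UMI as a suffix of the read id, separated by '_') can be checked with a few lines of Python. The function below is a hypothetical helper for illustration; it is not part of umi_tools or of this module.

```python
def split_umi(read_id):
    """Return (read name, UMI) for an id such as 'READ1_ACGTAC'."""
    # split at the last '_' so that names containing '_' stay intact
    name, sep, umi = read_id.rpartition("_")
    if not sep or not umi:
        raise ValueError(f"no UMI suffix in {read_id!r}")
    return name, umi
```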


Return values

name | type | description | minV | maxV
deduplicatedFile | string | absolute path to the de-duplicated BAM file | 0 | 0


Citation info

UMI-tools was used to remove PCR duplicates from the raw sequencing data based on UMIs [Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 2017;27(3):491-499.].

Pubmed references: 28100584

General
2019-10-16
untar

extracts *.tar, *.tar.gz and *.tar.bz2 archives

untar

by Caroline Friedel - version 1

extracts *.tar, *.tar.gz and *.tar.bz2 archives

Dependencies

  • GNU tar


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
infile | file | | | 1 | input file, must be *tar, *tar.gz or *tar.bz2 | 0 | 0
outputdir | file | | | * | [optional] output directory for extracting archive | 0 | 0


Return values

name type description minV maxV



General
2019-03-13
wget

wget is used to locate and download URI resources

wget

by Michael Kluge - version 1

wget is used to locate and download URI resources

Dependencies

  • GNU wget
  • GNU Core Utilities


Parameter

name | type | restrictions | default | occurrence | description | minV | maxV
uri | string | | | 1- | one or more URI(s) pointing to the resource(s) to download | 0 | 0
output | folder path | absolute | | 1 | path to a folder in which the downloaded files should be stored; filename remains untouched | 0 | 0
rename | string | | | 0-null | renames the file to that name; multiple names must be provided in case of multiple URIs | 0 | 0
disableSizeCheck | boolean | | false | * | flag that can be used to disable the size check that checks if a file is greater than 1KB | 0 | 0
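
The size check controlled by disableSizeCheck could look roughly like the Python sketch below. It is an illustration only: the function name is hypothetical, and reading "1KB" as 1024 bytes is an assumption.

```python
import os

MIN_SIZE_BYTES = 1024  # assumption: "1KB" is read as 1024 bytes


def looks_complete(path, disable_size_check=False):
    """True if the file exists and, unless disabled, exceeds 1 KB."""
    if not os.path.isfile(path):
        return False
    return disable_size_check or os.path.getsize(path) > MIN_SIZE_BYTES
```

Such a check catches downloads that returned only a short error page; for resources that are legitimately small, it would have to be switched off.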


Return values

name | type | description | minV | maxV
downloadedFolder | string | path to the folder in which the files were stored | 0 | 0
numberOfFiles | integer | number of files that were downloaded | 0 | 0
downloadedFiles | string | absolute path to the downloaded file(s) separated by ',' | 0 | 0

