API: Reads
Exported functions
Making reads
Pseudoseq.make_reads
— Function.make_reads(::Type{PairedEnd}, p::MoleculePool, flen::Int, rlen::Int = flen)
Create a set of paired-end reads from a pool of DNA molecules p
.
flen
sets the length of forward read, and rlen
sets the length of the reverse read. If you only provide flen
, then the function sets rlen = flen
.
If a molecule in the pool is not long enough to create a forward and/or reverse read, then that molecule will simply be skipped.
make_reads(::Type{SingleEnd}, p::MoleculePool, len::Int)
Create a set of single-end reads from a pool of DNA molecules p
.
len
sets the length of the reads.
The end (strand) from which the reading begins for each DNA molecule in the pool is determined at random for each molecule, with 50:50 probability.
If you don't provide a value for len
, then the function will read each DNA molecule in it's entirety.
If a molecule in the pool is not long enough to create a forward and/or reverse read, then that molecule will simply be skipped.
make_reads(::Type{TaggedPairs}, p::MoleculePool, flen::Int, rlen::Int = flen)
Create a set of tagged paired-end reads from a pool of DNA molecules p
.
flen
sets the length of forward read, and rlen
sets the length of the reverse read. If you only provide flen
, then the function sets rlen = flen
.
When a set of TaggedPairs
is written to file, the tag information is contained in the R1 read of each read-pair. The first 16bp of each R1 read is a sequence that is the tag, and a following 7bp are a buffer between the 16bp tag, and the rest of the read sequence.
If a molecule in the pool is not long enough to create a forward and/or reverse read, then that molecule will simply be skipped.
Introducing errors
Pseudoseq.mark_errors
— Function.mark_errors(reads::Reads, rate::Float64)
Create a new set of reads, with errors, from an input set of reads.
When you first create a set of reads using a make_reads
method, all the reads in that set are perfect reads. In real sequencing experiments, sequencers make errors when reading a DNA molecule, and are characterised by an error rate. This function lets you simulate this characteristic of sequencers, by marking (at random) positions in the reads that are destined to be errors in the output FASTQ.
Currently, every position in every read is equally likely to be marked as an error.
The number of errors introduced into the reads can be calculated.
Where $E$ is the number of errors, $N$ is the total number of bases in your set of reads, and $R$ is the error rate you provide to this function.
Generating FASTQ files
Pseudoseq.generate
— Function.generate(filename::String, reads::Reads)
Write the reads
out to a FASTQ formatted file with the given filename
.
If this method is used with a paired-end read type, then the FASTQ file will be interleaved; all R1 reads will be odd records, and all R2 reads will be even records in the file.
Reads are named according to the sequence in the input genome they came from. e.g. @Reference_1_R1
means the first sequence in the genome, and @Reference_2_R1
means the second sequence in the genome.
generate(R1name::String, R2name::String, reads::Reads{<:PairedReads})
This method only works for paired reads. Instead of interleaving R1 and R2 reads in a single FASTQ file, R1 and R2 reads are partitioned into two seperate FASTQ files.