API: Reads

Exported functions

Making reads

Pseudoseq.make_reads — Function.

make_reads(::Type{PairedEnd}, p::MoleculePool, flen::Int, rlen::Int = flen)

Create a set of paired-end reads from a pool of DNA molecules p.

flen sets the length of forward read, and rlen sets the length of the reverse read. If you only provide flen, then the function sets rlen = flen.

Note

If a molecule in the pool is not long enough to create a forward and/or reverse read, then that molecule will simply be skipped.

source

make_reads(::Type{SingleEnd}, p::MoleculePool, len::Int)

Create a set of single-end reads from a pool of DNA molecules p.

len sets the length of the reads.

The end (strand) from which the reading begins for each DNA molecule in the pool is determined at random for each molecule, with 50:50 probability.

If you don't provide a value for len, then the function will read each DNA molecule in it's entirety.

Note

If a molecule in the pool is not long enough to create a forward and/or reverse read, then that molecule will simply be skipped.

source

make_reads(::Type{TaggedPairs}, p::MoleculePool, flen::Int, rlen::Int = flen)

Create a set of tagged paired-end reads from a pool of DNA molecules p.

flen sets the length of forward read, and rlen sets the length of the reverse read. If you only provide flen, then the function sets rlen = flen.

When a set of TaggedPairs is written to file, the tag information is contained in the R1 read of each read-pair. The first 16bp of each R1 read is a sequence that is the tag, and a following 7bp are a buffer between the 16bp tag, and the rest of the read sequence.

Note

If a molecule in the pool is not long enough to create a forward and/or reverse read, then that molecule will simply be skipped.

source

Introducing errors

Pseudoseq.mark_errors — Function.

mark_errors(reads::Reads, rate::Float64)

Create a new set of reads, with errors, from an input set of reads.

When you first create a set of reads using a make_reads method, all the reads in that set are perfect reads. In real sequencing experiments, sequencers make errors when reading a DNA molecule, and are characterised by an error rate. This function lets you simulate this characteristic of sequencers, by marking (at random) positions in the reads that are destined to be errors in the output FASTQ.

Note

Currently, every position in every read is equally likely to be marked as an error.

Note

The number of errors introduced into the reads can be calculated.

\[E = NR\]

Where $E$ is the number of errors, $N$ is the total number of bases in your set of reads, and $R$ is the error rate you provide to this function.

source

Generating FASTQ files

Pseudoseq.generate — Function.

generate(filename::String, reads::Reads)

Write the reads out to a FASTQ formatted file with the given filename.

If this method is used with a paired-end read type, then the FASTQ file will be interleaved; all R1 reads will be odd records, and all R2 reads will be even records in the file.

Note

Reads are named according to the sequence in the input genome they came from. e.g. @Reference_1_R1 means the first sequence in the genome, and @Reference_2_R1 means the second sequence in the genome.

source

generate(R1name::String, R2name::String, reads::Reads{<:PairedReads})

This method only works for paired reads. Instead of interleaving R1 and R2 reads in a single FASTQ file, R1 and R2 reads are partitioned into two seperate FASTQ files.

source