API: MoleculePool
Exported functions
Making a pool of molecules
Pseudoseq.makepool — Function.
makepool(gen::Vector{BioSequence{DNAAlphabet{2}}}, ng::Int = 1, iscircular::Bool = false)
Create a pool of ng copies of a genome defined by the gen vector of sequences.
The argument iscircular is currently not used.
makepool(rdr::FASTA.Reader, ng::Int = 1, iscircular::Bool = false)
Create a pool of ng copies of the genome read in from the FASTA.Reader.
The argument iscircular is currently not used.
makepool(file::String, ng::Int, iscircular::Bool = false)
Create a pool of ng copies of the genome in the FASTA-formatted file named by file.
The argument iscircular is currently not used.
Molecule pool transformations
Fragment
Pseudoseq.fragment — Method.
fragment(p::MoleculePool, meansize::Int)
Create a new pool by breaking up the DNA fragments in an input pool.
This method breaks up each DNA molecule in a pool p, such that the average length of the resulting fragments is approximately meansize.
It fragments a molecule by scattering an appropriate number of breakpoints across the molecule, before cutting the molecule at those breakpoints.
Breakpoints are scattered entirely at random across a molecule. No two breakpoints can fall in exactly the same place, because breakpoint positions are sampled without replacement.
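The appropriate number of breakpoints to scatter across a molecule is calculated as:
$$\frac{L}{S} - 1$$
Where $L$ is the length of the molecule being fragmented, and $S$ is the desired expected fragment size. Cutting at $n$ breakpoints yields $n + 1$ fragments, so this number of breakpoints gives fragments with a mean length of $S$. This calculation assumes breakpoints fall randomly across the molecule (see above note).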
If a DNA molecule being fragmented is smaller than the desired meansize, it is not broken; it is simply included in the new pool as is.
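The breakpoint procedure described above can be sketched in a few lines of plain Julia. This is only an illustration and not the Pseudoseq implementation: sketch_fragment is a hypothetical helper, molecules are modelled as index ranges rather than sequences, and the breakpoint count assumes integer division in $L / S - 1$ (which follows from $n$ breakpoints producing $n + 1$ fragments).

```julia
using Random

# Hypothetical helper (not part of Pseudoseq): cut a molecule of length `len`
# into fragments whose mean size is roughly `meansize`.
# Fragments are returned as index ranges over the original molecule.
function sketch_fragment(rng::AbstractRNG, len::Int, meansize::Int)
    # A molecule smaller than the target size is passed through unbroken.
    len < meansize && return [1:len]
    # Assumed breakpoint count: L / S - 1, using integer division.
    nbreaks = div(len, meansize) - 1
    # Candidate cut sites sit between adjacent bases: 1 .. len - 1.
    # Taking a prefix of a random permutation samples without replacement,
    # so no two breakpoints can coincide.
    cuts = sort!(randperm(rng, len - 1)[1:nbreaks])
    starts = vcat(1, cuts .+ 1)
    stops = vcat(cuts, len)
    return [s:e for (s, e) in zip(starts, stops)]
end

# A 10,000 bp molecule with a 500 bp target gets 19 breakpoints, so 20 fragments.
frags = sketch_fragment(MersenneTwister(1), 10_000, 500)
```

Individual fragment lengths vary widely because the cut sites are uniform; only the mean is pinned to the target size.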
Subsampling
Pseudoseq.subsample — Method.
subsample(p::MoleculePool, n::Int)
Create a new pool by sampling an input pool.
DNA molecules in the input pool p are selected according to the uniform distribution; no one molecule is more or less likely to be selected than another.
Sampling is done without replacement, so it is impossible for the new pool to receive the same molecule from the input pool twice.
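As a rough illustration of uniform sampling without replacement, the sketch below draws n distinct items from a vector. sketch_subsample is a hypothetical helper, not the Pseudoseq method; the pool is modelled as a plain vector of labels, and throwing an error when n exceeds the pool size is an assumption made here, not documented behaviour.

```julia
using Random

# Hypothetical helper (not part of Pseudoseq): draw `n` distinct molecules
# from `pool`, each molecule being equally likely to be chosen.
function sketch_subsample(rng::AbstractRNG, pool::Vector{T}, n::Int) where {T}
    # Assumption: asking for more molecules than the pool holds is an error.
    n <= length(pool) || throw(ArgumentError("n exceeds the size of the pool"))
    # A truncated random permutation of the indices never repeats an index,
    # so no molecule can be picked twice.
    picked = randperm(rng, length(pool))[1:n]
    return pool[picked]
end

pool = ["mol$i" for i in 1:100]
sub = sketch_subsample(MersenneTwister(42), pool, 10)
```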
Tagging
Pseudoseq.tag — Method.
tag(u::MoleculePool, ntags::Int)
Create a pool of tagged DNA molecules from some input pool.
The new tagged pool has the same DNA molecules as the input pool. However, each DNA molecule in the new tagged pool will be assigned a tag in the range of 1:ntags.
Any molecules derived from a tagged molecule inherit that molecule's tag. For example, if a DNA fragment in a pool is tagged and is subsequently broken up by a fragment transform, then all the smaller fragments derived from that long fragment will inherit the long fragment's tag.
Which fragment gets a certain tag is completely random. It is possible for two distinct DNA molecules in a pool to be assigned the same tag. The likelihood of that happening depends on the size of the tag pool (ntags), and the number of fragments in the pool.
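To give a feel for how the chance of shared tags depends on ntags and the number of fragments, here is a minimal sketch. It assumes each molecule's tag is drawn independently and uniformly from 1:ntags, which is an assumption about the tagging model rather than something stated above; sketch_tag and collision_probability are hypothetical helpers, not Pseudoseq functions.

```julia
using Random

# Hypothetical helper (not part of Pseudoseq): give each of `nmolecules`
# molecules a tag drawn independently and uniformly from 1:ntags.
# Draws are with replacement, so two molecules may share a tag.
sketch_tag(rng::AbstractRNG, nmolecules::Int, ntags::Int) =
    rand(rng, 1:ntags, nmolecules)

# Probability that at least two of `n` molecules share a tag under the
# assumed uniform model (the classic birthday problem).
function collision_probability(n::Int, ntags::Int)
    # More molecules than tags guarantees a shared tag.
    n > ntags && return 1.0
    p_unique = 1.0
    for i in 0:(n - 1)
        # The (i + 1)-th molecule must avoid the i tags already in use.
        p_unique *= (ntags - i) / ntags
    end
    return 1.0 - p_unique
end

tags = sketch_tag(MersenneTwister(7), 50, 1_000)
p_shared = collision_probability(50, 1_000)
```

Under this model the probability of at least one shared tag grows quickly with the number of fragments, so a tag pool much larger than the fragment count is needed if shared tags are to be rare.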