API: Build-a-Genome

Exported functions

Making an empty chromosome blueprint

Pseudoseq.plan_chrom (Function)

Create an empty blueprint for n copies of a chromosome of len base pairs.

Planning motif repetitions along chromosomes

plan_repetition(cb::ChromosomeBlueprint, from::UnitRange{Int}, to::UnitRange{Int})

Plan a repetition in a chromosome, where the bases in the from region of the chromosome are guaranteed to occur again in the to region of the chromosome.

Every copy of the chromosome will have this same repetition.

Creates a new chromosome blueprint, based on the input blueprint cb.

Note

In the new blueprint, the to region of the planned chromosome will be consumed and cannot be used to plan any subsequently added features.

plan_repetition(cb::ChromosomeBlueprint, from::Int, to::Int, size::Int)

Plan a repetition in a chromosome, where the bases in the from:(from + size - 1) region of the chromosome are guaranteed to occur again in the to:(to + size - 1) region of the chromosome.

Every copy of the chromosome will have this same repetition.

Creates a new chromosome blueprint, based on the input blueprint cb.
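How this method's arguments relate to the range-based method can be sketched as below. The helper name repetition_ranges is hypothetical and not part of Pseudoseq; it only shows the arithmetic on from, to and size.

```julia
# Hypothetical helper (not part of Pseudoseq): convert the (from, to, size)
# form into the pair of ranges used by the range-based method.
function repetition_ranges(from::Int, to::Int, size::Int)
    size > 0 || throw(ArgumentError("size must be positive"))
    return from:(from + size - 1), to:(to + size - 1)
end

src, dst = repetition_ranges(101, 501, 50)
# src == 101:150 and dst == 501:550; both ranges span 50 bases.
```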

plan_repetition(cb::ChromosomeBlueprint, intervals::Vector{UnitRange{Int}})

A convenience method of plan_repetition, designed to ease the process of planning a series of repetitions in a chromosome.

Every copy of the chromosome will have these same repetitions.

Creates a new chromosome blueprint, based on the input blueprint cb.

Tip

Use the suggest_regions function to help decide on a set of regions to use for the repetitions.

Note

The number of intervals provided must be an even number. This is because intervals 1 & 2 define the first repeat, intervals 3 & 4 define the second, and so on.

Note

In the new blueprint, the regions the repetitions occupy have been consumed and cannot be used to plan any subsequently added features.
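The pairing rule for intervals can be sketched as follows. The helper pair_intervals is hypothetical and not part of Pseudoseq; it shows one way the flat vector might be read as consecutive (from, to) pairs, and why an odd number of intervals cannot work.

```julia
# Hypothetical helper (not part of Pseudoseq): group a flat vector of
# intervals into (from, to) pairs: 1 & 2, then 3 & 4, and so on.
function pair_intervals(intervals::Vector{UnitRange{Int}})
    iseven(length(intervals)) ||
        throw(ArgumentError("an even number of intervals is required"))
    return [(intervals[i], intervals[i + 1]) for i in 1:2:length(intervals)]
end

region_pairs = pair_intervals([1:10, 101:110, 201:220, 401:420])
# The first repeat copies 1:10 into 101:110, the second 201:220 into 401:420.
```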

Planning heterozygosity between chromosome copies

Pseudoseq.plan_het (Function)
plan_het(cb::ChromosomeBlueprint, pos::Int, alleles::Vector{DNA})

Plan heterozygosity between copies of a chromosome at position pos.

The alleles vector must contain a nucleotide for each copy of the chromosome.

For example, if you provide the vector [DNA_A, DNA_C] for the heterozygous site you're defining at pos, the first copy of the chromosome will have an A, and the second copy of the chromosome will have a C.

Creates a new chromosome blueprint, based on the input blueprint cb.

Tip

Use the suggest_alleles function to help decide on a set of alleles to use.

Note

In the new blueprint, the positions that were used to plan the heterozygosity have been consumed and cannot be used to plan any subsequently added features.

plan_het(cb::ChromosomeBlueprint, pos, alle...)

A generic method of plan_het.

Creates a new chromosome blueprint, based on the input blueprint cb.

The pos argument determines which sites in a chromosome are made heterozygous, and depending on the type of argument provided, the behaviour differs slightly:

  • If pos is an integer, then that many available sites are selected at random to be heterozygous.

  • If pos is a float, then it is treated as a proportion of the length of the chromosome, which is converted into an integer, and that many sites are selected at random to be heterozygous.

  • If pos is a vector of integers, it is treated as a list of positions the user wants to be heterozygous.
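The three cases above map naturally onto multiple dispatch. The sketch below is not Pseudoseq's implementation: resolve_pos and its arguments are hypothetical, and rounding the proportion down is an assumption, since the text only says the proportion is converted into an integer.

```julia
using Random

# Hypothetical sketch (not Pseudoseq's implementation): resolve `pos` into
# concrete positions, given the chromosome length and the free positions.

# A vector of integers is used as given.
resolve_pos(pos::Vector{Int}, len::Int, available::Vector{Int}) = pos

# An integer selects that many available sites at random.
resolve_pos(pos::Int, len::Int, available::Vector{Int}) =
    sort(shuffle(available)[1:pos])

# A float is a proportion of the chromosome length.
# Assumption: rounded down; the real conversion may differ.
resolve_pos(pos::Float64, len::Int, available::Vector{Int}) =
    resolve_pos(floor(Int, pos * len), len, available)

free = collect(1:100)
chosen = resolve_pos(0.25, 100, free)   # 25 distinct positions drawn from `free`
```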

The alle... argument is a vararg argument which defines how bases will be allocated at each heterozygous site.

Depending on the type of the argument provided, the behaviour differs slightly, based on the different methods of suggest_alleles:

  • If alle... is a single integer value, it is taken to mean the number of states each heterozygous site has. At a minimum this number must be two, as a site with only one state is not heterozygous by definition. For each site, which chromosome copies get which state is determined at random.

  • If alle... is a vector of integers (one integer for every chromosome copy), then the nth value in the vector dictates the group the nth chromosome copy belongs to. For example, for a triploid, the vector [2, 1, 2] means that there are two groups (group 1 and group 2), and that the second chromosome copy of the three belongs to group 1, and the other two copies belong to group 2. Chromosome copies that belong to the same group will get the same base at each heterozygous site. In our example, copies 1 and 3 will get the same base, as they have been given the same group number. Chromosome copy 2 will end up with a different base. Which base corresponds to which group number will be determined at random.

  • If alle... is multiple vectors of integers, then each vector constitutes a group, and each value in the vector determines which copies are in that group. It is an alternative way of giving the same information as you can with a single vector (as above). E.g. consider the example of [2, 1, 2] from the previous point. This can be expressed as two vectors, [2] and [1, 3]: the first vector contains a number indicating the second chromosome copy, and the second vector contains numbers representing the first and third chromosome copies.

  • If alle... is a vector of nucleotide vectors, then the nth vector of nucleotides determines which chromosome copy receives which base at the nth heterozygous position. For example, for a triploid, [[DNA_A, DNA_A, DNA_T], [DNA_G, DNA_C, DNA_C]] would mean that at the first heterozygous site, the first and second copies of the chromosome would receive a DNA_A base, and the third copy would receive a DNA_T base. For the second heterozygous position, the first copy of the chromosome would receive a DNA_G base, and the other two copies would receive a DNA_C base.
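The equivalence between the single group vector and the multiple-vector form can be checked with a small conversion. The helper groups_to_members is hypothetical and not part of Pseudoseq; it only illustrates that [2, 1, 2] and the pair [2], [1, 3] carry the same information.

```julia
# Hypothetical helper (not part of Pseudoseq): convert a per-copy group
# vector such as [2, 1, 2] into one vector of copy indices per group,
# ordered by group number.
function groups_to_members(groups::Vector{Int})
    labels = sort(unique(groups))
    return [findall(==(g), groups) for g in labels]
end

members = groups_to_members([2, 1, 2])
# members == [[2], [1, 3]]: group 1 holds copy 2, group 2 holds copies 1 and 3.
```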

(See also: plan_repetition)

Fabricate

Pseudoseq.fabricate (Function)
fabricate(file::String, cb...)

Fabricate the sequence(s) planned in a number of chromosome blueprints.

The fabricated sequences will be written out to the FASTA formatted file file.

cb... should be provided as a series of blueprint, seed-sequence pairs. E.g. fabricate("mygenome.fasta", chr1plan => ch1seq, ch2plan => ch2seq).

fabricate(fw::FASTA.Writer, cb...)

Fabricate the sequence(s) planned in a number of chromosome blueprints.

The fabricated sequences will be written out using the FASTA writer fw.

cb... should be provided as a series of blueprint, seed-sequence pairs. E.g. fabricate(fw, chr1plan => ch1seq, ch2plan => ch2seq).

fabricate(cb::ChromosomeBlueprint)

Fabricate the sequence(s) planned in the chromosome blueprint.

A random DNA sequence will be generated to use as a seed sequence.

A sequence will be built for each chromosome copy in the blueprint.

fabricate(cb::ChromosomeBlueprint, seed::BioSequence{DNAAlphabet{2}})

Fabricate a DNA sequence by applying the planned features in a chromosome blueprint to an initial seed sequence.

A sequence will be built for each chromosome copy in the blueprint.

Utility functions

suggest_regions(cp::ChromosomeBlueprint, size::Int, n::Int)

A useful utility function to assist planning features in a chromosome blueprint.

This method returns a vector of n non-overlapping regions of the chromosome planned in the ChromosomeBlueprint cp.

These regions are free regions: they are untouched by any other planned features, and so may be used when planning other features (See also: plan_repetition, plan_het).

The regions will be size bp in length.

Warning

This function was designed for interactive use in a Julia session.

If this method cannot find n free regions of the size you've asked for, it will still give an output vector containing the regions it did manage to find, but it will issue a warning to the terminal. This will get increasingly likely as you fill the chromosome blueprint up with features.

If you are using this method in a script or program where you depend on a reliable number of regions, either add a check to make sure you got the number of regions you need, or use this method interactively, and hard code an appropriate output into your script.
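A check of the kind suggested above might look like the following. The helper require_regions is hypothetical and not part of Pseudoseq, and the literal vector stands in for the output of suggest_regions.

```julia
# Hypothetical guard for scripts (not part of Pseudoseq): fail loudly if
# fewer regions were found than requested.
function require_regions(regions::Vector{UnitRange{Int}}, n::Int)
    length(regions) == n ||
        error("expected $n free regions, got $(length(regions))")
    return regions
end

# Stand-in for the output of suggest_regions.
regions = require_regions([1:50, 101:150], 2)
```

Raising an error turns the terminal warning into a hard failure, so a script stops instead of continuing with too few regions.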

suggest_alleles(groups::Vector{Int})

Suggest an allele pattern for a single heterozygous site.

The groups vector should have one value for each chromosome copy: the nth value in groups assigns a group to the nth chromosome copy.

E.g. for a triploid, the vector [2, 1, 2] means that there are two groups (group 1 and group 2), and that the second chromosome copy of the three belongs to group 1, and the other two copies belong to group 2.

Chromosome copies that belong to the same group will get the same base at each heterozygous site.

For the vector [2, 1, 2], this method might return a suggested allele pattern such as [DNA_A, DNA_G, DNA_A].

Note

Which base corresponds to which group number will be determined at random.
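The behaviour described above can be sketched without any dependencies. This is not Pseudoseq's implementation: sketch_alleles is a hypothetical name, and bases are plain characters here rather than the DNA type.

```julia
using Random

# Hypothetical sketch (not Pseudoseq's implementation). Bases are plain
# characters instead of the DNA type, to keep the example dependency-free.
function sketch_alleles(groups::Vector{Int})
    labels = unique(groups)
    length(labels) <= 4 ||
        throw(ArgumentError("at most 4 groups are possible with 4 bases"))
    # Draw one distinct base per group, in random order.
    bases = shuffle(['A', 'C', 'G', 'T'])[1:length(labels)]
    base_of = Dict(zip(labels, bases))
    return [base_of[g] for g in groups]
end

pattern = sketch_alleles([2, 1, 2])
# Copies 1 and 3 always share a base; copy 2 always differs from them.
```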

suggest_alleles(groups::Vector{Int}...)

Suggest an allele pattern for a single heterozygous site.

Multiple integer vectors make up groups.

Each vector defines a group, and each value in a vector determines which copies are allocated to that group.

E.g. for a triploid, consider two vectors, [2] and [1, 3], passed as arguments to groups. They define two groups: the first vector contains a number denoting the second chromosome copy, and the second vector contains numbers representing the first and third chromosome copies.

For the vectors [2] and [1, 3], this method might return a suggested allele pattern such as [DNA_A, DNA_G, DNA_A].

Note

Which base corresponds to which group number will be determined at random.

suggest_alleles(ncopies::Int, ngroups::Int)

Suggest an allele pattern for a single heterozygous site.

Given the number of copies of a chromosome and the number of groups (which must be at least 2), this method of suggest_alleles will randomly allocate the ncopies of the chromosome into the ngroups groups.

For example, if you ask for a suggested allele pattern for 3 copies of a chromosome, and 2 groups, this method might return a suggested pattern of [DNA_G, DNA_A, DNA_G].

Note

Which base corresponds to which group number will be determined at random.
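One way to picture the random allocation step is shown below. This is not Pseudoseq's implementation: random_groups is a hypothetical name, and keeping every group non-empty is an assumption made so that the site ends up with ngroups distinct states.

```julia
using Random

# Hypothetical sketch (not Pseudoseq's implementation): randomly allocate
# `ncopies` chromosome copies into `ngroups` groups.
# Assumption: no group may be left empty.
function random_groups(ncopies::Int, ngroups::Int)
    2 <= ngroups <= ncopies ||
        throw(ArgumentError("need 2 <= ngroups <= ncopies"))
    # One copy per group guarantees no empty group; the rest are random.
    groups = vcat(collect(1:ngroups), rand(1:ngroups, ncopies - ngroups))
    return shuffle(groups)
end

groups = random_groups(3, 2)   # a group vector such as [2, 1, 2]
```

The resulting group vector has the same form as the groups argument of the first suggest_alleles method.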

suggest_alleles(npositions::Int, args...)

Suggest allele patterns for numerous heterozygous sites.

This method of suggest_alleles calls another method of suggest_alleles with args..., npositions times, and returns a vector containing the result of each call.