## EffiSim - Measuring the efficiency of mutation screens for recessive phenotypes

EffiSim is a program that calculates two measures of the efficiency of mutation screens for recessive phenotypes.
The first is the expected number of distinct mutations screened, and the second is the expected number
of mutations screened for a given amount of work. The first measure is called the *efficiency* and the second
the *balanced efficiency*. Given the costs of a screen, this program can be used to determine the most efficient design (i.e.
the design that will screen, on average, the largest number of mutations for a given amount of work).
It can also help determine which of the backcross (BC) and intercross (IC) breeding strategies is more
appropriate for a given mutation screen.

To install EffiSim, download the archive `EffiSim.tar` and untar it with the command `tar -xvf EffiSim.tar`.

A screen for recessively acting mutations requires a three-generation breeding strategy to produce mice homozygous for
the novel mutations. The design of a screen describes the number of mice bred at each generation. Let us define
a few variables. Let *x* be the number of mutations inherited per G_{1} mouse, *d* be the number of G_{1} pairs,
*h* be the number of G_{2} crosses per G_{1} pair (*h* = 1 for the backcross, whereas the intercross can
have *h* > 1), *k* be the number of G_{2} females mating with each G_{2} male, *n* be the number of G_{3} pups
born to each G_{2} female and *r* be the number of G_{3} pups born to each G_{2} male (thus *r* = *nk*).
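As a minimal sketch of how these variables fit together, the Python below stores one design and derives *r* = *nk* and the total number of G_{3} pups, *dhr*. The class name `Design`, its field names and its validation rules are hypothetical illustrations, not part of EffiSim; only the rule that *h* = 1 for the backcross comes from the definitions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Hypothetical container for one screen design (not EffiSim's own code)."""
    scheme: str  # 'bc' (backcross) or 'ic' (intercross)
    d: int       # number of G1 pairs
    h: int       # number of G2 crosses per G1 pair
    k: int       # number of G2 females mating with each G2 male
    n: int       # number of G3 pups born to each G2 female

    def __post_init__(self):
        if self.scheme not in ("bc", "ic"):
            raise ValueError("scheme must be 'bc' or 'ic'")
        if min(self.d, self.h, self.k, self.n) < 1:
            raise ValueError("d, h, k and n must all be at least 1")
        if self.scheme == "bc" and self.h != 1:
            raise ValueError("the backcross requires h = 1")

    @property
    def r(self) -> int:
        """Number of G3 pups born to each G2 male: r = n * k."""
        return self.n * self.k

    @property
    def total_g3(self) -> int:
        """Total G3 pups: d G1 pairs x h G2 crosses per pair x r pups per G2 male."""
        return self.d * self.h * self.r


design = Design("ic", d=2, h=3, k=2, n=5)
print(design.r, design.total_g3)  # 10 60
```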

An important feature of EffiSim is the cost equation, that is, the weights attributed to each generation. Any G_{1} mouse
carries, and therefore can transmit, only a finite number of mutations. The law of diminishing returns suggests that the
greatest returns (in terms of the number of mutations screened) will come from screening the first few mice. This line of
thought implies that, to screen the greatest number of distinct mutations, one should screen only
a few G_{3} progeny of any G_{1} pair. However, generating the
G_{1} and G_{2} mice is itself time-consuming and expensive, which implies that it is worth screening many
G_{3} pups to make this effort worthwhile. Hence there are two competing interests that must be balanced. This can
be done by finding the screen design with the greatest balanced efficiency: it will screen the greatest number of mutations
for a given amount of work.
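To illustrate the diminishing returns described above, the sketch below uses a deliberately simplified backcross model. This model is an assumption made here for illustration and not necessarily the one EffiSim implements: mutations are unlinked and independent, each G_{2} daughter of a G_{1} male carries a given mutation with probability 1/2, each G_{3} pup of a carrier daughter backcrossed to the G_{1} father is homozygous with probability 1/4, and a mutation counts as screened if it is homozygous in at least one pup (*y* = 1). The function names are hypothetical.

```python
def prob_screened_bc(n: int, k: int) -> float:
    """Probability that one unlinked G1 mutation is homozygous in >= 1 G3 pup.

    Simplified backcross model (an assumption, not necessarily EffiSim's):
    k G2 daughters, each a carrier with probability 1/2, each producing n G3
    pups that are homozygous with probability 1/4 if the daughter is a carrier.
    """
    per_daughter = 0.5 * (1.0 - 0.75 ** n)
    return 1.0 - (1.0 - per_daughter) ** k


def expected_screened(x: int, n: int, k: int) -> float:
    # Treat the x mutations carried by the G1 male as independent (unlinked).
    return x * prob_screened_bc(n, k)


if __name__ == "__main__":
    x, k = 100, 4
    previous = 0.0
    for n in range(1, 11):
        e = expected_screened(x, n, k)
        # 'gain' is the extra number of mutations screened by one more pup per daughter.
        print(f"n={n:2d}  pups={n * k:3d}  screened={e:6.2f}  gain={e - previous:5.2f}")
        previous = e
```

Under these assumptions each extra pup per daughter adds fewer newly screened mutations than the one before, which is the trade-off that the balanced efficiency weighs against the cost of producing the G_{1} and G_{2} generations.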

To do this requires careful consideration of where the costs of a screen lie. One simple approach is to
estimate the average amount of work required ($w_{G_i}$) for each mouse in generation *i*. Using the design parameters above (*d*, *h*, *k* and *r*), we can calculate the number of mice in each generation. Putting these together gives a way of calculating the cost of a screen that uses the same variables that are required to estimate the returns. Because the balanced efficiency depends directly on these weights, it is important that they accurately represent the cost, labour, time, space and effort involved in conducting the screen. The equation used to calculate the cost of the screen is

$$g(d,h,k,r) = 2d\,w_{G_1} + 2dhk\,w_{G_2} + dhr\,w_{G_3}.$$

This equation has been designed to take into account all mice bred, not only the mice used in the screen (e.g. in the backcross, the male G_{2} mice are not used in the screen). It should be altered as necessary.
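The cost equation translates directly into code. This Python function is a hypothetical helper, not EffiSim's own routine; its default weights of 1 for every generation match the default implied by the description of the `-f` option.

```python
def screen_cost(d: int, h: int, k: int, r: int,
                w_g1: float = 1.0, w_g2: float = 1.0, w_g3: float = 1.0) -> float:
    """Cost g(d,h,k,r) = 2d*w_G1 + 2dhk*w_G2 + dhr*w_G3 (hypothetical helper)."""
    g1_mice = 2 * d          # both members of each of the d G1 pairs
    g2_mice = 2 * d * h * k  # all G2 mice bred, including those not used in the screen
    g3_mice = d * h * r      # r G3 pups for each of the d*h G2 males
    return g1_mice * w_g1 + g2_mice * w_g2 + g3_mice * w_g3


# Unit weights: 2*2 + 2*2*1*4 + 2*1*20 = 4 + 16 + 40
print(screen_cost(2, 1, 4, 20))                          # 60.0
# Heavier weights for the earlier generations: 4*5 + 16*2 + 40*1
print(screen_cost(2, 1, 4, 20, w_g1=5, w_g2=2, w_g3=1))  # 92.0
```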

There are two ways of using EffiSim. The first requires you to enter the design of the mutation screen on the command line (option `-p`). The second allows you to describe a range of designs (or simply a single design) by listing the possible values for each parameter in a file (option `-f`). In either case, EffiSim writes the results to an output file.

To run EffiSim, you will need Perl. To run the script, type:

```
perl EffiSim.pl [options]
```

| Option | What it changes | Default |
|---|---|---|
| `-p` | Specifies the parameters of the screen. They are given in the following order, separated by commas and without spaces: the length of the target interval ('g' for a genome-wide screen or, for a regional screen, the length in cM); the breeding scheme ('ic' for the intercross, 'bc' for the backcross); the type of screen ('m' for a mutation screen, 's' for a sensitised screen); how the number of G_{3} pups is fixed ('r' for per G_{2} male, 'n' for per G_{2} female); the number of G_{1} pairs (*d*); the number of G_{2} crosses per G_{1} pair (*h*); the number of G_{2} females mating with each G_{2} male (*k*); and the fixed number of G_{3} pups (either per G_{2} male or per G_{2} female). | No default. Either `-p` or `-f` must be specified. |
| `-f` | The name of a file that specifies a range of parameters over which the efficiency should be calculated. It must follow the format of the file `Example.txt`. Most of the other options can also be specified in this parameter file. If you want to use weights other than 1 for any generation, you must use this option. | No default. Either `-p` or `-f` must be specified. |
| `-n` | The name of the file to which the results are written. | No default. This file must be specified. |
| `-x` | The expected number of mutations inherited by each G_{1} pup. This must be an integer. | 100 |
| `-y` | The number of mice in which a mutation must be homozygous for it to count as 'screened'. This must be a positive integer. | 1 |
| `-o` | Output type. The short option 's' gives all the results for each design on a single line, while the long option 'l' lists the number of mutations that are screened in *y* mice. If you want to compare a large number of screen designs, the results are probably easiest to view in the short format. | s |
| `-s` | Whether simulations are performed. A value of 0 means that no simulations are run and only the theoretical efficiency is calculated (this may be much faster). A positive integer gives the number of times the simulation is repeated. | 0 |
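The field order for `-p` is easy to get wrong, so it can help to check a value before passing it to EffiSim. The Python sketch below splits a `-p` value into named fields and converts a pup count fixed per G_{2} female ('n') into *r* = *nk*. The function name and dictionary keys are hypothetical, and the checks only mirror the table above; EffiSim's own parser may accept or reject other values.

```python
def parse_p(value: str) -> dict:
    """Split a -p value such as 'g,bc,m,r,2,1,4,20' into named fields (hypothetical)."""
    fields = value.split(",")
    if len(fields) != 8:
        raise ValueError(f"expected 8 comma-separated fields, got {len(fields)}")
    interval, scheme, screen, fix, *numbers = fields
    if interval != "g":
        interval = float(interval)  # regional screen: interval length in cM
        if interval <= 0:
            raise ValueError("interval length must be positive")
    if scheme not in ("ic", "bc"):
        raise ValueError("breeding scheme must be 'ic' or 'bc'")
    if screen not in ("m", "s"):
        raise ValueError("screen type must be 'm' or 's'")
    if fix not in ("r", "n"):
        raise ValueError("G3 pups must be fixed per G2 male ('r') or per G2 female ('n')")
    d, h, k, pups = (int(v) for v in numbers)
    if min(d, h, k, pups) < 1:
        raise ValueError("d, h, k and the pup count must all be at least 1")
    if scheme == "bc" and h != 1:
        raise ValueError("the backcross requires h = 1")
    # Always report r, the number of G3 pups per G2 male (r = n * k).
    r = pups if fix == "r" else pups * k
    return {"interval": interval, "scheme": scheme, "screen": screen,
            "d": d, "h": h, "k": k, "r": r}


print(parse_p("g,bc,m,r,2,1,4,20"))
print(parse_p("25,ic,s,n,3,2,2,10"))
```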

Each option must be followed by a colon and then its value. Options are separated by spaces, but there must be
no spaces between an option, its colon and its value. Here are a few examples:

`perl EffiSim.pl -f:Example.txt -n:output.txt -o:s`

`perl EffiSim.pl -p:g,bc,m,r,2,1,4,20 -n:output.txt -o:l -x:50 -y:2 -s:100`
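When running many designs from a script, building the argument list programmatically keeps the colon syntax and spacing consistent. The Python sketch below is a hypothetical wrapper, not part of EffiSim: `effisim_command` only assembles the arguments, and `run_effisim` assumes that Perl is installed and that `EffiSim.pl` is in the working directory.

```python
import shlex
import subprocess

# Single-value options from the options table (besides -p, -f and -n).
VALUE_OPTIONS = {"x", "y", "o", "s"}


def effisim_command(output, params=None, param_file=None, **options):
    """Return an argument list for 'perl EffiSim.pl' using -option:value tokens."""
    if params is None and param_file is None:
        raise ValueError("either params (-p) or param_file (-f) must be given")
    unknown = set(options) - VALUE_OPTIONS
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    args = ["perl", "EffiSim.pl"]
    if params is not None:
        # -p fields are comma-separated with no spaces.
        args.append("-p:" + ",".join(str(v) for v in params))
    if param_file is not None:
        args.append("-f:" + param_file)
    args.append("-n:" + output)
    for name, value in options.items():
        # No spaces between the option, its colon and its value.
        args.append(f"-{name}:{value}")
    return args


def run_effisim(args):
    # Assumes Perl is installed and EffiSim.pl is in the working directory.
    subprocess.run(args, check=True)


cmd = effisim_command("output.txt", params=("g", "bc", "m", "r", 2, 1, 4, 20),
                      o="l", x=50, y=2, s=100)
print(shlex.join(cmd))
# perl EffiSim.pl -p:g,bc,m,r,2,1,4,20 -n:output.txt -o:l -x:50 -y:2 -s:100
```

Passing the list to `subprocess.run` rather than a single shell string avoids quoting problems with unusual file names.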

To reference this program, please cite:

Silver J.D., Hilton D.J., Bahlo M., Kile B.T. (2006) Efficient breeding strategies for the generation of ENU mutant mice with recessive phenotypes.
Submitted for publication.

Finally, there is a file that lists all bugs/revisions/comments that have been reported.

Comments/Questions? Contact Jeremy Silver: silver@wehi.edu.au.

Last modified: 19-12-2005