

Guidelines for Nomenclature of Genes, Genetic Markers, Alleles, and Mutations in Mouse and Rat 


Revised: September, 2011

International Committee on Standardized Genetic Nomenclature for Mice

Chairperson: Dr. Janan T. Eppig 
(e-mail: janan.eppig@jax.org)

Rat Genome and Nomenclature Committee

Chairperson: Dr. Goran Levan
(e-mail: Goran.Levan@gen.gu.se)



3   Names and Symbols for Variant and Mutant Alleles

Different alleles of a gene or locus can be distinguished by a number of methods, including DNA fragment length, protein electrophoretic mobility, or variant physiological or morphological phenotype.

All mutant alleles, whether of spontaneous or induced origin, targeted mutations, gene traps, or transgenics should be submitted to MGD (mouse) or RGD (rat) for an allele or gene accession identifier.


3.1   Mutant Phenotypes

3.1.1   Genes Known Only by Mutant Phenotypes

Where a gene is known only by mutant phenotype, the gene is given the name and symbol of the first identified mutant. Symbols of mutations conferring a recessive phenotype begin with a lowercase letter; symbols for dominant or semidominant phenotype genes begin with an uppercase letter.

Examples:
In mouse, recessive spotting, rs; abnormal feet and tail, Aft; circling, cir
In rat, polydactyly-luxate, lx.

Further (allelic) mutations at the same locus, if they have the same phenotype, are given the same name with a Laboratory code preceded by a serial number (if more than one additional allele from the same lab). In the symbol the Laboratory code is added as a superscript.

  • For example, agil2J, the second new allele of mouse agitans-like identified at The Jackson Laboratory.
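The serial-number rule for further alleles can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the angle brackets are an assumed plain-text stand-in for the superscript, which these Guidelines typeset as a true superscript.

```python
def additional_allele_superscript(lab_code: str, nth_from_lab: int) -> str:
    """Superscript for a further allele with the same phenotype.

    The serial number is written only from the second additional allele
    contributed by the same laboratory, as the rule's parenthetical states.
    """
    if nth_from_lab < 1:
        raise ValueError("nth_from_lab must be 1 or greater")
    serial = "" if nth_from_lab == 1 else str(nth_from_lab)
    return f"{serial}{lab_code}"


def render(symbol: str, superscript: str) -> str:
    # Assumption: angle brackets mark the superscripted part in plain text.
    return f"{symbol}<{superscript}>"


print(render("agil", additional_allele_superscript("J", 2)))  # agil<2J>
```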

If a new allelic mutation of a gene known only by a mutant phenotype is caused by a transgenic insertion, the symbol of this mutation should use the symbol of the transgene as superscript (see Section 3.4.2 and Section 4).

  • awgTg(GBtslenv)832Pkw; mutation of abnormal wobbly gait caused by a transgene, mouse line 832, produced in the laboratory of Paul Wong. (An abbreviated form, awgTg832Pkw, can be used if the abbreviated designation is unique.)

If the additional allele has a different phenotype, it may be given a different name, but when symbolized the new mutant symbol is superscripted to the original mutant symbol. Also, if a new mutation is described and named but not shown to be an allele of an existing gene until later, the original name of the new mutation can be kept. Even if the phenotype is apparently identical, the original symbol is used, with the new mutation symbol as superscript.

For example

  • grey coat is an allele of recessive spotting (rs) in the mouse, and hence is symbolized rsgrc.

3.1.2   Phenotypes Due to Mutations in Structural Genes

When a spontaneous or induced mutant phenotype is subsequently found to be a mutation in a structural gene, or the gene in which the mutation has occurred is cloned, the mutation becomes an allele of that gene and the symbol for the mutant allele is formed by adding the original mutant symbol as a superscript to the new gene symbol. (The mutant symbol should retain its initial upper or lowercase letter).

  • The hotfoot (ho) mutation of the mouse glutamate receptor Grid2; Grid2ho.
  • The dominant white spotting (W) mutation of mouse Kit; KitW.

If the original mutation has multiple alleles, when describing these alleles, their symbols become part of the superscript to the identified structural gene.

  • creeper, Grid2ho-cpr.
  • viable white spotting, KitW-v; sash, KitW-sh.

Even if the identified gene is novel and unnamed, it is recommended that it nevertheless be given a name and symbol different from the mutant name and symbol. This more readily allows discrimination between mutant and wild type and between gene and phenotype.

3.1.3   Wild Type Alleles and Revertants

The wild type allele of a gene is indicated by + as superscript to the mutant symbol.

  • The wild type allele of the agitans-like mutation, agil+.
  • The wild type Kit locus (if necessary to distinguish from mutations), Kit+.

A revertant to wild type of a mutant phenotype locus should be indicated by the symbol + with the mutant symbol as superscript.

  • Revertant to wild type at the hairless mutant locus +hr

Additional revertants are given a Laboratory Code and preceded by a serial number if more than one revertant is found in a lab. Serial numbers are independent for mouse and rat revertants and no homology is implied. If the revertant is in a gene that has been cloned, then the mutant symbol is retained as superscript to the gene symbol, and + is appended.

  • Revertant to wild type of the dilute mutation of myosin Va; Myo5ad+
  • Second such revertant identified at The Jackson Laboratory; Myo5ad+2J.
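The three cases in this section (wild type allele, revertant at a locus known only by phenotype, revertant in a cloned gene) differ only in where the + sits. A minimal sketch follows, assuming angle brackets as a plain-text stand-in for superscript; the function names are hypothetical.

```python
from typing import Optional


def wild_type(symbol: str) -> str:
    """Wild type allele: + as superscript to the gene or mutant symbol."""
    return f"{symbol}<+>"


def revertant_of_mutant_locus(mutant_symbol: str) -> str:
    """Revertant at a locus known only by phenotype: + carries the mutant symbol."""
    return f"+<{mutant_symbol}>"


def revertant_of_cloned_gene(gene: str, mutant_symbol: str,
                             lab_code: Optional[str] = None,
                             nth_from_lab: int = 1) -> str:
    """Revertant in a cloned gene: mutant symbol, then +, then serial and lab code."""
    suffix = ""
    if lab_code is not None:
        # The serial number appears only from the second revertant in a lab.
        serial = "" if nth_from_lab == 1 else str(nth_from_lab)
        suffix = f"{serial}{lab_code}"
    return f"{gene}<{mutant_symbol}+{suffix}>"


print(wild_type("Kit"))                              # Kit<+>
print(revertant_of_mutant_locus("hr"))               # +<hr>
print(revertant_of_cloned_gene("Myo5a", "d", "J", 2))  # Myo5a<d+2J>
```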


3.2   Variants

3.2.1   Biochemical Variants

Electrophoretic or other biochemical variant alleles of known structural genes are usually given lowercase letters to indicate different alleles, and in the symbol the letter becomes a superscript to the gene symbol.

  • For example, glucose phosphate isomerase 1 alleles a and b; Gpi1a, Gpi1b.

3.2.2   DNA Segment Variants

Variants of DNA segments are indicated by a superscript to the symbol. The symbol is usually an abbreviation for the inbred strain in which the variant is being described. However, a particular allele may be found in several inbred strains, and, furthermore, it may be difficult to establish whether an allele in one strain is identical to one in another. The use of allele symbols for DNA segments is mainly limited to describing inheritance and haplotypes in crosses. As long as the symbols are defined in the description, users are free to use whatever allele symbol best fits their needs. In tables of genotypes, the gene symbol can be omitted and the allele abbreviation used alone.

  • D11Mit19a, D11Mit19b, and D11Mit19c are variant alleles of D11Mit19 in mouse.

3.2.3   Single Nucleotide Polymorphisms (SNPs)

Polymorphisms defined by SNPs may occur within or outside of a protein coding sequence.

If the SNP occurs within a gene, the SNP allele can be designated based on its dbSNP_id, followed by a hyphen and the specific nucleotide.

Examples:
Park2rs6200232-G  The Park2 rs6200232 SNP allele with the G variant
Park2rs6200232-A  The Park2 rs6200232 SNP allele with the A variant

If the SNP occurs outside of an identified gene, the SNP locus can be designated using the dbSNP_id as the locus symbol and the nucleotide allelic variants are then superscripted as alleles. If a gene is later discovered to include this SNP locus, the same guidelines are applicable as those used when mutant locus symbols become alleles of known genes.

Examples:
rs6200616T  A SNP locus with the T variant
rs6200616C  A SNP locus with the C variant

Note: If a gene Xyz is later discovered to include this SNP locus, rs6200616, then the alleles listed above become Xyzrs6200616-T and Xyzrs6200616-C.
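The two SNP cases, and the move from one to the other when a gene is later identified, can be expressed as a single function. This is a sketch under stated assumptions: the function name is hypothetical, angle brackets stand in for the superscript, and only the four unambiguous bases are accepted.

```python
from typing import Optional

VALID_BASES = {"A", "C", "G", "T"}


def snp_allele(dbsnp_id: str, base: str, gene: Optional[str] = None) -> str:
    """Designate a SNP allele inside a gene or at a stand-alone SNP locus."""
    if base not in VALID_BASES:
        raise ValueError(f"unexpected nucleotide: {base!r}")
    if gene is None:
        # Outside any identified gene: the dbSNP id is the locus symbol
        # and the nucleotide alone is the superscripted allele.
        return f"{dbsnp_id}<{base}>"
    # Within a gene: the superscript is the dbSNP id, a hyphen, and the nucleotide.
    return f"{gene}<{dbsnp_id}-{base}>"


print(snp_allele("rs6200232", "G", gene="Park2"))  # Park2<rs6200232-G>
print(snp_allele("rs6200616", "T"))                # rs6200616<T>
# If the locus is later found to lie within a gene (placeholder symbol Xyz):
print(snp_allele("rs6200616", "T", gene="Xyz"))    # Xyz<rs6200616-T>
```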

3.3  Variation in Quantitative Trait Loci and in Response and Resistance Genes

Variation in genes that do not give rise to a visible phenotype may be detected by assaying physiological or pathological parameters. Examples of this type of variation include levels of metabolite, immune response to antigen challenge, viral resistance, or response to drugs. Genetic variation may also produce phenotypic variation in morphology, behavior, or other observable traits that interact in a complex manner with other genes and/or with the environment.

These genes can only be identified by virtue of allelic variation. In most cases, there will not be a clear wild type; hence all alleles should be named. In most cases, the alleles should be named according to their strain of origin and symbolized by adding the strain abbreviation as superscript, although for resistance and sensitivity, variants r and s may be used. Bear in mind that resistance alleles deriving from different strains may not be the same and should be given different names and symbols.

Once the gene underlying a quantitative trait has been cloned or identified, the phenotypic name should be replaced by the name of the identified gene. The allele names and symbols should be the same as those used for the phenotype.

Examples:
Slc11a1r  solute carrier family 11, host resistance allele
Slc11a1s  solute carrier family 11, host susceptibility allele
(the QTL originally known as BCG/Lsh resistance has been identified as Slc11a1)
Scc2BALB/cHeA  colon tumor susceptibility 2, BALB/cHeA allele
Scc2STS/A  colon tumor susceptibility 2, STS/A allele
(for QTL Scc2, the STS/A allele has increased tumor susceptibility vs. BALB/cHeA)

3.4   Insertional and Induced Mutations

Mutations that are induced, targeted, or selected in structural genes are named as alleles of the structural gene.

3.4.1   Mutations of Structural Genes

Variants of structural genes that are clearly mutations, whether or not they confer a phenotype, are given the superscript m#Labcode, where # is a serial number and is followed by the Laboratory code where the mutation was found or characterized. Serial numbers are independently assigned in mouse and rat and the same assigned serial number does not imply orthology. If the mutation is known to have occurred on a particular allele, that can be specified by preceding the superscript with the allele symbol and a hyphen.

  • For example, Mod1a-m1Lws is a mutation of the mouse Mod1a allele, the first found in the laboratory of Susan Lewis.

If the mutation is shown to be a deletion of all or part of the structural gene, the superscript del can be used in place of m. Note that this should be used only for deletions that encompass a single gene; larger deletions should use the chromosomal deletion nomenclature.
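As an illustration of how the m#Labcode superscript is assembled, including the optional allele prefix and the del variant, consider the sketch below. The function name and its parameters are hypothetical; the function returns only the superscript text.

```python
from typing import Optional


def structural_mutation_superscript(serial: int, lab_code: str,
                                    on_allele: Optional[str] = None,
                                    deletion: bool = False) -> str:
    """Build the superscript for a mutation of a structural gene."""
    # "del" replaces "m" for a deletion confined to a single gene.
    kind = "del" if deletion else "m"
    core = f"{kind}{serial}{lab_code}"
    # A known parental allele precedes the superscript, joined by a hyphen.
    return f"{on_allele}-{core}" if on_allele else core


# Superscript of the Mod1 example: allele a, first mutation from lab Lws.
print(structural_mutation_superscript(1, "Lws", on_allele="a"))  # a-m1Lws
```

Larger deletions are outside the scope of this sketch, since they follow the chromosomal deletion nomenclature instead.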

3.4.2    Transgenic Insertional Mutations

Mutations produced by random insertion of a transgene (not by gene targeting) are named as a mutant allele of the gene (which should be given a name and symbol if it is a novel gene), with the symbol for the transgene as the superscript (see Section 3.1.1 for examples, and Section 4 for naming transgenes).


3.5    Targeted and Trapped Mutations

3.5.1   Knockout, Knockin, Conditional and Other Targeted Mutations

Mutations that are the result of gene targeting by homologous recombination in ES cells are given the symbol of the targeted gene, with a superscript consisting of three parts: the symbol tm to denote a targeted mutation, a serial number from the laboratory of origin and the Laboratory code where the mutation was produced (see Section 2.1).

  • For example, Cftrtm1Unc is the first targeted mutation of the cystic fibrosis transmembrane regulator (Cftr) gene produced at the University of North Carolina.

So-called "knock in" mutations, in which all or part of the coding region of one gene is replaced by another, should be given a tm symbol and the particular details of the knock-in associated with the name in publications or databases. Where there has been a replacement of the complete coding region, the replacing gene symbol can be used parenthetically as part of the allele symbol of the replaced gene along with a Laboratory code and serial number.

  • For example, En1tm1(Otx2)Wrst where the coding region of En1 was replaced by the Otx2 gene, originating from the W. Wurst laboratory.

Knock in alleles expressing an RNAi under the control of the endogenous promoter can be designated using targeted mutation or transgene mutation nomenclature, as appropriate:

Example:
Genetm#(RNAi:Xyz)Labcode

When a targeting vector is used to generate multiple germline transmissible alleles, such as in the Cre-Lox system, the original knock-in of loxP would follow the regular tm designation rules. If a second heritable allele was then generated after mating with a cre transgenic mouse, it would retain the parental designation followed by a decimal point and serial number.

  • Tfamtm1Lrsn and Tfamtm1.1Lrsn. In this example, Tfamtm1Lrsn designates a targeted mutation where loxP was inserted into the Tfam gene. Tfamtm1.1Lrsn designates another germline transmissible allele generated after mating with a cre transgenic mouse. Note: somatic events generated in offspring from a Tfamtm1Lrsn bearing mouse and a cre transgenic that cause disruption of Tfam in selective tissues would not be assigned nomenclature.

Other more complex forms of gene replacement, such as partial "knock-in", hit-and-run, double replacements, and loxP mediated integrations are not conveniently abbreviated and should be given a conventional tm#Labcode superscript. Details of the targeted locus should be given in associated publications and database entries.

Note that although subtle alterations made in a gene appear to lend themselves to a simple naming convention whereby the base or amino acid changes are specified, in fact these do not provide unique gene names, as such alterations, which could be made in independent labs, while bearing the same changes, may differ elsewhere in the gene.

Large-scale projects that systematically produce a large number of alleles (>1000) may include a project abbreviation in parentheses as part of the allele designation. These should retain the accepted nomenclature features of other alleles of that class. For example, a targeted allele created by Velocigene (Regeneron) in the KOMP knockout project:

Gstm3tm1(KOMP)Vlcg

Once fully designated in a publication, the allele can be abbreviated by removing the portion of the allele designation in parentheses (in this case, Gstm3tm1Vlcg), providing the symbol remains unique.
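The parts of a tm superscript described in this section (serial number, optional decimal derivative, optional parenthetical detail, Laboratory code) can be pulled apart with a regular expression. The sketch below makes several assumptions that are not part of the Guidelines: it works on the superscript text alone, it treats a Laboratory code as one uppercase letter followed by up to four lowercase letters, and all names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: tm, serial, optional ".n", optional "(detail)", lab code.
TM_PATTERN = re.compile(
    r"^tm(?P<serial>\d+)(?:\.(?P<derived>\d+))?"
    r"(?:\((?P<detail>[^()]+)\))?"
    r"(?P<lab>[A-Z][a-z]{0,4})$"
)


class TargetedSuperscript(NamedTuple):
    serial: int
    derived: Optional[int]   # number after the decimal point, if any
    detail: Optional[str]    # replacing gene, RNAi target, or project code
    lab: str


def parse_tm(superscript: str) -> TargetedSuperscript:
    """Split a targeted-mutation superscript into its parts."""
    match = TM_PATTERN.match(superscript)
    if match is None:
        raise ValueError(f"not a recognised tm superscript: {superscript!r}")
    derived = match.group("derived")
    return TargetedSuperscript(
        serial=int(match.group("serial")),
        derived=int(derived) if derived else None,
        detail=match.group("detail"),
        lab=match.group("lab"),
    )


def abbreviate(superscript: str) -> str:
    """Drop the parenthetical part; the caller must check uniqueness."""
    return re.sub(r"\([^()]*\)", "", superscript)


print(parse_tm("tm1.1Lrsn"))
print(abbreviate("tm1(KOMP)Vlcg"))  # tm1Vlcg
```

Parsing the superscript on its own avoids guessing where the gene symbol ends in flattened text such as Gstm3tm1(KOMP)Vlcg, where the gene symbol itself contains "tm".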

3.5.2   Endonuclease-induced Mutations

Endonuclease-induced mutations are targeted mutations generated in pluripotent or totipotent cells by an endonuclease joined to sequence-specific DNA binding domains. The mutation is introduced during homology-directed or non-homologous end-joining repair of the induced DNA break(s). Endonuclease-induced mutations are given the symbol of the mutated gene, with a superscript consisting of three parts: the symbol em to denote an endonuclease-induced mutation, a serial number from the laboratory of origin and the Laboratory code where the mutation was produced.

Example:
Fgf1em1Mcw   the first endonuclease-induced mutation of the fibroblast growth factor 1 (Fgf1) gene produced at the Medical College of Wisconsin.

3.5.3    Gene Trap Mutations

Gene trap mutations are symbolized in a similar way to targeted mutations. If the trapped gene is known, the symbol for the trapped allele will be similar to a targeted mutation of the same gene using the format Gt(vector content)#Labcode for the allele designation. Example:

Akap12Gt(ble-lacZ)15Brr   a gene trap allele of the Akap12 gene, where the gene trap vector contains a phleomycin resistance gene (ble) and lacZ, the 15th analyzed in the laboratory of Jacqueline Barra (Brr).

If the trapped gene is novel, it should be given a name and a symbol, which includes the letters Gt for "gene trap," the vector in parentheses, a serial number, and Laboratory code.

  • For example, a gene trapped locus (where the gene is unknown) using vector ROSA, the 26th made in P. Soriano's laboratory, is Gt(ROSA)26Sor.

For high throughput systematic gene trap pipelines, the mutant ES cell line's designation can be used in parentheses instead of the vector designation, and the serial number following the parentheses may be omitted.

Examples:
Gt(DTM030)Byg  for a trapped gene (at an undefined locus) in mutant ES cell line DTM030, made by BayGenomics
Osbpl1aGt(OST48536)Lex  gene trap allele of the oxysterol binding protein-like 1A gene, in mutant ES cell line OST48536, made by Lexicon Genetics, Inc.
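The gene trap cases above (known gene, novel locus, and high throughput pipelines that omit the serial number) can be covered by one small builder. This is a sketch only: the function name is hypothetical and angle brackets are an assumed plain-text stand-in for the superscript.

```python
from typing import Optional


def gene_trap_symbol(content: str, lab_code: str,
                     serial: Optional[int] = None,
                     gene: Optional[str] = None) -> str:
    """Build a gene trap symbol from its parts.

    content is the vector content, or the mutant ES cell line designation
    for pipeline alleles, in which case serial may be left out.
    """
    number = "" if serial is None else str(serial)
    designation = f"Gt({content}){number}{lab_code}"
    if gene is None:
        # Novel or undefined locus: the designation itself is the symbol.
        return designation
    # Known gene: the designation becomes the allele superscript.
    return f"{gene}<{designation}>"


print(gene_trap_symbol("ble-lacZ", "Brr", serial=15, gene="Akap12"))
print(gene_trap_symbol("ROSA", "Sor", serial=26))
print(gene_trap_symbol("OST48536", "Lex", gene="Osbpl1a"))
```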

3.5.4    Enhancer Traps

Enhancer traps are specialized transgenes. One utility of these transgenes is in creating cre driver lines. Enhancer traps of this type that are currently being created may include a minimal promoter, introns, a cre recombinase cassette (sometimes fused with another element such as ERT2), and polyA sites from different sources.

Nomenclature for these enhancer traps consists of 4 parts as follows:

Et   prefix for enhancer trap
cre recombinase cassette   portion in parentheses...
for example, cre, icre, or cre/ERT2 (if fused with ERT2)
line number or serial number   to designate lab trap number or serial number
Lab code   ILAR code identifying the creator of this enhancer trap

Examples:
Et(icre)1642Rdav   Enhancer trap 1642, Ron Davis
Et(cre/ERT2)2047Rdav   Enhancer trap 2047, Ron Davis

Note that the minimal promoter, poly A source, etc. are not part of the enhancer trap nomenclature. These are molecular details of the specific construct that will be captured in database records and reported with experimental results.
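The four-part enhancer trap symbol lends itself to a simple pattern match. The sketch below assumes a numeric line number and a Laboratory code of one uppercase letter followed by up to four lowercase letters; neither assumption is stated in the Guidelines, and the function name is hypothetical.

```python
import re

# Assumed shape: Et, cassette in parentheses, numeric line, lab code.
ET_PATTERN = re.compile(
    r"^Et\((?P<cassette>[^()]+)\)(?P<line>\d+)(?P<lab>[A-Z][a-z]{0,4})$"
)


def parse_enhancer_trap(symbol: str) -> dict:
    """Return the cassette, line number and lab code of an Et symbol."""
    match = ET_PATTERN.match(symbol)
    if match is None:
        raise ValueError(f"not a recognised enhancer trap symbol: {symbol!r}")
    return match.groupdict()


print(parse_enhancer_trap("Et(cre/ERT2)2047Rdav"))
```

The minimal promoter and polyA source never appear in the parsed result because, as noted above, they are not part of the symbol.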



4   Transgenes

Any DNA that has been stably introduced into the germline of mice or rats is a transgene. Transgenes can be broken down into two categories:

  • Those that are produced by homologous recombination as targeted events at particular loci.
  • Those that occur by random insertion into the genome (usually by means of microinjection).

Nomenclature for targeted genes is dealt with in Section 3.5. Random insertion of a transgene in or near an endogenous gene may produce a new allele of this gene. This new allele should be named as described in Section 3.4.2. The transgene itself is a new genetic entity for which a name may be required. This section describes the guidelines for naming the inserted transgene.

It is recognized that it is not necessary, or even desirable, to name all transgenes. For example, if a number of transgenic lines are described in a publication but not all are subsequently maintained or archived, then only those that are maintained require standardized names. The following Guidelines were developed by an interspecies committee sponsored by ILAR in 1992 and modified by the Nomenclature Committee in 1999 and 2000. Transgenic symbols should be submitted to MGD or RGD/RatMap through the usual nomenclature submission form for new loci. The transgene symbol is made up of four parts:

  • Tg denoting transgene.
  • In parentheses, the official gene symbol of the inserted DNA, using nomenclature conventions of the species of origin.
  • The laboratory's line or founder designation or a serial number (note that numbering is independent for mouse and rat series).
  • The Laboratory code of the originating lab.

Note that, in contrast to gene and allele symbols, transgene symbols are not italicized as they are random insertions of foreign DNA material and are not part of the native mouse genome.

Examples:
Tg(Zfp38)D1Htz  a transgene containing the mouse Zfp38 gene, in line D1 reported by Nathaniel Heintz.
Tg(CD8)1Jwg  a transgene containing the human CD8 gene, the first transgenic line using this construct described by the lab of Jon W. Gordon.
Tg(HLA-B*2705, B2M)33-3Trg  a double transgene in rat containing the human HLA-B*2705 and B2M genes, that were co-injected, giving rise to line 33-3 by Joel D. Taurog.

The *, as used in the last example above, indicates that the included gene is mutant.

Different transgenic constructs containing the same gene should not be differentiated in the symbol; they will use the same gene symbol in parentheses and will be distinguished by the serial number/Laboratory code. Information about the nature of the transgenic entity should be given in associated publications and database entries.
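To make the four-part structure concrete, the sketch below splits a transgene symbol into its inserted gene symbols, line designation and Laboratory code. It rests on assumptions not found in the Guidelines: the Laboratory code is taken to be one uppercase letter followed by up to four lowercase letters at the end of the symbol, and a line designation that itself ends in such a run of letters would be split incorrectly. The function name is hypothetical.

```python
import re

# Assumed shape: Tg, insert in parentheses, line designation, lab code.
TG_PATTERN = re.compile(
    r"^Tg\((?P<insert>[^()]+)\)(?P<line>.+?)(?P<lab>[A-Z][a-z]{0,4})$"
)


def parse_transgene(symbol: str) -> dict:
    """Split a transgene symbol into inserts, line designation and lab code."""
    match = TG_PATTERN.match(symbol)
    if match is None:
        raise ValueError(f"not a recognised transgene symbol: {symbol!r}")
    # Co-injected genes are separated by commas inside the parentheses.
    inserts = [part.strip() for part in match.group("insert").split(",")]
    return {"inserts": inserts, "line": match.group("line"), "lab": match.group("lab")}


print(parse_transgene("Tg(Zfp38)D1Htz"))
print(parse_transgene("Tg(HLA-B*2705, B2M)33-3Trg"))
```

Hyphens inside the parentheses are deliberately left alone, because a hyphen can belong to a gene symbol (HLA-B*2705) as well as separate a promoter from a reporter.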

In many cases, a large number of transgenic lines are made from the same gene construct and only differ by tissue specificity of expression. The most common of these are transgenes that use reporter constructs or recombinases (e.g., GFP, lacZ, cre), where the promoter should be specified as the first part of the gene insertion designation, separated by a hyphen from the reporter or recombinase designation. The SV40 large T antigen is another example. The use of promoter designations is helpful in such cases.

Examples:
Tg(Wnt1-LacZ)206Amc  the LacZ transgene with a Wnt1 promoter, from mouse line 206 in the laboratory of Andrew McMahon.
Tg(Zp3-cre)3Mrt  the cre transgene with a Zp3 promoter, the third transgenic mouse line from the laboratory of Gail Martin.

In the case of a fusion gene insert, where roughly equal parts of two genes compose the construct, a forward slash separates the two genes in parentheses.

Example:
Tg(TCF3/HLF)1Mlc  a transgene in which the human transcription factor 3 gene and the hepatic leukemia factor gene were inserted as a fusion chimeric cDNA, the first transgenic mouse line produced by Michael L. Cleary's laboratory (Mlc).

This scheme is to name the transgene entity only. The mouse or rat strain on which the transgene is maintained should be named separately as in the Rules and Guidelines for Nomenclature of Mouse and Rat Strains. In describing a transgenic mouse or rat strain, the strain name should precede the transgene designation.

Examples:
C57BL/6J-Tg(CD8)1Jwg  mouse strain C57BL/6J carrying the Tg(CD8)1Jwg transgene.
F344/CrlBR-Tg(HLA-B*2705, B2M)33-3Trg  rat strain F344/CrlBR carrying the Tg(HLA-B*2705, B2M)33-3Trg double transgene.

For BAC transgenics, the insert designation is the BAC clone and follows the same naming convention as the Clone Registry at NCBI.

Example:
Tg(RP22-412K21)15Som  a BAC transgene where the inserted BAC is from the RP22 BAC library, plate 412, row K, column 21. It is the 15th made in mouse in the laboratory of Stefan Somlo (Som).

Transgenes containing RNAi constructs can be designated minimally as:

Tg(RNAi:geneX)#Labcode, where
geneX  is the gene that is knocked down
#  is the serial number of the transgene

An expanded version of this designation is:

Tg(Pro-yyRNAi:geneX)#Labcode, where
Pro-  can be used optionally to designate the promoter
yy  can be used optionally for the specific RNAi construct
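The minimal and expanded RNAi forms differ only in two optional prefixes, which the following sketch makes explicit. The function name is hypothetical, and the gene, promoter, construct and Laboratory code values in the usage lines are placeholders, not real designations.

```python
from typing import Optional


def rnai_transgene(gene: str, serial: int, lab_code: str,
                   promoter: Optional[str] = None,
                   construct: Optional[str] = None) -> str:
    """Build Tg(RNAi:geneX)#Labcode or the expanded Tg(Pro-yyRNAi:geneX)#Labcode."""
    # The promoter, when given, is followed by a hyphen.
    pro = f"{promoter}-" if promoter else ""
    # The construct designation, when given, directly precedes "RNAi".
    yy = construct or ""
    return f"Tg({pro}{yy}RNAi:{gene}){serial}{lab_code}"


# Placeholder values only.
print(rnai_transgene("Xyz", 1, "Abc"))                                  # Tg(RNAi:Xyz)1Abc
print(rnai_transgene("Xyz", 1, "Abc", promoter="Pro", construct="yy"))  # Tg(Pro-yyRNAi:Xyz)1Abc
```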

While there is the option to include significant information on vectors, promoters, etc. within the parentheses of a transgene symbol, this should be minimized for brevity and clarity. The function of a symbol is to provide a unique designation to a gene, locus, or mutation. The fine molecular detail of these loci and mutations should reside in databases such as MGD and RGD.



5    Transposon-induced Mutations and Inserts

Three types of genetic inserts are involved in creating transposon-induced mutations. Two lines, one carrying the transposable element as a concatamer and the other carrying the transposase, are mated. This brings the transposable element into contact with the transposase, which mobilizes it from its original site; when it reintegrates into the genome, it can cause a heritable phenotypic mutation (cf. Ding et al., 2005; Bestor, 2005; Dupuy et al., 2005). Accepted nomenclature for the transposable element inserts, transposase transgenes, and resulting transposed insertion alleles is given below.

5.1   Transgenic Transposable Element (TE) Concatamers

The transgenic transposable element concatamers are identified with a standard prefix Tg (for transgenic) and Tn (for transposable element). The class of transposable element may be included in parentheses. The general format of the symbol is:

TgTn(transposon_class_abbreviation-vector)#Labcode

Example:  TgTn(sb-T2/GT2/tTA)1Dla

The symbol consists of:
  • Tg denoting transgenic
  • Tn denoting transposon
  • In parentheses, a lowercase abbreviation of the transposon class (in this case sb for Sleeping Beauty), followed by a hyphen and the vector designation
  • The laboratory's line or founder designation or a serial number
  • The Laboratory Code of the originating lab

5.2  Transposase Inserts

Transposases can be engineered into the genome via transgenesis or specific gene targeting. In these cases the relevant nomenclature for transgenes or targeted mutations is used.

For a transgene, use the standard prefix Tg (for transgene). The contents of the parentheses will usually be the promoter and the symbol for the transposase with which it is associated, separated by a hyphen. The general format of the symbol is:

Tg(promoter-transposase)#Labcode

Example:  Tg(ACTB-sb10)545Abc

The symbol consists of:
  • Tg denoting transgene
  • In parentheses, the official gene symbol for the promoter, using the nomenclature of the species of origin, followed by a hyphen and a lowercase transposase symbol, in this case sb10 for the Sleeping Beauty 10 transposase
  • The laboratory's line or founder designation or a serial number
  • The Laboratory code of the originating lab.

For a targeted knock-in of the transposase, use the standard format for a targeted mutation, i.e., the symbol of the targeted gene with a superscripted allele symbol beginning with the prefix tm. The contents of the parentheses will usually be the symbol for the transposase with which it is associated. The general format of the symbol is:

Genetm#(transposase)Labcode

Example:  Gt(ROSA)26Sortm1(sb11)Njen

The symbol consists of:
  • The gene into which the transposase was integrated, in this case Gt(ROSA)26Sor
  • In the superscript:
    • tm denoting targeted mutation
    • A serial number of the targeted mutation
    • In parentheses, a lowercase transposase symbol, in this case sb11 for the Sleeping Beauty 11 transposase
    • The Laboratory Code of the originating lab

5.3   Transposed Insertion Alleles

These alleles follow the rules for naming all other alleles. In general a transposable element concatamer marker will already be established, as above. The new allele, then, will be a superscripted form of the concatamer symbol. Note that all such alleles that are "derived from" a transposable element concatamer carry the original number with a decimal point and serial number identifying the specific allele. The general format is:

GeneTn(transposon_class_abbreviation-vector)#Labcode

Example: Car12Tn(sb-T2/GT2/tTA)1.1Dla

The symbol consists of:
  • The gene into which the transposable element was integrated (transposed)
  • In the superscript:
    • Tn denoting transposon
    • In parentheses, a lowercase abbreviation of the transposon class (in this case sb for Sleeping Beauty), followed by a hyphen and the vector designation
    • A serial number, in which the primary number corresponds to that given to the transposable element concatamer from which it arose, followed by a decimal point and a serial number designating its number within the series of derivative insertion alleles.
    • The Laboratory Code of the lab originating the transposable element line.
  • If a newly transposed insertion occurs in an unknown site or intergenic region, the form:

    Tn(transposon_class_abbreviation-vector)#Labcode

    is used to symbolize the "genomic mutation" without being superscripted to a gene symbol, similar to the way a random transgene inserted into a non-gene site is designated.
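The numbering of derived insertion alleles can be illustrated by deriving the allele designation from the concatamer symbol. This sketch assumes the concatamer carries a numeric serial number (a founder designation in another form would need a different pattern) and a Laboratory code of one uppercase letter followed by up to four lowercase letters. The function name is hypothetical, and angle brackets stand in for the superscript.

```python
import re
from typing import Optional

# Assumed shape of the parental concatamer symbol, e.g. TgTn(sb-T2/GT2/tTA)1Dla.
CONCATAMER_PATTERN = re.compile(
    r"^TgTn\((?P<content>[^()]+)\)(?P<serial>\d+)(?P<lab>[A-Z][a-z]{0,4})$"
)


def transposed_insertion(concatamer: str, derivative: int,
                         gene: Optional[str] = None) -> str:
    """Name the n-th insertion allele derived from a TE concatamer."""
    match = CONCATAMER_PATTERN.match(concatamer)
    if match is None:
        raise ValueError(f"not a recognised concatamer symbol: {concatamer!r}")
    # Keep the concatamer's number, then add a decimal point and the
    # serial number of this derivative; the lab code stays that of the
    # lab that originated the transposable element line.
    designation = (
        f"Tn({match.group('content')})"
        f"{match.group('serial')}.{derivative}{match.group('lab')}"
    )
    if gene is None:
        # Unknown site or intergenic region: no gene symbol to attach to.
        return designation
    return f"{gene}<{designation}>"


print(transposed_insertion("TgTn(sb-T2/GT2/tTA)1Dla", 1, gene="Car12"))
print(transposed_insertion("TgTn(sb-T2/GT2/tTA)1Dla", 2))
```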

