Motifs

What are protein sequence motifs?

A motif is a distinct, recurring sequence pattern, typically a short region that is conserved within larger sequences. A protein sequence motif is an amino acid (AA) pattern that is distinguishable from the rest of the protein universe. Such motifs are generally thought to reflect functional and structural constraints, and their conservation often implies common descent (homology). Putative protein motifs are usually identified computationally, by aligning orthologous sequences from many organisms [1].

MOTIF is an online tool that helps identify protein sequence motifs by searching a sequence against motif libraries such as PROSITE and Pfam. MEME is another tool, used for discovering motifs in a group of related protein sequences. MEME describes each motif with a position-dependent letter-probability matrix, which gives the probability of each possible letter (amino acid) at each position in the motif, and displays it as a sequence logo (a stack of letters at each position). The height of each letter depicts the level of conservation at that specific position [2].
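As a rough illustration of the idea behind a sequence logo, the sketch below builds a position-dependent letter-probability matrix from a set of equal-length aligned sequences and scales each letter by the information content of its column. It is not MEME's implementation: the function names and the toy alignment are hypothetical, and it omits the pseudocounts and small-sample corrections that real tools may apply.

```python
import math
from collections import Counter

def position_probability_matrix(aligned):
    """Return one {residue: probability} dict per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column)
        matrix.append({aa: n / total for aa, n in counts.items()})
    return matrix

def information_content(column_probs, alphabet_size=20):
    """Bits of information in a column: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column_probs.values() if p > 0)
    return math.log2(alphabet_size) - entropy

def letter_heights(column_probs, alphabet_size=20):
    """Height of each letter in a logo column: probability times column information."""
    ic = information_content(column_probs, alphabet_size)
    return {aa: p * ic for aa, p in column_probs.items()}

# Hypothetical toy alignment, not taken from the AAKG2 data.
toy_alignment = ["GHVL", "GHIL", "GKVL", "GHVM"]
ppm = position_probability_matrix(toy_alignment)
for position, column in enumerate(ppm, start=1):
    print(position, round(information_content(column), 3), letter_heights(column))
```

A fully conserved column (only one residue observed) reaches the maximum of log2(20) bits, so its single letter is drawn at full height, while a variable column yields shorter letters.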

What motifs were identified in the amino acid sequence of the AAKG2 protein?

MOTIF identified the four CBS domains of the AAKG2 protein (see Protein Domains Page for more information), suggesting that these are highly conserved regions of the protein sequence. Once these regions were identified, MEME was used to create a sequence logo for each CBS domain from the corresponding amino acid sequences of the homologous proteins from all 11 organisms.

Figure 1. The AA sequence motif of the CBS1 domain of the AAKG2 protein using MEME.

Figure 2. The AA sequence motif of the CBS2 domain of the AAKG2 protein using MEME.

Figure 3. The AA sequence motif of the CBS3 domain of the AAKG2 protein using MEME.

Figure 4. The AA sequence motif of the CBS4 domain of the AAKG2 protein using MEME.

CBS1 Motif
E-Value: 7.1e-331
AA Length: 50
Position: 278-328

CBS2 Motif
E-Value: 1.7e-290
AA Length: 50
Position: 359-408

CBS3 Motif
E-Value: 5.2e-301
AA Length: 47
Position: 435-481

CBS4 Motif
E-Value: 2.0e-251
AA Length: 47
Position: 508-554
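
The reported lengths and positions can be cross-checked with a little arithmetic. The sketch below assumes that each position range is inclusive (so the span is end - start + 1), which is an assumption rather than something stated by the tools; the values are copied from the listing above. Under that assumption, the CBS2, CBS3 and CBS4 spans match their reported lengths, while the CBS1 range covers 51 residues against a reported length of 50, so one of those CBS1 values may be off by one.

```python
# Start, end and reported length for each motif, copied from the listing above.
motifs = {
    "CBS1": {"start": 278, "end": 328, "reported_length": 50},
    "CBS2": {"start": 359, "end": 408, "reported_length": 50},
    "CBS3": {"start": 435, "end": 481, "reported_length": 47},
    "CBS4": {"start": 508, "end": 554, "reported_length": 47},
}

def inclusive_span(start, end):
    """Number of residues from start to end, counting both endpoints (assumed)."""
    return end - start + 1

for name, m in motifs.items():
    computed = inclusive_span(m["start"], m["end"])
    status = "ok" if computed == m["reported_length"] else "check"
    print(f"{name}: {m['start']}-{m['end']} spans {computed} residues "
          f"(reported {m['reported_length']}): {status}")
```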

Analysis

The results above show the alignment and frequency of each amino acid in the conserved CBS domain regions of the AAKG2 protein, which is encoded by the PRKAG2 gene. The height of each letter represents the likelihood of that specific amino acid being found at that position. The CBS1 domain is highly conserved, as shown by the large number of tall letters in its sequence motif, whereas CBS4 shows somewhat more variation among its amino acids.

The E-values obtained from the motif analyses also give insight into the significance of the results. The lower the E-value, the more "significant" the match. The extremely low E-values found here suggest that the conserved amino acids of all four CBS domains are legitimate and not due to chance [3].
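
The four E-values reported above can be ranked programmatically, with one caveat: values this small fall below the range of a standard double-precision float, so the sketch below parses them with Python's `decimal` module instead. The dictionary and function name are hypothetical; the E-values are copied from the results listed earlier.

```python
from decimal import Decimal

# E-values copied from the MEME results above, kept as strings to avoid underflow.
e_values = {
    "CBS1": "7.1e-331",
    "CBS2": "1.7e-290",
    "CBS3": "5.2e-301",
    "CBS4": "2.0e-251",
}

def rank_by_e_value(values):
    """Return motif names ordered from lowest (most significant) to highest E-value."""
    return sorted(values, key=lambda name: Decimal(values[name]))

ranking = rank_by_e_value(e_values)
print("Most to least significant:", ranking)

# A plain float cannot represent the CBS1 value and silently underflows to zero.
print("CBS1 as float:", float(e_values["CBS1"]))
```

Ordered this way, CBS1 has the lowest E-value and CBS4 the highest, which is consistent with CBS1 appearing the most conserved and CBS4 the most variable in the logos.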

In the future, it would be worthwhile to examine the whole PRKAG2 sequence to see whether any significant motifs lie outside the four CBS domains.

References

[1] Bork, P., and Koonin, E.V. (1996). Protein sequence motifs. Current Opinion in Structural Biology, 6(3):366-76. doi: 10.1016/S0959-440X(96)80057-1
[2] Bailey, T.L. and Elkan, C. (1994). Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology: 28-36.
[3] Clark, Francis. "An Introduction to Blast." Feb. 2006. Web. 10 May 2013. <http://www.clarkfrancis.com/blast/Blast_what_and_how.html>

Margaret Beatka
Page Last Updated: 5/10/13
This web page was produced as an assignment for Genetics 677, an undergraduate course at UW-Madison.