Gene Sequence - Burkitt Lymphoma and MYC

This web page was produced as an assignment for Genetics 677, an undergraduate course at UW-Madison.

MYC Database Identification

Gene Accession #: NG_007161
mRNA Accession #: NM_002467
Protein Accession #: NP_002458.2
Total Base Pairs: 5349
Total Amino Acids: 454

Obtained from Entrez

DNA Sequence

The DNA sequence for the myc gene contains 5349 base pairs. The link below shows the complete DNA sequence of the myc gene. This link was chosen because it shows where the exons of the sequence are located and gives an idea of how much of the sequence is used to code for the protein.

http://www.ensembl.org/Homo_sapiens/Gene/Sequence?db=core;g=ENSG00000136997

Sequence Alignment

CLUSTAL FORMAT for T-COFFEE Version_7.38 [http://www.tcoffee.org] [MODE:  ], CPU=2.38 sec, SCORE=86, Nseq=5, Len=795 

gi|71774083|ref|NP_002458.2|      MDFFRVVENQQPPATMPLNVSFTNR-NYDLDYDSVQPYFYCDEEENFYQQQQQSELQPPA
gi|114050751|ref|NP_001039539.1|  ---------------MPLNVSFANK-NYDLDYDSVQPYFYCDEEENFYHQQQQSELQPPA
gi|50979040|ref|NP_001003246.1|   ---------------MPLNVSFANR-NYDLDYDSVQPYFYCDEEENFYQQQQQSELQPPA
gi|71834866|ref|NP_036735.2|      MNFLWEVENP-TVTTMPLNVSFANR-NYDLDYDSVQPYFICDEEENFYHQQQQSELQPPA
gi|24639496|ref|NP_525062.2|      MALYRSD----PYSIM-DDQLFSNISIFDMDNDL------YDMDKLLSSSTIQSDLEKIE
                                                 *  :  *:*   :*:* *        * :: :  .  **:*:   

gi|71774083|ref|NP_002458.2|      PSEDIWKKFELLPTPPLSPSRRSGLCSPSYVAVT-PFSLRGDNDGGGGSFSTADQLEMVT
gi|114050751|ref|NP_001039539.1|  PSEDIWKKFELLPTPPLSPSRRSGLCSPSYVAVA-SFSPRGDDDGGGGSFSSADQLEMVT
gi|50979040|ref|NP_001003246.1|   PSEDIWKKFELLPTPPLSPSRRSGLCSPSYVAVA-SFSPRGDDDGGGGSFSTADQLEMVT
gi|71834866|ref|NP_036735.2|      PSEDIWKKFELLPTPPLSPSRRSGLCSPSYVAVATSFSPREDDDGGGGNFSTADQLEMMT
gi|24639496|ref|NP_525062.2|      DMESVFQDYDL-------------------------------------------------
                                    *.:::.::*                                                 

gi|71774083|ref|NP_002458.2|      ELLGGDMVNQSFICDPDDETFIKNIIIQDCMWSGFSAAAKLVS----EKLASYQAARKDS
gi|114050751|ref|NP_001039539.1|  ELLGGDMVNQSFICDPDDETLIKNIIIQDCMWSGFSAAAKLVS----EKLASYQAARKDG
gi|50979040|ref|NP_001003246.1|   ELLGGDMVNQSFICDPDDETFIKNIIIQDCMWSGFSAAAKLVS----EKLASYQAARKDS
gi|71834866|ref|NP_036735.2|      ELLGGDMVNQSFICDPDDETFIKNIIIQDCMWSGFSAAAKLVS----EKLASYQAARKDS
gi|24639496|ref|NP_525062.2|      --------------EEDMKPEIRNI---DCMWPAMSSCLTSGNGNGIESGNSAASSYSET
                                                : * :. *:**   ****..:*:. .  .    *.  *  :: .: 

gi|71774083|ref|NP_002458.2|      GSPNPARGHSVCSTSSLYLQDLSAAASEC----------------------IDPSVVFPY
gi|114050751|ref|NP_001039539.1|  GSPSPARGHGGCSTSSLYLQDLSAAASEC----------------------IDPSVVFPY
gi|50979040|ref|NP_001003246.1|   GSPSPARGPGGCSTSSLYLQDLSAAASEC----------------------IDPSVVFPY
gi|71834866|ref|NP_036735.2|      TSLSPARGHSVCSTSSLYLQDLTAAASEC----------------------IDPSVVFPY
gi|24639496|ref|NP_525062.2|      GAVSLAMVSGSTNLYSAYQRSQTTDNTQSNQQHVVNSAENMPVIIKKELADLDYT-VCQK
                                   : . *   .  .  * * :. ::  ::.                      :* : *   

gi|71774083|ref|NP_002458.2|      PLNDSSSPKSCASQDSSAFSPSSDSLLSST----------------------ESSPQGSP
gi|114050751|ref|NP_001039539.1|  PLNDSSSPKPCASPDSTAFSPSSDSLLSSA----------------------ESSPRASP
gi|50979040|ref|NP_001003246.1|   PLNDSSSPKPCASPDSAAFSPSSDSLLSSA----------------------ESSPRASP
gi|71834866|ref|NP_036735.2|      PLNDSSSPKSCTSSDSTAFSSSSDSLLSS-----------------------ESSPRATP
gi|24639496|ref|NP_525062.2|      RLRLSGGDKKSQIQDEVHLIPPGGSLLRKRNNQDIIRKSGELSGSDSIKYQRPDTPHSLT
                                   *. *.. * .   *.  : ....*** .                        .:*:. .

gi|71774083|ref|NP_002458.2|      ------------------------------------------------EPLVL-------
gi|114050751|ref|NP_001039539.1|  ------------------------------------------------EPLAL-------
gi|50979040|ref|NP_001003246.1|   ------------------------------------------------EPLAL-------
gi|71834866|ref|NP_036735.2|      ------------------------------------------------EPLVL-------
gi|24639496|ref|NP_525062.2|      DEVAASEFRHNVDLRACVMGSNNISLTGNDSDVNYIKQISRELQNTGKDPLPVRYIPPIN
                                                                                  :** :       

gi|71774083|ref|NP_002458.2|      --------------------------------------HEETPPTTSSDSEEE-------
gi|114050751|ref|NP_001039539.1|  --------------------------------------HEETPPTTSSDSEEE-------
gi|50979040|ref|NP_001003246.1|   --------------------------------------HEETPPTTSSDSEEE-------
gi|71834866|ref|NP_036735.2|      --------------------------------------HEETPPTTSSDSEEE-------
gi|24639496|ref|NP_525062.2|      DVLDVLNQHSNSTGGQQQLNQQQLDEQQQAIDIATGRNTVDSPPTTGSDSDSDDGEPLNF
                                                                          ::****.***:.:       

gi|71774083|ref|NP_002458.2|      ------------------------------------------------------------
gi|114050751|ref|NP_001039539.1|  ------------------------------------------------------------
gi|50979040|ref|NP_001003246.1|   ------------------------------------------------------------
gi|71834866|ref|NP_036735.2|      ------------------------------------------------------------
gi|24639496|ref|NP_525062.2|      DLRHHRTSKSGSNASITTNNNNSNNKNNKLKNNSNGMLHMMHITDHSYTRCNDMVDDGPN
                                                                                              

gi|71774083|ref|NP_002458.2|      ----QEDEEEIDVVSVEKRQAPGKRS----------------------------------
gi|114050751|ref|NP_001039539.1|  ----QEDEEEIDVVSVEKRQPPAKRS----------------------------------
gi|50979040|ref|NP_001003246.1|   ----QEDEEEIDVVSVEKRQAPAKRS----------------------------------
gi|71834866|ref|NP_036735.2|      ----QDDEEEIDVVSVEKRQPPAKRS----------------------------------
gi|24639496|ref|NP_525062.2|      LETPSDSDEEIDVVSYTDKKLPTNPSCHLMGALQFQMAHKISIDHMKQKPRYNNFNLPYT
                                      .:.:*******  .:: * : *                                  

gi|71774083|ref|NP_002458.2|      -----------------ESGSPSAG-GHSKPPHSPLVLKRCHVSTHQHNYAAPPS--TRK
gi|114050751|ref|NP_001039539.1|  -----------------ESGSPSAG-SHSKPPHSPLVLKRCHVSTHQHNYAAPPS--TRK
gi|50979040|ref|NP_001003246.1|   -----------------ESGSPSAG-GHSKPPHSPLVLKRCHVSTHQHNYAAPPS--TRK
gi|71834866|ref|NP_036735.2|      -----------------ESGSSPSR-GHSKPPHSPLVLKRCHVSTHQHNYAAPPS--TRK
gi|24639496|ref|NP_525062.2|      PASSSPVKSVANSRYPSPSSTPYQNCSSASPSYSPLSVDSSNVSSSSSSSSSQSSFTTSS
                                                    *.:.    . :.*.:*** :. .:**: . . :: .*  * .

gi|71774083|ref|NP_002458.2|      DYPAAKRVKLDSVRVL----------------------------RQISNNRKCTSPRSS-
gi|114050751|ref|NP_001039539.1|  DYPAAKRAKLDSGRVL----------------------------KQISNNRKCASPRSS-
gi|50979040|ref|NP_001003246.1|   DYPAAKRARLDSGRVL----------------------------KQISNNRKCASPRSS-
gi|71834866|ref|NP_036735.2|      DYPAAKRAKLDSGRVL----------------------------KQISNNRKCSSPRSS-
gi|24639496|ref|NP_525062.2|      SNKGRKRSSLKDPGLLISSSSVYLPGVNNKVTHSSMMSKKSRGKKVVGTSSGNTSPISSG
                                  .  . **  *..  :*                            : :...   :** ** 

gi|71774083|ref|NP_002458.2|      -------------------------------------DTEENVKRRTHNVLERQRRNELK
gi|114050751|ref|NP_001039539.1|  -------------------------------------DTEENDKRRTHNVLERQRRNELK
gi|50979040|ref|NP_001003246.1|   -------------------------------------DTEENDKRRTHNVLERQRRNELK
gi|71834866|ref|NP_036735.2|      -------------------------------------DTEENDKRRTHNVLERQRRNELK
gi|24639496|ref|NP_525062.2|      QDVDAMDRNWQRRSGGIATSTSSNSSVHRKDFVLGFDEADTIEKRNQHNDMERQRRIGLK
                                                                       :::   **. ** :*****  **

gi|71774083|ref|NP_002458.2|      RSFFALRDQIPELENNEKAPKVVILKKATAYILSVQAEEQKLISEEDLLRKRREQ----L
gi|114050751|ref|NP_001039539.1|  RSFFALRDQIPELENNEKAPKVVILKKATAYILSVQAEQQKLKSEIDVLQKRREQ----L
gi|50979040|ref|NP_001003246.1|   RSFFALRDQIPELENNEKAPKVVILKKATAYILSVQAEEQKLLSEKDLLRKRREQ----L
gi|71834866|ref|NP_036735.2|      RSFFALRDQIPELENNEKAPKVVILKKATAYILSVQADEHKLISEKDLLRKRREQ----L
gi|24639496|ref|NP_525062.2|      NLFEALKKQIPTIRDKERAPKVNILREAAKLCIQLTQEEKELSMQRQLLSLQLKQRQDTL
                                  . * **:.*** :.::*:**** **::*:   :.:  ::::*  : ::*  : :*    *

gi|71774083|ref|NP_002458.2|      ---KHKLEQLRNSCA
gi|114050751|ref|NP_001039539.1|  ---KLKLEQIRNSCA
gi|50979040|ref|NP_001003246.1|   ---KHKLEQLRNSGA
gi|71834866|ref|NP_036735.2|      ---KHKLEQLRNSGA
gi|24639496|ref|NP_525062.2|      ASYQMELNESRSVSG
                                     : :*:: *.  .

The multiple sequence alignment was created through the use of multiple websites. The protein sequences were obtained from Entrez in a FASTA format. Using the FASTA format, T-Coffee was then utilized to create the alignment and is presented in the ClustalW format. The species that the alignment was generated for include, from top to bottom: human, cow, dog, rat, and zebrafish.

Phylogram Tree

This Phylogram tree shows the relationship of the mRNA sequences of different organisms (human, rat, mouse, fruit fly, chimpanzee, zebrafish, dog, and cow in descending order on phylogram). The mRNA sequence was chosen due to the shortened length of the sequence in comparison to the DNA sequence. From the results, it is shown that the myc gene in humans is most closely related to that of the rat myc gene and most distal to the cow sequence. With the rat being the closest homolog to humans I believe that it could potentially be the most likely candidate for an ideal model organism to aid in the understanding of Burkitt Lymphoma. This specific tree was created through the use of mRNA sequences obtained from Entrez. The actual production of the tree was from ClustalW.

DNA Motifs

There were 7 motifs found within the DNA sequence of the MYC gene. A cut off score of 100 was used in the Motif finder program. The motifs found include:

Stimulating Protein 1 (Sp1)
Hunchback (Hb)
Heat Shock Factor (HSF; Drosophila and Yeast)
Alcohol Dehydrogenase Gene Regulator 1 (ADR1)
CdXA
Activator of Nitrogen-regulated genes (NIT2)

Of the seven motifs that were found in the sequence, Sp1 was the only one that was found in human beings. in regards to Burkitt Lymphoma, the DNA motif of Sp1 was of interest to find in the sequence. Sp1 is a DNA-binding protein that displays an activity in the regulation of "prosurvival proteins" and "prodeath proteins" (6). As Sp1 plays a role in the regulation of cell death it would not be a surprise that a mutation arising in this segment of the DNA would result in abnormal cell growth and tumorigenesis.

References

1 Entrez
2 ClustalW
3 Ensemble
4 T-Coffee
5 Motif
6 Ryu, H., Lee, J., Zaman, K., Kubulis, J., Ferrante, R. J., Ross, B. D., Neve, R., Ratan, R. R. (2003). Sp1 and Sp3 are Oxidative Stress-Inducible, Antideath Transcriptional Factors in Cortical Neurons. The Journal of Neuroscience. 23(9):3597. Retrieved from:
http://www.jneurosci.org/cgi/content/full/23/9/3597?maxtoshow=&HITS=&hits=&RESULTFORMAT=1&andorexacttitle=and&fulltext=Sp1&andorexactfulltext=and&searchid=1&FIRSTINDEX=0&sortspec=relevance&resourcetype=HWCIT

Contact Info: Deeter Neumann, [email protected], May 14, 2009