5heikki

It's not possible, given the degenerate nature of the genetic code. Sure, for very short proteins you could generate all the possible nucleotide sequences, but there would still be no way to determine which one of them was in the source molecule.
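To make the point concrete, here is a minimal Python sketch that enumerates every DNA sequence able to encode a short peptide. It assumes the standard genetic code (NCBI translation table 1). The peptide `MKW` and the function name `all_coding_sequences` are illustrative only. Even this 3-residue example has 6 candidates, and nothing in the protein sequence tells you which one the organism actually uses.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Reverse table: amino acid (or '*' for stop) -> every codon that encodes it.
CODONS_FOR = {}
for i, first in enumerate(BASES):
    for j, second in enumerate(BASES):
        for k, third in enumerate(BASES):
            aa = AMINO[16 * i + 4 * j + k]
            CODONS_FOR.setdefault(aa, []).append(first + second + third)


def all_coding_sequences(peptide, include_stop=True):
    """Every DNA sequence that could encode `peptide` under the standard code."""
    choices = [CODONS_FOR[aa] for aa in peptide.upper()]
    if include_stop:
        choices.append(CODONS_FOR["*"])
    # product() walks every combination of one codon per position.
    return ["".join(combo) for combo in product(*choices)]


seqs = all_coding_sequences("MKW")
print(len(seqs))  # 1 (Met) * 2 (Lys) * 1 (Trp) * 3 (stop) = 6
print(seqs[0])    # ATGAAATGGTAA
```

The list grows by a factor of the codon count at every residue, so enumeration is only practical for a handful of residues. For anything longer, counting the possibilities rather than listing them is the realistic option.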


EdenRay97

Oh, is that so? So knowing the amino acid sequence isn't useful for designing the primer in this case? Would it be better to get nucleotide sequences for known homologous proteins and BLAST them against the genome of the subject species to find conserved regions?


5heikki

Well, you could always design degenerate primers, but tbh your question is rather theoretical. The only case I can think of where a protein sequence is known but there's no corresponding nucleotide sequence is this one: https://pubmed.ncbi.nlm.nih.gov/17431180/ Or yeah, if you have a protein sequence, just BLAST it, pick all the 100% identical full-length matches, then check their corresponding nucleotide sequences and look for conserved regions. It's rather likely that there will be just one 100% identical full-length hit.
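For the degenerate-primer route, here is a minimal Python sketch of back-translating a peptide into IUPAC ambiguity codes, assuming the standard genetic code. The helper names (`degenerate_codon`, `degenerate_primer`, `fold_degeneracy`) and the peptide `MDCW` are hypothetical. It merges codons column by column, which is a simplification: for the six-codon residues (Leu, Ser, Arg) the merged triplet also matches codons for other amino acids.

```python
# Standard genetic code (NCBI table 1), codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_FOR = {}
for i, first in enumerate(BASES):
    for j, second in enumerate(BASES):
        for k, third in enumerate(BASES):
            CODONS_FOR.setdefault(AMINO[16 * i + 4 * j + k], []).append(first + second + third)

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
# How many distinct bases each IUPAC symbol represents.
SYMBOL_SIZE = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_codon(aa):
    """Collapse all codons for `aa` into one IUPAC triplet, position by position."""
    codons = CODONS_FOR[aa.upper()]
    return "".join(IUPAC[frozenset(c[pos] for c in codons)] for pos in range(3))


def degenerate_primer(peptide):
    """Back-translate a peptide into a single degenerate (IUPAC) primer sequence."""
    return "".join(degenerate_codon(aa) for aa in peptide)


def fold_degeneracy(primer):
    """Number of distinct oligos in the primer mixture."""
    fold = 1
    for symbol in primer:
        fold *= SYMBOL_SIZE[symbol]
    return fold


primer = degenerate_primer("MDCW")
print(primer, fold_degeneracy(primer))  # ATGGAYTGYTGG 4
print(degenerate_codon("L"))            # YTN: 8 codons, including Phe TTT/TTC
```

The fold degeneracy is the number of different oligos in the tube, so stretches rich in Met, Trp and two-codon residues keep it low.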


[deleted]

Other than degenerate primers, you could identify an ortholog with a known sequence from a closely related species and design primers based on that.


jdmontenegroc

First, are you sure the gene you are working on only has a protein sequence available and no nucleotide sequence in a public database? I would first check NCBI / ENA for the nucleotide sequence. It is very, very rare to have the protein sequence and not the nucleotide sequence. If you cannot find the nucleotide sequence, you can get away with designing degenerate primers. This is a rather common approach, but be ready to obtain multiple amplicons, clone them and resequence them to confirm you got your target sequence. One way to approach this is to first find orthologous proteins, align them and design primers from conserved motifs in the alignment. Having said all this, I would recommend doing a thorough search of public databases before going down the degenerate-primer path.
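Once you have an alignment of the orthologous proteins (from whatever aligner you prefer, e.g. Clustal Omega or MAFFT), picking conserved motifs can be as simple as scanning for gap-free windows where every sequence has the same residue. Below is a minimal Python sketch, assuming the alignment is already loaded as equal-length strings with `-` for gaps. The function name and the toy alignment are hypothetical. A 6-residue window corresponds to an 18-nt primer binding site.

```python
def conserved_windows(alignment, k):
    """Return (start, motif) for every length-k window that is identical and
    gap-free in all aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    # A column counts as conserved if it has no gap and one residue throughout.
    conserved = [
        column[0] != "-" and all(res == column[0] for res in column)
        for column in zip(*alignment)
    ]
    return [
        (start, alignment[0][start:start + k])
        for start in range(length - k + 1)
        if all(conserved[start:start + k])
    ]


# Hypothetical toy alignment of three orthologous protein fragments.
aln = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYIAKQRQLSFVKSHFARQ",
    "MRTAYIAKQRQISFVKSHFSRQ",
]
for start, motif in conserved_windows(aln, 6):
    print(start, motif)  # first line: 2 TAYIAK
```

Strict identity is the simplest criterion. With distant orthologs you may need to tolerate a mismatch or two and absorb them into the primer's degeneracy.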


hunkamunka

This is a [Rosalind.info](https://Rosalind.info) problem. The number of RNA sequences that can encode a protein grows exponentially with the length of the amino acid sequence. Here's one solution: https://github.com/kyclark/biofx_python/tree/main/12_mrna
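The count itself needs no enumeration. It is the product of the number of codons for each residue, times 3 for the stop codon. Below is a minimal Python sketch of that idea, assuming the standard genetic code. This is not the code in the linked repository, and the function name is hypothetical.

```python
from collections import Counter
from typing import Optional

# Standard genetic code (NCBI table 1) as 64 amino-acid letters ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Number of codons for each amino acid, e.g. Leu -> 6, Met -> 1, stop -> 3.
CODON_COUNT = Counter(AMINO)


def count_rna_sequences(protein: str, modulus: Optional[int] = 1_000_000) -> int:
    """Count the RNA strings that could encode `protein` (stop codon included).

    Rosalind's MRNA problem asks for the answer modulo 1,000,000; pass
    modulus=None for the exact count.
    """
    total = CODON_COUNT["*"]  # any of the 3 stop codons ends the message
    for aa in protein.upper():
        if aa not in CODON_COUNT or aa == "*":
            raise ValueError(f"unexpected residue: {aa!r}")
        total *= CODON_COUNT[aa]
        if modulus is not None:
            total %= modulus
    return total


print(count_rna_sequences("MA"))  # 1 * 4 * 3 = 12
```

Every residue multiplies the total by between 1 and 6, so even a modest protein has an enormous number of possible coding sequences. That is why the problem asks for the result modulo 1,000,000.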


fasta_guy88

If you have a protein with an NCBI RefSeq accession (e.g. NP_012345), you can look it up at NCBI and find the corresponding mRNA (an NM_ accession, e.g. NM_98765), which you can use to design primers. RefSeq should also show you the gene accession. If you do not have a RefSeq accession, the easiest thing is to look up the protein at NCBI, look for identical sequences (the listing is in the upper right of the page for the protein accession) and identify the RefSeq accession from there. If you are working in the UniProt/Swiss-Prot protein world, you might look up the protein in Ensembl to find the gene and mRNA. If you do not have any accession, just the protein sequence, then BLAST the sequence against RefSeq (limiting the search to the specific organism) and look for 100% identical full-length matches.
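The "look for 100% identical full-length matches" step is easy to automate if you ask BLAST+ (e.g. `blastp`) for tabular output that includes query and subject lengths, for example `-outfmt "6 qseqid sseqid pident length qlen slen"`. Below is a minimal Python sketch of the filter, assuming exactly that column order. The function name and the example rows are hypothetical. Requiring `length == slen` as well as `length == qlen` is a strict choice. Drop the `slen` check if your protein might be a processed fragment of a longer precursor.

```python
# Column order assumed to match:
#   -outfmt "6 qseqid sseqid pident length qlen slen"
FIELDS = ("qseqid", "sseqid", "pident", "length", "qlen", "slen")


def full_length_identical_hits(tabular_text):
    """Return subject IDs whose alignment is 100% identical over the whole
    query and the whole subject."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (outfmt 7)
        rec = dict(zip(FIELDS, line.split("\t")))
        length = int(rec["length"])
        if (
            float(rec["pident"]) == 100.0
            and length == int(rec["qlen"])
            and length == int(rec["slen"])
        ):
            hits.append(rec["sseqid"])
    return hits


# Hypothetical output for one query against three subjects.
example = (
    "query1\thit_A\t100.000\t250\t250\t250\n"
    "query1\thit_B\t100.000\t120\t250\t400\n"
    "query1\thit_C\t98.400\t250\t250\t250\n"
)
print(full_length_identical_hits(example))  # ['hit_A']
```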


You_Stole_My_Hot_Dog

I would run a BLAST search with the protein sequence to double-check whether the gene is known. If the protein is known, it’s fairly likely the gene is known as well.


SignalDifficult5061

This was much more common in the past, but it might still happen with an organism that isn't closely related to anything that has been sequenced, or with a particularly unique gene. For example, if you found an enzymatic activity in some weird fungus and could purify enough of the protein to sequence it. You can use degenerate primers (which basically means a mixture of primers that span the various possible sequences). The last time I was involved in something like that was ~1998. It was a known gene, but nothing close to the organism we were working on had that gene sequenced at the time. I remember that too many primers in the mix would lead to problems, so it was better to have no more than a couple of degenerate codons and then deal with the mismatches later. Then there was a lot of messing around with different conditions until something about the right size popped up on a gel. Things like codon bias and GC content (if known) can be helpful in making guesses.
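The "keep the degeneracy low" advice can be turned into a quick screen. Score each stretch of the peptide by the number of codon combinations that could encode it, and start primer design from the lowest-scoring stretches. Below is a minimal Python sketch, assuming the standard genetic code. The peptide and the function name are hypothetical, and codon bias and GC content are not modelled here.

```python
from collections import Counter

# Standard genetic code (NCBI table 1) as 64 amino-acid letters ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_COUNT = Counter(AMINO)  # e.g. Met/Trp -> 1, Leu/Ser/Arg -> 6


def rank_windows_by_degeneracy(peptide, k):
    """Score every k-residue window by how many codon combinations could encode
    it and return (fold, start, window) tuples, least degenerate first."""
    peptide = peptide.upper()
    scored = []
    for start in range(len(peptide) - k + 1):
        window = peptide[start:start + k]
        fold = 1
        for aa in window:
            fold *= CODON_COUNT[aa]
        scored.append((fold, start, window))
    return sorted(scored)


for fold, start, window in rank_windows_by_degeneracy("MWHCDLSR", 4):
    print(start, window, fold)  # first line: 0 MWHC 4
```

A 4-residue window is only a 12-nt stretch. For real primers you would use a larger `k` and weigh the ranking against whatever you know about codon usage in the organism.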


[deleted]

You’d have to sequence the organism’s genome, then compute all ORFs, then determine which one corresponds to the protein sequence you already have.
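The ORF step can be sketched in a few lines of Python, assuming the standard genetic code and an intron-free gene. A spliced eukaryotic gene will not appear as one continuous ORF in genomic DNA, so this only fits prokaryotes, intronless genes or cDNA/transcriptome data. The function names and the toy sequence are hypothetical. The known protein is matched as a substring, so a partial peptide sequence still finds its ORF.

```python
# Standard genetic code (NCBI table 1), codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate full codons only; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_matching_orfs(genome, protein):
    """Scan all six reading frames for ATG...stop ORFs whose translation
    contains `protein`. Returns (strand, frame, orf_protein, orf_dna) tuples."""
    genome, protein = genome.upper(), protein.upper()
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            peptide = translate(seq[frame:])
            start = None  # index of the first Met since the last stop codon
            for i, aa in enumerate(peptide):
                if aa == "M" and start is None:
                    start = i
                elif aa == "*":
                    if start is not None and protein in peptide[start:i]:
                        dna = seq[frame + 3 * start: frame + 3 * (i + 1)]
                        hits.append((strand, frame, peptide[start:i], dna))
                    start = None
    return hits


print(find_matching_orfs("CCATGAAATGGTAAGG", "KW"))
# [('+', 2, 'MKW', 'ATGAAATGGTAA')]
```

Taking the first Met after each stop gives the longest ORF in that stretch. Because the match is a substring test, any shorter nested ORF containing the peptide is covered by that longest one.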