Hydrophobicity Plot using BioPython
Hydrophobicity is the property of repelling water rather than absorbing it. Calculating hydrophobicity along a protein is important for identifying features such as membrane-spanning regions, antigenic sites, exposed loops, and buried residues. These calculations are usually shown as a plot along the protein sequence, which makes it easy to locate potential features. The hydrophobicity is calculated by sliding a fixed-size window (with an odd number of residues) over the protein sequence. The average hydrophobicity of the entire window is plotted at the window's central position.
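The sliding-window averaging described above can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the original program: the function name `sliding_window_average` and the sample scores are hypothetical, and only full windows are reported, so the first and last few residues get no value.

```python
def sliding_window_average(values, window):
    """Return (center_position, mean) pairs, one per full window.

    center_position is the 1-based residue number of the window's
    central residue, which is why the window size must be odd.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    result = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        center = start + half + 1  # convert 0-based start to 1-based center
        result.append((center, sum(chunk) / window))
    return result


# Hypothetical per-residue scores, not taken from any real scale
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
profile = sliding_window_average(scores, 3)
print(profile)  # [(2, 2.0), (3, 3.0), (4, 4.0)]
```

A sequence shorter than the window simply yields an empty list.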
A hydrophobicity plot is a quantitative analysis of the degree of hydrophobicity of the amino acids in a protein. It is used to characterize or identify possible structures or domains of a protein. The plot has the amino acid sequence of the protein on its x-axis and the degree of hydrophobicity on its y-axis.
In a hydrophobicity plot, the degree of hydrophobicity is taken from a hydrophobicity scale. Several hydrophobicity scales have been published for various uses. Commonly used scales include the Kyte-Doolittle scale, Engelman scale (GES scale), Eisenberg scale, Hopp-Woods scale, Cornette scale, Rose scale, and Janin scale. Many more scales have been published in the literature over the last three decades.
AA | Amino Acid | Kyte-Doolittle | Hopp-Woods | Cornette | Eisenberg | Rose | Janin | Engelman (GES) |
---|---|---|---|---|---|---|---|---|
A | Alanine | 1.80 | -0.50 | 0.20 | 0.62 | 0.74 | 0.30 | 1.60 |
C | Cysteine | 2.50 | -1.00 | 4.10 | 0.29 | 0.91 | 0.90 | 2.00 |
D | Aspartic acid | -3.50 | 3.00 | -3.10 | -0.90 | 0.62 | -0.60 | -9.20 |
E | Glutamic acid | -3.50 | 3.00 | -1.80 | -0.74 | 0.62 | -0.70 | -8.20 |
F | Phenylalanine | 2.80 | -2.50 | 4.40 | 1.19 | 0.88 | 0.50 | 3.70 |
G | Glycine | -0.40 | 0.00 | 0.00 | 0.48 | 0.72 | 0.30 | 1.00 |
H | Histidine | -3.20 | -0.50 | 0.50 | -0.40 | 0.78 | -0.10 | -3.00 |
I | Isoleucine | 4.50 | -1.80 | 4.80 | 1.38 | 0.88 | 0.70 | 3.10 |
K | Lysine | -3.90 | 3.00 | -3.10 | -1.50 | 0.52 | -1.80 | -8.80 |
L | Leucine | 3.80 | -1.80 | 5.70 | 1.06 | 0.85 | 0.50 | 2.80 |
M | Methionine | 1.90 | -1.30 | 4.20 | 0.64 | 0.85 | 0.40 | 3.40 |
N | Asparagine | -3.50 | 0.20 | -0.50 | -0.78 | 0.63 | -0.50 | -4.80 |
P | Proline | -1.60 | 0.00 | -2.20 | 0.12 | 0.64 | -0.30 | -0.20 |
Q | Glutamine | -3.50 | 0.20 | -2.80 | -0.85 | 0.62 | -0.70 | -4.10 |
R | Arginine | -4.50 | 3.00 | 1.40 | -2.53 | 0.64 | -1.40 | -12.30 |
S | Serine | -0.80 | 0.30 | -0.50 | -0.18 | 0.66 | -0.10 | 0.60 |
T | Threonine | -0.70 | -0.40 | -1.90 | -0.05 | 0.70 | -0.20 | 1.20 |
V | Valine | 4.20 | -1.50 | 4.70 | 1.08 | 0.86 | 0.60 | 2.60 |
W | Tryptophan | -0.90 | -3.40 | 1.00 | 0.81 | 0.85 | 0.30 | 1.90 |
Y | Tyrosine | -1.30 | -2.30 | 3.20 | 0.26 | 0.76 | -0.40 | -0.70 |
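To use more than one scale from the table in code, one option is to store each column as a dictionary keyed by one-letter code. The sketch below copies only the Kyte-Doolittle and Hopp-Woods columns. The names `SCALES` and `residue_scores` and the test peptide are illustrative assumptions. The two columns run in opposite directions: Hopp-Woods is a hydrophilicity scale, so charged residues score highest there.

```python
# Two columns of the table above, keyed by one-letter amino acid code
SCALES = {
    "kyte_doolittle": {
        "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
        "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
        "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
        "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
    },
    "hopp_woods": {
        "A": -0.5, "C": -1.0, "D": 3.0, "E": 3.0, "F": -2.5,
        "G": 0.0, "H": -0.5, "I": -1.8, "K": 3.0, "L": -1.8,
        "M": -1.3, "N": 0.2, "P": 0.0, "Q": 0.2, "R": 3.0,
        "S": 0.3, "T": -0.4, "V": -1.5, "W": -3.4, "Y": -2.3,
    },
}


def residue_scores(sequence, scale_name):
    """Look up the value of every residue in the chosen scale."""
    scale = SCALES[scale_name]
    return [scale[aa] for aa in sequence]


peptide = "LIVKDE"  # hypothetical peptide: three apolar, then three charged residues
kd_scores = residue_scores(peptide, "kyte_doolittle")
hw_scores = residue_scores(peptide, "hopp_woods")
print(kd_scores)  # [3.8, 4.5, 4.2, -3.9, -3.5, -3.5]
print(hw_scores)  # [-1.8, -1.8, -1.5, 3.0, 3.0, 3.0]
```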
The Kyte-Doolittle scale is widely used for detecting hydrophobic regions in proteins. Regions with a positive value are hydrophobic. The scale can be used to identify both surface-exposed regions and transmembrane regions, depending on the window size. Short windows of 5-7 residues generally work well for predicting putative surface-exposed regions. Large windows of 19-21 residues are well suited for finding transmembrane domains when the calculated values are above 1.6.
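As a rough sketch of the transmembrane rule of thumb above, the following code averages Kyte-Doolittle values over a 19-residue window and groups consecutive window centers whose mean exceeds 1.6. The function names and the toy sequence are assumptions made for illustration. The reported positions are window centers, not helix boundaries, so treat the output as a list of candidates rather than a prediction.

```python
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_means(sequence, window=19):
    """Map each 1-based window center to the mean KD value of its window."""
    half = window // 2
    scores = [KD[aa] for aa in sequence]
    means = {}
    for start in range(len(scores) - window + 1):
        means[start + half + 1] = sum(scores[start:start + window]) / window
    return means


def candidate_segments(sequence, window=19, threshold=1.6):
    """Return (first_center, last_center) runs whose mean exceeds threshold."""
    segments = []
    run_start = None
    previous = None
    for center, mean in sorted(windowed_means(sequence, window).items()):
        if mean > threshold:
            if run_start is None:
                run_start = center
            previous = center
        elif run_start is not None:
            segments.append((run_start, previous))
            run_start = None
    if run_start is not None:
        segments.append((run_start, previous))
    return segments


# Toy sequence: a 25-residue leucine stretch flanked by aspartate
toy = "D" * 20 + "L" * 25 + "D" * 20
segments = candidate_segments(toy)
print(segments)  # [(25, 41)]
```

Passing `window=7` and a lower threshold would adapt the same functions to the short-window case, though the text above gives no cutoff for that use.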
Program Implementation
In this tutorial, I have used Python 3.4 with the BioPython 1.63, MatPlotLib, PyParsing 2.0.1, Python-DateUtil 2.2, PyTZ 2014.1, Six 1.6.1, and NumPy-MKL 1.8 modules under the Windows 8.1 Enterprise operating system. The 7 modules were chosen for their compatibility with this Python version and OS. The input to the program is a protein sequence in FASTA format.
Program
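# Plot per-residue Kyte-Doolittle hydrophobicity for a protein in a FASTA file.
from pylab import plot, axis, xlabel, ylabel, title, show
from Bio import SeqIO

# Read the sequence; if the file holds several records, the last one is kept.
with open("E:\\BioPython\\Q9UKY0.fasta") as fh:
    for record in SeqIO.parse(fh, "fasta"):
        seq_id = record.id  # renamed from "id" to avoid shadowing the built-in
        seq = record.seq
        num_residues = len(seq)

# Kyte-Doolittle hydrophobicity values
kd = { 'A': 1.8,'R':-4.5,'N':-3.5,'D':-3.5,'C': 2.5,
       'Q':-3.5,'E':-3.5,'G':-0.4,'H':-3.2,'I': 4.5,
       'L': 3.8,'K':-3.9,'M': 1.9,'F': 2.8,'P':-1.6,
       'S':-0.8,'T':-0.7,'W':-0.9,'Y':-1.3,'V': 4.2 }

# Look up the raw value of every residue (no window averaging is applied here).
values = []
for residue in seq:
    values.append(kd[residue])

# Plot the values against 1-based residue numbers.
x_data = range(1, num_residues + 1)
plot(x_data, values, linewidth=1.0)
axis(xmin=1, xmax=num_residues)
xlabel("Residue Number")
ylabel("Hydrophobicity")
title("K&D Hydrophobicity for " + seq_id)
show()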
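One practical caveat: the lookup `kd[residue]` raises a `KeyError` for any character outside the 20 standard amino acids, such as `X` or a lower-case letter. The sketch below is one possible way to handle that, not part of the original program. The function name `kd_values` and its `default` parameter are hypothetical. It upper-cases the input and records unknown residues separately instead of failing.

```python
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kd_values(sequence, default=None):
    """Return (pairs, skipped) for a protein sequence.

    pairs   : (position, value) for residues that received a value
    skipped : (position, residue) for residues missing from the scale
              (only filled when no numeric default is given)
    Positions are 1-based residue numbers.
    """
    pairs = []
    skipped = []
    for position, residue in enumerate(str(sequence).upper(), start=1):
        if residue in KD:
            pairs.append((position, KD[residue]))
        elif default is not None:
            pairs.append((position, default))
        else:
            skipped.append((position, residue))
    return pairs, skipped


pairs, skipped = kd_values("MKXLa")
print(pairs)    # [(1, 1.9), (2, -3.9), (4, 3.8), (5, 1.8)]
print(skipped)  # [(3, 'X')]
```

Because positions are kept alongside the values, the x-axis of a plot still lines up with the original residue numbering when residues are skipped.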
Query
>sp|Q9UKY0|PRND_HUMAN Prion-like protein doppel OS=Homo sapiens
MRKHLSWWWLATVCMLLFSHLSAVQTRGIKHRIKWNRKALPSTAQITEAQVAENRPGAFI
KQGRKLDIDFGAEGNRYYEANYWQFPDGIHYNGCSEANVTKEAFVTGCINATQAANQGEF
QKPDNKLHQQVLWRLVQELCSLKHCEFWLERGAGLRVTMHQPVLLCLLALIWLTVK
Thank you for your posts. They are really helpful.
I have a question: to calculate the hydrophobicity of the entire TM helix, do we have to take the mean value of all residues?
{ 'Ala': 1.290, 'Arg': 0.960, 'Asn': 0.900, 'Asp': 1.040, 'Cys': 1.110, 'Gln': 1.270, 'Glu': 1.440, 'Gly': 0.560, 'His': 1.220, 'Ile': 0.970, 'Leu': 1.300, 'Lys': 1.230, 'Met': 1.470, 'Phe': 1.070, 'Pro': 0.520, 'Ser': 0.820, 'Thr': 0.820, 'Trp': 0.990, 'Tyr': 0.720, 'Val': 0.910 }
Reference: http://web.expasy.org/protscale/pscale/alpha-helixLevitt.html
Thank you very much.
Hello Ashok. I would like to calculate hydrophobicity on aligned sequences. I have aligned sequences in FASTA format. There are gaps in the aligned sequences. I need the output as values for every sequence, which I can then average. Can it be done?
Thanks.
Dear Rohit Jain,
This program generates a hydrophobicity plot from an amino acid sequence, using a dataset of hydrophobicity values for the 20 amino acids (Kyte-Doolittle scale values are used in this program). The basic concept is plotting a 2D graph from x and y coordinates.
Generating a plot from an alignment with gaps is meaningless; the gaps will break the plot.
My name is Rohit Jain
Looking forward to your reply.