%% ICEMAN Example of phylogenetic analysis
%%
% This demo belongs to the collection of Case Studies in Computational
% Genomics, mostly based on classic papers, and mostly based on the contents
% of the book
%
% Introduction to Computational Genomics, A Case Studies Approach
% Cambridge University Press, 2006
% Nello Cristianini and Matthew Hahn
%
% The demos of the other case studies, pointers to software and papers that
% are available on-line can be found on the website:
%
% www.computational-genomics.net
%
% This demo is also available on-line at
web('http://www.computational-genomics.net/case_studies/iceman_demo.html');
% Demo by Elisa Ricci.
%% Introduction
% Ötzi the Iceman is a well-preserved natural mummy of a man frozen for about
% 5300 years and found in 1991 in a glacier of the Ötztal Alps, near the border
% between Austria and Italy.
% Recently researchers found interesting informations about it from the
% exam of mitochondrial DNA taken from cells of the iceman intestine. In
% particular from phylogenetic analysis they made some hypotesis about its
% mitochondrial haplogroup.
% A haplogroup is defined by set of characteristic mutations on the mitochondrial
% genome. Therefore it can be traced along a person's maternal line and it can be
% used to group populations by genetic features. The famous book
% "The Seven Daughters of Eve" by Bryan Sykes describes the classification of all modern
% humans into into mitochondrial haplogroups and links each haplogroup
% to a specific prehistoric woman ("clan mothers"). In fact the branches of
% the mtDNA tree (composed of groups of people with related haplogroups) are
% continent-specific.
% In this example, we analyze the statistical properties and perform phylogenetic
% analysis of the mtDNA of the iceman to investigate the relationship with
% modern humans of different geographical locations and to determine useful
% information about its haplogroup.
%%
% The mtDNA sequence of the iceman can be obtained from the Genbank database
% (accession number S69989). Some other sequence of modern humans from different
% parts of the world are considered. They have been extracted from the Max
% Ingman database (http://www.genpat.uu.se/mtDB/). All the sequences are stored in the
% same structure. The associated haplogrop is indicated.
fullset = {
'iceman' 'S69989' ''
'Aborigine' 'AF346963' 'A'
'Aborigine2' 'AF346965' 'A'
'Biaka Pygmy' 'AF346968' 'L1'
'Biaka Pygmy2' 'AF346969' 'L1'
'Buriat' 'AF346970' 'C'
'Chukchi' 'AF346971' 'A'
'Chinese' 'AF346972' 'F'
'Chinese2' 'AF346973' 'D'
'Crimean Tatar' 'AF346974' 'H'
'Dutch' 'AF346975' 'H'
'Effik' 'AF346976' 'L2'
'Effik2' 'AF346977' 'L2'
'English' 'AF346978' 'V'
'Evenki' 'AF346979' 'C'
'German' 'AF346983' 'J'
'Guarani' 'AF346984' 'D'
'Hausa' 'AF346985' 'L0'
'Ibo' 'AF346986' 'L1'
'Ibo2' 'AF346987' 'L1'
'Italian' 'AY882413' 'UK'
'Japanese' 'AF346989' 'D'
'Japanese2' 'AF346990' 'D'
'Kikuyu' 'AF346992' 'L1'
'Korean' 'AF346993' 'B'
'Mandenka' 'AF346995' 'L2'
'Mbenzele Pygmy' 'AF346996' 'L1'
'Mbenzele Pygmy2' 'AF346997' 'L1'
'Mbuti Pygmy' 'AF346998' 'L0'
'Mbuti Pygmy2' 'AF346999' 'L0'
'Piman' 'AF347001' 'B'
'PNG Coast' 'AF347002' 'N'
'PNG Highlands' 'AF347004' 'N'
'Samoan' 'AF347007' 'B'
'Saami' 'AF347006' 'V'
'San' 'AF347008' 'L0'
'San2' 'AF347009' 'L0'
'Siberian Inuit' 'AF347010' 'D'
'Warao' 'AF347012' 'C'
'Warao2' 'AF347013' 'C'
};
%%
% Since the mtDNA sequence of the Oetzi mummy is only partially available
% we consider the corresponding nucleotides in modern human sequences. To
% do that, we perform some local alignments. The scores of local alignments
% are stored in a vector indicating various degrees of similarity between
% sequences.
for ind = 1:length(fullset)
fullPeople(ind).Header = [fullset{ind,1},' ', fullset{ind,3}];
end
fullPeople(1).Sequence = getgenbank(fullset{1,2},'sequenceonly','true');
for ind = 2:length(fullset)
temp = getgenbank(fullset{ind,2},'sequenceonly','true');
[score, localAlignment, Start] = swalign(fullPeople(1).Sequence,temp,'Alphabet', 'NT');
sci(ind-1) = score;
cutof = Start(2);
lenK=length(localAlignment)+cutof;
fullPeople(ind).Sequence=temp(cutof:lenK);
end
%%
% For convenience the data are also directly available in the website
% www.computational-genomics.net and you can load them from a .mat
% file using the command
% load bigPic
%% Statistical analysis
% Before performing phylogenic analysis, we analyse some statistical
% properties of the iceman sequence.
S = fullPeople(1).Sequence;
%%
% The MATLAB function *basecount* reports nucleotide counts and displays them
% in pie chart.
figure;
basecount(S,'chart','pie');
title('Distribution of nucleotide bases of Iceman mtDNA HVR');
%%
% We can also plot the density of each nucleotide.
figure;
ntdensity(S);
%%
% The MATLAB function *dimercount* can be used to count dimers and plot them.
figure;
dimercount(S,'Chart','pie');
title('Dimer distribution for Iceman mtDNA')
%%
% The codons of each of reading frame of the positive strand are also counted
% and plotted. You can use the MATLAB function codoncount.
figure;
for f = 1 : 3,
subplot(3,1,f);
codons{1,f} = codoncount(S,'frame',f,'figure',true);
title(sprintf('Codons for Frame %d of Iceman mtDNA ',f));
end
%% Phylogenetic analysis
% The scores of local alignments between sequences previously computed are
% analyzed. We plot the associated histogram and we compute some statistics
% (mean, variance max, min).
figure
hist(sci,40)
title('Istogram of local alignment scores')
stats=[max(sci), min(sci), mean(sci) var(sci)]
%%
% Few sequences have high scores and then are strongly related to the
% iceman sequence.
% The multiple alignment gives some information about the relationship
% between the analysed sequences.
MultiAligned = multialign(fullPeople);
showalignment(MultiAligned)
%%
% Then we calculate the pairwise distances using the Juckes Cantor correction.
% The MATLAB function *seqpdist* is used for that.
distances = seqpdist(fullPeople,'Method','Jukes-Cantor','Alphabet','NT');
%%
% The phylogenetic tree is built with the UPGMA algorithm an displayed.
UPGMAtree = seqlinkage(distances,'UPGMA',fullPeople);
h = plot(UPGMAtree,'orient','bottom');
title('UPGMA Distance Tree using Jukes-Cantor model');
ylabel('Evolutionary distance')
set(h.terminalNodeLabels,'Rotation',-45)
%%
% A neighbor-joining tree is also constructed using the *seqneighjoin* function.
% Neighbor-joining tree uses also the pairwise distance calculated above
% (using the *seqpdist* function).
NJtree = seqneighjoin(distances,'equivar',fullPeople);
h = plot(NJtree,'orient','left');
title('Neighbor-joining tree using Jukes-Cantor model');
%%
% The phylogenetic trees show that the iceman is more related to european
% population in particular to Italian belonging to the UK superhaplogroup.
% This is in accordance with the results presented in the paper of Rollo et
% al. that associates Otzi to the K haplogroup.
% Both trees show also some infomation about the relationship and the
% evolution of haplogroups in specific regions of the world. The neighbor
% joining algorithm seems more accurate.
% The tree depicts in a general manner also the relationship and the evolution
% of different populations.
% In Africa, the most ancient mtDNA haplogroups (L0, L1, L2), make up
% macrohaplogroup L. It radiated to form the Eurasian macrohaplogroups M and
% N. Among Europeans, haplogroups H, I, J, N1b, T, U, V, W, and X make up
% about 98% of the mtDNAs. These were derived primarily from macrohaplogroup N.
% In Asia, macrohaplogroups N and M contributed equally to mtDNA radiation.
% Here now the main haplogroups are A, C, D, G.
%% References
% F. Rollo, L. Ermini, S. Luciani, I. Marota, C. Olivieri, D. Luiselli, *Fine
% characterization of the Iceman's mtDNA haplogroup*, J Phys Anthropol. Jan
% 2006.
%
% Torroni, K. Huoponen, P. Francalacci, M. Petrozzi, L. Morelli, R. Scozzari,
% D. Obinu, M. L. Savontaus, D.C. Wallace. *Classification of European mtDNAs
% from an Analysis of Three European Populations*, Genetics, Vol 144, 18351850,
% 1996.
%
% M.Ingman, H. Kaessmann, S. Paabo, U. Gyllensten, *Mitochondrial genome
% variation and the origin of modern humans*. Nature 408, 708-713, 2000.
%
% D.Mishmar, E. Ruiz-Pesini, P. Golik, V. Macaulay, A. Clark, S. Hosseini,
% M.Brandon, K. Easley, M. Brown, R.I. Sukernik, A. Olckers, D. Wallace.
% *Natural selection shaped regional mtDNA variation in humans*.
% Proc.Natl.Acad.Sci. 100, 171-176, 2003.