Introduction to Cladistics.
I. Scoring a matrix:
If God revealed the true cladogram to us and all that we, as systematists, had to do was read it and provide group names, our lives would be easy indeed. Instead, we must draw the cladograms ourselves, based on what we know about organisms.
A simple example. We have three taxa whose phylogeny we would like to know: willow trees, parrots, and cows. To start, we assume only that the three are related and share a common ancestor; we know nothing more about their relationship. When we draw a picture of this assumption, we see a polytomy, a branch point with more than two lineages growing from it. A polytomy means that we do not know the true relationship between the various descendant branches.
We want to resolve this polytomy into one of three possible trees:
We know that synapomorphies can be used to identify monophyletic groups. Our task, then, is to identify synapomorphies. An easy means of organizing information for this purpose is a matrix, or chart of taxa and characters.
For now, we code each character as "yes" or "no".
Character | 1. Cell membranes present | 2. Leaves present | 3. Paired limbs present |
A-Willow | Y | Y | N |
B-Parrot | Y | N | Y |
C-Cow | Y | N | Y |
We can map these character state changes onto the three alternative trees to obtain three possible sequences of character changes.
II. Parsimony:
Now we must choose the one that seems the most reasonable. We don't just close our eyes and point. Instead we use the principle of parsimony to guide us.
Applying simplicity to our choice of cladograms, we assume that the tree that requires the fewest character state changes is the most likely to be true. The changes are counted, and tree A is seen to have the fewest changes, so we are finally left with one cladogram which represents our best approximation of the phylogeny of willows, parrots, and cows.
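The counting step can be sketched in Python. This is a minimal illustration, not the method of any particular program: it uses Fitch's counting rule for a single character, and it adds a hypothetical "Ancestor" row whose states (cell membranes present, leaves absent, paired limbs absent) are an assumption made here so that each tree has a root to count from. The names MATRIX, TREES, fitch, and tree_length are invented for the example, and the trees are labeled by the pair they group rather than by the letters used in the figures.

```python
# Scores for the three taxa come from the matrix above. "Ancestor" is a
# hypothetical row added for this sketch; its states are an assumption.
MATRIX = {
    "Ancestor": "YNN",
    "Willow": "YYN",
    "Parrot": "YNY",
    "Cow": "YNY",
}

# The three possible resolutions, written as nested pairs and rooted
# on the hypothetical ancestor.
TREES = {
    "willow+parrot": ("Ancestor", ("Cow", ("Willow", "Parrot"))),
    "willow+cow": ("Ancestor", ("Parrot", ("Willow", "Cow"))),
    "parrot+cow": ("Ancestor", ("Willow", ("Parrot", "Cow"))),
}


def fitch(tree, states):
    """Return (possible states at this node, changes below it) for one character."""
    if isinstance(tree, str):  # a leaf: its state is known, no changes yet
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # the two branches can agree: no extra change needed
        return common, left_cost + right_cost
    # the branches disagree: at least one change happened here
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, matrix):
    """Sum the minimum number of changes over all characters."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch(tree, states)[1]
    return total


lengths = {name: tree_length(tree, MATRIX) for name, tree in TREES.items()}
best = min(lengths, key=lengths.get)
for name, n_changes in lengths.items():
    print(name, n_changes)
print("fewest changes:", best)
```

Under these assumptions the tree that groups parrot with cow needs 2 changes and the other two need 3 each, so parsimony picks the parrot and cow grouping.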
With large matrices, manipulating the data by hand is impractical. For these, computer programs for phylogenetic analysis exist.
III. Outgroups:
When we look at alternative character states such as "Paired limbs present" or "Paired limbs absent," we need to know which states are ancestral (primitive) and which are derived. This is because only shared derived characters tell us anything about organismal relationships. In the last example, you simply took our word for which characters were ancestral and which were derived. In real life, no one tells us this. Instead, we choose a taxon which we consider distantly related to the taxa whose phylogeny we want to know and let it be our standard. This standard taxon is called the outgroup. We assume that the character states that it shows are ancestral, and score the other taxa (the ingroup) by comparison with it.
Imagine that we have landed on the planet of the potato heads and want to perform a phylogenetic analysis of the different taxa that we have found. Natives tell us that taxon A is very distantly related to the others, so we choose it as our outgroup. We now score our matrix using the systematist's convention that "0"=ancestral and "1"=derived instead of "yes" and "no" (which is systematists' baby-talk):
Character | 1. Eyes present | 2. Feet present | 3. Hat present | 4. Lips present |
A-Outgroup | 0 | 0 | 0 | 0 |
B | 0 | 0 | 1 | 0 |
C | 0 | 0 | 0 | 1 |
D | 0 | 1 | 0 | 1 |
The outgroup character states are assumed to be ancestral by definition, so everything in the outgroup row gets a 0. The other taxa are scored based on the presence or absence of particular anatomical features, as compared with the outgroup. The resulting table shows us the distribution of ancestral and derived characters. In any matrix, we can identify the following different types of character states:
Using the information in the matrix, we can reconstruct the following cladogram: By definition, taxa B, C, and D are more closely related to one another than to the outgroup. Additionally, the only synapomorphy, character 4, shows that C and D share a more recent common ancestry with one another than either does with B.
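Sorting the characters of the potato-head matrix by how many ingroup taxa show the derived state can be sketched as follows. This is only an illustration: the function name classify and the three category labels are choices made for this example, not standard terminology.

```python
# Potato-head matrix from above: 0 = ancestral, 1 = derived.
CHARACTERS = ["Eyes present", "Feet present", "Hat present", "Lips present"]
MATRIX = {
    "A": [0, 0, 0, 0],  # the outgroup
    "B": [0, 0, 1, 0],
    "C": [0, 0, 0, 1],
    "D": [0, 1, 0, 1],
}
OUTGROUP = "A"


def classify(matrix, outgroup):
    """For each character, list the ingroup taxa with the derived state
    and label the character by how many there are."""
    ingroup = [taxon for taxon in matrix if taxon != outgroup]
    result = []
    for i in range(len(matrix[outgroup])):
        derived = [taxon for taxon in ingroup if matrix[taxon][i] == 1]
        if not derived:
            kind = "ancestral in all taxa"
        elif len(derived) == 1:
            kind = "derived in one taxon only"
        else:
            kind = "shared derived (synapomorphy)"
        result.append((kind, derived))
    return result


classes = classify(MATRIX, OUTGROUP)
for name, (kind, taxa) in zip(CHARACTERS, classes):
    print(name, "->", kind, taxa)
```

Only character 4 (lips) comes out as shared derived, in C and D, which matches the grouping described above. Characters 2 and 3 are each derived in a single taxon and so say nothing about how the taxa group.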
IV. Homoplasy:
For our final exercise, we will look at some real organisms: Bass, turtle, snake, and crocodile. We will choose a shark as the outgroup. We score the following matrix:
Character | 1. Paired limbs present | 2. Supratemporal fenestra present | 3. Infratemporal fenestra present | 4. Breathes air |
OG - Shark | 0 | 0 | 0 | 0 |
A - Bass | 0 | 0 | 0 | 0 |
B - Turtle | 1 | 0 | 0 | 1 |
C - Snake | 0 | 1 | 1 | 1 |
D - Croc | 1 | 1 | 1 | 1 |
Now we have a problem. Characters 2 and 3 are synapomorphies of the snake and croc, while character 1 is a synapomorphy of the turtle and croc, but NOT of the snake. Thus, these characters seem to tell different stories about evolutionary history. This character discordance is called homoplasy (adj. homoplastic). It may result from character reversals, convergent evolution, or any of a variety of other causes. We may not like it, but we have to deal with it. To do this, we again resort to parsimony. The matrix suggests two possible cladograms:
To choose between them, we bite the bullet and count character state changes for both cladograms. What we see is that the second tree requires fewer changes, so we accept it. Our result suggests that snakes share a recent common ancestor with crocs and have secondarily lost their limbs.
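The count can be checked with a short Python sketch. It assumes the two candidate cladograms differ only in whether the croc is grouped with the turtle or with the snake, with the bass branching off first and the shark as outgroup; those tree shapes, like the names fitch and steps_per_character, are assumptions of this example. The counting rule is Fitch's, which gives the minimum number of changes for each character but does not say where on the tree they happened.

```python
# Matrix from above: one string of 0/1 scores per taxon, characters 1-4.
MATRIX = {
    "Shark": "0000",  # the outgroup
    "Bass": "0000",
    "Turtle": "1001",
    "Snake": "0111",
    "Croc": "1111",
}

# Two candidate trees as nested pairs (the shapes are an assumption).
TREES = {
    "turtle+croc": ("Shark", ("Bass", ("Snake", ("Turtle", "Croc")))),
    "snake+croc": ("Shark", ("Bass", ("Turtle", ("Snake", "Croc")))),
}


def fitch(tree, states):
    """Return (possible states at this node, changes below it) for one character."""
    if isinstance(tree, str):  # a leaf: its state is known
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def steps_per_character(tree, matrix):
    """Minimum number of changes for each character on the given tree."""
    n_chars = len(next(iter(matrix.values())))
    return [
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    ]


steps = {name: steps_per_character(tree, MATRIX) for name, tree in TREES.items()}
for name, per_char in steps.items():
    print(name, per_char, "total:", sum(per_char))
```

With these trees, grouping turtle with croc costs 1, 2, 2, and 1 changes for characters 1 to 4 (6 in total), while grouping snake with croc costs 2, 1, 1, and 1 (5 in total). On the shorter tree the only character needing more than one change is character 1, paired limbs, which is the homoplastic one.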
V. Reservations:
The cladistic technique is very good at unraveling the phylogenies of related lineages. Indeed, anything related by ancestry and descent (organisms, languages, etc.) is fair game. It is based on a series of assumptions which must not be broken if the result is to have meaning:
The worst part: If you violate one of these assumptions, you may never know. The cladistic technique simply assumes that you have done your homework properly.