Why Is It So Hard to Say What a Species Is?

After all, what's so hard about it? Sinningia eumorpha isn't the same as Sinningia cardinalis, is it? One has slipper-shaped white flowers, the other has tubular hooded red flowers. One has dark glossy leaves with some amount of red in the leafback, the other has light green fuzzy leaves without any red in the leafback. A species is a species, right? What's the problem?

One of the earliest and most-repeated definitions of a species is this: a species is a collection of interbreeding individuals.

If A and B are two individuals and they can mate to produce fertile progeny, they belong to the same species. Very simple.

Not so simple. Sinningia eumorpha and Sinningia cardinalis interbreed to produce fertile progeny. At least, they can. With human assistance.

Grump. What we hope for is a definition of species which doesn't depend on human intervention or perception. A species should be a species, whether or not we're around to define it, and publish it in learned journals, and quarrel over splitting and lumping. Right?

Wrong. Probably.

The question comes down to this. Is it possible to define species in an objective way independent of the observer?

Mathematics Rears its Attractive Head

We notice three properties about belonging to a species.

An individual is a member of the same species as itself.
If an individual is a member of the same species as another individual, then that other is a member of the same species as the first.
If one individual is a member of the same species as a second individual, and the second individual is a member of the same species as a third individual, then the first individual is a member of the same species as the third individual.

Whew. Let's use a shorthand. If A and B are two individuals, we write A ~ B to mean "A and B are members of the same species".

Then we can rewrite those long-winded statements above in a more concise form. If A, B, and C are any three individuals:

1. A ~ A [the identity condition, more usually called reflexivity]
2. If A ~ B, then B ~ A [the symmetry condition]
3. If A ~ B, and B ~ C, then A ~ C [the transitivity condition]

It turns out that conditions #1, #2, and #3 are exactly the properties necessary to define an equivalence relation.
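
For a small, finite set of individuals, the three conditions can be checked by brute force. The sketch below is only an illustration: the plant names, the flower colors, and the helper functions are invented for the example and come from no real data set.

```python
from itertools import product

def is_reflexive(items, rel):
    # Condition 1: every item is related to itself.
    return all(rel(x, x) for x in items)

def is_symmetric(items, rel):
    # Condition 2: whenever x ~ y holds, y ~ x holds too.
    return all(rel(y, x) for x, y in product(items, repeat=2) if rel(x, y))

def is_transitive(items, rel):
    # Condition 3: whenever x ~ y and y ~ z hold, x ~ z holds too.
    return all(
        rel(x, z)
        for x, y, z in product(items, repeat=3)
        if rel(x, y) and rel(y, z)
    )

def is_equivalence(items, rel):
    return (is_reflexive(items, rel)
            and is_symmetric(items, rel)
            and is_transitive(items, rel))

# Hypothetical plants, each with a single flower color.
flower_color = {"plant1": "white", "plant2": "red",
                "plant3": "white", "plant4": "red"}
plants = list(flower_color)

def same_color(x, y):
    return flower_color[x] == flower_color[y]

def different_color(x, y):
    return flower_color[x] != flower_color[y]

print(is_equivalence(plants, same_color))       # True
print(is_equivalence(plants, different_color))  # False
```

"Has the same flower color as" passes all three tests. "Has a different flower color from" is symmetric, but it fails the first and third conditions.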

The power of the equivalence relation is that it divides up the entire universe upon which it is defined into equivalence classes: non-overlapping subsets with a very simple definition. If A is any member of the set, then the equivalence class to which A belongs is precisely the subset of all members B for which A ~ B.

Let us note some examples of equivalence relations. "owns the same make of car as" is an equivalence relation, if it is defined on the set of people who own only one car. "is printed in the same language as" is an equivalence relation on the set of monolingual books.
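
Here is a minimal sketch of how an equivalence relation carves a set into classes, using the book example. The titles and languages are sample data chosen for illustration, and the function assumes that the relation it is given really is an equivalence relation.

```python
# Sample data (an assumption for this sketch): each book is printed
# in exactly one language.
book_language = {
    "Don Quixote": "Spanish",
    "Ficciones": "Spanish",
    "Faust": "German",
    "Emma": "English",
    "Middlemarch": "English",
}

def same_language(a, b):
    """The relation 'is printed in the same language as'."""
    return book_language[a] == book_language[b]

def equivalence_classes(items, related):
    """Group items into classes, assuming `related` is an equivalence relation."""
    classes = []
    for item in items:
        for cls in classes:
            # Comparing against one representative is enough,
            # because the relation is symmetric and transitive.
            if related(item, cls[0]):
                cls.append(item)
                break
        else:
            # No existing class matched, so this item starts a new one.
            classes.append([item])
    return classes

classes = equivalence_classes(list(book_language), same_language)
for cls in classes:
    print(cls)
```

Every book lands in exactly one class and no class overlaps another. The shortcut of comparing against a single representative is only safe because the relation is transitive, which is exactly the property that will cause trouble for species.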

So what kind of relations are not equivalence relations?

"is brother to" isn't an equivalence relation because it does not satisfy the first (identity) condition: A is not his own brother.
"is brother to or the same as" isn't an equivalence relation because it does not satisfy the second (symmetry) condition: if A is B's brother and B is female, then B is not A's brother.
"shares at least one parent with" isn't an equivalence relation because it does not satisfy the third (transitivity) condition: if A is the child of Q and R, and B is the child of R and S, and C is the child of S and T, then A ~ B and B ~ C, but it is not true that A ~ C.

Back to Species

The third example should be an ominous one for the attempt to define a species as an interbreeding group of individuals. What guarantee is there that "interbreeding" is going to be transitive?

Actually, none. Somewhere (in a book by Stephen Jay Gould, I think) I read about an arctic bird whose range forms a continuous ring around the Arctic Ocean, from Canada to Scandinavia to northern Russia to Siberia to Alaska and back to Canada. Each group interbreeds with the group to its east, except at the Alaska-Canada boundary, where the two neighboring groups do not interbreed. Biologists call a chain like this a ring species. (I probably have a detail wrong, but it was birds and it was the Arctic.)
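
The ring can be modeled in a few lines. This is a toy sketch, not field data: it simply assumes that each population interbreeds with itself and with its immediate neighbors in the chain, and that the two ends (Alaska and Canada) do not interbreed.

```python
# Populations in order around the Arctic Ocean, as described above.
ring = ["Canada", "Scandinavia", "northern Russia", "Siberia", "Alaska"]

def interbreeds(p, q):
    """Assumed rule: a population interbreeds with itself and with its
    immediate neighbors in the chain. The chain does not close, so
    Alaska and Canada do not interbreed."""
    return abs(ring.index(p) - ring.index(q)) <= 1

# Collect every triple where p ~ q and q ~ r hold but p ~ r fails.
failures = [
    (p, q, r)
    for p in ring
    for q in ring
    for r in ring
    if interbreeds(p, q) and interbreeds(q, r) and not interbreeds(p, r)
]

for triple in failures:
    print(triple)
```

Under these assumptions every run of three consecutive populations is a failure of transitivity: Canada interbreeds with Scandinavia and Scandinavia with northern Russia, but Canada does not interbreed with northern Russia.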

"belongs to the same species as" must be an equivalence relation if the species concept is to mean anything. A robust, human-intervention-independent definition of species requires that this definition result in an equivalence relation. In particular, it must be transitive: if A ~ B and B ~ C, then A ~ C. There is just no assurance that the ability to interbreed will be transitive.

Counting Differences: the Distance Problem

Another approach to defining species (or other taxonomic levels like genus and family) has been to make a list of 10 or 50 or 200 characteristics, and count how many of them match between a pair of individuals. Obviously, the characteristics must be chosen cleverly: "interested in gesneriads", for instance, is unlikely to be very useful.

Still, the approach seems promising, and it removes a lot of the subjectivity. Two individuals are members of the same species if they differ by fewer than (say) seven of the 200 characteristics.

Unfortunately, it can be seen right away that "differ by fewer than 7 characteristics in the list" is not an equivalence relation. A ~ B and B ~ C does not imply A ~ C. The mismatches between A and B might be totally different from the mismatches between B and C. If each pair differs in six characteristics, there may be as many as 12 differences between A and C.
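
A small worked case makes the failure concrete. In this sketch the three individuals and their trait scores are invented: B is the baseline, A differs from B in one set of six traits, and C differs from B in a different set of six.

```python
NUM_TRAITS = 200
THRESHOLD = 7  # "same species" means fewer than 7 traits differ

def differences(x, y):
    """Count the traits on which two individuals disagree."""
    return sum(1 for a, b in zip(x, y) if a != b)

def same_species(x, y):
    return differences(x, y) < THRESHOLD

# Hypothetical individuals scored on 200 yes/no traits.
b = [0] * NUM_TRAITS
a = b.copy()
c = b.copy()
for i in range(0, 6):    # A differs from B in traits 0 to 5
    a[i] = 1
for i in range(6, 12):   # C differs from B in traits 6 to 11
    c[i] = 1

print(differences(a, b), differences(b, c), differences(a, c))  # 6 6 12
print(same_species(a, b), same_species(b, c), same_species(a, c))  # True True False
```

A ~ B and B ~ C both hold, yet A and C fail the test, so the relation is not transitive.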

Of course, the actual technique for using lists of differences is nowhere near that naive. The same tree-construction algorithms are used as in creating trees from DNA data, as described elsewhere. But the problem remains. These methods measure a distance between two individuals and attempt to convert that to an equivalence relation, and any such attempt is, at least in theory, doomed to fail. In general, distances cannot be used to define equivalence relations.
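
One tempting repair is to force transitivity: call two individuals related whenever a chain of close neighbors connects them. The sketch below is a deliberately simplified stand-in for that idea, not one of the tree-construction algorithms mentioned above. The individuals are just made-up positions on a line, and the threshold is arbitrary.

```python
def closure_classes(items, close_enough):
    """Group items that are connected by a chain of close neighbors.
    This is the transitive closure of `close_enough`."""
    unvisited = set(items)
    classes = []
    for start in items:
        if start not in unvisited:
            continue
        unvisited.remove(start)
        cls, frontier = [start], [start]
        while frontier:
            current = frontier.pop()
            for other in items:
                if other in unvisited and close_enough(current, other):
                    unvisited.remove(other)
                    cls.append(other)
                    frontier.append(other)
        classes.append(sorted(cls))
    return classes

# Hypothetical individuals, represented by a single measurement each.
positions = [0, 5, 10, 15, 20, 40]

def close_enough(x, y):
    return abs(x - y) < 7

classes = closure_classes(positions, close_enough)
print(classes)  # [[0, 5, 10, 15, 20], [40]]
```

The result is a genuine equivalence relation, but it lumps 0 and 20 together even though they are nowhere near each other. Transitivity has been bought by giving up the meaning of the distance.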

Example: continents

Australia is a well-defined continent -- a big hunk of land entirely surrounded by water. "is on the same continent as" is a good equivalence relation. It's got all three of the required properties. So can we use a distance measurement to tell us whether a city is in Australia?

A glance at a map is all it takes to see how hopeless that will be. Sydney (Australia) is closer to Auckland (New Zealand) than it is to Perth or Darwin (both in Australia). Darwin is closer to Dili (East Timor) than it is to Sydney or Perth. (Perth isn't close to anything.)
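
For anyone without a map handy, the comparison can be reproduced with the haversine great-circle formula. The coordinates below are rough values rounded to two decimal places, and the Earth is treated as a perfect sphere, so the sketch is good for comparing distances but not for navigation.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate (latitude, longitude) in degrees; rounded values,
# used here only for illustration.
CITIES = {
    "Sydney": (-33.87, 151.21),
    "Perth": (-31.95, 115.86),
    "Darwin": (-12.46, 130.84),
    "Auckland": (-36.85, 174.76),
    "Dili": (-8.56, 125.57),
}

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation

def distance_km(city1, city2):
    """Great-circle distance between two cities (haversine formula)."""
    lat1, lon1 = map(radians, CITIES[city1])
    lat2, lon2 = map(radians, CITIES[city2])
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

for a, b in [("Sydney", "Auckland"), ("Sydney", "Perth"),
             ("Sydney", "Darwin"), ("Darwin", "Dili"),
             ("Darwin", "Perth")]:
    print(f"{a} to {b}: {distance_km(a, b):.0f} km")
```

No single cutoff distance from Sydney includes Perth and Darwin while excluding Auckland, which is the point of the example.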

Of course, we could be smarter and use a combination of distances, but it still wouldn't help in the case of Africa, where Algiers is closer to every place in Spain than it is to most cities in the rest of Africa.

The Genome to the Rescue!

Fortunately, the interbreeding definition and difference counting belong to the Dark Ages of the Twentieth Century. In the Twenty-First Century, we have DNA analysis to give us a definition of species that will stand up to the transitivity test. Don't we?

Alas, the same problem applies. DNA analysis just gives us another distance measurement, albeit way more sophisticated. We get the same problem: DNA similarity is not transitive. A ~ B and B ~ C still don't imply A ~ C.

So do we give up?

It's important to note that the difficulties aren't in the choice of method. It's not that we are obstinately trying to use thermometers to measure weight. The problem is inherent in the data: there's no guarantee that we can actually discover a species equivalence relation. One may simply not exist.

Remember those arctic birds mentioned above? A continuous spectrum of interbreeding groups stretching around the top of the world, until it returns to its starting point -- and no longer interbreeds. We have seen that "interbreeds with" is not an equivalence relation, but it is still one of the properties we want for any species. We don't want a species definition that yields a species containing members that are not interfertile.

So the two groups at the non-interbreeding boundary have to belong to different species. But that means we have to draw a species boundary somewhere in the middle of the range, which will separate two interbreeding populations! That is not very satisfactory, but there's no getting around it. The problem isn't in our methods; it's in the data -- and in the whole idea of species.

The point is this: we can't expect perfection in any definition of species or in any attempt to group individuals into species, or in any attempt to group species into higher categories. We have to live with imperfection, as most of us (except for my wife) do every day.