A few days ago the iPhylo blog ran a post about the relevance of taxonomic names (http://iphylo.blogspot.com/2010/10/are-names-really-key-to-big-new-biology.html). Do we need names? According to that post, “formal taxonomic names don’t seem terribly necessary in order to do a lot of science”. The post supports this opinion by pointing to surrogate names (like ‘SAR-11’), which are commonly used in experimental studies. In many cases this may be appropriate, but if you have read my blog post of yesterday, you will know why I would like to say: “think it over, it’s not that simple”.
There are a few wrinkles to deal with. Firstly, names may have synonyms, lexical variants, etc. (….). Leaving aside lexical variants, what we want is a “view” of the [name,document] pairs that says this subset refers to the same thing (the “taxon concept”). We can obsess over details in individual cases, but at web-scale there are only two that spring to mind. The first is the Catalogue of Life, the second is NCBI. The Catalogue of Life lists sets of names and references that it regards as being the same thing, although it does unspeakable things to many of the references. In the case of NCBI the “concepts” would be the sets of DNA sequences and associated publications linked to the same taxonomy id. Whatever you think of the NCBI taxonomy, it is at least computable, in the sense that you could take a taxon and generate a list of publications “about” that taxon.
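To make the idea of a “view” over [name,document] pairs concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names (“Aus bus”, “Xus bus”, “Cus dus”) are placeholders, the concept and document ids are made up, and the lookup table only stands in for whatever a source such as the Catalogue of Life or NCBI would supply. It is not how either of those databases works internally.

```python
from collections import defaultdict

# Hypothetical lookup from a name string to a taxon concept id.
# Placeholder names and ids only; not real records from any database.
NAME_TO_CONCEPT = {
    "Aus bus": "concept-1",
    "Aus bus Smith": "concept-1",  # lexical variant of the same name
    "Xus bus": "concept-1",        # synonym
    "Cus dus": "concept-2",
}

# Hypothetical [name, document] pairs, e.g. names found in publications.
PAIRS = [
    ("Aus bus", "doc-1"),
    ("Xus bus", "doc-2"),
    ("Cus dus", "doc-3"),
    ("Aus bus Smith", "doc-1"),
    ("SAR-11", "doc-4"),  # surrogate name with no entry in the lookup
]


def documents_by_concept(pairs, name_to_concept):
    """Group documents by taxon concept; collect names that cannot be resolved."""
    view = defaultdict(set)
    unresolved = set()
    for name, doc in pairs:
        concept = name_to_concept.get(name)
        if concept is None:
            unresolved.add(name)
        else:
            view[concept].add(doc)
    # Sort for stable, readable output.
    return {c: sorted(docs) for c, docs in view.items()}, sorted(unresolved)


view, unresolved = documents_by_concept(PAIRS, NAME_TO_CONCEPT)
print(view)
print(unresolved)
```

In this toy setup the list of publications “about” a taxon is just `view[concept_id]`. Names that are missing from the lookup end up in `unresolved`, which is exactly the case where it is “not that simple”.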