E-infrastructure for organism names to facilitate data sharing
Final report, 2011-01-26, textual parts only.
1. Summary
The purpose of the project was to set up an e-infrastructure for managing the names of biological taxonomy using globally unique identifiers. It is a well-known problem that scientific names alone cannot unequivocally denote species or link alternative name usages across different databases. It must also be known according to which publication a name has been used.
A database platform for managing names from checklists used in all Nordic countries was created and is available at http://taxon.luomus.fi/. The pilot data set consists of five checklists of two butterfly superfamilies. The main function of the database, setting relations between taxon names that denote the same taxon, is used by experts who are authorised users. The relations specify whether two taxa are synonyms, whether one belongs to the other, or whether the taxa partly overlap (for instance, the species of a genus are partly the same). Other users can browse checklists, search for valid names and inspect the history of splitting and lumping of taxa.
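The three relation types described above can be sketched as a small data structure. This is only an illustrative sketch, not the schema of the project database: the names Relation, ConceptLink and same_taxon, and the identifiers in the example (fi:1, se:1, no:1, no:2), are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    SYNONYM = "synonym"          # both concepts denote the same taxon
    INCLUDED_IN = "included_in"  # the first concept belongs to the second
    OVERLAPS = "overlaps"        # the concepts share only part of their content


@dataclass(frozen=True)
class ConceptLink:
    source: str         # hypothetical identifier of a concept in one checklist
    target: str         # hypothetical identifier of a concept in another checklist
    relation: Relation


def same_taxon(links, concept):
    """Return every concept connected to `concept` through synonym links."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for link in links:
            if link.relation is not Relation.SYNONYM:
                continue
            if link.source == current:
                other = link.target
            elif link.target == current:
                other = link.source
            else:
                continue
            if other not in found:
                found.add(other)
                frontier.append(other)
    return found


# Invented example: three checklists agree on one taxon, a fourth concept only overlaps.
links = [
    ConceptLink("fi:1", "se:1", Relation.SYNONYM),
    ConceptLink("se:1", "no:1", Relation.SYNONYM),
    ConceptLink("no:1", "no:2", Relation.OVERLAPS),
]
synonyms_of_fi1 = same_taxon(links, "fi:1")
```

Only synonym links are followed here, because an overlapping or included concept does not denote the same taxon and would need separate handling when records are merged.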
Each taxonomic concept was given a globally unique Life Science Identifier (LSID). LSIDs were obtained from the Catalogue of Life's Annual Checklist where they existed; the missing LSIDs were generated by the project. LSIDs were implemented following the recommendations published by Biodiversity Information Standards (TDWG), and a detailed scheme for applying LSIDs was developed. In addition, an identifier was given to each taxonomic name and each differing spelling, and the LSID of the valid taxon name was attached to each taxonomic concept. These two identifiers are needed when updating changes in nomenclature.
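For readers unfamiliar with the identifier format, an LSID is a URN of the form urn:lsid:authority:namespace:object, optionally followed by a revision. The sketch below splits such an identifier into its parts. It does not reproduce the project's own scheme: the authority example.org, the namespace taxonconcept and the function name parse_lsid are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # domain name of the issuing organisation
    namespace: str           # kind of object, chosen by the authority
    object_id: str           # identifier unique within the namespace
    revision: Optional[str]  # optional version of the object


def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError("an LSID has five or six colon-separated parts: %r" % text)
    # The 'urn:lsid' prefix is case-insensitive.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("an LSID must start with 'urn:lsid:': %r" % text)
    if any(part == "" for part in parts[2:]):
        raise ValueError("an LSID must not contain empty parts: %r" % text)
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Hypothetical identifiers, not issued by any real authority.
concept = parse_lsid("urn:lsid:example.org:taxonconcept:12345")
versioned = parse_lsid("URN:LSID:example.org:taxonname:678:2")
```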
Names and LSIDs can be queried using the LSID resolver, which returns results in RDF (Resource Description Framework) format. The results contain valid names and their synonyms, and LSIDs for taxonomic concepts.
Taxon names and their synonyms for ca 150 species and their higher taxa (Lepidoptera: Hesperioidea and Papilionoidea) are linked using the underlying taxonomic concepts. Along with the rest of the Lepidoptera, several other organism groups covering the target countries will be uploaded into the database. These basically cover all insect orders, spiders, molluscs and vascular plants occurring in Scandinavia. The taxonomic database contains ca 30 000 names.
Occurrence data, collectively about 3 million records of butterflies and moths from about 20 different databases in five Nordic countries (Finland, Norway, Estonia, Denmark and NW Russia), can now be integrated into one large database using LSIDs. This will enable a new kind of macroecological research for purposes such as studying the impact of climate change on biodiversity.
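The integration step can be illustrated with a minimal sketch: each source database keeps its own name usage, a lookup table maps every (database, name) pair to a concept LSID, and records are then grouped by LSID rather than by name string. Everything in the example is an assumption made for illustration: the database labels, the LSIDs under example.org, the function name integrate and the sample records are invented and are not project data.

```python
from collections import defaultdict

# Hypothetical mapping from (source database, name as used there) to a concept LSID.
NAME_TO_CONCEPT = {
    ("db_fi", "Aglais urticae"): "urn:lsid:example.org:taxonconcept:1",
    ("db_se", "Nymphalis urticae"): "urn:lsid:example.org:taxonconcept:1",
    ("db_se", "Papilio machaon"): "urn:lsid:example.org:taxonconcept:2",
}


def integrate(records, name_to_concept=NAME_TO_CONCEPT):
    """Group occurrence records by concept LSID; return (merged, unmatched)."""
    merged = defaultdict(list)
    unmatched = []
    for record in records:
        lsid = name_to_concept.get((record["source"], record["name"]))
        if lsid is None:
            # Names without a mapped concept are kept aside for expert review.
            unmatched.append(record)
        else:
            merged[lsid].append(record)
    return dict(merged), unmatched


# Invented sample records from two databases using different names for one concept.
records = [
    {"source": "db_fi", "name": "Aglais urticae", "year": 2005},
    {"source": "db_se", "name": "Nymphalis urticae", "year": 2007},
    {"source": "db_se", "name": "Papilio machaon", "year": 2006},
    {"source": "db_no", "name": "Unknown name", "year": 2008},
]
merged, unmatched = integrate(records)
```

Keying the lookup on the source database as well as the name reflects the point made in the summary: the same name string may stand for different concepts depending on which checklist it follows.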
2. Activity Report
A. Publications:
Tuominen, J., Frosterus, H., Laurenne, N. & Hyvönen, E. 2010. Publishing biological classifications as SKOS vocabulary services on the Semantic Web. TDWG 2010 Conference, 25-30 September 2010, Woods Hole, MA, USA. Weitzman, A. (Editor). Abstracts of the 2010 Annual Conference of the Taxonomic Databases Working Group.
Laurenne, N.M., Koho, M., Mertaniemi, A. & Saarenmaa, H. 2009. LSIDs for managing biological names in data integration. TDWG 2009 Conference, 9-13 October 2009, Montpellier, France. Weitzman, A. (Editor). Abstracts of the 2009 Annual Conference of the Taxonomic Databases Working Group. P. 60.
Laurenne, N.M., Penttilä, M. & Saarenmaa, H. 2008. Using LSIDs to link taxonomic concepts to scientific names for efficient data integration. TDWG 2008 Conference, 19-24 October 2008, Fremantle, Australia. Weitzman, A. & Belbin, L. (Editors). Proceedings of TDWG 2008. P. 89. Biodiversity Information Standards (TDWG) and the Missouri Botanical Garden, St. Louis.
Laurenne, N., Penttilä, M. & Saarenmaa, H. 2008. An LSID infrastructure for taxonomic concepts and scientific names for efficient data integration in the Nordic region. Stockholm Biodiversity Informatics Symposium 2008. The Book of Abstracts. Naturhistoriska Riksmuseet. Sweden.
B. Other dissemination activities:
- Poster at Biodiversity Information Standards (TDWG) 2010 Conference. Woods Hole, MA, USA.
- Poster at Biodiversity Information Standards (TDWG) 2009 Conference. France.
- Presentation at Nordic GBIF Nodes meeting, Copenhagen 9.10.2009.
- Presentation at Stockholm Biodiversity Informatics Symposium 2008. Sweden.
- Poster at Biodiversity Information Standards (TDWG) 2008 Conference. Fremantle, WA, Australia.
C. Mobility
- Visit to Naturhistorisk museum, Oslo, Norway. 14.5.09.
- Visits to Zoological Institute of Russian Academy of Sciences. St. Petersburg, Russia. 21-22.4.2009 and 10.6.2010.
- Visits to Stockholm and Copenhagen museums mentioned under dissemination.
- Visit to Göteborg Natural History Museum for planning a joint Nordic collection management system, 24.11.2010.
D. Interaction between e-infrastructures
A common Nordic database of species names was constructed at the Finnish Museum of Natural History, and is available at http://taxon.luomus.fi/. The participants from the Nordic countries delivered the checklists of the names of butterflies and moths occurring in their country to this database. An LSID resolver was constructed using the guidelines of TDWG.ORG. Globally unique identifiers according to the LSID specification were issued by the database for all taxonomic entities in all checklists, or acquired from Species 2000 where they existed. The species concepts across all these checklists were mapped to each other. These "nordically harmonised" LSIDs are available for download, and were also sent back to the participating museums for inclusion in their observation databases. This last step was actually implemented only in Sweden, at their GBIF data provider. The other participants still need to implement this final step, but it is understood that this work will continue in future and is not necessarily a deviation from the plan at this stage. Furthermore, for the first time, butterfly data has been digitised in Russia and is being shared in Estonia as a result of this project. Otherwise, all the goals and aims of the project have been reached. The GBIF Secretariat was informed of these developments, and it is now under consideration whether this mechanism can be implemented for their global data portal.
E. Participation and planning of EU and international networks
The Swedish, Finnish and Norwegian partners have participated in the preparation of the LifeWatch ESFRI project in their countries. The Swedish and Norwegian projects have been funded, and the decision is pending in Finland. Taxonomy and GBIF are major components of LifeWatch. A Nordic-level LifeWatch project is at the planning stage, and will probably be offered to NordForsk in the near future.
The Finnish Museum is a partner in an EU ICT PSP project with the title OpenUp!. This aims at delivering natural history data to the www.europeana.eu portal. Our role in this project is to provide a service to validate zoological taxonomic names for the purposes of the entire project, which builds directly on the capacities acquired in this NordForsk project.
F. Experiences of the project
This was the first joint project of the Nordic GBIF nodes, and as such exciting. The problem we addressed was probably a rather complicated one for such a first project: only Sweden was able to complete the work assigned to the countries. The results were well appreciated by the GBIF Secretariat, which has the technical capacity to take them up. Furthermore, keeping up momentum was hard after the initial enthusiasm. This was probably due to the small amount of funding, which was distributed too thinly across the network and at too early a stage.
G. Experiences of NordForsk
Research infrastructure is receiving increased attention at EU level and also at national level. We have participated in several such proposals lately. Building e-infrastructure at Nordic level is a natural thing to do, because the Nordic countries are relatively small and cannot each afford all the necessary e-infrastructure separately, including building proper databases for managing collection data at national museums. We wish there were ongoing funding for e-infrastructures and e-science available at Nordic level.
Furthermore, e-infrastructure requires an Internet identity. We wish that NordForsk would consider establishing an Internet top-level domain for the Nordic region. .no has been taken by Norway, but something could probably be found, like .nx.