Today, I'm attending a workshop in Cologne organized by and about DataCite. DataCite is an organization that, as the name says, helps make scientific data citable (for instance by registering DOIs for data). I was invited to present our lab's efforts to establish an infrastructure in which all our data is collected digitally, evaluated and then automatically published in a persistent, citable manner. You can see the slides I used in my presentation on slideshare. The gist was that we feel compelled to implement some of this technology on our own, even though it seems to us that it should be offered as a component of the standard scientific infrastructure provided by our institutions. To this end, we use the package Rfigshare for R to publish our data on FigShare at the same time the scientist is evaluating their data, on the fly, seamlessly, without any additional work by the researcher.

The ensuing panel discussion was quite lively and covered many relevant topics, such as missing incentives for scientists to share their data, other historical baggage, technical issues such as version control or the differences between fields of scholarship.

One really exciting new development is a recent linked data collaboration between DataCite and CrossRef. This collaboration provides persistent semantic identifiers, allowing machines to resolve the RDF metadata for any DOI, for example. Another very cool service is the Citation Formatter; go and try it. These developments in content negotiation may seem trivial and these services puny, but the possibilities they offer are quite staggering. A pervasive use of this technology for all scientific literature, data and software would allow us to semantically connect all of science in a machine-readable way. Fantastic!
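
For the curious, here is a rough sketch of what content negotiation for a DOI can look like from a script. It is not the code behind either service, just my own illustration in Python: the resolver address `https://doi.org/`, the two media types and the `style` parameter are assumptions based on how DOI content negotiation is commonly documented, the helper names are made up, and the DOI at the bottom is a placeholder you would replace with a real one.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Media types commonly documented for DOI content negotiation (assumed here).
RDF_XML = "application/rdf+xml"              # machine-readable RDF metadata
FORMATTED_CITATION = "text/x-bibliography"   # human-readable citation text

# Assumed resolver address; the DOI is appended to it.
RESOLVER = "https://doi.org/"


def build_doi_request(doi, media_type, style=None):
    """Build (but do not send) a request asking the resolver for `media_type`.

    Hypothetical helper: the same DOI URL is used for every representation,
    only the Accept header changes.
    """
    accept = media_type
    if style is not None:
        # A citation style is passed as a parameter of the media type.
        accept = f"{media_type}; style={style}"
    url = RESOLVER + quote(doi, safe="/")
    return Request(url, headers={"Accept": accept})


def fetch(request, timeout=10):
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


if __name__ == "__main__":
    doi = "10.1234/example"  # placeholder, not a real DOI
    print(fetch(build_doi_request(doi, RDF_XML)))
    print(fetch(build_doi_request(doi, FORMATTED_CITATION, style="apa")))
```

The point of the design is that nothing about the identifier changes: a browser asking for HTML lands on the usual landing page, while a script asking for RDF or a formatted citation gets exactly that from the very same DOI.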

We now even have data-level metrics. For instance, there is a site that tracks not only how often a dataset was cited, but also how often the associated DOI was resolved, which makes it possible to track data usage indirectly. On this site you can see, for instance, that in September 2012, the service that we use to deposit our data, FigShare, generated by far the most successful resolutions (at least that's how I read it).
