Nobody told me that yesterday was science metrics day, so of course I'm late with my take on how we ought to assess progress in science.
In the best of all worlds, it would be clear to anyone what constitutes scientific progress: the publications are there for anyone to read and form their own opinion. However, more than 2 million articles are now published every year in more than 24,000 peer-reviewed journals (counting all scholarly publications). Moreover, nobody can know all the different disciplines well enough to understand every single article, even if it were theoretically possible to read all of them. Thus, and some may say unfortunately, it is inevitable that we resort to some computational assistance when assessing the progress of science.
Adding insult to injury, scientific progress isn't nearly as gradual, predictable and steady as it may seem. On the contrary, as one New York Times author puts it:
Basic research, the attempt to understand the fundamental principles of science, is so risky, in fact, that only the federal government is willing to keep pouring money into it. It is a venture that produces far fewer hits than misses.
So how do we distinguish hits from misses, if we can't read about everything that's going on, and even the few reports we manage to read are almost impossible to understand for anyone but the handful of scientists in that particular field? The problems don't end there. Even those few scientists are prone to mistakes in their assessment. Famously, Einstein added a cosmological constant to his equations to keep the universe static, missing that his theory pointed to an expanding universe, and only abandoned the constant after Hubble's observations more than a decade later. Thus, even scientists themselves sometimes cannot accurately assess the value of their own discoveries until decades later. In the end, only history can measure the value of a scientific discovery.
So, do we just throw up our hands in despair and forget all about science assessment?
Of course not.
If the ideal cannot be reached, a common strategy in science is to approximate it as well as we can. For instance, to analyze the orbits of several planets around a star, a series of iterations is calculated to approximate the orbits as closely as needed. In evolutionary biology, fitness approximations are used to speed up evolutionary simulations. In mathematics, we approximate irrational numbers such as pi by truncated versions (3.14, 3.14159, ...) that are accurate enough for whatever purpose we need them for.
What approximations are used to assess scientific progress? Strangely enough, there aren't many. The most common approach is to study citations. The idea is that a publication that is cited a lot must have made some sort of impact on the scientific community. However, some highly cited publications have been retracted because they contain falsified or irreproducible data. They are highly cited because their data are invalid, not because they constitute scientific progress. Similarly, controversial publications tend to receive more citations than less controversial but potentially far more important advances. Clearly, the relationship between the quantity of citations and the quality of a scientific discovery is tenuous at best.
Stranger still, this vague and unreliable relationship hasn't kept scientists from using citation data to evaluate scientific progress. Perhaps the most notorious (and, for the scientific community, most embarrassing) practice is the use of journal-level citation data. It's hard to find any rational reason to take a measure as unreliable as citation data and then make it even less reliable by applying it to the journal in which a discovery was published rather than to the discovery itself. Nevertheless, metrics of this absurd variety (such as the Impact Factor) decide scientific careers in many countries around the world.
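To make the journal-level problem concrete, here is a minimal sketch of a simplified two-year Impact Factor: citations received in a given year by the items a journal published in the two preceding years, divided by the number of citable items from those years. This is a simplification (the official calculation distinguishes "citable items" from other content), and the per-article citation counts below are made up for illustration; the function name `two_year_impact_factor` is hypothetical.

```python
from statistics import median


def two_year_impact_factor(citations: int, citable_items: int) -> float:
    """Simplified two-year Impact Factor for year Y: citations received in Y
    by items published in Y-1 and Y-2, divided by the number of citable
    items published in Y-1 and Y-2."""
    if citable_items <= 0:
        raise ValueError("need at least one citable item")
    return citations / citable_items


# Hypothetical journal: citations received this year by each of the
# 10 articles it published in the previous two years (made-up numbers).
per_article = [120, 4, 3, 2, 2, 1, 1, 0, 0, 0]

jif = two_year_impact_factor(sum(per_article), len(per_article))
# Count how many individual articles actually reach the journal's average.
above = sum(1 for c in per_article if c >= jif)

print(f"impact factor: {jif:.1f}")              # impact factor: 13.3
print(f"median article: {median(per_article)}")  # median article: 1.5
print(f"articles at or above the IF: {above}")   # articles at or above the IF: 1
```

In this toy example a single heavily cited article lifts the journal's figure to 13.3, while the typical article in it was cited once or twice, which is why a journal's number says little about any particular discovery published there.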
Another, more recent attempt at using citation data is the h-index. This index (and its many variations) is meant to condense the scientific contributions of an individual researcher into a single figure. It suffers from the same problems as all citation-based metrics. For instance, any recent discovery will not have had sufficient time to accumulate many citations, especially if the person's research field is small or the discovery is years ahead of the rest of the scientific community.
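The h-index is defined as the largest number h such that a researcher has h publications with at least h citations each. The sketch below computes it from a list of per-paper citation counts; the function name `h_index` and the example citation lists are hypothetical, chosen only to illustrate the recency problem described above.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    h = 0
    # Rank papers from most to least cited; h is the last rank at which
    # the paper in that position still has at least `rank` citations.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


print(h_index([25, 8, 5, 3, 3, 1]))  # 3
print(h_index([500]))                # 1: one landmark paper counts like any other
print(h_index([2, 1, 0]))            # 1: recent work has had no time to be cited
```

Note that a single 500-citation paper and a handful of barely cited recent papers both yield an h-index of 1: the figure cannot tell a landmark discovery from work that simply hasn't been noticed yet.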
More promising approaches are still in their infancy. Some journals count the number of downloads for each publication or collect 'ratings'. There are relatively new online services, such as Mendeley, which keep track of the publications scientists bookmark, potentially for later citation. Such services may alert us to new, fashionable results, but they suffer from the same 'social' deficiencies as citations.
In the end, every single metric has its own disadvantages, which are often easy to spot. It seems straightforward that every reasonable attempt at approximating scientific assessment will use many different metrics, each with different drawbacks, to assess not only publications but also data contributions, ideas, reviewing and other contributions by scientists that help advance science. Any single metric applied to a single type of contribution will be all too easy to game, because its deficiencies are so plainly obvious. Yet even the best possible approximation will leave much to be desired. Science is inherently difficult to assess, and even the most sophisticated technological assistance can at most make us feel better, because at least we have done our best.
We should not delude ourselves that new technology will eradicate future Einsteinian 'blunders', infallibly detect the 'best' science and relegate 'bad' science to oblivion forever. The scientific community has grown so much and become so competitive that the way we do science has irreversibly changed. Technology can only help us cope with this growth; it will not solve our problems for us. At best, technology can dampen the social dynamics within the scientific community and make scientific assessment more objective. At worst, it may amplify these dynamics beyond current levels. While we strive to develop and improve these technologies, we should make sure that future generations of scientists realize that there is no such thing as 'good' or 'bad' science, and that there isn't anything more or less 'important', 'significant' or 'exciting' about one discovery or another. Much like the orgasms we experience, there aren't any 'bad' scientific discoveries. If we fail to propagate this concept at least as efficiently as we propagate any new metric, we might as well not use metrics in the first place.
Metrics, no matter how advanced, are mere crutches to help us cope with size and complexity. Any use beyond that is erroneous at best and demagoguery at worst.
Posted on Tuesday 30 November 2010 - 18:31:00