Archive for bibliometrics

Do “high-quality journals” always publish “high-quality papers”?

Posted in Uncategorized on May 23, 2023 by telescoper

After a busy morning correcting examination scripts, I have now reached the lunch interval and thought I’d use the opportunity to share a paper I found via Stephen Curry on Twitter with the title “In which fields do higher impact journals publish higher quality articles?”. It’s quite telling that anyone should ask the question. It’s also telling that the paper, in a Springer journal called Scientometrics, is behind a paywall. I can at least share the abstract:

The Journal Impact Factor and other indicators that assess the average citation rate of articles in a journal are consulted by many academics and research evaluators, despite initiatives against overreliance on them. Undermining both practices, there is limited evidence about the extent to which journal impact indicators in any field relate to human judgements about the quality of the articles published in the field’s journals. In response, we compared average citation rates of journals against expert judgements of their articles in all fields of science. We used preliminary quality scores for 96,031 articles published 2014–18 from the UK Research Excellence Framework 2021. Unexpectedly, there was a positive correlation between expert judgements of article quality and average journal citation impact in all fields of science, although very weak in many fields and never strong. The strength of the correlation varied from 0.11 to 0.43 for the 27 broad fields of Scopus. The highest correlation for the 94 Scopus narrow fields with at least 750 articles was only 0.54, for Infectious Diseases, and there was only one negative correlation, for the mixed category Computer Science (all), probably due to the mixing. The average citation impact of a Scopus-indexed journal is therefore never completely irrelevant to the quality of an article but is also never a strong indicator of article quality. Since journal citation impact can at best moderately suggest article quality it should never be relied on for this, supporting the San Francisco Declaration on Research Assessment.

There is some follow-up discussion on this paper and its conclusions here.

The big problem of course is how you define “high-quality papers” and “high-quality journals”. As in the above discussion this usually resolves itself into something to do with citation impact, which is problematic to start with, but if that’s the route you want to go down then there is enough readily available article-level information for each paper nowadays that you don’t need any journal metrics at all. The academic journal industry won’t agree of course, as it’s in their interest to perpetuate the falsehood that such rankings matter. The fact that the correlation between article “quality” measures and journal “quality” measures is weak does not surprise me. I think there are many weak papers that have passed peer review and appeared in high-profile journals. This is another reason for disregarding the journal entirely. Don’t judge the quality of an item by the wrapping, but by what’s inside it!

There is quite a lot of discussion in my own field of astrophysics about what the “leading journals” are. Different ranking methods produce different lists, not surprisingly given the arbitrariness of the methods used. According to this site, The Open Journal of Astrophysics ranks 4th out of 48 journals, but it doesn’t appear on some other lists because the academic publication industry, which acts as gate-keeper via Clarivate, does not seem to like its unconventional approach. According to Exaly, Monthly Notices of the Royal Astronomical Society (MNRAS) is ranked in 13th place, while according to this list, it is 14th. No disrespect to MNRAS, but I don’t see any objective justification for calling it “the leading journal in the field”.

The top ranked journals in astronomy and astrophysics are generally review journals, which have always attracted lots of citations through references like “see Bloggs 2015 and references therein”. Many of these review articles are really excellent and contribute a great deal to their discipline, but it’s not obvious they can be compared with actual research papers. At OJAp we decided to allow review articles of sufficiently high quality because we see the journal primarily as a service to the community rather than a service to the bean-counters who make the rankings.

Now, back to the exams…

The Gaming of Citation and Authorship

Posted in Open Access on February 22, 2023 by telescoper

About ten days ago I wrote a piece about authorship of scientific papers in which I pointed out that in astrophysics and cosmology many “authors” (i.e. people listed in the author list) of papers, largely those emanating from large consortia, often haven’t even read the paper they are claiming to have written.

I now draw your attention to a paper by Stuart Macdonald, with the abstract:

You can find the full paper here, but unfortunately it requires a subscription. Open Access hasn’t reached sociology yet.

The paper focuses on practices in medicine, but it would be very wrong to assume that the issues are confined to that discipline; others have already fallen into the mire. I draw your attention in particular to the sentence:

Many authors in medicine have made no meaningful contribution to the article that bears their names, and those who have contributed most are often not named as authors. 

The first bit certainly also applies to astronomy, for example.

The paper does not just discuss authorship, but also citations. I won’t discuss the Journal Impact Factor further, as any sane person knows that it is daft. Citations are not just used to determine the JIF, however – citations at article level make more sense, but are also not immune from gaming, and although they undoubtedly contain some information, they do not tell the whole story. Nor will I discuss the alleged ineffectiveness of peer review in medicine (about which I know nothing). I will however end with one further quote from the abstract:

The problem is magnified by the academic publishing industry and by academic institutions….

So many problems are…

The underlying cause of all this is that the people in charge of academic institutions nowadays have no concept of the intrinsic value of research and scholarship. The only things that are meaningful in their world are metrics. Everything we do now is reduced to key performance indicators, such as publication and citation counts. This mindset is a corrupting influence that encourages perverse behaviour among researchers as well as managers.

Open Journal of Astrophysics Impact Factor Poll

Posted in Open Access on February 5, 2021 by telescoper

A few people ask from time to time about whether the Open Journal of Astrophysics has a Journal Impact Factor.

For those of you in the dark about this, the impact factor for Year N, which is usually published in year N+1, is based on the average number of citations obtained in Year N for papers published in Years N-1 and N-2, so it requires two complete years of publishing.
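In symbols, using notation introduced here purely for illustration (it is not Clarivate’s own), the definition above reads:

```latex
\mathrm{IF}_N = \frac{C_N(N-1) + C_N(N-2)}{P_{N-1} + P_{N-2}}
```

where C_N(Y) is the number of citations received during year N by papers published in year Y, and P_Y is the number of papers published in year Y.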

For the OJA, therefore, the first time an official IF can be constructed is for 2021, which would be published in 2022 and would be based on the citations gained in 2021 (this year) by papers published in 2019 and 2020. Earlier years were incomplete so no IF can be defined.

It is my personal view that article-level bibliometric data are far more useful than journal-level descriptors such as the Journal Impact Factor (JIF). I think the Impact Factor is very silly actually. Unfortunately, however, there are some bureaucrats who seem to think that the Journal Impact Factor is important and some of our authors think we should apply to have an official one.
What do you think? If you have an opinion you can vote on the Twitter poll here:

I should add that my criticisms of the Journal Impact Factor are not about the Open Journal’s own citation performance. We have every reason to believe our impact factor would be pretty high.

Comments welcome.

What are scientific papers for?

Posted in Astrohype, Open Access on May 30, 2020 by telescoper

Writing scientific papers and publishing them in academic journals is an essential part of the activity of a researcher. ‘Publish or perish’ is truer now than ever, and an extensive publication list is essential for anyone wanting to have a career in science.

But what are these papers actually for? What purpose do they serve?

I can think of two main purposes (which aren’t entirely mutually exclusive): one is to disseminate knowledge and ideas; the other is to confer status on the author(s).

The academic journal began hundreds of years ago with the aim of achieving the former through distribution of articles in print form. Nowadays the distribution of research results is achieved much less expensively, largely through online means. Nevertheless, journals still exist (largely, as I see it, to provide editorial input and organise peer review).

Alongside this there is the practice of using articles as a measure of the ‘quality’ of an author. Papers in certain ‘prestigious’ ‘high impact’ journals are deemed important because they are indicators of status, like epaulettes on a uniform, and bibliometric data, especially citation counts, often seem to be more important than the articles themselves.

I thought it was just me getting cynical in my old age but a number of younger scientists I know have told me that the only reason they can see for writing papers is because you need to do it to get a job. There is no notion of disseminating knowledge, just the need to establish priority and elevate oneself in the pecking order. In other words the original purpose of scientific publications has largely been lost.

I thought I’d test this by doing a (totally unscientific) poll here to see how my several readers think about this.

ADS and the Open Journal of Astrophysics

Posted in Open Access on January 19, 2020 by telescoper

Most if not all of the authors of papers published in the Open Journal of Astrophysics, along with a majority of astrophysicists in general, use the NASA/SAO Astrophysics Data System (ADS) as an important route to the research literature in their domain, including bibliometric statistics and other information. Indeed this is the most important source of such data for most working astrophysicists. In light of this we have been taking steps to facilitate better interaction between the Open Journal of Astrophysics and the ADS.

First, note that journals indexed by ADS are assigned a short code that makes it easier to retrieve a publication. For reference, the short code for the Open Journal of Astrophysics is OJAp. For example, the 12 papers published by the Open Journal of Astrophysics can be found on ADS here.

If you click the above link you will find that the papers published more recently have not got their citations assigned yet. When we publish a paper at the Open Journal of Astrophysics we assign a DOI and deposit it, along with related metadata, with a system called Crossref, which is accessed by ADS to populate bibliographic fields in its own database. ADS also assigns a unique bibliographic code (bibcode), which it generates itself from the metadata it obtains from Crossref. This process can take a little while, however, as both Crossref and ADS update using batch processes, the latter usually running only at weekends. This introduces a significant delay in aggregating the citations acquired via different sources.

To complicate things further, papers submitted to the arXiv as preprints are indexed on ADS as preprints and only appear as journal articles when they are published. Among other things, citations from the preprint version are then aggregated on the system with those of the published article, but it can take a while before this process is completed, particularly if an author does not update the journal reference on arXiv.

For a combination of reasons, therefore, the papers we have published in the past have sometimes appeared on ADS out of order. On top of this, of the 12 papers published in 2019, there is one assigned an ADS bibliographic code ending in 13 and none numbered 6! This is not much of a problem as the ADS identifiers are unique, but the result is not as tidy as it might be.
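ADS bibcodes follow a fixed 19-character layout (four-digit year, five-character journal abbreviation, four-character volume, one qualifier character, four-character page or article number, first-author initial), so a numbering anomaly like the one above can be spotted mechanically. The following Python sketch assumes that layout and assumes the article number sits in the page field; every bibcode it builds, including the volume number and initial, is made up for illustration and is not a real OJAp identifier.

```python
def make_bibcode(year, journal, volume, page, initial, qualifier="."):
    """Assemble a 19-character bibcode: YYYY JJJJJ VVVV M PPPP A.

    The journal is left-justified and the volume and page are
    right-justified, all padded with dots.
    """
    code = f"{year:04d}{journal:.<5}{str(volume):.>4}{qualifier}{str(page):.>4}{initial}"
    if len(code) != 19:
        raise ValueError(f"bibcode must be 19 characters, got {code!r}")
    return code


def page_number(bibcode):
    """Extract the page (here assumed to be the article number) field."""
    return int(bibcode[14:18].strip("."))


def numbering_gaps(bibcodes, n_published):
    """Return (numbers missing from 1..n_published, numbers above n_published)."""
    numbers = sorted(page_number(b) for b in bibcodes)
    missing = sorted(set(range(1, n_published + 1)) - set(numbers))
    beyond = [n for n in numbers if n > n_published]
    return missing, beyond


# Hypothetical volume of 12 papers numbered 1-5 and 7-13 (no 6, one 13).
codes = [make_bibcode(2019, "OJAp", 2, n, "X")
         for n in [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13]]
print(numbering_gaps(codes, 12))  # ([6], [13])
```

Splitting the check into "missing" and "beyond" mirrors the two symptoms described in the post: a gap in the sequence and a number larger than the paper count.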

To further improve our service to the community, we have decided at the Open Journal of Astrophysics that from now on we will speed up this interaction with ADS by depositing information directly with ADS at the same time as we lodge it with Crossref. This means that (a) ADS does not have to rely on authors updating the arXiv field and (b) we can give ADS directly information that is not lodged at Crossref.

I hope this clarifies the situation.

Not the Open Journal of Astrophysics Impact Factor

Posted in Open Access on October 22, 2019 by telescoper

Yesterday evening, after I’d finished my day job, I was doing some work on the Open Journal of Astrophysics ahead of a talk I am due to give this afternoon as part of the current Research Week at Maynooth University. The main thing I was doing was checking on citations for the papers we have published so far, to be sure that the Crossref mechanism is working properly and the papers were appearing correctly on, e.g., the NASA/ADS system. There are one or two minor things that need correcting, but it’s basically doing fine.

In the course of all that I remembered that when I’ve been giving talks about the Open Journal project quite a few people have asked me about its Journal Impact Factor. My usual response is (a) to repeat the arguments why the impact factor is daft and (b) to point out that we have to have been running continuously for at least two years to have an official impact factor so we don’t really have one.

For those of you who can’t be bothered to look up the definition of an impact factor, for a given year it is basically the sum of the citations received in that year by all papers published in the journal over the previous two-year period, divided by the total number of papers published in that journal over the same period. It’s therefore the average number of citations per paper published in a two-year window. The impact factor for 2019 would be defined using citations (received in 2019) to papers published in 2017 and 2018, etc.
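The definition above can be sketched in a few lines of Python. This is a minimal illustration, not Clarivate’s procedure: the `Paper` record and its fields are hypothetical, every item counts as a paper, and returning `None` when either year of the window has no papers is a crude stand-in for the rule that a journal needs two full years of publication.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    year: int  # publication year
    cites_by_year: dict = field(default_factory=dict)  # citing year -> citations


def two_year_impact_factor(papers, year):
    """Citations received in `year` by papers from the two preceding years,
    divided by the number of those papers. Returns None if either window
    year has no papers (a simplification of the 'two full years' rule)."""
    window = (year - 2, year - 1)
    eligible = [p for p in papers if p.year in window]
    if {p.year for p in eligible} != set(window):
        return None
    total = sum(p.cites_by_year.get(year, 0) for p in eligible)
    return total / len(eligible)


# Hypothetical journal history (invented numbers).
papers = [
    Paper(2016, {2018: 4}),
    Paper(2016, {2018: 0}),
    Paper(2017, {2018: 2, 2019: 7}),
    Paper(2018, {2018: 1}),
]
print(two_year_impact_factor(papers, 2018))  # 2.0
print(two_year_impact_factor(papers, 2019))  # 3.5
print(two_year_impact_factor(papers, 2020))  # None
```

Note that the 2018 citation to the 2018 paper never enters any of these numbers: only citations to the two previous years count.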

The Open Journal of Astrophysics didn’t publish any papers in 2017 and only one in 2018 so obviously we can’t define an official impact factor for 2019. However, since I was rummaging around with bibliometric data at the time I could work out the average number of citations per paper for the papers we have published so far in 2019. That number is:

I stress again that this is not the Impact Factor for the Open Journal but it is a rough indication of the citation impact of our papers. For reference (but obviously not comparison) the latest actual impact factors (2018, i.e. based on 2016 and 2017 numbers) for some leading astronomy journals are: Monthly Notices of the Royal Astronomical Society 5.23; Astrophysical Journal 5.58; and Astronomy and Astrophysics 6.21.

Measuring the lack of impact of journal papers

Posted in Open Access on February 4, 2016 by telescoper

I’ve been involved in a depressing discussion on the Astronomers Facebook page, part of which was about the widespread use of Journal Impact Factors by appointments panels, grant agencies, promotion committees, and so on. It is argued (by some) that younger researchers should be discouraged from publishing in, e.g., the Open Journal of Astrophysics, because it doesn’t have an impact factor and they would therefore be jeopardising their research career. In fact it takes two years for a new journal to acquire an impact factor, so if you take this advice seriously nobody should ever publish in any new journal.

For the record, I will state that no promotion committee, grant panel or appointment process I’ve ever been involved in has even mentioned impact factors. However, it appears that some do, despite the fact that they are demonstrably worse than useless at measuring the quality of publications. You can find comprehensive debunking of impact factors and exposure of their flaws all over the internet if you care to look: a good place to start is Stephen Curry’s article here. I’d make an additional point here, which is that the impact factor uses citation information for the journal as a whole as a sort of proxy measure of the research quality of papers published in it. But why on Earth should one do this when citation information for each paper is freely available? Why use a proxy when it’s trivial to measure the real thing?

The basic statistical flaw behind impact factors is that they are based on the arithmetic mean number of citations per paper. Since the distribution of citations in all journals is very skewed, this number is dragged upwards by a few papers with extremely large numbers of citations. In fact, most papers published have many fewer citations than the impact factor of their journal. It’s all very misleading, especially when used as a marketing tool by cynical academic publishers.
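A toy illustration of the point, using made-up citation counts chosen only to show the shape of a skewed distribution (they are not data from any real journal):

```python
from statistics import mean, median

# Hypothetical citation counts for ten papers in one journal's window:
# most papers get a handful of citations, one gets a lot.
cites = [0, 0, 0, 0, 1, 1, 2, 3, 5, 88]

average = mean(cites)    # what an impact-factor-style average reports: 10
typical = median(cites)  # what a typical paper actually gets: 1.0
print(f"mean = {average}, median = {typical}")
```

Removing the single 88-citation paper would drop the mean from 10 to about 1.3, while the median would barely move; that sensitivity to one outlier is exactly why the mean is a poor summary here.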

Thinking about this on the bus on my way into work this morning I decided to suggest a couple of bibliometric indices that should help put impact factors into context. I urge relevant people to calculate these for their favourite journals:

  • The Dead Paper Fraction (DPF). This is defined to be the fraction of papers published in the journal that receive no citations at all in the census period.  For journals with an impact factor of a few, this is probably a majority of the papers published.
  • The Unreliability of Impact Factor Factor (UIFF). This is defined to be the fraction of papers with fewer citations than the Impact Factor. For many journals this is most of their papers, and the larger this fraction is the more unreliable their Impact Factor is.

Another useful measure for individual papers is

  • The Corrected Impact Factor. If a paper with a number N of actual citations is published in a journal with impact factor I then the corrected impact factor is C=N-I. For a deeply uninteresting paper published in a flashily hyped journal this will be large and negative, and should be viewed accordingly by relevant panels.
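The three indices proposed above are simple enough to compute directly. A minimal Python sketch, assuming the citation counts have already been restricted to the census period; the counts are invented, and taking the journal’s impact factor to be the mean of those same counts is a simplification for illustration:

```python
def dead_paper_fraction(cites):
    """DPF: fraction of papers with no citations in the census period."""
    return sum(1 for c in cites if c == 0) / len(cites)


def uiff(cites, impact_factor):
    """UIFF: fraction of papers with fewer citations than the impact factor."""
    return sum(1 for c in cites if c < impact_factor) / len(cites)


def corrected_impact_factor(n_citations, impact_factor):
    """C = N - I for a single paper with N citations in a journal with IF I."""
    return n_citations - impact_factor


# Hypothetical counts for one journal's census period (not real data).
cites = [0, 0, 0, 0, 1, 1, 2, 3, 5, 88]
jif = sum(cites) / len(cites)  # 10.0, the arithmetic mean

print(dead_paper_fraction(cites))       # 0.4
print(uiff(cites, jif))                 # 0.9
print(corrected_impact_factor(2, jif))  # -8.0
```

In this toy journal nine papers out of ten fall below the “average”, and a paper with two citations ends up with a corrected impact factor of -8.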

Other suggestions for citation metrics less stupid than the impact factor are welcome through the comments box…


How do physicists and astronomers team up to write research papers?

Posted in Science Politics on October 16, 2013 by telescoper

Busy busy today so just time to reblog this, an interesting article about the irresistible rise of the multi-author paper. What fraction of the “authors” actually play any role at all in writing these papers? Am I the only one that thinks this has very profound implications for the way we interpret bibliometric analyses?


The way in which physicists and astronomers team up to write technical papers has changed over the years, and not only is it interesting to look at this behavior for its own sake, but by analyzing the data it may be possible to better understand what role, if any, the number of authors plays in the scientific impact of a paper. Likewise, such an analysis can allow physics and astronomy journals to make decisions about their publishing policies.

I was curious about the trends in the number of authors per refereed astronomy paper, so I set out to write an R script that would read in data from the NASA Astrophysics Data System, an online database of both refereed and non-refereed academic papers in astronomy and physics. The script counts the monthly number of refereed astronomy and physics papers between January 1967 and September 2013, as well as…

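The kind of monthly tally described in the reblogged post can be sketched as follows. This is not the original author’s R script, just a hypothetical Python equivalent: it assumes the records have already been fetched (for example from the ADS) as (month, number of authors) pairs, and the records shown are invented.

```python
from collections import defaultdict
from statistics import mean, median


def monthly_author_stats(records):
    """records: iterable of ("YYYY-MM", n_authors) pairs, one per paper.
    Returns {month: (paper_count, mean_authors, median_authors)}."""
    by_month = defaultdict(list)
    for month, n_authors in records:
        by_month[month].append(n_authors)
    return {
        month: (len(counts), mean(counts), median(counts))
        for month, counts in sorted(by_month.items())
    }


records = [
    ("1967-01", 1), ("1967-01", 2),
    ("2013-09", 1), ("2013-09", 5), ("2013-09", 300),
]
for month, (n, avg, med) in monthly_author_stats(records).items():
    print(month, n, avg, med)
```

Reporting the median alongside the mean matters here: a single 300-author consortium paper drags the mean for its month far above what a typical paper looks like.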

The Impact X-Factor

Posted in Bad Statistics, Open Access on August 14, 2012 by telescoper

Just time for a quick (yet still rather tardy) post to direct your attention to an excellent polemical piece by Stephen Curry pointing out the pointlessness of Journal Impact Factors. For those of you in blissful ignorance about the statistical aberration that is the JIF, it’s basically a measure of the average number of citations attracted by a paper published in a given journal. The idea is that if you publish a paper in a journal with a large JIF then it’s in among a number of papers that are highly cited and therefore presumably high quality. Using a form of Proof by Association, your paper must therefore be excellent too, hanging around with tall people being a tried-and-tested way of becoming tall.

I won’t repeat all Stephen Curry’s arguments as to why this is bollocks – read the piece for yourself – but one of the most important is that the distribution of citations per paper is extremely skewed, so the average is dragged upwards by a few papers with huge numbers of citations. As a consequence most papers published in a journal with a large JIF attract many fewer citations than the average. Moreover, modern bibliometric databases make it quite easy to extract citation information for individual papers, which is what is relevant if you’re trying to judge the quality or impact of a particular piece of work, so why bother with the JIF at all?

I will however copy the summary, which is to the point:

So consider all that we know of impact factors and think on this: if you use impact factors you are statistically illiterate.

  • If you include journal impact factors in the list of publications in your cv, you are statistically illiterate.
  • If you are judging grant or promotion applications and find yourself scanning the applicant’s publications, checking off the impact factors, you are statistically illiterate.
  • If you publish a journal that trumpets its impact factor in adverts or emails, you are statistically illiterate. (If you trumpet that impact factor to three decimal places, there is little hope for you.)
  • If you see someone else using impact factors and make no attempt at correction, you connive at statistical illiteracy.

Statistical illiteracy is by no means as rare among scientists as we’d like to think, but at least I can say that I pay no attention whatsoever to Journal Impact Factors. In fact I don’t think many people in astronomy or astrophysics use them at all. I’d be interested to hear from anyone who does.

I’d like to add a little coda to Stephen Curry’s argument. I’d say that if you publish a paper in a journal with a large JIF (e.g. Nature) but the paper turns out to attract very few citations then the paper should be penalised in a bibliometric analysis, rather like the handicap system used in horse racing or golf. If, despite the press hype and other tedious trumpetings associated with the publication of a Nature paper, the work still attracts negligible interest then it must really be a stinker and should be rated as such by grant panels, etc. Likewise if you publish a paper in a less impactful journal which nevertheless becomes a citation hit then it should be given extra kudos because it has gained recognition by quality alone.

Of course citation numbers don’t necessarily mean quality. Many excellent papers are slow burners from a bibliometric point of view. However, if a journal markets itself as being a vehicle for papers that are intended to attract large citation counts and a paper published there flops then I think it should attract a black mark. Hoist it on its own petard, as it were.

So I suggest each paper be awarded an Impact X-Factor, based on the difference between its citation count and the JIF of the journal it appears in. For most papers this will of course be negative, which would serve their authors right for mentioning the Impact Factor in the first place.

PS. I chose the name “X-factor” as in the TV show precisely for its negative connotations.

The H-index is Redundant…

Posted in Bad Statistics, Science Politics on January 28, 2012 by telescoper

An interesting paper appeared on the arXiv last week by astrophysicist Henk Spruit on the subject of bibliometric indicators, and specifically the Hirsch index (or H-index) which has been the subject of a number of previous blog posts on here. The author’s surname is pronounced “sprout”, by the way.

The H-index is defined to be the largest number H such that the author has written at least H papers with at least H citations each. It can easily be calculated by looking up all papers by a given author on a database such as NASA/ADS, sorting them by (decreasing) number of citations, and working down the list to the point where the number of citations of a paper falls below its position in the list; H is the last position before that point. Normalized quantities – obtained by dividing the number of citations each paper receives by the number of authors of that paper – can be used to form an alternative measure.
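The sorting procedure described above translates directly into code. A minimal Python sketch, assuming each paper’s citation count and author count have already been exported (for example from NASA/ADS) as plain numbers; the publication record below is hypothetical:

```python
def h_index(citations):
    """Largest H such that H papers have at least H citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites < position:
            break
        h = position
    return h


def normalized_citations(papers):
    """Divide each paper's citations by its number of authors."""
    return [cites / n_authors for cites, n_authors in papers]


# Hypothetical record: (citations, number of authors) for each paper.
papers = [(100, 100), (40, 2), (12, 1), (9, 3), (5, 1), (2, 1)]
print(h_index([c for c, _ in papers]))        # 5
print(h_index(normalized_citations(papers)))  # 3
```

The 100-author paper with 100 citations counts as a top paper in the raw index but contributes just 1.0 after normalization, which is the credit-sharing issue raised further down.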

Here is the abstract of the paper:

Here are a couple of graphs which back up the claim of a near-perfect correlation between H-index and total citations:

The figure shows both total citations (right) and normalized citations (left); the latter, in my view, a much more sensible measure of individual contributions. The basic problem of course is that people don’t get citations, papers do. Apportioning appropriate credit for a multi-author paper is therefore extremely difficult. Does each author of a 100-author paper that gets 100 citations really deserve the same credit as a single author of a paper that also gets 100 citations? Clearly not, yet that’s what happens if you count total citations.

The correlation between H index and the square root of total citation numbers has been remarked upon before, but it is good to see it confirmed for the particular field of astrophysics.

Although I’m a bit unclear as to how the “sample” was selected I think this paper is a valuable contribution to the discussion, and I hope it helps counter the growing, and in my opinion already excessive, reliance on the H-index by grants panels and the like. Trying to condense all the available information about an applicant into a single number is clearly a futile task, and this paper shows that using both the H-index and total citation numbers doesn’t add anything, as they are both measuring essentially the same thing.

A very interesting question emerges from this, however, which is why the relationship between total citation numbers and h-index has the form it does: the latter is always roughly half of the square root of the former. This suggests to me that there might be some sort of scaling law onto which the distribution of cites-per-paper can be mapped for any individual. It would be interesting to construct a mathematical model of citation behaviour that could reproduce this apparently universal property….
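The empirical relation can be written compactly (N_c for an author’s total citation count is notation introduced here). By definition the top H papers each have at least H citations, so N_c can never be smaller than H²; the observed relation says the total is typically about four times that minimum:

```latex
N_{\mathrm{c}} \ge H^{2} \quad \text{(always)}, \qquad
H \approx \tfrac{1}{2}\sqrt{N_{\mathrm{c}}}
\;\Longleftrightarrow\;
N_{\mathrm{c}} \approx 4H^{2} \quad \text{(empirically)}.
```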