
The trouble with arXiv

Posted in Biographical, Open Access on October 17, 2024 by telescoper

We’re now publishing papers at a steady rate at the Open Journal of Astrophysics. This is probably not obvious to outsiders, but our platform actually consists of two different sites, one handling submissions and the other publishing the accepted papers. Although we have a large (and still expanding) team of volunteer Editors to deal with the former, as Managing Editor I am the only person with the keys to the publishing side of things. This part of the process has been simplified enormously since the automation introduced earlier this year, but it still takes some time, as I have to check the overlay and metadata before pressing the button to deposit everything with Crossref and make the overlay live. I also announce each paper on social media. All this usually takes around 15 minutes per paper, give or take.

Now that I’ve returned to full teaching duties at Maynooth University, I’ve developed a routine to deal with this activity. During workdays I usually wake around 7am, make some coffee, and then check the day’s arXiv mailing to see if any of our accepted papers have been announced. If any have, I do the honours while I have my coffee, and then proceed to shower and breakfast (including Coffee no. 2); if none have, I go straight to shower and breakfast. I’ve been following this routine for quite a while now.

In the last couple of weeks, however, I have quite often found when trying to look up newly-announced papers on arXiv that the connection times out with a message saying ‘rate exceeded’. When that happens I just wait a while and try again. It’s not a very serious issue but it does slow down the process.
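
For anyone who would rather automate the "wait a while and try again" approach, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `RateExceeded` exception and the `fetch` callable are hypothetical stand-ins for whatever actually retrieves the page, not part of any arXiv interface, and the doubling wait is just one polite choice among many.

```python
import time


class RateExceeded(Exception):
    """Hypothetical error standing in for a 'rate exceeded' response."""


def fetch_with_backoff(fetch, retries=4, first_wait=1.0, sleep=time.sleep):
    """Call fetch(); on RateExceeded, wait and retry, doubling the wait each time.

    `fetch` is any zero-argument callable (an assumption for this sketch).
    `sleep` can be swapped out so the waits can be inspected without pausing.
    """
    wait = first_wait
    for attempt in range(retries + 1):
        try:
            return fetch()
        except RateExceeded:
            if attempt == retries:
                raise  # give up once the allowed retries are used up
            sleep(wait)
            wait *= 2  # back off: 1s, 2s, 4s, ...
```

Waiting longer after each failure means a retrying client adds less load to a server that is already struggling, which is rather the opposite of what the scrapers are doing.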

Well, today I found out the reason via a message on Mastodon. The loading errors at arXiv are caused by people doing many simultaneous downloads in attempts to scrape all the content from arXiv as soon as it is announced. This is almost certainly to provide material for Large Language Models, such as ChatGPT, which are essentially Automated Plagiarism Engines. I propose the acronym APE for the kind of person who engages in this sort of activity.

This is a very tedious development and I hope arXiv can find a way of putting a stop to it without inconveniencing its authentic users. I suggest that the people managing arXiv identify the culprits and send the boys round.