
Google Scholar Risks and Alternatives

I wasn’t there. But it sounds like a ripper of a talk. “There seems to be a narrowing of our collective view of the literature,” according to Jevin West. He’s from the University of Washington, and he was speaking at the Metascience 2019 symposium earlier this month.

Why does he think we are seeing a narrower slice of the literature? Via Carl Bergstrom on Twitter: West and his colleague, Jason Portnoy, studied over half the article usage on JSTOR, and in recent years, Google Scholar is swamping every other way of arriving at an article. One result?

That does not necessarily mean that the cream is rising to the top. For that to be true, people would have to be putting in a lot of effort to make sure their citation practices were impeccable. And they really don’t. Citation can just mean, here’s a thing I found so I can plonk a citation in this sentence.

A case in point: across Twitter in another conversation, Paul Whaley pointed out a 2018 study by Andreas Stang and colleagues. Stang had written about the many problems of a particular quality-measuring scale. The title began “Critical evaluation of…” Despite the fact that he was challenging the validity of the scale, in 96 systematic reviews citing his article, all but 2 were using it as justification for using that scale. “It appears”, wrote Stang & co, “that the vast majority of systematic review authors who cited this commentary did not read it”.

But back to Google Scholar. Johan Ugander weighed in: “…of course Google Scholar (+ other tools) are altering citation patterns, but not necessarily only in bad ways”. Ex-Googler Helder Suzuki pointed to a 2014 paper he co-authored with Anurag Acharya (Google Scholar’s developer) and others [PDF]. They report,

[T]he fraction of citations to articles published in non-elite journals has grown substantially over most research areas… Now that finding and reading relevant articles in non-elite journals is about as easy as finding and reading articles in elite journals, researchers are increasingly building on and citing work published everywhere.

It’s great to crack open the old monopolies of our attention. They were never as good at finding the best as they were reckoned to be. But how good is Google Scholar at herding us to the most important papers?

Katie Corker pointed to Chapter 2 of Nick Fox’s dissertation. Fox raises interesting points: we’re not reading more, and if we just lean on varieties of social cues – anything that’s click- or citation-based – where is that going to lead science?

And if we all become reliant on Google Scholar, what happens if Google pulls the plug on it? West pointed us to “Killed by Google”, lest we be too complacent about this. Bergstrom argued this blind dependence is a failure of the scientific community. It doesn’t have to be dramatic either: Google Scholar killed off one of its few functionalities not that long ago – one I used almost every day, making it far less useful to me. PubMed does that too sometimes, so it’s not just a matter of being a private company. There’s something risky, though, about having all your eggs in one basket.

What can we do about it? West and colleagues built a search engine based on images from PMC (PubMed Central). Even without trying to take on the giant commitment of building a community-driven, wildly popular alternative to Google Scholar, there are many smaller-scale projects that could help improve our access to knowledge.

In my area of interest, Epistemonikos is a pretty spectacular example. Developed by Gabriel Rada and colleagues in Chile, it’s an indispensable, scientist-driven, searchable relational database of health evidence [PDF]. Here’s another, which also shows that a useful resource doesn’t even have to be technologically advanced, although that sure helps. This database of curated methodology papers – indispensable, too, since it’s so hard to search for papers about a methodology in the great ocean of papers using it – is simply released as a Zotero library. It’s not well known, though.

But it was this exchange that prompted this blog post:

I hadn’t looked at Microsoft Academic in ages, and it has changed a lot. I had no idea it had added downloads of results and citations. It’s got close to 200 million papers more than PubMed’s 30 million today, covering nearly 50,000 journals and more.

This is the second one mentioned: Dimensions. It’s got more than 100 million. But it looks like you need a subscription for its interesting features. Moving on.

The third is one I had never seen before, and it’s an eye-opener. Lens is open source, with APIs. Its core content is patents, but within 2 years they hope to be linking to “most of the scholarly literature”. So far they are tapping PubMed/PMC, Crossref, Microsoft Academic, and CORE. It’s at more than 200 million scholarly works.

Speaking of CORE, that’s now got over 130 million open access papers.

I hope I can lessen my reliance on Google Scholar by getting to know Lens and MS Academic better. Even if multiple databases have the same contents, big variations in the search engines mean you can end up with very different results.

There’s another aspect here, too: our search skills. A few years ago, I wrote a post about the impact Google has had on them. The little research I found then, and again the last time I tried to update the post, made for grim reading. We could be worse at finding information these days, because we put in too little effort and over-rely on the Google machine. Maybe the thing that’s most off-putting about adding more places to search is one of the strongest reasons for doing it: learning some new ropes and taking a bit more time.

~~~~

Disclosure: I was a senior scientist at the NIH’s NCBI working on PubMed-related projects from 2011 to 2018. (NCBI is part of the U.S. National Library of Medicine.)

On a related note, check out 8 PubMed Ninja Skills


#Metascience2019

The video of West’s talk will be going online in the next few months. In the meantime, you can check out at least some of the discussion of the talk here.

The cartoons are my own (CC BY-NC-ND license). (More cartoons at Statistically Funny and on Tumblr.)