Steal This Book?

A group of publishers is suing the Internet Archive, and a lot of people aren’t happy about it.

The Twitter commentariat lined up to share their views, with authors on one side and internet libertarians on the other, the latter equating the lawsuit to burning down the Great Library of Alexandria, yelling about rentiership, and insisting that all information, in any case, wants to be free.

The lawsuit started in 2020, but it has recently surfaced in the rather febrile social media discourse because, after two years of wrangling, both sides have now filed for summary judgment. As is often the case, there is something of a chasm between the discourse and the underlying reality. The filing from the publishers is here, and the one from the Internet Archive is here.

So what is it about? Who is involved, and what’s at stake, online and off?

A consortium of four large publishing houses has brought the action: Hachette Book Group, HarperCollins Publishers, John Wiley & Sons, and Penguin Random House. They’re the big boys of the publishing world, and you almost certainly know their names. Their legal action is supported by the Association of American Publishers and the Authors Guild.

You’ve probably also heard of the Internet Archive (IA). They run the Wayback Machine (this lawsuit isn’t about that, though the IA say that if they lose the lawsuit the financial viability of the IA itself and, consequently, of the Wayback Machine may be at risk). Their legal defense is being led by the Electronic Frontier Foundation (known as the EFF), who you may also have heard of. They’re sort of the Amnesty International of the internet, when it comes to things like privacy, free speech, and the legal protection of open source projects. 

The lawsuit is about the Open Library, an IA project founded in 2006, and specifically about its use of a process known as Controlled Digital Lending (CDL). The principle behind CDL is that a digital copy is made of a book in a library collection (generally by scanning). That copy is then lent out as a time-limited, DRM-protected digital download on a one-to-one basis in place of some or all of the print copies the library holds. If a library owns five print copies of a title, it should only be making use of five total copies at once. If five digital copies are out on loan then all five print copies should be unavailable to be consulted or borrowed.
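To make the one-to-one rule concrete, here is a minimal sketch in Python. It is only an illustration of the principle as described above, not a description of how the Open Library's systems actually work; the class name, field names, and methods are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CdlTitle:
    """Hypothetical model of one title lent under a one-to-one CDL rule."""

    owned_print_copies: int
    print_in_use: int = 0    # print copies currently borrowed or consulted
    digital_on_loan: int = 0  # time-limited digital copies currently checked out

    def copies_in_use(self) -> int:
        # Print and digital loans draw on the same pool of owned copies.
        return self.print_in_use + self.digital_on_loan

    def can_lend(self) -> bool:
        # A new loan is allowed only while an owned copy is still unused.
        return self.copies_in_use() < self.owned_print_copies

    def lend_digital(self) -> bool:
        if not self.can_lend():
            return False
        self.digital_on_loan += 1
        return True

    def lend_print(self) -> bool:
        if not self.can_lend():
            return False
        self.print_in_use += 1
        return True

    def return_digital(self) -> None:
        # Called when a loan period expires or the patron returns the file.
        if self.digital_on_loan > 0:
            self.digital_on_loan -= 1


# Assumed example: a library that owns five print copies of a title.
title = CdlTitle(owned_print_copies=5)
granted = [title.lend_digital() for _ in range(6)]
print(granted)             # five loans granted, the sixth refused
print(title.lend_print())  # refused: every owned copy is already in use
```

The point of the sketch is that print and digital loans share a single counter: once five digital copies are out, the sixth request fails and the print copies are off limits too, until one of the loans is returned.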

Library lending in the US rests on a legal principle called the First Sale Doctrine. This establishes the idea that once you have bought a copy of a copyright work (a book, for example), the copyright holder can’t impose any further restrictions on what you do with the physical object you now own. You can sell it to someone else, put it on display, cut it up to make Christmas decorations, set fire to it to show your disapproval of its contents, and none of these things is copyright infringement (though burning books may make you an idiot). Having bought a book, a library can loan it out without any additional liabilities to or restraint by the copyright holder. 

The First Sale Doctrine does not entitle you to reproduce the copyright contents, sell copies of them, or create derivative works based on them. For example, you can’t buy a book, translate it and publish your translation, or record it as an audiobook and stick it on Audible, even for free – both cases amount to creating an infringing derivative work. The IA’s summary judgment filing claims that their digitization and archiving, instead, creates a transformative work, and that transformative work is inherently protected as fair use.

The argument underpinning CDL is that lending digital books “as print” in this way (even if the process of scanning the contents to create a digital copy does create an unlicensed derivative work) should be considered fair use because, in effect, it’s a “no harm, no foul” action. Superficially at least, this seems a compelling argument. CDL has been quite widely adopted by public and institutional libraries, frequently through digitization partnerships with the Open Library (OL); in fact its popularization, and quite possibly even its invention, can be attributed to the OL.

This case is the first legal test of the principle. A white paper and position statement on CDL, published in 2018, and co-authored by the Internet Archive’s Policy Counsel Lila Bailey, concluded that while there are legal gray areas, it’s likely that CDL would be considered fair use, particularly where it is used to allow access, for example, to books that are copyright expired, out of print, or so-called “orphan works” whose legal copyright owner cannot be identified. In these cases, the fact the copyright holder isn’t actively attempting to exploit the work themselves would make it very hard to claim that someone lending out a digital version of dubious legality on a limited basis was causing them any financial loss. 

The white paper notes such lending would most likely not be legal if lending was generating income (via rental charges or advertising, for example), if ongoing dissemination of the digital copies wasn’t robustly prevented by use of appropriate DRM, or if digital copies were being loaned of books not owned (or not owned in sufficient number) as print copies by the lending library. It concedes that the fair-use, public benefit arguments for loaning recently published, in-copyright works of fiction in this way may well be weaker than for non-fiction and reference titles.

The plaintiffs (we’ll call them the publishers, to avoid this article reading like a Law & Order script) complain that the Open Library’s digitization and lending operations amount to copyright infringement on an industrial scale. If we accept the definition of CDL above, that sounds like a rather hyperbolic claim. The Open Library’s defenders meanwhile make an equally hyperbolic counter-claim that the lawsuit “aims to criminalize library lending” itself.

So why has the lawsuit been brought now, and what is it about the Open Library specifically that has led to the publishers taking this step?

Digging through the court filings, and through the Internet Archive’s own blog, turns up several aspects of the Open Library operation which might reasonably raise questions about the way it has been operating. Some are inherent to CDL, some were part of the OL “ecosystem” before 2020, and some have to do specifically with the OL’s actions in connection with its National Emergency Library during the first COVID-19 lockdowns.

It’s worth noting the publishers don’t like CDL, full stop (it allows libraries to avoid paying separately to license official eBooks for digital loan), and would ideally like it struck down. But they also argue that even if the court decides that CDL in principle is fair use, the version implemented by the OL isn’t.

There are obviously ways in which CDL is inherently not “like” loaning print copies, even if it’s done with scrupulous care. A single digital copy of a popular book can be passed almost instantly from one borrower to the next, with no downtime, which means a CDL library can achieve the same number of loans from a smaller number of owned copies than a print library can. Print library books suffer significant wear and tear, and popular titles have to be replaced regularly with fresh copies as they wear out or become unacceptably annotated; whereas those used as collateral for CDL loans could sit perfectly undamaged in the stacks, indefinitely. (And according to the CDL FAQ they can even be destroyed altogether, though not resold or donated.) Libraries often purchase more expensive hardcover editions of books; whereas books held as CDL collateral might as well be paperbacks. All of these factors mean that a library lending on a CDL basis is probably buying fewer, cheaper copies of books from the publisher than one lending the same books in print. 

Do those things represent sufficient harm to justify declaring CDL itself to be unlawful, and not protected by fair use? We should find out reasonably soon, and that decision clearly has implications beyond the Open Library and the Internet Archive itself. 

Is it really true, as the EFF asserts, that the Open Library’s activity is “fundamentally the same as traditional library lending, and poses no new harm to authors or the publishing industry”? 

You might imagine that if the OL were offering, for example, one hundred copies of Harry Potter and the Goblet of Fire, it must have one hundred crisp clean copies all stored tidily in a warehouse somewhere. But it’s likely you would be wrong. The first clue is in how the EFF describes the activities of the Open Library, saying that it “only permits patrons to check out as many copies as the Archive and its partner libraries physically own” (emphasis mine). The IA’s own website is not especially forthcoming on the subject of who these “partner libraries” are, but it appears that the OL is using as collateral not just books in its own collection but also those on the shelves and in the stacks of partner organizations. How many of those hundred notional books are we talking about? How certain can the OL be that those books are not also available physically at any given time? How reasonable is it for the OL to lend a digital copy using a different organization’s holdings as collateral?

But at least the books IA does own have been bought in the same way a library would buy a print copy for lending? That seems not to be the case either. You only need to have looked at a handful of previews of books on the Open Library website to stumble upon one that has clearly been digitized from a withdrawn library book (library stamps don’t lie). In 2019, the IA trumpeted a commercial link-up with Better World Books. There’s a load of buzzword-heavy guff about “newfound synergies,” and the details of the commercial arrangement are murky, but reading between the lines: Better World Books will be supplying secondhand books, many of them at the end of their library lifespan, to Open Library for digitization, and, it’s fair to assume, as a free or very low-cost source of literal van-loads of “collateral” copies.

Does it make a legal difference if the print copy being held as collateral is not in lendable condition? What about a copy acquired in secondhand condition and then shredded (since after all, destruction of the physical copy is permitted)? It seems possible the OL could be lending titles it doesn’t own, having had a single copy pass temporarily through its hands for digitization, based on copies it believes partner libraries hold, and on copies whose ownership transferred briefly to the OL from Better World Books on their way to landfill.

In 2020 the Open Library launched what it called the National Emergency Library, in response to the first COVID-19 lockdowns. Claiming that the unprecedented circumstances justified lifting their lending controls, they uncoupled from the fixed one-to-one owned-to-loaned ratio inherent to CDL, lending unlimited copies of any given title at once. The publishers argue there was no legal justification for this action and that it simply represents large-scale copyright infringement. Other than some hand-waving that all of the books locked up in closed public libraries should somehow be counted against their digital lending, and an appeal to the exceptional circumstances of the pandemic amounting to fair use, it’s not clear that IA has any defense to this complaint. In response to this lawsuit being filed, the National Emergency Library ceased operating earlier than IA had intended.

While the Open Library presents itself as a non-profit-David lining up against the mighty big-publishing-Goliath, the digitization services it provided to libraries between 2011 and 2020 yielded over $30 million in revenue. Founder Brewster Kahle is also reported to have earned hundreds of millions of dollars personally from licensing the scanning technologies developed by the Open Library digitization project to businesses such as Amazon. 

The claim by the publishers against the Internet Archive is for damages totalling $19 million. How much of that the IA may have to pay, if found at fault, will depend to a great extent on whether they can convince the court they were acting in good faith and believed their actions were legal at the time. The Open Library homepage is currently soliciting donations from visitors in relation to the lawsuit, claiming “the right for libraries to lend books is being threatened.”