Apples and Oranges

Wednesday, April 8, 2015 - 19:30

In discussions of the "Right to be Forgotten" it is often observed that Google deals each month with tens of millions of delisting requests for breach of copyright, compared with tens of thousands for inaccurate personal data. Often the implication seems to be that those numbers should be more similar. However, it seems to me that the two types of request need to be handled in significantly different ways and that they probably require, on average, significantly different amounts of manual effort per request. If the processes ought to be different, then we need to be careful when comparing them, lest we (or the search engines implementing them) draw the wrong conclusion.

The main differences concern the source of requests and the content to which they apply.

It seems likely that most requests to "forget" will come from individuals and that, unless they are particularly unfortunate, most individuals will only have one or a few pages to complain about. That means Google may well have to check the requester's identity and entitlement for nearly every "forget" request it receives. By contrast, copyright delisting requests generally arrive in large numbers from a small number of rights holders and their representatives. That can allow a much more efficient identification process, for example by exchanging digital signatures so the sender's identity can be verified automatically in future.
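To illustrate the kind of automatic verification a bulk sender might agree with a search engine, here is a minimal Python sketch. It uses a shared-key message authentication code as a simplified stand-in for the digital signatures mentioned above (a real deployment would use public-key signatures); the sender names and keys are entirely hypothetical.

```python
import hmac
import hashlib

# Hypothetical shared secrets agreed in advance with known bulk senders.
# A real system would hold public keys and verify proper digital signatures.
sender_keys = {"rights-holder-1": b"secret-key-agreed-in-advance"}

def sign_request(sender: str, request: bytes) -> str:
    """Compute the authentication tag a registered sender attaches to a request."""
    return hmac.new(sender_keys[sender], request, hashlib.sha256).hexdigest()

def verify_request(sender: str, request: bytes, tag: str) -> bool:
    """Automatically check that a delisting request came from a known sender."""
    key = sender_keys.get(sender)
    if key is None:
        return False  # unknown sender: fall back to manual identity checks
    expected = hmac.new(key, request, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

tag = sign_request("rights-holder-1", b"delist: http://example.com/page")
print(verify_request("rights-holder-1", b"delist: http://example.com/page", tag))  # True
```

Once such keys are exchanged, each subsequent request from that sender can be accepted without any per-request identity check, which is exactly the economy of scale unavailable for one-off "forget" requests from individuals.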

Automation is also a possibility for copyright delisting as most requests will apply to the second, tenth or hundredth identical copy of the same digital file. Once one copy of the file has been assessed as probably infringing, requests relating to further identical copies can be recognised immediately using hash values. It seems likely that anyone trying to implement an efficient takedown process would conclude that all identical copies should be treated in the same way. With "forget" requests, by contrast, it seems unlikely that identical pages will reappear so, again, every request will need to be assessed manually.
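The hash-based recognition described above can be sketched in a few lines of Python. This is an illustrative sketch only, not any search engine's actual process; the function names and the in-memory register are hypothetical.

```python
import hashlib

# Hypothetical register of file contents already assessed by a human
# reviewer as probably infringing, keyed by SHA-256 digest.
assessed_infringing = set()

def file_hash(data: bytes) -> str:
    """Return the SHA-256 digest of the file contents."""
    return hashlib.sha256(data).hexdigest()

def record_assessment(data: bytes) -> None:
    """Remember a file that manual review judged probably infringing."""
    assessed_infringing.add(file_hash(data))

def already_assessed(data: bytes) -> bool:
    """True if an identical copy of this file has been assessed before."""
    return file_hash(data) in assessed_infringing

record_assessment(b"contents of a file judged infringing")
print(already_assessed(b"contents of a file judged infringing"))  # identical copy: True
print(already_assessed(b"some different file"))                   # needs manual review: False
```

Because the hash depends only on the file's bytes, the second, tenth or hundredth identical copy is recognised immediately; a "forget" request, which concerns meaning rather than bytes, has no equivalent shortcut.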

There are also significant differences in the laws that apply to the two types of request, which ought to make a difference to a search engine that tries to implement them accurately.

The European Court's definition of the "right to be forgotten" under Data Protection law explicitly requires judgments and balancing tests in every case: is the material inaccurate, irrelevant, no longer relevant or excessive? does the public interest in finding the material outweigh the individual's right to object to processing? For material written in human language, it's hard to conceive of a computer being able to apply those rules. Copyright law involves different tests: is the material subject to copyright in a relevant country? is the publication covered by fair use or other exemptions (again, with national variations)? Here there may be some possibility for computers to help, particularly when multiple requests are received for the same material.

If there's any value in comparing and contrasting the two kinds of request, I think it needs to be done at this kind of detailed level. Raw numbers of requests don't say much about what is (or ought to be) going on.