One of Jisc’s activities is to monitor and, where possible, influence regulatory developments that affect us and our customer universities, colleges and schools as operators of large computer networks. Since Janet and its customer networks are classified by Ofcom as private networks, postings here are likely to concentrate on the regulation of those networks. Postings here are, to the best of our knowledge, accurate on the date they are made, but may well become out of date or unreliable at unpredictable times thereafter. Before taking action that may have legal consequences, you should talk to your own lawyers. NEW: To help navigate the many posts on the General Data Protection Regulation, I've classified them as most relevant to developing a GDPR compliance process, GDPR's effect on specific topics, or how the GDPR is being developed. Or you can just use my free GDPR project plan.


Life-long identifiers in Research and Education

Tuesday, November 19, 2013 - 09:07

There are several situations in which it would be useful to have a life-long identifier that doesn't change when we move house, employer or even country. Most of us already have life-long identifiers that link together all our interactions with the health service and the tax office; in research and education, linking together our achievements would also be useful when preparing a CV or research proposal. However, these applications have very different consequences if the link between individual and identifier fails, and they also need to resist different types of threat. When using a life-long identifier it's important to know the types of problem it was designed to address, and to be particularly careful when moving beyond those. Indeed, in the UK both tax and National Health Service identifiers are restricted by law to the purposes for which they were originally designed.

In the offline world the authority responsible for each identifier still tends to send out pieces of paper to inform us of our tax or NHS number; we can then show those pieces of paper to service providers if required. Online it's more usual to log in to the authority's database and either obtain service from them, or have them vouch for our identifier value to a third-party service provider. Rather than remembering where we put the vital piece of paper, we need to remember the password to log in to the authority.

Processes for Identifiers

In any system of identifiers, enrolment (when a new identifier is created and allocated to a particular individual) is a critical process that establishes what reliance subsequent users can place on the identifier. Life-long identifiers are also likely to need to be transferred or linked between authorities, since few of us have a life-long relationship with a single authority: without leaving the UK I've attended six different educational organisations under three different national education authorities! The transfer or linking process, as I and my identifier move from one authority to another, needs to preserve the level of confidence established by the original enrolment.

Designing an enrolment process involves two main questions: do you need to know who someone is in the real world? And does it matter if one person has more than one identifier? For some applications it may be sufficient to know that a series of on-line actions were performed by the same person, and acceptable that they may have performed other actions under a separate identifier. If so, it may be possible to do enrolment entirely on-line. For health and taxation that’s not good enough so their enrolment processes must include real world checks of "identity" and "same person". For research and education, it probably depends on the application.
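The two design questions above can be captured as a small decision helper. This is a minimal sketch, not any authority's actual process: `IdentifierRequirements` and `enrolment_checks` are hypothetical names, and the only logic encoded is the paragraph's own, namely that needing real-world identity calls for an "identity" check, caring about duplicates calls for a "same person" check, and if neither applies on-line enrolment may be enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifierRequirements:
    """Hypothetical summary of what an application needs from enrolment."""
    needs_real_world_identity: bool   # must we know who the person is?
    one_identifier_per_person: bool   # does it matter if someone has two?


def enrolment_checks(req: IdentifierRequirements) -> list[str]:
    """Return the real-world checks enrolment needs; empty means on-line may suffice."""
    checks = []
    if req.needs_real_world_identity:
        checks.append("identity")
    if req.one_identifier_per_person:
        checks.append("same person")
    return checks


# Linking a series of on-line actions to one (possibly pseudonymous) person.
print(enrolment_checks(IdentifierRequirements(False, False)))
# A health- or tax-style identifier.
print(enrolment_checks(IdentifierRequirements(True, True)))
```

For research and education the two flags would be set per application, which is why the answer "probably depends on the application".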

Processes for transfer of an identifier can create risks even if the owner's identity doesn't matter. I might well be interested in boosting my own publication record by claiming to be a prolific researcher who has moved institution. Weak on-line transfer processes have been used to take over e-mail and Twitter accounts, and even domain names, by forging a transfer request that appeared to come from the legitimate owner. Note that taking over an identifier can result in harm to the rightful owner, to the organisation issuing the identifier, or to others who may rely on it. Probably the best way to transfer an identifier is to have the owner log in simultaneously to their old and new accounts and authorise the transfer or link between them, but this has to be done in what may be a narrow time window when both accounts exist.
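The "log in to both accounts at once" approach can be sketched as follows. This is an illustrative sketch under stated assumptions: `Session`, `authorise_link` and the ten-minute `MAX_SESSION_AGE` are all hypothetical, and a real system would also need to confirm both accounts still exist and protect the sessions themselves. The point it shows is that a forged request controlling only one side cannot pass.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Session:
    """Hypothetical record of a successful login to one account."""
    account: str
    authority: str
    authenticated_at: datetime


# Assumed policy: both logins must be this recent to count as "simultaneous".
MAX_SESSION_AGE = timedelta(minutes=10)


def authorise_link(old: Session, new: Session, now: datetime) -> bool:
    """Allow a transfer or link only if the owner has just logged in to BOTH accounts."""
    if (old.authority, old.account) == (new.authority, new.account):
        return False  # the same account twice proves nothing
    # Each login must be fresh; a stale one does not show current control.
    return all(now - s.authenticated_at <= MAX_SESSION_AGE for s in (old, new))


now = datetime(2013, 11, 19, 9, 7)
old = Session("alice", "college-a", now - timedelta(minutes=2))
new = Session("alice", "university-b", now - timedelta(minutes=1))
print(authorise_link(old, new, now))  # both accounts controlled right now
```

The same check could serve for merging duplicate identifiers within one authority, since it only compares the pair of accounts, not which authorities issued them.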

Even with a single authority, and even if a person is allowed to have multiple identifiers, secure processes will still be needed during the lifetime of the identifier. If multiple identifiers are accidentally created for the same individual they may want to link or merge them; if the individual loses or forgets their password there needs to be a password reset process to re-establish the link between them and their account. Each of these processes needs to address the same risks as transfer or linking, and provide an equivalent level of protection, in order to maintain confidence that the identifier is still controlled by its intended owner.

In addition, of course, each authority needs to ensure that the systems issuing and using the identifier have adequate technical and organisational security to resist technical or social engineering attacks on the authority and its users.

What's the risk?

The risk to any life-long identifier depends very much on what it is used for and what incentives that creates for someone to try to misuse it. Incentives to misuse may not be limited to the academic community: criminals have found ways to make money from processes for grants and loans, and by reselling fraudulently obtained services. The following scenarios suggest how some possible uses of life-long identifiers in education might be misused.

  • Using a life-long identifier to index publications doesn't appear to create much incentive to misuse, though there might be advantages for my status if I can claim a Nobel Prize winner as joint author on one of my papers! However if the quantity or academic standing of 'my' papers is used as a basis for awarding grants, contracts or employment then there could be a financial motive to 'borrow' the status of others.
  • Using a life-long identifier to monitor or report usage of a resource can create various incentives. If there is a free trial period for each new identifier then there is an obvious incentive to create multiple identifiers; conversely if there is a bulk discount then I do better to share my identifier with others.
  • Charging against any identifier creates an incentive to quote another person or organisation's identifier value so they get the bill.
  • As in the criminal examples above, using an identifier to authorise access to valuable processes or data creates a correspondingly large incentive to attack the identifier and the authorisation process. Competitors and campaigning organisations have been willing to expend a lot of time and effort, not always lawfully, to get access to raw research data. If that can be done by guessing the password for my identifier or forging a message to transfer control of my account then it seems likely that they will try.

Each of these examples involves a different type of misuse that the processes around the identifier must protect against. Those processes must provide a consistent strength of protection, as a misuser will simply exploit the weakest link. Identifying the right strength involves a balance: too lightweight and the system will not provide sufficient assurance; too heavyweight and it will either not be used or will encourage workarounds, such as password-sharing, that undermine the intended assurance. Making this choice effectively determines what types and intensity of attack the system will protect against, and therefore what applications the system is, and is not, suitable for.

Once the cost of attacking a system has been designed in by the choice of processes and technologies, it is very hard to increase it without starting again from scratch. New applications must therefore be careful not to raise the benefits of attacking the identifier system to near or above that cost. Once the potential gains justify the cost of running a phishing campaign, forging a transfer request, cracking passwords or hacking servers, then someone will do it.
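The weakest-link and cost-versus-gain reasoning of the last two paragraphs can be put into a few lines. This is a toy model, not a risk-assessment method: the process names, the relative cost numbers and the 0.5 `margin` (standing in for "near or above that cost") are all illustrative assumptions.

```python
# Hypothetical relative "cost to attack" for each process around one identifier.
# The numbers are illustrative only.
PROCESS_ATTACK_COST = {
    "enrolment": 5.0,
    "login": 4.0,
    "transfer": 3.0,
    "link_or_merge": 3.0,
    "password_reset": 2.0,
}


def system_attack_cost(costs: dict[str, float]) -> float:
    """An attacker takes the cheapest route, so the system is only as strong as its weakest process."""
    return min(costs.values())


def weakest_processes(costs: dict[str, float]) -> list[str]:
    """Names of the process(es) an attacker would target first."""
    lowest = system_attack_cost(costs)
    return sorted(name for name, cost in costs.items() if cost == lowest)


def application_is_suitable(gain: float, costs: dict[str, float],
                            margin: float = 0.5) -> bool:
    """A new use is reasonable only if an attacker's gain stays well below the attack cost."""
    return gain < margin * system_attack_cost(costs)


print(weakest_processes(PROCESS_ATTACK_COST))
print(application_is_suitable(0.5, PROCESS_ATTACK_COST))  # low-value use
print(application_is_suitable(3.0, PROCESS_ATTACK_COST))  # valuable data behind the identifier
```

Strengthening only the password reset process (say, to 4.0) just moves the attacker on to transfer or linking at 3.0, which is why the protection has to be consistent across all the processes.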

ORCID

With these thoughts in mind I've been looking at the best known life-long identifier in the research world: ORCID. According to its website, the main purpose of ORCID is to avoid confusion between people with the same or similar names:

As researchers and scholars, you face the ongoing challenge of distinguishing your research activities from those of others with similar names. You need to be able to easily and uniquely attach your identity to research objects such as datasets, equipment, articles, media stories, citations, experiments, patents, and notebooks.

The ORCID home page says you should be able to create a life-long identifier in 30 seconds (mine took a bit longer because I was studying the excellent privacy statement!). It's clear that this is designed as a lightweight process, suitable for widespread adoption. In fact the basic ORCID process contains no assurance of an individual's real-world identity: since the purpose is to distinguish people whose real-world identities may cause confusion, that's actually quite logical! What ORCID does provide assurance of is that a series of claims made by an ORCID identifier were made by the same person. And, provided users choose good passwords and use them safely, that assurance should be pretty good.

With the basic ORCID system, all claims about an identifier (the name of its owner, the claim that the owner was the author of that paper, etc.) are self-asserted by the owner with no external check. Perhaps surprisingly, for indexing and even low levels of charging, that may well be sufficient. So long as you can send an invoice to the same ORCID identifier that ran up the bill, it may not matter who the person actually is. Indexing of publications may even be self-correcting: the purpose of ORCID is to reduce confusion, so it would be paradoxical for someone to register an ORCID and then try to use it to create confusion. Furthermore, claims to authorship are public and ORCID has a challenge process to dispute claims that the academic community thinks are untrue.

ORCID are introducing a process for third-party verification of claims so, for example, my employer could publicly confirm the claim that my ORCID does belong to the Andrew Cormack who works for Janet, and who was joint author of RFC3067. That could be useful for the original name de-confliction purpose ("Ah, that Andrew") and perhaps to give service providers confidence that there is a third party who may be able to compel me to pay my bills. But the process for registering a verifier needs to be different from the one for registering a basic ORCID identifier, otherwise I could simply create another ID, claim it belonged to "Janet" and then use it to verify my claim to work for "them". Processes relating to verifiers do need confirmation of identity that isn't just self-asserted, and need to be strong enough to preserve that confirmation through transfers, links, mergers and password resets.
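The rule that verifiers need more than self-assertion, and must keep it through later processes, might look like this. This is a hypothetical model and not ORCID's actual API or data model: `Assurance`, `Account`, `verify_claim` and `reset_password` are invented for illustration. It blocks the "second identifier claiming to be Janet" trick, and it shows a reset done through a weaker process downgrading a verifier.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Assurance(IntEnum):
    """Hypothetical levels of confidence in who controls an account."""
    SELF_ASSERTED = 1        # quick self-service registration
    IDENTITY_CONFIRMED = 2   # identity confirmed outside the self-service route


@dataclass
class Account:
    identifier: str
    assurance: Assurance
    claims: set[str] = field(default_factory=set)


def verify_claim(subject: Account, claim: str, verifier: Account) -> bool:
    """Accept a third-party verification only from a verifier whose identity was confirmed."""
    if verifier.assurance < Assurance.IDENTITY_CONFIRMED:
        return False  # a self-registered "employer" could be a sock puppet
    if verifier.identifier == subject.identifier:
        return False  # nobody verifies their own claims
    return claim in subject.claims  # only verify what the subject actually claimed


def reset_password(account: Account, process_assurance: Assurance) -> None:
    """A reset can never leave an account more trusted than the process used for it."""
    account.assurance = min(account.assurance, process_assurance)


me = Account("researcher-1", Assurance.SELF_ASSERTED, {"works for Janet"})
puppet = Account("fake-janet", Assurance.SELF_ASSERTED)
employer = Account("janet", Assurance.IDENTITY_CONFIRMED)
print(verify_claim(me, "works for Janet", puppet))    # rejected
print(verify_claim(me, "works for Janet", employer))  # accepted
```

Taking the minimum in `reset_password` is one way to express "preserve that confirmation": a weak reset does not erase the account, it just stops it acting as a verifier until it is re-confirmed.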

So as far as I can see, ORCID seems well designed for the problems it’s intended to solve. It's quick and easy to use, and can provide the level of assurance needed to distinguish scholars and (given an appropriate verifier process) to verify their claims to authorship. But the idea of using ORCID as the gatekeeper to permit or deny access to valuable data or resources worries me. That application greatly increases the incentive to attack the ORCID processes and technologies. Probably those processes and technologies could be strengthened – you could build a system around face-to-face identity vetting and two-factor authentication – but that would sacrifice the ease of use that is critical to ORCID's main purpose.