One of Jisc’s activities is to monitor and, where possible, influence regulatory developments that affect us and our customer universities, colleges and schools as operators of large computer networks. Since Janet and its customer networks are classified by Ofcom as private networks, postings here are likely to concentrate on the regulation of those networks. Postings here are, to the best of our knowledge, accurate on the date they are made, but may well become out of date or unreliable at unpredictable times thereafter. Before taking action that may have legal consequences, you should talk to your own lawyers.

Free Text and Data Protection

Friday, March 2, 2018 - 09:49

Collections of free text, whether in database fields, documents or email archives, present a challenge both for operations and under data protection law. They may contain personal data, but that data is hard to find, whether you're trying to use it, to ensure compliance with the data protection principles, or to allow data subjects to exercise their legal rights. Some level of risk is unavoidable in these collections, but there are ways to reduce it.

  • Provide structured fields wherever possible. If you know that a helpdesk ticket will contain the requester's name and e-mail address, ensure those fields exist in the database. This makes the information much easier to find for operational purposes, and makes it easier to apply appropriate deletion/anonymisation policies.
  • Set policies for using those structured fields, for when and how personal data may be entered into unstructured fields, and which personal data (e.g. sensitive) should never be entered there. Some of the data may be entered by people not under your control (e.g. if someone describes their health problems in a website comment field), but at least those who are under your control should know how to do the right thing. Knowing the source of unstructured data and the ways in which it is collected should also help in the subsequent assessment of how great a risk it represents.
  • Set appropriate retention periods for both structured and unstructured fields. With structured fields it should be relatively easy to define when personal data are no longer needed for the purpose and the content of a single field can be deleted or over-written. For unstructured fields this is harder, since both the utility and risk of long retention are unknown. Deciding on an appropriate period to retain unstructured information is likely to involve balancing the benefit and the risk, taking account of the uncertainty of both.
  • For high-risk situations or activities, it may be worth considering using either humans or computers to scan unstructured data for personal content. The choice involves a trade-off: humans are likely to be more accurate but also more expensive. Conversely, a computer is likely to be cheaper but may spot a name and redact it without realising that it was critical to the meaning and purpose of the record (though in that case it should, perhaps, have been in a structured field anyway).
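
As an illustration only, the measures above might be sketched in Python along the following lines. Everything named here is hypothetical: the Ticket record, the two retention periods and the two patterns are assumptions made for the example, not recommendations. In particular, the patterns are deliberately crude, so they will miss some personal data (names, for instance) and may flag text that is not personal at all.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical retention periods: the real values are a policy decision
# that balances benefit against risk.
STRUCTURED_RETENTION = timedelta(days=365)
FREE_TEXT_RETENTION = timedelta(days=180)

# Deliberately simple patterns (assumptions for this sketch). They catch
# only e-mail addresses and phone-like runs of digits, not names.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")


@dataclass
class Ticket:
    """A hypothetical helpdesk ticket: structured fields plus free text."""
    requester_name: Optional[str]
    requester_email: Optional[str]
    description: Optional[str]  # unstructured free text
    closed_on: date


def redact_free_text(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[email removed]", text)
    return PHONE_RE.sub("[phone removed]", text)


def apply_retention(ticket: Ticket, today: date) -> Ticket:
    """Clear fields whose (hypothetical) retention period has expired."""
    age = today - ticket.closed_on
    if age > STRUCTURED_RETENTION:
        # Structured fields are easy to locate, so easy to clear.
        ticket.requester_name = None
        ticket.requester_email = None
    if age > FREE_TEXT_RETENTION:
        # Free text may hide personal data, so here it expires sooner.
        ticket.description = None
    return ticket


sample = "Contact jane.doe@example.ac.uk or call 020 7946 0000 please"
print(redact_free_text(sample))
# prints: Contact [email removed] or call [phone removed] please
```

Giving the free-text field a shorter retention period than the structured fields is just one possible outcome of the balancing exercise. An organisation that relies heavily on ticket descriptions might reasonably choose the opposite and redact rather than delete.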

Databases and other collections should also be secured using technical means, of course. Where appropriate to the purpose, access controls can ensure that only authorised users can see the content, while encrypting that content when it is at rest and in transit can protect against those with physical access to the storage or the network.

Finally, the organisation should assess the remaining risk (eliminating it entirely is very unlikely to be possible) and ensure that this is justified by the benefits of storing and processing the data. The General Data Protection Regulation's requirement to demonstrate accountability for processing of personal data probably means this assessment (and particularly the reasons why possible risk-reduction options were not taken) should be documented, at least for large collections of information.