Myki incident – lessons for organisations
By Annan Boag, Assistant Commissioner, Privacy and Assurance
OVIC has published an investigation report about the disclosure of myki travel information by Public Transport Victoria (PTV), now part of the Department of Transport.
In this blog post, I summarise some lessons from this investigation for organisations that hold data about people and their activities – particularly those organisations that want to release or use their data holdings in a de-identified form. Please refer to the full investigation report for more information.
- De-identifying large and complex datasets is difficult, and in some cases may be impossible
PTV released information about myki trips based on an understanding that the information was de-identified. However, analysis completed by data scientists and academics revealed that people’s travel movements could be identified from the dataset.
This highlights how difficult effective de-identification can be. For longitudinal unit-level data about people’s behaviours, such as the myki dataset, true de-identification may be impossible. Given this, organisations should consider that this sort of material may not be suitable for open release.
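To illustrate why longitudinal unit-level data is so hard to de-identify, here is a minimal sketch in Python. It uses a made-up toy dataset, not the myki data, and the function name `cards_matching` and all values are hypothetical. It is not the method used by the researchers who analysed the release. It only shows the general idea that a small number of known trips can single out one card, even when card identifiers are random tokens.

```python
from collections import defaultdict

# Hypothetical toy data: (card_token, stop, hour) touch-on events.
# The card tokens are random and carry no name or card number.
trips = [
    ("a1", "Flinders St", 8), ("a1", "Richmond", 18),
    ("b2", "Flinders St", 8), ("b2", "Footscray", 17),
    ("c3", "Southern Cross", 9), ("c3", "Richmond", 18),
]

def cards_matching(trips, known_events):
    """Return the card tokens whose history contains every known (stop, hour) event."""
    history = defaultdict(set)
    for card, stop, hour in trips:
        history[card].add((stop, hour))
    # A card matches if the known events are a subset of its recorded events.
    return {card for card, events in history.items() if known_events <= events}

# Someone who knows one trip a person made narrows the field to two cards.
one_point = cards_matching(trips, {("Flinders St", 8)})

# Knowing a second trip by the same person leaves a single card.
two_points = cards_matching(trips, {("Flinders St", 8), ("Richmond", 18)})

print(sorted(one_point))
print(sorted(two_points))
```

Once a single card is isolated in this way, every other trip recorded against that token is revealed. Removing names and card numbers does nothing to prevent this, because the pattern of travel itself acts as the identifier.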
For more information about the limits of de-identification see Protecting unit-record level personal information.
- Organisations should not rely solely on de-identification to protect data
PTV’s confidence in its de-identification approach meant that it took no other steps to protect the myki data that it gave to the Melbourne Datathon. The data was given without requiring any agreement from the recipients about how the information might be used, or whether it could be on-disclosed.
This incident demonstrates why it is dangerous to rely on de-identification alone when sharing and releasing data. Organisations should consider referring to frameworks such as the ‘Five Safes Framework’ to identify a range of appropriate protections for shared data.
Protections might include contractual terms that prohibit re-identification or on-disclosure, or providing access only in a secure environment or data lab. Victoria has a scheme in place to allow for effective information sharing and analysis within government, in the form of the Victorian Centre for Data Insights (VCDI) and the Victorian Data Sharing Act 2017 (Vic).
For information about controls and safeguards that can be applied when sharing data see De-identification and privacy: Considerations for the Victorian Public Sector.
- When considering if ‘de-identified’ information is personal information, context is crucial
When deciding if data needs to be protected as ‘personal information’ under privacy law, it is crucial to consider the context in which the data will be held or disclosed, and how recipients might use it. Rather than taking a narrow view of a dataset in isolation, consider what might happen if the dataset were combined with other information, for example information posted on social media. If the data is released or held in a context in which it is harder to match with other information, it is less likely to be regarded as personal information.
If in doubt, it may be prudent to err on the side of caution and protect the de-identified information as you would protect personal information.
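One rough way to think about context is to ask how many records become unique once you assume a recipient knows certain things from other sources. The Python sketch below is illustrative only. The field names (`home_stop`, `card_type`, `usual_hour`), the records, and the function `unique_share` are all hypothetical and are not drawn from the myki dataset or from any legal test.

```python
from collections import Counter

# Hypothetical toy records, one per card, with no direct identifiers.
records = [
    {"home_stop": "Richmond", "card_type": "full fare", "usual_hour": 8},
    {"home_stop": "Richmond", "card_type": "concession", "usual_hour": 8},
    {"home_stop": "Richmond", "card_type": "full fare", "usual_hour": 9},
    {"home_stop": "Footscray", "card_type": "full fare", "usual_hour": 8},
]

def unique_share(records, known_fields):
    """Share of records that are unique on the fields a recipient is assumed to know."""
    keys = [tuple(r[f] for f in known_fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)

# The more a recipient can learn from other sources, the more records stand alone.
for fields in (["home_stop"],
               ["home_stop", "usual_hour"],
               ["home_stop", "card_type", "usual_hour"]):
    print(fields, unique_share(records, fields))
```

The same dataset looks quite different depending on what the recipient is assumed to know. A figure like this is only a rough indicator and does not replace a proper assessment of the release context.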
For more information about the definition of ‘personal information’ see the Guidelines to the Information Privacy Principles.
- Privacy impact assessments, if done incorrectly, can create a false sense of security
A privacy impact assessment (PIA) was completed before the data was released by PTV. However, the analysis contained in the PIA was inadequate and did not properly consider privacy risks. The PIA created a false sense of security that privacy had been adequately considered, when in fact it had not.
A privacy impact assessment is only as effective as the process, expertise, and analysis that sit behind it. Filling in a form or a template is not a guarantee that privacy has been protected. A check-box compliance approach to privacy is not enough.
For more information about identifying and addressing the privacy risks in a proposed project or initiative see OVIC’s PIA guidance.
- Clear lines of responsibility are needed for effective data governance
Confusion about who was responsible for protecting the dataset contributed to this incident. Two agencies involved in the data release both believed that the other was responsible for ensuring that the data recipient secured the data.
This highlights the importance of clear lines of responsibility as an element of data governance. Organisations involved in data sharing and release, especially when working with other agencies, need to have a clear shared understanding about who is responsible for considering and addressing privacy risks.
For more information about how to safely share personal information in accordance with the PDP Act see Information Sharing and Privacy.