A review of Canada’s legal framework for access to data for research purposes yields several key findings:
- There are ethical imperatives to protect confidentiality but also to provide access to quality data that enable research in the public interest
- Data custodians have a fundamental duty to protect confidentiality. This duty underpins their conduct and leads to cautious, conservative interpretations of allowable access when a complementary mandate for access is not made explicit
- Federal and provincial laws generally address identifiable information and do not constrain researchers from access to de-identified data
- However, there is imprecision and inconsistency in how identifiable information is defined
- Participant consent has been a key requirement for experimental research. Access to information, however, does not require the same ethical and legal considerations. The risks associated with access to data need to be weighed against the research benefits, including healthcare outcomes for the general public.
For the purposes of this series, we’ll focus on the use of de-identified data, as this represents the “low-hanging fruit” that can be used to great effect, both within the research community and commercially, within the existing legal framework. No discussion of the use of de-identified data can begin without first addressing the risk of re-identifying that data.
Risk of Re-Identification
In Ontario, the Personal Health Information Protection Act (PHIPA) specifies that de-identified data is not subject to the limitations and restrictions that PHIPA imposes. However, custodians have a duty to minimize risk, and this starts with minimizing the amount of personal information collected. In fact, PHIPA mandates that personal information should not be collected if it is not required. Custodians are incentivized to reduce the risk of privacy leaks as much as possible. Because they do not sufficiently trust the de-identification process, the result is often a limiting principle: share no data at all. This mistrust stems from concerns that re-identification can be achieved through the use of auxiliary data.
However, studies have shown that de-identification under the HIPAA Safe Harbor standard reduces the possibility of re-identification by at least 2,500 times (Ann Cavoukian), leaving less than a 0.05% chance of re-identification. Removing quasi-identifying information (e.g., gender, postal code, occupation, unusual diagnosis) can reduce the chance further. However, the process of de-identification can sometimes reduce the quality of the data to a degree that limits its usability for research purposes.
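To make the trade-off between privacy and data quality concrete, here is a minimal Python sketch of coarsening quasi-identifiers instead of deleting whole records. The field names (`postal_code`, `occupation`) and the choice to keep only the first three characters of the postal code are assumptions made for illustration; they are not prescribed by PHIPA or by the Safe Harbor standard.

```python
def generalize_record(record):
    """Return a copy of a record with quasi-identifiers coarsened.

    Hypothetical rules, chosen only for illustration:
    - keep the first three characters of the postal code
    - drop the occupation field entirely
    """
    out = dict(record)
    # Normalize spacing and case, then keep only the leading three characters.
    out["postal_code"] = record["postal_code"].replace(" ", "").upper()[:3]
    # Remove occupation; pop with a default so a missing field is not an error.
    out.pop("occupation", None)
    return out


example = {
    "gender": "F",
    "postal_code": "m5v 2t6",
    "occupation": "pilot",
    "diagnosis": "asthma",
}
print(generalize_record(example))
```

Each rule discards detail a researcher might have wanted, which is why the choice of what to coarsen should follow from what makes the data valuable.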
Data custodians need not take such a conservative approach, given the clear value of sharing data. Instead, the process of de-identification should take into consideration the broad spectrum of risk that exists with any dataset. At one extreme, a custodian controls data that is not de-identified and takes the most conservative approach. However, this can lead to misuse, theft, or sharing through improper means, because no sanctioned method of sharing the data exists. At the other end of the spectrum, a custodian may have completely de-identified the data but scrubbed it to the point that it loses its research value.
Custodians should go through a methodical and objective process of analyzing the risk of re-identification. They can start by understanding which data elements make the data valuable to its intended recipient. The custodian should then settle on an acceptable level of risk. No solution offers a 100% guarantee, so accepting some residual risk is reasonable. The risk threshold should reflect the value of the data to researchers.
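One way to make this analysis objective is to measure risk directly on the dataset. The sketch below uses a common simplification that the text above does not name: each record's risk is taken as 1 divided by the number of records sharing the same quasi-identifier values. The function names, field names, and threshold values are hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Per-record risk: 1 / size of the group with identical quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    return [1.0 / group_sizes[k] for k in keys]


def meets_threshold(records, quasi_identifiers, max_risk):
    """True if no record's risk exceeds the custodian's chosen threshold."""
    risks = reidentification_risk(records, quasi_identifiers)
    return all(risk <= max_risk for risk in risks)


# Illustrative records only; not real data.
records = [
    {"gender": "F", "postal_code": "M5V", "diagnosis": "asthma"},
    {"gender": "F", "postal_code": "M5V", "diagnosis": "diabetes"},
    {"gender": "M", "postal_code": "K1A", "diagnosis": "asthma"},
    {"gender": "M", "postal_code": "K1A", "diagnosis": "asthma"},
]
quasi = ["gender", "postal_code"]

print(reidentification_risk(records, quasi))
print(meets_threshold(records, quasi, max_risk=0.5))
print(meets_threshold(records, quasi, max_risk=0.2))
```

If the check fails, the custodian can coarsen the quasi-identifiers further and re-measure, stopping once the threshold is met so that no more research value is removed than necessary.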