De-identifying your data
Researchers commonly conclude that sensitive data can never be published or shared in any way. One option that's often overlooked is sharing a portion of the original data that has been de-identified.
The process of data de-identification or anonymisation involves the removal of all data that can be used to identify individual participants in a research project, thereby protecting their privacy.
Typically, de-identification involves removing direct identifiers such as names, addresses and phone numbers from the data. However, the data may also include indirect identifiers: data elements that don't identify an individual on their own, but can be combined with other information to identify a person. These should also be considered for removal, depending on the data collected. Indirect identifiers can include place of employment, occupation, postcode, ethnicity or age. These data elements aren't always problematic, but they can identify individuals when a particular combination of filters restricts the data to a very small population. For example, there may be only one Aboriginal woman aged between 20 and 25 living in Maldon, so anyone who applies those filters can obtain the other information the data holds about her.
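As a rough illustration, the two steps described above (dropping direct identifiers, then checking whether a combination of indirect identifiers narrows the data to a very small group) might be sketched in Python as follows. This is a minimal sketch, not a complete de-identification method: the column names, the toy records and the threshold of 2 are all made up for illustration, and a real data set would need its own list of identifiers and a threshold suited to its context.

```python
from collections import Counter

# Hypothetical column names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}
INDIRECT_IDENTIFIERS = ("ethnicity", "sex", "town", "age_band")


def drop_direct_identifiers(records):
    """Return copies of the records without the direct-identifier fields."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def small_groups(records, columns=INDIRECT_IDENTIFIERS, threshold=2):
    """Return each combination of indirect identifiers shared by fewer than `threshold` records."""
    counts = Counter(tuple(record[column] for column in columns) for record in records)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Made-up toy records; "response" stands in for the research data itself.
records = [
    {"name": "Person A", "address": "1 Example St", "phone": "0000 0001",
     "ethnicity": "Aboriginal", "sex": "F", "town": "Maldon", "age_band": "20-25", "response": 3},
    {"name": "Person B", "address": "2 Example St", "phone": "0000 0002",
     "ethnicity": "Non-Indigenous", "sex": "F", "town": "Maldon", "age_band": "20-25", "response": 5},
    {"name": "Person C", "address": "3 Example St", "phone": "0000 0003",
     "ethnicity": "Non-Indigenous", "sex": "F", "town": "Maldon", "age_band": "20-25", "response": 4},
]

deidentified = drop_direct_identifiers(records)
risky = small_groups(deidentified)

# Each entry in `risky` is a filter combination that isolates fewer than `threshold` people.
for combo, n in risky.items():
    print(combo, "matches", n, "record(s)")
```

Any combination reported by `small_groups` is a candidate for further treatment, such as removing one of the columns or widening its categories (for example, a broader age band or region) before the data is shared.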
Data de-identification can be both time-consuming and expensive, and stripping away identifiable data elements sometimes leaves the data set with almost no value. Because of this, de-identification is not always the perfect solution for sharing sensitive data; whether it works for you will depend on the data being collected and the purposes for which it will be used. It is, however, generally a better option than not publishing or sharing any of your data at all.