Data leaks in research can cause serious harm to research subjects, sponsors, and investigators. For example, leaking subjects' identity data can expose them to the risk of identity theft, embarrassment, and even physical and psychological harm. A leak can also damage the investigators' reputation for integrity and make it harder for them to conduct research in the future. If a data leak occurs, investigators must report the incident to the Research Ethics Committee and notify every individual whose data was leaked.
Some of the causes of data leaks are as follows.
For this reason, every tool or device used to store data, and every communication line used to transmit it, must be protected. Investigators can protect research data in the following ways.
Using Data De-identification
Research data frequently contain Personally Identifiable Information (PII): information, or a combination of information, that can be used to identify a particular individual (e.g., name or ID card number). It is dangerous if research data can be linked directly to a research subject's PII. For this reason, investigators must separate PII from the research data used for analysis. To do this, the investigator can assign each subject a randomly selected study ID that separates the individual's personal identity from the analysis data. Study IDs can be created with software such as Stata, R, or Microsoft Excel, provided that each ID is unique. When creating study IDs, avoid the following, as they can make the data re-identifiable:
After de-identification, the two datasets (the dataset containing PII and the dataset for analysis) must not be combined unless it is strictly necessary. The dataset containing PII must be kept in encrypted storage and protected from viruses.
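A minimal sketch of this split, assuming the data are held as a list of Python dictionaries and that `name` and `id_number` are the PII fields (both are hypothetical column names chosen for illustration). Each subject receives a study ID drawn at random with Python's standard `secrets` module, so the ID carries no information about the subject or the order of enrolment. The PII table becomes the key file that must be stored encrypted; the analysis table keeps only the study ID.

```python
import secrets

# Hypothetical PII column names; replace them with the fields your survey actually collects.
PII_FIELDS = {"name", "id_number"}


def make_study_ids(n, digits=6):
    """Draw n unique random study IDs of the form 'S' followed by `digits` digits."""
    if n > 10 ** digits:
        raise ValueError("not enough possible IDs; increase digits")
    seen, ids = set(), []
    while len(ids) < n:
        candidate = f"S{secrets.randbelow(10 ** digits):0{digits}d}"
        if candidate not in seen:  # redraw on collision so every ID stays unique
            seen.add(candidate)
            ids.append(candidate)
    return ids


def deidentify(records):
    """Split records into a PII key table and an analysis table linked only by study_id."""
    study_ids = make_study_ids(len(records))
    pii_table, analysis_table = [], []
    for study_id, record in zip(study_ids, records):
        pii_table.append({"study_id": study_id,
                          **{k: v for k, v in record.items() if k in PII_FIELDS}})
        analysis_table.append({"study_id": study_id,
                               **{k: v for k, v in record.items() if k not in PII_FIELDS}})
    # Shuffle the analysis rows so their order no longer mirrors the original (identified) list.
    secrets.SystemRandom().shuffle(analysis_table)
    return pii_table, analysis_table
```

Because the two tables share only the `study_id` column, they can be re-linked when this is strictly necessary and are otherwise stored and shared separately.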
In addition to electronic data, physical data such as survey forms containing subjects' identities must also be protected. Investigators should plan to separate PII from the other analysis data when designing surveys. One option is to record personal information only on a survey cover sheet. After the survey is completed and de-identified, the cover sheet is stored in a different place from the rest of the analysis data.
Data Storage Encryption
Encryption converts data into a code that can only be read with the correct password or key. Some operating systems include their own encryption software. Investigators may also consider third-party encryption software based on algorithms such as AES or Blowfish. Investigators are advised to encrypt data at the following levels:
KEP LPEM FEB UI specifically recommends that investigators encrypt, at a minimum, the folders that store research data, especially research data containing the personal identity information of research subjects.
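Folder-level encryption is usually handled by the operating system's built-in tools or by dedicated encryption software rather than by code. For individual files, the following is a minimal sketch using the third-party Python `cryptography` package (an assumption; install it with `pip install cryptography`). Its Fernet recipe encrypts with AES-128 in CBC mode and authenticates with HMAC-SHA256; here the key is derived from a password with PBKDF2. The function names and the iteration count are illustrative assumptions, not a KEP requirement.

```python
import base64
import os

# Third-party dependency (assumption): pip install cryptography
from cryptography.fernet import Fernet, InvalidToken
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

SALT_SIZE = 16        # bytes; a fresh random salt is generated for every encryption
ITERATIONS = 600_000  # assumed PBKDF2 work factor; raise it if your hardware allows


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a Fernet key (32 bytes, url-safe base64) from a password with PBKDF2-HMAC-SHA256."""
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=ITERATIONS)
    return base64.urlsafe_b64encode(kdf.derive(password.encode("utf-8")))


def encrypt_bytes(data: bytes, password: str) -> bytes:
    """Encrypt data; the salt (not secret) is stored in front of the Fernet token."""
    salt = os.urandom(SALT_SIZE)
    return salt + Fernet(derive_key(password, salt)).encrypt(data)


def decrypt_bytes(blob: bytes, password: str) -> bytes:
    """Decrypt output of encrypt_bytes; fails if the password is wrong or the data was altered."""
    salt, token = blob[:SALT_SIZE], blob[SALT_SIZE:]
    try:
        return Fernet(derive_key(password, salt)).decrypt(token)
    except InvalidToken:
        raise ValueError("wrong password or corrupted file") from None


def encrypt_file(src: str, dst: str, password: str) -> None:
    """Write an encrypted copy of src to dst; the original file is left untouched."""
    with open(src, "rb") as f:
        blob = encrypt_bytes(f.read(), password)
    with open(dst, "wb") as f:
        f.write(blob)
```

Note that `encrypt_file` leaves the unencrypted original in place; once the encrypted copy has been checked, the original still has to be deleted securely.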
Data storage recommendations can be briefly summarized in the following table.
|Type of data|Recommended storage|
|---|---|
|Raw data or data with PII|Encrypted folder on cloud storage or a data server dedicated to the research project|
|Data that has gone through the de-identification process|Regular folder, but the device must still be password-protected|
|Physical data (paper questionnaires)| |
Protect Data Transfer
Data that is encrypted in storage is not necessarily protected when it is shared. To reduce the risk of leaks during transmission, investigators can take the following steps:
Besides electronic data, physical records such as survey files must also be transferred with care, for example by carrying them in a locked case and transporting them in a private vehicle.
When several people work with the same data, it is very helpful for the research team to draw up a protocol for sending and using data among team members. This ensures that every member understands the steps for securing research data and preventing leaks.
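One step such a team protocol might include is confirming that a file arrived unaltered. A minimal sketch using Python's standard `hashlib`, with hypothetical function names: the sender computes a SHA-256 checksum and shares it through a separate channel, and the recipient recomputes it on the received file. A matching checksum shows the file was not corrupted or swapped in transit; it does not keep the contents confidential, so the file should still be encrypted before it is sent.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hex string, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(path, expected_checksum):
    """Recompute the checksum of the received file and compare it with the sender's value."""
    return sha256_of_file(path) == expected_checksum.strip().lower()
```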
Using Passwords
Even when a file is protected with a password, the password may still be cracked. The following steps can reduce this risk:
However, making sure that investigators can reliably remember their passwords is just as important as making the passwords hard to crack. Some encryption software has no password-recovery feature, so a forgotten password can leave the research data permanently inaccessible. To prevent this, investigators are advised to use a password manager application such as LastPass. Password managers like this can generate complex random passwords and store them in the investigator's encrypted account.
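A password manager generates such passwords automatically; the sketch below only illustrates what a "complex random password" means. It uses Python's standard `secrets` module, which is intended for security-sensitive randomness (unlike `random`), and follows the recipe in the `secrets` documentation of redrawing until the password contains a lowercase letter, an uppercase letter, a digit, and a symbol. The default length of 20 and the symbol set are assumptions.

```python
import secrets
import string

# Assumed symbol set; some systems reject particular symbols, so adjust as needed.
SYMBOLS = "!@#$%^&*()-_=+[]{}"


def generate_password(length=20):
    """Generate a random password containing lowercase, uppercase, digit and symbol characters."""
    if length < 4:
        raise ValueError("length must be at least 4 to include every character class")
    alphabet = string.ascii_letters + string.digits + SYMBOLS
    while True:
        password = "".join(secrets.choice(alphabet) for _ in range(length))
        # Redraw until every character class is present.
        if (any(c.islower() for c in password)
                and any(c.isupper() for c in password)
                and any(c.isdigit() for c in password)
                and any(c in SYMBOLS for c in password)):
            return password
```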
Avoid Data Loss
To protect data against loss, investigators should keep backups in a separate location. Backups can be stored in the cloud or on the research institution's servers. File backup software such as SyncBack can also be used.
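A minimal sketch of a manual backup using only Python's standard library, assuming `backup_root` points to a separate location such as a mounted institutional server or a synced cloud folder (the paths and function names are hypothetical). Each run copies the whole folder into a new timestamped directory, so an earlier backup is never overwritten, and then checks that every file was copied byte for byte. Backups of data containing PII need the same encrypted storage as the originals.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(source, backup_root):
    """Copy `source` into a new timestamped folder under `backup_root` and return its path."""
    source = Path(source)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree raises FileExistsError if dest already exists,
    # so an existing backup is never silently overwritten.
    shutil.copytree(source, dest)
    return dest


def verify_backup(source, dest):
    """Return True if every file under `source` exists in `dest` with identical contents.

    Reads whole files into memory, which is fine for typical survey datasets;
    for very large files, compare checksums instead.
    """
    source, dest = Path(source), Path(dest)
    for original in source.rglob("*"):
        if original.is_file():
            copy = dest / original.relative_to(source)
            if not copy.is_file() or copy.read_bytes() != original.read_bytes():
                return False
    return True
```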
In addition, investigators should periodically use an antivirus program to avoid data corruption.
To ensure that deleted data is truly gone, including from the recycle bin, investigators are advised to use third-party data-erasure software (e.g., Eraser, WipeDrive). Physical data must also be destroyed once it is no longer needed. A cross-cut shredder is preferable to a strip-cut shredder because its output is much harder to reconstruct. The research team should also decide when it is appropriate to delete the research subjects' PII.
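The basic idea behind erasure tools can be sketched in a few lines of Python: overwrite a file's contents with random bytes before deleting it. This is an illustration only, with a hypothetical function name. On SSDs, on journaling or copy-on-write file systems, and in folders that are backed up or synced to the cloud, old copies of the data can survive an in-place overwrite, which is why dedicated tools such as those named above, or encrypting the storage from the start, remain the safer choice.

```python
import os


def overwrite_and_delete(path: str, passes: int = 1, chunk_size: int = 1024 * 1024) -> None:
    """Overwrite a file's contents with random bytes, flush them to disk, then delete it.

    Illustration only: SSDs, journaling/copy-on-write file systems, backups and
    cloud sync can keep other copies of the data that this function cannot reach.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(remaining, chunk_size)
                f.write(os.urandom(n))  # replace the original bytes with random data
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the overwrite to the storage device
    os.remove(path)
```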
Reporting the Data Security Protocol to KEP
When applying for ethics approval, the research team must provide information on its data security protocol. At a minimum, this information should include the following:
Download this guide: Data Security Protocol Guide