Course Introduction
This training is required to receive de-identified, individual-level data from the Hawai‘i Data eXchange Partnership (DXP). The modules were developed to improve data users’ awareness and understanding of what is expected of them when handling DXP data with regard to:
- Data Use
- Data Privacy
- Data Security
The Importance of DXP Data Governance and User Training
A strong data governance program is the foundation of the DXP’s successful and continued operations. Data governance is the means by which the DXP is managed, including the development and enforcement of policies and processes that guide decisions around how data is handled and used. It is this careful management that allows DXP partner agencies to feel confident in sharing their data.
This training is an important aspect of DXP data governance. It is critical that anyone who handles data from the DXP does so responsibly and respectfully to protect individuals’ right to privacy and confidentiality. This training ensures that all requestors of de-identified, individual-level data have the same foundational understanding of how to handle DXP data.
To receive a certificate of completion, you must pass all three quizzes in the same sitting. If you navigate away from this page in the middle of the session, your past quiz results may disappear.
Module 1: Data Use
The Hawai‘i Data eXchange Partnership (DXP) is a partnership of five state agencies committed to cross-agency data sharing to improve education and workforce outcomes in the state. The DXP is managed by Hawai‘i P-20 Partnerships for Education (Hawai‘i P-20).
The DXP contributes to, and collectively governs, Hawai‘i’s Statewide Longitudinal Data System (SLDS) which links cross-agency information on citizens of Hawai‘i from infancy, early learning, K-12, postsecondary education, and the workforce.
The records connected in the SLDS help the state to better understand how well populations transition across the partner agencies. Through analysis of the data by Hawai‘i P-20 and/or researchers, trends can be found that highlight early indicators and/or later outcomes of populations of individuals who progress through education and into Hawai‘i’s workforce. This provides valuable information that supports data-informed decisions to reduce achievement gaps and create policies and programs that improve equitable outcomes for Hawai‘i residents.
While data from agencies are shared for the SLDS, the ownership of the data remains with the originating agency. The DXP’s governance is built on the premise that access to any shared data will be handled in accordance with the DXP Data Governance Policy. This policy guides the protection and use of the shared data. As a requestor of DXP data, you are expected to ensure that any data you receive is handled properly. This will ensure that the DXP can continue to support data requests that support the mission and vision of the DXP partners in order to benefit the residents of Hawai‘i.
The DXP Data Request Process plays an important role in ensuring transparency for partner agencies (the data owners) by allowing partners to maintain control over the use of their data. A data request must be submitted for any data from the DXP. Through this submission, you are agreeing that you will fulfill all requirements of the request process until the project has been completed.
Your responsibilities after submitting a data request include:
- Suppressing small cells in all data views
To protect individual privacy and confidentiality, data from the DXP may only be reported in aggregate format with small cell sizes suppressed. Complementary suppression must also be used to prevent viewers from working backwards to reverse engineer the redacted cells.
*More on suppressions to be covered in the next module*
- Submitting data products for data owner(s) approval
To keep data owners informed of what their data is being used for, you must submit your data products prior to sharing them with anyone other than the data users listed on the data request form. Data owners are not looking to censor findings, but must ensure cell suppressions are properly applied to protect individual confidentiality and privacy.
Data users are individuals who have submitted their training certification and confidentiality forms related to the approved data request to the DXP.
Data products are defined as tables, graphs, or other types of visualizations that describe the quantitative results of the data request and research questions.
- Destroying data after project completion
To prevent inadvertent exposure or access to datasets, you must destroy individual-level datasets once the project is completed. While datasets are considered to be de-identified, the data are still considered to be sensitive in terms of risk management. Signed certifications of data destruction are required of all individual-level data requests.
When possible, informing the DXP when products will be publicly disseminated is greatly appreciated. Stakeholders may see the publicly released findings and start asking questions of the data owners. Notification will help prepare data owners so they have context if they are contacted.
Depending on the situation, you may need to take additional actions during the data request process when:
- Adding new data users to the project
If more data users need to be added after the data request has been approved, the DXP must be notified at dxp@hawaii.edu (or specific DXP contact if known). Both the training certifications and signed confidentiality agreement for the new data user(s) must be provided.
The DXP will notify you if/when the new data user(s) may access the data. Prior to approval, data may not be shared with new data user(s) even if their forms have been submitted.
- Amending the data request
If changes to the project are needed, a project amendment form needs to be submitted to the DXP. Data owners must review and approve the changes before the amendment can take effect. Changes may include: 1) extending the project end date; 2) adding data elements; 3) adding additional audiences to share the data with.
You are expected to use data from the DXP responsibly and ethically, following all terms and conditions listed on the DXP Confidentiality Agreement. Even when datasets are released to you for analysis, the data still belongs to its original data owners. As such, data owners maintain the right to require data destruction at any time without cause, but especially if they feel their data is being mishandled.
Responsible handling and use of DXP data means:
- Maintaining the privacy and confidentiality of all records. This includes:
- Using data only as noted in the approved data request;
- Not attempting to re-identify individuals in the dataset;
- Password-protecting files as necessary;
- Following all data suppression requirements; and
- Not sharing data with anyone other than approved data users.
- Maintaining communication with the DXP. This includes:
- Submitting all data products for the DXP review and approval prior to sharing any results;
- Notifying the DXP of any data user changes (adding/removing staff);
- Responding to check ins and following any instructions received from the DXP;
- Clarifying any data questions with the DXP to ensure data is not misrepresented; and
- Submitting additional data request forms as needed to document changes to approved requests (e.g., project amendments, data destruction).
The DXP actively follows up on cases of unauthorized release of and/or access to data received from the DXP. The DXP Data Misuse Process outlines different types of unauthorized releases and the general process of investigation.
Categories of misuse:
- Inadvertent Misuse
Any incident where involuntary, unauthorized access/release of data has occurred. Examples of inadvertent misuse:
- A computer infected with malicious software; or
- A stolen computer.
- Intentional Misuse
Any incident where the requestors knowingly allowed access and/or released data beyond the scope of their approved data request. Intentional misuse examples:
- The dataset(s) was shared with individuals other than the data users listed on the request;
- The data was saved on a shared drive (e.g., Google Drive) where other individuals can have open access to it; or
- Requestor tried to re-identify individuals in the dataset or used the dataset for another project that was not approved through the data request process.
- Refusal to Comply
Any incident where the requestors refuse to comply with instructions from the DXP. Refusal to comply examples:
- Requestors refusing to return and/or destroy datasets; or
- Requestor does not respond to check in/follow up communications from the DXP.
Consequences of verified data misuse are determined by the DXP based on the severity of the incident. Consequences may involve, but are not limited to:
- Requiring requestors to take additional security training;
- Requiring requestors to return/destroy all data provided by the DXP; or
- Banning requestors (and possibly their entire organization) from receiving data from the DXP for no less than five years.
- True or False: Consequences of verified data misuse can include banning requestors (and possibly their entire organization) from receiving data from the DXP for no less than five years.
- What must you do to maintain the privacy of the DXP data?
- True or False: Once you receive data from the DXP, you are allowed to use the data for any project.
- Maintaining the communications of the DXP data must include:
- Data owners are:
- As part of the data request process, you must:
Module 2: Data Privacy
Datasets provided by the DXP are considered to be de-identified, but the data is still sensitive and only approved data users may see the individual-level data. In rare cases, combinations of de-identified data elements may make it possible to reconstruct the identity of an individual, especially with small datasets or datasets with many disaggregations. Since datasets are at an individual level, additional protections are needed to protect individuals’ privacy and confidentiality when releasing data to a wider audience. All data to be shared beyond data users must be in aggregate format, with small cell suppressions as needed.
Agency data are constantly being released through different analyses and reports. While direct identifiers (e.g., name, birth date) are generally not released, the more indirect identifiers (e.g., gender, race/ethnicity) viewers can put together, the more likely it is that they could reasonably identify students in the data. Using appropriate disclosure avoidance methods will help prevent such situations.
As a reminder, DXP data are to be used to highlight trends that impact groups or cohorts of students, not single individuals. While it may be interesting to continue to narrow down data results to the smallest disaggregation (e.g., SAT scores of male, Native Hawaiian, economically disadvantaged graduates from ‘Aiea High School in 2024), students’ right to privacy and confidentiality must be protected.
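To make the point about indirect identifiers concrete, the sketch below counts how many records share each combination of attributes. The records, the column order, and the `smallest_group` helper are hypothetical illustrations and are not part of any DXP tooling.

```python
from collections import Counter

# Hypothetical de-identified records (not real DXP data):
# (gender, race/ethnicity, economically disadvantaged, school)
records = [
    ("M", "Native Hawaiian", "Yes", "School A"),
    ("M", "Native Hawaiian", "Yes", "School A"),
    ("F", "Filipino", "No", "School A"),
    ("F", "Filipino", "No", "School B"),
    ("F", "Filipino", "No", "School B"),
    ("M", "Japanese", "No", "School B"),
]


def smallest_group(rows, columns):
    """Return the size of the smallest group formed by the chosen columns."""
    counts = Counter(tuple(row[i] for i in columns) for row in rows)
    return min(counts.values())


# Grouping by gender alone leaves every person in a group of three.
by_gender = smallest_group(records, [0])

# Combining all four indirect identifiers isolates single individuals.
by_all = smallest_group(records, [0, 1, 2, 3])

print(by_gender, by_all)
```

In this made-up data, each added attribute shrinks the smallest group until some combinations describe only one person, which is the situation disclosure avoidance methods are meant to prevent.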
The disclosure avoidance method most commonly used by Hawai‘i P-20 and users of DXP data is small cell suppression, or removing all data that are lower than the minimum acceptable limits.
Based on the small cell size rule, cells less than 5 should be suppressed as shown in the example below using economic disadvantage status (econ dis).
Unsuppressed Data
Economic Status | Students |
---|---|
Econ Dis Prior to High School | 3 |
Econ Dis in High School | 12 |
Never Econ Dis | 10 |
Total | 25 |

Data with Small Cell Suppression
Economic Status | Students |
---|---|
Econ Dis Prior to High School | * |
Econ Dis in High School | 12 |
Never Econ Dis | 10 |
Total | 25 |
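The small cell rule can be sketched in a few lines of Python. This is a minimal illustration that assumes the threshold of 5 stated above; the function name `suppress_small_cells` and the dictionary layout are hypothetical.

```python
MIN_CELL_SIZE = 5  # threshold from the rule above: counts below 5 are suppressed


def suppress_small_cells(counts, minimum=MIN_CELL_SIZE):
    """Replace every count below the minimum with '*'."""
    return {label: (n if n >= minimum else "*") for label, n in counts.items()}


# Counts from the economic disadvantage example.
counts = {
    "Econ Dis Prior to High School": 3,
    "Econ Dis in High School": 12,
    "Never Econ Dis": 10,
}

suppressed = suppress_small_cells(counts)
for label, value in suppressed.items():
    print(f"{label}: {value}")
```

Only the cell with 3 students is replaced. The total is left out of the dictionary on purpose, since the sketch deals only with the individual cells.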
However, depending on how your aggregations are set up, additional suppressions may be needed to prevent reverse engineering of the missing values. In the example above, even with the small cell (3) suppressed, the missing number is easy to work out by subtracting the visible cells from the total (25 - 12 - 10 = 3). Complementary suppression involves hiding additional data, even if it is not a small cell, to prevent reverse engineering of the suppressed values.
Data with Small Cell and Complementary Suppression | |
Economic Status | Students |
---|---|
Econ Dis Prior to High School | * |
Econ Dis in High School | 12 |
Never Econ Dis | * |
Total | 25 |
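The reasoning behind complementary suppression can be sketched as follows. The helper `suppress_with_complement` is hypothetical, and hiding the smallest remaining cell as the complement is an assumption made for illustration; the DXP review process determines which cells actually need to be hidden.

```python
MIN_CELL_SIZE = 5  # threshold from the small cell rule


def suppress_with_complement(counts, minimum=MIN_CELL_SIZE):
    """Suppress small cells, then hide one more cell if only one is hidden.

    With a published total, a single hidden cell equals the total minus the
    visible cells, so a second cell has to be hidden as well.
    """
    hidden = {label for label, n in counts.items() if n < minimum}
    if len(hidden) == 1 and len(counts) > 1:
        # Assumption: use the smallest remaining cell as the complement.
        visible = {k: v for k, v in counts.items() if k not in hidden}
        hidden.add(min(visible, key=visible.get))
    return {k: ("*" if k in hidden else v) for k, v in counts.items()}


counts = {
    "Econ Dis Prior to High School": 3,
    "Econ Dis in High School": 12,
    "Never Econ Dis": 10,
}
total = sum(counts.values())

# With only the small cell hidden, a viewer can recover it from the total.
recovered = total - counts["Econ Dis in High School"] - counts["Never Econ Dis"]

protected = suppress_with_complement(counts)
print(recovered, protected)
```

With two cells hidden, a viewer can only tell that the hidden cells sum to 13, not what each one is.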
The more data is disaggregated, the more complicated suppression becomes.
Unsuppressed Data
Economic Status | Total Students | Male | Female |
---|---|---|---|
Econ Dis Prior to High School | 3 | 3 | 0 |
Econ Dis in High School | 12 | 5 | 7 |
Never Econ Dis | 10 | 5 | 5 |
Total | 25 | 13 | 12 |
Data with Small Cell and Complementary Suppression
Economic Status | Total Students | Male | Female |
---|---|---|---|
Econ Dis Prior to High School | * | * | * |
Econ Dis in High School | 12 | 5 | 7 |
Never Econ Dis | * | * | * |
Total | 25 | 13 | 12 |
Econ Dis Prior to High School needed to be suppressed based on the small cell size and Never Econ Dis was removed as complementary suppression.
Some ways to show more data would be to remove totals, use percentages instead, and/or roll categories up to a higher level that meets the cell size minimum.
Data Rolled Up to Higher Categories to Avoid Small Cells
Economic Status | Total Students | Male | Female |
---|---|---|---|
Econ Dis at Anytime | 15 | 8 | 7 |
Never Econ Dis | 10 | 5 | 5 |
Total | 25 | 13 | 12 |
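Rolling categories up can be sketched as summing the detailed rows into a broader category. The example below uses the male and female counts from the unsuppressed gender table (3 + 5 = 8 male and 0 + 7 = 7 female for the combined category); the `roll_up` helper and the mapping are hypothetical.

```python
MIN_CELL_SIZE = 5  # threshold from the small cell rule


def roll_up(rows, mapping):
    """Sum (male, female) counts into the broader categories named in mapping."""
    merged = {}
    for label, (male, female) in rows.items():
        target = mapping.get(label, label)  # unmapped rows keep their label
        m, f = merged.get(target, (0, 0))
        merged[target] = (m + male, f + female)
    return merged


# (male, female) counts from the unsuppressed gender table.
rows = {
    "Econ Dis Prior to High School": (3, 0),
    "Econ Dis in High School": (5, 7),
    "Never Econ Dis": (5, 5),
}
mapping = {
    "Econ Dis Prior to High School": "Econ Dis at Anytime",
    "Econ Dis in High School": "Econ Dis at Anytime",
}

rolled = roll_up(rows, mapping)
smallest = min(min(pair) for pair in rolled.values())
print(rolled, smallest >= MIN_CELL_SIZE)
```

After the roll-up, every cell meets the minimum of 5, so no suppression is needed, at the cost of no longer distinguishing when students were economically disadvantaged.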
Small cell suppression can be tricky depending on what level of detail is shown. It will eventually reach the point where very little usable data can be shown. There is a balance between the level of detail that can be shown and the usefulness of the information.
The DXP will check to ensure all suppressions are applied as needed during the data review process. Any questions around data suppressions can be sent to the DXP.
While suppression is the most commonly used method, there are other disclosure avoidance methods, such as the methods described below by the U.S. Department of Education Privacy Technical Assistance Center.
Blurring is used to reduce the precision of the disclosed data to minimize the certainty of identification. Examples of blurring include rounding, aggregating across different populations or geographies, and reporting percentages and ranges instead of exact counts. This method may affect the utility of the data by reducing users’ ability to make inferences about small changes in the data. Similarly, blurring methods that rely on aggregation across geographies or subgroups may interfere with time-series or cross-sectional data analysis. Applying this technique generally ensures low risk of disclosure; however, if any unblurred cell counts or row and/or column totals are published (or are available elsewhere), it may be possible to calculate the values of sensitive cells.
Perturbation involves making small changes to the data to prevent identification of individuals from unique or rare population groups. Examples of this technique include swapping data among individual cells (this still preserves the marginal distributions, such as row totals) and introducing “noise,” or errors (e.g., by randomly reclassifying values of a categorical variable). This method helps to minimize the loss of data utility as compared to other methods (e.g., compared to the complete loss of information due to suppression); however, it also reduces the transparency and credibility of the data. Therefore, perturbation is often considered inappropriate for public reporting of program data, from an accountability perspective. Applying this technique generally ensures low risk of disclosure, as long as the rules used to alter the data (e.g., the swapping rate) are protected. This requires securing the information about the technique itself as well as restricting access to the original data, so that perturbation rules cannot be reverse-engineered.
Both descriptions above come directly from the U.S. Department of Education Privacy Technical Assistance Center: https://studentprivacy.ed.gov/sites/default/files/resource_document/file/FAQs_disclosure_avoidance_0.pdf
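As a rough illustration of blurring, the sketch below reports a count as a range and a share as a rounded percentage. The function names, the range width of 10, and the rounding step of 5 are assumptions chosen for the example; they are not DXP or PTAC rules.

```python
def blur_to_range(count, width=10):
    """Report a count as a range of the given width, e.g. 12 -> '10-19'."""
    low = (count // width) * width
    return f"{low}-{low + width - 1}"


def blur_to_percent(count, total, step=5):
    """Report a share of the total rounded to the nearest `step` percent."""
    exact = 100 * count / total
    return int(round(exact / step) * step)


# Counts from the economic disadvantage example (12 of 25 students).
count_range = blur_to_range(12)
percent = blur_to_percent(12, 25)
print(count_range, f"about {percent}%")
```

As the description above warns, blurred values can still be unblurred if exact totals or neighboring exact counts are published alongside them.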
- True or False: De-identified, individual-level data are not sensitive and do not need strong protections.
- The DXP suppression rule for educational data is < _ numerator and/or < _ denominator.
- Complementary suppression means:
- At minimum, how many cells need to be suppressed in the following example:
- True or False: Other methods of disclosure avoidance include reducing the exactness of the data by introducing uncertainty about the actual counts of individuals, reducing the possibility of identifying individuals.
Module 3: Data Security
It is your responsibility to protect all DXP data you receive to the best of your ability. Individuals should not request individual-level data from the DXP if they are unable to secure it. Below is a list of proactive measures to help secure personal devices and accounts and tips on following safe online behavior to protect yourself against online cyber threats, adapted from the University of Hawai‘i’s Information Security Awareness Training (ISAT).
Recent surveys have shown that at least 80% of breaches were caused by users not practicing cyber hygiene. Everyone must be aware of these threats and stay vigilant.
Cyber Hygiene Best Practices
- Use Anti-Malware/Antivirus Software and Host-Based Firewalls
- Install anti-malware/antivirus software and ensure the software is regularly updated.
- Most modern operating systems include built-in firewalls, which are commonly referred to as Host Based Firewalls. Host Based Firewalls run on your device and provide an additional layer of protection from network cyber attacks.
- Update Regularly
- Enable automatic updates.
- Software updates can be for operating systems, firmware, patches, and security fixes.
- Enable Multi-Factor Authentication (MFA)
- Enable MFA on all applications and websites when offered.
- MFA comes in many forms such as push, text, voice call, or hard token.
- MFA attacks are when you receive unsolicited authentication approval requests. If you are receiving these, you should immediately change your password.
- MFA fatigue is when you receive multiple authentication approval requests and assume each request is legitimate. Be vigilant and only approve requests you initiate.
- Create Strong Passwords
- Use uppercase letters, lowercase letters, numbers, special characters, and if allowed, spaces.
- Use passwords that are 8-32 characters long with MFA, or 14-32 characters long without MFA.
- Avoid using:
- Common phrases, famous quotes, and song lyrics.
- Personally identifiable information such as birthdates, mother’s maiden name, or where you were born.
- Do NOT share your passwords and do not write them down on paper or in digital form unless they are stored in a password manager.
- More on password managers: https://www.hawaii.edu/infosec/resources-tips/password-manager/
- Don’t reuse passwords.
- Change the default passwords that the manufacturer places on devices such as routers, printers, internet cameras, etc.
- Use Encryption
- Data stored on desktops, laptops, and removable storage media (USBs, external hard drives, and CD/DVDs) should be safeguarded with encryption.
- Sensitive data shall be encrypted when stored and transmitted.
- When sending files, consider using the UH File Drop service (https://www.hawaii.edu/filedrop/) or other secure file transfer protocol (SFTP).
- For more information on encryption please visit the following link: https://www.hawaii.edu/infosec/resources-tips/encryption/
- Avoid Unknown Storage Media or Devices
- Do not plug in any “lost” or unknown storage media or devices into your computer.
- Lock Your Devices
- Whenever you step away from your device, lock the device so that a password is needed to regain access.
- Configure your device to automatically lock when it is inactive.
- Limit The Use of Administrative Accounts
- Use a non-privileged user account for normal day-to-day activities such as the internet and email.
- When you need to perform actions like installing or removing software, log in with a privileged (Administrator) account, and then log out when done.
For other helpful hints on how to secure devices and user information, visit https://www.hawaii.edu/infosec/minimum-standards/cyber-hygiene/
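The password length and character guidance in the list above can be expressed as a simple check. This is only a sketch of those two rules: the function names are hypothetical, and the check does not detect common phrases, personal information, or reused passwords.

```python
def meets_length_guideline(password, uses_mfa):
    """Check length against the guidance above: 8-32 with MFA, 14-32 without."""
    minimum = 8 if uses_mfa else 14
    return minimum <= len(password) <= 32


def character_types(password):
    """Count how many of the five suggested character types appear."""
    checks = [
        any(c.isupper() for c in password),  # uppercase letters
        any(c.islower() for c in password),  # lowercase letters
        any(c.isdigit() for c in password),  # numbers
        any(c == " " for c in password),     # spaces, where allowed
        any(not c.isalnum() and c != " " for c in password),  # special characters
    ]
    return sum(checks)


# Example string for illustration only; never use a published password.
sample = "Tr1cky Mango!9"
print(meets_length_guideline(sample, uses_mfa=False), character_types(sample))
```

A password manager can generate and store passwords that satisfy these rules without the need to memorize them.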
Special Considerations for Laptops
Laptops give the owner the ability to work anywhere that has a Wi-Fi connection; however, public Wi-Fi should not be trusted as a cyber criminal could be eavesdropping on online activity. Laptops are also convenient targets for thieves because of their size and weight.
Laptop Security Best Practices
- Keep all software and operating systems up to date and patch when security fixes are released.
- If you are using a laptop to analyze DXP individual-level data, ensure that the device is full disk encrypted and the files are secured using container encryption.
- Information on Windows encryption: https://www.hawaii.edu/askus/927
- Information on macOS encryption: http://www.hawaii.edu/askus/676
- Do not leave your device unattended even for a brief period of time.
- Physically secure your laptop when it is not in use.
- Enable location tracking if available.
- Enable auto-lock on your device screen.
- Do not connect to the internet through public Wi-Fi.
More information on “Best Practices for Laptop Users” can be found here: https://www.hawaii.edu/askus/927
The cyber security landscape is constantly changing. Noted below are two common types of cyber threats and tips on avoiding them, courtesy of the University of Hawai‘i.
Phishing
Phishing is a method used by cyber criminals to acquire personal data, such as passwords, bank accounts, Social Security Numbers (SSNs), etc., by masquerading as a legitimate business, government entity, or reputable person. The usual method is to send an email where they are trying to have you click on a link or download an attachment (and open a malicious file). However, while phishing emails are common, phishing can come in other forms such as text messages and phone calls/voice mails.
Be wary of messages that ask you to log in and change your password. Any unsolicited email or phone call requesting you to do so should be treated as a potential phishing attack.
Tips to Protect Against Phishing Attempts:
- Never open suspicious or unknown links or attachments, or scan QR codes in emails.
- Be aware if an email is poorly worded with misspellings. This is a common indicator of phishing.
- If a known contact sends you a suspicious email or text message (spoofed) and you would like to verify authenticity, you should contact the person through official methods of communication, such as their office phone number.
Learn more about phishing at https://www.hawaii.edu/infosec/phishing/
Ransomware
Ransomware is a type of malicious software (malware) that aims to disrupt a computer system by encrypting all of the files on that system. This renders the system useless, and the files can only be recovered by paying the ransom that the cyber criminals demand to decrypt them. Once one system is infected, cyber criminals will also attempt to propagate the malware to other systems on the same network, causing further disruption.
Systems are commonly infected with ransomware by users unintentionally installing the ransomware by clicking on a link (in a phishing email) or by opening an attachment that contains malicious code. Due to this, it’s important to be aware of tips and best practices to follow to prevent ransomware from infecting your systems.
Tips to Prevent Ransomware:
- Ensure anti-virus / anti-malware software is up to date.
- Keep all software and operating systems up to date and patch when security fixes are released.
- Avoid clicking on unknown / suspicious links.
- Avoid downloading software from unofficial/untrusted sources.
- Never plug in removable media (USBs, external hard drives, etc.) from an unknown source.
For a more in-depth guide please refer to: https://www.cisa.gov/sites/default/files/publications/CISA_MS-ISAC_Ransomware%20Guide_S508C_.pdf
Transmission of Data
The University of Hawai‘i FileDrop site, a secure file exchange service, will be used to transmit individual-level data to you. Individual-level data must never be shared via email, even if de-identified.
Secure Deletion of Data
Per UH Information Technology Services: Using a computer’s standard “delete” function is not sufficient to permanently erase sensitive information. When you delete a file or folder using the “Recycle Bin” or “Trash”, your operating system simply flags the contents of the file/folder to be overwritten and re-used in the future. Until that space is overwritten, the contents of the file still exist on the media and can be recovered with disk/file recovery tools that are readily found on the Internet. With today’s large hard drives, the contents of these “deleted” files can remain, unchanged, on the media for a very long time.
All DXP data files, including individual-level datasets, unsuppressed aggregate data, etc., that have not been reviewed and approved by the DXP must be securely deleted upon the completion of the data request project.
Electronically Stored Data
Electronically stored data are files that have been saved to a computer hard drive, remote server, etc. Such data must be erased using secure deletion tools, which overwrite the file(s) to be deleted to make them unrecoverable.
For more information about available secure deletion tools: https://www.hawaii.edu/askus/706
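To illustrate what "overwriting" means, the sketch below replaces a file's contents with random bytes before removing it. It is an illustration of the idea only, not a substitute for the secure deletion tools referenced above: on SSDs, journaling or copy-on-write file systems, backups, and cloud-synced folders, earlier copies of the data may survive an in-place overwrite. The function name `overwrite_and_delete` is hypothetical.

```python
import os

CHUNK_SIZE = 1024 * 1024  # write random data in 1 MiB pieces


def overwrite_and_delete(path, passes=1):
    """Overwrite a file's bytes with random data, then remove the file.

    Illustration only: storage hardware and file systems may keep old
    copies that this function cannot reach.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                n = min(CHUNK_SIZE, remaining)
                handle.write(os.urandom(n))
                remaining -= n
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the bytes to disk
    os.remove(path)


# Usage (hypothetical path):
# overwrite_and_delete("dxp_extract.csv")
```

A plain delete skips the overwrite step entirely, which is why the original contents remain recoverable until the space happens to be reused.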
Hard Copy (Paper) Data
Paper documents and printouts containing individual-level or unsuppressed DXP data must be shredded before disposal, ideally using a crosscut shredder.
- True or False: About 80% of breaches were caused by users not practicing cyber hygiene.
- Which of the following are considered cyber hygiene best practice(s) which help protect your computer against unauthorized access?
- True or False: One strong and complex password is safe to reuse on multiple sites.
- True or False: Individual-level data from the DXP can be shared via email because it has been de-identified.
- After project completion, which of the following must be securely deleted to ensure no sensitive information remains on your system?