Data Issues
Deleted Data
Causes: Deletion, whether inadvertent or automated, can happen quickly and affect large amounts of data.
Best Practices: Implement retention and archiving policies along with regularly scheduled data "snapshots" for recovering lost files and folders.
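As a minimal sketch of the snapshot-and-retention idea, assuming the data live on an ordinary local or mounted file system, the Python below copies a folder into timestamped snapshot directories, deletes all but the newest few, and restores a lost file. The function names and the `snapshot-<timestamp>` folder layout are hypothetical. Managed storage platforms usually provide snapshots and lifecycle rules of their own, and those are generally preferable where available.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def take_snapshot(source: Path, snapshot_root: Path) -> Path:
    """Copy the whole `source` tree into a new, timestamped snapshot folder."""
    # Fixed-width microsecond timestamps sort chronologically as plain strings.
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S%f")
    target = snapshot_root / f"snapshot-{stamp}"
    shutil.copytree(source, target)  # creates snapshot_root if needed
    return target


def prune_snapshots(snapshot_root: Path, keep: int) -> list[Path]:
    """Retention policy: delete all but the `keep` newest snapshots."""
    snapshots = sorted(p for p in snapshot_root.glob("snapshot-*") if p.is_dir())
    expired = snapshots[:-keep] if keep > 0 else snapshots
    for old in expired:
        shutil.rmtree(old)
    return expired


def restore_file(snapshot: Path, relative_path: str, destination: Path) -> None:
    """Recover a single lost file from a snapshot."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(snapshot / relative_path, destination)


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp())
    data, snaps = work / "data", work / "snapshots"
    data.mkdir()
    (data / "notes.txt").write_text("field notes")
    latest = take_snapshot(data, snaps)
    (data / "notes.txt").unlink()  # simulate an accidental deletion
    restore_file(latest, "notes.txt", data / "notes.txt")
    print((data / "notes.txt").read_text())
```

Full copies are simple to reason about but consume space in proportion to the data; snapshot features built into storage systems typically record only what has changed.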
Corrupted Data
Causes: Certain file types may become unusable after repeated read/write operations or data transfers.
Best Practices: Keep multiple up-to-date copies and versions of the data, stored in more than one location.
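One way to tell whether a copy has been corrupted is to compare checksums. The sketch below, assuming Python's standard `hashlib` module, hashes files in chunks with SHA-256 and reports which copies no longer match the original; the function names and file names are illustrative only.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted_copies(original: Path, copies: list[Path]) -> list[Path]:
    """Return the copies whose contents no longer match the original."""
    expected = sha256_of(original)
    return [copy for copy in copies if sha256_of(copy) != expected]


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    original = workdir / "results.csv"
    original.write_text("sample,value\nA,1\n")
    good = workdir / "backup_site1.csv"
    bad = workdir / "backup_site2.csv"
    shutil.copy2(original, good)
    shutil.copy2(original, bad)
    bad.write_text("sample,value\nA,?\n")  # simulate a damaged copy
    print(find_corrupted_copies(original, [good, bad]))
```

Ideally the checksum is recorded when the data are first created, so that the original itself can also be verified later.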
Undocumented Data
Causes: Often the result of poor naming and/or versioning conventions.
Best Practices: Set guidelines for how files and folders are named and where they are stored.
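Naming guidelines are easier to follow when they can be checked automatically. The pattern below is a hypothetical convention (project, ISO date, description, two-digit version), chosen only to illustrate how a group might validate file names; the regular expression would need to be adapted to whatever convention is actually adopted.

```python
import re
from datetime import date

# Hypothetical convention, for illustration only:
#   <project>_<YYYY-MM-DD>_<description>_v<NN>.<extension>
#   e.g. "soilsurvey_2024-03-15_plot-counts_v02.csv"
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<description>[a-z0-9-]+)_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def check_name(filename: str) -> list[str]:
    """Return a list of problems with the file name (empty if it conforms)."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return ["does not match <project>_<YYYY-MM-DD>_<description>_v<NN>.<ext>"]
    problems = []
    try:
        date.fromisoformat(match["date"])  # rejects impossible dates like 2024-02-30
    except ValueError:
        problems.append(f"invalid date: {match['date']}")
    if int(match["version"]) == 0:
        problems.append("version numbers start at v01")
    return problems
```

For example, `check_name("Final_results_NEW.xlsx")` returns a single problem message because the name does not follow the pattern at all.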
Unsupported Data
Causes: File types may be incompatible with compute and storage resources.
Best Practices: Avoid large numbers of small files; once analysis steps are complete, consider bundling them into larger compressed archives (e.g., zip files).
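A minimal sketch of bundling many small output files into one compressed archive, using Python's standard `zipfile` module. The directory names are hypothetical, and the archive is deliberately written outside the folder being archived so it does not try to include itself.

```python
import tempfile
import zipfile
from pathlib import Path


def archive_directory(source: Path, archive_path: Path) -> int:
    """Bundle every file under `source` into one compressed zip archive.

    `archive_path` should be outside `source`. Returns the number of files added.
    """
    count = 0
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to `source` so the archive unpacks cleanly.
                archive.write(path, arcname=path.relative_to(source).as_posix())
                count += 1
    return count


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp())
    outputs = work / "outputs"
    outputs.mkdir()
    for i in range(500):
        (outputs / f"part_{i:04d}.txt").write_text(f"record {i}\n")
    n = archive_directory(outputs, work / "outputs.zip")
    print(f"archived {n} files into {work / 'outputs.zip'}")
```

When per-file control is not needed, the standard library's `shutil.make_archive` offers a one-call alternative.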
Security and Privacy
Data in the Wrong Hands
Causes: Often a matter of poorly managed permissions rather than a security breach.
Best Practices: Establish identity and access management to ensure only approved personnel have access to restricted data.
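Identity and access management is normally configured through a platform's own IAM service rather than written by hand. The sketch below only illustrates the underlying idea: a deny-by-default check that grants access when one of a user's roles is listed for a dataset. The roles, dataset names, and `READ_POLICY` table are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    roles: set[str] = field(default_factory=set)


# Hypothetical policy table: dataset name -> roles allowed to read it.
READ_POLICY: dict[str, set[str]] = {
    "public-survey": {"student", "analyst", "pi"},
    "patient-records": {"pi", "approved-analyst"},
}


def can_read(user: User, dataset: str) -> bool:
    """Deny by default: unknown datasets and unlisted roles get no access."""
    allowed = READ_POLICY.get(dataset, set())
    return not user.roles.isdisjoint(allowed)
```

Keeping the policy in one table, rather than scattered across individual files, makes it easier to review who can see restricted data.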
Data are Non-Compliant
Causes: Data are not correctly classified.
Best Practices: Store each type of data in a secure location approved for its classification.
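Checking compliance amounts to comparing each dataset's classification with the most sensitive tier its storage location is approved for. The tiers, location names, and approval table below are hypothetical placeholders for an institution's actual data classification policy.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical tiers; higher values mean more sensitive data."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


# Hypothetical storage locations and the most sensitive tier each is approved for.
LOCATION_MAX_TIER = {
    "shared-web-bucket": Classification.PUBLIC,
    "lab-project-storage": Classification.INTERNAL,
    "secure-enclave": Classification.RESTRICTED,
}


def is_compliant(classification: Classification, location: str) -> bool:
    max_tier = LOCATION_MAX_TIER.get(location)
    if max_tier is None:  # compare to None: PUBLIC (0) is falsy
        return False  # unknown locations are never approved
    return classification <= max_tier


def audit(inventory: list[tuple[str, Classification, str]]) -> list[str]:
    """Return names of datasets stored where their classification is not allowed."""
    return [name for name, tier, loc in inventory if not is_compliant(tier, loc)]
```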
Platform Issues
Unexpected Expenses
Causes: Unanticipated compute or storage costs can exceed budget limits, delaying or halting ongoing research.
Best Practices: Estimate costs up front, and budget for scaling up compute or storage as needs grow.
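Estimating costs up front is mostly arithmetic: storage is billed per unit of capacity over time, and compute per unit of usage. The sketch below assumes a simple model with linearly growing storage and constant monthly compute; the rates are placeholders, not any provider's actual prices.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Placeholder unit prices; substitute your provider's published rates."""
    storage_per_gb_month: float
    compute_per_core_hour: float


def project_cost(rates: Rates, months: int, start_storage_gb: float,
                 storage_growth_gb_per_month: float,
                 core_hours_per_month: float) -> float:
    """Total estimated cost, with storage growing linearly and compute held constant."""
    total = 0.0
    for month in range(months):
        storage_gb = start_storage_gb + storage_growth_gb_per_month * month
        total += storage_gb * rates.storage_per_gb_month
        total += core_hours_per_month * rates.compute_per_core_hour
    return total
```

For example, at placeholder rates of $0.02 per GB-month and $0.05 per core-hour, a 3-month project that starts at 100 GB, grows by 50 GB per month, and uses 1,000 core-hours per month comes to 450 GB-months ($9) plus 3,000 core-hours ($150), or $159 in total.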
Unavailable Data
Causes: Users may lack the privileges needed to access certain files or folders.
Best Practices: Establish identity and access management so that approved personnel can access the data they need to do their work.
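When data seem to be unavailable, a quick pre-flight check can separate files that are genuinely missing from files the current user lacks permission to read. This sketch uses `Path.exists` and `os.access` from the Python standard library; the function name and report format are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def check_access(required: list[Path]) -> dict[str, list[Path]]:
    """Report required input paths that are missing or that the current user cannot read."""
    report: dict[str, list[Path]] = {"missing": [], "unreadable": []}
    for path in required:
        if not path.exists():
            report["missing"].append(path)
        elif not os.access(path, os.R_OK):
            # The file exists, but the current user lacks read permission.
            report["unreadable"].append(path)
    return report


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp())
    present = work / "survey.csv"
    present.write_text("id,value\n1,3.2\n")
    print(check_access([present, work / "sensor_log.csv"]))
```

Paths reported as unreadable point to a permissions request through the platform's access management, rather than to a missing-data problem.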
Unacceptable Performance
Causes: Unrealistic expectations of how long computing steps will take to complete.
Best Practices: Set realistic expectations for how long data processing will take, and build those time frames into analysis plans.
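A rough way to set expectations is to time a small sample of the work and extrapolate. The sketch below assumes every item takes about the same time and that runtime scales linearly with the number of items; `process` stands in for whatever per-item step an analysis actually runs.

```python
import time
from collections.abc import Callable, Sequence
from typing import Any


def estimate_runtime(process: Callable[[Any], Any], items: Sequence[Any],
                     sample_size: int = 10) -> float:
    """Estimate total seconds to run `process` over all items.

    Times the first `sample_size` items and scales linearly, which assumes
    every item costs roughly the same.
    """
    sample = items[:sample_size]
    if len(sample) == 0:
        return 0.0
    start = time.perf_counter()
    for item in sample:
        process(item)
    elapsed = time.perf_counter() - start
    return elapsed / len(sample) * len(items)
```

For example, if a 30-item sample takes 45 seconds, the estimate for 1,000 items is 1.5 seconds per item, or about 25 minutes, before allowing for queue waits or data transfer time.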