Training and Test Dataset (TTD)

AI systems based on machine learning (ML) need training and test data to train the systems and to verify that they exhibit the intended behaviour. The training and test data should fit the intended behaviour with respect to data type and quality. They should be validated for currency and for relevance to the intended purpose. The amount of training and test data required varies with the intended functionality and the complexity of the environment in which the system operates. The data should have sufficiently diverse features to give the AI system strong predictive power. Suitable training and test data may not be available within the company and may have to be sourced externally; data quality must be ensured in that case as well.
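Some of the requirements above (currency, diversity, and keeping test data separate from training data) lend themselves to automated checks. The following is a minimal sketch under stated assumptions, not a procedure prescribed by the TTD controls. The `Record` structure, the `check_dataset` function and its thresholds (`max_age_days`, `min_classes`, `max_class_share`) are all hypothetical and chosen for illustration. It uses "most frequent label share" as a simple proxy for diversity.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    # Hypothetical minimal record: an identifier, a label and a collection date
    record_id: str
    label: str
    collected_on: date


def check_dataset(train, test, as_of, max_age_days=365, min_classes=2, max_class_share=0.8):
    """Return human-readable findings; an empty list means every check passed.

    All thresholds are illustrative assumptions, not values from the controls.
    """
    findings = []

    # Currency: flag records collected before the cutoff date
    cutoff = as_of - timedelta(days=max_age_days)
    stale = [r.record_id for r in train + test if r.collected_on < cutoff]
    if stale:
        findings.append(f"{len(stale)} record(s) older than {max_age_days} days")

    # Diversity: require enough distinct labels and no single dominant label
    counts = Counter(r.label for r in train)
    if len(counts) < min_classes:
        findings.append(f"only {len(counts)} distinct label(s) in training data")
    if train:
        top_label, top_count = counts.most_common(1)[0]
        share = top_count / len(train)
        if share > max_class_share:
            findings.append(f"label '{top_label}' makes up {share:.0%} of training data")

    # Separation: the same record must not appear in both training and test data
    overlap = {r.record_id for r in train} & {r.record_id for r in test}
    if overlap:
        findings.append(f"{len(overlap)} record(s) appear in both training and test data")

    return findings


if __name__ == "__main__":
    train = [
        Record("a", "cat", date(2024, 1, 10)),
        Record("b", "dog", date(2024, 3, 1)),
        Record("c", "cat", date(2022, 1, 1)),
    ]
    test = [Record("b", "dog", date(2024, 3, 1))]
    for finding in check_dataset(train, test, as_of=date(2024, 6, 1)):
        print(finding)
```

In the example run, record "c" is older than the one-year cutoff and record "b" appears in both sets, so two findings are reported. Appropriate thresholds depend on the intended functionality and environment of the AI system.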
The controls related to this risk category are listed below:
  • TTD 01 - Data Management Procedures
  • TTD 02 - Data Collection Assessment
  • TTD 03 - Dataset Governance Policies
  • TTD 04 - Dataset Annotations and Labels Information
  • TTD 05 - Dataset Cleaning, Enrichment and Aggregation
  • TTD 06 - Dataset Description, Assumptions and Purpose
  • TTD 07 - Dataset Transformation Rationale
  • TTD 08 - Dataset Bias Identification and Mitigation
  • TTD 09 - Dataset Bias Analysis Action and Assessment