Training and Test Dataset (TTD)

AI systems based on machine learning (ML) require both training and test data in order to train and validate the system for its intended behaviour. In terms of variety and quality, the data used for training and testing should be appropriate for that intended behaviour. The data should also be checked to ensure that they are up to date and relevant to the task at hand.
The amount of training and test data required depends on the complexity of the environment and the intended functionality. For the AI system to have high predictive power, both the training data and the test data should cover a wide variety of features. Training and test data may not be available within the business and may instead have to be obtained from outside sources. In that case, the integrity of the data must also be ensured.
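As an illustration of the integrity point for externally sourced data, one common approach is to compare a cryptographic digest of the received dataset against a digest published by the provider. The sketch below is only an assumption about how such a check might look; the function names `sha256_of_bytes` and `verify_dataset` are hypothetical and are not prescribed by any control in this category.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of the raw dataset bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_dataset(data: bytes, expected_digest: str) -> bool:
    """Check the received data against the digest published by the provider.

    `expected_digest` is assumed to be a hex-encoded SHA-256 value obtained
    through a trusted channel (hypothetical setup for this sketch).
    """
    return sha256_of_bytes(data) == expected_digest.strip().lower()


if __name__ == "__main__":
    received = b"feature_a,feature_b,label\n1.0,2.0,0\n"
    published = sha256_of_bytes(received)  # stand-in for the provider's digest
    print(verify_dataset(received, published))            # unchanged data
    print(verify_dataset(received + b"tampered", published))  # modified data
```

A matching digest only shows that the data were not altered in transit or storage; it says nothing about whether the data are current, relevant, or sufficiently varied, which still has to be assessed separately.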
Controls related to this risk category are listed below:
  • TTD 01 - Data Management Procedures
  • TTD 02 - Data Collection Assessment
  • TTD 03 - Dataset Governance Policies
  • TTD 04 - Dataset Annotations and Labels Information
  • TTD 05 - Dataset Cleaning Enrichment and Aggregation
  • TTD 06 - Dataset Description Assumptions and Purpose
  • TTD 07 - Dataset Transformation Rationale
  • TTD 08 - Dataset Bias Identification and Mitigation
  • TTD 09 - Dataset Bias Analysis Action and Assessment