Machine Learning Workflow

This document summarizes the machine learning workflow.

1. Machine Learning Workflow

[Figure 1] Machine Learning Workflow

1.1. Data Preparation

Data preparation is the process of preparing data for model training and validation.

  • Data transformation : Converting raw data into a form that is easier to work with and loading the transformed data back for the next steps.
  • Data cleaning : Removing or correcting inaccurate data.
  • Data normalization : When features differ greatly in scale or variance, rescaling them to a common range such as 0–1 so that no single feature dominates learning and the others are reflected properly.
  • Data featurization : Extracting the features the model will use. Existing fields are often used directly as features; this step also includes creating new features that are not present in the raw data but can be derived from it.
  • Data validation : Final checks before the featurized data is used—typically of type, range, and shape.
  • Data split : Splitting the validated, featurized data into training, validation, and test sets. A common split is roughly 60% training, 20% validation, and 20% test.
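As a minimal sketch of the normalization and split steps above (the helper names, the seed, and the toy income values are illustrative, not from the document): min-max scaling maps a feature to the 0–1 range, and a shuffled 60/20/20 split produces the three sets.

```python
import random

def min_max_scale(values):
    """Rescale a list of numbers to the 0-1 range (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:               # constant feature: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def train_val_test_split(rows, seed=42):
    """Shuffle rows and split them roughly 60% / 20% / 20%."""
    rows = rows[:]             # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * 0.6)
    n_val = int(len(rows) * 0.2)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test

# Usage: scale an income-like feature with large variance, then split it.
incomes = [30_000, 45_000, 52_000, 250_000, 61_000,
           38_000, 47_000, 90_000, 55_000, 41_000]
scaled = min_max_scale(incomes)            # every value now lies in [0, 1]
train, val, test = train_val_test_split(scaled)
print(len(train), len(val), len(test))     # 6 2 2
```

Shuffling before the split matters when the data is ordered (e.g., by time or by class), since a straight slice would otherwise give the three sets different distributions.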

1.2. Model Training

Model training is the process of building, training, and validating a model using the prepared data.

  • Algorithm selection : Choosing which machine learning algorithm to use.
  • Model hyperparameter tuning : Configuring the model for the chosen algorithm and deciding values for the hyperparameters, the settings fixed before training rather than learned from data.
  • Model training : Fitting the configured model's parameters on the training split produced in the data split step.
  • Model validation : Evaluating the trained model on the validation split to review metrics such as accuracy and performance and to check whether requirements are met.
  • Model testing : Evaluating the validated model on the test split, i.e., on data that was used for neither training nor validation.
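The tuning/validation/testing loop above can be sketched with a toy model (the 1-D k-nearest-neighbors classifier, the candidate k values, and the data points are all illustrative assumptions, not from the document): each candidate hyperparameter is scored on the validation split, the best one is kept, and the test split is used exactly once for the final evaluation.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points (1-D)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of (x, label) pairs in data that the model predicts correctly."""
    hits = sum(knn_predict(train, x, y_k := k) == y for x, y in data)
    return hits / len(data)

# Toy 1-D data: class 0 clusters near 0, class 1 clusters near 10.
train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
         (9.0, 1), (9.5, 1), (10.0, 1), (10.5, 1)]
val   = [(1.2, 0), (9.8, 1), (2.5, 0), (8.5, 1)]
test  = [(0.8, 0), (10.2, 1), (1.8, 0), (9.2, 1)]

# Hyperparameter tuning: pick k by accuracy on the validation split.
best_k = max([1, 3, 5], key=lambda k: accuracy(train, val, k))
# Model testing: one final evaluation on the held-out test split.
print(best_k, accuracy(train, test, best_k))
```

Keeping the test split out of the tuning loop is the point of the three-way split: a model selected to maximize validation accuracy may be slightly optimistic on that split, so the test score is the honest estimate of production behavior.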

1.3. Model Deployment

Model deployment is the process of deploying the model to production and monitoring it after testing is complete.

  • Model deployment : Deploying the tested model into production.
  • Model monitoring : Tracking deployed model metrics such as accuracy and performance.
  • Model retraining : Retraining the model when monitoring indicates it is needed.
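The monitoring and retraining steps above can be sketched as a rolling accuracy check (the `ModelMonitor` class, window size, and threshold are illustrative assumptions, not from the document): the monitor records whether each production prediction matched the later-observed outcome, and flags the model for retraining when accuracy over a recent window drops below a threshold.

```python
from collections import deque

class ModelMonitor:
    """Track a deployed model's rolling accuracy and flag when retraining is needed."""

    def __init__(self, window=100, threshold=0.9):
        self.window = deque(maxlen=window)  # recent correct/incorrect outcomes
        self.threshold = threshold          # minimum acceptable rolling accuracy

    def record(self, prediction, actual):
        """Log one production prediction against its observed outcome."""
        self.window.append(prediction == actual)

    def rolling_accuracy(self):
        return sum(self.window) / len(self.window) if self.window else 1.0

    def needs_retraining(self):
        # Only trigger once the window is full, to avoid noisy early readings.
        return (len(self.window) == self.window.maxlen
                and self.rolling_accuracy() < self.threshold)

# Usage: 7 correct and 3 wrong predictions in a window of 10 → accuracy drifts
# below the 0.8 threshold, so the monitor signals that retraining is needed.
monitor = ModelMonitor(window=10, threshold=0.8)
for pred, actual in [(1, 1)] * 7 + [(1, 0)] * 3:
    monitor.record(pred, actual)
print(monitor.rolling_accuracy(), monitor.needs_retraining())  # 0.7 True
```

In practice the retraining trigger would kick off the workflow again from data preparation, using the newly collected production data.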

2. References