Development Set vs Test Set

202502012206
tags: #machine-learning #data-splitting #evaluation

Development (validation) set is used for model selection and hyperparameter tuning, while test set provides final unbiased performance evaluation.

Development Set:

Used to compare different models and algorithms
Guides decisions about Feature Engineering and Regularization
Can be "looked at" multiple times during development
Typically 20-25% of available data

Test Set:

Used only once for final evaluation
Provides unbiased estimate of real-world performance
Should never influence model development decisions
Typically 20-25% of available data

Key principle: Your test set should reflect the data distribution you expect in production. If dev and test sets come from different distributions, you may have Data Distribution Mismatch.

Using a Single Number Evaluation Metric on both sets helps make clear comparisons between models.

Never use test set performance to make model decisions - this leads to overfitting to the test set.

Reference

Machine Learning Yearning by Andrew Ng