Model Card Template
Use this template to document your ML models following the standard model card framework established by Mitchell et al. (2019). Each section includes guidance on what to include.
1. Model Details
Model name: The official name of the model, including a version identifier.
Developed by: The person or organization that developed the model. Include contact information.
Model date: The date the model was developed or last updated (YYYY-MM-DD).
Model version: The version number or identifier. Use semantic versioning if applicable.
Model type: The architecture type (e.g., transformer, CNN, random forest) and training algorithm.
License: The license under which the model is released (e.g., Apache 2.0, MIT, CC BY 4.0).
Citation: The paper or other resource to cite when using this model.
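The Model Details fields above can also be captured in machine-readable form alongside the prose card. A minimal sketch in Python, where every field name and value is illustrative rather than a required schema:

```python
# Illustrative, machine-readable sketch of the Model Details section.
# All names and values below are placeholders, not a prescribed schema.
model_details = {
    "name": "example-classifier",                  # hypothetical model name
    "developed_by": "Example Lab <ml@example.org>",
    "date": "2024-01-15",                          # YYYY-MM-DD, as recommended above
    "version": "1.2.0",                            # semantic versioning
    "model_type": "transformer, fine-tuned with AdamW",
    "license": "Apache-2.0",
    "citation": "Mitchell et al. (2019), Model Cards for Model Reporting",
}

# Basic completeness check: every Model Details field has a value.
missing = [key for key, value in model_details.items() if not value]
assert not missing, f"Incomplete model card fields: {missing}"
```

Keeping this dictionary next to the prose card makes it easy to validate that no required field was left blank before the card is published.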
2. Intended Use
Primary intended uses: The primary use cases for the model. Be specific about tasks, domains, and deployment contexts.
Primary intended users: Who the model is designed for (e.g., researchers, developers, end users, specific industries).
Out-of-scope uses: Use cases the model is explicitly not designed for or should not be used for. Include known failure modes.
3. Factors
Relevant factors: Demographic or phenotypic groups, instrumentation, and environmental factors relevant to model performance.
Evaluation factors: Factors that were explicitly tested during evaluation. List the groups and conditions evaluated.
4. Metrics
Model performance measures: Metrics used to evaluate the model (e.g., accuracy, F1, AUC-ROC, BLEU). Justify each choice.
Decision thresholds: Thresholds used for classification decisions and the rationale for choosing them.
Variation approaches: How performance variation is measured (e.g., confidence intervals, standard deviation across folds).
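To make these three items concrete, here is a small stdlib-only sketch that reports a metric (accuracy, as an example), applies an explicit decision threshold, and quantifies variation with a percentile bootstrap confidence interval. The data, threshold, and bootstrap settings are all assumptions for illustration:

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def apply_threshold(scores, threshold=0.5):
    """Turn classifier scores into binary decisions at a documented threshold."""
    return [1 if s >= threshold else 0 for s in scores]

def bootstrap_ci(y_true, y_pred, metric=accuracy, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a metric: one way to
    report the performance variation a model card asks for."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy labels and scores, purely for demonstration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.55]
y_pred = apply_threshold(scores, threshold=0.5)

print(accuracy(y_true, y_pred))     # point estimate
print(bootstrap_ci(y_true, y_pred)) # (lower, upper) interval
```

Reporting the threshold and the interval alongside the point estimate lets readers judge whether a difference between models or subgroups is meaningful.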
5. Evaluation Data
Datasets: Names and descriptions of the evaluation datasets used. Include size, source, and any filtering applied.
Motivation: Why these datasets were chosen and how they represent the intended use cases.
Preprocessing: Preprocessing steps applied to the evaluation data (e.g., tokenization, normalization, augmentation).
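Preprocessing is easiest to document when it is a single, reproducible function. A minimal sketch, assuming a simple text pipeline of Unicode normalization, lowercasing, punctuation stripping, and whitespace tokenization; substitute whatever your actual pipeline does and record it verbatim in the card:

```python
import re
import unicodedata

def preprocess(text):
    """Illustrative evaluation-data preprocessing: NFKC Unicode
    normalization, lowercasing, punctuation removal, and whitespace
    tokenization. The exact steps are placeholders for your pipeline."""
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    return text.split()

print(preprocess("Model Cards, v1.0: a Template!"))
```

Citing one function like this in the card removes ambiguity about how the reported numbers were produced.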
6. Training Data
Datasets: Names and descriptions of the training datasets. Include size, source, collection methodology, and time period.
Preprocessing: Preprocessing and data augmentation steps applied during training.
Provenance: Origin and chain of custody of the training data. Include any licensing or consent information.
7. Quantitative Analyses
Unitary results: Overall performance metrics on the evaluation dataset(s). Report all metrics listed in Section 4.
Intersectional results: Disaggregated performance across relevant factors and their intersections. Report metrics for each subgroup.
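Disaggregation is mechanical once each evaluation example is tagged with its factor values. A hedged sketch, where the record fields (`label`, `pred`, `group`, `env`) and the data are illustrative:

```python
from collections import defaultdict

def disaggregated_accuracy(records, factors):
    """Accuracy per subgroup, where a subgroup is defined by the values of
    the given factor keys. `records` is a list of dicts carrying the true
    label, the prediction, and one field per factor (names illustrative)."""
    groups = defaultdict(list)
    for r in records:
        key = tuple(r[f] for f in factors)
        groups[key].append(r["label"] == r["pred"])
    return {key: sum(hits) / len(hits) for key, hits in groups.items()}

# Toy evaluation records tagged with two factors.
records = [
    {"label": 1, "pred": 1, "group": "A", "env": "indoor"},
    {"label": 0, "pred": 1, "group": "A", "env": "outdoor"},
    {"label": 1, "pred": 1, "group": "B", "env": "indoor"},
    {"label": 0, "pred": 0, "group": "B", "env": "outdoor"},
]

print(disaggregated_accuracy(records, ["group"]))         # one factor at a time
print(disaggregated_accuracy(records, ["group", "env"]))  # intersections
```

Passing multiple factor keys yields the intersectional results this section asks for; small subgroup sizes should be reported alongside the metric, since intersections shrink quickly.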
8. Ethical Considerations
Potential harms: Known or potential harms from model use, including allocation harms, quality-of-service harms, and representational harms.
Sensitive use cases: Particularly sensitive applications where extra caution is needed (e.g., healthcare, criminal justice, hiring).
Mitigations: Steps taken to reduce potential harms (e.g., debiasing techniques, output filtering, human-in-the-loop review).
9. Caveats and Recommendations
Limitations: Known limitations of the model. Include scenarios where performance degrades or the model is unreliable.
Recommendations: Guidance for model users, including suggested monitoring, testing before deployment, and update cadence.
Future work: Areas for improvement, planned updates, and research directions.
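A monitoring recommendation is most useful when it names a concrete trigger. One simple rule, sketched below with an illustrative metric and tolerance (both are assumptions, not prescribed values): flag the model for review when live accuracy drops more than a fixed tolerance below the baseline documented in Section 7.

```python
def needs_retraining(baseline_accuracy, live_accuracy, tolerance=0.05):
    """Simple monitoring trigger: flag the model for review when live
    accuracy falls more than `tolerance` below the documented baseline.
    The metric and the 0.05 tolerance are illustrative choices."""
    return (baseline_accuracy - live_accuracy) > tolerance

print(needs_retraining(0.92, 0.90))  # small dip, within tolerance
print(needs_retraining(0.92, 0.80))  # degraded: flag for review
```

Recording the rule, the tolerance, and who responds to the flag in this section turns "monitor the model" into an actionable recommendation.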
Learn more: Read about evaluation methodology to understand how to select metrics and design evaluation protocols for Sections 4-7.