This is part of my notes taken while studying machine learning. I’m learning as I go along, and may update these with corrections and additional info.
Since the data in our world often contains errors and biases, the models we build on that data will hold those same errors and biases.
Example
Building a model to predict who is the best candidate for a job, based on existing data.
Problem
If there are biases in the historical hiring patterns based on gender, race, age, etc., then those biases are going to be picked up by any predictive model built from this data.
The biases held by humans will transfer to the data as long as humans are the creators of that data.
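To see what this looks like in practice, here is a minimal sketch with made-up, synthetic data (not real hiring records; the column names and numbers are my own toy example). Two groups have identical skill distributions, but the historical "hired" labels favour one group, and a model trained on those labels ends up scoring equally skilled candidates differently.

```python
# Toy illustration (synthetic data): historical hiring bias transfers to the model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Two groups with identical skill distributions, but past hiring
# decisions favoured group 0 independently of skill.
group = rng.integers(0, 2, size=n)    # hypothetical protected attribute: 0 or 1
skill = rng.normal(0, 1, size=n)      # actual qualification, same distribution for both groups
hired = (skill + 1.0 * (group == 0) + rng.normal(0, 0.5, size=n)) > 0.5

X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, hired)

# Score two equally skilled candidates who differ only in group membership.
candidate_a = np.array([[0.5, 0]])    # group 0
candidate_b = np.array([[0.5, 1]])    # group 1
print("P(hire | group 0):", model.predict_proba(candidate_a)[0, 1])
print("P(hire | group 1):", model.predict_proba(candidate_b)[0, 1])
# The probabilities differ even though skill is identical: the model has
# learned the historical bias, not just the qualification signal.
```

The model isn't doing anything wrong in a statistical sense; it is faithfully reproducing the pattern in the labels, which is exactly the problem.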
Human Created Data:
- Images
- Text
- Speech
- Video
Each of these is ingested by computers to make data-based decisions, and all of them are generated by humans.
There are always potential biases in models created from human-generated data. Real-world validation of any model is more important than statistical validation.
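One way to see why a single statistical score isn't enough: in the sketch below (again synthetic data with made-up names, building on the toy example above), the model reports a healthy overall accuracy while its predicted hire rates differ sharply between the two groups. Checking outcomes on real slices of the population like this is a small step toward the kind of real-world validation I mean.

```python
# Toy illustration (synthetic data): an aggregate metric can hide per-group disparity.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 10000
group = rng.integers(0, 2, size=n)
skill = rng.normal(0, 1, size=n)
# Historical labels again favour group 0 independently of skill.
hired = (skill + 0.8 * (group == 0) + rng.normal(0, 0.5, size=n)) > 0.5

X = np.column_stack([skill, group])
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, hired, group, test_size=0.3, random_state=0
)
model = LogisticRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)

print("Overall accuracy:", accuracy_score(y_te, pred))
for g in (0, 1):
    mask = g_te == g
    print(f"Group {g}: accuracy={accuracy_score(y_te[mask], pred[mask]):.2f}, "
          f"predicted hire rate={pred[mask].mean():.2f}")
# The headline accuracy looks fine, but the predicted hire rates
# differ between groups because the labels themselves were biased.
```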