Reflection on the reading

Images are inherently complex and open to multiple interpretations, depending on the cultural background, personal experiences, and values of the viewer. Yet, when images are introduced into machine learning datasets, they must be simplified into fixed categories, such as woman, cat, or criminal. This process of labeling acts as a reduction of reality: it compresses the richness and ambiguity of an image into a single, narrow meaning.

The very structure of the taxonomy used to organize categories also reveals the biases of those who create it. As highlighted in the reading, ImageNet once included a category for “Hermaphrodite” bizarrely situated under Person > Sensualist > Bisexual, alongside categories like “Pseudohermaphrodite” and “Switch Hitter.” Such arrangements are not neutral but instead reflect and reproduce cultural prejudice.

This leads to the question of who holds the power to label. Ultimately, those who design and deploy these systems decide how categories are structured, while annotators merely carry out the task of labeling according to those rules. It is sad that the people depicted in the images themselves have no control over how their identities are defined or represented within the dataset.

The social consequences of this labeling process are significant. When datasets carry sexist, racist, or class-based biases, machine learning models internalize them and reproduce them in real-world applications. For example, Black men are more likely to be labeled as “criminals,” a bias that can feed into discriminatory practices in judicial systems that use facial recognition technologies. Beyond perpetuating bias, these models create the illusion of objectivity: a machine-generated label often appears neutral or factual, but in reality it is just one interpretation, shaped by the assumptions built into the system. This illusion buries the discrimination even deeper.

Training & Coding Exercise

Training the model

I wanted to train a model to distinguish between orange juice and lychee juice.

I started by taking photos of the orange juice bottle and the lychee juice bottle on their own, from different angles and distances from the camera, and the results were fairly accurate. But I noticed there were moments when no juice was in the frame at all, and the model had no way to say so. So I decided to add another class called “no juice”.

[Images: maybe need another class]
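The reason the model needs that option spelled out as a class is that an image classifier only outputs a probability distribution over the classes it was trained on: the scores always sum to 1, so an empty frame still gets forced into one of the juice classes. A minimal sketch of that behaviour in JavaScript (the raw scores here are made up purely for illustration):

```javascript
// Softmax turns raw scores into probabilities that always sum to 1,
// so the classifier must commit to one of its known classes.
function softmax(scores) {
  const exps = scores.map(Math.exp);
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical raw scores for an empty frame, with only two classes:
const probs = softmax([0.3, 0.1]); // e.g. [orange juice, lychee juice]
console.log(probs); // ~[0.55, 0.45] — still “orange juice”, even with no juice in view
```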

So I took a lot of pictures of myself in front of the camera, as well as of just the background with me out of frame. Then I tested again and noticed there was still some confusion, and I realized I should also add photos of me holding the juice bottles to the juice classes. After doing that, the model became as accurate as I expected.

[Video: not-very-accurate version]

Putting it into p5.js

https://editor.p5js.org/yc5965/full/06eZbl8b2
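For reference, a minimal sketch of the pattern the linked project follows, assuming the model was trained and exported with Teachable Machine and loaded with ml5.js 0.x (where callbacks are error-first); the model URL below is a placeholder, not the real one:

```javascript
let classifier;
let video;
let label = 'waiting...';

// Placeholder URL for an exported Teachable Machine image model.
const modelURL = 'https://teachablemachine.withgoogle.com/models/XXXXXXXX/';

function preload() {
  classifier = ml5.imageClassifier(modelURL + 'model.json');
}

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(640, 480);
  video.hide();
  classifyVideo();
}

function classifyVideo() {
  // Ask the model which class the current video frame belongs to.
  classifier.classify(video, gotResult);
}

function gotResult(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  // results is sorted by confidence; show the top label and keep classifying.
  label = results[0].label;
  classifyVideo();
}

function draw() {
  image(video, 0, 0, width, height);
  fill(255);
  textSize(32);
  text(label, 20, height - 20);
}
```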

[Video: testing the model with some random objects :)]
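When testing with random objects, it can also help to show every class’s confidence instead of only the top label, since the model will always split its certainty across “orange juice”, “lychee juice”, and “no juice”. A small variation on the gotResult and draw functions from the sketch above (same assumptions; the rest of the sketch stays unchanged):

```javascript
let allResults = [];

function gotResult(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  allResults = results; // every class with its confidence, sorted high to low
  classifyVideo();
}

function draw() {
  image(video, 0, 0, width, height);
  fill(255);
  textSize(20);
  // List each class with its confidence as a percentage.
  allResults.forEach((r, i) => {
    text(`${r.label}: ${(r.confidence * 100).toFixed(1)}%`, 20, 30 + i * 24);
  });
}
```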