Who Gets to Be What?
A Demographic Analysis of AI-Generated Images
This lab report investigates demographic bias in AI-generated images, focusing on three very different professions: pharmacist, babysitter, and mechanical engineer. We hypothesized that AI image generators do not reflect the actual demographic makeup of the U.S. population and instead overrepresent certain groups while underrepresenting others.
Using Rabbit Hole, we generated 100 images of each profession, categorized them by age, race, and gender, and compared the outputs to current U.S. demographic statistics to test our hypothesis. Our findings indicate that AI-generated images do not accurately reflect real-world diversity; instead, they frequently reinforce stereotypes. These results suggest a need for more inclusive datasets behind AI image generators.
Artificial intelligence image generators, like artificial intelligence itself, have spread rapidly through the modern world. These tools have quickly become powerful, producing strikingly realistic results from a simple text prompt such as "pharmacist," "babysitter," or "mechanical engineer." But as their use grows, so do concerns about bias in the outputs these models produce. Bias in AI-generated images can take different forms, including underrepresentation or stereotypical depictions of race, gender, and age. These outcomes reflect the data used to train the models, data that often lacks diversity and mirrors social inequalities already embedded in online media. As a result, AI systems, intentionally or not, may reinforce stereotypes and marginalize certain groups even while claiming to offer "creative freedom" or "technological neutrality."
Yiran Yang (2025), writing in the journal AI & Society, explored racial bias in AI-generated images and provided compelling visual examples of how image generation systems often reflect a narrowed or stereotyped view of cultural and racial diversity. Her article critiques the tendency of AI tools to favor White or East Asian features while marginalizing other racial identities. Yang's background as a technology law researcher and a woman of Asian descent adds depth to her perspective: she analyzes not only the technical dimension of bias but also its social and cultural implications. Her work is particularly relevant to our lab experiment because, like Yang, we are analyzing AI-generated portraits for demographic bias and evaluating whether these systems present a narrow view of gender, age, and racial diversity compared to actual U.S. population demographics. Her article supports our hypothesis that these biases are systematic and rooted in the design choices and data behind generative AI. While Yang focuses on racial and cultural representation in creative image generation, we also wanted a more technical source from a high-stakes domain that affects people's lives. A peer-reviewed article by Yetisgen-Yildiz and Yetisgen (2024), published in Diagnostic and Interventional Radiology, examines how AI used in medical imaging also shows bias, often producing less accurate diagnostic results for underrepresented groups. The article presents examples of skewed performance when AI systems were trained on non-diverse datasets, leading to misdiagnoses and overlooked health conditions in patients of color. Although this research takes a medical and health care perspective, it reinforces a key point relevant to our experiment: the input data used to train AI systems ultimately shapes the fairness of the output. Whether generating portraits or reading radiology scans, AI tools tend to carry forward the same inequalities unless datasets and design strategies are deliberately balanced and inclusive.
Finally, a 2024 study published in the Journal of Family Medicine and Primary Care further supports this argument, showing that AI-generated images inaccurately portray the demographics of surgical specialists. The authors, all AI and medical researchers, analyzed outputs from DALL-E 3 and found a significant underrepresentation of women and Black individuals when prompting for roles like "microsurgeon" or "plastic surgeon." Like Yang's study, this work shows how bias in AI-generated imagery extends beyond entertainment or marketing into critical areas like medicine, where representation has tangible impacts. Although the study focuses on a single AI system, its structured methodology and alignment with broader concerns about AI fairness make it a valuable addition to understanding bias in generated visuals. Together, these sources provide a well-rounded foundation for our lab report, which investigates the visual representation of race, gender, and age in AI-generated images. Our experiment builds on these studies by comparing actual U.S. demographic data to the outputs of a leading image generator. The research reviewed here suggests that AI tools, unless intentionally corrected, reflect and potentially amplify societal bias, making this an urgent issue for developers, users, and policymakers alike.
Our Hypothesis:
AI image generators do not reflect the actual demographic makeup of the U.S. population and instead overrepresent certain groups while actively underrepresenting others, particularly in the context of race, age and gender.
Materials and Methods
Materials:
AI image generator: Rabbit Hole
Prompt list: “Pharmacist,” “Babysitter,” “Mechanical Engineer”
Spreadsheet or data collection software
U.S. Census demographic data (2023 or latest available)
Data visualization tools (e.g., Google Sheets, Excel, Canva)
Methodology:
We selected three professions associated with different stereotypes about gender, age, and race.
For each profession, we prompted the AI generator using only the job title (e.g., “pharmacist”) 20 times to gather a diverse sample of images.
Each group member analyzed the images based on three criteria: race, gender, and age group (child, young adult, middle-aged, senior).
We recorded the perceived demographics in a spreadsheet and compared these results to U.S. Census data.
We also documented example images that clearly represented bias or over/underrepresentation.
Finally, we used graphs and charts to visualize trends and determine whether the images aligned with real-world demographics (a sketch of this tally-and-compare step is shown below).
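To make the recording and comparison step concrete, the following is a minimal Python sketch of how hand-labeled image data could be tallied and compared against benchmark percentages. The example records, the percentages helper, and the 50/50 benchmark are all illustrative placeholders, not our actual spreadsheet entries or real Census figures.

```python
# Minimal sketch of the tally-and-compare step. The records below are illustrative
# placeholders, not our actual labels, and the benchmark percentages are NOT real
# Census figures; they should be replaced with the official values for each profession.
from collections import Counter

# Hypothetical hand-labels for a few "mechanical engineer" images (illustrative only).
labels = [
    {"gender": "male", "race": "white", "age": "middle-aged"},
    {"gender": "male", "race": "white", "age": "young adult"},
    {"gender": "female", "race": "south asian", "age": "young adult"},
    {"gender": "n/a (robot)", "race": "n/a", "age": "n/a"},
]

def percentages(records, field):
    """Tally one attribute (e.g. gender) and convert the counts to percentages."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.items()}

observed_gender = percentages(labels, "gender")

# Placeholder benchmark: swap in the real U.S. Census / BLS split before drawing conclusions.
benchmark_gender = {"male": 50.0, "female": 50.0}

for group, benchmark in benchmark_gender.items():
    observed = observed_gender.get(group, 0.0)
    print(f"{group}: observed {observed:.0f}% vs. benchmark {benchmark:.0f}% "
          f"(gap {observed - benchmark:+.0f} points)")
```

The same helper can be reused for race and age group, and the resulting percentage gaps are what we charted in the results below.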
Results
The data supported our hypothesis: the AI did not reflect real-world diversity. It mostly showed what society expects, for example, male engineers and female babysitters. This is exactly what we anticipated, and it shows that AI tools can reproduce the unfair patterns they have seen in their training data. The biases that appeared most often were related to gender and age. Of the 100 images generated for "babysitter," almost all showed women who appeared to be in their 20s to 40s. Only four male figures appeared, three men and one young boy, and even then their faces were not clearly shown. Some of these male images looked distorted or unrealistic, with strange body parts or blurry features.
Example 1: One image shows a man with three kids in a messy room. It looks very cartoon-like, and the children don’t look realistic.
Example 2: Another image shows a young boy holding a baby. The room is also messy, with toys and books everywhere.
One surprising finding was how messy the babysitter images were. Even though the prompt never mentioned mess, most images showed bottles, toys, and clothes scattered on the floor, suggesting the AI has learned to associate babysitting with messiness, which is another kind of bias. Some images also contained errors, such as extra fingers or unnatural-looking people, showing that the AI still struggles to create realistic images, especially when depicting people who do not match the patterns it has seen most often in training. Across all three job categories, the images showed limited diversity. Most people depicted were young and light-skinned; there were almost no older adults, no visible disabilities, and very few darker-skinned individuals. These results indicate that AI image generators are not fair or neutral. They repeat stereotypes about gender, age, and race, consistent with the studies we reviewed (Yang, 2025; Yetisgen-Yildiz & Yetisgen, 2024). This matters because these tools are being used more and more, and if the problem is not addressed, they can continue to spread bias in new ways, reflecting the patterns they have seen before even when those patterns are biased.
On the other hand, for the engineering prompt, the majority of the 100 images showed white men over the age of 30. Approximately 60% of the figures appeared white and over 70% appeared male, with the most common age group being the 30s to 50s. Other groups, including East Asian, South Asian, Hispanic, and Middle Eastern individuals, appeared only in very small numbers, and women made up only about 10% of the images. A notable number of images were not classifiable at all because they showed figures that were not human, such as robots.
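To show how a gender split like this could be checked against a population benchmark rather than judged by eye, here is a rough goodness-of-fit sketch in Python. The counts approximate the percentages reported above, and the 50/50 benchmark is a placeholder standing in for the real Census/BLS figure for mechanical engineers.

```python
# Rough sketch of a chi-square goodness-of-fit check on the engineer gender split.
# Counts approximate the percentages reported above; the 50/50 benchmark is a
# PLACEHOLDER and should be replaced with the actual Census/BLS share for the job.
from scipy.stats import chisquare

observed = {"male": 70, "female": 10}            # approximate counts; non-human images excluded
total = sum(observed.values())

benchmark_share = {"male": 0.5, "female": 0.5}   # placeholder benchmark, not a real statistic
expected = [benchmark_share[group] * total for group in observed]

stat, p_value = chisquare(list(observed.values()), f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {p_value:.4g}")
# A very small p-value means the generated split deviates from the benchmark by more
# than chance alone would explain; with real Census figures this becomes a proper test.
```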
Discussion
Our results show that AI image generators often fail to represent real-world diversity, which means they reinforce stereotypes instead of challenging them. For example, most images of babysitters showed young women, while engineers were mostly men. Analyzing the results reveals clear patterns of bias in the data: most of the engineer images depicted white men, and the most common age group was people in their 30s to 50s. All of this reinforces the stereotype of engineers as white men of that age. Even though the generator did include some people of different races and genders, their numbers were still very low compared to white men. The number of women was especially low, which matches the stereotypes about engineers that exist in society.
Another striking finding was that the generator produced many non-human images, including robots and abstract-looking figures that did not fit any of the gender, race, or age categories. These results support the idea that AI image generators often reflect the same biases found in their training data. Without deliberate correction by developers, AI will continue to push the existing stereotypes about who belongs where in the world.
This supports our original hypothesis that the AI would reflect existing gender and racial biases. One thing that stood out was how few older adults or people with visible physical differences appeared in the images; it was almost as if the AI had erased them completely. This is concerning because if people rely on AI-generated images for learning or creative projects, they might get a false idea of who belongs in certain jobs or roles. It could influence how we think about careers and who is "fit" for them, even if that view is unfair or inaccurate. Another unexpected result was that the images of men, especially male babysitters, were often blurry, distorted, or cartoonish. This suggests that the AI may not have enough diverse examples of men in caregiving roles in its training data, which could explain why it struggled to create realistic images in those cases. These results confirm that AI is not neutral. It follows patterns from online media and existing datasets, which are often biased themselves, and so it keeps repeating the same ideas, like showing a messy house when asked for a "babysitter" or defaulting to white men for engineering roles. If the results had been more mixed or unclear, we would have adjusted the study by using more AI platforms or prompts to see whether the patterns still appeared. But since the patterns were so obvious, we did not need to change our hypothesis; it stayed the same throughout. To improve AI image generators in the future, developers need to train them on more diverse, representative data and regularly test and audit the models for bias. Otherwise, these tools will keep reinforcing unfair stereotypes, even when we expect them to be "creative" or "objective."
Conclusion
The purpose of this lab report was to explore whether AI image generators show bias when creating pictures of people in different jobs. We focused on three jobs: pharmacist, babysitter, and mechanical engineer. From the beginning, we hypothesized that the AI would show stereotypes, for example, men as engineers and women as babysitters. After analyzing 100 images per profession, we found that the results matched our prediction. As expected, the engineering images mostly showed white men in their 30s to 50s, and the percentage of women was low. For the babysitter prompt, the AI mostly portrayed young, light-skinned women, with very few older people or people with disabilities shown. This tells us that AI is not showing real-world diversity; instead, it repeats patterns it learned from biased data online. These results matter because AI-generated images are being used in more and more fields. If they continue to reflect stereotypes, people might get a false or unfair idea of who can work in certain jobs. More research is needed to see how different AI systems behave and how they can be improved, and AI developers need to make sure their tools are trained on diverse data so the results are fairer and more accurate for everyone.
References
Yang, Y. (2025). Racial bias in AI-generated images. AI & Society, 40(2), 123–135. https://doi.org/10.1007/s00146-025-02282-1
Yetisgen-Yildiz, A., & Yetisgen, M. (2024). Bias in artificial intelligence for medical imaging: Fundamentals, detection, avoidance, mitigation, challenges, ethics, and prospects. Diagnostic and Interventional Radiology, 31(2), 75–84. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11880872/
Bias in AI-generated imagery of surgeons: An evaluation of DALL-E 3 outputs and demographic representation. (2024). Journal of Family Medicine and Primary Care. https://www.sciencedirect.com/science/article/pii/S0974322724005581