This study investigates occupational bias in Chinese large language models (C-LLMs) by systematically analyzing the associations between occupation and variables such as gender, age, education, and region. Occupation showed statistically significant associations with gender, age, and educational background, but not with region, indicating that occupational stereotypes in model outputs are shaped by these attributes. An evaluation of gender bias in model outputs revealed preferential representation of particular genders and the underrepresentation of certain demographic groups. The age distribution showed a tendency to favor younger individuals in certain fields, while the distribution of educational backgrounds reflected a bias toward professions requiring higher education. Regional data, moreover, showed uneven coverage and application of C-LLMs, with significant bias toward economically developed regions. The study underscores the systemic nature of these biases, tracing their origins across model training, evaluation, and deployment, and emphasizes the need for more diverse training datasets to ensure fair and representative outputs across social dimensions.
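The abstract does not name the statistical test behind these associations. As a minimal sketch, assuming a Pearson chi-square test of independence on a contingency table of categorical counts (one common choice for occupation-by-gender data), the computation could look like the following. It uses only the standard library, computing the p-value from the regularized incomplete gamma function (series for small arguments, continued fraction otherwise). The function names and the counts in the usage lines are hypothetical and purely illustrative, not the study's data or method.

```python
import math
from typing import Sequence

EPS = 1e-14       # convergence tolerance for the gamma-function expansions
FPMIN = 1e-300    # guards against division by zero in Lentz's method
MAX_ITER = 10_000


def _lower_series(a: float, z: float) -> float:
    """Regularized lower incomplete gamma P(a, z) via its power series (used when z < a + 1)."""
    term = 1.0 / a
    total = term
    for n in range(1, MAX_ITER):
        term *= z / (a + n)
        total += term
        if abs(term) < abs(total) * EPS:
            break
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def _upper_cf(a: float, z: float) -> float:
    """Regularized upper incomplete gamma Q(a, z) via a continued fraction (used when z >= a + 1)."""
    b = z + 1.0 - a
    c = 1.0 / FPMIN
    d = 1.0 / b
    h = d
    for i in range(1, MAX_ITER):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < FPMIN:
            d = FPMIN
        c = b + an / c
        if abs(c) < FPMIN:
            c = FPMIN
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < EPS:
            break
    return math.exp(-z + a * math.log(z) - math.lgamma(a)) * h


def chi2_sf(x: float, dof: int) -> float:
    """P(X >= x) for X ~ chi-square with `dof` degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    if z < a + 1.0:
        return max(0.0, 1.0 - _lower_series(a, z))
    return _upper_cf(a, z)


def chi_square_independence(table: Sequence[Sequence[float]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence (no continuity correction).

    Returns (statistic, degrees_of_freedom, p_value) for an r x c table of counts.
    """
    rows, cols = len(table), len(table[0])
    if rows < 2 or cols < 2:
        raise ValueError("need at least a 2 x 2 table")
    row_totals = [sum(r) for r in table]
    col_totals = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must have a nonzero total")
    total = sum(row_totals)
    stat = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_totals[i] * col_totals[j] / total  # count under independence
            stat += (table[i][j] - expected) ** 2 / expected
    dof = (rows - 1) * (cols - 1)
    return stat, dof, chi2_sf(stat, dof)


# Hypothetical counts of model-generated profiles (illustrative only, not study data):
# rows are occupations, columns are (female, male).
counts = [
    [42, 8],   # e.g. "nurse"
    [9, 41],   # e.g. "engineer"
    [27, 23],  # e.g. "teacher"
]
stat, dof, p = chi_square_independence(counts)
print(f"chi2={stat:.2f}, dof={dof}, p={p:.3g}")
```

A small p-value would indicate that the occupation and gender columns are unlikely to be independent; the same function applies unchanged to occupation-by-age, occupation-by-education, or occupation-by-region tables. Expected cell counts below about 5 make the chi-square approximation unreliable, in which case an exact test is the safer choice.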