library(tidyverse)
library(ggiraph)
library(glue)
theme_set(theme_minimal(base_size = 7))
An Analysis by State, Ethnicity, Education, Occupation, Strata, and Age Group
Zahier Nasrudin
January 26, 2023
This post explores the gender pay gap in Malaysia across several dimensions: state, ethnicity, education, occupation, strata, and age group. Using data from the Department of Statistics Malaysia (DOSM), we analyze and visualize the differences in median salaries between men and women.
It is vital to note that this blog post is not intended to be political, and it does not account for other factors that may contribute to the gender pay gap. While our analysis may show a clear disparity between men’s and women’s pay in certain dimensions, many other complex factors also contribute to the gap.
Note that each graph is interactive, so you can hover over each bar/line chart to see the exact values and other details.
This blog post is based on publicly available data from the Department of Statistics Malaysia (DOSM).
## Links of dataset
link_salary <- c("https://storage.googleapis.com/dosm-public-economy/salaries_state_sex.csv",
"https://storage.googleapis.com/dosm-public-economy/salaries_industry_sex.csv",
"https://storage.googleapis.com/dosm-public-economy/salaries_ethnicity_sex.csv",
"https://storage.googleapis.com/dosm-public-economy/salaries_education_sex.csv",
"https://storage.googleapis.com/dosm-public-economy/salaries_occupation_sex.csv",
"https://storage.googleapis.com/dosm-public-economy/salaries_strata_sex.csv",
"https://storage.googleapis.com/dosm-public-economy/salaries_age_sex.csv")
## Read all
median_salary_all <- map_df(link_salary, ~ read.csv(.x) %>%
## Remove overall calculation from datasets
filter(sex != "overall", variable_en != "Overall") %>%
## Select only necessary columns
select(-c(variable_bm, variable,recipients)) %>%
## Put category whether its state, industry etc
mutate(Remark = str_to_title(str_extract(.x, "(?<=salaries_)[a-z]+"))))
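To see what the `Remark` column ends up holding, here is a minimal sketch that applies the same regular expression to two of the URLs above. The lookbehind `(?<=salaries_)` starts the match right after `salaries_`, and `[a-z]+` stops at the next underscore. The variable names `urls` and `remark` are only for illustration.

```r
library(stringr)

# Two of the dataset URLs from the list above
urls <- c(
  "https://storage.googleapis.com/dosm-public-economy/salaries_state_sex.csv",
  "https://storage.googleapis.com/dosm-public-economy/salaries_age_sex.csv"
)

# (?<=salaries_) anchors the match right after "salaries_";
# [a-z]+ then takes lowercase letters and stops at the next underscore
remark <- str_to_title(str_extract(urls, "(?<=salaries_)[a-z]+"))

print(remark)
```

This should print `"State"` and `"Age"`, which are the values later passed as `remark_var`.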
This function simplifies filtering and plotting multiple datasets that share the same format (State, Ethnicity, etc.):
plot_median_salary_ratio <- function(data, remark_var, ncol) {
# Create an interactive ggplot, faceted and colored by category level (e.g. each state)
data %>%
## Filter based on state, industry etc
filter(Remark == remark_var) %>%
## Plot graph
ggplot(aes(x = year, y = `Female/Male`, color = variable_en)) +
## Point chart
geom_point_interactive(aes(tooltip = `Label Median`,
data_id = variable_en)) +
## Line chart
geom_line(linewidth = 0.2) +
facet_wrap(variable_en ~ ., scales = "free_x", ncol = ncol) +
theme(legend.position = "none",
title = element_text(face = "bold"),
strip.text = element_text(face = "bold")) +
xlab("Year") +
ylab("Female to Male Median Salary Ratio") +
labs(caption = "Data from DOSM. Graph by Zahier Nasrudin") +
# Title
ggtitle(paste0("Ratio of Female to Male Median Salary by Year and by ", remark_var, " in Malaysia")) +
# add a horizontal line at the ratio of 1
geom_hline(yintercept = 1, linetype = "dashed", color = "black", linewidth = 0.2)
}
In the first step of the analysis, the ratio of median salaries between female and male workers in Malaysia was calculated. This was done by dividing the median salary for female workers by the median salary for male workers in each year (2010-2021). The resulting ratio provides a measure of the gender pay gap in Malaysia, where values below 1 indicate that women earn less than men.
### Make it wider
median_salary_all_wider <- median_salary_all %>%
pivot_wider(names_from = c(sex),
values_from = c(mean, median))
### Calculate ratio
median_salary_all_wider <- median_salary_all_wider %>%
mutate(`Female/Male` = round(median_female / median_male, 2),
variable_en = str_to_upper(variable_en),
`Label Median` = glue("\nMedian Female: {median_female}\nMedian Male: {median_male}\nRatio: {`Female/Male`} ({year})"))
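As a small worked example of the reshape-then-divide step, the sketch below runs the same `pivot_wider()` and ratio calculation on a two-row toy table. The salary figures and the names `toy` and `toy_wide` are made up for illustration and are not DOSM values.

```r
library(dplyr)
library(tidyr)

# Made-up figures for illustration only (not DOSM values)
toy <- tibble(
  year = c(2020, 2020),
  variable_en = c("Example", "Example"),
  sex = c("female", "male"),
  mean = c(2100, 2400),
  median = c(1800, 2000)
)

toy_wide <- toy %>%
  # One row per year/category; columns become mean_female, mean_male,
  # median_female, median_male
  pivot_wider(names_from = sex, values_from = c(mean, median)) %>%
  # Female-to-male ratio of the median salary
  mutate(`Female/Male` = round(median_female / median_male, 2))

print(toy_wide)
```

With these toy numbers the ratio is 1800 / 2000 = 0.9, meaning the median woman in this made-up category earns 90% of what the median man earns.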
The graph below displays the ratio of female to male median salary in Malaysia by state from 2010 to 2021. It shows how the pay gap between men and women has changed across different states, which helps us identify which states have a wider or narrower gender pay gap and how that gap has evolved over the years.
median_graph_state <- plot_median_salary_ratio(data = median_salary_all_wider,
remark_var = "State",
ncol = 3)
girafe(ggobj = median_graph_state,
width_svg = 8, height_svg = 6,
options = list(
opts_hover_inv(css = "opacity:0.1;"),
opts_hover(css = "stroke-width:2;"),
opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-state-download")
))
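Every chart in this post is rendered with the same `girafe()` settings, so the repetition could be folded into a small wrapper. The sketch below is one way to do that. `render_interactive()` is a hypothetical helper that is not part of the original post or of ggiraph, and it assumes `width_svg` and `height_svg` are passed as arguments of `girafe()` itself.

```r
library(ggplot2)
library(ggiraph)

# Hypothetical helper (not in the original post): bundles the girafe()
# settings shared by every chart, so only the plot and PNG name vary
render_interactive <- function(plot, pngname, width_svg = 8, height_svg = 6) {
  girafe(
    ggobj = plot,
    width_svg = width_svg,
    height_svg = height_svg,
    options = list(
      opts_hover_inv(css = "opacity:0.1;"),
      opts_hover(css = "stroke-width:2;"),
      opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = pngname)
    )
  )
}

# Usage with a toy plot built from made-up points
toy_plot <- ggplot(data.frame(x = 1:3, y = c(0.8, 0.9, 1.0)), aes(x, y)) +
  geom_point_interactive(aes(tooltip = y, data_id = x))

widget <- render_interactive(toy_plot, pngname = "toy-download")
```

With such a helper, a call like `render_interactive(median_graph_state, "median-state-download")` would stand in for the longer `girafe()` block.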
In this section, we analyze the gender pay gap by strata, which refers to the level of urbanization in Malaysia (urban and rural).
median_graph_strata <- plot_median_salary_ratio(data = median_salary_all_wider,
remark_var = "Strata",
ncol = 3)
girafe(ggobj = median_graph_strata,
width_svg = 8, height_svg = 6,
options = list(
opts_hover_inv(css = "opacity:0.1;"),
opts_hover(css = "stroke-width:2;"),
opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-strata-download")
))
It is interesting to note that the ratio of median female to male salary in Malaysia is consistently below 1 across all levels of strata and over the years. To visualize this trend, we created a bar chart that shows the ratio for each level of strata side by side over the years.
median_graph_strata2 <- median_salary_all_wider %>%
filter(Remark == "Strata") %>%
ggplot(aes(x = year, y = `Female/Male`, fill = variable_en)) +
geom_bar_interactive(stat = "identity", position = "dodge",
aes(tooltip = `Label Median`,
data_id = year)) +
labs(x = "Year", y = "Ratio (Median Salary Female / Median Salary Male)",
title = "Gender Pay Ratio by Strata in Malaysia",
subtitle = "Values below 1 indicate women are paid less than men",
fill = "Strata") +
theme(legend.position = "bottom",
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(face = "italic")) +
scale_x_continuous(breaks = unique(median_salary_all_wider$year))
girafe(ggobj = median_graph_strata2,
width_svg = 8, height_svg = 6,
options = list(
opts_hover_inv(css = "opacity:0.1;"),
opts_hover(css = "stroke-width:2;"),
opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-strata-download2")
))
Next, we explore the gender pay ratio by ethnic group in Malaysia. The dataset prepared by DOSM categorizes ethnicity into six levels: Bumiputera, Chinese, Citizen, Indian, Non-citizen, and Others. It is vital to note that the ethnic levels used in this analysis follow the data provided by DOSM and may not necessarily align with individuals’ self-identified ethnicities.
median_graph_ethnic <- plot_median_salary_ratio(data = median_salary_all_wider,
remark_var = "Ethnicity",
ncol = 3)
girafe(ggobj = median_graph_ethnic,
width_svg = 8, height_svg = 6,
options = list(
opts_hover_inv(css = "opacity:0.1;"),
opts_hover(css = "stroke-width:2;"),
opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-ethnic-download")
))
We also examined the gender pay gap by education level. There are four categories: no formal education, primary education, secondary education, and tertiary education.
median_graph_education <- plot_median_salary_ratio(data = median_salary_all_wider,
remark_var = "Education",
ncol = 2)
girafe(ggobj = median_graph_education,
width_svg = 8, height_svg = 6,
options = list(
opts_hover_inv(css = "opacity:0.1;"),
opts_hover(css = "stroke-width:2;"),
opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-education-download")
))
For the occupation analysis, we take a slightly different approach. Instead of using the ratio of median salaries, we calculate the pay gap as the difference between the median salary for women and the median salary for men, divided by the median salary for men. In simpler terms, we calculate (median salary female / median salary male) - 1. This gives a percentage difference, with negative values indicating that women earn less than men and positive values indicating the opposite. We then plot this pay gap by occupation over the years:
median_occupation <- median_salary_all_wider %>%
filter(Remark == "Occupation") %>%
mutate(pay_gap = round(median_female/median_male-1, 3),
`Label Median Diff` = glue("\nMedian Female: {median_female}\nMedian Male: {median_male}\nDiff: {pay_gap * 100}% ({year})"))
median_graph_occupation1 <- median_occupation %>%
ggplot(aes(x = year, y = pay_gap, fill = factor(sign(pay_gap)))) +
geom_col_interactive(position = "dodge", aes(tooltip = `Label Median Diff`,
data_id = variable_en)) +
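## Named values keep each color tied to its sign (-1, 0, 1), even when a gap is exactly 0
scale_fill_manual(values = c("-1" = "red", "0" = "grey50", "1" = "blue")) +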
labs(x = "Year", y = "Pay Gap",
title = "Gender Pay Gap by Occupation in Malaysia",
subtitle = "Negative values indicate women are paid less than men",
fill = "Pay Gap") +
facet_wrap(variable_en ~., ncol = 2) +
theme(legend.position = "none",
axis.text.x = element_text(angle = 90, vjust = 0.5),
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(face = "italic")) +
scale_x_continuous(breaks = unique(median_occupation$year)) +
scale_y_continuous(labels = scales::percent)
girafe(ggobj = median_graph_occupation1,
width_svg = 8, height_svg = 6,
options = list(
opts_hover_inv(css = "opacity:0.1;"),
opts_hover(css = "stroke-width:2;"),
opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-occupation-download1")
))
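The pay gap and the ratio carry the same information, since the gap is simply the ratio minus 1. A quick numeric check, using made-up medians rather than DOSM values:

```r
# Made-up medians for illustration only (not DOSM values)
median_female <- 1800
median_male <- 2000

# Measure used in the line charts
ratio <- round(median_female / median_male, 2)

# Measure used in the occupation bar chart
pay_gap <- round(median_female / median_male - 1, 3)

print(ratio)
print(pay_gap)
print(paste0(pay_gap * 100, "%"))
```

Here the ratio is 0.9 and the pay gap is -0.1, so the same toy category reads as "women earn 90% of men's median" in one chart and "10% less" in the other.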
For those who favor the line chart (Ratio):
median_graph_occupation <- plot_median_salary_ratio(data = median_salary_all_wider,
remark_var = "Occupation",
ncol = 2)
girafe(ggobj = median_graph_occupation,
width_svg = 8, height_svg = 6,
options = list(
opts_hover_inv(css = "opacity:0.1;"),
opts_hover(css = "stroke-width:2;"),
opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-occupation-download")
))
For the age group analysis, we look at the median salaries for both genders across different age groups. The data is divided into nine age groups: 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, and 55-59. The ratio values are then plotted on the graph below to visualize the changes in the pay gap over time. The aim is to identify any trends or patterns in the pay gap across different age groups over the years.
median_graph_age <- plot_median_salary_ratio(data = median_salary_all_wider,
remark_var = "Age",
ncol = 3)
girafe(ggobj = median_graph_age,
width_svg = 8, height_svg = 6,
options = list(
opts_hover_inv(css = "opacity:0.1;"),
opts_hover(css = "stroke-width:2;"),
opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-age-download")
))
Lastly, the chart below displays the ratio by industry over the years. Again, a ratio of 1 indicates that men and women are paid equally, while a ratio below 1 indicates that women are paid less than men.
median_graph_industry <- plot_median_salary_ratio(data = median_salary_all_wider,
remark_var = "Industry",
ncol = 2)
girafe(ggobj = median_graph_industry,
width_svg = 8, height_svg = 6,
options = list(
opts_hover_inv(css = "opacity:0.1;"),
opts_hover(css = "stroke-width:2;"),
opts_toolbar(position = "bottom", delay_mouseout = 3000, pngname = "median-salary-industry-download")
))
The interactive visualizations presented in this blog post provide a comprehensive overview of the pay gap trends by various factors. While this analysis sheds light on the extent of the gender pay gap in Malaysia, it is important to note that other factors beyond the scope of this analysis may contribute to the pay gap, such as differences in work experience and job preferences. Nonetheless, this analysis serves as a starting point for further exploration and discussion on how to close the gender pay gap in Malaysia.