Welcome to my website, a mindmap of the thought processes I go through each day.
Here you can find a portfolio of my research, bioinformatics projects, Python (the programming language) and even some scouting!
The site is built on a wiki, so there is no clear hierarchy. Instead, follow the links through the maze or, if in doubt, consult the sitemap.
Come back regularly!
Plotting Likert-scale data with ggplot2, tidyverse, and lemon.
The following graph was published in Avisen.dk on March 22nd, 2018.
How much do you agree or disagree with the following statement:
“Vi har som samfund råd til at give de offentligt ansatte en mærkbar lønforhøjelse under forhandlingerne om nye overenskomster?”
This translates to:
“As a society, we can afford to give public servants a considerable pay rise during the negotiations on the new collective (union) agreements?”
Source: Wilke for Avisen.dk (March 2018) and Analyse Danmark for Avisen.dk (October 2017). Note: 1,014 respondents took part in the Wilke survey, while 2,680 took part in the survey for Analyse Danmark.
The article's aim was to compare the shift in the proportion of respondents agreeing between October 2017 and March 2018. I thought the given visualisation could be improved upon. It should show not only the growing proportion of agreeing respondents, but also that the increase was drawn from every other response option.
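To check the claim that the gain came from every other option, it is enough to subtract one survey from the other. This is a minimal base-R sketch. It assumes the rounded percentages used in the plotting code are the only data available, and the `change` column is my own addition:

```r
# Rounded percentages per response option for the two surveys
df <- data.frame(
  stat = c('enig', 'hverken-eller', 'uenig', 'ved-ikke'),
  Marts_2018 = c(66, 14, 15, 5),
  October_2017 = c(50, 18, 20, 12),
  stringsAsFactors = FALSE)

# Change in percentage points from October 2017 to March 2018
df$change <- df$Marts_2018 - df$October_2017
print(df[, c('stat', 'change')])
```

'enig' gains 16 points, while 'hverken-eller', 'uenig' and 'ved-ikke' lose 4, 5 and 7 points respectively. Both surveys sum to 100 %, so the changes sum to zero.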
I started with diverging stacked bars, laid out horizontally, as described in e.g. Data Revelations (and other posts I cannot recall). The issue with this approach is the proportion of 'Don't know' answers (“Ved ikke”). Should they be excluded? Should they be bundled with the neutrals (“Hverken-eller”, i.e. 'Neither/nor')? In this instance, I opted to set the 'Don't know' answers aside rather than exclude them. The comparison was also partly about agreeing versus disagreeing respondents, so the neutrals were grouped with the disagreeing ones. This puts the point of divergence in the centre of the plot. Keeping the same colours as the source, the result is shown below:
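The grouping described above can be written as a lookup table and summed per survey. It puts agree on one side, neutrals plus disagree on the other, and sets 'Don't know' aside. This is a small base-R sketch, again assuming the rounded percentages from the plotting code. The English group names are hypothetical labels of my own:

```r
df <- data.frame(
  stat = c('enig', 'hverken-eller', 'uenig', 'ved-ikke'),
  Marts_2018 = c(66, 14, 15, 5),
  October_2017 = c(50, 18, 20, 12),
  stringsAsFactors = FALSE)

# Neutrals join the disagreeing side; 'ved-ikke' (don't know) is its own group
group <- c('enig' = 'agree',
           'hverken-eller' = 'neutral or disagree',
           'uenig' = 'neutral or disagree',
           'ved-ikke' = "don't know")
df$group <- unname(group[df$stat])

# Sum the percentages within each group, separately for each survey
totals <- aggregate(cbind(Marts_2018, October_2017) ~ group, data = df, FUN = sum)
print(totals)
```

In March 2018 the agreeing side outweighs the neutral-or-disagreeing side by 66 to 29, compared with 50 to 38 in October 2017.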
It uses less space and, I would argue, is easier to digest.
To improve it further, the horizontal axis (the y-axis, since the coordinates are flipped) should be split so that its scale restarts at 0 between the black and dark grey bars, and again between the dark grey and light grey bars.
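The plotting code hand-types the block offsets (18 and 18 + 20) and the axis breaks. These can also be derived from the data. This is a base-R sketch, assuming each block should start where the widest bar of the previous block ends, which is what 18 and 20 (the largest 'hverken-eller' and 'uenig' values) appear to encode. The names `step`, `widest`, `offsets` and so on are my own:

```r
df <- data.frame(
  stat = c('enig', 'hverken-eller', 'uenig', 'ved-ikke'),
  Marts_2018 = c(66, 14, 15, 5),
  October_2017 = c(50, 18, 20, 12),
  stringsAsFactors = FALSE)

step <- 15  # distance between axis labels, in percentage points

# Widest bar of each response option across both surveys
widest <- setNames(pmax(df$Marts_2018, df$October_2017), df$stat)

# Blocks to the right of zero, in drawing order; each block starts
# where the widest bar of the previous block ends
right <- c('hverken-eller', 'uenig', 'ved-ikke')
offsets <- setNames(cumsum(c(0, head(widest[right], -1))), right)

# Left side ('enig'): breaks at -15, -30, ..., labelled with positive numbers
left_breaks <- -rev(seq(step, widest[['enig']], by = step))

# Right side: ticks restart at 0 inside each block
right_ticks <- lapply(right, function(s) seq(0, widest[[s]], by = step))
right_breaks <- unlist(Map(`+`, offsets, right_ticks), use.names = FALSE)
right_labels <- unlist(right_ticks, use.names = FALSE)

breaks <- c(left_breaks, right_breaks)
labels <- c(-left_breaks, right_labels)
print(offsets)
print(rbind(breaks, labels))
```

The resulting `offsets` could replace the numbers in the first `case_when()`, and `breaks`/`labels` could be passed to `scale_y_continuous()`. Unlike the hand-written vectors, this version does not produce a break at 53, because no 'ved-ikke' bar reaches 15 %.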
library(ggplot2)  # ggplot(), geom_segment(), scales and theme()
library(dplyr)
library(tibble)
library(tidyr)
library(lemon)    # coord_flex_flip()

# Percentages per response option for each survey.
# enig = agree, hverken-eller = neither/nor, uenig = disagree, ved-ikke = don't know
df <- tribble(
  ~stat,           ~Marts_2018, ~October_2017,
  'enig',          66,          50,
  'hverken-eller', 14,          18,
  'uenig',         15,          20,
  'ved-ikke',       5,          12)

(p <- df %>%
  # Long format: one row per response option and survey
  gather(year, val, Marts_2018, October_2017) %>%
  # Start of each block right of zero: 18 and 20 are the largest
  # 'hverken-eller' and 'uenig' values, so the blocks never overlap
  mutate(ymin = case_when(stat == 'uenig' ~ 18,
                          stat == 'ved-ikke' ~ 18 + 20,
                          TRUE ~ 0),
         year = c(October_2017 = 'Oktober 2017', Marts_2018 = 'Marts 2018')[year]) %>%
  # 'enig' is drawn left of zero, everything else right of its block start
  mutate(y = case_when(stat == 'enig' ~ -val,
                       TRUE ~ val + ymin)) %>%
  mutate(stat = factor(stat, levels = c('enig', 'hverken-eller', 'uenig', 'ved-ikke'))) %>%
  ggplot(aes(x = year, xend = year, y = y, yend = ymin, colour = stat)) +
  # Thick segments act as bars (use size = 10 on ggplot2 < 3.4.0)
  geom_segment(linewidth = 10) +
  coord_flex_flip() +
  # Hand-placed breaks; labels restart at 0 for each block
  scale_y_continuous('Procent', position = 'top',
                     breaks = c(-60, -45, -30, -15, 0, 15, 18, 33, 38, 53),
                     labels = c(60, 45, 30, 15, 0, 15, 0, 15, 0, 15)) +
  # Colours as in the Avisen.dk original; legend labels kept in Danish
  scale_colour_manual(values = c(enig = rgb(235, 96, 29, maxColorValue = 255),
                                 'hverken-eller' = gray(0.15),
                                 uenig = grey(0.5),
                                 'ved-ikke' = grey(0.8)),
                      labels = c(enig = 'Helt eller delvist enig',
                                 'hverken-eller' = 'Hverken-eller',
                                 uenig = 'Helt eller delvist uenig',
                                 'ved-ikke' = 'Ved ikke')) +
  theme(panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks = element_blank(),
        legend.title = element_blank(),
        legend.position = 'bottom',
        panel.border = element_blank(),
        axis.line.x = element_line()) +
  labs(caption = 'Wilke for Avisen.dk (marts 2018, n=1.014) og Analyse Danmark (oktober 2017, n=2.680)\nhttps://www.avisen.dk/danskerne-i-ny-maaling-der-er-raad-til-loenstigninge_490703.aspx'))