The tidyverse team recently completed the 1.0.0 release of dplyr, which was a pretty big deal and included a bunch of new features. One of the things I really enjoyed was that they wrote a series of blog posts describing the new features in the release. This was great because we got to see what was coming up, and because people tried the features out and gave feedback. Then the tidyverse team listened, and changed behaviour based on that feedback from the community.
Isn’t that great?
Let’s celebrate something from the tidyverse today: rowwise. This function has actually been around for a while, but for some reason I never really used it. A student recently had an issue where they had data like this:
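As a stand-in for the student's data, here is a small made-up tibble. The object name income and the specific range values are my own assumptions for illustration; the only thing taken from the problem itself is that there is an income_range column holding ranges stored as text.

```r
library(tibble)

# Hypothetical example data: each income range is stored as a single
# character string, with the lower and upper bounds joined by a dash.
income <- tibble(
  income_range = c("0-74", "75-145", "150-325", "326-500")
)

income
```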
They wanted to calculate the median of income_range.
This presents an interesting problem, with a few steps:
- Separate the range values into two columns.
- Calculate the median of each of those pairs of numbers.
We can get ourselves into a better position by separating out income_range into two columns, lower and upper, and converting the contents to numbers. We can use separate from tidyr. It is kind of magical. While you can specify the exact thing that separates the numbers, separate has a nice bit of magic: by default it splits on any run of non-alphanumeric characters, so a dash between two numbers just works.
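Here is a minimal sketch of that step, using made-up data (the income object and its values are assumptions, not the student's real data). It relies on separate's default separator and on convert = TRUE to turn the resulting text into numbers.

```r
library(tibble)
library(tidyr)

# Hypothetical example data, with ranges stored as text.
income <- tibble(
  income_range = c("0-74", "75-145", "150-325", "326-500")
)

# No `sep` is given, so separate() falls back to its default and splits
# on the dash. `convert = TRUE` converts the new columns from text to numbers.
income_sep <- separate(
  income,
  col = income_range,
  into = c("lower", "upper"),
  convert = TRUE
)

income_sep
```

If the separator were something ambiguous, passing sep explicitly (for example sep = "-") would be the safer choice.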
So now we have a lower and an upper value for each range, and we want to calculate the median of these.
This…gets a little bit tricky.
Your first instinct might be to calculate the median directly from the lower and upper columns:
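A sketch of what that first attempt might look like, again on made-up data (income_sep, the column name med, and the values are assumptions for illustration):

```r
library(dplyr)

# Hypothetical data, already separated into numeric lower/upper columns.
income_sep <- tibble(
  lower = c(0, 75, 150, 326),
  upper = c(74, 145, 325, 500)
)

# Inside mutate(), `lower` and `upper` are whole columns, so c(lower, upper)
# is one vector of all eight values and median() returns a single number.
naive <- income_sep %>%
  mutate(med = median(c(lower, upper)))

naive$med
# 147.5 147.5 147.5 147.5
```

Every row gets the same value, 147.5, which is the median of all eight numbers pooled together.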
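But this doesn’t give us what we want. Without any row-wise grouping, lower and upper are treated as whole columns, so it just gives us one median of all the values combined, repeated on every row.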
Anyway, how do we solve this?
We can now call rowwise() and calculate the median based on lower and upper, and it will consider each row and take the median of those two numbers:
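A minimal sketch of the row-wise version, using the same kind of made-up data (income_sep, med, and the values are assumptions for illustration):

```r
library(dplyr)

# Hypothetical data, already separated into numeric lower/upper columns.
income_sep <- tibble(
  lower = c(0, 75, 150, 326),
  upper = c(74, 145, 325, 500)
)

# rowwise() makes each row its own group, so inside mutate() `lower` and
# `upper` are single values and median() sees just that row's pair.
by_row <- income_sep %>%
  rowwise() %>%
  mutate(med = median(c(lower, upper))) %>%
  ungroup()  # drop the row-wise grouping once we are done

by_row$med
# 37.0 110.0 237.5 413.0
```

The ungroup() at the end is optional here, but it stops later steps from silently running one row at a time. For just two columns, (lower + upper) / 2 gives the same numbers without rowwise(); the row-wise approach earns its keep when the per-row calculation is something that is not already vectorised.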
And I think that’s pretty neat.
Thanks, tidyverse team!