Remember that NYC Taxi data set that allowed you to see who visited a gentlemen’s club and which celebrity took a taxi where? Reddit user uluman now seems to have found a way to pick out Muslim taxi drivers from the set. He explains how:
Since Islam instructs followers to pray 5x daily at specific times, I wondered if one could identify devout Muslim hacks solely from their trip data. For drivers that do pray regularly, there are surely difficulties finding a place to park, wash up and pray at the exact time, but in many cases banding near prayer times is quite clear. I plotted a few examples.
Each image shows fares for one cabbie in 2013. Yellow=active fare (carrying passengers). A minute is 1 pixel wide; a day is 2 pixels tall. Blue stripes indicate the 5 daily prayer start times which vary with the sun’s position throughout the year.
- Taxi data: http://www.andresmh.com/nyctaxitrips/
- Prayer times: http://www.islamicfinder.org/prayerDetail.php?city=New%20York&state=NY&country=usa&lang=english
- Tools: Python / Python Imaging Library
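The plotting scheme described above can be sketched in a few lines of Python. This is only an illustration of the stated layout (one minute per pixel column, two pixel rows per day, yellow for active fares, blue for prayer start times), not uluman's actual code. The function names, the input shapes (a list of pickup/dropoff pairs and a dictionary of prayer minutes per day), the black background and the choice to draw the blue stripes on top of the fares are all assumptions made for this example.

```python
from datetime import datetime

MINUTES_PER_DAY = 24 * 60
ROWS_PER_DAY = 2            # "a day is 2 pixels tall"
BLACK = (0, 0, 0)           # assumed background colour
YELLOW = (255, 255, 0)      # active fare
BLUE = (0, 0, 255)          # prayer start time


def render_driver_year(trips, prayer_minutes, year=2013):
    """Build a pixel grid (list of rows of RGB tuples) for one driver.

    trips: iterable of (pickup, dropoff) datetime pairs (hypothetical input shape).
    prayer_minutes: dict mapping a 0-based day of the year to a list of
        minutes-of-day at which a prayer time starts (hypothetical input shape).
    """
    start = datetime(year, 1, 1)
    days = (datetime(year + 1, 1, 1) - start).days
    grid = [[BLACK] * MINUTES_PER_DAY for _ in range(days * ROWS_PER_DAY)]

    def paint(day, minute, color):
        # Ignore anything that falls outside the plotted year.
        if 0 <= day < days:
            for r in range(ROWS_PER_DAY):
                grid[day * ROWS_PER_DAY + r][minute] = color

    # One yellow pixel column per minute of every fare.
    for pickup, dropoff in trips:
        first = int((pickup - start).total_seconds() // 60)
        last = int((dropoff - start).total_seconds() // 60)
        for m in range(first, last + 1):
            paint(*divmod(m, MINUTES_PER_DAY), YELLOW)

    # Blue stripes are drawn last, so they stay visible over fares (assumption).
    for day, minutes in prayer_minutes.items():
        for minute in minutes:
            paint(day, minute, BLUE)
    return grid


def save_with_pil(grid, path):
    """Optionally write the grid to an image file; requires Pillow."""
    from PIL import Image
    img = Image.new("RGB", (len(grid[0]), len(grid)))
    img.putdata([pixel for row in grid for pixel in row])
    img.save(path)
```

A driver who regularly stops to pray would show up as dark gaps that follow the blue stripes as they drift through the year. Whether such gaps really indicate prayer, rather than meal breaks or shift changes, is exactly what the Reddit thread disputes.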
The result is an eerie inference of the religion (and devoutness) of a cab driver. Not everyone is convinced, as is evident from the Reddit thread.
(In)activity as sensitive personal data?
This plot raises some interesting legal questions, especially from an EU perspective. Under the EU Data Protection Directive (Directive 95/46/EC), the processing of personal data is subject to certain restrictions. For a special category of data considered sensitive, the regime is even stricter: the default rule is that such processing is prohibited (Art. 8 Directive). This special category includes ‘personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, and the processing of data concerning health or sex life’ (Art. 8 Directive, emphasis added). A question that comes to mind when looking at this plot is: if you can deduce someone’s religion from their (in)activity at certain times of the day, such as around prayer times, is that data then sensitive personal data?
Whatever the answer may be, it is clear that those releasing data sets should be careful when these include data on the (in)activity of people. Perhaps this is something that providers of open data and companies like Uber can take into account, seeing as the latter has plans to share data with the city of Boston.
By: Anna Berlee, PhD-researcher at Maastricht University, the Netherlands.