Category: data

Michael Schmidt Interview with Trump

This is just a quick post, but this was too funny not to post.

Frankly there is absolutely no collusion…Virtually every Democrat has said there is no collusion. There is no collusion…I think it’s been proven that there is no collusion…I can only tell you that there is absolutely no collusion…There’s been no collusion…There was no collusion. None whatsoever…everybody knows that there was no collusion. I saw Dianne Feinstein the other day on television saying there is no collusion [note: not true]…The Republicans, in terms of the House committees, they come out, they’re so angry because there is no collusion…there was collusion on behalf of the Democrats. There was collusion with the Russians and the Democrats. A lot of collusion…There was tremendous collusion on behalf of the Russians and the Democrats. There was no collusion with respect to my campaign…But there is tremendous collusion with the Russians and with the Democratic Party…I watched Alan Dershowitz the other day, he said, No. 1, there is no collusion, No. 2, collusion is not a crime, but even if it was a crime, there was no collusion. And he said that very strongly. He said there was no collusion…There is no collusion, and even if there was, it’s not a crime. But there’s no collusion…when you look at all of the tremendous, ah, real problems [Democrats] had, not made-up problems like Russian collusion.

If you are wondering what I am getting at, the word “collusion” is used very often. How often, you may ask?

If you want the code, I have posted it as a gist.
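For anyone who just wants the gist of the approach, here is a minimal Python sketch of a word count like this. It is not the code from the gist; the `word_counts` helper and the short `sample` string are made up for illustration, and the sample is only a few sentences in the style of the quote, not the full transcript.

```python
import re
from collections import Counter


def word_counts(text):
    """Return a Counter of the lowercase words found in text."""
    # Lowercase first so "Collusion" and "collusion" are counted together.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical short excerpt, used only to show the counting step.
sample = "There is no collusion. There was no collusion. Collusion is not a crime."
counts = word_counts(sample)

# Print the most frequent words first.
for word, n in counts.most_common(3):
    print(word, n)
```

Running the same function over the full interview text would give the real count for “collusion”.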

Chicago Murder Rate and Leaflet for R

Recently got hooked on Leaflet for R, which is an amazing library that lets you generate interactive JavaScript maps from within R. Still have a bit to explore, but the way it handles layering on a map is really nice. I plan on writing a few UI extensions as well, since there is some interactivity I would love to add (sliders without server calls being the biggest).

Chicago Murder Rate by Year

Everyone was talking about the Chicago crime rates, so I figured I would give the dataset a go, and came up with the above animation. Each frame shows the subset of the murder data for that frame’s year. Years go from 2001 to 2016, and I will open up the code once I improve the UI.
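Until then, the frame-per-year idea can be sketched roughly as below. This is Python rather than the R used for the map, the `frames_by_year` helper and the field names (`year`, `lat`, `lon`) are assumptions, and the three records are invented placeholders, not rows from the Chicago dataset.

```python
from collections import defaultdict


def frames_by_year(records):
    """Group point records into one list of (lat, lon) pairs per year."""
    frames = defaultdict(list)
    for rec in records:
        frames[rec["year"]].append((rec["lat"], rec["lon"]))
    # Sort by year so the frames play back in chronological order.
    return dict(sorted(frames.items()))


# Invented placeholder records, only to show the grouping.
records = [
    {"year": 2002, "lat": 41.88, "lon": -87.63},
    {"year": 2001, "lat": 41.75, "lon": -87.65},
    {"year": 2002, "lat": 41.90, "lon": -87.70},
]

frames = frames_by_year(records)
for year, points in frames.items():
    print(year, len(points))
```

Each entry in `frames` would then be drawn as its own map layer and saved as one frame of the animation.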

Politics vs The_Donald Subreddit Domain Linkage

Ran this through Google’s BigQuery on the two largest political subreddits. The two queries were:

SELECT domain,count(*) as Count FROM [fh-bigquery:reddit_posts.2017_01] WHERE subreddit = "politics" GROUP BY domain ORDER BY Count DESC
SELECT domain,count(*) as Count FROM [fh-bigquery:reddit_posts.2017_01] WHERE subreddit = "The_Donald" GROUP BY domain ORDER BY Count DESC

Make your own conclusions, but the results are pretty telling for the diversity of posts and their content.
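If you want a number to go with the eyeball test, one rough way to compare the two result sets is to look at how many distinct domains each has and how much of the total the top few domains take up. The sketch below assumes the query output has been exported as (domain, count) pairs; `diversity_summary` is a made-up helper and the rows are placeholders, not real query results.

```python
def diversity_summary(rows, top_n=3):
    """Summarise (domain, count) rows: distinct domains, posts, top-N share."""
    total = sum(count for _, count in rows)
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    top_share = sum(count for _, count in ranked[:top_n]) / total
    return {"domains": len(rows), "posts": total, "top_share": top_share}


# Placeholder rows standing in for the two query results.
rows_a = [("a.example", 50), ("b.example", 30), ("c.example", 20)]
rows_b = [("a.example", 90), ("b.example", 10)]

print(diversity_summary(rows_a, top_n=1))
print(diversity_summary(rows_b, top_n=1))
```

A higher `top_share` means a subreddit leans more heavily on a handful of domains.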

Reddit Account Ages

This is the data which resulted from scraping the top 1,000 “hot” posts from various subreddits. As you can see, The_Donald posts originate heavily from new accounts. You could explain the discrepancy as “political season”, but SandersForPresident doesn’t seem to have that issue. Reddit was predominantly left beforehand, so you have to wonder why people are starting to gravitate toward Reddit for their pro-Trump discussions.
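The account-age step itself is simple. Here is a minimal Python sketch, assuming each poster's account creation time is available as a Unix timestamp; `account_ages_days` is a hypothetical helper, the snapshot date is arbitrary, and the three timestamps are made up rather than scraped.

```python
from datetime import datetime, timezone
from statistics import median


def account_ages_days(created_utc, now):
    """Convert account creation timestamps (Unix seconds) to ages in days."""
    return [
        (now - datetime.fromtimestamp(ts, tz=timezone.utc)).days
        for ts in created_utc
    ]


# Arbitrary snapshot date and made-up creation times for three accounts.
now = datetime(2017, 1, 1, tzinfo=timezone.utc)
created = [
    datetime(2016, 12, 1, tzinfo=timezone.utc).timestamp(),
    datetime(2016, 1, 1, tzinfo=timezone.utc).timestamp(),
    datetime(2012, 1, 1, tzinfo=timezone.utc).timestamp(),
]

ages = account_ages_days(created, now)
print(ages, median(ages))
```

Doing this per subreddit and comparing the distributions is what shows whether one community skews toward new accounts.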

Seems like something is up. An explanation given by a pro-Trump person was “conservatives get banned”, but they sure don’t get banned from The_Donald, and it is against policy to create new accounts to access subreddits you have been banned from. Heck, when The_Donald banned me they explicitly stated that. Anyhow, enjoy.

The Gas Tax Myth

Whipped this up based on the 2011-2013 ODOT data because someone was complaining about cyclists not paying the gas tax where they lived (Portland). Whelp, taking into account the road damage (a bike does 1/9,600 the damage of a car, and roads cost a pretty penny to repair) and that many cyclists have cars, if not at least a license (89%), it is clear that cyclists subsidize motorists. Oh yeah, pollution. It is 2016, can we stop hating cyclists?

Using Police Data to Track Citizens

While looking for city data to keep up with my data analysis and visualization skills I found a disturbing data set.

The Minneapolis Police Department uses automatic license plate recognition to help aid in solving crimes in real-time. They also maintain this data about innocent citizens for months after it was collected. These data were obtained legally, via the Minnesota Government Data Practices Act. They have been pre-processed and deidentified. This file contains over 800k plate readings. –

I decided to check it out and see what a limited exploratory analysis could determine from the top 20 license plate hits. I will start with the biggest issue (in my opinion).

Determine When a Person is Home

Here is a map of the top 20 license plates and the data points for when the police scanner ran their plate.

License Plates Aggregated

Now if we look at a time when people are most likely at work (say 11am), we can grab the hashes of the license plates whose owners likely work at specific locations.

License Plates at 11am

You can see above that the purple person is almost always SW of the cities. The lat/long is 44.935696,-93.38466, which corresponds to the Home Choice parking lot in Hopkins, MN. Where will they be 12 hours from then?

License Plates at 11pm

Above you can see they pretty much always return to Dinkytown, Minneapolis. Using this method you could potentially:

  • Determine whether or not a person is likely to be home at a certain time
  • Determine where someone likely lives
  • Determine where someone likely works
  • Determine if your employee took their car out when they called in sick (two data points already known)
  • Determine what someone is up to on their time off (if work and home locations known)

Obviously, the more data points there are, the more reliable these inferences become, and this data set covers only a few months, with much of the data tagged with 0/0 for longitude and latitude.
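The method boils down to dropping the 0/0 readings, keeping only the readings from a given hour, and taking the most common spot per plate hash. Here is a rough Python sketch of that idea; it is not the code used for the maps, and the tuple layout (plate hash, hour, lat, lon), the `likely_location_at_hour` helper, and the sample readings are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def likely_location_at_hour(reads, hour, precision=3):
    """Return the most frequent rounded (lat, lon) per plate at a given hour.

    reads: iterable of (plate_hash, hour_of_day, lat, lon) tuples.
    """
    spots = defaultdict(Counter)
    for plate, read_hour, lat, lon in reads:
        if lat == 0 and lon == 0:
            continue  # skip readings with no usable position
        if read_hour != hour:
            continue
        # Round so nearby readings in the same lot count as one spot.
        spots[plate][(round(lat, precision), round(lon, precision))] += 1
    return {plate: c.most_common(1)[0][0] for plate, c in spots.items()}


# Made-up readings for a single hypothetical plate hash.
reads = [
    ("abc123", 11, 44.935696, -93.38466),
    ("abc123", 11, 44.935701, -93.38470),
    ("abc123", 11, 44.980000, -93.23000),
    ("abc123", 23, 44.981000, -93.23600),
    ("abc123", 11, 0, 0),
]

print(likely_location_at_hour(reads, hour=11))
print(likely_location_at_hour(reads, hour=23))
```

With the 11am result as a guess at work and the 11pm result as a guess at home, the rest of the list above follows from comparing new readings against those two spots.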

Using Police Data to Track Targeted Neighborhoods

Using the same data, it is also possible to track which neighborhoods are targeted by these scans.

Plate Counts
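A rough way to get counts like these without neighborhood boundaries is to bin the readings into a latitude/longitude grid and count per cell. The Python sketch below does that as a stand-in for real neighborhood shapes; `reads_per_cell`, the 0.01 degree cell size, and the sample points are assumptions, not values from the analysis.

```python
import math
from collections import Counter


def reads_per_cell(points, cell_size=0.01):
    """Count plate readings per grid cell of cell_size degrees."""
    counts = Counter()
    for lat, lon in points:
        if lat == 0 and lon == 0:
            continue  # skip readings with no usable position
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts


# Made-up points: two in one cell, one elsewhere, one with no position.
points = [
    (44.985, -93.235),
    (44.987, -93.237),
    (44.935, -93.385),
    (0, 0),
]

counts = reads_per_cell(points)
for cell, n in counts.most_common():
    print(cell, n)
```

Cells with the highest counts are the areas scanned most often, which is the same picture the map gives.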