BY MICHAEL YANG 4:01 PM 7/21/2020
The data we’re using originally came from a Kaggle competition, where participants used machine learning to predict information about crimes. That’s something I actually feel a bit conflicted about.
Why? Well, because it’s hard to predict crimes. If we rely on a machine learning solution based on the information provided, we can only look at a few things: type of crime, location, and time.
Of course, it's not hard to decide that if vehicle break-ins have been happening at location x over and over for years, x will probably see future break-ins.
But crime isn't something so easy to quantify, and I'm not sure how an ML algorithm will account for things like the backgrounds of the perpetrators, the average income/real estate value of a region, etc. There are simply too many things that help “cause” crime that are way beyond what the data offers. I would go as far as to say that there are things that simply aren’t quantifiable, but still have a sizable impact on crime.
For example, how does one quantify a region’s tolerance for crime? As crazy as that might sound, it is something to consider. In slums where crime is a constant occurrence, it is, to some extent, tolerated by the locals. I don’t mean that they approve of it, but that, to some degree, they accept its existence and their inability to prevent it from happening. There are cities and neighborhoods all over the world where murder, theft, etc. are common. I live in suburban America. If something like that were to happen in my backyard, everyone living here would be absolutely shocked. The difference between those two reactions, while apparent, isn’t readily quantifiable, in the same way that we can’t numerically measure someone’s love or hate. Like love and hate, the two reactions are only really “measurable” relative to each other. There’s no true metric.
Another issue I have is that I don't know if it's right for us to be presenting predictions of crime sites in the first place. Can you imagine if the house you lived in were predicted to have a break-in or a violent crime like murder? Realistically speaking, of course, it probably wouldn't happen (SF isn't a terribly dangerous place anyway), but you can imagine how terrifying it would be.
Something else I’ve been considering is the potential effect on real estate or personal property. Similar to how you can file a defamation claim against someone who falsely accuses you of something bad, there could well be legal issues if the site flags a hotel or a restaurant as a potential location for crime. It’s bad for business. Also consider this: if people know your house is in an area with a “high” chance of criminal activity, how are you going to sell it? Even if you’re not selling it, what do you think happens to the valuation of your house and of the houses nearby? Who’s going to want to pay rent for a building with that kind of stigma?
But we’re still doing it. Why? Well, I personally don’t expect many people to actually see this site. I’m kind of hoping they don’t, mostly because I don’t want to end up paying the fees if the Firebase and Google Maps API usage somehow gets exorbitant. But also because science! A part of me wants to know. And maybe, if the city of San Francisco releases more data, we’ll be able to see if we got anything right.
Before collaborating with Anant for this hackathon, I had originally planned for this project to revolve around displaying historical data for awareness. (If any judge is actually reading this, thank you for reading! Also, I actually did start the project from scratch. All the HTML, CSS, JS, and Java were written the week of 7/20/2020, which you can probably tell, seeing as it’s not super detailed or impressive.) But I think this step in a more controversial direction will be good for both of us. Aside from teaching us programming, I think this’ll be an interesting look into how crime in San Francisco is distributed (hint: as in most cities, some neighborhoods are hotspots for this activity). Although the data we’re visualizing is no longer purely historical (it’s still based on historical data), I think it will still provide a great deal of insight into which parts of San Francisco could use infrastructure improvements or additional police/security presence. It’s not like any homeowners or property owners in those regions are going to see this anyway. (If any of you judges do live in a marked region, please don’t be concerned. We mark the places with the highest chance of criminal activity. A marked place may have the highest chance for that crime type, but that doesn’t mean a crime there is guaranteed.)
But at the end of the day, to very deliberately say that a crime is going to take place in a certain location is bold. And I don't think that any existing algorithm has the capacity to do that with certainty, especially one that uses historical data and nothing else. The world is constantly changing, and as time goes on, the data being used will slowly fall into irrelevance. Times change. Cities develop, and crime rates have historically been going down.
As of today (7/21/2020), of course, the data has yet to become obsolete. Some things don’t change quite as fast as we’d like them to.