When I checked the approved dataset list, I found that NYC OpenData is an amazing website containing tons of interesting datasets, so I decided to choose one of them.
I considered several factors:
- Timeliness and freshness
- Size of dataset
- Relevance of each dimension and measure
At first, I wanted to analyze the Citi Bike data, because bike sharing is a rising industry in China. It would be interesting to analyze the relationships among subway stations, bike rental stations, weather, subscribers, and biking routes. However, to my surprise, a single month of Citi Bike data contains over 500,000 rows. Given the limited time for this assignment, that was not feasible, so I had to give up this idea.
Then I tried Noise Report, Facebook Ad, Driving Violation, and Trump New Hired List, but none of them worked out. Some were too old to be worth analyzing, and most contained so many NULL values that the noise would have been hard to clean.
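A quick screening step like the one below could make the size and NULL checks less of a guess. This is only a minimal sketch, not what I actually ran: the function name `screen_dataset`, the list of tokens treated as missing, and the sample rows are all made up for illustration.

```python
import csv
import io


def screen_dataset(csv_file, null_tokens=("", "NULL", "null", "N/A")):
    """Return (row_count, {column: fraction of missing values}).

    `null_tokens` is an assumption about how missing values are spelled;
    adjust it for the dataset at hand.
    """
    reader = csv.DictReader(csv_file)
    missing = {name: 0 for name in (reader.fieldnames or [])}
    rows = 0
    for row in reader:
        rows += 1
        for name in missing:
            value = row.get(name)
            # Short rows yield None; blank or NULL-like strings also count.
            if value is None or value.strip() in null_tokens:
                missing[name] += 1
    fractions = {
        name: (count / rows if rows else 0.0)
        for name, count in missing.items()
    }
    return rows, fractions


# Hypothetical sample rows, only to show the shape of the result.
sample = io.StringIO("park,borough,crimes\nA,Bronx,3\nB,,NULL\nC,Queens,\n")
rows, fractions = screen_dataset(sample)
print(rows, fractions)
```

For a real file, the same function could be called with `open(path, newline="")` in place of the in-memory sample. A large row count or a high missing fraction in a key column would be a sign to move on to another dataset.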
Finally, I found the NYC Park Crime dataset, covering Q3 2016 to Q2 2017.
The dataset is fresh and of a manageable size. It contains the park name, borough, size, category, and number of crimes, which enables me to analyze the relationships among them. It also covers four quarters, so I can compare the numbers across seasons. It includes location as well, which should let me use a map visualization 🙂
Then I used Tableau to do some simple analyses.
In the following days, I will try to optimize my visualization.