The Blog for Data Visualization

Project Phase 1 / Post 3 __ Reflection

I presented project phase 1 today. I’m satisfied with all 3 parts, especially the crime ranking built in Tableau. I learned a lot from the online course and from the help of the online community. The feedback from my classmates was fairly positive. However, I realize that some aspects still need polishing.

  • The color choices need more careful thought. I picked the colors from a design perspective, but the pale colors in the chart confused my audience. I should have thought more about the functional side of my visualization by matching the meaning of each color to the crime name.
  • The explanation of my charts was not sufficient for understanding the content. I tried to explain more through the surrounding context but forgot to make it clear through the chart titles.
  • It’s still hard to combine three parts built on different platforms. I will make a website that shows them on the same page to enforce consistency.

Data Visualization Phase 1 Project

The sample dataset I downloaded from Open Data NYC contains 5 attributes:

  • ParkName
  • Borough
  • Size
  • Category
  • CrimeNumber

There are 1158 parks and 7 kinds of crime.

There are 4 workbooks in total from Q3/2016 to Q4/2017.

My Phase 1 Project contains 3 parts:

  • Number Of Park Crimes From Q3/2016 — Q4/2017 (I cannot use iframe to embed 🙁 ) [LINK]
    • In the first section, I focus on the total number of each crime from Q3/2016 to Q4/2017. I found it interesting that the number of crimes decreases in Q1 and Q2. My guess is that this is related to the weather, so I plotted the average temperature together with the crime numbers.
  • Park Crime Number Ranking [LINK]
    • In the second section, I focus on the most dangerous parks, meaning those with the largest number of crimes. I rank the parks in each of the 4 quarters and take the top 10 of each quarter. I also added a search box so end users can search by park name. I combined a line chart with the ranking chart to make the changes easy to see.
  • NYC Borough Park Crime Number [LINK]
    • In the third section, I focus on the boroughs. I wanted to use a map and mark each park, but writing 1158 locations into a GeoJSON file turned out to be too hard. I changed my mind and decided to use an area map to show the level of security of each borough. I downloaded a GeoJSON file and added a “Crime_Number” property to it.
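
Adding the “Crime_Number” property by hand works for a few boroughs, but it can also be scripted. Below is a minimal sketch in Python using only the standard library. The property key `boro_name`, the function name `add_crime_numbers`, and the sample counts are all assumptions made up for illustration; the real borough GeoJSON file may name things differently.

```python
import json


def add_crime_numbers(geojson, crime_by_borough, name_key="boro_name"):
    """Return a copy of a GeoJSON FeatureCollection in which every feature
    carries a Crime_Number property looked up by borough name.

    name_key is a hypothetical property name; adjust it to the real file.
    Boroughs missing from crime_by_borough get 0.
    """
    result = json.loads(json.dumps(geojson))  # deep copy via a JSON round trip
    for feature in result["features"]:
        borough = feature["properties"].get(name_key)
        feature["properties"]["Crime_Number"] = crime_by_borough.get(borough, 0)
    return result


# Placeholder data, not the real borough shapes or crime totals.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"boro_name": "Bronx"}, "geometry": None},
        {"type": "Feature", "properties": {"boro_name": "Queens"}, "geometry": None},
    ],
}
counts = {"Bronx": 12, "Queens": 7}

updated = add_crime_numbers(sample, counts)
print(json.dumps(updated["features"][0]["properties"]))
```

The function returns a new object instead of editing the input, so the downloaded file stays untouched until the result is written out with `json.dump`.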

All 3 sections together look like this 🙂


Project Phase 1 / Post 2 __ Learning __ Visualizing Dense Data: How to Show Rank without Overcrowding

Today, when I tried to turn my park crime data into a dangerous-park ranking, I ran into some problems. I pulled the data by crime ranking number and made this sheet 🙂

It’s hard to visualize this in a simple way because it’s all park names and numbers. Not all parks show up in all quarters, so I cannot directly show their names through a color connection. Then I googled and found this article, which I want to share: [How to Show Rank without Overcrowding Your Viz].

The author of the article combines a line chart and a ranking chart to visualize the connection. He also uses abbreviations to show each park concisely, which makes the chart easy to read. This is the optimized sheet :)
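
The data behind a chart like this is a rank per park per quarter, with a gap wherever a park drops out of the top list. I built mine in Tableau, but the same idea can be sketched in plain Python. The function names, the `"2016Q3"`-style quarter labels, and the park names and counts below are all placeholders I made up for illustration.

```python
from collections import defaultdict


def top_parks_per_quarter(records, n=10):
    """records: iterable of (quarter, park, crimes) tuples.
    Returns {quarter: [(rank, park, crimes), ...]} with at most n rows each.
    Ties are broken alphabetically by park name."""
    by_quarter = defaultdict(list)
    for quarter, park, crimes in records:
        by_quarter[quarter].append((park, crimes))

    ranking = {}
    for quarter, rows in by_quarter.items():
        rows.sort(key=lambda row: (-row[1], row[0]))
        ranking[quarter] = [
            (rank, park, crimes)
            for rank, (park, crimes) in enumerate(rows[:n], start=1)
        ]
    return ranking


def rank_series(ranking, park):
    """Rank of one park in every quarter, or None where it is not listed.
    These per-quarter values are what the connecting line is drawn from."""
    series = {}
    for quarter in sorted(ranking):
        ranks = {name: rank for rank, name, _ in ranking[quarter]}
        series[quarter] = ranks.get(park)
    return series


# Placeholder records, not the real park crime numbers.
records = [
    ("2016Q3", "Park A", 9), ("2016Q3", "Park B", 5), ("2016Q3", "Park C", 7),
    ("2016Q4", "Park A", 4), ("2016Q4", "Park C", 6),
]

ranking = top_parks_per_quarter(records, n=2)
print(rank_series(ranking, "Park C"))
print(rank_series(ranking, "Park B"))
```

A park that is missing from a quarter gets `None` rather than a fake rank, so the line for that park can simply break there instead of connecting to a misleading position.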


Project Phase 1 / Post 1 __ The Process Of How I Choose My Dataset

When I checked the approved dataset list, I found that Open Data NYC is an amazing website containing tons of interesting datasets, so I decided to choose one of them.

Here are the factors I considered.

  • Timeliness and freshness
  • Size of dataset
  • The relevance of each dimension and measure
  • Extensibility

At the beginning, I wanted to analyze the data of Citi Bike, because shared biking is a rising industry in China. It would be interesting to analyze the relationships among subway stations, bike rental stations, climate, subscribers, and biking routes. However, to my surprise, the monthly Citi Bike dataset contains over 500,000 rows of data. Considering the limited time I have for this assignment and its feasibility, I had to give up this idea.


Then I tried Noise Report, Facebook Ad, Driving Violation, and Trump New Hired List, but all of them failed for some reason. Some of the data was too old to be worth analyzing anymore, and most of the datasets contain lots of NULL values, which makes the noisy data hard to clean.
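
A quick way to judge how bad the NULL problem is before committing to a dataset is to count the share of missing values per column. Here is a rough sketch with Python's standard `csv` module. The set of markers treated as missing, the function name `missing_share`, and the sample rows are my own assumptions, not taken from any of the datasets above.

```python
import csv
import io

# Strings treated as "missing"; this list is an assumption and may need
# adjusting for a particular export.
MISSING = {"", "NULL", "null", "NA", "N/A"}


def missing_share(csv_text):
    """Return {column: fraction of rows whose value is missing}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    totals = {name: 0 for name in (reader.fieldnames or [])}
    rows = 0
    for row in reader:
        rows += 1
        for name in totals:
            value = row.get(name)
            if value is None or value.strip() in MISSING:
                totals[name] += 1
    return {name: (count / rows if rows else 0.0) for name, count in totals.items()}


# Placeholder rows, not real data.
sample_csv = (
    "park,borough,crimes\n"
    "Park A,Bronx,3\n"
    "Park B,,NULL\n"
    "Park C,Queens,\n"
    "Park D,NULL,1\n"
)

shares = missing_share(sample_csv)
print(shares)
```

For a real file, the text can come from `open(path, newline="").read()`; columns with a high share are the ones that would need the most cleaning.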

Finally, I found the NYC Park Crime dataset covering Q3 2016 to Q2 2017.

The data size is doable and the data is fresh. The dataset contains the park name, borough, size, category, and number of crimes, which enables me to analyze the relationships among them. I also have records for four quarters, so I can compare the numbers across seasons. It also includes location, which should let me use a map visualization 🙂

Then I used Tableau to do some simple analyses.



In the following days, I will try to optimize my visualization.

My Toolkit Presentation

Download (PDF, 310KB)

© 2017 dATAv
