Final Project link: Donation Impacts.
Presentation link: Final Presentation.
Final Paper link: Final Paper
Boston Housing Voucher Holders Map made with CARTO: 2014
Final visualization image:
This week we talked with Kevin and asked him which companies Brainpop is interested in. He gave us the names of eight companies and told us how to search for them in the messy dataset. One issue we ran into was that these companies could not be found with a plain search of the “vendor name” column. So we asked our professor and NYU Data Services for technical help. Thankfully, the professor showed us how to use Python to match both uppercase and lowercase strings with “regular expressions”. That was really helpful.
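A minimal sketch of what such a case-insensitive search can look like in Python. The pattern and the sample vendor names below are made up for illustration; they are not taken from the real dataset.

```python
import re

# Hypothetical pattern: matches "brainpop", "BrainPOP", "BRAIN POP", and so on.
# re.IGNORECASE makes the match ignore uppercase/lowercase differences.
BRAINPOP = re.compile(r"brain\s*pop", re.IGNORECASE)

def find_vendor(names, pattern):
    """Return the names in which the compiled pattern occurs anywhere."""
    return [name for name in names if pattern.search(name)]

# Made-up sample values standing in for the "vendor name" column.
sample = ["BrainPOP LLC", "BRAIN POP", "Amazon", "brainpop.com"]
matches = find_vendor(sample, BRAINPOP)
print(matches)
```

Using `search` instead of `match` lets the pattern hit anywhere in the name, which helps when the vendor name has extra words around it.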
Chian and I also talked about what we want our visualization to express. We listed these questions: “What should we visualize?” “How do we visualize success from the perspective of donors and of the students who receive the donations?” “How do we show the impact of each individual company?” “What is EdTech?”
We also drew a rough visualization in a sketchbook. We listed the specific columns we think are useful and decided to drop the others.
We went back to cleaning the data. First, we dropped the columns we did not need and exported the project file as a smaller file.
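Here is a small sketch of this step with pandas. It assumes pandas is available; the `projectid` and `school_city` columns and the output file name are hypothetical placeholders.

```python
import pandas as pd

def keep_useful_columns(df, useful):
    """Return a copy of df with only the listed columns, failing loudly on typos."""
    missing = [c for c in useful if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return df[useful].copy()

# Made-up rows; "projectid" and "school_city" are hypothetical column names.
projects = pd.DataFrame({
    "projectid": ["p1", "p2"],
    "grade_level": ["Grades 3-5", "Grades 6-8"],
    "date_completed": ["2014-01-05", "2014-06-01"],
    "school_city": ["Boston", "New York"],
})
small = keep_useful_columns(projects, ["projectid", "grade_level", "date_completed"])
# small.to_csv("projects_small.csv", index=False)  # hypothetical output file name
```

Listing the columns to keep, rather than the ones to drop, makes the smaller file easier to reason about later.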
Then we started to clean the vendor data source, called “project resources”. This dataset has more than 800,000 rows, so it is really hard to open without cleaning it first. We used the “regular expressions” the professor showed us to export each vendor as an individual CSV file.
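One way to sketch this export is to read the big file in chunks so it never has to fit in memory at once. This assumes pandas; the function name, the vendor pattern, and the file names are hypothetical.

```python
import re
from pathlib import Path

import pandas as pd

def export_vendor_rows(chunks, vendor_patterns, out_dir, column="vendor name"):
    """Append rows whose vendor name matches each pattern to <out_dir>/<vendor>.csv.

    Returns how many rows were written per vendor.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {vendor: 0 for vendor in vendor_patterns}
    for chunk in chunks:
        # Missing vendor names become empty strings so they never match.
        names = chunk[column].fillna("").astype(str)
        for vendor, pattern in vendor_patterns.items():
            mask = names.str.contains(pattern, flags=re.IGNORECASE, regex=True)
            matched = chunk[mask]
            if matched.empty:
                continue
            target = out_dir / f"{vendor}.csv"
            # Write the header only when the file is first created.
            matched.to_csv(target, mode="a", header=not target.exists(), index=False)
            counts[vendor] += len(matched)
    return counts

# Hypothetical usage on the real file (file name is an assumption):
# chunks = pd.read_csv("project_resources.csv", chunksize=100_000)
# export_vendor_rows(chunks, {"brainpop": r"brain\s*pop"}, "vendors")
```

Because the files are opened in append mode, the output folder should be emptied before a re-run, otherwise rows are duplicated.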
Finally, we tried to merge each vendor’s file with the project file.
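A merge like this can be sketched with pandas as follows. The key column name `projectid` is an assumption; the real files may use a different identifier.

```python
import pandas as pd

def merge_vendor_with_projects(vendor_df, projects_df, key="projectid"):
    """Attach project details to every vendor row.

    how="left" keeps all vendor rows even if a project is missing;
    validate="many_to_one" raises if a project appears twice in projects_df.
    """
    return vendor_df.merge(projects_df, on=key, how="left", validate="many_to_one")

# Hypothetical usage with the per-vendor and project files (names are assumptions):
# vendor = pd.read_csv("vendors/brainpop.csv")
# projects = pd.read_csv("projects_small.csv")
# merged = merge_vendor_with_projects(vendor, projects)
```

The `validate` check is worth keeping: a duplicated project row would silently multiply the vendor rows and inflate every total in Tableau.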
Data cleaning was the most time-consuming part, and we tried many methods to clean this huge dataset. In the end, we found that Python was the best way. It was also a great chance to learn some basic Python.
Then we started our first visualization. We wanted to use a dashboard to collect all the visualizations on one page, which we think is the clearest and most concise way to present the data.
However, when we tried to connect the vendor map with the other visualization sheets, the data could not be connected. We want our visualization to make more sense, so we arranged an appointment with NYU Data Services to figure out this problem.
For the past two weeks Chian and I have used Jupyter Notebook to clean the data. However, we ran into some problems along the way.
The first problem: when I tried to use “select” to filter the rows, some companies had a lot of data while others had just one record. We have emailed Kevin to ask how many companies they want to include.
Even “Brainpop” has only one group.
While waiting for Kevin’s feedback, we cleaned the columns first. We deleted some useless columns and tried to do some visualization.
I think “item_quantity” could show how successful a company is, so I used Tableau to select the top 10 best-selling companies.
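The same ranking can be cross-checked in Python before building the chart. This is only a sketch assuming pandas; the column name “vendor name” follows the dataset as described, and the function name is made up.

```python
import pandas as pd

def top_vendors(df, n=10, vendor_col="vendor name", qty_col="item_quantity"):
    """Sum item_quantity per vendor and return the n largest totals."""
    return df.groupby(vendor_col)[qty_col].sum().nlargest(n)
```

Summing before ranking matters here, because one vendor can appear in many rows.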
I also added “grade_level” in Tableau and tried to analyze the patterns in the data. From the histogram we can see that different companies focus on different grade levels.
In our last meeting, Kevin told us they want to see patterns or trends over time, so I added the “date_completed” column to this chart. From it we can see the “selling trend” of each company across grade levels.
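A rough sketch of the table behind such a trend chart, assuming pandas. Grouping by year is my own simplification; the chart could just as well use months or quarters.

```python
import pandas as pd

def yearly_quantity(df, date_col="date_completed", vendor_col="vendor name",
                    grade_col="grade_level", qty_col="item_quantity"):
    """Total item_quantity per year, vendor, and grade level."""
    # Unparseable dates become NaT and are left out of the result.
    dates = pd.to_datetime(df[date_col], errors="coerce")
    valid = dates.notna()
    with_year = df[valid].assign(year=dates[valid].dt.year)
    return (with_year
            .groupby(["year", vendor_col, grade_col])[qty_col]
            .sum()
            .reset_index())
```

Each row of the result is one point on the trend line for a given company and grade level.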
A different display mode for the selling trends of different companies.
However, because the dataset is so large, we still need to choose some companies and clean the data again.
Last week I went to the Museum of the City of New York. An exhibition named “NY AT ITS CORE” gave me a lot of inspiration for data visualization. The exhibition covers 400 years of NYC history. It uses different visualizations to show how different parts of the city developed and to tell the story of “What makes New York New York?” I think one of the most interesting visualizations is about New Yorkers’ lives (see the following video).
The museum also has a special section called “Future City Lab”, where people can stand in front of a camera and the system places them in a future street scene. This interactive way of letting people explore the city is more useful and impressive than just reading data or watching a video. When we face a huge dataset (like our Phase 3 project), we can step outside traditional patterns (like charts or graphs) to visualize our data.
About our Phase 3 project:
Chian and I successfully opened one of our datasets in Tableau. However, we have three datasets to connect, and the data was too large to open in one sheet. We tried to understand our data and choose the useful columns. We decided to meet with Kevin to talk about the dataset because we could not understand what some of the columns mean.
Meet with Kevin to talk about cleaning the dataset and choose some specific values to narrow it down.
Make our first visualization.
Hi, here is my Phase2 Paper link: