Over the last few weeks, much of my fellowship work has involved hunting down data resources for two major projects: the DOE HVAC Hackathon and the job creation analysis. One of the great things about working at the city level is the state-of-the-art data sharing platform used by city government. NYC OpenData is regularly updated by all NYC agencies with non-individualized data that serves a public purpose. Because both the Hackathon and my summer-long research project are tailored to narratives at the municipal level, NYC OpenData is the first place I look for relevant datasets, such as the School Quality Report series or DOB Job Permit Issuance.
However, the city sometimes creates, uses and publicizes datasets that are not published on OpenData. In the case of the Hackathon, one such resource is the Department of Education’s Building Ventilation Status report, which is the foundation for the competition centered on air quality in public schools during COVID-19. The dataset is displayed prominently on the DOE’s website under School Search, a resource that links to all major information about public schools run by the city. The ventilation system information, however, is nested under each general school report, which makes extracting it difficult, if not impossible, under the time constraints of the Hackathon.
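To illustrate why nesting is the obstacle, here is a minimal Python sketch of the flattening step, under loud assumptions: the record layout, the field names (`school_id`, `name`, `ventilation`, `room`, `status`) and the sample values are all hypothetical placeholders, not the DOE's actual format. It only shows the general idea of turning one report per school into one row per room.

```python
import csv
import io

# Hypothetical shape of per-school reports; the real DOE pages may differ.
# All values below are made-up placeholders, not real data.
school_reports = [
    {
        "school_id": "SCHOOL_A",
        "name": "Example School A",
        "ventilation": [
            {"room": "101", "status": "operational"},
            {"room": "102", "status": "needs repair"},
        ],
    },
    {
        "school_id": "SCHOOL_B",
        "name": "Example School B",
        "ventilation": [{"room": "201", "status": "operational"}],
    },
]

FIELDS = ["school_id", "school_name", "room", "status"]


def flatten_reports(reports):
    """Return one flat row per room, carrying the school fields along."""
    rows = []
    for report in reports:
        # A school with no ventilation section simply contributes no rows.
        for room in report.get("ventilation", []):
            rows.append({
                "school_id": report["school_id"],
                "school_name": report["name"],
                "room": room["room"],
                "status": room["status"],
            })
    return rows


def rows_to_csv(rows):
    """Serialize the flat rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


flat_rows = flatten_reports(school_reports)
```

Once the data is in this one-row-per-room shape, it can be filtered, counted or joined like any other table, which is what a time-limited Hackathon team would need.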
This week, I’m spending most of my time (1) contacting folks at the DOE to see if they can send me the dataset, and (2) creating a contingency plan to scrape the data from the website if necessary. The process is similar for the job creation project: the data needed to assess city-level job growth, along with project numbers from the Build it Back program, is housed in various places across NYC OpenData rather than in a single resource. This is one of the challenges of data-driven work, especially if you don’t collect your own data.
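Pulling scattered tables into a single resource can be sketched roughly as follows. This is only an illustration: the shared key (`borough`), the column names and the numbers are hypothetical placeholders, and the real datasets may need a different key or some cleaning before they line up.

```python
from collections import defaultdict

# Two hypothetical extracts from separate datasets.
# Column names and values are made-up placeholders, not real figures.
permits = [
    {"borough": "Brooklyn", "permits_issued": 12},
    {"borough": "Queens", "permits_issued": 7},
]
projects = [
    {"borough": "Brooklyn", "projects_completed": 3},
    {"borough": "Staten Island", "projects_completed": 5},
]


def combine_on_key(key, *tables):
    """Merge rows from several tables that share the same key value.

    Rows missing from one table are kept, so gaps stay visible
    instead of being silently dropped.
    """
    combined = defaultdict(dict)
    for table in tables:
        for row in table:
            combined[row[key]].update(row)
    # Sort by key so the output order is predictable.
    return [combined[k] for k in sorted(combined)]


merged = combine_on_key("borough", permits, projects)
```

Keeping unmatched rows is a deliberate choice in this sketch: when sources were never designed to fit together, the missing columns show where coverage differs between them.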
Conor Brady says
Very interesting. Thanks for sharing. I’m curious to learn what success you have getting that data from them, and what the reasons are for some of the seemingly public data not being available on OpenData.