The synthetic population covers all of New York State, with nearly 20 million individuals and 7.5 million households, and is generated from the PUMS of the 2021 5-year ACS. The marginals obtained from the synthetic population match the census marginals well. For attribute combinations, the synthetic population still generally follows the distributions in the input sample. In addition, it reconstructs the associations among household members that the input sample shows.
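One simple way to check a synthetic population against census marginals is to compare category shares attribute by attribute. The sketch below is illustrative only: the function names, the toy records, and the choice of the largest absolute share gap as the summary metric are assumptions, not the evaluation procedure used for this dataset.

```python
from collections import Counter

def marginal_shares(counts):
    """Turn category counts into shares that sum to 1."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def max_share_gap(synthetic_records, census_counts, attr):
    """Largest absolute difference in category share for one attribute.

    synthetic_records: list of dicts, one per synthetic household or person.
    census_counts: dict mapping category -> census count (hypothetical input).
    """
    synthetic = marginal_shares(Counter(rec[attr] for rec in synthetic_records))
    census = marginal_shares(census_counts)
    categories = set(synthetic) | set(census)
    return max(abs(synthetic.get(c, 0.0) - census.get(c, 0.0)) for c in categories)

# Toy data, not taken from the dataset.
toy_households = [{"VEH": 0}, {"VEH": 1}, {"VEH": 1}, {"VEH": 2}]
matching_census = {0: 250, 1: 500, 2: 250}
mismatched_census = {0: 500, 1: 500}

gap_match = max_share_gap(toy_households, matching_census, "VEH")
gap_mismatch = max_share_gap(toy_households, mismatched_census, "VEH")
```

A gap near zero for every attribute indicates that the marginals agree. It says nothing about joint distributions, which need a separate comparison over attribute combinations.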
We propose a population synthesis framework that combines a deterministic model with ciDATGAN to generate synthetic households and the individuals that belong to them. The framework is illustrated in the figure below.
A wide range of socio-demographic variables is included; the variables selected for this study are listed in Table 1. We aggregate the categories of some attributes deemed too granular, such as age and working industry (NAICS). To capture potential spatial heterogeneity between New York City (NYC) and non-NYC regions, we split the PUMS records into those inside and those outside NYC using the Public Use Microdata Areas (PUMAs).
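The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions: the age cut points are hypothetical (the source only says that age has 7 categories), the PUMA codes in the toy data are placeholders rather than real NYC PUMAs, and `AGE_BIN` is an invented output field.

```python
from bisect import bisect_right

# Hypothetical cut points giving 7 age groups; the actual bins are not stated.
AGE_EDGES = [18, 25, 35, 45, 55, 65]
AGE_LABELS = ["0-17", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]

def age_bin(agep):
    """Map a raw AGEP value to one of the aggregated age groups."""
    return AGE_LABELS[bisect_right(AGE_EDGES, agep)]

def split_by_region(records, nyc_pumas):
    """Split PUMS person records into NYC and non-NYC lists by PUMA code.

    nyc_pumas: set of PUMA codes inside NYC, supplied by the caller.
    Each returned record is a copy with an added AGE_BIN field.
    """
    nyc, non_nyc = [], []
    for rec in records:
        binned = dict(rec, AGE_BIN=age_bin(rec["AGEP"]))
        (nyc if rec["PUMA"] in nyc_pumas else non_nyc).append(binned)
    return nyc, non_nyc

# Toy records with placeholder PUMA codes.
toy_records = [
    {"PUMA": "A1", "AGEP": 17},
    {"PUMA": "B7", "AGEP": 40},
    {"PUMA": "A2", "AGEP": 70},
]
nyc, non_nyc = split_by_region(toy_records, nyc_pumas={"A1", "A2"})
```

Keeping the NYC PUMA set as an argument avoids hard-coding a geography vintage, since PUMA definitions change between census decades.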
The data can be accessed via https://zenodo.org/records/13732330
Because NYC is the most densely populated region in the US and has high population diversity, we want a higher population resolution there. We therefore further assign the NYC-specific PUMS records from the PUMA level to the Census Tract (CT) level using PopGen.
| Non-NYC region attribute (label name) | No. of values (range if continuous) | NYC region attribute (label name) | No. of values |
|---|---|---|---|
| Household attribute | | | |
| Residence area (PUMA) | 90 | Residence area (CT) | 2313 |
| Income level (HINCP) | 9 | Income level (HINCP) | 9 |
| Vehicle ownership (VEH) | 4 | Vehicle ownership (VEH) | 4 |
| Personal attribute | | | |
| Age (AGEP) | 7 | Age (AGEP) | 7 |
| English proficiency (ENG) | 5 | English proficiency (ENG) | 5 |
| Commute trip length (JWMNP) | 0-140 min | Gender (SEX) | 2 |
| Commute mode (JWTRNS) | 13 | Disability (DIS) | 2 |
| School status (SCH) | 3 | Working industry (NAICSP) | 2 |
| Gender (SEX) | 2 | Race white/non-white (RACWHT) | 2 |
| Disability (DIS) | 2 | | |
| Working industry (NAICSP) | 20 | | |
| Race white/non-white (RACWHT) | 2 | | |