Parks and COVID-19: What’s the Connection?

Hi everyone! My name is Hong Anh. I’m a rising junior, majoring in Computer Science and minoring in Data Science. Currently, I’m working with Prof. Ryan Johnson on a data science project about COVID-19.

Here’s a fun fact about me: I really like to go hiking, and I particularly enjoy green spaces like parks. On weekends, I often stroll around Culp’s Hill or Little Round Top with my friends and enjoy the greenery of Gettysburg. And guess what? It’s these adventures that sparked my data science questions: How does the proportion of open green space in a state or county affect the COVID-19 incident rate? Do places with more greenery have fewer or more COVID cases? Let the data answer our questions!

For the first few weeks of the project, I collected data, and I have to say that was the most challenging part so far. While acquiring data on COVID-19 cases and death rates from USAFacts was straightforward, finding information on open green space was quite a hassle. Eventually, I managed to obtain a dataset containing information about all the parks in the United States from ParkServe.

My next step involves cleaning and manipulating the dataset, which includes information about COVID-19 cases, deaths, and green space (as measured by public parks). To perform the data analysis, I use Jupyter Notebook along with Python libraries such as pandas, numpy, matplotlib, fiona, geopandas, and folium. Specifically, pandas, numpy, and matplotlib handle data storage, manipulation, and plotting, while geopandas and folium make it possible to create visualizations such as choropleth maps. Here’s an example of the dataset I have:

Using the list of parks, I calculate the total size of parks within each county. Then, I create a new data frame that includes the county’s FIPS code, county and state names, and the percentage of the county’s area covered by parks. With this data frame, I’m able to generate a choropleth map illustrating the distribution of green space across the United States. Additionally, I produce another choropleth map representing the cumulative COVID-19 cases in the country. In these choropleth maps, darker colors indicate higher values, making it easy to identify areas of greater concentration or intensity.
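A minimal sketch of the aggregation step described above might look like the following. The column names (`fips`, `park_acres`, `county_acres`) and the toy numbers are assumptions for illustration only, not the actual ParkServe schema or real values.

```python
import pandas as pd

def park_share_by_county(parks, counties):
    """Return one row per county with the percentage of its area covered by parks.

    Assumed (hypothetical) columns:
      parks:    fips, park_acres           (one row per park)
      counties: fips, county, state, county_acres
    """
    # Total park area per county.
    totals = parks.groupby("fips", as_index=False)["park_acres"].sum()
    # Left merge keeps counties that have no parks in the list.
    out = counties.merge(totals, on="fips", how="left")
    out["park_acres"] = out["park_acres"].fillna(0.0)
    out["park_pct"] = 100 * out["park_acres"] / out["county_acres"]
    return out[["fips", "county", "state", "park_pct"]]

# Toy data (made up). FIPS codes are kept as strings so leading zeros survive.
parks = pd.DataFrame({
    "fips": ["00001", "00001", "00002"],
    "park_acres": [100.0, 50.0, 30.0],
})
counties = pd.DataFrame({
    "fips": ["00001", "00002", "00003"],
    "county": ["County A", "County B", "County C"],
    "state": ["XX", "XX", "XX"],
    "county_acres": [1000.0, 600.0, 500.0],
})

result = park_share_by_county(parks, counties)
print(result)
```

The left merge is a deliberate choice in this sketch: a county with no listed parks ends up with 0% rather than disappearing from the map.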

Distribution of open green space
Distribution of cumulative COVID-19 cases

Overall, looking at the maps, we can see that places with a higher percentage of open green space tend to have more COVID-19 cases as well. I find this intriguing because, initially, it was commonly believed that areas with more green space would have cleaner air and less crowding, and therefore fewer COVID-19 cases. An alternative explanation is that such areas saw more cases because people sought out outdoor activities in parks during lockdowns, which may have contributed to COVID-19 transmission.

Professor Johnson and I then decided to create scatterplots to see whether there was a strong correlation between the variables. To analyze the COVID-19 pandemic, we divided it into three phases, each spanning six months, and plotted the data points on a logarithmic scale. Here’s an illustration of one such scatterplot, where the x-axis represents the proportion of green space, and the y-axis represents the number of COVID-19 cases:
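As a rough sketch of how the six-month phase split and the log-scale plot could be set up: the start date (`2020-03-01`), the column names (`fips`, `date`, `new_cases`, `park_pct`), and the toy rows below are all assumptions for illustration. The post does not state the actual phase boundaries or whether one or both axes were logarithmic, so this version puts only the case counts on a log scale.

```python
import pandas as pd

def cases_by_phase(daily, start="2020-03-01", months_per_phase=6, n_phases=3):
    """Sum new cases per county within consecutive phases of equal length.

    `daily` is assumed to have columns fips, date, new_cases.
    `start` is assumed to fall on the first day of a month.
    """
    start = pd.Timestamp(start)
    dates = pd.to_datetime(daily["date"])
    # Whole calendar months elapsed since the start month.
    elapsed = (dates.dt.year - start.year) * 12 + (dates.dt.month - start.month)
    phase = elapsed // months_per_phase + 1
    # Drop rows before the start date or after the last phase.
    keep = (elapsed >= 0) & (phase <= n_phases)
    out = daily.loc[keep].assign(phase=phase[keep])
    return out.groupby(["fips", "phase"], as_index=False)["new_cases"].sum()

def plot_phase(df, phase):
    """Scatter park percentage against cases for one phase, log-scaled y-axis.

    `df` is assumed to have columns phase, park_pct, new_cases.
    """
    import matplotlib.pyplot as plt  # imported here so the rest runs without it
    sub = df[(df["phase"] == phase) & (df["new_cases"] > 0)]
    fig, ax = plt.subplots()
    ax.scatter(sub["park_pct"], sub["new_cases"], s=8)
    ax.set_yscale("log")
    ax.set_xlabel("Park area (% of county)")
    ax.set_ylabel("COVID-19 cases")
    ax.set_title(f"Phase {phase}")
    return ax

# Toy data (made up): the last row falls outside the three phases and is dropped.
daily = pd.DataFrame({
    "fips": ["00001"] * 4,
    "date": ["2020-03-15", "2020-08-31", "2020-09-01", "2021-09-01"],
    "new_cases": [10, 20, 30, 40],
})
phased = cases_by_phase(daily)
print(phased)
```

After merging `phased` with a per-county `park_pct` column, `plot_phase(merged, 1)` would draw the scatterplot for the first phase.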

After looking at the scatterplots, the correlation between COVID-19 cases and green space is more apparent. The data indicate that areas with more greenery tend to have more COVID-19 cases. In my future research, I will look more closely at specific phases of the pandemic and bring in additional datasets on open green space to check whether this correlation holds.
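One way to put a number on the pattern seen in a scatterplot is a Pearson correlation coefficient. The sketch below is an illustration, not the analysis used in this project: it assumes the case counts are log-transformed (base 10) while the park percentage is left as is, and the input values are made up.

```python
import numpy as np

def log_case_correlation(park_pct, cases):
    """Pearson r between park percentage and log10 of case counts.

    Counties with zero cases are skipped because log10(0) is undefined.
    Returns NaN if fewer than two usable points remain.
    """
    x = np.asarray(park_pct, dtype=float)
    y = np.asarray(cases, dtype=float)
    mask = y > 0
    if mask.sum() < 2:
        return float("nan")
    return float(np.corrcoef(x[mask], np.log10(y[mask]))[0, 1])

# Toy values (made up): cases grow tenfold for each extra point of park area,
# so the relationship is perfectly linear on the log scale.
r = log_case_correlation([1, 2, 3, 4], [10, 100, 1000, 0])
print(round(r, 3))
```

A value near +1 or -1 indicates a strong linear relationship on this scale, while a value near 0 indicates a weak one. Either way, it measures association only and says nothing about cause.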

Doing summer research has been fun so far as I’ve been learning so much along the way. When it comes to the technical stuff, I’ve been diving deep into the world of Python and working with data, which is totally new to me. Apart from that, I’ve learned some really important life lessons too. Like, patience is key when you can’t find the data you need. And when your hypotheses turn out to be way off, you gotta be flexible and not let it bring you down. Plus, being creative is crucial for figuring out what to do next in your research. By the way, in our research group, we make it a tradition to play hacky sack for a solid 45 minutes every day. It’s actually been really beneficial for my mental health, especially when I’m spending most of my time looking at code. It’s like a little break from all the intense coding! Overall, it has been a great summer and I can’t wait to learn more!

References:

  1. COVID-19 cases and deaths: https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/
  2. Park data: https://www.tpl.org/park-data-downloads
