Analyzing COVID-19 using Python

We are in this together – And we will get through this together 

 

The Coronavirus outbreak in India, under way since late January 2020, has been deeply disruptive. In an attempt to halt the spread of the virus, India underwent weeks of shutdown. SARS-CoV-2 (Coronavirus) has been difficult to analyze, which can be attributed to a lack of necessary data and to the nature of the virus itself. The first case of infection was reported in November 2019. Since then, efforts have been made to gather more data for research. This blog analyzes some of that data and draws some conclusions from the results about the COVID-19 pandemic in India.

Structure:

  1. About the Data
  2. The Code
       1. Libraries used
       2. Importing dataset
       3. Sorting and extracting data
       4. Plotting scatter graphs
  3. Conclusions
  4. Limitations
  5. Links and references

1. About the Data

The dataset covid2.csv was compiled and updated up to August 7, 2020. India has an immense and diverse population and lacks adequate socio-economic infrastructure, so usable public data is difficult to find. The data that is available has been tabulated meticulously. Population data was taken from the 2011 census. It is outdated, but because state populations differ so widely in scale, the relative sizes of the states are assumed to be broadly preserved. Data isn’t always available for all the states of India, and efforts have been made to minimize inaccuracies.

Data used in this analysis for each state:

  1. The number of active, total, and cured/discharged/migrated cases.
  2. The number of deaths.
  3. Population
  4. Literacy rate
  5. Per capita and total GDP
  6. Healthcare expenditure
  7. Number of homeless people (estimate)

Sources and references for the data used are listed in the “Links and references” section at the end.

2. The Code

The program was written and run on Python version 3.8.2. It is split into several sections for better understanding and clarity. The source code is available here.

Libraries used

The following libraries were used. All of them except os, which is part of the Python standard library, were installed via pip.

  1. matplotlib
  2. seaborn
  3. pandas
  4. numpy
  5. os

#Libraries used
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import os

Importing dataset

The pandas library is used to read the file covid2.csv into a DataFrame named data0.


data0=pd.read_csv("covid2.csv")
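Before extracting columns, it can help to check that the headers match what the later code expects. The sketch below is only an illustration: it reads a small, made-up sample from memory instead of the real covid2.csv, and the "State" column and all values are assumptions. The header "Literacy rate " carries a trailing space on purpose, since the extraction code refers to it that way.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for covid2.csv (values are made up).
sample_csv = io.StringIO(
    "State,Active Cases,Literacy rate ,Total Cases,Deaths,Population\n"
    "State A,1200,75.4,45000,900,61000000\n"
    "State B,300,88.7,8000,120,1400000\n"
)
data0 = pd.read_csv(sample_csv)

# Header names can carry stray spaces; list them so lookups do not fail silently.
stray = [name for name in data0.columns if name != name.strip()]
print("Columns with stray whitespace:", stray)

# Confirm that the columns used later are present.
required = ["Active Cases", "Total Cases", "Deaths", "Population"]
missing = [name for name in required if name not in data0.columns]
print("Missing columns:", missing)

# Numeric columns should be parsed as numbers, not as text.
print(data0.dtypes)
```

With the real file, replacing sample_csv by the path "covid2.csv" would run the same checks.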

Sorting and extracting data

This section of code is long and may seem confusing. In short, it extracts the useful columns with the pandas loc[] indexer and stores their values in initialized lists using for loops. The same lists can then be reused for plotting different scatter graphs later.


#Sorting data: select each column of interest from the DataFrame
#(column names are written exactly as in the CSV, including stray spaces)
cases=data0.loc[:,"Active Cases"]
populationlist=data0.loc[:,"Population"]
cases2=data0.loc[:,"Total Cases"]
literacyrate=data0.loc[:,"Literacy rate "]
deaths=data0.loc[:,"Deaths"]
gdspc=data0.loc[:,"GDP per capita"]
gdspt=data0.loc[:,"Total GDP"]
gdphealth=data0.loc[:,"Healthcare  expenditure"]
deaths2=data0.loc[:,"Death2"]
homeless=data0.loc[:,"Homeless"]
totals2=data0.loc[:,"Total2"]
deaths3=data0.loc[:,"Death3"]

#Initialising lists
pop=list()
active=list()
total=list()
literacy=list()
death=list()
gdp=list()
gdpt=list()
gdph=list()
death2=list()
home=list()
total2=list()
death3=list()

#extracting useful data
for index in cases:
    active.append(index)
for index in populationlist:
    pop.append(index)
for index in cases2:
    total.append(index)
for index in literacyrate:
    literacy.append(index)
for index in deaths:
    death.append(index)
for index in gdspc:
    gdp.append(index)
for index in gdspt:
    gdpt.append(index)
#The columns below are only filled for some states;
#keep positive values only (this also drops missing entries)
for index in gdphealth:
    if index>0.0:
        gdph.append(index)
for index in deaths2:
    if index>0.0:
        death2.append(index)
for index in homeless:
    if index>0.0:
        home.append(index)
for index in totals2:
    if index>0.0:
        total2.append(index)
for index in deaths3:
    if index>0.0:
        death3.append(index)
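One thing to watch when filtering columns one at a time: a scatter plot pairs the first x with the first y, the second with the second, and so on. If two columns had gaps in different rows, filtering them separately could yield lists of equal length whose entries no longer belong to the same state. The sketch below uses made-up numbers (0 marks a missing value, which is an assumption about how gaps are encoded) to show the problem and one way to filter the pairs together.

```python
# Hypothetical values for four states; 0 marks "no data" (an assumption).
deaths = [900, 120, 0, 45]
homeless = [21000, 0, 3500, 800]

# Filtering each list on its own keeps three values in each,
# but the second and third entries now come from different states.
death_naive = [d for d in deaths if d > 0]
home_naive = [h for h in homeless if h > 0]
print(list(zip(death_naive, home_naive)))

# Filtering the pairs together keeps only states with both values present.
pairs = [(d, h) for d, h in zip(deaths, homeless) if d > 0 and h > 0]
death_kept = [d for d, _ in pairs]
home_kept = [h for _, h in pairs]
print(list(zip(death_kept, home_kept)))
```

If the paired columns in the CSV are already aligned row for row, both approaches give the same result.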

Plotting scatter graphs

This is the longest section of the program. It runs a choice-driven menu that makes it easy to revisit any scatter graph any number of times and to exit the program on demand. Choices 1-10 plot scatter graphs using matplotlib.pyplot, and choice 11 exits the program.


#Menu loop: runs until the user picks 11 or enters an invalid choice
i=True
while i:
    y=input('''Select your choice of Plot :
                1) Active Corona cases vs Population in various states
                2) Total Corona cases vs Population in various states
                3) Total number of deaths vs Active cases of state
                4) Total number of deaths vs Total cases of state
                5) Total Corona cases vs Literacy rate in various states
                6) Active Corona cases vs Literacy rate in various states
                7) Total number of deaths vs Total GDP of state
                8) Total number of deaths vs Total Healthcare expenditure of state
                9) Total number of deaths vs Total homeless people in state
                10) Total number of cases vs Total homeless people in state
                11) Exit \n''')
    if y == '1':
        #1)Active cases vs Population (both axes logarithmic)
        plt.scatter(np.log(active),np.log(pop))
        plt.title("Active Corona cases vs Population in various states")
        plt.xlabel("log(Active cases)")
        plt.ylabel("log(Population in a state)")
        plt.show()
    elif y == '2':
        #2)Total cases vs Population (both axes logarithmic)
        plt.scatter(np.log(total),np.log(pop))
        plt.title("Total Corona cases vs Population in various states")
        plt.xlabel("log(Total cases)")
        plt.ylabel("log(Population in a state)")
        plt.show()
    elif y=='3':
        #3)Active cases vs Death (both axes logarithmic)
        plt.scatter(np.log(active),np.log(death))
        plt.title("Total number of deaths vs Active cases of state")
        plt.xlabel("log(Active cases)")
        plt.ylabel("log(Total number of deaths)")
        plt.show()
    elif y=='4':
        #4)Total cases vs Death (both axes logarithmic)
        plt.scatter(np.log(total),np.log(death))
        plt.title("Total number of deaths vs Total cases of state")
        plt.xlabel("log(Total cases)")
        plt.ylabel("log(Total number of deaths)")
        plt.show()
    elif y=='5':
        #5)Total cases vs Literacy rate (both axes logarithmic)
        plt.scatter(np.log(literacy),np.log(total))
        plt.title("Total Corona cases vs Literacy rate in various states")
        plt.xlabel("log(Literacy rate in states)")
        plt.ylabel("log(Total cases)")
        plt.show()
    elif y=='6':
        #6)Active cases vs Literacy rate (linear axes)
        plt.scatter(literacy,active)
        plt.title("Active Corona cases vs Literacy rate in various states")
        plt.xlabel("Literacy rate in states")
        plt.ylabel("Active cases")
        plt.show()
    elif y=='7':
        #7)Total deaths vs GSDP (deaths on a log scale)
        plt.scatter(gdpt,np.log(death))
        plt.title("Total number of deaths vs Total GDP of state")
        plt.xlabel("Total GDP of states")
        plt.ylabel("log(Total number of deaths)")
        plt.show()
    elif y=='8':
        #8)Healthcare expenditure vs Death (both axes logarithmic)
        plt.scatter(np.log(gdph),np.log(death2))
        plt.title("Total number of deaths vs Total Healthcare expenditure of state")
        plt.xlabel("log(Total Healthcare expenditure of states)")
        plt.ylabel("log(Total number of deaths)")
        plt.show()
    elif y=='9':
        #9)Death vs Homeless people (deaths on a log scale)
        plt.scatter(np.log(death3),home)
        plt.title("Total number of deaths vs Total homeless people in state")
        plt.xlabel("log(Total number of deaths)")
        plt.ylabel("Total number of homeless people")
        plt.show()
    elif y=='10':
        #10)Total cases vs Homeless people (linear axes)
        plt.scatter(total2,home)
        plt.title("Total number of cases vs Total homeless people in state")
        plt.xlabel("Total number of cases in states")
        plt.ylabel("Total number of homeless people in states")
        plt.show()
    elif y == '11':
        i=False
        print("***************************Have a nice day*******************************")
    else :
        #Any other input ends the program
        print("Invalid choice. Terminating; run the program again.")
        break
    #Clear the terminal before the menu is shown again ('cls' on Windows)
    os.system('cls' if os.name == 'nt' else 'clear')
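A long if/elif chain like this can also be written as a lookup table that maps each menu choice to its title, axis labels, and data. The sketch below is one possible arrangement, not the program's actual structure: the names PLOTS, log_all, and build_plot are hypothetical, the three short lists are made-up stand-ins for the lists built earlier, and only two of the ten choices are filled in. It uses math.log in place of np.log so that it runs with the standard library alone.

```python
import math

# Made-up miniature data standing in for the lists built earlier.
active = [120, 4500, 30000]
pop = [1_000_000, 25_000_000, 110_000_000]
death = [3, 90, 800]

def log_all(values):
    """Natural log of each value, mirroring np.log applied to a list."""
    return [math.log(v) for v in values]

# choice -> (title, x label, y label, function returning the x and y data)
PLOTS = {
    "1": ("Active Corona cases vs Population in various states",
          "log(Active cases)", "log(Population in a state)",
          lambda: (log_all(active), log_all(pop))),
    "3": ("Total number of deaths vs Active cases of state",
          "log(Active cases)", "log(Total number of deaths)",
          lambda: (log_all(active), log_all(death))),
}

def build_plot(choice):
    """Return a description of the plot for a menu choice, or None if unknown."""
    entry = PLOTS.get(choice)
    if entry is None:
        return None
    title, xlabel, ylabel, make_xy = entry
    xs, ys = make_xy()
    return {"title": title, "xlabel": xlabel, "ylabel": ylabel, "x": xs, "y": ys}

spec = build_plot("1")
print(spec["title"], "with", len(spec["x"]), "points")
print(build_plot("42"))  # unknown choices return None
```

The returned dictionary could then be handed to plt.scatter, plt.title, plt.xlabel, and plt.ylabel in one place, so each label is written once next to the data it describes.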

 

3. Conclusions

1) Active/Total cases vs Total population

The scatter plot of active cases against population shows a rising trend: states with larger populations have more active cases. Because both axes are logarithmic, a straight-line trend on this plot corresponds to a power-law relationship between the two quantities rather than an exponential one. The number of total cases rises with population in the same way. Both results are expected, intuitively and graphically.
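A trend on log-log axes can be read as follows: a straight line log(y) = b * log(x) + c is equivalent to the power law y = e^c * x^b, so the slope b is the exponent. The sketch below fits that line by ordinary least squares. The function name loglog_slope is hypothetical and the data is synthetic, built to follow cases = 0.001 * population exactly; it is not taken from covid2.csv.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic data: cases grow in direct proportion to population.
population = [1e6, 1e7, 1e8]
cases = [0.001 * p for p in population]

slope, intercept = loglog_slope(population, cases)
print("exponent b:", round(slope, 3))
print("prefactor e^c:", round(math.exp(intercept), 6))
```

A slope near 1 would mean cases scale in proportion to population. A slope above or below 1 would mean cases grow faster or slower than population.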

 

2) Active/Total cases vs Number of Deaths

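These two graphs indicate that the number of deaths rises with both the active and the total number of infection cases in a state. On the logarithmic axes used, a straight-line trend again corresponds to a power law rather than to exponential growth. This is again an expected and intuitive result.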

 

3) Number of cases/deaths vs Total number of homeless people

This is a very interesting result: in both graphs, the scatter plot reveals multiple straight lines with different slopes. In general, though, the number of total cases and deaths seems to increase with the number of homeless people, with a few exceptions.

4) Total number of deaths vs Total GDP of a state

This is an interesting and very counter-intuitive result. States with higher GDP are expected to have better healthcare systems and hence fewer deaths, but the graph shows otherwise: states with higher GDP have more deaths due to the infection. This may be because states with higher GDP usually have larger populations.
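One way to separate the effect of population from the effect of GDP is to compare deaths per million people instead of absolute deaths. The sketch below uses entirely hypothetical figures for three unnamed states, and the helper deaths_per_million is an illustrative name, not part of the original program.

```python
def deaths_per_million(deaths, population):
    """Deaths scaled to a population of one million."""
    return deaths / population * 1_000_000

# Hypothetical figures, for illustration only.
states = [
    {"name": "State A", "deaths": 9000, "population": 110_000_000},
    {"name": "State B", "deaths": 700, "population": 3_500_000},
    {"name": "State C", "deaths": 4000, "population": 60_000_000},
]

for state in states:
    state["rate"] = deaths_per_million(state["deaths"], state["population"])

# Rank by rate: the state with the most deaths need not have the highest rate.
ranked = sorted(states, key=lambda s: s["rate"], reverse=True)
for state in ranked:
    print(state["name"], round(state["rate"], 1))
```

Plotting such a rate against GDP per capita, rather than total deaths against total GDP, would compare states on a more equal footing.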

 

5) Other results

Some of the results of this analysis did not show a definite pattern, including some that were expected to show one. I have listed all those results below.

 

 

4. Limitations

  1. Although definite trends and patterns are observed in the scatter plots, they all have exceptions. Moreover, some scatter plots did not show a pattern at all.
  2. Data on all states and union territories wasn’t available. Some scatter plots are based on data from only 15-20 states of India.
  3. The scatter plots are not easy to interpret. Because the variables have no one-to-one relationship, scatter plots were the most practical way to represent such data.
  4. The population census of 2011 is outdated.
  5. The data has not been updated since August 7th, 2020.
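Since patterns in scatter plots are judged by eye, a correlation coefficient can give a rough numerical check of how strong a linear trend is. The sketch below computes the Pearson coefficient by hand. The function name pearson and the sample values are hypothetical, and with only 15-20 states any such number should be read with caution.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up log-scaled values for five states.
log_cases = [4.8, 6.1, 7.9, 9.2, 10.3]
log_deaths = [1.1, 2.0, 4.2, 5.0, 6.7]

r = pearson(log_cases, log_deaths)
print("correlation:", round(r, 3))
```

Values near 1 or -1 suggest a strong linear trend on the chosen axes, while values near 0 suggest none.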

 

5. Links and references

  1. COVID-19 cases, deaths, and cured/discharged data as of August 7th, 2020.
  2. The population of Indian states as reported in the 2011 census.
  3. Literacy rate in Indian states.
  4. GDP and GDP per capita of Indian states.
  5. Healthcare expenditure of major states in India.

August 27, 2020
