COVID Vaccination Data Analysis






The COVID-19 pandemic is one of the most critical health crises the world has faced. Predicting the trend of COVID-19 vaccination has become a challenge. Many healthcare professionals, statisticians, and researchers are tracking the spread of the virus in different parts of the world using a variety of approaches. The growing number of vaccines developed by talented scientists, together with the vaccination drive itself, sparked my curiosity to learn more about the current immunization programs and to extract meaningful information from the data. After searching multiple websites, I found a few datasets to work with.

The complete code is available on GitHub along with the dataset.

Data Collection

First, we import the libraries and load the dataset for this project, which we'll call cowin-2.csv.


Required packages: pandas, matplotlib, seaborn


        import pandas as pd
        import matplotlib.pyplot as plt
        import seaborn as sb

        # Load the dataset into the DataFrame `data` used throughout
        data = pd.read_csv('cowin-2.csv')

Basic Analysis

The next step is to explore the data with functions such as head(), describe(), and info().
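A minimal sketch of this first look at the data. Because cowin-2.csv is not reproduced in this article, the snippet builds a tiny stand-in DataFrame with made-up numbers and placeholder names. 'S No', 'Cowin Key' and 'Total_Individuals_Vaccinated' are column names used elsewhere in this article; 'State' and 'District' are assumed.

```python
import pandas as pd

# Toy stand-in for cowin-2.csv (made-up numbers, placeholder names).
# 'State' and 'District' are assumed column names.
data = pd.DataFrame({
    "S No": [1, 2, 3, 4],
    "Cowin Key": [101, 102, 103, 104],
    "State": ["State A", "State A", "State B", "State C"],
    "District": ["District 1", "District 2", "District 3", "District 4"],
    "Total_Individuals_Vaccinated": [1200, 900, 800, 700],
})

print(data.head())        # first rows (5 by default)
print(data.describe())    # summary statistics of the numeric columns
data.info()               # dtypes and non-null counts; prints directly, returns None

# Missing values per column, a useful check before cleaning.
missing_per_column = data.isnull().sum()
print(missing_per_column)
```

With the real file, the same calls apply unchanged to the DataFrame returned by pd.read_csv.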



Data Cleaning:


Remove columns that have no significance for the analysis and prediction, such as identifiers like the serial number (S No) and the CoWIN key (Cowin Key).

        # Drop 'S No' and 'Cowin Key': they are identifiers with no
        # significance for the analysis
        data.drop(columns=['S No', 'Cowin Key'], inplace=True)

Data Visualization

Note: the graphs below are plotted for the values of a single day.


1. Which state has the highest number of vaccinated people?

Let us find out which state has the most vaccinated people. We use seaborn and matplotlib to plot the data. Plotting requires the values to be arranged in ascending or descending order, which can be achieved with methods such as groupby(), max(), and sort_values(). For easier visualization, we consider only the top 10 districts in the dataset.

        data = data.sort_values('Total_Individuals_Vaccinated', ascending=False)

[Figure: Total individuals vaccinated, state-wise]

From the visualization above, it is clear that Maharashtra leads the country in the number of individuals vaccinated.

The states with the highest numbers of vaccinated individuals are Maharashtra, Uttar Pradesh (UP), Rajasthan, Gujarat, and Karnataka.
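The aggregation described at the start of this section can be sketched in pandas as follows. This is a minimal sketch, assuming the rows are district-level with a 'State' column (an assumed name) and using made-up numbers with placeholder state names: it sums Total_Individuals_Vaccinated per state, sorts in descending order, and keeps the top n.

```python
import pandas as pd

# Toy district-level rows (made-up numbers, placeholder names);
# 'State' and 'District' are assumed column names.
data = pd.DataFrame({
    "State": ["State A", "State A", "State B", "State C",
              "State D", "State E", "State E"],
    "District": ["D1", "D2", "D3", "D4", "D5", "D6", "D7"],
    "Total_Individuals_Vaccinated": [1200, 900, 950, 800, 700, 650, 200],
})


def top_states(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Sum district totals per state and return the n largest, descending."""
    per_state = df.groupby("State")["Total_Individuals_Vaccinated"].sum()
    return per_state.sort_values(ascending=False).head(n)


top = top_states(data)
print(top)
```

The resulting Series can be passed to seaborn for a horizontal bar chart, e.g. sb.barplot(x=top.values, y=top.index). Summing per state is a choice made for district-level data; if each row were already one state, the sort_values() call alone would be enough.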

2. Gender-wise distribution of vaccinations

More males have been vaccinated than females or transgender people. (This may be because more males than females have been infected with COVID-19.)

[Figure: Gender-wise individuals vaccinated, by district]

Mumbai and Pune are among the cities with the highest numbers of vaccinated individuals, followed by Bengaluru, Chennai, Ahmedabad, Thane, and Kolkata.
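Both comparisons in this section (totals per gender, and districts ranked by individuals vaccinated with a gender breakdown) can be computed with a few pandas operations. This is a sketch under assumptions: the three gender column names are hypothetical, since the exact headers of cowin-2.csv are not shown here, and the numbers and district names are placeholders.

```python
import pandas as pd

# Hypothetical column names; the real headers in cowin-2.csv may differ.
GENDER_COLS = {
    "Male": "Male_Individuals_Vaccinated",
    "Female": "Female_Individuals_Vaccinated",
    "Transgender": "Transgender_Individuals_Vaccinated",
}

# Toy data (made-up numbers, placeholder district names).
data = pd.DataFrame({
    "District": ["District 4", "District 1", "District 3", "District 2"],
    "Male_Individuals_Vaccinated": [400, 700, 450, 520],
    "Female_Individuals_Vaccinated": [300, 495, 348, 378],
    "Transgender_Individuals_Vaccinated": [0, 5, 2, 2],
})

# One row per district, with short gender labels as column names.
gender_table = (
    data.set_index("District")
    .rename(columns={col: label for label, col in GENDER_COLS.items()})
    .loc[:, list(GENDER_COLS)]
)

# Overall totals per gender across all districts.
gender_totals = gender_table.sum()

# Districts ranked by total individuals vaccinated (all genders combined).
district_totals = gender_table.sum(axis=1).sort_values(ascending=False)
by_district = gender_table.loc[district_totals.index]

print(gender_totals)
print(by_district.head(10))
```

A stacked bar chart like the district-wise one can then be drawn with by_district.head(10).plot(kind="bar", stacked=True).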

                            

3. Which vaccines are used in the different districts?


In almost all districts, Covishield is used more than Covaxin; the exceptions are a few places such as Khordha and Barpeta.

[Figure: Total Covishield and Covaxin administered per district]
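Finding the exceptions mentioned above amounts to comparing two columns row by row. A minimal sketch, assuming hypothetical column names for the doses of each vaccine and using made-up numbers with placeholder district names (not the real counts for Khordha or Barpeta):

```python
import pandas as pd

# Hypothetical column names for doses administered per vaccine.
COVISHIELD = "Total_Covishield_Administered"
COVAXIN = "Total_Covaxin_Administered"

# Toy data (made-up numbers, placeholder district names).
data = pd.DataFrame({
    "District": ["District 1", "District 2", "District 3", "District 4"],
    COVISHIELD: [1500, 1100, 300, 120],
    COVAXIN: [200, 150, 450, 180],
})


def covaxin_majority_districts(df: pd.DataFrame) -> list:
    """Districts where more Covaxin than Covishield doses were administered."""
    return df.loc[df[COVAXIN] > df[COVISHIELD], "District"].tolist()


# Fraction of each district's doses that were Covishield.
covishield_share = data[COVISHIELD] / (data[COVISHIELD] + data[COVAXIN])

print(covaxin_majority_districts(data))
print(covishield_share.round(2).tolist())
```

The share column is an alternative to raw counts when districts differ greatly in size, since it puts every district on the same 0 to 1 scale.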




[Figure: Total sessions conducted and individuals vaccinated]


Linear Regression Algorithm

After cleaning and visualizing the data, we move on to selecting, training, and testing the algorithm: linear regression.
The goal is to predict the total number of individuals vaccinated on a given day, using the values recorded over several days as training data.
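In its simplest form, with one feature $x_i$ per training example (for instance a day index; the article does not state which features it uses, so this is an assumption) and the target $y_i$ being the total number of individuals vaccinated, linear regression fits a straight line by ordinary least squares:

$$
\hat{y}_i = \beta_0 + \beta_1 x_i,
\qquad
(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \beta_1 x_i\bigr)^2
$$

which has the closed-form solution

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.
$$

sklearn's LinearRegression solves the same least-squares problem, generalized to several features.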
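The model used here is LinearRegression:
Library: sklearn (scikit-learn)
Module: linear_model
Class: LinearRegression

Training (the learning part): fit the algorithm to the training data, passing both the training features and the training targets:
    Model_Name.fit(X_train, y_train)

Testing (checking how well the algorithm performs) consists of two steps:
1. Predicting the outcomes for new data: Model_Name.predict(X_test)
2. Measuring the accuracy of the algorithm on the testing set: Model_Name.score(X_test, y_test). For LinearRegression, score() returns the coefficient of determination R², not a classification accuracy.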
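The training and testing steps above can be put together in a short, runnable sketch. It assumes the day number is the only feature and uses a synthetic, roughly linear series in place of the real daily totals, which are not reproduced here; train_test_split is an assumption about how the data was divided.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic daily series: day number -> total individuals vaccinated.
# Real values would come from cowin-2.csv; these are generated for illustration.
rng = np.random.default_rng(seed=0)
days = np.arange(1, 31).reshape(-1, 1)     # features must be 2-D: (n_samples, n_features)
totals = 5000 * days.ravel() + rng.normal(0, 2000, size=30)  # linear growth plus noise

# Hold out 20% of the days (6 of 30) for testing.
X_train, X_test, y_train, y_test = train_test_split(
    days, totals, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)                # training (learning) step

predictions = model.predict(X_test)        # predict outcomes for unseen days
r2 = model.score(X_test, y_test)           # R^2 on the testing set

next_day = np.array([[31]])
print(f"slope per day: {model.coef_[0]:.1f}")
print(f"R^2 on test set: {r2:.3f}")
print(f"predicted total for day 31: {model.predict(next_day)[0]:.0f}")
```

Because the rows are days, a random split lets the model see days on both sides of each test day. Passing shuffle=False to train_test_split instead holds out the last days, which is closer to forecasting future totals.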

You can find the code and the dataset here




        

Inferences and Conclusion




