Turkish Vaccination Data

This is a little project I did to learn a bit of data science and Juptyer notebooks. It takes a CSV file of vaccination data from the Turkish government and outputs a few graphs. The data is from Kaggle The data came in the format of an SQL database backup with +3 million rows, so I had to restore it, get numbers from, then convert those numbers to a CSV file (download)

I did this by using the Pandas library in python, which required the file to be imported as a DataFrame data = pd.read_csv(path), then I calculated the total doses, percentages and other data by using commands like data["diffOfDose"] = data["1DOSE"] - data["2DOSE"] Then I plotted the data in a Juptyer notebook, using the plotly library, which, after some headaches with other libraries, I found to be the best. I wanted to create a chotopleth map of the percentages for each county, which required the geojson data of turkey. After some searching, I found the data in this GitHub repo. The code I use for generating the map is:


              def makeMap(value, colorrange, labelName):
                geojsonlink = "https://raw.githubusercontent.com/alpers/Turkey-Maps-GeoJSON/master/tr-cities.json"
          
                with urlopen(geojsonlink) as response:
                  cities = json.load(response)
        
                  fig = px.choropleth(data,
                                      geojson=cities,
                                      color=value,
                                      locations="CITY",   
                                      featureidkey="properties.name",
                                      range_color=colorrange,
                                      labels={"CITY" :"City", value: labelName},
                                      color_continuous_scale="tempo") #matter algae
                  fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
                  fig.update_geos(fitbounds="locations", visible=False)
                  fig.show()
            

This code generates the map you see in the title. I also made a grouped bar chart of the doses. The graph below shows the first dose of the top 5 cities. I was going to make it top 5, then a nice seperator, then bottom 5 for the percentages in one chart, but that proved too difficult, so I instead made 2 charts:

The graphs are also generated using the plotly library, with the following code:


                def top5Bar(mainNum, otherNum, colours):
                  name1 = "percOfDose" + str(mainNum)
                  name2 = "percOfDose" + str(otherNum)

                  mainDose = data.nlargest(5, name1)[["CITY", name1]]
                  otherDose = data.loc[data['CITY'].isin(mainDose["CITY"])].copy()

                  mainDose[name1] = mainDose[name1].round(decimals=1)
                  otherDose[name2] = otherDose[name2].round(decimals=1)

                  fig = go.Figure(data=[
                      go.Bar(name="% of Dose " + str(mainNum), x = mainDose["CITY"],y=mainDose[name1],
                            marker=dict(color=colours[0],opacity=0.8)),
                      go.Bar(name="% of Dose " + str(otherNum), x = otherDose["CITY"],y=otherDose[name2],
                            marker=dict(color=colours[1],opacity=0.8))
                  ])
                  fig.update_layout(barmode="group", title_text="Top 5 cities sorted by the highest percentage of "+
                                                                str(mainNum) + findSuffix(mainNum) + " Dose (not " +
                                                                str(otherNum) + findSuffix(otherNum) + ")")
                  fig.show()
              
The code for the bottom 5 is similar.