import math
from pathlib import Path
import folium
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster
"display.max_columns", None)
pd.set_option(
"figure.facecolor"] = (1, 1, 1, 0) # RGBA tuple with alpha=0
plt.rcParams["axes.facecolor"] = (1, 1, 1, 0) # RGBA tuple with alpha=0
plt.rcParams[
= Path.cwd().parent.parent PROJECT_ROOT
Interactive Maps with Folium
We begin by creating a relatively simple map with folium.Map()
.
= folium.Map(location=[42.32, -71.0589], tiles="openstreetmap", zoom_start=10)
m_1
m_1
Several arguments customize the appearance of the map:
location
sets the initial center of the map. We use the latitude (42.32° N) and longitude (-71.0589° E) of the city of Boston.tiles
changes the styling of the map; in this case, we choose the OpenStreetMap style. If you’re curious, you can find the other options listed here.zoom_start
sets the initial level of zoom of the map, where higher values zoom in closer to the map.
Boston Crime Data
= pd.read_csv(
crimes f"{PROJECT_ROOT}/data/kaggle_geospatial/crimes-in-boston/crimes-in-boston/crime.csv", encoding="latin-1"
)
# Drop rows with missing locations
=["Lat", "Long", "DISTRICT"], inplace=True)
crimes.dropna(subset
# Focus on major crimes in 2018
= crimes[
crimes
crimes.OFFENSE_CODE_GROUP.isin(
["Larceny",
"Auto Theft",
"Robbery",
"Larceny From Motor Vehicle",
"Residential Burglary",
"Simple Assault",
"Harassment",
"Ballistics",
"Aggravated Assault",
"Other Burglary",
"Arson",
"Commercial Burglary",
"HOME INVASION",
"Homicide",
"Criminal Harassment",
"Manslaughter",
]
)
]= crimes[crimes.YEAR >= 2018]
crimes
crimes.head()
INCIDENT_NUMBER | OFFENSE_CODE | OFFENSE_CODE_GROUP | OFFENSE_DESCRIPTION | DISTRICT | REPORTING_AREA | SHOOTING | OCCURRED_ON_DATE | YEAR | MONTH | DAY_OF_WEEK | HOUR | UCR_PART | STREET | Lat | Long | Location | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | I182070945 | 619 | Larceny | LARCENY ALL OTHERS | D14 | 808 | NaN | 2018-09-02 13:00:00 | 2018 | 9 | Sunday | 13 | Part One | LINCOLN ST | 42.357791 | -71.139371 | (42.35779134, -71.13937053) |
6 | I182070933 | 724 | Auto Theft | AUTO THEFT | B2 | 330 | NaN | 2018-09-03 21:25:00 | 2018 | 9 | Monday | 21 | Part One | NORMANDY ST | 42.306072 | -71.082733 | (42.30607218, -71.08273260) |
8 | I182070931 | 301 | Robbery | ROBBERY - STREET | C6 | 177 | NaN | 2018-09-03 20:48:00 | 2018 | 9 | Monday | 20 | Part One | MASSACHUSETTS AVE | 42.331521 | -71.070853 | (42.33152148, -71.07085307) |
19 | I182070915 | 614 | Larceny From Motor Vehicle | LARCENY THEFT FROM MV - NON-ACCESSORY | B2 | 181 | NaN | 2018-09-02 18:00:00 | 2018 | 9 | Sunday | 18 | Part One | SHIRLEY ST | 42.325695 | -71.068168 | (42.32569490, -71.06816778) |
24 | I182070908 | 522 | Residential Burglary | BURGLARY - RESIDENTIAL - NO FORCE | B2 | 911 | NaN | 2018-09-03 18:38:00 | 2018 | 9 | Monday | 18 | Part One | ANNUNCIATION RD | 42.335062 | -71.093168 | (42.33506218, -71.09316781) |
Plotting points
= crimes[((crimes.OFFENSE_CODE_GROUP == "Robbery") & (crimes.HOUR.isin(range(9, 18))))] daytime_robberies
folium.Marker
= folium.Map(location=[42.32, -71.0589], tiles="cartodbpositron", zoom_start=12)
m_2
for idx, row in daytime_robberies.iterrows():
"Lat"], row["Long"]]).add_to(m_2)
Marker([row[
m_2
folium.plugins.MarkerCluster
If we have a lot of markers to add, folium.plugins.MarkerCluster() can help to declutter the map. Each marker is added to a MarkerCluster object.
= folium.Map(location=[42.32, -71.0589], tiles="cartodbpositron", zoom_start=12)
m_3
= MarkerCluster()
mc for idx, row in daytime_robberies.iterrows():
if not math.isnan(row["Long"]) and not math.isnan(row["Lat"]):
"Lat"], row["Long"]]))
mc.add_child(Marker([row[
m_3.add_child(mc)
m_3
Bubble Maps
A bubble map uses circles instead of markers. By varying the size and color of each circle, we can also show the relationship between location and two other variables.
We create a bubble map by using folium.Circle() to iteratively add circles. In the code cell below, robberies that occurred in hours 9-12 are plotted in green, whereas robberies from hours 13-17 are plotted in red.
= folium.Map(location=[42.32, -71.0589], tiles="cartodbpositron", zoom_start=12)
m_4
def color_producer(val):
if val <= 12:
return "forestgreen"
else:
return "darkred"
for i in range(0, len(daytime_robberies)):
Circle(=[daytime_robberies.iloc[i]["Lat"], daytime_robberies.iloc[i]["Long"]],
location=20,
radius=color_producer(daytime_robberies.iloc[i]["HOUR"]),
color
).add_to(m_4)
m_4
Note that folium.Circle()
takes several arguments:
location
is a list containing the center of the circle, in latitude and longitude.radius
sets the radius of the circle.- Note that in a traditional bubble map, the radius of each circle is allowed to vary. We can implement this by defining a function similar to the
color_producer()
function that is used to vary the color of each circle.
- Note that in a traditional bubble map, the radius of each circle is allowed to vary. We can implement this by defining a function similar to the
color
sets the color of each circle.- The
color_producer()
function is used to visualize the effect of the hour on robbery location.
- The
Heatmaps
To create a heatmap, we use folium.plugins.HeatMap()
. This shows the density of crime in different areas of the city, where red areas have relatively more criminal incidents.
As we’d expect for a big city, most of the crime happens near the center.
= folium.Map(location=[42.32, -71.0589], tiles="cartodbpositron", zoom_start=11)
m_5
=crimes[["Lat", "Long"]], radius=10).add_to(m_5)
HeatMap(data
m_5
As you can see in the code cell above, folium.plugins.HeatMap()
takes a couple of arguments:
data
is a DataFrame containing the locations that we’d like to plot.radius
controls the smoothness of the heatmap. Higher values make the heatmap look smoother (i.e., with fewer gaps).
Choropleth Maps
To understand how crime varies by police district, we’ll create a choropleth map.
As a first step, we create a GeoDataFrame where each district is assigned a different row, and the geometry
column contains the geographical boundaries.
= gpd.read_file(
districts_full f"{PROJECT_ROOT}/data/kaggle_geospatial/Police_Districts/Police_Districts/Police_Districts.shp"
)= districts_full[["DISTRICT", "geometry"]].set_index("DISTRICT")
districts districts.head()
geometry | |
---|---|
DISTRICT | |
A15 | MULTIPOLYGON (((-71.07416 42.39051, -71.07415 ... |
A7 | MULTIPOLYGON (((-70.99644 42.39557, -70.99644 ... |
A1 | POLYGON ((-71.05200 42.36884, -71.05169 42.368... |
C6 | POLYGON ((-71.04406 42.35403, -71.04412 42.353... |
D4 | POLYGON ((-71.07416 42.35724, -71.07359 42.357... |
We also create a Pandas Series that shows the number of crimes in each district.
= crimes.DISTRICT.value_counts()
plot_dict plot_dict.head()
DISTRICT
D4 2885
B2 2231
A1 2130
C11 1899
B3 1421
Name: count, dtype: int64
It’s very important that plot_dict has the same index as districts - this is how the code knows how to match the geographical boundaries with appropriate colors.
Using the folium.Choropleth()
class, we can create a choropleth map.
= folium.Map(location=[42.32, -71.0589], tiles="cartodbpositron", zoom_start=11)
m_6
Choropleth(=districts.__geo_interface__,
geo_data=plot_dict,
data="feature.id",
key_on="YlGnBu",
fill_color="Major criminal incidents (Jan-Aug 2018)",
legend_name
).add_to(m_6)
m_6
Note that folium.Choropleth()
takes several arguments:
geo_data
is a GeoJSON FeatureCollection containing the boundaries of each geographical area.- In the code above, we convert the districts GeoDataFrame to a GeoJSON FeatureCollection with the
__geo_interface__
attribute.
- In the code above, we convert the districts GeoDataFrame to a GeoJSON FeatureCollection with the
data
is a Pandas Series containing the values that will be used to color-code each geographical area.key_on
will always be set tofeature.id
.- This refers to the fact that the GeoDataFrame used for
geo_data
and the Pandas Series provided in data have the same index. To understand the details, we’d have to look more closely at the structure of a GeoJSON Feature Collection (where the value corresponding to the “features” key is a list, wherein each entry is a dictionary containing an “id” key).
- This refers to the fact that the GeoDataFrame used for
fill_color
sets the color scale.legend_name
labels the legend in the top right corner of the map.