Splitting and Expanding a Pandas Column
Cleaning up data is a big part of the analytics process. Luckily there are some very helpful tools in the Pandas library to help us
We load in a CSV full of match results via it’s URL (see a previous post) but then find that the score is stored in one column named ‘FT’. Ideally, we want the home goals and away goals in separate columns.
This can be achieved in one line of code:
dat['FT'].str.split('-',expand=True)
But let’s explain it in parts:
1. Take the dat[‘FT’] column
2. It’s a series so use .str to let us use str functions
3. Use split(), to split the values like ‘1-2’ into ‘1’ and ‘2’ using the delimiter ‘-’
4. Use expand = True to return two additional columns
To add the new columns to our initial data frame we need to name the new columns at the beginning of our line;
dat[['home_goals','away_goals']] = dat['FT'].str.split('-',expand=True)
We now have more useful data!
Parts of the Mini-Series
Full Code so Far
import pandas as pd
dat = pd.read_csv("https://raw.githubusercontent.com/footballcsv/england/master/1990s/1992-93/eng.1.csv")
dat[['home_goals', 'away_goals']] = dat['FT'].str.split('-', expand=True)