Grouping non-sorted year column values to a single column using pandas
Solution 1:
Group your columns by virtual groups:
out = df.groupby(np.repeat(np.arange(len(df.columns) // 2), 2), axis=1) \
.apply(lambda x: x.set_index('year')).droplevel(0, axis=1).reset_index()
print(out)
# Output
year Jan Feb March
0 1990 25 30 45
1 1991 27 34 47
2 1992 23 38 49
3 1993 21 36 43
With your data, use this code before:
df = pd.read_csv('England_SE_and_Central_S.txt', skiprows=5, sep='\s+')
df.columns = df.columns.str.split('.').str[0]
out = df.groupby(np.repeat(np.arange(len(df.columns) // 2), 2), axis=1) \
.apply(lambda x: x.set_index('year')).droplevel(0, axis=1).reset_index()
Output:
>>> out
year jan feb mar apr may jun jul aug sep oct nov dec win spr sum aut ann
0 1884 3.5 2.6 2.5 2.5 6.3 9.0 11.6 11.5 10.5 4.8 2.1 2.1 --- 3.77 10.75 5.79 5.76
1 1885 -0.0 3.2 0.4 3.2 4.8 9.2 10.8 9.4 8.3 3.9 3.2 0.4 1.69 2.78 9.82 5.11 4.72
2 1886 -1.1 -1.8 0.2 3.4 6.1 8.6 11.0 11.1 9.9 7.8 3.1 -1.0 -0.80 3.23 10.27 6.91 4.81
3 1887 -1.6 -0.1 -0.1 1.2 5.3 9.2 11.4 10.0 7.6 2.7 1.7 -0.1 -0.92 2.13 10.23 3.98 3.96
4 1888 -0.3 -1.6 -0.1 2.0 5.3 8.9 10.1 10.1 8.5 2.8 5.6 2.0 -0.66 2.39 9.73 5.58 4.45
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
133 2017 0.2 3.6 5.5 4.2 8.7 11.9 13.2 11.8 9.8 9.1 3.3 2.0 2.07 6.14 12.29 7.44 6.96
134 2018 2.9 -0.5 2.1 6.7 8.0 11.1 13.7 12.5 9.3 6.9 5.3 4.4 1.55 5.60 12.47 7.16 6.92
135 2019 0.7 2.2 4.8 4.4 6.4 10.5 12.8 12.4 10.2 7.5 3.6 3.1 2.43 5.23 11.95 7.12 6.59
136 2020 3.8 3.8 3.2 5.1 7.1 10.9 11.7 14.1 9.9 7.8 5.9 2.8 3.55 5.12 12.24 7.87 7.18
137 2021 0.7 2.3 3.1 1.2 6.0 11.5 13.3 12.1 11.6 8.8 3.9 4.6 1.92 3.44 12.28 8.10 6.61
[138 rows x 18 columns]
Solution 2:
The code below fetches the data directly from the URL you provided and should generate the output you expect.
import pandas as pd
df = pd.read_csv(
"https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Tmin/ranked/England_SE_and_Central_S.txt",
sep="\s+",
skiprows=5,
storage_options={"User-Agent": "Mozilla/5.0"}
)
months = []
for i in range(df.shape[1] // 2):
tmp = pd.DataFrame(df.iloc[:, 2*i]).set_index(df.iloc[:, 2*i + 1])
months.append(tmp)
final_df = pd.concat(months, axis=1)
Output:
>>> final_df
jan feb mar apr may jun jul aug sep oct nov dec win spr sum aut ann
1884 3.5 2.6 2.5 2.5 6.3 9.0 11.6 11.5 10.5 4.8 2.1 2.1 --- 3.77 10.75 5.79 5.76
1885 -0.0 3.2 0.4 3.2 4.8 9.2 10.8 9.4 8.3 3.9 3.2 0.4 1.69 2.78 9.82 5.11 4.72
1886 -1.1 -1.8 0.2 3.4 6.1 8.6 11.0 11.1 9.9 7.8 3.1 -1.0 -0.80 3.23 10.27 6.91 4.81
1887 -1.6 -0.1 -0.1 1.2 5.3 9.2 11.4 10.0 7.6 2.7 1.7 -0.1 -0.92 2.13 10.23 3.98 3.96
1888 -0.3 -1.6 -0.1 2.0 5.3 8.9 10.1 10.1 8.5 2.8 5.6 2.0 -0.66 2.39 9.73 5.58 4.45
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2017 0.2 3.6 5.5 4.2 8.7 11.9 13.2 11.8 9.8 9.1 3.3 2.0 2.07 6.14 12.29 7.44 6.96
2018 2.9 -0.5 2.1 6.7 8.0 11.1 13.7 12.5 9.3 6.9 5.3 4.4 1.55 5.60 12.47 7.16 6.92
2019 0.7 2.2 4.8 4.4 6.4 10.5 12.8 12.4 10.2 7.5 3.6 3.1 2.43 5.23 11.95 7.12 6.59
2020 3.8 3.8 3.2 5.1 7.1 10.9 11.7 14.1 9.9 7.8 5.9 2.8 3.55 5.12 12.24 7.87 7.18
2021 0.7 2.3 3.1 1.2 6.0 11.5 13.3 12.1 11.6 8.8 3.9 4.6 1.92 3.44 12.28 8.10 6.61