How to merge data vertically so there are multiple lines instead of multiple columns
Solution 1:
Your first idea is doable (and seems to be the best approach).
First, merge sample_data with upstream_data and keep only the columns that come from upstream_data (the 'new' rows). Set 'Lineage Step' to 0 on these rows so that they sort ahead of the existing steps later:
# The empty right-hand suffix keeps upstream_data's column names unchanged.
df_up = pd.merge(
    sample_data, upstream_data,
    left_on=['From Country', 'From Town', 'FromStreet'],
    right_on=['ToCountry', 'ToTown', 'ToStreet'],
    how='left', suffixes=('_current', ''),
)[['AttributeName', 'Lineage Step', 'From Country', 'From Town', 'FromStreet',
   'ToCountry', 'ToTown', 'ToStreet']].dropna()  # drop rows without an upstream match
df_up['Lineage Step'] = 0
df_up
df_up looks like this:
    AttributeName      Lineage Step  From Country    From Town    FromStreet    ToCountry    ToTown     ToStreet
--  ---------------  --------------  --------------  -----------  ------------  -----------  ---------  ----------
 0  John                          0  France          Paris        French St     Spain        Madrid     Spanish St
 4  Sally                         0  Germany         Berlin       Gerrman St    Scotland     Edinburgh  London St
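To see why the suffixes argument leaves the upstream columns unsuffixed, here is a minimal sketch on made-up frames. The frames `current` and `upstream`, their values, and the reduced set of columns are hypothetical and are not the question's data; only the column-naming pattern mirrors the merge above.

```python
import pandas as pd

# Hypothetical toy frames (assumption: both frames share the same column names).
current = pd.DataFrame({
    "AttributeName": ["John", "Sally"],
    "Lineage Step": [1, 1],
    "From Country": ["Spain", "Germany"],
    "ToCountry": ["Scotland", "Scotland"],
})
upstream = pd.DataFrame({
    "AttributeName": ["John"],
    "Lineage Step": [1],
    "From Country": ["France"],
    "ToCountry": ["Spain"],
})

# Match the current row's origin to an upstream row's destination.
merged = pd.merge(
    current, upstream,
    left_on=["From Country"], right_on=["ToCountry"],
    how="left", suffixes=("_current", ""),
)
# Left-frame columns are renamed with "_current"; right-frame columns keep their names.
print(sorted(merged.columns))

# Selecting the unsuffixed names picks the upstream values;
# dropna() removes the row (Sally) that had no upstream match.
previous_hop = merged[["AttributeName", "Lineage Step", "From Country", "ToCountry"]].dropna()
print(previous_hop)
```

Because the left merge fills unmatched rows with NaN, integer columns such as 'Lineage Step' may come back as floats, which is harmless here since the column is overwritten with 0 afterwards.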
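Now append this DataFrame to sample_data, sort by attribute and step, and shift the steps up by one so that numbering starts at 1:
# DataFrame.append was removed in pandas 2.0; pd.concat works on old and new versions.
df_jn = pd.concat([sample_data, df_up], ignore_index=True).sort_values(['AttributeName', 'Lineage Step'])
df_jn['Lineage Step'] += 1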
df_jn looks like this:
    AttributeName      Lineage Step  From Country    From Town    FromStreet    ToCountry    ToTown     ToStreet
--  ---------------  --------------  --------------  -----------  ------------  -----------  ---------  ----------
 6  John                          1  France          Paris        French St     Spain        Madrid     Spanish St
 0  John                          2  Spain           Madrid       Spanish St    Scotland     Edinburgh  Lower St
 1  John                          3  Scotland        Edinburgh    Main St       England      London     Middle St
 2  John                          4  England         London       Lower St      England      London     Upper St
 3  John                          5  England         London       Middle St     England      London     Upper St
 7  Sally                         1  Germany         Berlin       Gerrman St    Scotland     Edinburgh  London St
 4  Sally                         2  Scotland        Edinburgh    London St     England      Liverpool  new St
 5  Sally                         3  England         Manchester   Scotland St   England      London     Old St
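The stack, sort, and renumber step can be tried in isolation with the sketch below. It uses pd.concat, which also runs on pandas 2.0 and later, where DataFrame.append is no longer available. The frames `sample` and `upstream_rows` are hypothetical stand-ins for sample_data and df_up with fewer columns and invented rows.

```python
import pandas as pd

# Hypothetical stand-in for sample_data: existing lineage steps, numbered from 1.
sample = pd.DataFrame({
    "AttributeName": ["John", "John", "Sally"],
    "Lineage Step": [1, 2, 1],
    "From Country": ["Spain", "Scotland", "Scotland"],
    "ToCountry": ["Scotland", "England", "England"],
})
# Hypothetical stand-in for df_up: upstream rows, marked as step 0.
upstream_rows = pd.DataFrame({
    "AttributeName": ["John", "Sally"],
    "Lineage Step": [0, 0],
    "From Country": ["France", "Germany"],
    "ToCountry": ["Spain", "Scotland"],
})

# Stack the frames vertically, then order each attribute's rows by step.
combined = (
    pd.concat([sample, upstream_rows], ignore_index=True)
    .sort_values(["AttributeName", "Lineage Step"])
)
# Step 0 sorted first within each attribute; shift so numbering starts at 1 again.
combined["Lineage Step"] += 1
print(combined)
```

The original row labels are kept after sorting, which is why the upstream rows show up with the highest index values at the top of each group. Chain .reset_index(drop=True) if a clean 0..n-1 index is preferred.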