split one row into multiple record python

I have an input dataframe as follows

Class   Duration    StudentID   Age Startdate   Start Time  Enddate     End Time    TimeDifference
5th     XX          20002       5   04/12/2021  17:00:00    04/14/2021  20:00:00    3000

And I would like to split the same into three different rows based on the start and end date as follows.

Class   Duration    StudentID   Age     Startdate   Start Time  Enddate     End Time    TimeDifference
5th     XX          20002       5       04/12/2021  17:00:00    04/12/2021  23:59:59    360
5th     XX          20002       5       04/13/2021  0:00:00     04/13/2021  23:59:59    1440
5th     XX          20002       5       04/14/2021  0:00:00     04/14/2021  20:00:00    1200

I am trying with python. Please help.

Input Output is here


I get a slightly different value for 'Time Difference', but this is an approach you can tweak and and use.

Step 1: You can start by using melt() with your id_vars being all your columns except your 'Startdate' and 'Enddate'.

Step 2: Then you can set your index to be your StartEndDate column, created after melting your dataframe.

Step 3: Then using reindex() you can add the new row with your missing dates.

Lastly what's left is to calculate the time difference column and rearrange your dataframe to get to your final output.

I assume your dataframe is called df:

# Step 1
ids = [c for c in df.columns if c not in ['Startdate','Enddate']]
new = df.melt(id_vars=ids,value_name = 'StartEndDate').drop('variable',axis=1)
new.loc[new.StartEndDate.isin(df['Startdate'].tolist()),'Start Time'] = "00:00"

print(new)
  Class Duration  StudentID  Age Start Time End Time  TimeDifference  \
0   5th       XX      20002    5      00:00    20:00            3000   
1   5th       XX      20002    5      17:00    20:00            3000   

  StartEndDate  
0   04/12/2021  
1   04/14/2021  

# Step 2
new['StartEndDate'] = pd.to_datetime(new['StartEndDate']).dt.date
new.set_index(pd.DatetimeIndex(new.StartEndDate),inplace=True)

# Step 3
final = new.reindex(pd.date_range(new.index.min(),new.index.max()), method='ffill').reset_index()\
    .rename({'index':'Startdate'},axis=1).drop('StartEndDate',axis=1)
final['Enddate'] = final['Startdate']

final['TimeDifference'] = (final['End Time'].str[:2].astype(int) - final['Start Time'].str[:2].astype(int))*60

Prints:

final = final[['Class','Duration','StudentID','Age','Startdate','Start Time','Enddate','End Time','TimeDifference']]

   Class Duration  StudentID  Age  Startdate Start Time    Enddate End Time  \
0   5th       XX      20002    5 2021-04-12      00:00 2021-04-12    20:00   
1   5th       XX      20002    5 2021-04-13      00:00 2021-04-13    20:00   
2   5th       XX      20002    5 2021-04-14      17:00 2021-04-14    20:00   

   TimeDifference  
0            1200  
1            1200  
2             180  

I think some information is missing from your question, so I would suggest running line by line and do the necessary adjustments to suit your task.