How Can I Take The First Four Elements Of A Column In Each Row and Append it To A Newly Created Column Using Python Pandas?
I'm working on a project to compute the average stock price for each year, but I'm currently stuck. I have a CSV file with two columns: Date (YYYY-MM-DD) and High. I want to create a third column called 'Year' and, for every row, take just the year from the Date column and put it in the 'Year' column.
Here is my initial table:
Here is my desired output table:
Note: I only know how to add a column; I am not sure how to index the date of each row and put it in the 'Year' column for that row. For example, for the row with the date '1980-12-12', I want the 'Year' column to have just '1980'; for the row with the date '1980-12-18', I want it to have just '1980', and so on.
Here is my code currently:
import pandas as pd
appleStock = pd.read_csv("Apple_stock_history.csv")
for i in appleStock["Date"]:
    appleStock["Year"] = i[0:4]
print(appleStock.head())
My output for the code is:
I figured out that my code is pretty inconsistent; basically, there are more rows in the original CSV file... The last row has a date of '2022-01-03' (which probably explains why I am getting that in my 'Year' column every time). In line 4 of my code, when I change it to appleStock["Year"] = i[0:], it gives me the entire date (2022-01-03).
If your df['Date'] column is in str format, like this:
df = pd.DataFrame({
'Date' : ['1980-12-12','1981-12-12'],
'High' : [0.1, 0.2]
})
print(df['Date'][0],type(df['Date'][0]))
1980-12-12 <class 'str'>
You can try this:
df['Year'] = df['Date'].str[0:4]
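The loop in the question assigns a single value to the whole 'Year' column on every iteration, so only the last date's year survives; a vectorized operation avoids that. Note that slicing with .str gives the year as a string. If a numeric year is more convenient for the yearly average, one possible alternative is to parse the dates first. The sketch below assumes every Date value is a valid YYYY-MM-DD string; the sample values and the name yearly_avg are made up for illustration and are not from the original CSV.

```python
import pandas as pd

# Small stand-in for the CSV in the question (illustrative values only).
df = pd.DataFrame({
    "Date": ["1980-12-12", "1980-12-18", "1981-01-05"],
    "High": [0.1, 0.3, 0.2],
})

# Parse the date strings into datetimes, then extract the year as an integer.
df["Year"] = pd.to_datetime(df["Date"]).dt.year

# Average of the High column for each year.
yearly_avg = df.groupby("Year")["High"].mean()
print(yearly_avg)
```

With the real file, the same two lines should work on the DataFrame returned by pd.read_csv, as long as the column is named "Date" as in the question.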