Read specific columns with pandas or other python module

I have a csv file from this webpage. I want to read some of the columns in the downloaded file (the csv version can be downloaded in the upper right corner).

Let's say I want 2 columns:

59 which in the header is star_name
60 which in the header is ra.

However, for some reason the authors of the webpage sometimes decide to move the columns around.

In the end I want something like this, keeping in mind that values can be missing.

data = #read data in a clever way
names = data['star_name']
ras = data['ra']

This will prevent my program to malfunction when the columns are changed again in the future, if they keep the name correct.

Until now I have tried various ways using the csv module and resently the pandas module. Both without any luck.

EDIT (added two lines + the header of my datafile. Sorry, but it's extremely long.)

# name, mass, mass_error_min, mass_error_max, radius, radius_error_min, radius_error_max, orbital_period, orbital_period_err_min, orbital_period_err_max, semi_major_axis, semi_major_axis_error_min, semi_major_axis_error_max, eccentricity, eccentricity_error_min, eccentricity_error_max, angular_distance, inclination, inclination_error_min, inclination_error_max, tzero_tr, tzero_tr_error_min, tzero_tr_error_max, tzero_tr_sec, tzero_tr_sec_error_min, tzero_tr_sec_error_max, lambda_angle, lambda_angle_error_min, lambda_angle_error_max, impact_parameter, impact_parameter_error_min, impact_parameter_error_max, tzero_vr, tzero_vr_error_min, tzero_vr_error_max, K, K_error_min, K_error_max, temp_calculated, temp_measured, hot_point_lon, albedo, albedo_error_min, albedo_error_max, log_g, publication_status, discovered, updated, omega, omega_error_min, omega_error_max, tperi, tperi_error_min, tperi_error_max, detection_type, mass_detection_type, radius_detection_type, alternate_names, molecules, star_name, ra, dec, mag_v, mag_i, mag_j, mag_h, mag_k, star_distance, star_metallicity, star_mass, star_radius, star_sp_type, star_age, star_teff, star_detected_disc, star_magnetic_field
11 Com b,19.4,1.5,1.5,,,,326.03,0.32,0.32,1.29,0.05,0.05,0.231,0.005,0.005,0.011664,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,2008,2011-12-23,94.8,1.5,1.5,2452899.6,1.6,1.6,Radial Velocity,,,,,11 Com,185.1791667,17.7927778,4.74,,,,,110.6,-0.35,2.7,19.0,G8 III,,4742.0,,
11 UMi b,10.5,2.47,2.47,,,,516.22,3.25,3.25,1.54,0.07,0.07,0.08,0.03,0.03,0.012887,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,2009,2009-08-13,117.63,21.06,21.06,2452861.05,2.06,2.06,Radial Velocity,,,,,11 UMi,229.275,71.8238889,5.02,,,,,119.5,0.04,1.8,24.08,K4III,1.56,4340.0,,

An easy way to do this is using the pandas library like this.

import pandas as pd
fields = ['star_name', 'ra']

df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
# See the keys
print df.keys()
# See content in 'star_name'
print df.star_name

The problem here was the skipinitialspace which remove the spaces in the header. So ' star_name' becomes 'star_name'

According to the latest pandas documentation you can read a csv file selecting only the columns which you want to read.

import pandas as pd

df = pd.read_csv('some_data.csv', usecols = ['col1','col2'], low_memory = True)

Here we use usecols which reads only selected columns in a dataframe.

We are using low_memory so that we Internally process the file in chunks.

Above answers are in python2. So for python 3 users I am giving this answer. You can use the bellow code:

import pandas as pd
fields = ['star_name', 'ra']

df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
# See the keys
print(df.keys())
# See content in 'star_name'
print(df.star_name)

Got a solution to above problem in a different way where in although i would read entire csv file, but would tweek the display part to show only the content which is desired.

import pandas as pd

df = pd.read_csv('data.csv', skipinitialspace=True)
print df[['star_name', 'ra']]

This one could help in some of the scenario's in learning basics and filtering data on the basis of columns in dataframe.

Read specific columns with pandas or other python module

Related

Recent Posts