Ruby on Rails - Import Data from a CSV file
I would like to import data from a CSV file into an existing database table. I do not want to save the CSV file, just take the data from it and put it into the existing table. I am using Ruby 1.9.2 and Rails 3.
This is my table:
create_table "mouldings", :force => true do |t|
t.string "suppliers_code"
t.datetime "created_at"
t.datetime "updated_at"
t.string "name"
t.integer "supplier_id"
t.decimal "length", :precision => 3, :scale => 2
t.decimal "cost", :precision => 4, :scale => 2
t.integer "width"
t.integer "depth"
end
Can you give me some code to show me the best way to do this, thanks.
Solution 1:
require 'csv'
csv_text = File.read('...')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
Moulding.create!(row.to_hash)
end
Solution 2:
Simpler version of yfeldblum's answer, that is simpler and works well also with large files:
require 'csv'
CSV.foreach(filename, headers: true) do |row|
Moulding.create!(row.to_hash)
end
No need for with_indifferent_access
or symbolize_keys
, and no need to read in the file to a string first.
It doesnt't keep the whole file in memory at once, but reads in line by line and creates a Moulding per line.
Solution 3:
The smarter_csv
gem was specifically created for this use-case: to read data from CSV file and quickly create database entries.
require 'smarter_csv'
options = {}
SmarterCSV.process('input_file.csv', options) do |chunk|
chunk.each do |data_hash|
Moulding.create!( data_hash )
end
end
You can use the option chunk_size
to read N csv-rows at a time, and then use Resque in the inner loop to generate jobs which will create the new records, rather than creating them right away - this way you can spread the load of generating entries to multiple workers.
See also: https://github.com/tilo/smarter_csv
Solution 4:
You might try Upsert
:
require 'upsert' # add this to your Gemfile
require 'csv'
u = Upsert.new Moulding.connection, Moulding.table_name
CSV.foreach(file, headers: true) do |row|
selector = { name: row['name'] } # this treats "name" as the primary key and prevents the creation of duplicates by name
setter = row.to_hash
u.row selector, setter
end
If this is what you want, you might also consider getting rid of the auto-increment primary key from the table and setting the primary key to name
. Alternatively, if there is some combination of attributes that form a primary key, use that as the selector. No index is necessary, it will just make it faster.