Hadoop/Hive : Loading data from .csv on a local machine

As this is coming from a newbie...

I had Hadoop and Hive set up for me, so I can run Hive queries on my computer accessing data on AWS cluster. Can I run Hive queries with .csv data stored on my computer, like I did with MS SQL Server?

How do I load .csv data into Hive then? What does it have to do with Hadoop and which mode I should run that one?

What settings I should care about so that if I did something wrong I can always go back and run queries on Amazon without compromising what was set up for me earlier?


Let me work you through the following simple steps:

Steps:

First, create a table on hive using the field names in your csv file. Lets say for example, your csv file contains three fields (id, name, salary) and you want to create a table in hive called "staff". Use the below code to create the table in hive.

hive> CREATE TABLE Staff (id int, name string, salary double) row format delimited fields terminated by ',';

Second, now that your table is created in hive, let us load the data in your csv file to the "staff" table on hive.

hive>  LOAD DATA LOCAL INPATH '/home/yourcsvfile.csv' OVERWRITE INTO TABLE Staff;

Lastly, display the contents of your "Staff" table on hive to check if the data were successfully loaded

hive> SELECT * FROM Staff;

Thanks.


if you have a hive setup you can put the local dataset directly using Hive load command in hdfs/s3.

You will need to use "Local" keyword when writing your load command.

Syntax for hiveload command

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

Refer below link for more detailed information. https://cwiki.apache.org/confluence/display/Hive/LanguageManual%20DML#LanguageManualDML-Loadingfilesintotables