Where in the Linux file system can I see the files of Hadoop HDFS?

Solution 1:

You can use the hdfs fsck utility to find the name of the block, and then locate it manually in the local filesystem:

$ echo "Hello world" >> test.txt
$ hdfs dfs -put test.txt /tmp/
$ hdfs fsck /tmp/test.txt -files -blocks
/tmp/test.txt 12 bytes, 1 block(s):  OK
    0. BP-1186293916-10.25.5.169-1427746975858:blk_1075191146_1451047 len=12 repl=1

Note the blk_.... string. Use that to locate the file:

$ find /hadoop/hdfs/data/current/BP-1186293916-10.25.5.169-1427746975858/current/finalized -name 'blk_1075191146*'
/hadoop/hdfs/data/current/BP-1186293916-10.25.5.169-1427746975858/current/finalized/subdir22/subdir29/blk_1075191146_1451047.meta
/hadoop/hdfs/data/current/BP-1186293916-10.25.5.169-1427746975858/current/finalized/subdir22/subdir29/blk_1075191146

$ cat /hadoop/hdfs/data/current/BP-1186293916-10.25.5.169-1427746975858/current/finalized/subdir22/subdir29/blk_1075191146
Hello world

You can see a full example with an explanation here.
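
If you want to script the lookup, here is a minimal sketch (not part of the original example); it assumes a single-block file and that dfs.data.dir points at /hadoop/hdfs/data as above, so adjust both to your setup:

# Extract the first block id (e.g. blk_1075191146) from the fsck report,
# then find and print the matching block file on the local disk.
HDFS_FILE=/tmp/test.txt
DATA_DIR=/hadoop/hdfs/data   # assumed value of dfs.data.dir
BLOCK=$(hdfs fsck "$HDFS_FILE" -files -blocks | grep -o 'blk_[0-9]*' | head -n 1)
find "$DATA_DIR" -name "$BLOCK" -exec cat {} \;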

Solution 2:

You cannot browse HDFS directly from the terminal using cat or similar commands. HDFS is a logical file system and does not map directly to the Unix file system. You need an HDFS client, and your Hadoop cluster must be running. When you browse HDFS, the directory structure comes from the namenode and the actual data from the datanodes.
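
With the client installed and the cluster up, browsing goes through the HDFS shell rather than the local one; for example (the paths are just placeholders):

$ hdfs dfs -ls /tmp
$ hdfs dfs -cat /tmp/test.txt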

Although you cannot browse it that way, the data is there, stored by the datanode daemon. Its path is specified by the dfs.data.dir property in hdfs-site.xml.

The directory structure is stored by the namenode daemon, and its path is specified by the dfs.name.dir property in hdfs-site.xml.
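
A quick way to see what those directories are set to on your cluster is to query the configuration (a sketch, assuming the hdfs client is on your PATH; on recent Hadoop versions the properties are named dfs.datanode.data.dir and dfs.namenode.name.dir):

$ hdfs getconf -confKey dfs.datanode.data.dir
$ hdfs getconf -confKey dfs.namenode.name.dir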

Solution 3:

Hadoop stores its data locally in the form of blocks on each datanode, and that location is configurable in the hdfs-site.xml file under the dfs.data.dir property.

In most cases it is:

$HADOOP_HOME/data/dfs/data/hadoop-${user.name}/current
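
That default varies between distributions, so treat it as a starting point. A hypothetical check on a datanode (substitute your own dfs.data.dir value) would be:

# hadoop-$(whoami) stands in for the hadoop-${user.name} part of the path
$ ls $HADOOP_HOME/data/dfs/data/hadoop-$(whoami)/current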

Solution 4:

In fact you can cat the contents of your file using:

hdfs dfs -cat /user/test/somefile.txt

In Hadoop, the namenode holds all the information about files: the filename, metadata, directory structure, permissions, the blocks which form the file, and the block locations. In case of a namenode failure you will lose your files, because you no longer know which blocks form which file, even though all the content is still on the datanodes.

Since files are stored as blocks in Hadoop, if you know the block ID and the datanodes holding a file, you can see its content. Here we are assuming the files are text files.
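
For example, fsck can also report which datanodes hold each block, so you know which machine's local disk to look on (the file name below is illustrative):

$ hdfs fsck /tmp/test.txt -files -blocks -locations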

Finally, HDFS supports mapping the HDFS namespace to a local NFS share via its NFS gateway. This way you can access HDFS without using any HDFS-specific commands.
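
As a rough sketch of that last option (the host name and mount point are placeholders, and the NFS gateway service must already be running), a client can mount the HDFS namespace and browse it with ordinary Unix tools:

$ sudo mkdir -p /mnt/hdfs
$ sudo mount -t nfs -o vers=3,proto=tcp,nolock nfs-gateway-host:/ /mnt/hdfs
$ ls /mnt/hdfs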