Hadoop: ...be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation

This error is caused by the block replication system of HDFS since it could not manage to make any copies of a specific block within the focused file. Common reasons of that:

Only a NameNode instance is running and it's not in safe-mode
There is no DataNode instances up and running, or some are dead. (Check the servers)
Namenode and Datanode instances are both running, but they cannot communicate with each other, which means There is connectivity issue between DataNode and NameNode instances.
Running DataNode instances are not able to talk to the server because of some networking of hadoop-based issues (check logs that include datanode info)
There is no hard disk space specified in configured data directories for DataNode instances or DataNode instances have run out of space. (check dfs.data.dir // delete old files if any)
Specified reserved spaces for DataNode instances in dfs.datanode.du.reserved is more than the free space which makes DataNode instances to understand there is no enough free space.
There is no enough threads for DataNode instances (check datanode logs and dfs.datanode.handler.count value)
Make sure dfs.data.transfer.protection is not equal to “authentication” and dfs.encrypt.data.transfer is equal to true.

Also please:

Verify the status of NameNode and DataNode services and check the related logs
Verify if core-site.xml has correct fs.defaultFS value and hdfs-site.xml has a valid value.
Verify hdfs-site.xml has dfs.namenode.http-address.. for all NameNode instances specified in case of PHD HA configuration.
Verify if the permissions on the directories are correct

Ref: https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo

Ref: https://support.pivotal.io/hc/en-us/articles/201846688-HDFS-reports-Configured-Capacity-0-0-B-for-datanode

Also, please check: Writing to HDFS from Java, getting "could only be replicated to 0 nodes instead of minReplication"

Another reason could be that your Datanode machine hasn't exposed the port(50010 by default). In my case, I was trying to write a file from Machine1 to HDFS running on a Docker container C1 which was hosted on Machine2. For the host machine to forward the requests to the services running on the container, the port forwarding should be taken care of. I could resolve the issue after forwarding the port 50010 from host machine to guest machine.

Check if the jps command on the computers which run the datanodes show that the datanodes are running. If they are running, then it means that they could not connect with the namenode and hence the namenode thinks there are no datanodes in the hadoop system.

In such a case, after running start-dfs.sh, run netstat -ntlp in the master node. 9000 is the port number most tutorials tells you to specify in core-site.xml. So if you see a line like this in the output of netstat

tcp        0      0 120.0.1.1:9000        0.0.0.0:*               LISTEN       4209/java

then you have a problem with the host alias. I had the same problem, so I'll state how it was resolved.

This is the contents of my core-site.xml

<configuration>
   <property>
       <name>fs.default.name</name>
       <value>hdfs://vm-sm:9000</value>
   </property>
</configuration>

So the vm-sm alias in the master computer maps to the 127.0.1.1. This is because of the setup of my /etc/hosts file.

127.0.0.1       localhost
127.0.1.1       vm-sm
192.168.1.1     vm-sm
192.168.1.2     vm-sw1
192.168.1.3     vm-sw2

Looks like the core-site.xml of the master system seemed to have mapped on the the 120.0.1.1:9000 while that of the worker nodes are trying to connect through 192.168.1.1:9000.

So I had to change the alias of the master node for the hadoop system (just removed the hyphen) in the /etc/hosts file

127.0.0.1       localhost
127.0.1.1       vm-sm
192.168.1.1     vmsm
192.168.1.2     vm-sw1
192.168.1.3     vm-sw2

and reflected the change in the core-site.xml, mapred-site.xml, and slave files (wherever the old alias of the master occurred).

After deleting the old hdfs files from the hadoop location as well as the tmp folder and restarting all nodes, the issue was solved.

Now, netstat -ntlp after starting DFS returns

tcp        0      0 192.168.1.1:9000        0.0.0.0:*               LISTEN ...
...

I had the same error, re-starting hdfs services solved this issue. ie re-started NameNode and DataNode services.

Hadoop: ...be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation

Related

Recent Posts