Data resiliency in Ceph or Gluster
Solution 1:
The purpose of RAID isn't to protect against data loss, whether due to hardware failure, accidental deletion, or anything else. It's to increase performance (RAID0, RAID0+1) and/or prevent downtime (RAID1, RAID5, RAID6). If you want to prevent data loss, then you need a backup solution: preferably one that's kept onsite and another that's offsite.
You requested that cloud and replication answers not be provided, but they are the only way to prevent data loss. RAID1 and RAID5 will protect against a single disk failure, and RAID6 will tolerate two failed disks, but none of them will protect against corrupted data, accidental deletion, or malicious activity. If this data is important, then you're going to need the very things you asked not be provided.
Solution 2:
The general gist of this question is "how do multi-node storage clusters relate to concepts such as RAID?"
The answer is that they are somewhat related. A RAID array is designed to replicate and/or distribute data across failure domains. In the case of RAID, those failure domains are individual disks. The loss of a disk in an array that targets redundancy does not mean the loss of data (durability) or of access to that data (availability).
Multi-node storage clusters can be thought of in a very similar way, with the option of treating entire nodes or groups of nodes as failure domains rather than just disks or groups of disks in a single node. Data can be distributed among nodes without any replication, or it can be replicated between two or potentially more nodes (or groups of nodes).
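To make the analogy concrete, here is a minimal, purely illustrative Python sketch of placing replicas across failure domains, whether those domains are disks inside one host (the RAID1 case) or whole nodes (the clustered case). The round-robin placement is an assumption made for brevity; real systems use far more sophisticated algorithms such as Ceph's CRUSH or Gluster's distributed hashing.

```python
import itertools

def place_replicas(object_id: str, failure_domains: list[str], copies: int) -> list[str]:
    """Pick `copies` distinct failure domains for one object.

    A RAID1 mirror is the special case where the failure domains are two
    disks in one host; a replicated cluster pool uses whole nodes instead.
    """
    if copies > len(failure_domains):
        raise ValueError("cannot place more copies than there are failure domains")
    start = hash(object_id) % len(failure_domains)
    ring = itertools.cycle(failure_domains)
    # skip ahead to a starting domain, then take `copies` consecutive (distinct) domains
    return list(itertools.islice(ring, start, start + copies))

# RAID-like: failure domains are disks inside a single host
print(place_replicas("obj-42", ["disk0", "disk1"], copies=2))

# Cluster-like: failure domains are whole nodes
print(place_replicas("obj-42", ["node-a", "node-b", "node-c"], copies=3))
```

Losing any one failure domain still leaves at least one surviving copy in either case; the cluster version simply moves the boundary of "what can fail" from a disk up to a whole machine.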
As a subject, storage clustering is MUCH more complicated than concepts like RAID, and the blurb I wrote above is close to the end of their similarities. They are not mutually exclusive technologies and may be mixed: one can choose to use RAID within storage cluster nodes, or even establish RAID arrays out of many clustered storage targets. Again, it's complicated; so complicated, in fact, that it's very easy to build horrible clusters that cause more problems than they solve.
I would recommend that anyone understand a given storage clustering technology very well before attempting to use it in any serious capacity. Thankfully, Ceph, Gluster, DRBD, and related technologies are all open source and available to study just as openly.
Solution 3:
Some RAID configurations prevent data loss due to hardware issues: one drive may fail while another still holds a copy of the data. Other RAID configurations instead increase performance.
Ceph replicates data at the object level (the RADOS layer), storing multiple copies on separate drives located on different hosts (most commonly three copies are used). Alternatively, data is split into erasure-coded chunks, which maps to RAID's parity scheme in your mental model.
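To show why erasure coding resembles RAID parity, here is a toy single-parity scheme in Python. It is only an analogy: Ceph's erasure-coded pools use configurable k+m codes rather than this simple XOR, but the idea of rebuilding a lost chunk from the survivors is the same.

```python
def encode(chunks: list[bytes]) -> bytes:
    """XOR equally sized data chunks into one parity chunk, the same idea
    RAID5 applies across disks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving_chunks: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data chunk from survivors plus parity."""
    return encode(surviving_chunks + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]   # k = 3 data chunks
parity = encode(data)                # m = 1 parity chunk

lost = data[1]                                  # a drive holding this chunk dies
rebuilt = recover([data[0], data[2]], parity)   # reconstruct from the rest
assert rebuilt == lost
```

In a cluster, each of those chunks would live on a different host, so the "failed drive" becomes a failed host while the recovery math stays the same.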
This is data resiliency, and it is measured in how many hosts or drives a cluster can lose while still guaranteeing that no data is lost. In replica-3 storage pools, you can lose two drives simultaneously and lose no data. If there is enough time between the two drive failures in my example, the cluster will self-heal: it re-copies the data affected by the first failure and returns to full replica-3 redundancy.
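Here is a minimal sketch of that self-heal step. It is not real Ceph recovery code (which works in terms of placement groups and CRUSH); it only shows the principle of re-replicating objects that have dropped below their target copy count.

```python
# cluster state: drive -> set of object ids stored on it
drives = {
    "hostA/sda": {"obj1", "obj2"},
    "hostB/sda": {"obj1", "obj2"},
    "hostC/sda": {"obj1", "obj2"},
    "hostD/sda": set(),
}
TARGET_COPIES = 3

def fail_drive(name: str) -> None:
    del drives[name]

def self_heal() -> None:
    """Copy any under-replicated object onto healthy drives until it is
    back at TARGET_COPIES."""
    for obj in {o for objs in drives.values() for o in objs}:
        holders = [d for d, objs in drives.items() if obj in objs]
        spares = [d for d in drives if obj not in drives[d]]
        for spare in spares[: TARGET_COPIES - len(holders)]:
            drives[spare].add(obj)

fail_drive("hostB/sda")   # first failure: every object drops to 2 copies
self_heal()               # cluster restores itself to 3 copies
assert all(sum(o in objs for objs in drives.values()) == 3
           for o in ("obj1", "obj2"))
```

After the heal completes, the pool can again tolerate two simultaneous drive failures, which is why giving the cluster time between failures matters.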
Let's look at your scenario of three hosts with one hard disk each. In that configuration, a Ceph replica-3 pool could lose two hosts without losing data, and the cluster would keep working. After the first failure, the cluster would continue operating and warn the administrator that resiliency has decreased from two tolerable failures to one. After a second failure, with only a single copy of the data remaining, the cluster would continue serving data but switch to read-only mode and force the admin to address the loss of resiliency. EC resiliency depends on the coding scheme chosen, but in your example one would simply not use an erasure-coded pool with just three hosts.
Generally, software-defined storage like Ceph makes sense only at a certain data scale. Traditionally, I have recommended half a petabyte, or 10 hosts with 12 or 24 drives each, as a sensible threshold. Recent improvements in self-management and automation make 5 hosts a reasonable minimum.
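As a back-of-the-envelope feel for that scale, the snippet below works out raw versus usable capacity for the traditional threshold. The 4 TB drive size is an assumption picked only for illustration; replica-3 usable capacity is raw divided by three, and an erasure-coded k+m pool yields raw * k / (k + m).

```python
hosts, drives_per_host, drive_tb = 10, 12, 4   # 4 TB drives assumed for illustration
raw_tb = hosts * drives_per_host * drive_tb    # 480 TB raw, roughly half a petabyte

usable_replica3 = raw_tb / 3                   # three full copies -> ~160 TB usable
usable_ec_4_2 = raw_tb * 4 / (4 + 2)           # EC 4+2 -> ~320 TB usable

print(f"raw: {raw_tb} TB, replica-3: {usable_replica3:.0f} TB, EC 4+2: {usable_ec_4_2:.0f} TB")
```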
Neither Ceph nor RAID replication is a solution for backup, which is a data recovery scenario, not a data resiliency one. But while Ceph's object-based replication scales almost indefinitely, RAID's drive-based replication cannot scale very far.