Prometheus Alert Rule for Absent Discovered Target
Solution 1:
Or do I need to hard-code rules for each service/node/pod explicitly, even though it was auto-discovered?
Yes, you need a rule for every individual thing you want to alert on being missing, because Prometheus has no other source for their labels: once service discovery stops returning a target, Prometheus no longer knows it exists.
The usual alert is absent(up{job="kubernetes-pods"}), which fires only when no target of that job is being discovered at all.
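To make the point concrete, here is a toy Python model of how absent() treats a selector. It assumes the behaviour described in the PromQL documentation: absent() returns nothing if any series match, and otherwise a single sample with value 1 whose labels are taken only from the selector's equality matchers. The helpers `labels`, `select` and `absent` below are hypothetical illustrations, not a Prometheus client API, and the pod names are made up.

```python
from typing import Dict, FrozenSet, Tuple

# A label set (metric name omitted) as a hashable key; a "vector" maps label sets to values.
Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]


def labels(**kv: str) -> Labels:
    """Build a hashable label set, e.g. labels(job="kubernetes-pods", pod="web-1")."""
    return frozenset(kv.items())


def select(up_series: Vector, **matchers: str) -> Vector:
    """Toy equality-only selector over the `up` series, like up{job="...", pod="..."}."""
    return {
        ls: v
        for ls, v in up_series.items()
        if all((name, want) in ls for name, want in matchers.items())
    }


def absent(up_series: Vector, **matchers: str) -> Vector:
    """Toy model of absent(up{...}): empty if the selector matches any series,
    otherwise one element with value 1 whose labels come only from the
    selector's equality matchers, because there is nowhere else to get them."""
    if select(up_series, **matchers):
        return {}
    return {labels(**matchers): 1.0}


if __name__ == "__main__":
    # Two discovered pods; up=0 (a failed scrape) still counts as present.
    up = {
        labels(job="kubernetes-pods", pod="web-1"): 1.0,
        labels(job="kubernetes-pods", pod="web-2"): 0.0,
    }
    print(absent(up, job="kubernetes-pods"))  # {}

    # web-2 drops out of service discovery, so its series simply vanishes.
    del up[labels(job="kubernetes-pods", pod="web-2")]
    print(absent(up, job="kubernetes-pods"))  # {} (web-1 still matches)

    # Only a selector that names web-2 explicitly notices it is gone.
    print(absent(up, job="kubernetes-pods", pod="web-2"))
```

The last call is the crux of this answer: once a pod drops out of service discovery, only a selector that names that pod explicitly can produce an alert for it, and the job-wide rule stays silent as long as any other pod remains.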
Solution 2:
We've been solving something similar. Our setup: when some service starts somewhere, some metrics appear with a non-zero value. Then, if any of those metrics go missing, we want an alert.
In our case, the expression that achieves this is:
count (our_metric offset 1h > 0) by (some_name) unless count(our_metric) by (some_name)
This returns a vector containing the some_name groups that had at least one our_metric series above zero an hour ago but have no our_metric series at all now. The value of each element is the count(...) from the left-hand side, which can itself be useful.
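You can use any instant-vector expressions on the left- and right-hand sides. Read more about the unless operator in the Prometheus documentation on logical/set binary operators.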
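The expression can be traced with a toy Python model of the PromQL operations it uses. This is a sketch under the documented semantics, not how Prometheus implements them: count(...) by (some_name) keeps one element per some_name value, a comparison like > 0 without the bool modifier drops samples rather than returning 0/1, and unless keeps left-hand elements that have no right-hand element with exactly the same label set. The offset 1h is modelled simply as a separate snapshot taken an hour earlier; the helper names (`greater_than`, `count_by`, `unless`, `went_missing`) and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Tuple

# A label set (metric name omitted) as a hashable key; a "vector" maps label sets to values.
Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]


def labels(**kv: str) -> Labels:
    """Build a hashable label set, e.g. labels(some_name="billing")."""
    return frozenset(kv.items())


def greater_than(vec: Vector, threshold: float) -> Vector:
    """Toy `vec > threshold` without `bool`: non-matching samples are dropped."""
    return {ls: v for ls, v in vec.items() if v > threshold}


def count_by(vec: Vector, label: str) -> Vector:
    """Toy `count(vec) by (label)`; assumes every series carries `label`."""
    counts = Counter(dict(ls)[label] for ls in vec)
    return {labels(**{label: value}): float(n) for value, n in counts.items()}


def unless(lhs: Vector, rhs: Vector) -> Vector:
    """Toy `lhs unless rhs`: keep LHS elements that have no RHS element
    with exactly the same label set (default matching, no on()/ignoring())."""
    return {ls: v for ls, v in lhs.items() if ls not in rhs}


def went_missing(hour_ago: Vector, now: Vector, label: str) -> Vector:
    """count(our_metric offset 1h > 0) by (label) unless count(our_metric) by (label),
    with `offset 1h` modelled as a separate snapshot taken an hour earlier."""
    return unless(count_by(greater_than(hour_ago, 0), label), count_by(now, label))


if __name__ == "__main__":
    hour_ago = {
        labels(some_name="billing", instance="a"): 3.0,
        labels(some_name="billing", instance="b"): 1.0,
        labels(some_name="search", instance="a"): 5.0,
        labels(some_name="idle", instance="a"): 0.0,  # dropped by > 0
    }
    now = {labels(some_name="search", instance="a"): 7.0}
    print(went_missing(hour_ago, now, "some_name"))
    # {frozenset({('some_name', 'billing')}): 2.0}
```

Because both sides are aggregated by the same label, their label sets line up exactly, so plain unless works without an on(...) clause. Note also that the right-hand side counts our_metric regardless of its value, so a group that is still reporting zeros is not treated as missing.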