If $P \ne NP$, is every language not contained in $NP$ $NP$-hard?
So it looks like the answer to this question is as follows: If $P \ne NP$, then there exists a language not contained in NP that is not NP-hard. This follows from Mahaney's theorem, which says that $P = NP$ iff there is some sparse language L such that SAT is polynomial-time reducible to L. In particular, this says that if $P \ne NP$, then SAT is not polynomial-time reducible to any sparse language. So consider the unary halting language $UNARYHALT$ consisting of unary encodings of TM/string pairs where the given TM halts on the particular input. This language is sparse, since for any length there is either zero or one strings in the language with that length. Moreover, this language is undecidable by a reduction from the halting problem, so it cannot be in NP. Therefore, if $P \ne NP$, by Mahaney's theorem this language is not NP-hard, because there is no polynomial-time reduction from SAT to it.
It seems that the probabilistic method works.
Consider $\{0,1\} \times \{0,1\} \times \dots$ as a probability space, corresponding to throwing a coin countably many times (product measure).
Any event $a \in \{0,1\} \times \{0,1\} \times \dots$ encodes a subset $L \subseteq \{0,1\}^{\ast}$, the consecutive bits decide if $\epsilon \in L, 0 \in L, 1 \in L, 00 \in L, \dots$. We will now select random $L$, equivalently random $a$, and check its properties.
With probability 1, $L \notin \mathsf{NP}$ (since $\mathsf{NP}$ is countable), even more: with probability 1, $L$ is undecidable.
Now, suppose you have a reduction $f \colon A^{\ast} \to \{0,1\}^{\ast}$ that attempts to reduce SAT to $L$. Since $\mathsf{P} \neq \mathsf{NP}$ by assumption, the image of $f$ must be infinite; otherwise you could convert the reduction to a decision procedure for $SAT$. However, for any $x$, it must hold $x \in SAT \iff f(x)\in L$, and that happens with probability 1/2. Since there are infinitely many values for $f(x)$, the probability that $f$ is a valid reduction from SAT to $L$ is 0.
Since there are countably many reductions, and countable intersection of sets of measure 1 has measure 1, the overall probability of $L$ satisfying all conditions is 1.
So a "generic" random language is neither decidable nor $\mathsf{NP}$-hard, unless $\mathsf{P} = \mathsf{NP}$.
It is possible to convert this proof to diagonalization: on $2i$-th stage, you diagonalize against $i$-th $\mathsf{NP}$ problem; on $2i+1$-th stage, you diagonalize against $i$-th polynomial time reduction with infinite image. This gives a constructive example; with more careful bookkeeping you can get a decidable example.
The answer to this question depends on the complexity assumptions.
If $\mathrm P = \mathrm {NP}$, then every nontrivial language $L$ is $\mathrm{NP}$-hard. [A language is said to be nontrivial if it contains at least one yes instance and at least one no instance.] To reduce a given $\mathrm {NP}$ problem $A$ to $L$, we simply ignore $L$ and use the polynomial-time algorithm for $A$ -- this is guaranteed to exist under our assumption.
Assuming $\mathrm{NP} \neq \text{co-}\mathrm{NP}$, no problem in $\text{co-}\mathrm{NP}$ would be $\mathrm{NP}$-hard.
Take any undecidable language L and let L' be the language of pairs (n,x) with x in L and n=2^|x| in unary. Then L' is in P/poly but not in NP. If L' is NP-hard then NP$\subset$P/poly and the polynomial hierarchy collapses by the Karp-Lipton theorem.