“Most intuitive” average of $P$ for all $x\in A \cap [a,b]$, where $A\subseteq\mathbb{R}$?
There is no unique intuitive average of $P$ over an arbitrary countable subset $A$. This is for two reasons. First, viewing $A$ as just an abstract set of points to be averaged over is problematic. Any way to define an average over $A$ would have to use a specific ordering of the elements of $A$, and thus no unique or intuitive average could be achieved. You thus want to use the fact that $A$ is a subset of $\mathbb{R}$ (after all, why did you decide you wanted $A$ to be a subset of $\mathbb{R}$ and not just an abstract infinite set!). So we want to use some of the structure of $\mathbb{R}$; note that the Lebesgue average does this -- it uses intervals as the building blocks for the Lebesgue measure. Second, and related to the first, is that $P$ is an arbitrary function. If $P$ is an arbitrary function, then it doesn't care about the structure of $\mathbb{R}$; $A$ very well might then be an abstract infinite set on which $P$ is defined, and then there is no way to use that $A$ is a subset of the structured set $\mathbb{R}$. We therefore want to put some restrictions on $P$. One natural one is that $P$ is continuous.
With this context and these assumptions in mind, I'll try to provide a satisfactory answer to your (refined) question. We can assume $A \subseteq [a,b]$, since that's where everything is happening. For ease, I'll have $[a,b] = [0,1]$. Based on the above discussion, I'll just have $P$ be defined on all of $[0,1]$ and continuous. Let $E_1 = [0,1], E_2 = [0,\frac{1}{2}], E_3 = [\frac{1}{2},1], E_4 = [0,\frac{1}{4}], E_5 = [\frac{1}{4},\frac{1}{2}], E_6 = [\frac{1}{2},\frac{3}{4}]$, $E_7 = [\frac{3}{4},1], E_8 = [0,\frac{1}{8}]$, etc.
If $A$ is finite, it's obvious how to define the average of $P$ (just take $\frac{1}{|A|}\sum_{x \in A} P(x)$). So, assume $A$ is infinite. Consider the sets $A\cap E_1, A\cap E_2, \dots$. Let $x_1$ be a point in the first nonempty one of these sets, let $x_2$ be a point in the second nonempty one, and so on. Look at the measures $\delta_{x_1}, \frac{\delta_{x_1}+\delta_{x_2}}{2}, \dots, \frac{\delta_{x_1}+\dots+\delta_{x_N}}{N},\dots$. Since $[0,1]$ is a compact metric space, there is some probability measure $\mu$ on $[0,1]$ that is a weak* limit of some subsequence of these measures, i.e. there is some $(N_k)_k$ with $\frac{1}{N_k}\sum_{j=1}^{N_k} f(x_j) \to \int_0^1 f d\mu$ for each $f \in C([0,1])$.
We then define the average of $P$ over $A$ to be $\int_0^1 Pd\mu$.
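Here is a small numerical sketch of this construction (my own illustration, not part of the answer itself; choosing the rational midpoint of each $E_k$ as $x_k$ is an assumption, legitimate since $A=\mathbb{Q}$ meets every interval). For $P(x)=x^2$, the empirical averages, read along the subsequence of completed dyadic levels, approach $\int_0^1 x^2\,dx = \frac{1}{3}$:

```python
# Enumerate the dyadic intervals E_k level by level and take x_k to be the
# rational midpoint of E_k.  Within a level the running average oscillates,
# which is exactly why the definition passes to a subsequence (N_k)_k; here
# we read it off at the completed levels, N_k = 2^(k+1) - 1.

def P(x):
    return x * x              # the test function P(x) = x^2

total = P(0.5)                # x_1 = midpoint of E_1 = [0, 1]
count = 1
for level in range(1, 16):
    width = 2.0 ** (-level)
    for m in range(2 ** level):      # E = [m * width, (m + 1) * width]
        x = (m + 0.5) * width        # rational midpoint, an element of Q
        total += P(x)
        count += 1
    print(level, count, total / count)   # tends to 1/3 = 0.3333...
```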
Benefits of this definition: (1) When $A = \mathbb{Q}$, the measure $\mu$ is the Lebesgue measure; in fact, $\mu$ is the Lebesgue measure whenever $A$ is dense in $[0,1]$. (2) It is localized to the right places (e.g. $A \subseteq [0,\frac{1}{2}]$ implies $\mu$ lives on $[0,\frac{1}{2}]$). (3) It is intuitively an average: we are sampling "randomly" from the interval $[0,1]$ and taking a limit of the empirical averages of the samples.
Cons of this definition: (1) It is not unique (for two reasons: (a) the choice of the $x_i$'s is not unique; (b) there might be multiple weak* limits). However, I don't think this can be avoided -- I don't think one can get a unique, intuitive average over an arbitrary countably infinite set.
My answer to your question from 2 years ago (!) might be useful (good you're still studying this stuff).
I'll end with an interesting example. Consider $A = \{1,\frac{1}{2},\frac{1}{3},\frac{1}{4},\frac{1}{5},\dots\}$. We can have, for example, $x_1 = \frac{1}{3},x_2 = 1, x_3 = \frac{1}{5},x_4 = \frac{1}{2},\dots$. Note, for fun, that each element of $A$ will eventually be included in $(x_n)_n$ (to see this, note that any number $\frac{1}{l}$ is the only element of $A$ in an interval of the form $[\frac{m}{2^k},\frac{m+1}{2^k}]$ if $k$ is large enough). In any event, the measure $\mu$ obtained will just be $\delta_0$, the delta mass at $0$, so that $\int Pd\mu = P(0)$ is just the value of $P$ at $0$. This is intuitive, since the elements of $A$ are nearly all near $0$.
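Here is a quick numerical check of this example (again my own sketch; the specific rule for picking an element of $A$ in each dyadic interval is an assumption). The empirical averages of $P(x)=x^2$ sink toward $P(0)=0$, reflecting $\mu = \delta_0$:

```python
import math

def pick(lo, hi):
    """Return some element 1/l of A = {1, 1/2, 1/3, ...} lying in [lo, hi],
    or None if the interval contains no element of A."""
    if lo <= 0.0:
        return 1.0 / (math.floor(1.0 / hi) + 1)   # a tail point below hi
    l = math.ceil(1.0 / hi)                       # smallest l with 1/l <= hi
    return 1.0 / l if 1.0 / l >= lo else None

P = lambda x: x * x
total, count = 0.0, 0
for level in range(21):
    width = 2.0 ** (-level)
    for m in range(2 ** level):
        x = pick(m * width, (m + 1) * width)      # dyadic interval at this level
        if x is not None:                         # skip intervals missing A
            total += P(x)
            count += 1
    print(level, count, total / count)            # sinks toward P(0) = 0
```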
Added: Let $P: \mathbb{Q} \cap [0,1] \to \mathbb{R}$ be $P(x) = x^2$. Let $\tilde{P}: [0,1] \to \mathbb{R}$ be $\tilde{P}(x) = x^21_{\mathbb{Q}}(x)$, and let $T: [0,1] \to \mathbb{R}$ be $T(x) = x^2$. The Lebesgue integral of $\tilde{P}$ over $[0,1]$ is $0$, and the Lebesgue integral of $T$ over $[0,1]$ is $\frac{1}{3}$. It doesn't really make sense to say "the Lebesgue integral over $\mathbb{Q}$", but, for example, what one would mean when one says "the Lebesgue integral of $x^2$ over $\mathbb{Q}$ is $0$" is "the Lebesgue integral of $\tilde{P}$ over $[0,1]$ is $0$". The interval $[0,1]$ has the Lebesgue measure on it, so we can integrate functions over it. Since $\tilde{P}(x) = 0$ for almost every $x \in [0,1]$, it makes sense that the integral of $\tilde{P}$ is $0$. The Lebesgue integral is intuitive.
What you want, though, is to define a measure $\mu$ over $\mathbb{Q}$, so that "the average of $P$ over $\mathbb{Q}$" is simply $\int Pd\mu$. You want something different than the Lebesgue measure over $[0,1]$. (Note there is some confusing terminology here. The integral $\int Pd\mu$ is still called a "Lebesgue integral" even though the measure we are integrating over is not the Lebesgue measure). The way I defined that measure $\mu$ is described above. If $A = \mathbb{Q}$, or $\mathbb{Q}\cup \{\frac{\ln(m+\sqrt{3})}{100} : m \in \mathbb{N}\}$, or just any dense set, then what $\int Pd\mu$ turns out to be is $\int_{[0,1]} T dx$, the Lebesgue/Riemann integral of $T$ with respect to the Lebesgue measure, where $T$ is the continuous extension of $P$ to $[0,1]$. In particular, if $P = x^2$ and $A = \mathbb{Q}$, the "average of $P$ over $A$" given by my answer $\int Pd\mu$ is $\int_0^1 x^2dx = \frac{1}{3}$, the intuitive answer you sought.
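As a numerical sanity check of this claim (my own sketch, not part of the answer; I use the Farey fractions, one concrete way to exhaust $\mathbb{Q}\cap[0,1]$, which are classically equidistributed), the average of $P(x)=x^2$ over $\mathbb{Q}\cap[0,1]$ indeed tends to $\frac{1}{3}$:

```python
from math import gcd

def farey_average(P, order):
    """Average P over the Farey fractions m/k with denominator k <= order."""
    vals = [P(m / k)
            for k in range(1, order + 1)
            for m in range(k + 1) if gcd(m, k) == 1]
    return sum(vals) / len(vals)

for order in (10, 50, 250):
    print(order, farey_average(lambda x: x * x, order))   # -> 1/3
```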
The essence of this question seems rather to be how to intuitively integrate over a countably infinite set. So, before we get into integrating with respect to some probability functions, let's see if we can integrate the function $f(x)=x$. In its purest form, an integral is a bit like a fancy average, so let's think about what behavior we would want:
Let's say we have a multiset $A = \{C, 0,0,0,0 \dots \}$. Then, treating it like a pseudo-Cesàro sum with $a_0=C$, we get: $$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^n a_i = \lim_{n \to \infty}\frac{1}{n}\Big(C+ \sum_{i=1}^n 0\Big) = \lim_{n \to \infty}\frac{C}{n} = 0$$ This works out quite nicely. Generally, it should make sense that any finite number of points should not impact the average of $A$. This is a bit analogous to how, in Lebesgue integration, changing a function on a countable set of points does not affect the integral. Explicitly:
Property 1: for every finite set $S= \{C_1,C_2,C_3,\dots, C_k\}$, $average(A)=average(A \setminus S)$.

However, pseudo-Cesàro sums aren't perfect. Consider the multiset $A = \{0,1,0,1,0,1,0,1 \dots\}$ where $0$ and $1$ each occur infinitely many times. There are many ways to order the $a_i$ such that both $0$ and $1$ appear infinitely often but that still give different results:
$$ 0+1+0+1 \dots = \frac{1}{2} $$ $$ 0+0+1+0+0+1 \dots = \frac{1}{3} $$ $$ 0+0+0+1+0+0+0+1 \dots = \frac{1}{4} $$ $$ 0+0+0+1+1+0+0+0+1+1 \dots = \frac{2}{5} $$
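To make the rearrangement phenomenon concrete, here is a small sketch (my own illustration; modeling each ordering as a periodic pattern is an assumption):

```python
# Each rearrangement of the multiset {0, 1, 0, 1, ...} is realized as a
# periodic pattern; the pseudo-Cesaro average of the first n terms then
# converges to the fraction of 1's in one period.
def cesaro_average(pattern, n=10**6):
    return sum(pattern[i % len(pattern)] for i in range(n)) / n

print(cesaro_average([0, 1]))           # -> 0.5
print(cesaro_average([0, 0, 1]))        # -> 0.333...
print(cesaro_average([0, 0, 0, 1]))     # -> 0.25
print(cesaro_average([0, 0, 0, 1, 1]))  # -> 0.4
```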
In fact, you can end up with any number in $[0,1]$ as your average. This is tangentially similar to the Riemann series theorem, which says that you can rearrange the terms of certain kinds of sums to get different values. So, how do we handle this? Taking the "average" of the pseudo-Cesàro sums over all orderings doesn't really help; the whole reason we're in this mess is that averaging over infinite collections is hard. Despite this being an ad hoc proposal, let's assert:
Property 2: if you have a finite set $S = \{a,b,c,\dots, z \}$ and $A$ consists of countably infinitely many copies of each element of $S$, then $average(A) = average(S) = \frac{1}{|S|}\sum_{x \in S} x$.
I realize that with some clever model of infinite permutations one could possibly prove rigorously that the pseudo-Cesàro sum of a random permutation of $A$ equals this value, but that sounds rather messy, and I hope we can all intuitively accept Property 2.
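Here is a Monte Carlo sketch of that heuristic (my own; drawing each term uniformly at random from $S$ stands in for a "random" permutation of $A$): by the law of large numbers, the running pseudo-Cesàro average converges to $average(S)$ almost surely.

```python
import random

S = [1, 2, 7]                       # a finite set with average 10/3
n = 10**6
running = sum(random.choice(S) for _ in range(n)) / n
print(running, sum(S) / len(S))     # both ~ 3.333
```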
However, we are not done; in fact, here is where things get really iffy. Consider the sets $A_1 = \{0,.9,1.1,0,.99,1.01,0,.999,1.001,\dots\}$ and $A_2 = \{0,.9,0,.99,0,.999,\dots\}$. For these I suggest the analysis concept of "limit points": $x$ is a limit point of a set $A$ if $\forall \epsilon > 0, \exists y \in A \textrm{ s.t. } |x-y| < \epsilon$. We will say $x$ is a dense limit point of $A$ if, for every finite $S \subset A$, $x$ is a limit point of $A \setminus S$. Now, let $x$ be a dense limit point of $A$, and let $y_0, y_1, \dots$ be a sequence of points of $A$ with $|x-y_i| < \epsilon_i$, where $\epsilon_0 > \epsilon_1 > \epsilon_2 > \dots$ is a sequence approaching zero. If we took the pseudo-Cesàro sum with $a_i = y_i$, we would get that it approaches $x$, and it would stay that way even if we permuted the sequence; the same occurs if $a_i = x$ and we take the pseudo-Cesàro sum. So, going out on a limb, let's do the following:
Let $D(A) := \{x \mid \forall \text{ finite } S \subset A, \forall \epsilon > 0, \exists y \in A \setminus S \textrm{ s.t. } |x-y| < \epsilon \}$. If $A$ is a countably infinite set, let $average(A) = average(D(A))$. This satisfies Properties 1 and 2. Now, if $D(A)$ is a finite set, we can rely on the elementary average: sum all elements and divide by $|D(A)|$. If $D(A)$ has non-zero measure, then we can simply get the average by integrating over $D(A)$ and dividing by its measure. Finally, we have the case where $D(A)$ is itself countably infinite, where we recursively say $average(D(A)) = average(D(D(A)))$. I don't believe the chain $A, D(A), D(D(A)), \dots$ can stay countably infinite forever, but I lack a proof of this. Nevertheless, let's just classify any $A$ with that property as pathological, and out of our ability to average.
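Here is a crude finite-sample sketch of $D(A)$ (entirely my own construction, with arbitrary thresholds `eps`, `min_neighbors`, and `gap`): truncate $A$, keep the points that have many distinct neighbors within $\epsilon$, and merge the survivors into clusters. For $A = \{\frac{1}{n}\} \cup \{1-\frac{1}{n}\}$ this recovers $D(A) \approx \{0,1\}$, whose average is $\frac{1}{2}$:

```python
import bisect

def approx_D(points, eps=1e-4, min_neighbors=30, gap=1e-2):
    """Finite-sample stand-in for D(A): a point survives if at least
    min_neighbors other elements lie within eps of it; survivors closer
    than gap are merged into one representative per cluster."""
    pts = sorted(set(points))
    crowded = []
    for x in pts:
        lo = bisect.bisect_left(pts, x - eps)
        hi = bisect.bisect_right(pts, x + eps)
        if hi - lo - 1 >= min_neighbors:       # "- 1" excludes x itself
            crowded.append(x)
    reps = []
    for x in crowded:                          # merge nearby survivors
        if not reps or x - reps[-1] > gap:
            reps.append(x)
    return reps

N = 5000
A_trunc = [1 / n for n in range(1, N)] + [1 - 1 / n for n in range(1, N)]
D = approx_D(A_trunc)
print(D, sum(D) / len(D))   # clusters near 0 and 1; average ~ 1/2
```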
With these rules in place, we can average any non-pathological countably infinite set $A$ bounded by some interval $[a,b]$. By this I mostly mean that we will never pass from a countably infinite set $D_n(A)$ to an empty set $D_{n+1}(A)$, which is what happens for the unbounded set $A = \mathbb{Z}$. As a rough proof, chop $[a,b]$ in half: of the two shorter intervals, one still contains countably infinitely many points. We can repeatedly halve, and at every stage some interval still has infinitely many points; as these intervals shrink, they close in on at least one dense limit point (this is essentially the Bolzano-Weierstrass argument). Since $A$ is not pathological, eventually $D_n(A)$ will not be countably infinite, and it won't be empty either, so it must be finite or have non-zero measure, allowing us to average it.
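The halving argument can also be sketched numerically (my own illustration, run on a finite truncation of $A = \{\frac{1}{n}\}$, so it only resolves the limit point up to the truncation):

```python
def find_limit_point(pts, a=0.0, b=1.0, iters=30):
    """Bisect [a, b], always keeping the half holding more sample points
    (a finite-sample proxy for "infinitely many"); the nested intervals
    shrink onto a limit point of the underlying set."""
    for _ in range(iters):
        mid = (a + b) / 2
        left = sum(1 for x in pts if a <= x <= mid)
        right = sum(1 for x in pts if mid < x <= b)
        a, b = (a, mid) if left >= right else (mid, b)
    return (a + b) / 2

pts = [1 / n for n in range(1, 10**4)]
print(find_limit_point(pts))   # ~ 0, the dense limit point of {1/n}
```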
So, ta-da! I present a method of averaging countably infinite sets which intuitively makes sense (at least to me). Now what? What about probabilities and integrals? Typically integrals are done with respect to a measure; they have scale and whatnot. However, countable sets lack this, so I think averages are closer to what actually works. So, with that in mind, let us define our "integral" off of this. First, let's say:
$$ \int_{A} x := average(A) = \begin{cases} \dfrac{\sum_{x \in D_n(A)} x}{|D_n(A)|} & \textrm{if } D_n(A) \textrm{ is finite,} \\[2ex] \dfrac{\int_{D_n(A)} x \, dx}{\int_{D_n(A)} 1 \, dx} & \textrm{if } D_n(A) \textrm{ has nonzero measure,} \end{cases} $$ where $n$ is the smallest index for which the iterate $D_n(A) = D(D(\cdots D(A)))$ is finite or has nonzero measure.
Then we can extend this to general $f$ like so: $$ \int_{A} f(x) := \begin{cases} \dfrac{\sum_{x \in D_n(A)} f(x)}{|D_n(A)|} & \textrm{if } D_n(A) \textrm{ is finite,} \\[2ex] \dfrac{\int_{D_n(A)} f(x) \, dx}{\int_{D_n(A)} 1 \, dx} & \textrm{if } D_n(A) \textrm{ has nonzero measure.} \end{cases} $$
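A minimal sketch of the finite-$D_n(A)$ branch of this recipe (my own; $D(A)$ is supplied by hand here rather than computed): for $A = \{\frac{1}{n}\} \cup \{1 - \frac{1}{n}\}$ we have $D(A) = \{0, 1\}$, which is already finite, so the recursion stops at $n = 1$:

```python
def integral_over_A(f, D_of_A):
    """The finite-D_n(A) case: an elementary average over D_n(A)."""
    return sum(f(x) for x in D_of_A) / len(D_of_A)

# D(A) = {0, 1} for A = {1/n} U {1 - 1/n}
print(integral_over_A(lambda x: x * x, [0.0, 1.0]))   # (0 + 1)/2 = 0.5
```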
For continuous $f$, this should be consistent with pseudo-Cesàro sums at dense limit points. So, it would be really interesting if the justification of Property 2 turns out to be consistent with "random" pseudo-Cesàro sums. Thus concludes integration. From there, hopefully we're just a hop and a skip away from incorporating probabilities. Unfortunately, my attention has been drained writing this, and I lack the energy to work out which precise model of probability you want to implement. Perhaps you can take it from here, basing things on Bayesian priors or convolutions. If you are confused, or have further thoughts and ideas, just let me know, and I'll be excited to offer some more effort on this.
Potential Answer
Definition of Measure
Consider $f:A\to\mathbb{R}$, where $A\subseteq[a,b]$ with $a,b \in \mathbb{R}$, and let $S\subseteq A$ be a fixed subset of $A$.
Suppose we define a Lebesgue Probability Measure $\lambda_{A}(S)$ depending on the fixed sets $A$ and $S$. The lengths of the intervals $I$ and $J$ are given by $\ell(I)=\ell(J)=b-a$. If $\left(I_{k,\epsilon}\right)_{k=1}^{m}$ are subintervals of $I$ covering $S$, $\left(J_{k,\epsilon}\right)_{k=1}^{n}$ are subintervals of $J$ covering $A$, and $\lambda^{*}$ is the Lebesgue outer measure, then the Lebesgue Outer Probability Measure is:
$$ \lambda^{*}_{A}(S)= \inf\left\{\frac{\sum\limits_{k=1}^{m}\ell(I_{k,\epsilon}) \bigl[1{-}\mu(A)(1{-}\mu(S\cap I_{k,\epsilon}))\bigr]\text{sign}(|A|)}{\sum\limits_{k=1}^{n}\ell(J_{k,\epsilon}) \bigl[1{-}\mu(A)(1{-}\mu(A\cap J_{k,\epsilon}))\bigr]}: S\subseteq\bigcup\limits_{k=1}^{m} I_{k,\epsilon},\ A\subseteq\bigcup\limits_{k=1}^{n} J_{k,\epsilon},\ \left|\lambda^{*}(S)-\sum\limits_{k=1}^{m}\ell(I_{k,\epsilon})\right|\le \epsilon,\ \left|\lambda^{*}(A)-\sum\limits_{k=1}^{n}\ell(J_{k,\epsilon})\right|\le \epsilon,\ 1\le m \le \max\left\{|S|,1\right\},\ 1 \le n \le \max\left\{|A|,1\right\};\ P = S\cap I_{k,\epsilon},\ P= A\cap J_{k,\epsilon} \ \text{or} \ P= A;\ \mu(P)=\inf\left\{\text{sign}\left(\bigcup\limits_{s=1}^{t}G_s\right): P\subseteq \bigcup\limits_{s=1}^{t} G_s\right\},\ 1 \le t \le \max\left\{|P|,1\right\} \right\} $$
Here $\text{sign}(0)=0$; when $S$ and $A$ are uncountable, $|S|=+\infty$ and $|A|=+\infty$; $\sum\limits_{k=1}^{m} \ell(I_{k,\epsilon})\to\lambda(S)$ as $m \to \infty$ and $\sum\limits_{k=1}^{n} \ell(J_{k,\epsilon}) \to \lambda(A)$ as $n \to \infty$; and in most cases $\epsilon$ should approach zero. Moreover, $\mu(P)=0$ when $P$ is countable and $\mu(P)=1$ when $P$ is uncountable.
Finally, set $\ell(I_{k,\epsilon})=c$ for all $k\in \{1,\dots,m\}$ and $\ell(J_{k,\epsilon})=c$ for all $k\in \{1,\dots,n\}$, where $c\in \mathbb{R}^{+}$.
If $\lambda_{A}(S)\neq\frac{\lambda(S)}{b-a}$, we could split the $I_{k,\epsilon}$ and $J_{k,\epsilon}$ into two cases.
Case 1) When $\lambda(A)>0$, $\ell(I_{k,\epsilon})=c_k$ for all $k\in\left\{1,...,m\right\}$ and $\ell(J_{k,\epsilon})=d_k$ for all $k\in\left\{1,...,n\right\}$, where $c_k,d_k \in \mathbb{R}^{+}$.
Case 2) If $\lambda(A)=0$, $\ell(I_{k,\epsilon})=c$ for $k\in\left\{1,...,m\right\}$ and $\ell(J_{k,\epsilon})= c$ for all $k\in\left\{1,...,n\right\}$ where $c\in\mathbb{R}^{+}$.
From these restrictions, the Inner Generalized Lebesgue Measure $\lambda_{A*}(S)$ is
\begin{equation} \lambda_{A*}(S)=\lambda_{A}^{*}(A)-\lambda_{A}^{*}(A\setminus S) \end{equation}
And when the limits of the inner and outer measures equal each other,
\begin{align} \lambda_{A}^{*}(S)=\lambda_{A*}(S) =\lambda_{A}(S) \end{align}
Where $\lambda_{A}(S)$ is the Full Lebesgue Probability Measure.
Properties of the Measure
It seems the properties of my measure are:
\begin{equation} \lambda_{\emptyset}(\emptyset)=0 \end{equation}
\begin{equation} \lambda_{A}(\emptyset)=0 \end{equation}
\begin{equation} \lambda_{A}(A)=1 \end{equation}
If $A=A_1\cup A_2$ where $A_1$ and $A_2$ are disjoint, then
\begin{equation} \lambda_{A}(A_1\cup A_2)=\lambda_{A}(A_1)+\lambda_{A}(A_2) \end{equation}
If $A=\bigcup_{i=1}^{\infty}A_i$ with the $A_i$ pairwise disjoint, then
\begin{equation} \lambda_{A}\left(\bigcup_{i=1}^{\infty}A_i\right)=\sum_{i=1}^{\infty}\lambda_{A}(A_i) \end{equation}
If $A_1=A\setminus A_2$
\begin{equation} \lambda_{A}(A_1)=\lambda_{A}(A)-\lambda_{A}(A_2)=1-\lambda_{A}(A_2) \end{equation}
And if $A=[a,b]$
\begin{equation} \lambda_{A}(A)(b-a)=\lambda(A) \end{equation}
I am not sure how to prove these properties are true. I'd appreciate if someone could help.
Average and Integral
From this, we can start defining the average of $f$. We start by defining $f$ as a linear combination of characteristic functions.
\begin{equation} 1_A = \begin{cases} 1 & x \in A \\ 0 & \text{otherwise} \end{cases} \end{equation}
We define the integral of a characteristic function, and then the approximating sums, as:
\begin{equation} \int 1_A d\lambda_{A} = \lambda_{A}(A) \end{equation}
\begin{equation} S_n = \sum_{i=1}^{n} \int f(c_i)\, 1_{A_i}\, d\lambda_{A} = \sum_{i=1}^{n} f(c_i)\, \lambda_{A}(A_i) \end{equation}
where $c_i \in [x_{i-1}, x_{i}]$ and $A_i=A\cap[x_{i-1},x_i]$, with $a= x_0 \le \dots \le x_n =b$. Note that as $n\to\infty$ (with the mesh $\max_i (x_i - x_{i-1})\to 0$) we get our generalized average.
To get the "integral", multiply $S_n$ by $b-a$ and let $n\to\infty$. My hunch is that the "integral" will not give the area under the curve (where we can use the anti-derivative of $f$) unless $A$ is dense in $[a,b]$.
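To illustrate, here is a sketch of $S_n$ in the dense case (my own; it assumes, consistent with the hunch above, that for dense $A$ the measure gives $\lambda_A(A_i) = \frac{x_i - x_{i-1}}{b-a}$, making $S_n$ an ordinary Riemann sum divided by $b-a$):

```python
def S_n(f, a, b, n):
    """Approximating sum with a uniform partition, c_i the midpoint of
    [x_{i-1}, x_i], and lambda_A(A_i) = (x_i - x_{i-1}) / (b - a)."""
    h = (b - a) / n
    return sum(f(a + (i - 0.5) * h) * h / (b - a) for i in range(1, n + 1))

avg = S_n(lambda x: x * x, 0.0, 1.0, 10**5)
print(avg, avg * (1.0 - 0.0))   # average ~ 1/3; "integral" = (b - a) * S_n
```

Multiplying by $b-a$ recovers $\int_0^1 x^2\,dx = \frac{1}{3}$ here, matching the hunch that density of $A$ in $[a,b]$ is exactly what makes the "integral" agree with the area under the curve.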