PyMC3/Arviz: CDF value from trace

I have a sample from PyMC3 and I'm trying to get a cumulative probability from it, e.g. P(X < 0). I currently use this:

trace = pymc3.sample(return_inferencedata=True)
prob_x_lt_zero = (trace.posterior.X < 0).sum() / trace.posterior.X.size

Is there a better way to do this, either with some helper function from Arviz or XArray?

I haven't found any .cdf() method or anything similar. It's weird that such basic functions are missing, but more advanced ones are there, such as trace.posterior.X.quantile().

I would recommend your original approach evaluating the condition and averaging (that is basically using the empirical cdf) instead of using the KDE.

There is no equivalent that I know of, probably also because there is no equivalent in numpy either (which has both quantile and percentile). There is one in scipy: scipy.stats.percentileofscore but I wouldn't recommend it either unless you are working with discrete data and need the kind argument to evaluate ties (i.e. would you care or notice any difference between using < or <=?). This scipy function also takes only a scalar as value to evaluate the ecdf against.

My recommendation therefore is to stick with your method, but modify a bit the implementation, so it also works when evaluating multiple values at the same time and when not reducing all the dimensions:

import arviz; import xarray
x = xarray.DataArray([-.1, 0, .1])  # skip that if working with scalars
post = arviz.load_arviz_data("rugby").posterior
prob_x_lt_zero = (post.atts < x).mean(("chain", "draw"))

which returns the probabilities for each of the 3 values we are evaluating at all 6 teams.

<xarray.DataArray (team: 6, dim_0: 3)>
array([[0.    , 0.    , 0.0485],
       [0.347 , 0.975 , 1.    ],
       [0.    , 0.004 , 0.4245],
       [0.64  , 0.994 , 1.    ],
       [1.    , 1.    , 1.    ],
       [0.    , 0.    , 0.    ]])
Coordinates:
  * team     (team) object 'Wales' 'France' 'Ireland' ... 'Italy' 'England'
Dimensions without coordinates: dim_0

You could approximate the CDF with a kernel density estimate, but I am not convinced that this is better than your current approach:

import arviz
grid, pdf = arviz.kde(trace.posterior.X.values, cumulative=True)
idx = np.sum(grid < 0) - 1
prob_x_lt_zero = pdf[idx]

Mocking a script with no `if __name__=="__main__":` block

Can some One explain && and || operators for the following code please? [duplicate]

Can I reduce size of array in CUDA

Module not found, did you mean "*js"?

Warning: Different store and key passwords not supported for PKCS12 KeyStores. Ignoring user-specified -keypass value. in Android Studio

How to determine the exact lib/include variable generated by find_packge() in config mode? [duplicate]

Why we should use a URL for the Go module name?

How to place independent groups of radio buttons on same row and place well panels around the groups in order to delineate them?

Format string with dictionary in C#

Window Function not working in HackerRank MySQL-8.0.20? [duplicate]

count, print out and delete rows that do not contain a digit

What does two return statements inside an SQL function mean?

PyMC3/Arviz: CDF value from trace

Related

Recent Posts