Using Python to compute relative risk (risk ratio) from a DataFrame with the zEpid package (replicating riskratio from the R epitools package)
Solution 1:
The error is a result of how RiskRatio parses your input data set behind the scenes. When using RiskRatio, the default reference category is 0. So, when your independent variable is processed internally, zEpid looks for age_group=0. However, there are no instances of 0 in your data set.
To fix this, you can specify the optional argument reference. By default reference=0, but you can set it to 1, which will set age_group=1 as the reference risk for the risk ratio.
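Before fitting, it can help to confirm which exposure levels are actually present and choose the reference from those. The following is a minimal sketch using pandas only; the data frame and the fallback rule (use the smallest observed level when 0 is absent) are illustrative assumptions, not something zEpid does for you.

```python
import pandas as pd

# Hypothetical data: exposure coded 1, 2, 3 with no 0 level, as in the question
df = pd.DataFrame({"age_group": [1, 1, 2, 2, 3, 3]})

# Observed exposure levels, converted to plain Python ints
levels = sorted(int(v) for v in df["age_group"].unique())

# Assumed fallback rule: keep 0 if it exists, otherwise use the smallest level
default_reference = 0
reference = default_reference if default_reference in levels else levels[0]

print("levels:", levels)
print("reference to pass to RiskRatio:", reference)
```

The chosen value can then be passed as RiskRatio(reference=reference).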
The following is a simple example with some simulated data for an exposure 'A' and an outcome 'Y':
import numpy as np
import pandas as pd
from scipy.stats import norm
from zepid import RiskRatio
# Generating some generic data
np.random.seed(20220120)
df = pd.DataFrame()
df['A'] = np.random.randint(1, 4, size=80)  # Note: A in {1, 2, 3}
df['Y'] = np.random.binomial(n=1, p=0.25, size=80)  # Note: Y in {0, 1}
# Estimating Risk Ratios with zEpid
rr = RiskRatio(reference=1)
rr.fit(df, exposure='A', outcome='Y')
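# Calculating P-values (Wald test on the log scale)
# The first row of the results is the reference level, so skip it
est = rr.results['RiskRatio'].iloc[1:]
std = rr.results['SD(RR)'].iloc[1:]
z_score = np.log(est) / std
p_value = norm.sf(abs(z_score)) * 2  # two-sided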
# Displaying results
print("RR: ", list(est))
print("P-value:", p_value)
This should output the following:
RR: [1.0266666666666666, 0.7636363636363636]
P-value: [0.93990517 0.5312407 ]
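The p-value calculation above is a Wald test on the log risk ratio. As a cross-check, the same quantities can be computed by hand for a single exposure level against the reference. This is a sketch with made-up counts and a hypothetical helper name (wald_risk_ratio); it assumes that 'SD(RR)' in the zEpid results is the standard deviation of log(RR), which is what dividing np.log(est) by it implies.

```python
import math

def wald_risk_ratio(a, b, c, d):
    """Risk ratio, SE of log(RR), and two-sided Wald p-value.

    a: exposed with outcome,   b: exposed without outcome
    c: reference with outcome, d: reference without outcome
    """
    risk_exposed = a / (a + b)
    risk_reference = c / (c + d)
    rr = risk_exposed / risk_reference
    # Standard error of log(RR) from the usual large-sample formula
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z = math.log(rr) / se_log_rr
    # Two-sided normal tail probability; equals norm.sf(abs(z)) * 2
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rr, se_log_rr, p_value

# Hypothetical counts, not taken from the simulated data above
rr_hat, se_hat, p_hat = wald_risk_ratio(a=10, b=30, c=5, d=35)
print("RR:", rr_hat, "SE(log RR):", se_hat, "P-value:", p_hat)
```

Note that the formula divides by a and c, so it breaks down when either group has no outcomes.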
I generated some generic data rather than use the example data set provided because there is another issue in that data that will result in an error. Below is a cross-tabulation of age_group (rows) by adhd_parent (columns) for that data set:
adhd_parent   0   1
age_group
1            62   0
2             0  32
3             0   6
These structural zeroes in the data will throw a PositivityError in zEpid. Basically, you can't calculate the risk ratio because of a division by zero (the risk in the referent is 0).
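A quick way to spot this problem before fitting is to cross-tabulate exposure by outcome and look for empty cells. The sketch below rebuilds a data frame from the counts in the table above (the row-level data are reconstructed for illustration, not the original data set) and uses pandas.crosstab.

```python
import pandas as pd

# Long-format data reconstructed from the cell counts in the table above
df = pd.DataFrame({
    "age_group":   [1] * 62 + [2] * 32 + [3] * 6,
    "adhd_parent": [0] * 62 + [1] * 32 + [1] * 6,
})

# Exposure (rows) by outcome (columns)
table = pd.crosstab(df["age_group"], df["adhd_parent"])

# Any zero cell means some risk is exactly 0 or 1 within an exposure level
has_zero_cell = bool((table == 0).any().any())

print(table)
print("zero cells present:", has_zero_cell)
```

If zero cells are present, collapsing sparse exposure levels or choosing a different reference may be needed before a risk ratio can be estimated.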