Using Python to compute relative risk (risk ratio) from a DataFrame with the zEpid package (emulating riskratio() from the R package epitools)

Solution 1:

The error is caused by how RiskRatio parses your input data set behind the scenes.

When using RiskRatio, the default reference category is 0. So, when your independent variable is processed internally, zEpid looks for age_group=0. However, there are no instances of 0 in your data set.

To fix this, specify the optional argument reference. It defaults to reference=0, but you can set it to 1, which makes age_group=1 the reference (denominator) risk for the risk ratio.
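To make the role of the reference level concrete, here is a minimal sketch that computes Wald risk ratios by hand with pandas. The helper name risk_ratios and its column labels are hypothetical, chosen for illustration; this is not zEpid's implementation, only the same idea: every exposure level's risk is divided by the risk of the reference level, so the reference must actually occur in the data.

```python
import math

import pandas as pd


def risk_ratios(df, exposure, outcome, reference=0):
    """Hypothetical helper: Wald risk ratio of each exposure level vs. `reference`.

    Illustrates the concept behind RiskRatio(reference=...); it is not zEpid's code.
    """
    levels = sorted(df[exposure].unique())
    if reference not in levels:
        # The situation in the question: reference=0, but no row has exposure 0
        raise ValueError(f"reference level {reference!r} not found in {exposure!r}: {levels}")

    # Number of cases (outcome == 1) and number of rows per exposure level
    cases = df.groupby(exposure)[outcome].sum()
    totals = df.groupby(exposure)[outcome].count()

    c, n0 = float(cases[reference]), float(totals[reference])
    if c == 0:
        raise ZeroDivisionError("risk in the reference level is 0; RR is undefined")

    rows = []
    for level in levels:
        if level == reference:
            continue
        a, n1 = float(cases[level]), float(totals[level])
        rr = (a / n1) / (c / n0)
        # Wald standard error of log(RR)
        se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
        rows.append({exposure: level, "RiskRatio": rr, "SE(logRR)": se})
    return pd.DataFrame(rows)


# Toy data: 2/10 cases at A=1 and 4/10 cases at A=2, so RR(A=2 vs A=1) = 0.4 / 0.2
toy = pd.DataFrame({"A": [1] * 10 + [2] * 10,
                    "Y": [1] * 2 + [0] * 8 + [1] * 4 + [0] * 6})
print(risk_ratios(toy, "A", "Y", reference=1))
```

Calling risk_ratios(toy, "A", "Y") with the default reference=0 raises a ValueError, which mirrors why the reference has to be set to a level that exists in the data.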

The following is a simple example with simulated data, where 'A' is the exposure and 'Y' is the outcome:

import numpy as np
import pandas as pd
from scipy.stats import norm
from zepid import RiskRatio

# Generating some generic data
np.random.seed(20220120)
df = pd.DataFrame()
df['A'] = np.random.randint(1, 4, size=80)           # Note: A takes values in {1, 2, 3}
df['Y'] = np.random.binomial(n=1, p=0.25, size=80)   # Note: Y takes values in {0, 1}

# Estimating Risk Ratios with zEpid
rr = RiskRatio(reference=1)
rr.fit(df, exposure='A', outcome='Y')

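# Calculating P-values (the first row is the reference level, so it is skipped)
# SD(RR) is the standard error of log(RR), so the Wald statistic is log(RR) / SD(RR)
est = rr.results['RiskRatio'].iloc[1:]
std = rr.results['SD(RR)'].iloc[1:]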
z_score = np.log(est)/std
p_value = norm.sf(abs(z_score))*2

# Displaying results
print("RR:     ", list(est))
print("P-value:", p_value)
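The p-value step above is a two-sided Wald test on the log scale. As a sketch that needs only the standard library, and assuming (as the code above does) that SD(RR) is the standard error of log(RR), the same p-value can be obtained with math.erfc, because 2 * norm.sf(|z|) equals erfc(|z| / sqrt(2)) for a standard normal. The helper name wald_summary is hypothetical; it also returns the matching 95% confidence interval, exp(log(RR) ± 1.96 · SE).

```python
import math

# 97.5th percentile of the standard normal (scipy: norm.ppf(0.975))
Z_975 = 1.959963984540054


def wald_summary(rr, se_log_rr, z_crit=Z_975):
    """Hypothetical helper: Wald z, two-sided p-value and CI for a risk ratio."""
    log_rr = math.log(rr)
    z = log_rr / se_log_rr
    # Equivalent to 2 * scipy.stats.norm.sf(abs(z))
    p = math.erfc(abs(z) / math.sqrt(2))
    lower = math.exp(log_rr - z_crit * se_log_rr)
    upper = math.exp(log_rr + z_crit * se_log_rr)
    return z, p, (lower, upper)


# Example: an RR of exactly 1 gives z = 0 and p = 1
print(wald_summary(1.0, 0.5))
```

Working on the log scale is what makes the interval asymmetric around the RR and keeps both limits positive.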

This should output the following:

RR:      [1.0266666666666666, 0.7636363636363636]
P-value: [0.93990517 0.5312407 ]

I generated generic data rather than using the example data set you provided because that data has another issue that will also raise an error. Below is a 3-by-2 table of age_group by adhd_parent:

adhd_parent   0   1
age_group          
1            62   0
2             0  32
3             0   6

These structural zeros in the data will throw a PositivityError in zEpid. Essentially, the risk ratio cannot be calculated because it requires dividing by zero: with age_group=1 as the reference, the risk in the referent is 0/62 = 0.
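Before fitting, you can look for these structural zeros yourself. The sketch below builds the same age_group by adhd_parent table with pd.crosstab and flags every exposure level with an empty cell. The helper positivity_problems is hypothetical and is not the check zEpid performs internally; the example data is reconstructed only from the counts in the table above.

```python
import pandas as pd


def positivity_problems(df, exposure, outcome):
    """Hypothetical pre-check: flag exposure levels with an empty cell.

    Zero cases means a risk of 0 (unusable as a reference, and log(RR) is undefined);
    zero non-cases means a risk of 1 (every member of the level has the outcome).
    """
    table = pd.crosstab(df[exposure], df[outcome])
    problems = []
    for level, row in table.iterrows():
        cases = int(row.get(1, 0))       # outcome == 1
        non_cases = int(row.get(0, 0))   # outcome == 0
        if cases == 0 or non_cases == 0:
            problems.append((level, cases, non_cases))
    return table, problems


# Rebuilt from the 3-by-2 table in the question
data = pd.DataFrame({"age_group": [1] * 62 + [2] * 32 + [3] * 6,
                     "adhd_parent": [0] * 62 + [1] * 32 + [1] * 6})
table, problems = positivity_problems(data, "age_group", "adhd_parent")
print(table)
print(problems)
```

Here every age group is flagged: group 1 has no cases, while groups 2 and 3 have no non-cases, so age_group perfectly separates the outcome and no choice of reference will give a finite risk ratio.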