R: x 'probs' outside [0,1]
I am using the R programming language.
I am trying to follow the instructions from the "optimization" package in R (https://cran.r-project.org/web/packages/optimization/optimization.pdf) and use functions from this package on a specific function.
For this example, I first generate some random data:
#load libraries
library(dplyr)
# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)
From here, I define the function that I want to optimize ("fitness"). This function takes 7 inputs and calculates a "total" mean (a single scalar value). The inputs required for this function are:
- "random_1" (between 80 and 120)
- "random_2" (between "random_1" and 120)
- "random_3" (between 85 and 120)
- "random_4" (between random_2 and 120)
- "split_1" (between 0 and 1)
- "split_2" (between 0 and 1)
- "split_3" (between 0 and 1 )
The function to optimize is defined as follows:
library(optimization)
fitness <- function(x) {
#bin data according to random criteria
train_data <- train_data %>%
mutate(cat = ifelse(a1 <= x[1] & b1 <= x[3], "a",
ifelse(a1 <= x[2] & b1 <= x[4], "b", "c")))
train_data$cat = as.factor(train_data$cat)
#new splits
a_table = train_data %>%
filter(cat == "a") %>%
select(a1, b1, c1, cat)
b_table = train_data %>%
filter(cat == "b") %>%
select(a1, b1, c1, cat)
c_table = train_data %>%
filter(cat == "c") %>%
select(a1, b1, c1, cat)
#calculate quantile ("quant") for each bin
table_a = data.frame(a_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = x[5])))
table_b = data.frame(b_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = x[6])))
table_c = data.frame(c_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = x[7])))
#create a new variable ("diff") that measures if the quantile is bigger tha the value of "c1"
table_a$diff = ifelse(table_a$quant > table_a$c1,1,0)
table_b$diff = ifelse(table_b$quant > table_b$c1,1,0)
table_c$diff = ifelse(table_c$quant > table_c$c1,1,0)
#group all tables
final_table = rbind(table_a, table_b, table_c)
# calculate the total mean : this is what needs to be optimized
mean = mean(final_table$diff)
}
From here, I am trying to run the following optimization function:
Output <- optim_nm(fitness, k = 7, trace = TRUE)
plot(output)
plot(Output, 'contour')
But these return the following errors:
Error: Problem with `mutate()` column `quant`.
i `quant = quantile(c1, prob = x[6])`.
x 'probs' outside [0,1]
Run `rlang::last_error()` to see where the error occurred.
Error in plot(Output) : object 'Output' not found
I think error is that the "split_1", "split_2" and "split_3" variables are being assigned values outside of 0 and 1 : since the function is using these variables are used to calculate percentiles (e.g. quant = quantile(c1, prob = x[5]
) , this will naturally result in an error?
I tried to use another optimization algorithm from this package where the ranges for these 7 inputs are explicitly defined, but this is also producing the same error :
ro_sa <- optim_sa(fun = fitness,
start = c(runif(7, min = -1, max = 1)),
lower = c(80,80,80,80,0,0,0),
upper = c(120,120,120,120,1,1,1),
trace = TRUE,
control = list(t0 = 100,
nlimit = 550,
t_min = 0.1,
dyn_rf = FALSE,
rf = 1,
r = 0.7
)
)
Error: Problem with `mutate()` column `quant`.
i `quant = quantile(c1, prob = x[6])`.
x 'probs' outside [0,1]
This also doesn't work if you provide an initial starting point:
optim_nm(fitness, start = c(80,80,80,80,0.5,0.6,0.7))
Error: Problem with `mutate()` column `quant`.
i `quant = quantile(c1, prob = x[5])`.
x 'probs' outside [0,1]
i The error occurred in group 1: cat = a.
Question: Can someone please show me how to fix this, so that I can run the optimization functions?
#desired functions to run:
Output <- optim_nm(fitness, k = 7, trace = TRUE)
plot(output)
plot(Output, 'contour')
ro_sa <- optim_sa(fun = fitness,
start = c(runif(7, min = -1, max = 1)),
lower = c(80,80,80,80,0,0,0),
upper = c(120,120,120,120,1,1,1),
trace = TRUE,
control = list(t0 = 100,
nlimit = 550,
t_min = 0.1,
dyn_rf = FALSE,
rf = 1,
r = 0.7
)
)
optim_nm(fitness, start = c(80,80,80,80,0.5,0.6,0.7))
Thanks
The x
values generated are random and they can be positive or negative. probs
argument of quantile
needs to have values between 0 and 1. One way would be to take the absolute value of x[5:7]
and turn them to ratio using prop.table
.
x[5:7] <- prop.table(abs(x[5:7]))
Complete function -
library(optimization)
fitness <- function(x) {
#bin data according to random criteria
train_data <- train_data %>%
mutate(cat = ifelse(a1 <= x[1] & b1 <= x[3], "a",
ifelse(a1 <= x[2] & b1 <= x[4], "b", "c")))
train_data$cat = as.factor(train_data$cat)
#new splits
a_table = train_data %>%
filter(cat == "a") %>%
select(a1, b1, c1, cat)
b_table = train_data %>%
filter(cat == "b") %>%
select(a1, b1, c1, cat)
c_table = train_data %>%
filter(cat == "c") %>%
select(a1, b1, c1, cat)
x[5:7] <- prop.table(abs(x[5:7]))
#calculate quantile ("quant") for each bin
table_a = data.frame(a_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = x[5])))
table_b = data.frame(b_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = x[6])))
table_c = data.frame(c_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = x[7])))
#create a new variable ("diff") that measures if the quantile is bigger tha the value of "c1"
table_a$diff = ifelse(table_a$quant > table_a$c1,1,0)
table_b$diff = ifelse(table_b$quant > table_b$c1,1,0)
table_c$diff = ifelse(table_c$quant > table_c$c1,1,0)
#group all tables
final_table = rbind(table_a, table_b, table_c)
# calculate the total mean : this is what needs to be optimized
mean = mean(final_table$diff)
}
You can apply and plot this function -
Output <- optim_nm(fitness, k = 7, trace = TRUE)
plot(Output)