Duplicate values in R and Python
Solution 1:
- Your inner loop is extending beyond the length of ourlist. With this example, i will iterate from 1 to 8 (length(ourlist)); on the last iteration, when i is 8, you set k <- i+1, making it 9. You then iterate j from k to length(ourlist), which evaluates to 9:8 (a decreasing sequence of length 2), so j takes the value 9, which is past the end of ourlist.

  The answer, knowing that you want to compare an element with the element(s) after it, is that your i must iterate up to but not including length(ourlist). In that way, your k <- i+1 will never be greater than the length of ourlist.

  A literal fix for that:

      similars <- function(ourlist, container){
        for (i in 1:(length(ourlist)-1)) {
          k <- i+1
          for (j in k:length(ourlist)){
            if (ourlist[i] == ourlist[j] & !(ourlist[i] %in% container)){
              if (is.na(ourlist[i])) browser()
              container[i] <- ourlist[i]
            }
          }
        }
        return (container)
      }
      similars(numbers, container1)
      # [1] 10 NA 32
- Next issue: why the NA? That's because you are assigning to the output at index i, rather than appending one element to the output. Let's do the append:

      similars <- function(ourlist, container){
        for (i in 1:(length(ourlist)-1)) {
          k <- i+1
          for (j in k:length(ourlist)){
            if (ourlist[i] == ourlist[j] & !(ourlist[i] %in% container)){
              if (is.na(ourlist[i])) browser()
              container <- c(container, ourlist[i])
              # container[i] <- ourlist[i]
            }
          }
        }
        return (container)
      }
      similars(numbers, container1)
      # [1] 10 32
- (Minor.) Inside an if clause, the condition must always be length-1. Use && instead of &.

      if (ourlist[i] == ourlist[j] && !(ourlist[i] %in% container)){

  Why? Primarily for short-circuiting. & and | are vectorized, which means they accept something like c(TRUE,FALSE) | c(FALSE, TRUE), and they always evaluate both sides in full. && works on single values only, but it short-circuits: if the first operand already determines the result, the second is never evaluated. Examples:

      TRUE || stop("oops")
      # [1] TRUE
      FALSE && stop("oops")
      # [1] FALSE
      TRUE && stop("oops")
      # Error: oops
- (Minor.) Passing container seems unnecessary here. R does not pass by reference (a function works on its own copy of each argument), so it is not as if you are pre-allocating memory here. I suggest you remove it from the argument list, and pre-define it in the function.

      similars <- function(ourlist) {
        container <- c()
        for (i in 1:(length(ourlist)-1)) {
          k <- i+1
          for (j in k:length(ourlist)){
            if (ourlist[i] == ourlist[j] && !(ourlist[i] %in% container)){
              if (is.na(ourlist[i])) browser()
              container <- c(container, ourlist[i])
              # container[i] <- ourlist[i]
            }
          }
        }
        return (container)
      }
- (More minor.) Let's think along the computer-science-y (CS) lines of "allow 0 or more". In this sense, is it "reasonable" to pass an empty vector? If that is given as the argument, then one might expect an empty vector to be returned as well. However ... 1:length(.) will not work here. Demo:

      vec <- 2:4
      1:length(vec)
      # [1] 1 2 3
      seq_along(vec)
      # [1] 1 2 3
      seq_len(length(vec))
      # [1] 1 2 3
      vec <- c()
      1:length(vec)
      # [1] 1 0 # this is broken
      seq_along(vec)
      # integer(0)
      seq_len(length(vec))
      # integer(0)

  I suggest you use seq_len(length(ourlist)) (or length(.)-1), making the final version in this answer:

      similars <- function(ourlist) {
        container <- c()
        for (i in seq_len(max(0, length(ourlist)-1))) {
          k <- i+1
          for (j in (k-1) + seq_len(max(0, length(ourlist)-(k-1)))) {
            if (ourlist[i] == ourlist[j] && !(ourlist[i] %in% container)){
              if (is.na(ourlist[i])) browser()
              container <- c(container, ourlist[i])
              # container[i] <- ourlist[i]
            }
          }
        }
        return (container)
      }
      similars(numbers)
      # [1] 10 32
      similars(c())
      # NULL
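As a quick sanity check on the final version, the sketch below runs it on a made-up vector and compares the result with base R's duplicated(). The name sample_numbers is hypothetical (the original numbers is not shown in this answer), the inner sequence is written slightly more compactly, and the browser() debugging call is left out so the snippet runs non-interactively.

```r
# Final version from this answer, minus the interactive browser() hook.
similars <- function(ourlist) {
  container <- c()
  # i stops one short of the end; seq_len() yields nothing for short inputs.
  for (i in seq_len(max(0, length(ourlist) - 1))) {
    # j runs over the positions after i, i.e. (i + 1) .. length(ourlist).
    for (j in i + seq_len(length(ourlist) - i)) {
      if (ourlist[i] == ourlist[j] && !(ourlist[i] %in% container)) {
        container <- c(container, ourlist[i])
      }
    }
  }
  container
}

# Hypothetical sample data: 10 and 32 each appear twice.
sample_numbers <- c(10, 5, 32, 7, 10, 32, 8, 9)

result <- similars(sample_numbers)
print(result)

# The loop and duplicated() should agree on which values repeat.
# setequal() is used because the two approaches may order values differently.
print(setequal(result, unique(sample_numbers[duplicated(sample_numbers)])))

# Edge cases: empty and single-element inputs return NULL instead of failing.
print(similars(c()))
print(similars(42))
```

The comparison uses setequal() rather than identical() because the loop reports values in order of first occurrence, while duplicated() reports them in order of the repeat.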
Solution 2:
The loop can be a single loop instead of a nested one: loop over the sequence from the 2nd element to the last (length), then, if the current element ourlist[i] is present (%in%) in the sequence (seq) of previous elements and not (!) present in the storage container, concatenate (c) the container with the current element and update it by assignment (<-).
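    similars <- function(ourlist, container){
      for(i in 2:length(ourlist)) {
        # keep ourlist[i] if it already occurred earlier and is not stored yet;
        # && is used because both sides are length-1 here
        if(ourlist[i] %in% ourlist[seq(i-1)] && !(ourlist[i] %in% container)) {
          container <- c(container, ourlist[i])
        }
      }
      container
    }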
Testing:

    > container1 <- c()
    > similars(numbers, container1)
    [1] 10 32
Here, we don't want to use a nested loop because %in% is vectorized and thus saves a lot of unnecessary iterations.
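One caveat with the single-loop version: 2:length(ourlist) counts downward when the input has fewer than two elements, so the loop body runs on indices that should never be visited. A minimal sketch of a guarded variant follows. The name similars_single and the sample vector are hypothetical, and container is created inside the function instead of being passed in.

```r
# Guarded single-loop variant (hypothetical name, for illustration only).
similars_single <- function(ourlist) {
  container <- c()
  # seq_along(x)[-1] is 2..length(x), and is empty for length 0 or 1.
  for (i in seq_along(ourlist)[-1]) {
    seen_before <- ourlist[i] %in% ourlist[seq_len(i - 1)]
    if (seen_before && !(ourlist[i] %in% container)) {
      container <- c(container, ourlist[i])
    }
  }
  container
}

# Hypothetical sample data: 10 and 32 each appear twice.
sample_numbers <- c(10, 5, 32, 7, 10, 32, 8, 9)
print(similars_single(sample_numbers))

# Short inputs simply skip the loop and return NULL.
print(similars_single(c()))
print(similars_single(42))
```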
It can be done more easily with duplicated in R:

    > numbers[duplicated(numbers)]
    [1] 10 32
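Note that duplicated() flags every occurrence after the first, so a value that appears three or more times is returned more than once. If each repeated value should be listed only once, as the loop versions do, wrapping the result in unique() is one option. A small sketch with a made-up vector (x is hypothetical, not the numbers from the question):

```r
# Hypothetical data: 10 appears three times, 32 twice.
x <- c(10, 32, 10, 32, 10)

# Every occurrence after the first is flagged, so 10 shows up twice here.
dups_all <- x[duplicated(x)]
print(dups_all)     # [1] 10 32 10

# unique() collapses the repeats so each duplicated value is listed once.
dups_once <- unique(dups_all)
print(dups_once)    # [1] 10 32
```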
Regarding why there is an error, it is already explained in the comments: your outer loop runs through the last element, and you then assign k <- i + 1, which is outside the valid index range.