Duplicate values in R and Python

Solution 1:

  1. Your inner loop is extending beyond the length of ourlist. In this example i iterates from 1 to 8 (length(ourlist)); on the last iteration, when i is 8, you set k <- i+1, making it 9. You then iterate j from k to length(ourlist), which evaluates to 9:8 (a decreasing sequence of length 2).

    The answer, knowing that you want to compare an element with the element(s) after it, is that your i must iterate up to but not including length(ourlist). That way, your k <- i+1 will never exceed length(ourlist).

    A literal fix for that:

    similars <- function(ourlist, container){
      for (i in 1:(length(ourlist)-1)) {
        k <- i+1
        for (j in k:length(ourlist)){
          if (ourlist[i] == ourlist[j] & !(ourlist[i] %in% container)){
            if (is.na(ourlist[i])) browser()
            container[i] <- ourlist[i]
          }
        }
      }
      return (container)
    }
    similars(numbers, container1)
    # [1] 10 NA 32
    
    1. Next issue: why the NA? That's because you are assigning to the output at index i rather than appending one element to the output, so any skipped positions are filled with NA. Let's do the append:

    similars <- function(ourlist, container){
      for (i in 1:(length(ourlist)-1)) {
        k <- i+1
        for (j in k:length(ourlist)){
          if (ourlist[i] == ourlist[j] & !(ourlist[i] %in% container)){
            if (is.na(ourlist[i])) browser()
            container <- c(container, ourlist[i])
            # container[i] <- ourlist[i]
          }
        }
      }
      return (container)
    }
    similars(numbers, container1)
    # [1] 10 32
    
  2. (Minor.) Inside an if clause, the conditional must always be length-1. Use && instead of &.

    if (ourlist[i] == ourlist[j] && !(ourlist[i] %in% container)){
    

    Why? Primarily for short-circuiting. & and | are vectorized: they accept something like c(TRUE,FALSE) | c(FALSE, TRUE) and always evaluate both sides in full. && and || work on single values only, but they short-circuit: if the left-hand side already determines the result, the right-hand side is never evaluated. Examples:

    TRUE || stop("oops")
    # [1] TRUE
    FALSE && stop("oops")
    # [1] FALSE
    TRUE && stop("oops")
    # Error: oops
    
  3. (Minor.) Passing container seems unnecessary here. R does not pass by reference (a function that modifies an argument works on its own copy), so it is not as if you are pre-allocating memory here. I suggest you remove it from the argument list and define it inside the function.

    similars <- function(ourlist) {
      container <- c()
      for (i in 1:(length(ourlist)-1)) {
        k <- i+1
        for (j in k:length(ourlist)){
          if (ourlist[i] == ourlist[j] && !(ourlist[i] %in% container)){
            if (is.na(ourlist[i])) browser()
            container <- c(container, ourlist[i])
            # container[i] <- ourlist[i]
          }
        }
      }
      return (container)
    }
    
  4. (More minor.) Let's think along the computer-science-y (CS) lines of "allow 0 or more". In this sense, is it "reasonable" to pass an empty vector? If that is given as the argument, then one might expect an empty vector to be returned as well. However ... 1:length(.) will not work here. Demo:

    vec <- 2:4
    1:length(vec)
    # [1] 1 2 3
    seq_along(vec)
    # [1] 1 2 3
    seq_len(length(vec))
    # [1] 1 2 3
    
    vec <- c()
    1:length(vec)
    # [1] 1 0             # this is broken
    seq_along(vec)
    # integer(0)
    seq_len(length(vec))
    # integer(0)
    

    I suggest you use seq_len(.) in place of 1:length(.) (here with length(ourlist)-1 for the outer loop), making the final version in this answer:

    similars <- function(ourlist) {
      container <- c()
      for (i in seq_len(max(0, length(ourlist)-1))) {
        k <- i+1
        for (j in (k-1) + seq_len(max(0, length(ourlist)-(k-1)))) {
          if (ourlist[i] == ourlist[j] && !(ourlist[i] %in% container)){
            if (is.na(ourlist[i])) browser()
            container <- c(container, ourlist[i])
            # container[i] <- ourlist[i]
          }
        }
      }
      return (container)
    }
    similars(numbers)
    # [1] 10 32
    similars(c())
    # NULL
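
On point 3 above: a quick way to see that a function cannot fill in the caller's vector is to modify an argument and then inspect the original. This is only a minimal sketch; append_one and the values used are hypothetical names made up for the illustration.

```r
# Hypothetical helper: appends a value to its argument and returns the result.
append_one <- function(container, value) {
  container <- c(container, value)  # changes the function's local copy only
  container
}

container1 <- c()
result <- append_one(container1, 10)

is.null(container1)  # TRUE: the caller's vector is untouched
result               # 10: the new vector exists only as the return value
```

Because the caller always has to capture the return value anyway, passing in an empty container gains nothing over creating it inside the function.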
    

Solution 2:

The loop can be a single loop instead of a nested one. Loop over the sequence from the 2nd element to the last (length(ourlist)); if the current element ourlist[i] is present (%in%) among the previous elements and not (!) already present in the storage container, concatenate (c) it onto container and update by assignment (<-).

similars <- function(ourlist, container){
  for(i in 2:length(ourlist)) {
    # seen earlier in the vector, and not yet recorded in container
    if(ourlist[i] %in% ourlist[seq(i-1)] && !(ourlist[i] %in% container)) {
      container <- c(container, ourlist[i])
    }
  }
  container
}

Testing:

> container1 <- c()
> similars(numbers, container1)
[1] 10 32

A nested loop is unnecessary here because %in% is vectorized, which saves a lot of iterations.
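
A sketch of the same single-loop idea that also tolerates vectors shorter than two elements, assuming that returning NULL for such input is acceptable. The name similars_safe and the sample vector nums are made up for this illustration, since the original numbers vector is not shown here.

```r
similars_safe <- function(ourlist) {
  container <- c()
  # seq_along(x)[-1] is 2, 3, ..., length(x), and is empty when length(x) < 2
  for (i in seq_along(ourlist)[-1]) {
    seen_before <- ourlist[i] %in% ourlist[seq_len(i - 1)]
    if (seen_before && !(ourlist[i] %in% container)) {
      container <- c(container, ourlist[i])
    }
  }
  container
}

nums <- c(10, 20, 10, 32, 5, 32, 7, 8)  # assumed sample data
similars_safe(nums)
# [1] 10 32
similars_safe(c())
# NULL
```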


It can be done more easily with duplicated in R:

> numbers[duplicated(numbers)]
[1] 10 32 
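
One caveat, shown with made-up data: duplicated() flags every occurrence after the first, so a value that appears three or more times comes back more than once. Wrapping the result in unique() should then give the same kind of output as the loop versions.

```r
x <- c(10, 32, 10, 10, 32)  # hypothetical input; 10 appears three times

x[duplicated(x)]
# [1] 10 10 32
unique(x[duplicated(x)])
# [1] 10 32
```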

As for why there is an error, it is already explained in the comments: your outer loop runs up to the last element, and you then assign k <- i + 1, which is outside the index range of ourlist.
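
To see the failure in isolation, the following sketch reproduces the out-of-range step on a short, made-up vector (x is hypothetical, not the data from the question). It relies on two behaviours of base R: k:n counts down when k > n, and indexing past the end of a vector yields NA.

```r
x <- c(10, 20, 10)        # hypothetical data, length 3
i <- length(x)            # last iteration of the original outer loop
k <- i + 1                # 4, one past the end

k:length(x)               # counts down instead of being empty
# [1] 4 3
x[k]                      # indexing past the end yields NA, not an error
# [1] NA

# The comparison is therefore NA, which if() cannot use as a condition,
# so tryCatch() ends up returning the error message as a string.
msg <- tryCatch(
  if (x[i] == x[k]) "equal" else "different",
  error = function(e) conditionMessage(e)
)
msg
```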