Python Numpy. Delete an element (or elements) in a 2D array if said element is located between a pair of specified elements

I have a 2D NumPy array exclusively filled with 1s and 0s.

a = [[0 0 0 0 1 0 0 0 1]
     [1 1 1 1 1 1 1 1 1]
     [1 1 1 1 1 1 1 1 1]
     [1 1 1 1 0 0 0 0 1]
     [1 1 1 1 1 1 1 1 1]
     [1 1 1 0 1 1 1 1 1]
     [1 1 1 1 1 1 0 0 1]
     [1 1 1 1 1 1 1 1 1]]

To get the location of the 0s I used the following code:

new_array = np.transpose(np.nonzero(a==0))

As expected, I get the following result showing the location of the 0s within the array:

new_array = [[0 0]
             [0 1]
             [0 2]
             [0 3]
             [0 5]
             [0 6]
             [0 7]
             [3 4]
             [3 5]
             [3 6]
             [3 7]
             [5 3]
             [6 6]
             [6 7]]
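
For anyone who wants to reproduce this, a minimal self-contained version of the setup could look like the following. The name zero_locations is only illustrative, and the array literal is typed in by hand from the printout above.

```python
import numpy as np

# The example array from above, written as a valid NumPy literal
a = np.array([
    [0, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
])

# (row, column) index of every 0, in row-major order
zero_locations = np.transpose(np.nonzero(a == 0))
print(zero_locations.shape)  # (14, 2)
```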

Now comes my question: Is there a way to get the location of the 0s at the start and end of a horizontal group if said group is larger than 2?

EDIT: If a group were to finish at the end of a row and continue on the one below it, it would count as 2 separate groups.

My first thought was to implement a process that would delete 0s if they are located in between other 0s, but I was not able to figure out how to do that.

I would like "new_array" output to be:

new_array = [[0 0]
             [0 3]
             [0 5]
             [0 7]
             [3 4]
             [3 7]
             [5 3]
             [6 6]
             [6 7]]

Thanks in advance!

EDIT 2:

Thank you all for your very helpful insights. I was able to solve the problem that I had.

To satisfy the curiosity, this data represents musical information. The purpose of the program I'm working on is to create a musical score based on an image (that consists exclusively of horizontal lines).

Once the image conversion to 1s and 0s was done, I needed to extract the following information from it: Onset, Pitch, and Duration. This translates into position on the "x" axis, position on the "y" axis, and total length of the group.

Since the X and Y locations are fairly easy to get, I decided to process them separately from the "Duration" calculation (which was the main problem to solve in this post).

Thanks to your help I was able to solve the Duration problem and create a new array with all necessary information:

[[0 0 4]
 [5 0 3]
 [4 3 4]
 [6 6 2]]

Note that the 1st column represents Onset, the 2nd column represents Pitch, and the 3rd column represents Duration.
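
One way such an [onset, pitch, duration] array could be computed is sketched below. This is an illustration only, not necessarily the code used in the actual program, and the helper name zero_runs is hypothetical. It assumes a signed integer array of 0s and 1s. Note that it also reports the single-cell group at row 5, column 3 as [3 5 1], which the array above does not list.

```python
import numpy as np

def zero_runs(a):
    """Return one row [onset, pitch, duration] per horizontal run of 0s.

    onset = column of the first 0, pitch = row index, duration = run length.
    (Hypothetical helper, for illustration only.)
    """
    is_zero = (np.asarray(a) == 0).astype(int)
    # Pad every row with a "not zero" cell on both sides so that runs
    # touching the left or right border are still opened and closed.
    padded = np.pad(is_zero, ((0, 0), (1, 1)), mode="constant", constant_values=0)
    step = np.diff(padded, axis=1)
    rows, starts = np.nonzero(step == 1)   # a run begins at this column
    _, ends = np.nonzero(step == -1)       # the run ends just before this column
    # np.nonzero scans row by row, so the k-th start pairs with the k-th end
    return np.column_stack((starts, rows, ends - starts))

a = np.array([
    [0, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
])
events = zero_runs(a)
print(events.tolist())
# [[0, 0, 4], [5, 0, 3], [4, 3, 4], [3, 5, 1], [6, 6, 2]]
```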

I also noticed the comment suggesting that I add an identifier to each event. Eventually I will need to implement that to differentiate between instruments (and later send them to individual MIDI channels). However, this first iteration of the program only aims to create a music score for a single instrument, so it is not necessary yet, since all events belong to that instrument.

I have very little experience with programming, so I don't know if this was the most efficient way of achieving my goal. Any suggestions are welcome.

Thanks!


Solution 1:

One possible solution that is easier to follow is:

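import numpy as np

# a is the 2D array of 0s and 1s from the question
b = np.diff(a, prepend=1)  # prepend a column of 1s and detect
                           # jumps between adjacent columns (left to right)
y, x = np.where(b > 0)  # positions of the 0->1 jumps, one column past the end of each group
# shift positive jumps to the left by 1 position (onto the last 0 of the group)
# while filling the original positions with 0:
b[y, x - 1] = 1
b[y, x] = 0
new_array = list(zip(*np.where(b)))  # (row, column) pairs of group starts and ends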

Another one is:

new_array = list(zip(*np.where(np.diff(a, n=2, prepend=1, append=1) > 0)))
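
As a quick sanity check, the two approaches above can be wrapped in functions and compared. The names edges_shift and edges_second_diff are made up for this sketch, and the inputs are assumed to be signed integer arrays. On the array from the question both give the same result, but on a row whose group of 0s touches the right border, the first approach has no 0->1 jump to detect and so reports only the start of that group:

```python
import numpy as np

def edges_shift(a):
    # First approach: first difference, then shift the 0->1 jumps left
    b = np.diff(a, prepend=1)
    y, x = np.where(b > 0)
    b[y, x - 1] = 1
    b[y, x] = 0
    return [(int(r), int(c)) for r, c in zip(*np.where(b))]

def edges_second_diff(a):
    # Second approach: positive second difference, padded with 1s on both sides
    mask = np.diff(a, n=2, prepend=1, append=1) > 0
    return [(int(r), int(c)) for r, c in zip(*np.where(mask))]

# A group of 0s that runs into the right border
edge_case = np.array([[1, 0, 0, 0]])
print(edges_shift(edge_case))        # [(0, 1)]
print(edges_second_diff(edge_case))  # [(0, 1), (0, 3)]
```

If groups can reach the last column, the second approach (or padding the array with a column of 1s on the right first) is probably the safer choice.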

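Both solutions are based on np.diff, which by default (axis=-1) computes differences along the last axis, i.e. between consecutive columns of a 2D array. In the second solution, the second difference is positive only at a 0 that has a 1 (or the padded border) on at least one side, which is exactly the start or end of a group.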
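
To make this concrete, here is a small worked example on the first row of the array from the question. It is only a sketch for illustration; the variable names are arbitrary.

```python
import numpy as np

row = np.array([0, 0, 0, 0, 1, 0, 0, 0, 1])

# First difference with a 1 prepended:
#   -1 marks a 1->0 jump (a group of 0s starts at this column)
#   +1 marks a 0->1 jump (a group of 0s ended one column earlier)
first = np.diff(row, prepend=1)
print(first.tolist())   # [-1, 0, 0, 0, 1, -1, 0, 0, 1]

# Second difference with 1s on both sides:
#   value at column i is row[i-1] - 2*row[i] + row[i+1]
second = np.diff(row, n=2, prepend=1, append=1)
print(second.tolist())  # [1, 0, 0, 1, -2, 1, 0, 1, -1]

# Columns where the second difference is positive: starts and ends of groups
print(np.flatnonzero(second > 0).tolist())  # [0, 3, 5, 7]
```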