I want to slice a NumPy nxn array. I want to extract an *arbitrary* selection of m rows and columns of that array (i.e. without any pattern in the numbers of rows/columns), making it a new, mxm array. For this example let us say the array is 4x4 and I want to extract a 2x2 array from it.

Here is our array:

```
from numpy import *
x = range(16)
x = reshape(x,(4,4))
print x
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
```

The line and columns to remove are the same. The easiest case is when I want to extract a 2x2 submatrix that is at the beginning or at the end, i.e. :

```
In [33]: x[0:2,0:2]
Out[33]:
array([[0, 1],
[4, 5]])
In [34]: x[2:,2:]
Out[34]:
array([[10, 11],
[14, 15]])
```

But what if I need to remove another mixture of rows/columns? What if I need to remove the first and third lines/rows, thus extracting the submatrix `[[5,7],[13,15]]`

? There can be any composition of rows/lines. I read somewhere that I just need to index my array using arrays/lists of indices for both rows and columns, but that doesn't seem to work:

```
In [35]: x[[1,3],[1,3]]
Out[35]: array([ 5, 15])
```

I found one way, which is:

```
In [61]: x[[1,3]][:,[1,3]]
Out[61]:
array([[ 5, 7],
[13, 15]])
```

First issue with this is that it is hardly readable, although I can live with that. If someone has a better solution, I'd certainly like to hear it.

Other thing is I read on a forum that indexing arrays with arrays forces NumPy to make a copy of the desired array, thus when treating with large arrays this could become a problem. Why is that so / how does this mechanism work?

As Sven mentioned, `x[[[0],[2]],[1,3]]`

will give back the 0 and 2 rows that match with the 1 and 3 columns while `x[[0,2],[1,3]]`

will return the values x[0,1] and x[2,3] in an array.

There is a helpful function for doing the first example I gave, `numpy.ix_`

. You can do the same thing as my first example with `x[numpy.ix_([0,2],[1,3])]`

. This can save you from having to enter in all of those extra brackets.

To answer this question, we have to look at how indexing a multidimensional array works in Numpy. Let's first say you have the array `x`

from your question. The buffer assigned to `x`

will contain 16 ascending integers from 0 to 15. If you access one element, say `x[i,j]`

, NumPy has to figure out the memory location of this element relative to the beginning of the buffer. This is done by calculating in effect `i*x.shape[1]+j`

(and multiplying with the size of an int to get an actual memory offset).

If you extract a subarray by basic slicing like `y = x[0:2,0:2]`

, the resulting object will share the underlying buffer with `x`

. But what happens if you acces `y[i,j]`

? NumPy can't use `i*y.shape[1]+j`

to calculate the offset into the array, because the data belonging to `y`

is not consecutive in memory.

NumPy solves this problem by introducing *strides*. When calculating the memory offset for accessing `x[i,j]`

, what is actually calculated is `i*x.strides[0]+j*x.strides[1]`

(and this already includes the factor for the size of an int):

```
x.strides
(16, 4)
```

When `y`

is extracted like above, NumPy does not create a new buffer, but it *does* create a new array object referencing the same buffer (otherwise `y`

would just be equal to `x`

.) The new array object will have a different shape then `x`

and maybe a different starting offset into the buffer, but will share the strides with `x`

(in this case at least):

```
y.shape
(2,2)
y.strides
(16, 4)
```

This way, computing the memory offset for `y[i,j]`

will yield the correct result.

But what should NumPy do for something like `z=x[[1,3]]`

? The strides mechanism won't allow correct indexing if the original buffer is used for `z`

. NumPy theoretically *could* add some more sophisticated mechanism than the strides, but this would make element access relatively expensive, somehow defying the whole idea of an array. In addition, a view wouldn't be a really lightweight object anymore.

This is covered in depth in the NumPy documentation on indexing.

Oh, and nearly forgot about your actual question: Here is how to make the indexing with multiple lists work as expected:

```
x[[[1],[3]],[1,3]]
```

This is because the index arrays are broadcasted to a common shape. Of course, for this particular example, you can also make do with basic slicing:

```
x[1::2, 1::2]
```

Licensed under: CC-BY-SA with attribution

Not affiliated with: Stack Overflow