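NumPy: The Cornerstone of Data Science and Scientific Computing in Python
NumPy, Python's Scientific Computing Library
This article is a hands-on walkthrough of NumPy, organized into the sections listed below. Neural networks are central to AI, neural networks are built on matrix operations, and in Python matrix operations rest on NumPy. A solid command of NumPy is therefore a strong foundation for AI development.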
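1 NumPy overview
2 The array structure
3 Numerical computation
4 Sorting
5 Array shape
6 Array creation
7 Operations
8 The random module
9 Reading and writing
10 Exercises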
First, make sure the library we are going to use is installed.
import numpy as np
array = [1,2,3,4,5]
array + 1
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[2], line 2
1 array = [1,2,3,4,5]
----> 2 array + 1
TypeError: can only concatenate list (not "int") to list
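You cannot add 1 to a list this way, because for lists + means concatenation. Once the list is converted to an ndarray, NumPy's core data structure, arithmetic applies to every element.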
array = np.array([1,2,3,4,5])
print (type(array))
<class 'numpy.ndarray'>
array2 = array + 1
array2
array([2, 3, 4, 5, 6])
array2 +array
array([ 3, 5, 7, 9, 11])
array2 * array
array([ 2, 6, 12, 20, 30])
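To make the contrast between a list and an ndarray concrete, here is a minimal sketch. The variable names are illustrative and not part of the original notebook.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# With a plain list, "add 1 to every element" needs an explicit loop.
plus_one_list = [v + 1 for v in values]

# With an ndarray the same operation is one elementwise expression.
arr = np.array(values)
plus_one_arr = arr + 1

# For lists, + means concatenation, which is why list + int raises TypeError.
concatenated = values + [6]

print(plus_one_list)   # [2, 3, 4, 5, 6]
print(plus_one_arr)    # [2 3 4 5 6]
print(concatenated)    # [1, 2, 3, 4, 5, 6]
```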
array[0]
np.int64(1)
array[3]
np.int64(4)
array
array([1, 2, 3, 4, 5])
array.shape
(5,)
Only an ndarray has a shape attribute; a plain list does not.
list1 = [1,2,3,4,5]
list1.shape
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[13], line 2
1 list1 = [1,2,3,4,5]
----> 2 list1.shape
AttributeError: 'list' object has no attribute 'shape'
np.array([[1,2,3],[4,5,6]])
array([[1, 2, 3],
[4, 5, 6]])
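Structure
All elements of an ndarray must have the same type. If they do not, NumPy automatically converts them to a common type that can hold every value (for example int to float, or numbers to str).
Converting a list to an ndarray: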
list = [1,2,3,4,5]
ndarray = np.array(list)
ndarray
array([1, 2, 3, 4, 5])
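The single-type rule can be checked directly by inspecting dtype. The sketch below uses illustrative inputs that are not from the original notebook.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_float = np.array([1, 2, 3.5])      # the ints are promoted to float
mixed_str = np.array([1, 2.5, "str"])    # everything becomes a string

print(ints.dtype.kind)        # i  (integer)
print(mixed_float.dtype)      # float64
print(mixed_str.dtype.kind)   # U  (unicode string)
print(mixed_str.tolist())     # ['1', '2.5', 'str']
```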
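Basic ndarray attributes and operations: dtype gives the element type, shape the size along each dimension, ndim the number of dimensions, and fill sets every element to one value.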
type(ndarray)
numpy.ndarray
ndarray.dtype
dtype('int64')
ndarray.shape
(5,)
ndarray.ndim
1
ndarray1 = np.array([[1,2,3],[4,5,6]])
ndarray1.ndim
2
ndarray1.shape
(2, 3)
fill overwrites every element with the given value.
ndarray.fill(0)
ndarray
array([0, 0, 0, 0, 0])
Indexing and slicing work the same way as in plain Python, starting from 0.
list = [1,2,3,4,5]
array = np.array(list)
array[0]
np.int64(1)
array[1:3]
array([2, 3])
Matrix form: multidimensional arrays
array = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
array
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
array.shape
(3, 3)
array.size
9
array.ndim
2
Modifying a value in place
array[1,1] = 10
array
array([[ 1, 2, 3],
[ 4, 10, 6],
[ 7, 8, 9]])
array[1]
array([ 4, 10, 6])
array[:,1]
array([ 2, 10, 8])
array[0,0:2]
array([1, 2])
array2 = array
array2
array([[ 1, 2, 3],
[ 4, 10, 6],
[ 7, 8, 9]])
array2[1,1] = 100
array2
array([[ 1, 2, 3],
[ 4, 100, 6],
[ 7, 8, 9]])
array
array([[ 1, 2, 3],
[ 4, 100, 6],
[ 7, 8, 9]])
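As you can see, assigning with = does not copy anything: both names refer to the same array, so a change made through one is visible through the other. To get an independent copy, use copy().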
array2 = array.copy()
array2
array([[ 1, 2, 3],
[ 4, 100, 6],
[ 7, 8, 9]])
array2[1,1] = 1000
array2
array([[ 1, 2, 3],
[ 4, 1000, 6],
[ 7, 8, 9]])
array
array([[ 1, 2, 3],
[ 4, 100, 6],
[ 7, 8, 9]])
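A minimal sketch of the three cases worth telling apart: a second name for the same array, a view produced by basic indexing, and an independent copy. np.shares_memory is used only as a check; the variable names are illustrative.

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])

alias = original               # same object, nothing is copied
view = original[0]             # basic indexing of a row gives a view on the same memory
independent = original.copy()  # separate memory

alias[1, 1] = 100              # visible through `original`
view[0] = -1                   # also visible through `original`
independent[0, 2] = 999        # not visible through `original`

print(alias is original)                        # True
print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
print(original.tolist())                        # [[-1, 2, 3], [4, 100, 6]]
```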
array = np.arange(0,100,10)
mask = np.array([0,0,0,1,1,1,0,0,1,1],dtype=bool)
mask
array([False, False, False, True, True, True, False, False, True,
True])
array[mask]
array([30, 40, 50, 80, 90])
random_array=np.random.rand(10)
random_array
array([0.81861097, 0.09494575, 0.81619337, 0.14018824, 0.11683594,
0.38988826, 0.81800008, 0.5675678 , 0.41470347, 0.55486108])
mask = random_array > 0.5
mask
array([ True, False, True, False, False, False, True, True, False,
True])
array = np.array([10,20,30,40,50])
array > 30
array([False, False, False, True, True])
np.where returns the indices that satisfy the condition; passing those indices back into the array retrieves the values.
np.where(array > 30)
(array([3, 4]),)
array[np.where(array > 30)]
array([40, 50])
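Beyond selecting values, a boolean mask can also be used for assignment, and np.where has a three-argument form that chooses between two values per element. A short sketch with illustrative names:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
mask = arr > 30

selected = arr[mask]             # same values as arr[np.where(arr > 30)]
labels = np.where(mask, 1, 0)    # 1 where the condition holds, 0 elsewhere

capped = arr.copy()
capped[mask] = 30                # assign through the mask

print(selected)   # [40 50]
print(labels)     # [0 0 0 1 1]
print(capped)     # [10 20 30 30 30]
```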
Array data types, and the total number of bytes an array occupies
array = np.array([1,2,3,4,5],dtype=np.float32)
array
array([1., 2., 3., 4., 5.], dtype=float32)
array.dtype
dtype('float32')
array.nbytes
20
asarray can convert an array to a different dtype, but it does not modify the original array.
array=np.array([1,10,3.5,'str'])
array
array(['1', '10', '3.5', 'str'], dtype='<U32')
array = np.array([1,2,3,4,5])
np.asarray(array,dtype=np.float32)
array([1., 2., 3., 4., 5.], dtype=float32)
array
array([1, 2, 3, 4, 5])
array.astype(np.float32)
array([1., 2., 3., 4., 5.], dtype=float32)
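One difference between the two calls is easy to overlook: np.asarray returns the input itself when no conversion is needed, while astype makes a copy by default. A small sketch that checks object identity:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

same = np.asarray(a)                         # no conversion needed: returns `a` itself
as_float = np.asarray(a, dtype=np.float32)   # conversion needed: a new array
copied = a.astype(a.dtype)                   # astype copies by default, even for the same dtype

print(same is a)        # True
print(as_float is a)    # False
print(copied is a)      # False
print(a.dtype.kind)     # i  (the original is still an integer array)
```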
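Numerical computation on arrays: sum adds the elements, prod multiplies all of them together, and there are methods for the global maximum and minimum, the indices of the maximum and minimum, the mean, the standard deviation, and the variance.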
array = np.array([[1,2,3],[4,5,6]])
array
array([[1, 2, 3],
[4, 5, 6]])
np.sum(array)
np.int64(21)
The axis argument specifies which axis (dimension) the operation runs along.
np.sum(array,axis=0)
array([5, 7, 9])
array.ndim
2
np.sum(array,axis=1)
array([ 6, 15])
np.sum(array,axis=-1)
array([ 6, 15])
array.sum()
np.int64(21)
array.sum(axis=0)
array([5, 7, 9])
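One way to read axis is "the dimension that disappears from the result". The sketch below also shows keepdims=True, which keeps the reduced axis with length 1 so the result still broadcasts against the original array.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])            # shape (2, 3)

col_sums = arr.sum(axis=0)                        # axis 0 is reduced -> shape (3,)
row_sums = arr.sum(axis=1)                        # axis 1 is reduced -> shape (2,)
row_sums_2d = arr.sum(axis=1, keepdims=True)      # shape (2, 1)

fractions = arr / row_sums_2d                     # every row now sums to 1

print(col_sums.shape, row_sums.shape, row_sums_2d.shape)   # (3,) (2,) (2, 1)
print(fractions.sum(axis=1))
```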
array.prod()
np.int64(720)
array.min()
np.int64(1)
Finding index positions
array.argmin()
np.int64(0)
array.argmin(axis=1)
array([0, 0])
array.argmax()
np.int64(5)
array.mean()
np.float64(3.5)
array.mean(axis=0)
array([2.5, 3.5, 4.5])
Standard deviation
array.std()
np.float64(1.707825127659933)
array.std(axis=1)
array([0.81649658, 0.81649658])
Variance
array.var()
np.float64(2.9166666666666665)
array
array([[1, 2, 3],
[4, 5, 6]])
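clip replaces values below the lower bound with the lower bound and values above the upper bound with the upper bound. round rounds to the nearest value, and decimals sets how many decimal places to keep.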
array.clip(2,4)
array([[2, 2, 3],
[4, 4, 4]])
array1 = np.array([1.2,3.34,4.56])
array1.round()
array([1., 3., 5.])
array1.round(decimals=1)
array([1.2, 3.3, 4.6])
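One detail about round: for values exactly halfway between two integers, NumPy rounds to the nearest even value rather than always rounding up. A short sketch with illustrative inputs:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])
rounded = halves.round()                 # ties go to the nearest even integer

clipped = np.clip(np.arange(1, 7), 2, 4)

print(rounded)   # [0. 2. 2. 4.]
print(clipped)   # [2 2 3 4 4 4]
```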
Sorting
array = np.array([[1.5,1.3,7.5],[5.6,7.8,1.2]])
array
array([[1.5, 1.3, 7.5],
[5.6, 7.8, 1.2]])
np.sort(array)
array([[1.3, 1.5, 7.5],
[1.2, 5.6, 7.8]])
np.sort(array,axis=0)
array([[1.5, 1.3, 1.2],
[5.6, 7.8, 7.5]])
array
array([[1.5, 1.3, 7.5],
[5.6, 7.8, 1.2]])
np.argsort(array)
array([[1, 0, 2],
[2, 0, 1]])
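argsort is most useful when its result is used to reorder a second, parallel array. The names and scores in this sketch are made-up illustrative data.

```python
import numpy as np

scores = np.array([72, 95, 60, 88])
names = np.array(["ann", "bob", "cid", "dee"])

order = np.argsort(scores)               # indices that would sort `scores` ascending
names_by_score = names[order]            # reorder the parallel array
top_two = names[order[::-1][:2]]         # the two highest scores

print(order)            # [2 0 3 1]
print(names_by_score)   # ['cid' 'ann' 'dee' 'bob']
print(top_two)          # ['bob' 'dee']
```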
array=np.linspace(0,10,10)
array
array([ 0. , 1.11111111, 2.22222222, 3.33333333, 4.44444444,
5.55555556, 6.66666667, 7.77777778, 8.88888889, 10. ])
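values holds the values to be inserted; searchsorted returns the indices at which they would have to be inserted into array to keep it sorted.
values=np.array([2.5,6.5,9.5])
np.searchsorted(array,values)
array([3, 6, 9])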
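A minimal sketch of how the indices from searchsorted can be used: passing them to np.insert places the new values so that the result stays sorted. The arrays here are small illustrative examples.

```python
import numpy as np

sorted_arr = np.array([1, 3, 5, 7])
new_values = np.array([0, 4, 8])

positions = np.searchsorted(sorted_arr, new_values)     # where each value belongs
merged = np.insert(sorted_arr, positions, new_values)   # positions refer to the original array

print(positions)   # [0 2 4]
print(merged)      # [0 1 3 4 5 7 8]
```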
array = np.array([[1,0,6],[1,7,0],[2,3,1],[2,4,0]])
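Array operations
shape and reshape change the shape, arange builds a range, newaxis adds a dimension, squeeze removes dimensions of length 1, and transpose (or the .T attribute) swaps the axes.
concatenate joins arrays, vstack stacks them vertically, hstack stacks them horizontally, and flatten and ravel both collapse an array to one dimension.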
import numpy as np
array = np.arange(10)
array
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
array.shape
(10,)
array.shape = 2,5
array
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
array.reshape(1,10)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
The total number of elements must stay the same: with ten elements, (2, 5) works but (2, 4) does not.
array.shape = 2,4
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[9], line 1
----> 1 array.shape = 2,4
ValueError: cannot reshape array of size 10 into shape (2,4)
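When only one dimension is known, reshape accepts -1 for the other and works it out from the total size. The sketch below also catches the error raised for an incompatible shape.

```python
import numpy as np

arr = np.arange(10)

two_rows = arr.reshape(2, -1)      # -1 is inferred as 5, since 2 * 5 == 10
five_rows = arr.reshape(-1, 2)     # -1 is inferred as 5 here as well

try:
    arr.reshape(2, 4)              # 2 * 4 != 10
except ValueError as exc:
    error_message = str(exc)

print(two_rows.shape)    # (2, 5)
print(five_rows.shape)   # (5, 2)
print(error_message)
```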
array = np.arange(10)
array.shape
(10,)
array = array[np.newaxis,:]
array.shape
(1, 10)
array = np.arange(10)
array.shape
(10,)
array = array[:,np.newaxis,np.newaxis]
array.shape
(10, 1, 1)
array = array.squeeze()
array.shape
(10,)
array.shape = 2,5
array
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
array.transpose()
array([[0, 5],
[1, 6],
[2, 7],
[3, 8],
[4, 9]])
array.T
array([[0, 5],
[1, 6],
[2, 7],
[3, 8],
[4, 9]])
a = np.array([[123,456,564],[345,678,754]])
a
array([[123, 456, 564],
[345, 678, 754]])
b = np.array([[532,567,8970],[123,345,765]])
b
array([[ 532, 567, 8970],
[ 123, 345, 765]])
c = np.concatenate((a,b))
c
array([[ 123, 456, 564],
[ 345, 678, 754],
[ 532, 567, 8970],
[ 123, 345, 765]])
c.shape
(4, 3)
np.vstack((a,b))
array([[ 123, 456, 564],
[ 345, 678, 754],
[ 532, 567, 8970],
[ 123, 345, 765]])
np.hstack((a,b))
array([[ 123, 456, 564, 532, 567, 8970],
[ 345, 678, 754, 123, 345, 765]])
a
array([[123, 456, 564],
[345, 678, 754]])
a.flatten()
array([123, 456, 564, 345, 678, 754])
a.ravel()
array([123, 456, 564, 345, 678, 754])
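flatten and ravel print the same result, but they differ in memory behavior: flatten always returns a copy, while ravel returns a view when the memory layout allows it (which is the case for the freshly created, contiguous array in this sketch).

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat_view = a.ravel()      # a view here, because `a` is contiguous
flat_copy = a.flatten()    # always a copy

flat_copy[0] = -1          # does not touch `a`
flat_view[1] = 99          # writes through to `a`

print(np.shares_memory(a, flat_view))   # True
print(np.shares_memory(a, flat_copy))   # False
print(a.tolist())                       # [[0, 99, 2], [3, 4, 5]]
```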
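## Array creation: arange builds a range, linspace takes a given number of evenly spaced values over a range, and logspace takes values evenly spaced on a log scale.
## meshgrid builds a coordinate grid; np.r_ and np.c_ build row-wise and column-wise arrays and accept several kinds of input, including slices and arrays. zeros and ones fill an array with a single value; identity puts 1 on the diagonal and 0 everywhere else.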
np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.arange(2,20,2)
array([ 2, 4, 6, 8, 10, 12, 14, 16, 18])
np.arange(2,20,2,dtype=np.float32)
array([ 2., 4., 6., 8., 10., 12., 14., 16., 18.], dtype=float32)
np.linspace(0,10,10)
array([ 0. , 1.11111111, 2.22222222, 3.33333333, 4.44444444,
5.55555556, 6.66666667, 7.77777778, 8.88888889, 10. ])
np.logspace(0,1,5)
array([ 1. , 1.77827941, 3.16227766, 5.62341325, 10. ])
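The relationship between the two functions can be verified directly: with the default base of 10, logspace(a, b, n) gives the same values as 10 raised to linspace(a, b, n). The sketch also uses retstep=True to read back the spacing linspace chose.

```python
import numpy as np

exponents = np.linspace(0, 1, 5)
log_values = np.logspace(0, 1, 5)           # default base is 10

points, step = np.linspace(0, 10, 11, retstep=True)

print(np.allclose(log_values, 10 ** exponents))   # True
print(step)                                       # 1.0
```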
x = np.linspace(-10,10,5)
y = np.linspace(-10,10,5)
x
array([-10., -5., 0., 5., 10.])
x,y=np.meshgrid(x,y)
x
array([[-10., -5., 0., 5., 10.],
[-10., -5., 0., 5., 10.],
[-10., -5., 0., 5., 10.],
[-10., -5., 0., 5., 10.],
[-10., -5., 0., 5., 10.]])
y
array([[-10., -10., -10., -10., -10.],
[ -5., -5., -5., -5., -5.],
[ 0., 0., 0., 0., 0.],
[ 5., 5., 5., 5., 5.],
[ 10., 10., 10., 10., 10.]])
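A typical use of meshgrid is evaluating a function of two variables at every grid point. The function x**2 + y**2 is an arbitrary illustrative choice.

```python
import numpy as np

x = np.linspace(-10, 10, 5)
y = np.linspace(-10, 10, 5)
xx, yy = np.meshgrid(x, y)     # xx varies along columns, yy along rows

zz = xx ** 2 + yy ** 2         # evaluated at all 25 grid points at once

print(zz.shape)     # (5, 5)
print(zz[2, 2])     # 0.0    (the point x=0, y=0)
print(zz.max())     # 200.0  (the four corners)
```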
np.r_[0:10:1]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.c_[0:10:1]
array([[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9]])
np.zeros(3)
array([0., 0., 0.])
np.zeros((3,3))
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
np.ones((3,3))
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
np.ones((3,3))*8
array([[8., 8., 8.],
[8., 8., 8.],
[8., 8., 8.]])
a=np.empty(6)
a.shape
(6,)
a.fill(1)
array = np.array([1,2,3,4])
array
array([1, 2, 3, 4])
np.zeros_like(array)
array([0, 0, 0, 0])
np.ones_like(array)
array([1, 1, 1, 1])
np.identity(5)
array([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
Operations
x = np.array([5,5])
y = np.array([2,2])
np.multiply(x,y)
array([10, 10])
np.dot(x,y)
np.int64(20)
x.shape
(2,)
x.shape=2,1
x
array([[5],
[5]])
np.dot(x,y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[64], line 1
----> 1 np.dot(x,y)
ValueError: shapes (2,1) and (2,) not aligned: 1 (dim 1) != 2 (dim 0)
y.shape = 1,2
print(x.shape)
print(y.shape)
(2, 1)
(1, 2)
np.dot(x,y)
array([[10, 10],
[10, 10]])
np.dot(y,x)
array([[20]])
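The shape rules behind these results can be summarized in a short sketch. It uses the @ operator, which for 2-D arrays performs the same matrix product as np.dot.

```python
import numpy as np

col = np.array([[5], [5]])     # shape (2, 1)
row = np.array([[2, 2]])       # shape (1, 2)

outer = col @ row              # (2, 1) @ (1, 2) -> (2, 2)
inner = row @ col              # (1, 2) @ (2, 1) -> (1, 1)
elementwise = np.array([5, 5]) * np.array([2, 2])   # no summation, shape (2,)

print(outer.tolist())          # [[10, 10], [10, 10]]
print(inner.tolist())          # [[20]]
print(elementwise.tolist())    # [10, 10]
```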
x = np.array([1,1,1])
y = np.array([[4,5,6],[1,2,3]])
print(x*y)
[[4 5 6]
[1 2 3]]
y = np.array([1,1,1,4])
x = np.array([1,1,1,2])
x == y
array([ True, True, True, False])
np.logical_and(x,y)
array([ True, True, True, True])
np.logical_or(x,y)
array([ True, True, True, True])
np.logical_not(x)
array([False, False, False, False])
The random module
import numpy as np
np.random.rand(3,2)
array([[0.17953902, 0.1301528 ],
[0.0380382 , 0.15961125],
[0.71432336, 0.20237312]])
### Returns random integers from a half-open interval (lower bound included, upper bound excluded)
np.random.randint(10,size=(5,4))
array([[0, 7, 9, 8],
[0, 0, 2, 7],
[6, 7, 4, 9],
[8, 7, 1, 0],
[2, 6, 6, 5]], dtype=int32)
np.random.rand()
0.19414180564410888
np.random.random_sample()
0.8492017836540788
np.random.randint(0,10,3)
array([3, 8, 1], dtype=int32)
mu,sigma = 0,0.1
np.random.normal(mu,sigma,10)
array([ 0.05971091, 0.11068208, 0.02777231, 0.06185503, -0.08028422,
-0.22575989, -0.10443067, 0.01615294, 0.03461024, 0.08860057])
Shuffling and random seeds
array = np.arange(10)
np.random.shuffle(array)
array
array([4, 6, 5, 8, 1, 0, 9, 2, 3, 7])
np.random.seed(10)
mu,sigma = 0,0.1
np.random.normal(mu,sigma,10)
array([ 0.13315865, 0.0715279 , -0.15454003, -0.00083838, 0.0621336 ,
-0.07200856, 0.02655116, 0.01085485, 0.00042914, -0.01746002])
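Fixing the seed makes a random sequence repeatable. The sketch below checks this for np.random.seed and, as an alternative not used in the original notebook, for the Generator API created with np.random.default_rng, which keeps its state in a local object rather than in global state.

```python
import numpy as np

# Global-state API: reseeding reproduces the same draws.
np.random.seed(10)
first = np.random.normal(0, 0.1, 3)
np.random.seed(10)
second = np.random.normal(0, 0.1, 3)

# Generator API: two generators built from the same seed agree with each other.
rng_a = np.random.default_rng(10)
rng_b = np.random.default_rng(10)
draw_a = rng_a.normal(0, 0.1, 3)
draw_b = rng_b.normal(0, 0.1, 3)

print(np.array_equal(first, second))    # True
print(np.array_equal(draw_a, draw_b))   # True
```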
Reading and writing data with NumPy
%%writefile king.txt
1 2 3 4 5 6
3 4 5 6 7 8
Writing king.txt
data = []
with open('king.txt') as f:
    for line in f:
        fields = line.split()
        print(fields)
        cur_data = [float(x) for x in fields]
        data.append(cur_data)
data = np.array(data)
data
['1', '2', '3', '4', '5', '6']
['3', '4', '5', '6', '7', '8']
array([[1., 2., 3., 4., 5., 6.],
[3., 4., 5., 6., 7., 8.]])
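split() splits on whitespace by default, so each line is read as a list of strings; these are converted to floats one by one and appended to a list. np.loadtxt does all of this in one call: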
data = np.loadtxt('king.txt')
data
array([[1., 2., 3., 4., 5., 6.],
[3., 4., 5., 6., 7., 8.]])
%%writefile king2.txt
1,2,3,4,5,6
4,5,6,7,8,7
Writing king2.txt
data = np.loadtxt('king2.txt',delimiter = ',')
data
array([[1., 2., 3., 4., 5., 6.],
[4., 5., 6., 7., 8., 7.]])
%%writefile king3.txt
x,y,z,w,a,b
1,2,3,4,5,6
3,4,5,6,7,8
Writing king3.txt
data = np.loadtxt('king3.txt',delimiter = ',',skiprows = 1)
data
array([[1., 2., 3., 4., 5., 6.],
[3., 4., 5., 6., 7., 8.]])
skiprows: the number of leading rows to skip; delimiter=',': the field separator; usecols=(0,1,4): which columns to use.
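usecols is described above but not demonstrated, so here is a minimal sketch. To stay self-contained it reads from an in-memory io.StringIO buffer that holds the same text as king3.txt instead of a file on disk.

```python
import io
import numpy as np

# Same content as king3.txt, kept in memory for the example.
text = "x,y,z,w,a,b\n1,2,3,4,5,6\n3,4,5,6,7,8\n"

selected = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1, usecols=(0, 1, 4))

print(selected.tolist())   # [[1.0, 2.0, 5.0], [3.0, 4.0, 7.0]]
```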
Reading and writing arrays
array = np.array([[1,2,3],[4,5,6]])
np.save('king.npy',array)
array
array([[1, 2, 3],
[4, 5, 6]])
array1 = np.load('king.npy')
array2 = np.arange(10)
array2
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.savez('queen.npz',a=array1,b=array2)
data=np.load('queen.npz')
data.keys()
KeysView(NpzFile 'queen.npz' with keys: a, b)
data['a']
array([[1, 2, 3],
[4, 5, 6]])
data['b']
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
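A sketch of a full save and load round trip with savez. It writes into a temporary directory so it leaves no files behind; the file name demo.npz is an arbitrary illustrative choice, and np.load is used as a context manager so the archive is closed afterwards.

```python
import os
import tempfile
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.arange(10)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.npz")   # hypothetical file name
    np.savez(path, a=a, b=b)                   # keyword names become the archive keys

    with np.load(path) as data:
        keys = sorted(data.files)
        loaded_a = data["a"]
        loaded_b = data["b"]

print(keys)                           # ['a', 'b']
print(np.array_equal(loaded_a, a))    # True
print(np.array_equal(loaded_b, b))    # True
```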
Exercises
Print the current NumPy version
print(np.__version__)
2.0.2
Create an all-zeros matrix and print how much memory it occupies
z=np.zeros((5,5))
print('%d bytes'%(z.size*z.itemsize))
200 bytes
Print the documentation for a function, for example numpy.add
np.info(np.add)
add(x1, x2, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature])
Add arguments element-wise.
Parameters
----------
x1, x2 : array_like
The arrays to be added.
If ``x1.shape != x2.shape``, they must be broadcastable to a common
shape (which becomes the shape of the output).
out : ndarray, None, or tuple of ndarray and None, optional
A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated array is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
where : array_like, optional
This condition is broadcast over the input. At locations where the
condition is True, the `out` array will be set to the ufunc result.
Elsewhere, the `out` array will retain its original value.
Note that if an uninitialized `out` array is created via the default
``out=None``, locations within it where the condition is False will
remain uninitialized.
**kwargs
For other keyword-only arguments, see the
:ref:`ufunc docs <ufuncs.kwargs>`.
Returns
-------
add : ndarray or scalar
The sum of `x1` and `x2`, element-wise.
This is a scalar if both `x1` and `x2` are scalars.
Notes
-----
Equivalent to `x1` + `x2` in terms of array broadcasting.
Examples
--------
>>> np.add(1.0, 4.0)
5.0
>>> x1 = np.arange(9.0).reshape((3, 3))
>>> x2 = np.arange(3.0)
>>> np.add(x1, x2)
array([[ 0., 2., 4.],
[ 3., 5., 7.],
[ 6., 8., 10.]])
The ``+`` operator can be used as a shorthand for ``np.add`` on ndarrays.
>>> x1 = np.arange(9.0).reshape((3, 3))
>>> x2 = np.arange(3.0)
>>> x1 + x2
array([[ 0., 2., 4.],
[ 3., 5., 7.],
[ 6., 8., 10.]])
Create an array holding the values 10 through 49 and reverse it
array=np.arange(10,50,1)
array=array[::-1]
array
array([49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33,
32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16,
15, 14, 13, 12, 11, 10])
Find the indices of the non-zero elements of an array
np.nonzero([1,2,4,0,7,6,0,87])
(array([0, 1, 2, 4, 5, 7]),)
Build a random 3*3 matrix and find its minimum and maximum values
array = np.random.random((3,3))
array.min()
array.max()
np.float64(0.8052231968327465)
Build a 5*5 matrix whose values are all 1, then surround it with one ring of 0s
array=np.ones((5,5))
array=np.pad(array,pad_width=1,mode='constant',constant_values=0)
array
array([[0., 0., 0., 0., 0., 0., 0.],
[0., 1., 1., 1., 1., 1., 0.],
[0., 1., 1., 1., 1., 1., 0.],
[0., 1., 1., 1., 1., 1., 0.],
[0., 1., 1., 1., 1., 1., 0.],
[0., 1., 1., 1., 1., 1., 0.],
[0., 0., 0., 0., 0., 0., 0.]])
For an array of shape (6,7,8), find the multi-dimensional index of the element at flat index 100
np.unravel_index(100,(6,7,8))
(np.int64(1), np.int64(5), np.int64(4))
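The result can be cross-checked in two ways: np.ravel_multi_index performs the inverse conversion, and an array filled with its own flat indices should contain 100 at that position. A minimal sketch:

```python
import numpy as np

shape = (6, 7, 8)
index = tuple(int(i) for i in np.unravel_index(100, shape))

flat_again = int(np.ravel_multi_index(index, shape))   # inverse conversion
grid = np.arange(6 * 7 * 8).reshape(shape)             # each cell holds its own flat index

print(index)          # (1, 5, 4)
print(flat_again)     # 100
print(grid[index])    # 100
```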
Normalize a 5*5 matrix to the range [0, 1] (min-max normalization)
array = np.random.random((5,5))
max_val = array.max()
min_val = array.min()
array = (array-min_val)/(max_val-min_val)
array
array([[4.82916777e-01, 9.11170266e-01, 2.45605198e-01, 6.46273968e-01,
1.00000000e+00],
[5.73535042e-01, 6.37970703e-01, 0.00000000e+00, 3.68131543e-01,
4.67040973e-02],
[3.08237106e-01, 3.37487751e-01, 8.50614944e-01, 7.84484361e-04,
4.51867772e-01],
[3.19199938e-01, 6.91574756e-01, 3.55584952e-01, 4.41849228e-03,
9.73462352e-01],
[8.38351949e-01, 9.71356473e-01, 4.37991291e-01, 6.55776506e-01,
5.49111069e-01]])
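The same normalization can be wrapped in a reusable function. The helper name min_max_normalize is hypothetical, and returning all zeros for a constant array (where max equals min and the formula would divide by zero) is an assumption about the desired behavior, not something the original exercise specifies.

```python
import numpy as np

def min_max_normalize(arr):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    arr = np.asarray(arr, dtype=float)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # Assumed convention: a constant array has no spread, so return zeros.
        return np.zeros_like(arr)
    return (arr - lo) / (hi - lo)

normalized = min_max_normalize([[2, 4], [6, 10]])
constant = min_max_normalize([3, 3, 3])

print(normalized.tolist())   # [[0.0, 0.25], [0.5, 1.0]]
print(constant.tolist())     # [0.0, 0.0, 0.0]
```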
Find the values common to two arrays
z1 = np.random.randint(0,10,10)
z2 = np.random.randint(0,10,10)
print(z1)
print(z2)
print(np.intersect1d(z1,z2))
[5 9 0 4 6 6 0 2 3 3]
[2 6 0 5 1 3 6 5 5 1]
[0 2 3 5 6]
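intersect1d returns the common values sorted and without duplicates. When the positions of those values are needed as well, the return_indices=True option also returns the index of the first occurrence in each input. A short sketch with fixed illustrative inputs:

```python
import numpy as np

x = np.array([1, 1, 2, 3, 4])
y = np.array([2, 1, 4, 6])

common, x_idx, y_idx = np.intersect1d(x, y, return_indices=True)

print(common)   # [1 2 4]
print(x_idx)    # [0 2 4]
print(y_idx)    # [1 0 2]
```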
Author: 智模睿脑君