Code Collector (代码收藏家) Technical Tutorials, 2023-06-07

Removing Duplicates from Arrays in Python: A Detailed Guide

Removing duplicates from a NumPy array with unique()

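unique() is a built-in NumPy function (numpy.unique). It takes an array as input and returns an array of the unique elements, with every duplicate removed. To remove duplicates, pass the NumPy array to unique(). Note that the returned values are sorted, not kept in their original order.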

numpy.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None)

Parameters:
ar             = The input array.
return_index   = If True, also return the indices of the first occurrence of each unique value in ar.
return_inverse = If True, also return the indices into the unique array that reconstruct ar.
return_counts  = If True, also return the number of times each unique value appears in ar.
axis           = The axis to operate on. For a 2D array, axis 0 means rows and axis 1 means columns. If no axis is given, the input array is flattened, i.e. treated as a 1D array.
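The sketch below makes the optional return values concrete on a small, made-up 1D array. The data is illustrative only. It requests all three extra outputs at once, and they come back in this order: unique values, first-occurrence indices, inverse indices, counts.

```python
import numpy as np

# illustrative data, not from the original article
data = np.array([3, 1, 3, 2, 1, 3])

uniq, first_idx, inverse, counts = np.unique(
    data, return_index=True, return_inverse=True, return_counts=True
)
print(uniq)       # [1 2 3]
print(first_idx)  # [1 3 0]  first position of 1, 2 and 3 in data
print(inverse)    # [2 0 2 1 0 2]  position of each element of data in uniq
print(counts)     # [2 1 3]

# indexing the unique values with inverse rebuilds the original array
print((uniq[inverse] == data).all())  # True
```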

Removing duplicate elements from a 1D NumPy array

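Approach:

Import the numpy library and create a NumPy array.

Pass the array to unique() without the axis argument.

The function returns the array of unique values.

Print the resulting array.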

import numpy as np

# Create a NumPy array
data = np.array([1,2,3,4,4,5,6,7])

# Pass array to the unique function
# It will remove the duplicates.
data = np.unique(data)

print(data)
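Because unique() returns sorted values, the original order is lost. One way to keep first-occurrence order is to combine return_index with np.sort(). The sketch below uses illustrative data, and this combination is an addition to the method above, not part of it.

```python
import numpy as np

# illustrative data, not from the original article
data = np.array([4, 1, 4, 3, 1, 2])

print(np.unique(data))  # [1 2 3 4]  (sorted)

# first_idx holds the first position of each unique value;
# sorting those positions restores the order in which values first appeared
_, first_idx = np.unique(data, return_index=True)
ordered = data[np.sort(first_idx)]
print(ordered)  # [4 1 3 2]
```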

Removing duplicate rows from a 2D NumPy array

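To remove duplicate rows from a 2D NumPy array, use the following steps:

Import the numpy library and create a NumPy array.

Pass the array to unique() with the axis=0 argument.

The function returns an array containing only the unique rows.

Print the resulting array.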

import numpy as np

# create a 2D NumPy array
data = np.array([[1,2,3],
                 [3,2,1],
                 [7,8,9],
                 [9,8,9],
                 [7,8,9]])

# Delete duplicate rows from 2D NumPy Array
data = np.unique(data, axis=0)

print(data)
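You may also want to know which rows were duplicated. return_counts works together with axis=0 and returns one count per unique row. Here is a sketch on the same data as above:

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [3, 2, 1],
                 [7, 8, 9],
                 [9, 8, 9],
                 [7, 8, 9]])

# unique rows (sorted lexicographically) plus how often each one occurs
rows, counts = np.unique(data, axis=0, return_counts=True)
print(rows)
print(counts)      # [1 1 2 1]

# rows that appeared more than once in the input
duplicated = rows[counts > 1]
print(duplicated)  # [[7 8 9]]
```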

Removing duplicate columns from a 2D NumPy array

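To remove duplicate columns from a 2D NumPy array, use the following steps:

Import the numpy library and create a NumPy array.

Pass the array to unique() with the axis=1 argument.

The function returns an array containing only the unique columns.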

import numpy as np

# create a 2D NumPy array
data = np.array([[1, 14, 3, 14, 14],
                 [3, 13, 1, 13, 13],
                 [7, 12, 9, 12, 12],
                 [9, 11, 9, 11, 11],
                 [7, 10, 9, 10, 10]])

# Remove Duplicate columns from 2D NumPy Array
data = np.unique(data, axis=1)

print(data)
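As a sanity check on what axis=1 means, the sketch below compares it with a second route: transpose, de-duplicate the rows, then transpose back. The NumPy documentation describes axis= as moving the chosen axis to the front before de-duplicating. We therefore expect the two routes to agree, and they do for this example.

```python
import numpy as np

data = np.array([[1, 14, 3, 14, 14],
                 [3, 13, 1, 13, 13],
                 [7, 12, 9, 12, 12],
                 [9, 11, 9, 11, 11],
                 [7, 10, 9, 10, 10]])

by_axis = np.unique(data, axis=1)               # unique columns directly
by_transpose = np.unique(data.T, axis=0).T      # columns -> rows -> unique -> back
print(np.array_equal(by_axis, by_transpose))    # True
print(by_axis.shape)                            # (5, 3): two duplicate columns removed
```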

Removing duplicates from a NumPy array with set()

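set() is a Python built-in that takes an iterable as input and returns a set containing only its distinct elements. NumPy rows are not hashable, so each row is first converted to a tuple, which is hashable and can be stored in a set. Because a set is unordered, the rows in the result do not necessarily appear in their original order.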

import numpy as np

# create a 2D NumPy array
data = np.array([[1,2,3],
                 [3,2,1],
                 [7,8,9],
                 [9,8,9],
                 [7,8,9]])


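# Convert each row to a hashable tuple, let set() drop the duplicates,
# then stack the remaining rows back into a 2D array (row order is arbitrary)
data = np.vstack(list(set(tuple(row) for row in data)))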

print(data)
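Since a set does not preserve order, a common alternative is dict.fromkeys(). Its keys are unique like set members, but they keep insertion order (guaranteed since Python 3.7). Here is a sketch on the same data that keeps the first occurrence of each row:

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [3, 2, 1],
                 [7, 8, 9],
                 [9, 8, 9],
                 [7, 8, 9]])

# dict keys are unique like set members but remember insertion order,
# so each row is kept at the position where it first appeared
unique_rows = np.array(list(dict.fromkeys(tuple(row) for row in data)))
print(unique_rows)
```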

Using unique() with the return_index parameter

Removing duplicate rows from a 2D NumPy array with unique()

unique() is a built-in NumPy function that takes an array as input and returns an array of its unique elements, with every duplicate removed.

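Here we want to remove duplicate rows, so we first reduce each row to a single number. We create a random array whose length equals the number of columns and multiply the given array by it. Identical rows always map to the same number. Distinct rows map to different numbers with overwhelming probability; a collision is possible in principle but extremely unlikely with random floating-point weights. The resulting 1D array is passed to unique() with return_index=True, so the function also returns the index of the first occurrence of each unique value. Indexing the original array with these indices gives the unique rows.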

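Approach:

Import the numpy library and create a NumPy array.

Create a random array whose length equals the number of columns in the array.

Multiply the given array by the random array with np.dot(). This is a dot product, here a matrix-vector product that yields one number per row.

Pass the resulting array to unique() with return_index=True.

The function returns the index of the first occurrence of each unique value.

Use these indices to select the unique rows from the given array and print them.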

import numpy as np

# create a 2D NumPy array
data = np.array([[1,2,3],
                 [3,2,1],
                 [7,8,9],
                 [9,8,9],
                 [7,8,9]])


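# create one random weight per column
a = np.random.rand(data.shape[1])

# multiply the given array by the random array: one number per row
b = data.dot(a)

# pass the resulting array to unique(); index holds the first occurrence of each value
unique, index = np.unique(b, return_index=True)

# use the indices to select the unique rows from the given array
# (they come out ordered by their projected value, not by original position)
data = data[index]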

print(data)
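The random-projection trick can be wrapped into a small helper. The sketch below is one possible variant, not the original author's code, and it makes several assumptions:

- The name unique_rows_by_projection and its seed parameter are hypothetical.
- The weights come from np.random.default_rng so results are reproducible.
- The projection is an elementwise multiply followed by a row sum, so every row goes through the same sequence of operations.
- np.sort(first_idx) restores the original row order.

Like the original, it relies on distinct rows almost surely receiving distinct keys.

```python
import numpy as np


def unique_rows_by_projection(data, seed=0):
    """Hypothetical helper: drop duplicate rows via a random projection,
    keeping the first occurrence of each row in its original order."""
    rng = np.random.default_rng(seed)        # seeded for reproducibility
    weights = rng.random(data.shape[1])      # one random weight per column
    keys = (data * weights).sum(axis=1)      # one float per row; equal rows -> equal keys
    _, first_idx = np.unique(keys, return_index=True)
    return data[np.sort(first_idx)]          # sorted indices = original row order


data = np.array([[1, 2, 3],
                 [3, 2, 1],
                 [7, 8, 9],
                 [9, 8, 9],
                 [7, 8, 9]])

result = unique_rows_by_projection(data)
print(result)
```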

Removing duplicates from a 1D NumPy array by iteration

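Given a 1D array, we visit each element in turn and check whether it has already been seen. If it has, it is a duplicate and we skip it; otherwise we keep it.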

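Approach:

Import the numpy library and create a NumPy array.

Initialize an empty list named unique.

Iterate over the NumPy array and, for each element, check whether it is already in the unique list.

If the element is not in the unique list, append it; otherwise move on.

Finally, create a NumPy array from the unique list.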

import numpy as np

# create a numpy array
data = np.array([1, 2, 3, 4, 4, 6, 5, 6, 7])

# create an empty list
unique = []

# iterate over each element of the array
for i in data:
    # if the element is not in the list yet,
    # add it to the list
    if i not in unique:
        unique.append(i)

data = np.array(unique)

print(data)
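The membership test on a list scans the whole list every time, so the loop above takes time quadratic in the array length. The sketch below is a variant of the same loop. It adds a hypothetical set named seen for constant-time membership checks, while the unique list still preserves first-occurrence order.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 4, 6, 5, 6, 7])

seen = set()   # hypothetical helper set: fast "already seen?" checks
unique = []    # keeps the elements in first-occurrence order

for x in data.tolist():  # tolist() yields plain Python ints
    if x not in seen:
        seen.add(x)
        unique.append(x)

result = np.array(unique)
print(result)  # [1 2 3 4 6 5 7]
```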

Removing duplicate rows from a 2D array by iteration

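Given a 2D array, we visit each row in turn and check whether an identical row has already been seen. If it has, it is a duplicate and we skip it; otherwise we keep it.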

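Approach:

Import the numpy library and create a NumPy array.

Initialize an empty list named unique.

Iterate over the NumPy array and, for each row, check whether it is already in the unique list.

If the row is not in the unique list, append it; otherwise move on.

Finally, create a NumPy array from the unique list.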

import numpy as np

# create a 2D NumPy array
data = np.array([[1, 2, 3],
                 [5, 6, 7],
                 [7, 8, 9],
                 [9, 8, 9],
                 [7, 8, 9]])

unique = []

# iterate over each row of the array
for i in data:
    # convert the row to a list so that "in" compares whole rows;
    # if the row is not in the list yet, add it
    if list(i) not in unique:
        unique.append(list(i))

data = np.array(unique)

print(data)

Using numpy.lexsort() and np.diff()

lexsort()

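lexsort() performs an indirect, stable sort using multiple sort keys. It accepts a sequence of keys, which can be interpreted as the columns of a NumPy array. It returns an array of integer indices describing the sort order across those columns. The last key is the primary sort key.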
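The key order of lexsort() is easy to get backwards. The sketch below uses made-up data. With data.T as the keys, the last column drives the sort; reversing the keys makes the first column primary instead.

```python
import numpy as np

# illustrative data, not from the original article
data = np.array([[1, 9],
                 [0, 5],
                 [1, 2],
                 [0, 7]])

# keys are the columns of data; the LAST key (column 1) is the primary key,
# with ties broken by column 0
order_last_primary = np.lexsort(data.T)
print(order_last_primary)   # [2 1 3 0]

# reversing the keys makes column 0 the primary key instead
order_first_primary = np.lexsort(data.T[::-1])
print(order_first_primary)  # [1 3 2 0]
print(data[order_first_primary])
```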

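To remove duplicates, we use lexsort() to sort the rows of the given NumPy array; after sorting, any duplicate rows are adjacent. The sorted array is then passed to diff(), which computes the difference between consecutive rows, so a duplicate row produces a difference of all zeros. We use any() to find the rows with a nonzero difference. These rows, together with the first row (which has no predecessor), form the unique array taken from the sorted array.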

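Approach:

Import the numpy library and create a NumPy array.

Pass the transpose of the given array to lexsort() as the sort keys, so that each column acts as one key.

Sort the given array using the indices returned by lexsort().

Pass the sorted array to numpy's diff() function with axis=0 to compute the differences between consecutive rows.

Use any() along axis 1 to find the rows whose difference is nonzero.

Keep the first row plus every row with a nonzero difference; these rows form the unique array.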
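The steps above can be sketched as follows, assuming a 2D numeric array. The function name unique_rows_lexsort is hypothetical, and the empty-array guard is our addition.

```python
import numpy as np


def unique_rows_lexsort(data):
    """Hypothetical helper: remove duplicate rows with lexsort() and diff()."""
    if data.shape[0] == 0:
        return data                               # nothing to de-duplicate
    sorted_idx = np.lexsort(data.T)               # each column is one sort key
    sorted_data = data[sorted_idx]                # duplicate rows are now adjacent
    # True where a row differs from the row before it in at least one column
    changed = np.any(np.diff(sorted_data, axis=0) != 0, axis=1)
    keep = np.concatenate(([True], changed))      # the first row has no predecessor
    return sorted_data[keep]


data = np.array([[1, 2, 3],
                 [3, 2, 1],
                 [7, 8, 9],
                 [9, 8, 9],
                 [7, 8, 9]])

print(unique_rows_lexsort(data))
```

With data.T as the keys, the last column is the primary key. The output is therefore ordered by that column ([[3 2 1] [1 2 3] [7 8 9] [9 8 9]] here) rather than matching the order of np.unique(data, axis=0). Only the adjacency of duplicates matters for de-duplication, so either key order works.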