代码收藏家 (Code Collector) Tech Tutorials, 2025-01-10

The loc and iloc Indexers in Python

A guide to loc and iloc for pandas Series and DataFrames

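When you work with large datasets in Python, efficient indexing and slicing matter. pandas provides two powerful data structures for this, Series and DataFrame, and the two most commonly used ways to index them are loc and iloc. Both are indexer attributes used with square brackets (df.loc[...], df.iloc[...]) rather than methods called with parentheses. This article uses DataFrame examples throughout to show how each one works and how they differ.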

`loc[]`

loc is primarily label-based: it selects data by row and column labels (index names), that is, by the explicit index.
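To see that loc works on labels rather than positions, here is a small sketch using a hypothetical DataFrame `df_int` whose integer labels (10, 20, 30) deliberately do not match the row positions (0, 1, 2):

```python
import pandas as pd

# Hypothetical example data: integer labels that differ from row positions
df_int = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 20, 30])

# loc looks up the label 20, which happens to be the second row
row_by_label = df_int.loc[20]
print(row_by_label['A'])  # 2

# 0 is a valid position but not a label, so loc raises KeyError
try:
    df_int.loc[0]
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)  # True
```

Because 0 is not one of the labels, loc cannot find it even though a row exists at position 0; selecting by position is the job of iloc.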

Selecting rows and columns by label:

import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Select a row by label
print(df.loc['a'])

# Output: A    1
#         B    4
# Name: a, dtype: int64

# Select a column by label
print(df.loc[:, 'A'])

# Output: a    1
#         b    2
#         c    3
# Name: A, dtype: int64
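Passing one row label and one column label to loc returns a single value, and a single label can also be combined with a label slice. A minimal sketch on the same three-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Row label + column label -> a single scalar value
value = df.loc['b', 'B']
print(value)  # 5

# Row label + column slice -> that row (restricted to the slice) as a Series
row_part = df.loc['c', 'A':'B']
print(row_part.tolist())  # [3, 6]
```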

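Selecting a range of rows and columns by label (unlike ordinary Python slicing, a label slice includes both endpoints):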

print(df.loc['a':'b', 'A':'B'])  

# Output:    A  B
#         a  1  4
#         b  2  5
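The inclusive end point is the most common trap when switching between the two indexers. This sketch, on the same example data, contrasts a label slice with a position slice via iloc (covered in the next section):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Label slice: the end label 'b' IS included
by_label = df.loc['a':'b']
print(by_label.index.tolist())  # ['a', 'b']

# Position slice: the end position 1 is NOT included
by_position = df.iloc[0:1]
print(by_position.index.tolist())  # ['a']
```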

Fancy indexing and masking:

Fancy indexing: selecting multiple rows or columns

# Select the first and second columns (by column name)
print(df.loc[:, ['A', 'B']])

# Output:    A  B
#         a  1  4
#         b  2  5
#         c  3  6
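Lists of labels can be given for both axes at once, and the result follows the order of the lists rather than the original order. A short sketch, again on the three-row example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Label lists for rows and columns; output order follows the lists
subset = df.loc[['c', 'a'], ['B', 'A']]
print(subset)
#    B  A
# c  6  3
# a  4  1
```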

Masking: selecting rows that meet a condition

# Select rows where column 'A' is greater than 2
print(df.loc[df['A'] > 2])

# Output:    A  B
#         c  3  6
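Several conditions can be combined into one mask with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `>` or `<` in Python. A sketch on the three-row example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Both conditions must hold: A >= 2 AND B < 6
mask = (df['A'] >= 2) & (df['B'] < 6)
print(df.loc[mask])
#    A  B
# b  2  5
```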

Combining fancy indexing and masking (this example uses a new four-row DataFrame):

import pandas as pd

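# Create the DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

# Masking condition: a boolean Series aligned on the row labels
mask = df['A'] > 2

# Fancy indexing: select the second and third columns (by column name)
selected_columns = ['B', 'C']

# loc accepts the boolean mask directly, so no conversion is needed
# (df.index[mask] would select the same rows as a list of labels)
result = df.loc[mask, selected_columns]

print(result)

# Output:    B   C
#         2  7  11
#         3  8  12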
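Converting the mask to labels with `df.index[mask]` is optional: loc also accepts the boolean Series itself. The sketch below, using the same four-row data, checks that both forms select the same rows and columns:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})
mask = df['A'] > 2

# Passing the boolean Series directly...
direct = df.loc[mask, ['B', 'C']]
# ...versus first turning it into a list of row labels
via_labels = df.loc[df.index[mask], ['B', 'C']]

print(direct.equals(via_labels))  # True
```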

`iloc[]`

iloc is primarily integer-position-based: it selects data by position (the implicit index, 0 to n-1), regardless of the labels.
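Because iloc only looks at positions, its result changes when rows are reordered, while loc keeps finding the same labelled row. A sketch with a hypothetical DataFrame whose rows are sorted by column 'A':

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'A': [3, 1, 2]}, index=['x', 'y', 'z'])

# Sorting changes the row order but each row keeps its label
df_sorted = df.sort_values('A')
print(df_sorted.index.tolist())  # ['y', 'z', 'x']

# iloc[0] is whichever row is now physically first
print(df_sorted.iloc[0].name)    # y
# loc['x'] still finds the row labelled 'x'
print(df_sorted.loc['x', 'A'])   # 3
# Negative positions count from the end, as with Python lists
print(df_sorted.iloc[-1].name)   # x
```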

Selecting rows and columns by position:

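# Re-create the three-row DataFrame from the loc section
# (the combined example above reassigned df)
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Select the first row
print(df.iloc[0])

# Output: A    1
#         B    4
# Name: a, dtype: int64

# Select the second column
print(df.iloc[:, 1])

# Output: a    4
#         b    5
#         c    6
# Name: B, dtype: int64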

Selecting a range of rows or columns by position (as in ordinary Python slicing, the end position is excluded):

print(df.iloc[0:2, 0:2])  

# Output:    A  B
#         a  1  4
#         b  2  5
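Two details of position slicing, sketched on the three-row example DataFrame: the end position is excluded, and a slice that runs past the last row is cut short instead of raising an error.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Positions 0 and 1 only; position 2 is excluded
print(len(df.iloc[0:2]))    # 2

# Out-of-range slice bounds are truncated, not an error
tail = df.iloc[1:100]
print(tail.index.tolist())  # ['b', 'c']
```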

Fancy indexing and masking:

Fancy indexing: selecting multiple rows or columns

# Select the first and third rows
print(df.iloc[[0, 2]])

# Output:    A  B
#         a  1  4
#         c  3  6

# Select the first and second columns
print(df.iloc[:, [0, 1]])
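Whether a position is passed as a single integer or as a list changes the shape of the result: a scalar drops that axis, while a list (even a one-element list) keeps it. A minimal sketch on the three-row example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Scalar column position -> the column axis is dropped, giving a Series
col_as_series = df.iloc[[0, 2], 1]
# One-element list -> the column axis is kept, giving a DataFrame
col_as_frame = df.iloc[[0, 2], [1]]

print(type(col_as_series).__name__)  # Series
print(type(col_as_frame).__name__)   # DataFrame
```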

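Masking: selecting rows that meet a condition. iloc does not accept a boolean Series (it is tied to labels), so convert the mask to a plain NumPy array first.

# Select rows where column 'A' is greater than 2
print(df.iloc[(df['A'] > 2).to_numpy()])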
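The sketch below shows the difference on the three-row example DataFrame: the mask converted with `to_numpy()` works with iloc, while the boolean Series itself is rejected. Depending on the index type, pandas raises either ValueError or NotImplementedError in this case, so the sketch catches both.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
mask = df['A'] > 2  # boolean Series carrying the row labels

# A plain boolean array (no labels attached) works with iloc
rows = df.iloc[mask.to_numpy()]
print(rows.index.tolist())  # ['c']

# The boolean Series itself is rejected by iloc
try:
    df.iloc[mask]
    series_rejected = False
except (ValueError, NotImplementedError):
    series_rejected = True
print(series_rejected)  # True
```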

Combining fancy indexing and masking:

import pandas as pd
import numpy as np

# Create the DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

# Masking condition
mask = df['A'] > 2

# Convert the mask to the integer positions that iloc accepts
mask_indices = np.where(mask)[0]

# Fancy indexing: positions of the second and third columns
selected_columns_indices = [1, 2]

# iloc combining masking and fancy indexing
result = df.iloc[mask_indices, selected_columns_indices]

print(result)

# Output:    B   C
#         2  7  11
#         3  8  12
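When the rows come from a mask but the columns are known by name, `Index.get_indexer` can translate the column names into positions so the whole selection can go through iloc. A sketch on a hypothetical copy of the four-row data with string row labels 'w' to 'z':

```python
import numpy as np
import pandas as pd

# Hypothetical example data: same values, string row labels
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}, index=['w', 'x', 'y', 'z'])

# Row positions from the mask; column positions looked up from names
row_pos = np.where(df['A'] > 2)[0]
col_pos = df.columns.get_indexer(['B', 'C'])

result = df.iloc[row_pos, col_pos]
print(result)
#    B   C
# y  7  11
# z  8  12
```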

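Differences between loc and iloc:

Feature	`loc[]`	`iloc[]`
Index type	Label-based	Integer position-based
Slice end point	Inclusive (end label included)	Exclusive (end position not included)
Boolean masks	Accepts a boolean Series or array	Accepts a boolean array only, not a Series
Missing label / position	Raises KeyError	Out-of-range position raises IndexError (out-of-range slices are truncated)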
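The last row of the table can be checked directly. This sketch uses the three-row example data; the helper `raises` is hypothetical, written only for this demonstration. It tries missing labels with loc and an out-of-range position with iloc:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

def raises(func, exc):
    """Hypothetical helper: return True if calling func() raises exc."""
    try:
        func()
    except exc:
        return True
    return False

# loc: a missing label (alone or inside a list) -> KeyError
print(raises(lambda: df.loc['z'], KeyError))         # True
print(raises(lambda: df.loc[['a', 'z']], KeyError))  # True
# iloc: a single position past the end -> IndexError
print(raises(lambda: df.iloc[10], IndexError))       # True
# iloc: an out-of-range slice is truncated instead of raising
print(len(df.iloc[2:10]))                            # 1
```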