Code Collector Technical Tutorials 2025-02-12

Python + Playwright Automated Testing (2): Locating Elements and Retrieving Page Content

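Selectors

CSS

Text selectors

XPath selectors

Combined selectors

The locator() function

Built-in locators

Combining locators

Filtering with filter

Using regular expressions

Selecting elements: nth, first, last

The all() method

Retrieving page content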

Official documentation:

https://playwright.dev/python/docs/next/api/class-locator#methods
https://playwright.dev/python/docs/locators#lists

Like Selenium, Playwright can locate elements with CSS selectors, XPath, and several other strategies.

Selectors

CSS

'button.submit-button'

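Chained selectors are supported: page.locator('.van-popove >> .icon')

>: the CSS child combinator; it matches only direct children of the parent element

>>: Playwright's chaining operator; the selector on the right is resolved relative to each match of the selector on the left, so it matches descendants at any depth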

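The >> operator behaves like nesting one locator() call inside another. A minimal sketch of that equivalence, assuming page is a sync-API Page that is already open on the target site; chain_selectors and icons_in_popover are hypothetical helper names, not part of Playwright:

```python
def chain_selectors(*parts: str) -> str:
    """Join selector fragments with Playwright's >> chaining operator.

    Hypothetical helper for illustration only.
    """
    if not parts:
        raise ValueError("at least one selector is required")
    return " >> ".join(part.strip() for part in parts)


def icons_in_popover(page):
    """Build the same query in two spellings: chained string and nested calls."""
    chained = page.locator(chain_selectors(".van-popove", ".icon"))
    nested = page.locator(".van-popove").locator(".icon")
    return chained, nested
```

The nested form is usually easier to read once more than two steps are involved, because each step can be a different kind of locator.
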
Text selectors

'text=登录'

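Quoting the text in a text selector changes how it matches: quoted text is an exact match, while unquoted text is a fuzzy (case-insensitive substring) match. Printing the number of matched elements shows the difference: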

    print(len(page.query_selector_all('text=自动化')))
    print(len(page.query_selector_all('text="自动化"')))
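
The same comparison can be written with locators, which the Playwright documentation recommends over query_selector_all. This is only a sketch: text_selector and compare_counts are hypothetical helper names, page is assumed to be a sync-API Page already open on the page under test, and the text is assumed to contain no double quotes.

```python
def text_selector(text: str, exact: bool = False) -> str:
    """Build a text selector (hypothetical helper).

    Quoted text requests an exact match; unquoted text requests a
    case-insensitive substring match.
    """
    if exact:
        return f'text="{text}"'
    return f"text={text}"


def compare_counts(page, text: str) -> tuple[int, int]:
    """Return (fuzzy_count, exact_count) for `text` on an open page."""
    fuzzy = page.locator(text_selector(text)).count()
    exact = page.locator(text_selector(text, exact=True)).count()
    return fuzzy, exact
```

On a typical page the fuzzy count is greater than or equal to the exact count, since every exact match also contains the text.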

XPath selectors

'xpath=//button[@type="submit"]'

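When an XPath expression matches several elements, a position index (starting at 1) can pick one of them. Note that 'xpath=//div/a[@href][8]' selects the 8th matching a element within each parent div; to take the 8th element of the whole result set, wrap the expression in parentheses: 'xpath=(//div/a[@href])[8]'.

XPath also supports locating elements by fuzzy text matching (for example with contains()); see https://blog.csdn.net/JBY2020/article/details/120398923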
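
In XPath, a position predicate such as [8] is evaluated per parent element, so indexing the whole result set requires parentheses around the expression. A small sketch of that rule; nth_match is a hypothetical helper name, not a Playwright function:

```python
def nth_match(xpath: str, n: int) -> str:
    """Return a selector for the n-th node (1-based) of the whole result set.

    Hypothetical helper: wraps the expression in parentheses so the
    position applies to all matches instead of to each parent separately.
    """
    if n < 1:
        raise ValueError("XPath positions start at 1")
    return f"xpath=({xpath})[{n}]"
```

For example, page.locator(nth_match('//div/a[@href]', 8)) targets the 8th link overall. The locator API offers the same thing with a zero-based index: page.locator('xpath=//div/a[@href]').nth(7).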

Combined selectors

'div[role="button"]:has-text("登录")'

The locator() function

page.locator('xpath=//div[@id="searchTag"]')

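Built-in locators

page.get_by_role(): locates elements by explicit and implicit accessibility attributes, that is, by ARIA role (buttons, checkboxes, headings, links, lists, tables, and so on), for example page.get_by_role("button", name="Sign in")

page.get_by_text(): locates by text content; the default is a case-insensitive substring match, and exact=True requires a case-sensitive match of the whole text
page.get_by_label(): locates a form control by the text of its associated label
page.get_by_placeholder(): locates an input by its placeholder attribute
page.get_by_alt_text(): locates an element (usually an image) by its alt attribute
page.get_by_title(): locates an element by its title attribute
page.get_by_test_id(): locates an element by its test ID, which is the data-testid attribute by default (a different attribute can be configured)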
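
A short sketch of how these locators read in practice. The form itself is hypothetical (a "Username" label, a "Password" placeholder, and a "Sign in" button are assumptions, as are the helper names); playwright is assumed to be the object yielded by sync_playwright(), and page a sync-API Page.

```python
def configure_test_id(playwright, attribute: str = "data-test-id") -> None:
    """Make get_by_test_id() read `attribute` instead of the default data-testid."""
    playwright.selectors.set_test_id_attribute(attribute)


def submit_login(page, username: str, password: str) -> None:
    """Fill a hypothetical login form using only built-in locators."""
    page.get_by_label("Username").fill(username)        # input tied to a <label>
    page.get_by_placeholder("Password").fill(password)  # input with placeholder text
    page.get_by_role("button", name="Sign in").click()  # button by accessible name
```

Because these locators describe what the user sees rather than the DOM structure, they tend to survive markup changes better than long CSS or XPath expressions.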

Combining locators

    t = page.get_by_text('自动化测试')
    b = page.locator('xpath=//div[@id="searchTag"]')
    b.locator(t)

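Using or_ and and_

    # and_ matches only elements that satisfy both locators
    page.locator('xpath=//div[@id="searchTag"]').and_(page.get_by_text('测试'))
    # Use or_ to locate one of two or more elements when it is not known in advance which one will be present
    page.locator('xpath=//div[@id="searchTag"]').or_(page.get_by_text('测试'))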
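
One detail worth sketching: when both alternatives of an or_ are on the page, the combined locator matches both, so an action on it can fail in strict mode unless a single element is picked. The helper names below are hypothetical, and page is assumed to be a sync-API Page.

```python
def either(page, primary: str, fallback: str):
    """Locate whichever of two selectors is present (hypothetical helper).

    .first keeps the result to a single element even if both match.
    """
    return page.locator(primary).or_(page.locator(fallback)).first


def titled_button(page, title: str):
    """Locate elements that are a button AND carry the given title attribute."""
    return page.get_by_role("button").and_(page.get_by_title(title))
```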

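Filtering with filter

These options are accepted by locator() as well as by filter():

has_text: the element contains the given text
has_not_text: the element does not contain the given text
has: the element contains a descendant matching the given locator
has_not: the element does not contain a descendant matching the given locator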

    # has and has_not take Locator objects, not attribute values
    page.locator('xpath=//div[@id="searchTag"]').filter(has_text='py', has_not_text='ja',
                                                        has_not=page.locator('xpath=//*[@class]'),
                                                        has=page.get_by_text('test'))
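
Filters can also be applied one after another, which keeps each condition readable. A sketch under assumptions: matching_rows is a hypothetical helper, the row selector and button name are whatever the page under test uses, and page is a sync-API Page.

```python
def matching_rows(page, row_selector: str, text: str, button_name: str,
                  excluded_text=None):
    """Narrow a list of rows step by step (hypothetical helper)."""
    rows = page.locator(row_selector).filter(
        has_text=text,                                     # row must contain this text
        has=page.get_by_role("button", name=button_name),  # and a button with this name
    )
    if excluded_text is not None:
        # A second filter() call narrows the previous result further.
        rows = rows.filter(has_not_text=excluded_text)
    return rows
```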

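The built-in get_by_XXX locators, filter, locator, and so on can all be chained to narrow the search: each call in the chain searches within the elements matched by the calls before it.

Using regular expressions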

Text arguments such as name accept a compiled regular expression (import re first):

page.get_by_role("button", name=re.compile(r"[1-9]"))
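
As a sketch of regex matching with a built-in locator: the pattern below matches any name containing a digit from 1 to 9. NON_ZERO_DIGIT and numbered_buttons are hypothetical names, and page is assumed to be a sync-API Page.

```python
import re

# Matches any text that contains at least one digit from 1 to 9.
NON_ZERO_DIGIT = re.compile(r"[1-9]")


def numbered_buttons(page):
    """Locate buttons whose accessible name contains a non-zero digit."""
    return page.get_by_role("button", name=NON_ZERO_DIGIT)
```

Flags work as usual, for example re.compile(r"submit", re.IGNORECASE) for a case-insensitive match.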

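Selecting elements: nth, first, last

An expression often matches several elements; when only one of them is needed, use the corresponding method or property:

nth(index) returns the element at the given zero-based index, so nth(2) is the third match: page.locator('xpath=//div/a[@href]').nth(2)

first returns the first match: page.locator('xpath=//div/a[@href]').first

last returns the last match: page.locator('xpath=//div/a[@href]').last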
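
Because nth() counts from zero while people usually count from one, a thin wrapper can make test code less error-prone. This is only an illustration; pick is a hypothetical helper, and locator is assumed to be a Playwright Locator.

```python
def pick(locator, position: int):
    """Return the match at a 1-based position; 1 is the first match."""
    if position < 1:
        raise ValueError("position is 1-based")
    if position == 1:
        return locator.first
    return locator.nth(position - 1)  # nth() itself is zero-based
```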

The all() method

all() returns every matched element as a list that can be iterated over

    # Number of matched elements
    print(len(page.locator('xpath=//div/a[@href]').all()))
    print(page.locator('xpath=//div/a[@href]').count())

    # Iterate over the matched elements
    for h in page.locator('xpath=//div/a[@href]').all():
        print(h.inner_text())
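
One caveat when iterating: all() returns whatever is in the page at that moment and does not wait for elements to appear. A minimal sketch that waits for the first match before collecting; link_texts is a hypothetical helper and page is assumed to be a sync-API Page.

```python
def link_texts(page, selector: str) -> list[str]:
    """Collect the visible text of every element matching `selector`."""
    links = page.locator(selector)
    # all() does not wait, so make sure at least one match is present first.
    links.first.wait_for()
    return [link.inner_text().strip() for link in links.all()]
```

For example, link_texts(page, 'xpath=//div/a[@href]') returns one string per link.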

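Retrieving page content

Retrieve the HTML of the whole page or of a single element, and the text of the page or of an element; this is useful for scraping data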

    print(page.content())  # HTML of the whole page
    print(page.title())  # page title
    print(page.locator('xpath=//div[@id="searchTag"]').inner_html())  # full HTML source inside the element
    print(page.locator('xpath=//div[@id="searchTag"]').get_attribute('class'))  # value of an element attribute

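    # Return the text content as a string
    print(page.locator('xpath=//div[@id="searchTag"]').inner_text())  # text of the element; the returned content is formatted
    print(page.locator('xpath=//div[@id="searchTag"]').text_content())  # all text inside the element, including child elements and hidden elements; returned unformatted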

    # Return the text content of the matched elements as a list
    print(page.locator('xpath=//div[@id="searchTag"]').all_inner_texts())  # the returned content is formatted
    print(page.locator('xpath=//div[@id="searchTag"]').all_text_contents())  # returned unformatted
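
To see how inner_text() and text_content() differ, it helps to run them against an element that contains hidden text. The sketch below is one way to try that; text_views and demo are hypothetical names, the inline HTML is made up for the experiment, and demo() assumes Playwright and a Chromium build are installed.

```python
def text_views(locator) -> dict:
    """Collect the different text representations of one element."""
    return {
        "inner_text": locator.inner_text(),      # rendered text
        "text_content": locator.text_content(),  # raw text of all descendants
        "inner_html": locator.inner_html(),      # markup inside the element
    }


def demo() -> None:
    """Print the three views for a small, made-up HTML fragment."""
    # Imported lazily so text_views() stays usable without a browser installed.
    from playwright.sync_api import sync_playwright

    html = '<div id="searchTag">shown <span style="display:none">hidden</span></div>'
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(html)
        print(text_views(page.locator("#searchTag")))
        browser.close()
```

Calling demo() prints the three values side by side, which makes it easy to check which method suits a given scraping task.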

Author: 觅远
