Python 自动化测试工具 Playwright
Github 地址:https://github.com/microsoft/playwright-python
简介
Playwright 是一个微软开源的浏览器自动化端到端测试工具,有了它就可以方便地对网页应用程序执行端到端测试。还支持 Linux、macOS、Windows 系统下的 Chromium、Firefox 和 WebKit 浏览器。
安装
python -m pip install playwright
python -m playwright install
将同时安装 Playwright 以及 Chromium、Firefox 和 WebKit 的浏览器驱动依赖。
使用
codegen
Playwright 支持记录用户在浏览器中的操作并且自动生成代码。
python -m playwright codegen
Playwright 支持同步 API 以及异步 API,它们在功能方面是相同的,只是在使用API的方式上不同。
同步 API demo
from playwright import sync_playwright
with sync_playwright() as p:
for browser_type in [p.chromium, p.firefox, p.webkit]:
browser = browser_type.launch()
page = browser.newPage()
page.goto('http://whatsmyuseragent.org/')
page.screenshot(path=f'example-{browser_type.name}.png')
browser.close()
异步 API demo
import asyncio
from playwright import async_playwright
async def main():
async with async_playwright() as p:
for browser_type in [p.chromium, p.firefox, p.webkit]:
browser = await browser_type.launch()
page = await browser.newPage()
await page.goto('http://whatsmyuseragent.org/')
await page.screenshot(path=f'example-{browser_type.name}.png')
await browser.close()
asyncio.get_event_loop().run_until_complete(main())
其他
Playwright 还支持 pytest 测试、交互模式运行、执行 JS 代码、记录网络请求等功能。
体验
安装
安装浏览器依赖的时候很慢,即使已经 FQ
codegen
先来试试自动生成代码的功能
C:\Users\Hill\Desktop>python -m playwright codegen --help
Usage: index codegen [options] [url]
open page and generate code for user actions
Options:
-o, --output <file name> saves the generated script to a file
--target <language> language to use, one of javascript, python, python-async, csharp
(default: "python")
-h, --help display help for command
Examples:
$ codegen
$ codegen --target=python
$ -b webkit codegen https://example.com
-o
指定输出文件,--target
指定输出语言,-b
指定使用的浏览器类型。
就用使用百度搜索Python
来试一下
python -m playwright codegen -o codegen_test.py https://www.baidu.com
执行完后,会自动打开一个浏览器,跳转到百度,输入 Python 后回车,然后再关闭浏览器,运行目录下就出现了codegen_test.py
,文件内容如下:
from playwright import sync_playwright
def run(playwright):
browser = playwright.chromium.launch(headless=False)
context = browser.newContext()
# Open new page
page = context.newPage()
# Go to https://www.baidu.com/
page.goto("https://www.baidu.com/")
# Click input[name="wd"]
page.click("input[name=\"wd\"]")
# Press CapsLock
page.press("input[name=\"wd\"]", "CapsLock")
# Fill input[name="wd"]
page.fill("input[name=\"wd\"]", "P")
# Press CapsLock
page.press("input[name=\"wd\"]", "CapsLock")
# Fill input[name="wd"]
page.fill("input[name=\"wd\"]", "Python")
# Press Enter
page.press("input[name=\"wd\"]", "Enter")
# assert page.url() == "https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=Python&fenlei=256&rsv_pq=e0a2bd93007a18fd&rsv_t=af15WBQFDyMcGk%2B3Xs%2BRwDVBT40f2Z8itvMfT5sBiqICLeozAk0f0ou7Pao&rqlang=cn&rsv_enter=1&rsv_dl=tb&rsv_sug3=7&rsv_sug1=6&rsv_sug7=100&rsv_sug2=0&rsv_btype=i&prefixsug=Python&rsp=5&inputT=2425&rsv_sug4=3717"
# Close page
page.close()
# ---------------------
context.close()
browser.close()
with sync_playwright() as playwright:
run(playwright)
重新执行该 Python 文件,将重复一遍刚刚的操作,自动启动浏览器,访问百度并搜索关键词Python
,由于停留时间太短,所以要在page.close()
前面加一个等待时间观察效果
同步和异步 API 测试
参照官方 demo 编写了同步和异步 API 测试脚本:
import asyncio
from playwright import sync_playwright, async_playwright
def sync_demo():
with sync_playwright() as p:
for browser_type in [p.chromium, p.firefox, p.webkit]:
browser = browser_type.launch()
page = browser.newPage()
page.goto('http://www.baidu.com')
page.screenshot(path=f'sync_example-{browser_type.name}.png')
browser.close()
async def async_demo():
async with async_playwright() as p:
for browser_type in [p.chromium, p.firefox, p.webkit]:
browser = await browser_type.launch()
page = await browser.newPage()
await page.goto('http://www.baidu.com')
await page.screenshot(path=f'async_example-{browser_type.name}.png')
await browser.close()
sync_demo()
asyncio.get_event_loop().run_until_complete(async_demo())
运行时,依次打开三种不同的浏览器,访问百度首页并截图,并且由于默认使用的是 headless 模式,所以不会显示浏览器页面
headless 模式可以使用browser = browser_type.launch(headless=Flase)
禁用
记录网页请求
from playwright import sync_playwright
with sync_playwright() as p:
browser = p.chromium.launch()
page = browser.newPage()
def log_and_continue_request(route, request):
print(request.url)
route.continue_()
# Log and continue all network requests
page.route('**', lambda route, request: log_and_continue_request(route, request))
page.goto('http://www.baidu.com')
browser.close()
执行后,会打印访问百度首页时,浏览器发送的所有请求:
http://www.baidu.com/
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/topnav/baiduyun@2x-e0be79e69e.png
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/topnav/zhidao@2x-e9b427ecc4.png
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/topnav/baike@2x-1fe3db7fa6.png
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/topnav/tupian@2x-482fc011fc.png
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/topnav/baobaozhidao@2x-af409f9dbe.png
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/topnav/wenku@2x-f3aba893c1.png
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/topnav/jingyan@2x-e53eac48cb.png
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/topnav/yinyue@2x-c18adacacb.png
https://www.baidu.com/img/gjrdong_5dbd765f1fd82a0771cbc88aec7341c8.gif
https://www.baidu.com/img/flexible/logo/pc/result.png
https://www.baidu.com/img/flexible/logo/pc/result@2.png
https://www.baidu.com/img/flexible/logo/pc/peak-result.png
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/qrcode/qrcode@2x-daf987ad02.png
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/qrcode/qrcode-hover@2x-f9b106a848.png
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/lib/jquery-1-edb203c114.10.2.js
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/lib/esl-ef22c5ed31.js
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/sbase-0948aa26f1.js
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/s_super_index-855fcfd82e.js
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/min_super-ce30feac9c.js
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/components/hotsearch-8f112f3361.js
https://hectorstatic.baidu.com/cd37ed75a9387c5b.js
https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/bundles/polyfill_9354efa.js
https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/global/js/all_async_search_dafef7e.js
https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/plugins/every_cookie_4644b13.js
https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/plugins/bzPopper_7c5ff52.js
https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/home/js/nu_instant_search_f7b49e5.js
https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/plugins/swfobject_0178953.js
https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/soutu/js/tu_68114f1.js
https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/amd_modules/@baidu/search-sug_5b9188b.js
https://sp1.baidu.com/-L-Xsjip0QIZ8tyhnq/v.gif?logactid=1234567890&showTab=10000&opType=showpv&mod=superman%3Alib&submod=index&superver=supernewplus&glogid=2905514690&type=2011&pid=315&isLogin=0&version=PCHome&terminal=PC&qid=2905514810&sid=1460_33224_33058_31254_32971_33098_33101_26350_33198_33265&super_frm=&from_login=&from_reg=&query=&curcard=2&curcardtab=&_r=0.5174681533375083
https://sp2.baidu.com/-L-Ysjip0QIZ8tyhnq/v.gif?mod=superman%3Acomponents&submod=hotsearch&utype=undefined&superver=supernewplus&portrait=undefined&glogid=2905514690&type=2011&pid=315&isLogin=0&version=PCHome&terminal=PC&qid=2905514810&sid=1460_33224_33058_31254_32971_33098_33101_26350_33198_33265&super_frm=&from_login=&from_reg=&query=&curcard=2&curcardtab=&_r=0.3818542391730819&m=superman%3Acomponents_hotsearchShow&showType=hotword&words=%5B%2231%E7%9C%81%E5%8C%BA%E5%B8%82%E6%96%B0%E5%A2%9E24%E4%BE%8B%E7%A1%AE%E8%AF%8A%3A%E6%9C%AC%E5%9C%9F5%E4%BE%8B%22%2C%22%E9%9F%A9%E6%B0%91%E9%97%B4%E4%BA%BA%E5%A3%AB%E8%B5%B7%E8%AF%89%E4%B8%AD%E9%9F%A9%E6%94%BF%E5%BA%9C%E8%B4%A5%E8%AF%89%22%2C%22%E7%B4%A0%E5%AA%9B%E6%A1%88%E7%BD%AA%E7%8A%AF%E5%88%B0%E5%AE%B6%E7%94%BB%E9%9D%A2%3A%E8%AD%A6%E5%AF%9F%E5%A0%B5%E9%97%A8%E4%BF%9D%E6%8A%A4%22%2C%22%E5%AE%98%E6%96%B9%E9%80%9A%E6%8A%A5%E6%95%99%E7%BB%83%E7%BD%9A%E5%AD%A6%E7%94%9F%E7%94%A8%E4%BE%BF%E6%B1%A0%E6%B0%B4%E6%B4%97%E8%84%B8%22%2C%22%E6%8B%9C%E7%99%BB%3A39%E5%A4%A9%E5%90%8E%E7%BE%8E%E5%B0%86%E9%87%8D%E6%96%B0%E5%8A%A0%E5%85%A5%E5%B7%B4%E9%BB%8E%E5%8D%8F%E5%AE%9A%22%2C%22%E9%BB%91%E9%BE%99%E6%B1%9F%E6%96%B0%E5%A2%9E4%E4%BE%8B%E6%9C%AC%E5%9C%9F%E7%97%85%E4%BE%8B%20%E5%90%AB2%E5%B9%BC%E7%AB%A5%22%5D&pagenum=0
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/font/iconfont-8db5f471f4.woff2
https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/soutu/css/soutu_new2_ae491b7.css
https://www.baidu.com/sugrec?prod=pc_his&from=pc_web&json=1&sid=1463_33244_33061_32971_33099_33101_32846_33199_33240&hisdata=&_t=1607828626043&req=2&csor=0
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/searchbox/nicon-10750f3f7d.png
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/components/tips-e2ceadd14d.js
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/super_load-a97cbd2188.js
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/components/qrcode-da919182da.js
https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/components/guide-13c605998f.js
修改前端响应
利用 route 方法,我们可以实现一些网络劫持和修改操作,比如修改 request 的属性,修改 response 响应结果等。
举例:
from playwright.sync_api import sync_playwright
import re
with sync_playwright() as p:
browser = p.chromium.launch(headless=False)
page = browser.new_page()
def cancel_request(route, request):
route.abort()
page.route(re.compile(r"(\.png)|(\.jpg)"), cancel_request)
page.goto("https://spa6.scrape.center/")
page.wait_for_load_state('networkidle')
page.screenshot(path='no_picture.png')
browser.close()
这里我们调用了 route 方法,第一个参数通过正则表达式传入了匹配的 URL 路径,这里代表的是任何包含 .png
或 .jpg
的链接,遇到这样的请求,会回调 cancel_request 方法处理,cancel_request 方法可以接收两个参数,一个是 route,代表一个 CallableRoute 对象,另外一个是 request,代表 Request 对象。这里我们直接调用了 route 的 abort 方法,取消了这次请求,所以最终导致的结果就是图片的加载全部取消了。
这个设置有什么用呢?其实是有用的,因为图片资源都是二进制文件,而我们在做爬取过程中可能并不想关心其具体的二进制文件的内容,可能只关心图片的 URL 是什么,所以在浏览器中是否把图片加载出来就不重要了。所以如此设置之后,我们可以提高整个页面的加载速度,提高爬取效率。
另外,利用这个功能,我们还可以将一些响应内容进行修改,比如直接修改 Response 的结果为自定义的文本文件内容,或是构造某些特定的响应返回给前端,用于测试前端在特定响应值情况下的表现。
总结
个人体验是 Playwright 相比于 Selenium 或是 Pyppeteer,安装和使用都更加简便,不需要额外的 webdriver 或是特定版本的浏览器。
最大的优点可能是 codegen 功能,能够大大缩短浏览器自动化代码编写的时间,并且其语法也相对简单,另外,记录所有页面请求、网络劫持修改等功能也非常实用。
缺点是离线环境下的部署安装可能比较麻烦,仍待研究。