Version 0.5.0
EXEC and EVAL usage tutorial: https://github.com/NaiboWang/EasySpider/wiki/EXEC%E5%92%8CEVAL%E7%94%A8%E6%B3%95%E7%A4%BA%E4%BE%8B
This release is available only for Windows (x64 and x32) and for macOS on Apple Silicon. You are welcome to try it and to report bugs promptly by opening an Issue. For other operating systems, please continue to use version 0.3.5 for now.
If the download is slow, consider using the download address for mainland China.
On macOS, run the following command to change the package attributes and resolve the "package is damaged" error:
xattr -cr YourPathToEasySpider.app
For example:
xattr -cr /Users/YourUserName/Downloads/EasySpider_MacOS_all_arch/EasySpider.app
Then try opening the app again.
The Windows x64 version supports 64-bit Windows 10 and above. The Windows x32 version supports Windows 7 and above in both 32-bit and 64-bit editions, so users of 64-bit Windows 7 should also download the x32 version. Note that the Chrome browser bundled with the x32 version of EasySpider is fixed at version 109 and does not follow Chrome updates, in order to stay compatible with Windows 7. To collect data with the latest Chrome, run the x64 version of the software on 64-bit Windows 10 or above.
For the macOS version, unzip the download with the system's built-in Archive Utility. The macOS version supports all chipsets, including Intel chips (such as Core i7) and Apple's own chips (such as M1 and M2), so make sure you download the build that matches your machine. The minimum supported operating system version is 11.1. On older systems, either use the macOS build of v0.2.0 or download the code and compile it yourself. An example build procedure is described in this Issue.
Similarly, the Linux version supports only Ubuntu 20.04 and above, Deepin, Debian, and their derivatives. To collect data on other Linux distributions, please download the code and compile it yourself.
Release Notes
- Major update: custom operations can now run Python code directly in the current execution environment to define custom variables and read their values. Loops and conditions also recognize custom variables and expressions. Two options are provided:
  - Run Python code on current environment (exec operation). This advanced option lets Python code drive the running browser directly, and lets you define, modify, and reassign variables anywhere in the execution environment. Examples:
    - `self.browser` refers to the browser currently being operated, and the Selenium API can be called on it directly. For instance, `self.browser.find_element(By.CSS_SELECTOR, "body").send_keys(Keys.END)` scrolls to the bottom of the page.
    - Define a global variable: `self.myVar = 1`
    - Modify the global variable defined above: `self.myVar = self.myVar + 1`
    - Print the global variable defined above: `print(self.myVar)`

    To record a custom variable as a field, use the eval option described next.
  - Retrieve Python Expression Value in Execution Environment (eval operation). This advanced option returns the value of a Python expression. Elsewhere in the task, `Field["operation name"]` refers to the value returned by this operation. Examples:
    - Return a value from the current browser object. `self.browser` refers to the browser currently being operated, and the Selenium API can be called on it directly. For instance, `self.browser.find_element(By.CSS_SELECTOR, "body").text` returns the text of the current page.
    - Return the value of a custom global variable: `self.myVar`
    - Return the value of a comparison: `self.myVar == 1`. The result can be used in conditions and loops.

    Note: this option cannot assign to variables, so an expression such as `self.myVar = 1` is not allowed. To perform an assignment, use the exec option above.
- Text from a single loop text list can be typed into multiple input boxes, provided the index values are matched correctly.
- At execution time, an Excel file can be specified as the input source, and its path can be set. For multiple fields in one loop text list, the Excel columns with matching names are read and merged automatically.
- Click-element and move-to-element operations inside a loop can now use an XPath relative to the loop. This feature is not compatible with task files from earlier versions: those files must be corrected by hand, setting the XPath of every click-element operation inside a loop to empty before they will work. Designing tasks directly in the new version is therefore recommended.
- Major UI update: operations can be added, the workflow modified, and anchors adjusted by dragging. That is, adding operations, cutting elements, and adjusting anchors can all be done by drag and drop. Right-click deletes an element, and double-clicking an arrow adjusts the anchor directly.
- Added a close button in the bottom right corner of the browser console, for cases where the console covers a captcha box or login box.
- Before recording fields, you can choose whether to clear the values of fields not defined by the current operation.
- Added the ability to skip the current loop iteration, i.e., `Continue`.
- In any XPath, `Field["field value"]` can be used and is replaced with the variable's value.
- For data extraction operations, a task can resume from the last saved position when it is re-executed (configured when saving the task), so an unexpected exit no longer forces a run from the beginning.
- OCR now uses `ddddocr`, which removes the manual environment setup and improves recognition accuracy.
- Fixed a bug where an extra row was written during data extraction when data was set not to be saved.
- An operation can be set to wait for a given element to appear before it executes.
- Attribute values of elements can be extracted.
- Added copyright and usage agreement statements.
- All versions now support "scroll down continuously until the page content no longer changes". The exit conditions for a loop that clicks the next-page button are now "next-page button not found" and "no change in page content detected".
- Improved the log format.
- Added the ability to save output as `JSON` files.
- Updated Chrome to version 115.
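
The difference between the exec and eval operations, and the way `Field["operation name"]` carries an eval result into an XPath, can be illustrated with a small standalone sketch. This is not EasySpider's implementation: `ExecutionEnvironment`, `run_exec`, `run_eval`, and `substitute_fields` are hypothetical names, and the sketch assumes the two operations behave like Python's built-in `exec` and `eval` with the environment object bound to `self`. No browser is involved, so `self.browser` is left out.

```python
import re


class ExecutionEnvironment:
    """Hypothetical stand-in for the object that task code sees as `self`."""

    def __init__(self):
        # Values that other operations could address as Field["operation name"].
        self.fields = {}

    def run_exec(self, code):
        """Run statements (assignments, print calls) with `self` bound to this object."""
        exec(code, {"self": self})

    def run_eval(self, operation_name, expression):
        """Evaluate one expression and store the result under the operation's name."""
        value = eval(expression, {"self": self})
        self.fields[operation_name] = value
        return value

    def substitute_fields(self, text):
        """Replace every Field["name"] occurrence in `text` with the stored value."""
        pattern = re.compile(r'Field\["([^"]+)"\]')
        return pattern.sub(lambda match: str(self.fields[match.group(1)]), text)


env = ExecutionEnvironment()

# exec operation: statements are allowed, so variables can be defined and changed.
env.run_exec("self.myVar = 1")
env.run_exec("self.myVar = self.myVar + 1")

# eval operation: an expression is evaluated and its value becomes a field.
counter = env.run_eval("counter", "self.myVar")
is_one = env.run_eval("is_one", "self.myVar == 1")

# A field value substituted into an XPath (the XPath itself is made up).
xpath = env.substitute_fields('//ul/li[Field["counter"]]/a')

# An assignment is a statement, not an expression, so eval rejects it.
try:
    env.run_eval("bad", "self.myVar = 1")
    assignment_rejected = False
except SyntaxError:
    assignment_rejected = True
```

Under these assumptions the restriction on the eval option follows from Python itself: `eval` accepts only expressions, so an assignment raises `SyntaxError` and has to go through the exec option.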