Version 0.3.5

@NaiboWang NaiboWang released this 05 Jul 21:50
· 413 commits to master since this release

If the download is slow, consider using the download mirror for users in mainland China.

For an explanation of why the Windows x64 version cannot collect link addresses in some cases, see #128.

The Windows x64 version supports 64-bit Windows 10 and above. The Windows x86 version supports Windows 7 and above, both 32-bit and 64-bit, so users on 64-bit Windows 7 should also download the x86 version. Note that the Chrome browser bundled with the x86 version of EasySpider is fixed at version 109 and does not follow Chrome updates (to stay compatible with Windows 7). If you want to collect data with the latest Chrome, run the x64 version of the software on 64-bit Windows 10 or above.

Unpack the macOS .tar.gz file with the system's built-in Archive Utility. The macOS version supports all chipsets, including Intel, M1, M2, and other processors, but requires macOS 11.1 or later. On older operating system versions, either download the v0.2.0 Mac release or download the code and compile it yourself; an example compilation method can be found in this issue.

Similarly, the Linux version only supports Ubuntu 20.04 and above, Deepin, Debian, and their derivatives. To collect data on other Linux distributions, download the code and compile it yourself.
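The platform rules above can be summarised as a small decision function. This is only a sketch in Python: the function name `choose_build` and the returned labels are hypothetical and are not the actual asset names of this release, and the version tuples are assumed to be marketing versions such as `(10, 0)` for Windows 10 or `(11, 1)` for macOS 11.1.

```python
def choose_build(system, is_64bit, os_version):
    """Suggest which EasySpider build to try, following the notes above.

    system:     "Windows", "Darwin" or "Linux" (the values platform.system() returns).
    is_64bit:   True when the operating system is 64-bit.
    os_version: (major, minor) marketing version, e.g. (7, 0) for Windows 7.
    The returned strings are illustrative labels, not real file names.
    """
    if system == "Windows":
        # x64 build: 64-bit Windows 10 and above only.
        if is_64bit and os_version >= (10, 0):
            return "windows-x64"
        # x86 build: Windows 7 and above, 32-bit or 64-bit (bundled Chrome stays at 109).
        if os_version >= (7, 0):
            return "windows-x86"
        return "unsupported"
    if system == "Darwin":
        # macOS build requires 11.1 or later; older systems use v0.2.0 or a source build.
        if os_version >= (11, 1):
            return "macos"
        return "macos-v0.2.0-or-source"
    if system == "Linux":
        # Only Ubuntu 20.04+, Deepin, Debian and derivatives are covered by the
        # prebuilt package; other distributions need a source build.
        return "linux-or-source"
    return "source"


print(choose_build("Windows", True, (7, 0)))
```

A 64-bit Windows 7 machine is routed to the x86 build, which matches the note that the x64 build needs Windows 10 or above.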

Update Notes

  1. Speed: Collection is much faster in most scenarios.
  2. Variables: Anywhere you write JavaScript or system command code, and in the link pool of the open web page operation, you can use Field["parameter_name"] to refer to the most recently extracted value of a page parameter or the return value of a custom operation. This gives tasks comprehensive variable support.
  3. Loop Control: The exit loop option of a custom operation can be used at any position inside a loop to leave the loop immediately; in other words, a Break function has been added.
  4. Data Extraction: Data inside <iframe> tags can be extracted.
  5. Task Control: Task execution can be paused; long-press the p key on the keyboard to pause and resume.
  6. (Windows x64 only for now; other systems please wait for the next version) Scrolling: Added a "keep scrolling down until the page content does not change" feature. The exit condition for the loop that repeatedly clicks the next page button is now "unable to find the next page button" and "no change in page content detected".
  7. XPath Debugging: The XPath Helper extension can also be used to debug XPath during the execution stage, in conjunction with the pause feature above.
  8. Data Export and Writing: Data can be exported to Excel/TXT files or written to a MySQL database, and data types such as integer, decimal, or date can be specified; click here to view the MySQL writing tutorial.
  9. Parameter Handling: The input parameter values used when calling a task can be replaced with values read from an Excel file.
  10. Interface Adjustment: The browser operation console can be resized by dragging its top left corner.
  11. Data Handling: An extracted field can be set to not be saved (suitable when the field is only needed as variable input).
  12. Text Input: In an input text operation, <enter> or <ENTER> can be used to represent a hard return, that is, Enter is pressed in the current text box once the text has been entered.
  13. Device Simulation: Tasks can run in a simulated mobile browser.
  14. (Windows x64 only; not stable) Cloudflare Handling: Can handle and collect data from websites protected by Cloudflare's CAPTCHA; click here to view the video tutorial.
  15. XPath Indexing: Added an XPath hint that uses last() to count the default index position from the end.
  16. Wait Time Control: The wait after an operation can be set to a random duration between 50% and 150% of the configured time.
  17. Source Code Included: The software package ships with its Python source code so that professionals can modify and debug the task process.
  18. Cookie Handling: The advanced options of the open web page operation can read the current page's cookies and modify them.
  19. Click Simulation: Elements are now clicked in a way that truly simulates a real-world mouse click.
  20. General Parameter Settings: The number of collected records after which data is written to local storage (default 10), the length of the data preview in the control bar (default 15), and so on.
  21. File Compression: Task files are now smaller.
  22. Save Name and Location: The default file save path is Data/Task_ID. To save somewhere else, use a relative path such as ../../. For example, ../../JS saves a file named JS in the folder at the same level as the Data folder, which is the EasySpider main folder.
  23. Flowchart Updates: The flowchart and option settings refresh automatically without clicking the Confirm button, but the task still has to be saved manually.
  24. Source Code Optimization: The source code has been optimized to make secondary development easier.
  25. Bug Fixes: An error message is now printed when a system command fails; fixed system command execution failing on macOS and Linux; fixed URL format checking, incorrect index values for cumulatively incremented field names, and other bugs.
  26. Logging: Irrelevant log messages are filtered out for a cleaner execution interface.
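To illustrate the Field["parameter_name"] placeholder described in the list above, here is a minimal Python sketch of how such placeholders could be filled in before a URL or script is used. It is not EasySpider's actual implementation: the function `fill_fields`, the regular expression, and the example URL are assumptions made for illustration.

```python
import re

# Matches Field["name"] and captures the name between the quotes.
FIELD_PATTERN = re.compile(r'Field\["([^"\]]+)"\]')


def fill_fields(template, values):
    """Replace every Field["name"] in template with values[name].

    Placeholders whose name is not in values are left unchanged, so a
    missing field is easy to spot in the resulting string.
    """
    def replace(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return FIELD_PATTERN.sub(replace, template)


# Hypothetical link-pool entry and a previously extracted value.
url = fill_fields('https://example.com/search?q=Field["keyword"]',
                  {"keyword": "spider"})
print(url)  # https://example.com/search?q=spider
```

Leaving unknown placeholders untouched is a design choice of this sketch; the real tool may behave differently.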
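The random wait of 50% to 150% of the configured time mentioned in the list above can be sketched in a few lines of Python. The function name `random_wait_seconds` is hypothetical, and the use of a uniform distribution is an assumption; the notes only state the range.

```python
import random


def random_wait_seconds(base_seconds, rng=random):
    """Return a wait time drawn uniformly from 50% to 150% of base_seconds.

    rng may be the random module or a random.Random instance (useful for
    reproducible runs).
    """
    return base_seconds * rng.uniform(0.5, 1.5)


# With a configured wait of 2 seconds, every draw falls between 1 and 3 seconds.
seeded = random.Random(0)
samples = [random_wait_seconds(2.0, seeded) for _ in range(5)]
print(samples)
```

Randomising the delay this way keeps the average wait equal to the configured value while avoiding a perfectly regular request rhythm.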
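The relative save path rule from the list above (default Data/Task_ID, with ../../JS landing in the EasySpider main folder) can be checked with a short Python sketch. The helper `resolve_save_path`, the root folder name `EasySpider`, and the assumption that Task_ID means `Task_` followed by the task number are all illustrative, not taken from the EasySpider source.

```python
import posixpath


def resolve_save_path(task_id, save_name, root="EasySpider"):
    """Resolve a save name relative to the default Data/Task_<id> folder."""
    task_dir = posixpath.join(root, "Data", f"Task_{task_id}")
    # normpath collapses the ".." segments so the final location is visible.
    return posixpath.normpath(posixpath.join(task_dir, save_name))


print(resolve_save_path(1, "output"))    # EasySpider/Data/Task_1/output
print(resolve_save_path(1, "../../JS"))  # EasySpider/JS
```

Each `..` climbs one level, so two of them move from the task folder past Data and into the main folder.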