Collection of scraper scripts written in PHP that grab the decisions from public legal websites, showing different levels of complexity to obtain them.
- PHP 7.1 (enhance speed and type hinting)
- PHP DOM library (query the HTML structure accurately)
- Regular Expressions (obtain name and ID of each case)
https://appellate.nccourts.org/opinion-filings/?c=sc.
Method used: GET
.
Grabs the decisions from the current year.
After checking the structure of the website form to parse the archive, same URL is being used, with the only difference that a new query string to search by year is appended, giving the search range between 1998 and the current year.
http://iapps.courts.state.ny.us/lawReporting/Search?searchType=opinion
Method used: POST
.
The URL shows a form to grab decisions from different NY courts. So, tampering the information using the Firefox add-on Tamper Data for FF Quantum, the system submits a POST call to update the list of cases.
The scripts tries to search from 1998 to the current year using a start and end dates, up to the current date today. All the documents grabbed are in HTML format
Run php index.php
at the root of this project, and you will be presented with a menu of options.
Choose one of them, or exit just hit Enter.
- Include SQL database schema.
- Add classes and commands to save results in database using PDO.