A collection of PHP scrapers that gather decision data from free legal court websites in the US

rafa8626/scrapers

Scrapers

A collection of scraper scripts written in PHP that grab decisions from public legal websites. Each scraper illustrates a different level of complexity in obtaining them.

Resources

  1. PHP 7.1 (for improved speed and type hinting)
  2. PHP DOM library (to query the HTML structure accurately)
  3. Regular expressions (to obtain the name and ID of each case)
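
As a rough illustration of how the DOM library and regular expressions can work together, the sketch below parses a small HTML fragment, selects links with XPath, and splits each label into a case name and ID. The markup, the `Name (ID)` label format, and the function name `extractCases` are all assumptions made for this example; the real court pages and the scrapers in this project may be structured differently.

```php
<?php
declare(strict_types=1);

/**
 * Extract case entries from an HTML fragment.
 * The list/link markup and the "Name (ID)" label pattern are assumptions.
 *
 * @return array[] Each entry has the keys "id", "name" and "url".
 */
function extractCases(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $cases = [];

    // Hypothetical structure: every decision is a link inside a list item.
    foreach ($xpath->query('//li/a[@href]') as $link) {
        $label = trim($link->textContent);
        // Hypothetical label format: "State v. Smith (123A45)".
        if (preg_match('/^(?<name>.+?)\s*\((?<id>[A-Za-z0-9-]+)\)$/', $label, $m) === 1) {
            $cases[] = [
                'id'   => $m['id'],
                'name' => $m['name'],
                'url'  => $link->getAttribute('href'),
            ];
        }
    }

    return $cases;
}

// Made-up sample markup: one decision link and one unrelated link.
$sample = '<ul><li><a href="/opinions/123A45.pdf">State v. Smith (123A45)</a></li>'
        . '<li><a href="/about">About the court</a></li></ul>';

print_r(extractCases($sample));
```

Links whose label does not match the pattern are skipped, which keeps navigation links out of the result.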

List of scrapers

North Carolina - Supreme Court

https://appellate.nccourts.org/opinion-filings/?c=sc.

Method used: GET.

Grabs the decisions from the current year.

Inspecting the website's search form showed that the archive is served from the same URL; the only difference is an extra query-string parameter that selects the year. This lets the scraper cover the range from 1998 to the current year.
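
The year-by-year approach can be sketched as follows. This is a minimal example, not the project's actual code: the parameter name `year` and the function `buildYearUrls` are hypothetical, since the real query-string key is not named here.

```php
<?php
declare(strict_types=1);

/**
 * Build one archive URL per year, newest first.
 * The default parameter name "year" is an assumption.
 *
 * @return string[]
 */
function buildYearUrls(string $baseUrl, int $firstYear, int $lastYear, string $param = 'year'): array
{
    // Use "&" when the base URL already carries a query string.
    $separator = strpos($baseUrl, '?') === false ? '?' : '&';

    $urls = [];
    for ($year = $lastYear; $year >= $firstYear; $year--) {
        $urls[] = $baseUrl . $separator . http_build_query([$param => $year]);
    }

    return $urls;
}

$base = 'https://appellate.nccourts.org/opinion-filings/?c=sc';
$urls = buildYearUrls($base, 1998, (int) date('Y'));

// Each URL could then be fetched with a plain GET request,
// for example with file_get_contents($url).
echo count($urls), ' URLs, starting with ', $urls[0], PHP_EOL;
```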

New York - Court of Appeals

http://iapps.courts.state.ny.us/lawReporting/Search?searchType=opinion

Method used: POST.

The URL shows a form for searching decisions from different NY courts. Inspecting the form submission with the Firefox add-on Tamper Data for FF Quantum showed that the page sends a POST request to update the list of cases.

The script searches from 1998 up to today's date, using a start date and an end date. All the documents it grabs are in HTML format.
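
A date-range POST of this kind might be prepared along these lines. The field names `startDate` and `endDate`, the `m/d/Y` date format, and the function `buildSearchRequest` are assumptions for illustration only; the actual form fields have to be read from the captured request.

```php
<?php
declare(strict_types=1);

/**
 * Build stream-context options for a form-encoded POST request.
 * Field names and date format are hypothetical.
 */
function buildSearchRequest(DateTimeInterface $start, DateTimeInterface $end): array
{
    $body = http_build_query([
        'startDate' => $start->format('m/d/Y'),
        'endDate'   => $end->format('m/d/Y'),
    ]);

    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                       . 'Content-Length: ' . strlen($body) . "\r\n",
            'content' => $body,
            'timeout' => 30,
        ],
    ];
}

$options = buildSearchRequest(
    new DateTimeImmutable('1998-01-01'),
    new DateTimeImmutable('today')
);
echo $options['http']['content'], PHP_EOL;

// Sending the request needs network access, so it is left commented out:
// $url  = 'http://iapps.courts.state.ny.us/lawReporting/Search?searchType=opinion';
// $html = file_get_contents($url, false, stream_context_create($options));
```

Building the options separately from sending them keeps the request easy to inspect before any traffic reaches the court's server.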

How to execute?

Run php index.php at the root of this project, and you will be presented with a menu of options. Choose one of them, or just hit Enter to exit.
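
For readers curious how such a menu can be handled, here is a small sketch of the input logic. It is not taken from index.php: the menu entries mirror the scrapers listed above, while the numbering and the function `resolveChoice` are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Map a raw input line to a menu entry.
 * Returns null for an empty line (exit) and throws on an unknown option.
 *
 * @param string[] $menu Option number => scraper name.
 */
function resolveChoice(string $input, array $menu): ?string
{
    $key = trim($input);
    if ($key === '') {
        return null; // Enter alone means "exit".
    }
    if (!isset($menu[$key])) {
        throw new InvalidArgumentException("Unknown option: {$key}");
    }

    return $menu[$key];
}

// Hypothetical numbering of the two scrapers.
$menu = [
    '1' => 'North Carolina - Supreme Court',
    '2' => 'New York - Court of Appeals',
];

foreach ($menu as $number => $name) {
    echo "[{$number}] {$name}", PHP_EOL;
}

// In an interactive script the line would come from fgets(STDIN);
// a fixed sample line is used here so the sketch runs unattended.
echo 'Selected: ', resolveChoice("2\n", $menu), PHP_EOL;
```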

TODO

  1. Include an SQL database schema.
  2. Add classes and commands to save results in a database using PDO.
