Skip to content

PHP proxy that converts webpages to json objects, filtered by jquery-like selectors.

License

Notifications You must be signed in to change notification settings

keyle/json-anything

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

json-anything

PHP proxy that converts webpages to json objects, filtered by jquery-like selectors (Built on top of PHPQuery).

Takes parameters from GET or POST.

Think of it as a 'shotgun api'. Because many sites still don't provide apis.

MIT LICENSE.

##Parameters:

###debug:

This is a debug flag.

If set, it will var_dump the results for improved readibility. If not, a json-encoded object is returned.

###url:

The url to go scrape and make JSON from.

For complex urls already using GET (ie. some search engines) you can POST url instead.

EXAMPLE

?url=http://www.bom.gov.au/

###sel:

Comma separated list of selectors, in a jQuery fashion

PHP can't parse # from a url as it's never sent to the server ('url fragments').

Because of this, use % instead of #.

You can, but are not forced to, use two underscores (__) for a space (so it doesn't look like %20)

Get attributes by using a '|'

By default it will return 'text()'

a        ---> will return all text of the A tags
a|html   ---> will return all html contents of the A tags
a|href   ---> will return all href attribute of the A tags
img|src  ---> will return all src attribute of the img tags

EXAMPLES

span.hey ---> will return all spans with class hey
span%hey ---> will return all spans with id hey (I know)
div%pad table:first a ---> div#pad table:first a

Complex Example using __ instead of space:

Within #results, will return the 4th column, 2nd span, b's html

%results__td:nth-child(4)__span:nth-child(2)__b|html

##Real world examples

###Latest currencies $ from Reuters

index.php?url=http://uk.reuters.com/business/currencies&sel=%currPairs__td:first-child__a,%currPairs__td:first-child__a|href,%currPairs__td:nth- child(2)

{
  "url":"http://uk.reuters.com/business/currencies",
  "sel":"#currPairs td:first-child a,#currPairs td:first-child a|href,#currPairs td:nth-child(2)",
  "results":[
    [
      "GBP/USD",
      "/business/currencies/quote?srcAmt=1&srcCurr=GBP&destAmt=&destCurr=USD",
      "1.6299"
    ],
    [
      "GBP/EUR",
      "/business/currencies/quote?srcAmt=1&srcCurr=GBP&destAmt=&destCurr=EUR",
      "1.1348"
    ],

    ...
  ]
}

###Hacker News

There is an unofficial api but anyway, this works too.

http://localhost/json-anything/?url=http://news.ycombinator.com/&sel=.title__a:first-child,.title__a:first-child|href

{
  "url":"http://news.ycombinator.com/",
  "sel":".title a:first-child,.title a:first-child|href",
  "results":[
    [
      "Octopress - A blogging framework for hackers",
      "http://octopress.org/"
    ],
    [
      "NoSQL is What?",
      "http://blog.zawodny.com/2011/07/23/nosql-is-what/"
    ],
    [
      "Radio Shack to start stocking Arduino, Other Goodies",
      "http://blog.radioshack.com/2011/07/21/top-ten-diy-suggestions-from-you/"
    ],

    ...
  ]
}

###Australia Bureau of Meteorology (bom)

The bom is known for not having decent a api.

Take a look at its home page http://www.bom.gov.au/ ... We're going to grab the Forecast as presented for the major cities.

http://localhost/json-anything/?url=http://www.bom.gov.au/&sel=%pad__table:first__a,%pad__table:first__.max,%pad__table:first__td:last-child

url=http://www.bom.gov.au/
%pad__table:first__a, (grab the #pad then the first table's A tags text)
%pad__table:first__.max, (the maximas, has class .max)
%pad__table:first__td:last-child (grab the last TD of the first table)

{
  "url":"http://www.bom.gov.au/",
  "sel":"#pad table:first a,#pad table:first .max,#pad table:first td:last-child",
  "results":[
    [
      "Sydney",
      "17\u00b0",
      "Possible shower clearing."
    ],
    [
      "Melbourne",
      "14\u00b0",
      "Shower or two."
    ],
    [
      "Brisbane",
      "23\u00b0",
      "Fine."
    ],

    ...
  ]
}

(\u00b0 is the unicode for the DEGREE sign)

##Feel free to improve

Not much of this is perfect, it comes from flaws in its quirky design, also the PHP could be improved.

If you have some PHP experience in your legs and want to improve the code and/or add features, I'd gladly take changes (fork away!).

About

PHP proxy that converts webpages to json objects, filtered by jquery-like selectors.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages