google-scholar-export is a Python library for scraping Google Scholar profiles to generate an HTML publication list.
Currently, a profile can be scraped from either the Scholar user id or the Scholar profile URL. Each publication in the resulting list includes the following:
- Publication title
- Publication authors
- Journal information (name, issue no., vol.)
- Date
- URL of the Scholar publication
- The number of citations according to Scholar
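The fields above map naturally onto a small record type. The sketch below is illustrative only: `Publication` and its field names are hypothetical and are not taken from this library's source, which may store the scraped data differently.

```python
from collections import namedtuple

# Hypothetical record mirroring the fields listed above; not the library's own type.
Publication = namedtuple(
    "Publication",
    ["title", "authors", "journal", "date", "url", "citations"],
)

paper = Publication(
    title="Learning internal representations by error-propagation",
    authors="DE Rumelhart, GE Hinton, RJ Williams",
    journal="Parallel Distributed Processing: Explorations in the Microstructure of …",
    date="1986",
    url="https://scholar.google.co.uk/citations?user=JicYPdAAAAAJ&hl=en",
    citations=62260,
)

# Fields are accessible by name.
print("{} ({}), cited {} times".format(paper.title, paper.date, paper.citations))
```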
The resulting HTML is formatted like this:
<p>Publications (<b>20</b>) last scraped for <a href="https://scholar.google.co.uk/citations?user=JicYPdAAAAAJ&hl=en">Geoffrey Hinton</a> on <b>2019-08-11</b>
using <a href="https://github.com/TWRogers/google-scholar-export">google-scholar-export</a>.</p>
<div class="card">
<div class="card-publication">
<div class="card-body card-body-left">
<h4><a href="https://scholar.google.co.uk/citations?user=JicYPdAAAAAJ&hl=en#d=gs_md_cita-d&u=%2Fcitations%3Fview_op%3Dview_citation%26hl%3Den%26oe%3DASCII%26user%3DJicYPdAAAAAJ%26citation_for_view%3DJicYPdAAAAAJ%3AGFxP56DSvIMC">Learning internal representations by error-propagation</a></h4>
<p style="font-style: italic;">by DE Rumelhart, GE Hinton, RJ Williams</p>
<p><b>Parallel Distributed Processing: Explorations in the Microstructure of …</b></p>
</div>
</div>
<div class="card-footer">
<small class="text-muted">Published in <b>1986</b> |
<a href="https://scholar.google.co.uk/scholar?oi=bibs&hl=en&oe=ASCII&cites=1374659557399191249,4574189560556662535,10453698013284960354,12541410141153091507,7476519782727404507,1722523513356915749,6822548856209813074,4464353390709992638,15344233312479649775">Citations: <b>62260</b></a></small>
</div>
</div>
...
The markup is primarily aimed at people using Bootstrap.
It is possible to change the HTML for each publication by modifying PAPER_TEMPLATE in ./exporter/exporter.py.
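The actual contents of PAPER_TEMPLATE are not reproduced here. As a rough illustration of how a per-publication template can be filled in, the sketch below assumes a `str.format`-style template. `EXAMPLE_TEMPLATE`, its placeholder names, and `render_paper` are all hypothetical, so check ./exporter/exporter.py for the real placeholders before editing.

```python
from html import escape

# Hypothetical template; the placeholder names are assumptions and do not
# necessarily match the library's actual PAPER_TEMPLATE.
EXAMPLE_TEMPLATE = (
    '<div class="card">\n'
    '<div class="card-body">\n'
    '<h4><a href="{url}">{title}</a></h4>\n'
    '<p style="font-style: italic;">by {authors}</p>\n'
    '</div>\n'
    '<div class="card-footer">\n'
    '<small class="text-muted">Published in <b>{year}</b> | '
    'Citations: <b>{citations}</b></small>\n'
    '</div>\n'
    '</div>\n'
)


def render_paper(title, authors, year, citations, url):
    """Fill the template, escaping text so it is safe to embed in HTML."""
    return EXAMPLE_TEMPLATE.format(
        title=escape(title),
        authors=escape(authors),
        year=escape(str(year)),
        citations=int(citations),
        url=escape(url, quote=True),
    )


print(render_paper(
    "Learning internal representations by error-propagation",
    "DE Rumelhart, GE Hinton, RJ Williams",
    1986,
    62260,
    "https://scholar.google.co.uk/citations?user=JicYPdAAAAAJ&hl=en",
))
```

Escaping the values keeps characters such as `&` in titles or URLs from producing invalid markup.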
Generating lists of publications for static websites is a pain. Google Scholar, popular amongst academics, is great at tracking publications and citations. However, it does not have an API.
There are some other libraries:
However, both of these are PHP-based and not useful for static sites.
The purpose of this repository is to generate static HTML directly from your Google Scholar profile. The code can be run manually, or at website build time, to update the publications list.
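One way to wire this into a site build is to export the list to its own file and splice that fragment into a page. The sketch below covers only the splicing step and rests on several assumptions: the placeholder comment `<!-- PUBLICATIONS -->`, the function names `inject_publications` and `build_page`, and any file names passed to them are hypothetical and not part of this library.

```python
PLACEHOLDER = "<!-- PUBLICATIONS -->"  # hypothetical marker in the page template


def inject_publications(page_html, fragment_html, placeholder=PLACEHOLDER):
    """Return page_html with the first placeholder replaced by the fragment."""
    if placeholder not in page_html:
        raise ValueError("placeholder {!r} not found in page".format(placeholder))
    return page_html.replace(placeholder, fragment_html, 1)


def build_page(template_path, fragment_path, output_path):
    """Read a page template and an exported fragment, then write the final page."""
    with open(template_path, encoding="utf-8") as f:
        page = f.read()
    with open(fragment_path, encoding="utf-8") as f:
        fragment = f.read()
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(inject_publications(page, fragment))
```

In a build script, the exporter's `export` call would run first to produce the fragment file, followed by `build_page` with paths of your choosing.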
Here is an example that utilises this library: twrogers.github.io/projects.html
The aim is eventually to develop a JS version of this library.
Install the relevant requirements as usual from the root of this repository:
pip install -r ./requirements.txt
This code was written and tested in Python 3.5.2.
To export to HTML from a Google Scholar user id, do the following:
from exporter import ScholarExporter
# Example from user id:
s = ScholarExporter.from_user('JicYPdAAAAAJ') # Geoffrey Hinton user
s.export('index.html')
To export to HTML from a Google Scholar profile URL, do the following:
from exporter import ScholarExporter
# Example from url:
s = ScholarExporter('https://scholar.google.co.uk/citations?user=JicYPdAAAAAJ&hl=en') # Geoffrey Hinton url
s.export('index.html')
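The two entry points refer to the same profile: in the example URL, the user id is the value of the `user` query parameter. The helper below is a standalone sketch using only the standard library; `user_id_from_url` is a hypothetical name and is not part of this library.

```python
from urllib.parse import urlparse, parse_qs


def user_id_from_url(profile_url):
    """Extract the Scholar user id from a profile URL's `user` query parameter."""
    query = parse_qs(urlparse(profile_url).query)
    values = query.get("user")
    if not values:
        raise ValueError("no 'user' parameter in URL: {}".format(profile_url))
    return values[0]


url = "https://scholar.google.co.uk/citations?user=JicYPdAAAAAJ&hl=en"
print(user_id_from_url(url))  # JicYPdAAAAAJ
```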
- Add example CSS style sheet
- Fix IndexError and AttributeErrors that sometimes occur.
- Add other export options
Copyright 2019 Thomas W. Rogers. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0