Wyzant-monitor

Extract daily information about Wyzant tutoring jobs and populate an sqlite database with the information.

Background

As a tutor with Wyzant.com I can view a list of online tutor jobs throughout the USA; specifically tutoring requests made by students in my subject areas. The list changes as new student requests are posted and as previously posted ones are removed (because a tutor connection was made, usually.) I save the list of the most recent 100 job posts each morning in a date-coded html file. These files are ingested and populate/update a Jobs database file.

By tracking a tutoring job request from the day it appears until it is gone from the list, I determine how quickly each tutor job was accepted and can then see how that time varies with parameters, e.g. with ZIP code or tutoring subject. A second database table with ZIP code as the primary key is used to include other data, currently, from the IRS Income Tax Statistics for a recent year.

Implementation

In the sqlite database I have three tables: one listing the daily files, one for the cumulative list of jobs, and one with information based on ZIP codes. For this last one, I read in IRS tax return data by ZIP code (2015, all States, includes AGI; put in IRS_2015_data/15zpallagi.csv) which ZIPload.py reads and uses the numbers of returns in different income ranges to create a number indicating the low-income level of a ZIP code (roughly the fraction of incomes <$25k minus the fraction with incomes >$100k.)

Most of the code is straight forward and based on examples from Python for Everybody. One unique aspect is the routine get_jobs_info( ) in wyzhelp.py: the parsing of the html jobs list is carried out with an explicit finite-state machine design to accomodate the variability of each posted job's html fields, etc.

Using sqlitebrowser, the sqlite data can be exported as Jobs.csv and Files.csv; these can be read into display software such as Tableau. Snapshot(s) showing the results are on my Tableau Public page .

Daily workflow:

Steps 1 and 2 are all that are required to populate the Jobs database; having the ZIP codes in the database is sufficient for Tableau to do geomapping.

Each morning: Download html page of the most recent 100 WyzAnt jobs available in my subjects. Use "Save page as..." in Chrome to save to a file, yyyy-mm-dd_A.html, in the dir WyzAntDaily_Data/ (an example dir with a few files is included here.)
Ingest the new html file(s) and update job information by running: $ python WyzIngest.py

The following step creates a word cloud from the job descriptions' text.

For word-cloud visualiztion run: $ python WyzWord.py . Output is written to gword.js ; open gword.htm in a browser to see the vizualization.

Further things to do:

Make a WyzSummary.py to list useful stats from the Jobs db, possibly selecting ones I should apply for ;-)
Add daystr to the Files table in wyzjobs.sqlite db (or do this in Tableau from the file's date): it would be a string code for the day-of-week the job was posted, i.e., the day before the html file's date (since the file is captured in the morning and most jobs are submitted the previous day/night).

Name	Name	Last commit message	Last commit date
Latest commit dan3dewey Update WyzIngest.py May 30, 2018 d6f54a3 · May 30, 2018 History 13 Commits
images	images	Initial versions of files	May 18, 2018
.gitignore	.gitignore	Initial versions of files	May 18, 2018
LICENSE	LICENSE	Initial commit	Jan 10, 2018
README.md	README.md	changed names of dirs for general use	May 21, 2018
README_Py4ECapstone.txt	README_Py4ECapstone.txt	Initial versions of files	May 18, 2018
WyzAntDaily_Data_Feb18.tar.gz	WyzAntDaily_Data_Feb18.tar.gz	mentioned finite state machine; added data dir example files	May 21, 2018
WyzIngest.py	WyzIngest.py	Update WyzIngest.py	May 30, 2018
WyzWord.py	WyzWord.py	Initial versions of files	May 18, 2018
ZIPload.py	ZIPload.py	Initial versions of files	May 18, 2018
d3.layout.cloud.js	d3.layout.cloud.js	Initial versions of files	May 18, 2018
d3.v2.js	d3.v2.js	Initial versions of files	May 18, 2018
gword.htm	gword.htm	Initial versions of files	May 18, 2018
gword.js	gword.js	Initial versions of files	May 18, 2018
wyzhelp.py	wyzhelp.py	Initial versions of files	May 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Wyzant-monitor

Background

Implementation

Daily workflow:

Further things to do:

About

Releases

Packages

Languages

License

dan3dewey/Wyzant-monitor

Folders and files

Latest commit

History

Repository files navigation

Wyzant-monitor

Background

Implementation

Daily workflow:

Further things to do:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages