Frugally is a smarter online shopping experience. Minimize your time spent staring at a screen by having all of the best deals from all of your favorite vendors in one convenient place. What's more, our own algorithms recommend deals to you based on your previous purchases. Create an account today to take full advantage!
Frugally has a simple design that uses Selenium's web scraping libraries to gather the best deals from a set list of vendors. By handing this data off to our full-stack web server (hardware owned!), we are able to create a truly unique shopping experience that is curated to the user. All of your personal data is stored at home, right here in Philadelphia, Pennsylvania, and in future iterations will be stored in the safest place on earth: the Blockchain.
Frugally is an entirely free service to the user; however, feel free to buy your favorite team of engineers and entrepreneurs a cup of coffee by donating through our Patreon page at https://www.patreon.com/frugallyio?fan_landing=true. If you would like to get involved or have any suggestions, email us at [email protected].
And then came that girl who rowed in the dark. Each night she paused to relay her coordinates, how her body was performing, and the atmospheric conditions. Often she noted things-the outlines of birds migrating at night, a whale shark seining for krill off her bow. She had, she said, a growing ability to dream while she rowed.
The Orphan Master's Son by Adam Johnson
- Frontend
- Backend
- Database
- Scraping
- Logs
- Linux Server Configurations
- UNIX Basics for Navigating the Server over SSH
- Guide for Steve
All frontend code is kept under the /static and /templates directories of the GitHub repository and is managed by the Flask framework. It is written in HTML, CSS, and JavaScript, with Jinja2 templating. Template courtesy of Templated Co.
This section details the code that can be found in the GitHub repository, as well as under the /var/www/Frugally/Frugally directory on the server.
Frugally/
├── DBqueries.py
├── README.md
├── __init__.py
├── chromedriver
├── uniqueIPs.c
├── php/
├── nordstromracksales/
│   └── nordstromracksales/
│       ├── items.py
│       ├── middlewares.py
│       ├── pipelines.py
│       ├── settings.py
│       ├── module/
│       │   ├── __init__.py
│       │   ├── items.py.tmpl
│       │   ├── middlewares.py.tmpl
│       │   ├── pipelines.py.tmpl
│       │   ├── settings.py.tmpl
│       │   └── spiders/
│       │       └── __init__.py
│       └── spiders/
│           ├── __init__.py
│           ├── adidas_spider.py
│           ├── asos_spider.py
│           ├── newbalance_spider.py
│           ├── nike_spider_men.py
│           ├── nike_spider_women.py
│           ├── nordstromrack_spider.py
│           ├── nordstromrack_spider_men.py
│           ├── nordstromrack_spider_women.py
│           ├── patagonia_spider.py
│           └── urbanoutfitters_spider.py
├── static/
│   ├── css/
│   │   ├── font-awesome.min.css
│   │   ├── skel.css
│   │   ├── style-mobile.css
│   │   ├── style-narrow.css
│   │   ├── style-narrower.css
│   │   ├── style-normal.css
│   │   ├── style-wide.css
│   │   └── style.css
│   ├── fonts/
│   │   ├── FontAwesome.otf
│   │   ├── fontawesome-webfont.eot
│   │   ├── fontawesome-webfont.svg
│   │   ├── fontawesome-webfont.ttf
│   │   └── fontawesome-webfont.woff
│   ├── images/
│   └── js/
│       ├── init.js
│       ├── jquery.droptron.min.js
│       ├── jquery.min.js
│       ├── skel-layers.min.js
│       └── skel.min.js
└── templates/
    ├── LICENSE.txt
    ├── google5e9dcfe4850ad995.html
    ├── index.html
    ├── login.html
    ├── mens.html
    ├── womens.html
    ├── 500.html
    ├── robots.txt
    ├── sitemap.xml
    └── about.html
This section details the MySQL database used to hold all of the scraped product listings on the site.
- MySQL Config:
sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf
- MySQL Login Command:
/usr/bin/mysql -u frugally -p
- MySQL Data Directory Command (shows where the databases are located):
select @@datadir;
- MySQL Database Status Command:
SHOW ENGINE INNODB STATUS\G
The Frugally database is transient and only holds the most recently scraped products.
show tables;
+------------------------+
| Tables_in_Frugally     |
+------------------------+
| NikeMen                |
| NikeMenTemp            |
| NikeWomen              |
| NikeWomenTemp          |
| NordstromRackMen       |
| NordstromRackMenTemp   |
| NordstromRackWomen     |
| NordstromRackWomenTemp |
+------------------------+
The Temp tables are filled while the scrapers are running, then swapped in for the original tables once the scrapers are done. The old original tables are then dropped.
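A minimal Python sketch of one way to do that swap, assuming it is done with MySQL's RENAME TABLE (the actual PHP scripts are not shown here). The Old suffix and the final step that recreates an empty Temp table are assumptions, not necessarily what the current scripts do, and both function names are hypothetical.

```python
# Live tables in the Frugally database (from `show tables;`); each one has a
# matching "<name>Temp" table that the scrapers fill.
LIVE_TABLES = ["NikeMen", "NikeWomen", "NordstromRackMen", "NordstromRackWomen"]


def swap_statements(table):
    """Return the SQL that promotes <table>Temp to <table> and drops the old copy.

    The "<table>Old" name and the CREATE TABLE ... LIKE step (an empty Temp
    table for the next scraper run) are assumptions.
    """
    temp, old = table + "Temp", table + "Old"
    return [
        # One RENAME TABLE statement renames both tables atomically, so the
        # site never sees a moment where `table` is missing.
        f"RENAME TABLE {table} TO {old}, {temp} TO {table}",
        f"DROP TABLE {old}",
        f"CREATE TABLE {temp} LIKE {table}",
    ]


def swap_in_temp_table(conn, table):
    """Run the swap on a DB-API connection (e.g. MySQLdb or mysql-connector)."""
    if table not in LIVE_TABLES:
        # Table names cannot be bound parameters, so only allow known names.
        raise ValueError(f"unknown table: {table}")
    cursor = conn.cursor()
    for statement in swap_statements(table):
        cursor.execute(statement)
    conn.commit()
```

Renaming inside a single statement is the design choice here: a DROP followed by a separate RENAME would leave a short window in which the live table does not exist.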
describe NikeMen;
+-------------+----------------+------+-----+---------+-------+
| Field       | Type           | Null | Key | Default | Extra |
+-------------+----------------+------+-----+---------+-------+
| vendor      | varchar(20)    | YES  |     | NULL    |       |
| gender      | varchar(5)     | YES  |     | NULL    |       |
| title       | varchar(1000)  | YES  |     | NULL    |       |
| brand       | varchar(100)   | YES  |     | NULL    |       |
| retailprice | float          | YES  |     | NULL    |       |
| price       | float          | YES  |     | NULL    |       |
| discount    | int(11)        | YES  |     | NULL    |       |
| imagelink   | varchar(10000) | YES  |     | NULL    |       |
| link        | varchar(10000) | YES  |     | NULL    |       |
+-------------+----------------+------+-----+---------+-------+
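A minimal sketch of writing scraped rows into one of these tables (for example NikeMenTemp). It assumes discount holds a whole-number percentage off retailprice, which the schema does not spell out. The helper names are hypothetical; COLUMNS follows the describe output above.

```python
import sqlite3  # only used by the usage example at the bottom

# Column order of the product tables, from `describe NikeMen;`.
COLUMNS = ["vendor", "gender", "title", "brand", "retailprice",
           "price", "discount", "imagelink", "link"]


def percent_discount(retailprice, price):
    """Whole-number percent off retail (assumed meaning of `discount`)."""
    if not retailprice or price is None:
        return None  # stored as NULL when the prices are unusable
    return int(round((1 - price / retailprice) * 100))


def insert_products(conn, table, products, placeholder="%s"):
    """Insert scraped product dicts into `table` and return the row count.

    `table` must come from trusted code, never user input, because table
    names cannot be bound parameters. MySQL drivers use "%s"; sqlite3 uses "?".
    """
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(COLUMNS), ", ".join([placeholder] * len(COLUMNS)))
    rows = []
    for product in products:
        row = dict(product)
        if row.get("discount") is None:
            row["discount"] = percent_discount(row.get("retailprice"), row.get("price"))
        # Missing fields become NULL.
        rows.append(tuple(row.get(col) for col in COLUMNS))
    cursor = conn.cursor()
    cursor.executemany(sql, rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Usage example against an in-memory SQLite table with the same columns.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE NikeMenTemp (" + ", ".join(COLUMNS) + ")")
    insert_products(demo, "NikeMenTemp",
                    [{"vendor": "nike", "title": "Air Max",
                      "retailprice": 120.0, "price": 90.0}],
                    placeholder="?")
    print(demo.execute("SELECT title, discount FROM NikeMenTemp").fetchall())
    # prints [('Air Max', 25)]
```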
The tables in the BigDataDave database are persistent and are meant to power the recommendation algorithms.
show tables;
+-----------------------+
| Tables_in_BigDataDave |
+-----------------------+
| LinksClicked          |
| ProductHistory        |
| Users                 |
+-----------------------+
This section details the web scraping.
The scraping spiders are run via PHP scripts found in the /php directory of the GitHub repo. Each PHP script is scheduled in the crontab and runs every 2 hours. The Frugally tables are flushed each time a spider finishes scraping.
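The PHP scripts themselves are not reproduced here. As a rough Python equivalent, assume each script runs one Scrapy spider (the nordstromracksales project in the repo tree has the Scrapy layout) and appends its output to the matching crontab_*.log file. The job could then look like this sketch. The spider names, the project path, and the helper names are assumptions. In crontab syntax, the schedule 0 */2 * * * runs a command at minute 0 of every second hour.

```python
import subprocess
from pathlib import Path

# Log directory from this document; the Scrapy project path is an assumption
# based on the repo tree (the outer nordstromracksales/ directory).
LOG_DIR = Path("/var/www/Frugally/logs")
SCRAPY_PROJECT = Path("/var/www/Frugally/Frugally/nordstromracksales")


def log_path(abbrev):
    """Crontab log for one spider, following the crontab_*.log naming."""
    return LOG_DIR / f"crontab_{abbrev}.log"


def spider_command(spider_name):
    """`scrapy crawl <name>` runs a single spider from the project directory."""
    return ["scrapy", "crawl", spider_name]


def run_spider(spider_name, abbrev):
    """Run one spider, appending stdout and stderr to its crontab log.

    Returns the exit code so the caller can skip the Temp-table swap when
    scraping failed.
    """
    with open(log_path(abbrev), "a") as log:
        result = subprocess.run(
            spider_command(spider_name),
            cwd=SCRAPY_PROJECT,
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return result.returncode
```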
Most logs are stored under /var/www/Frugally/logs:
- Apache2 logs:
error.log, access.log, ssl_error.log, ssl_access.log
- Crontab PHP scraping logs:
crontab_*.log, where * is an abbreviation for the spider
- Flask log (kept in the Git directory, /var/www/Frugally/Frugally):
flask.log
- Information can be written to it with app.logger.info() in the __init__.py file
- Open the root crontab configuration for viewing or editing (sudo crontab -l only lists it):
sudo crontab -e
- Check when the cron jobs ran:
grep CRON /var/log/syslog
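A sketch of how app.logger.info() output could be routed into flask.log. It uses only the standard logging module, since Flask's app.logger is a regular logging.Logger. Whether __init__.py configures logging this way is an assumption, and add_file_logging is a hypothetical helper. The path defaults to the absolute location of flask.log, because under mod_wsgi the working directory is usually not the app directory.

```python
import logging
import os
import tempfile

# Absolute location of flask.log given in this document.
FLASK_LOG = "/var/www/Frugally/Frugally/flask.log"


def add_file_logging(logger, path=FLASK_LOG, level=logging.INFO):
    """Attach a file handler so logger.info(...) calls are appended to `path`.

    In __init__.py this would be called once as add_file_logging(app.logger).
    """
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(module)s: %(message)s"))
    logger.addHandler(handler)
    # Without this, INFO messages fall below the inherited WARNING threshold.
    logger.setLevel(level)
    return handler


if __name__ == "__main__":
    # Usage example: write to a temporary file instead of the real flask.log.
    with tempfile.TemporaryDirectory() as tmp:
        log_file = os.path.join(tmp, "flask.log")
        demo = logging.getLogger("frugally-demo")
        handler = add_file_logging(demo, log_file)
        demo.info("getSQLdiscount called")
        demo.removeHandler(handler)
        handler.close()
        with open(log_file) as f:
            print(f.read().strip())
```

The setLevel call matters. Outside debug mode, Flask leaves app.logger without its own level, so it inherits the root logger's default WARNING threshold and app.logger.info() messages are silently dropped. The Apache user (www-data in Frugally.conf) also needs write access to the file.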
SSL certificates need to be renewed every 60 days.
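The document does not say how the certificates are renewed. As an aid for the 60-day schedule, this standard-library sketch reports how many days remain on the certificate that frugally.io is currently serving. The function names are hypothetical.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Whole days left before a notAfter date such as "Jun  1 12:00:00 2030 GMT"."""
    if now is None:
        now = time.time()
    return int((ssl.cert_time_to_seconds(not_after) - now) // 86400)


def fetch_not_after(host="frugally.io", port=443, timeout=10):
    """Connect over TLS and return the server certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Running print(days_until_expiry(fetch_not_after())) on any machine with network access prints the remaining days. If the certificate has already expired, the handshake itself fails with ssl.SSLCertVerificationError, which is also a clear signal to renew.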
For some reason, when the server undergoes a hard restart, the firewall of the physical server resets. Below are the commands needed to get back up and running. Don't mess with this unless you know what you are doing. These commands are now run automatically by crontabs.
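These commands need root privileges (run them with sudo or as root).
- Save the current rules to a file:
iptables-save > iptables.dump
- Restore the rules from that file:
iptables-restore < iptables.dump
- List the current rules (add --line-numbers to show each rule's index):
iptables -L
- Delete a rule from the INPUT chain by its index:
iptables -D INPUT <index of the REJECT rule to delete>
- Allow web traffic on ports 80 and 443:
sudo iptables -A INPUT -p tcp -m multiport --dports 80,443 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
sudo iptables -A OUTPUT -p tcp -m multiport --dports 80,443 -m conntrack --ctstate ESTABLISHED -j ACCEPT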
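Finding the index for iptables -D INPUT is the fiddly step. This small Python sketch lists the INPUT chain with rule numbers and picks out the REJECT rules. It only reads the rules and must run as root. The function names are hypothetical, and this is not the crontab job that currently manages the firewall.

```python
import subprocess


def reject_rule_indexes(listing):
    """Return the rule numbers of REJECT rules in `iptables -L --line-numbers` output."""
    indexes = []
    for line in listing.splitlines():
        fields = line.split()
        # Rule lines start with the rule number, followed by the target.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == "REJECT":
            indexes.append(int(fields[0]))
    return indexes


def list_input_rules():
    """Run iptables (needs root) and return the numbered INPUT chain listing."""
    result = subprocess.run(
        ["iptables", "-L", "INPUT", "--line-numbers", "-n"],
        capture_output=True, text=True, check=True)
    return result.stdout
```

If more than one REJECT rule is found, delete them from the highest index down, because removing a rule renumbers every rule after it.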
This whole process is the biggest pain. Be patient and methodical when configuring the Apache settings, because debugging is limited. Make sure the development server is ready for production first, then place the whole thing on top of Apache to minimize the number of possible errors going forward. Good luck.
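frugally.wsgi, located at /var/www/Frugally/frugally.wsgi (make sure you install mod_wsgi for Python 3, e.g. the libapache2-mod-wsgi-py3 package on Ubuntu/Debian)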
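#!/usr/bin/python3
import sys
import logging

# Send Python log output to stderr, which mod_wsgi writes to the Apache error log
logging.basicConfig(stream=sys.stderr)
# Make the Frugally package (/var/www/Frugally/Frugally) importable
sys.path.insert(0, "/var/www/Frugally/")
# mod_wsgi looks for a WSGI callable named "application"
from Frugally import app as application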
/etc/apache2/sites-available/Frugally.conf
<VirtualHost *:80>
    #ServerName 192.168.1.235
    ServerName frugally.io
    ServerAdmin [email protected]
    ErrorLog /var/www/Frugally/logs/error.log
    CustomLog /var/www/Frugally/logs/access.log combined
    LogLevel info
    WSGIDaemonProcess frugally user=www-data group=www-data threads=5
    WSGIProcessGroup frugally
    WSGIScriptAlias / /var/www/Frugally/frugally.wsgi
    <Directory /var/www/Frugally/Frugally/>
        Options Indexes FollowSymLinks
        AllowOverride None
        Require all granted
    </Directory>
    Alias /static /var/www/Frugally/Frugally/static
    <Directory /var/www/Frugally/Frugally/static/>
        Require all granted
    </Directory>
</VirtualHost>
<VirtualHost *:443>
    #ServerName 192.168.1.235
    ServerName frugally.io
    ServerAdmin [email protected]
    ErrorLog /var/www/Frugally/logs/ssl_error.log
    CustomLog /var/www/Frugally/logs/ssl_access.log combined
    LogLevel info
    SSLEngine on
    SSLCertificateFile /var/www/Frugally/frugally.io-ssl-bundle/domain.cert.pem
    SSLCertificateKeyFile /var/www/Frugally/frugally.io-ssl-bundle/private.key.pem
    SSLCertificateChainFile /var/www/Frugally/frugally.io-ssl-bundle/intermediate.$
    WSGIProcessGroup frugally
    WSGIScriptAlias / /var/www/Frugally/frugally.wsgi
    <Directory /var/www/Frugally/Frugally/>
        Options Indexes FollowSymLinks
        AllowOverride None
        Require all granted
    </Directory>
    Alias /static /var/www/Frugally/Frugally/static
    <Directory /var/www/Frugally/Frugally/static/>
        Require all granted
    </Directory>
</VirtualHost>
This section lists a few essential UNIX commands needed for navigating the Frugally server.
WARNING: This command restarts the physical hardware server, so make sure you save any work and know what you are doing before running it. You will lose your SSH connection, and it takes a few minutes for the server to fully boot back up.
sudo reboot
sudo systemctl restart <component name>
- apache2: restarts the web server
- mysql: restarts the database
- You can also swap restart for start, stop, or status
- Git Directory:
/var/www/Frugally/Frugally
- One directory up are configuration files related to the backend components; these generally should not be touched.
- Flask Logs:
/var/www/Frugally/Frugally/flask.log
- Apache Logs:
/var/www/Frugally/logs
- WSGI Config:
/var/www/Frugally/frugally.wsgi
- Apache Config:
/etc/apache2/sites-available/Frugally.conf
- MySQL Config:
sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf
- MySQL Login Command:
/usr/bin/mysql -u frugally -p
- MySQL Data Directory Command (shows where the databases are located):
select @@datadir;
cd <directory> - change directory
pwd - show the present working directory
dir - show the contents of the current directory
df - show disk space on the root, disk1 and disk2
nano <filename> - text editor
cp <source> <destination> - copy a file
rm <filename> - delete a file
mv <source> <destination> - move or rename a file
sudo git pull - download and merge the latest code from GitHub into the current branch (master)
sudo git add . - stage all changed files for commit
git status - show the status of the Git directory
sudo git commit -am "message content" - commit the staged changes (-a also stages modified tracked files)
sudo git push origin <branch name> - push the commit up to the given GitHub branch
- log on to the server
- change directory:
cd /var/www/Frugally/Frugally
- view the contents of the directory:
dir
- open the DBqueries.py file:
sudo nano DBqueries.py
- make edits to the getSQL functions
- save and exit by pressing ctrl+x (then y and Enter to confirm the save), or just save with ctrl+s
- restart the server to test your changes:
sudo systemctl restart apache2
- navigate to the website
- If you get an error page:
  - check the Frugally error log:
    sudo nano flask.log
    and scroll to the bottom. This should give you a standard Python error.
    - be sure to delete the file afterwards to reset it:
      sudo rm flask.log
  - check the Apache error logs:
    - first go up one directory:
      cd ..
    - then change directories:
      cd logs
    - view the contents of the directory:
      dir
    - check the error.log file as well as the ssl_error.log file by typing
      sudo nano error.log
      and scrolling to the bottom.
    - feel free to delete these files afterwards as well.
If you find that, after all of that, you still don't have a clue why your code isn't working, that is completely normal. I have banged my head on my desk trying to get the stupid error logs to work, and they still don't log some errors. Just try to think it through.
- place app.logger.info("<enter debugging message here>") anywhere in the __init__.py file or DBqueries.py (wherever app is available) to print information to the flask.log file. If all else fails, this can end up being your guardian angel.
Best of luck and godspeed.
- Create SQL statements that filter content based on user input.
In DBqueries.py you will find the getSQLdiscount function. This function returns an array of products, sorted from highest to lowest discount, with any combination of filters applied.
The filters parameter should look something like [[gender, m/f], [vendor, [nike, nordstrom]], [brand, [burberry, guess, zara ...]]].
Unpack the filters and insert them into the corresponding SQL statements so that the database returns the correct set of products. You should have multiple SQL statements, one for each table (in other words, one for each vendor). To collect the results, append each statement's rows to the item variable with item = item + cursor.fetchall() after each SQL statement, then return item at the very end. Because each statement only sorts the rows of its own table, sort the combined item list by discount before returning it.
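# The goal of this function is to return a set of products
# whose attributes match that of the filters
# and are sorted in order of highest discount to lowest
def getSQLdiscount(filters):
    # Parse out the filters
    if filters is not None:
        gender = str(filters[0][1]).lower()
        filtervendor = filters[1][1]
        filterbrands = filters[2][1]
    else:
        gender = "all"
        filtervendor = "all"
        filterbrands = "all"
    # conn is assumed to be the module-level MySQL connection in DBqueries.py
    cursor = conn.cursor()
    # Starting point: query a single table. Pass filter values as bound
    # parameters (%s) instead of pasting them into the SQL string.
    if gender == "all":
        cursor.execute("SELECT * FROM NordstromRackMen ORDER BY discount DESC")
    else:
        cursor.execute("SELECT * FROM NordstromRackMen WHERE gender = %s ORDER BY discount DESC", (gender,))
    item = cursor.fetchall()
    return item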
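A fuller sketch of the approach described above runs one query per table, appends the results together, and then sorts once. This get_sql_discount is hypothetical and standalone. It takes the connection explicitly so it can be tried against SQLite. The (vendor, gender) to table mapping and the "m"/"f" codes are assumptions, and brand filtering uses bound IN (...) parameters. MySQL drivers such as MySQLdb, PyMySQL, and mysql-connector use %s placeholders; sqlite3 uses ?.

```python
import sqlite3  # only used by the usage example at the bottom

# Live product tables keyed by (vendor filter value, gender code). The vendor
# keys follow the example filters; the "m"/"f" codes are assumptions.
TABLES = {
    ("nike", "m"): "NikeMen",
    ("nike", "f"): "NikeWomen",
    ("nordstrom", "m"): "NordstromRackMen",
    ("nordstrom", "f"): "NordstromRackWomen",
}

# Position of `discount` in SELECT * (see `describe NikeMen;`).
DISCOUNT_INDEX = 6


def select_tables(gender, vendors):
    """Pick the tables that match the gender and vendor filters."""
    genders = ("m", "f") if gender == "all" else (gender,)
    wanted = None if vendors == "all" else {v.lower() for v in vendors}
    return [table for (vendor, g), table in TABLES.items()
            if g in genders and (wanted is None or vendor in wanted)]


def discount_key(row):
    """Sort key that puts NULL discounts last."""
    value = row[DISCOUNT_INDEX]
    return value if value is not None else -1


def get_sql_discount(conn, filters, placeholder="%s"):
    """Run one query per matching table, combine the rows, sort by discount."""
    if filters is not None:
        gender = str(filters[0][1]).lower()
        vendors = filters[1][1]
        brands = filters[2][1]
    else:
        gender, vendors, brands = "all", "all", "all"

    cursor = conn.cursor()
    items = []
    for table in select_tables(gender, vendors):
        # Table names come from TABLES, never from user input; brand values
        # are bound parameters so the driver escapes them.
        sql = "SELECT * FROM " + table
        params = []
        if brands != "all" and brands:
            sql += " WHERE brand IN ({})".format(", ".join([placeholder] * len(brands)))
            params = list(brands)
        cursor.execute(sql, params)
        items += list(cursor.fetchall())

    # Each query only covers its own table, so sort the combined list once.
    items.sort(key=discount_key, reverse=True)
    return items


if __name__ == "__main__":
    # Usage example against in-memory SQLite copies of two tables.
    demo = sqlite3.connect(":memory:")
    for name in ("NikeMen", "NordstromRackMen"):
        demo.execute("CREATE TABLE " + name + " (vendor, gender, title, brand,"
                     " retailprice, price, discount, imagelink, link)")
    demo.execute("INSERT INTO NikeMen VALUES ('nike', 'm', 'Air', 'guess', 100, 80, 20, '', '')")
    demo.execute("INSERT INTO NordstromRackMen VALUES ('nordstrom', 'm', 'Coat', 'guess', 100, 50, 50, '', '')")
    filters = [["gender", "m"], ["vendor", ["nike", "nordstrom"]], ["brand", ["guess"]]]
    print([row[2] for row in get_sql_discount(demo, filters, placeholder="?")])  # ['Coat', 'Air']
```

Sorting in Python keeps the author's one-statement-per-table structure. A single UNION ALL query with one ORDER BY discount DESC would be an alternative that lets MySQL do the final sort.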
Once again, the login for MySQL is /usr/bin/mysql -u frugally -p; then just enter the super secret password.
If you want to test your query before putting it in the DBqueries.py file (which I would recommend):
- log in to the MySQL server
/usr/bin/mysql -u frugally -p
- select the database and look at its tables:
use Frugally
show tables; and describe [tablename];
- write the SQL statements here and see what kind of output you get
- Ex.
SELECT * FROM NordstromRackMen WHERE gender='male' AND brand IN ('burberry', 'guess', 'zara') ORDER BY discount DESC;
Machine Learning link: https://amulyayadav.github.io/DS442/