Merge pull request #17 from NHSDigital/websitecheckerupdate
updated github action for ease of use
ryma2fhir authored Oct 5, 2023
2 parents 4070b1c + 975753d commit 359a56a
Showing 7 changed files with 75 additions and 110 deletions.
29 changes: 0 additions & 29 deletions .github/workflows/errorChecker.yml

This file was deleted.

28 changes: 0 additions & 28 deletions .github/workflows/linkchecker.yml

This file was deleted.

36 changes: 0 additions & 36 deletions .github/workflows/spellChecker.yml

This file was deleted.

65 changes: 65 additions & 0 deletions .github/workflows/websiteChecker.yml
@@ -0,0 +1,65 @@
---
name: Simplifier IG Website Checking
on:
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
    inputs:
      websiteurl:
        default: "https://simplifier.net/guide/uk-core-implementation-guidance-directory?version=current"
jobs:
  job1:
    name: html error checker
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo content
        uses: actions/checkout@v3
      - name: Set up python
        uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r ./IGPageContentValidator/requirements.txt
      - name: Execute HTML Error Check
        run: INPUT_STORE=${{ github.event.inputs.websiteurl }} python ./IGPageContentValidator/errorChecker.py
  job2:
    name: url link checker
    runs-on: ubuntu-latest
    steps:
      - name: checkout repo content
        uses: actions/checkout@v3
      - name: Install dependencies
        run: |
          sudo apt install python3-bs4 python3-dnspython python3-requests
          pip3 install linkchecker
      - name: Check input link is valid
        run: >
          echo 'exit codes can be found at
          https://everything.curl.dev/usingcurl/returns'
          curl ${{ github.event.inputs.websiteurl }} -s -f -o /dev/null
      - name: Execute Link Check
        run: >
          linkchecker -r 2 --check-extern --no-status -f
          ./IGPageContentValidator/linkcheckerrc ${{ github.event.inputs.websiteurl }} || test $? = 1;
  job3:
    name: spell checker
    runs-on: ubuntu-latest
    steps:
      - name: checkout repo content
        uses: actions/checkout@v3
      - name: Set up python
        uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - name: Install dependencies
        run: |
          sudo apt install aspell
          python -m pip install --upgrade pip
          pip install -r ./IGPageContentValidator/requirements.txt
      - name: execute relToAbsLinks.py
        run: INPUT_STORE=${{ github.event.inputs.websiteurl }} python ./IGPageContentValidator/relToAbsLinks.py

      - name: Execute Spell Check
        run: cat OutputLinks.txt | while read p; do wget -nv -O - $p | aspell list -H --camel-case --lang en_GB --add-html-skip=nocheck -p ./IGPageContentValidator/.aspell.en.pws |sort| uniq -c; echo -e '\n'; done;
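
The spell check step ends by piping the words reported by `aspell list` through `sort | uniq -c`, which yields one line per distinct word with its count. As a rough illustration of that counting stage only, the same tally could be sketched in Python as below. The function name `count_misspellings` is hypothetical and not part of this repository, blank lines are skipped as an assumption, and Python's default string ordering may differ from the shell's locale-dependent `sort`.

```python
from collections import Counter


def count_misspellings(words):
    """Return (count, word) pairs ordered by word, similar to `sort | uniq -c`.

    `words` is any iterable of strings, e.g. the lines printed by `aspell list`.
    Surrounding whitespace is stripped and blank entries are ignored (assumption).
    """
    counts = Counter(w.strip() for w in words if w.strip())
    return [(counts[word], word) for word in sorted(counts)]


if __name__ == "__main__":
    # Hypothetical sample input standing in for aspell's output.
    for count, word in count_misspellings(["teh", "recieve", "teh"]):
        print(f"{count:7d} {word}")
```
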
16 changes: 8 additions & 8 deletions IGPageContentValidator/README.md
@@ -1,6 +1,6 @@
# Simplifier Implementation Guide Page Content Validation

-The validator works by scraping the webpage within website.txt for any internal webpage links within the Simplifier Guide. These webpages are then validated individually.
+The validator works by scraping the webpage for any internal webpage links within the Simplifier Guide. These webpages are then validated individually.

The website validation is in three parts:
- HTML Error Checking - This checks each page for any html errors. This captures any errors caused by using Simplifier relative links, e.g `{{pagelink: }}`, amongst the usual coding errors.
@@ -9,13 +9,13 @@ The website validation is in three parts:

## Instructions

-1. Edit the file `website.txt` ensuring the website you want scraped is entered on the first line. Note: Only Simplifier.net guides will work with this checker.
-2. Click the `Actions` button. the top 3 actions will be the individual checkers needed. Wait until there is a green tick next to each.
-3. Within each Action click the `Build` button
-4. Within the Build click the following for the results:
-   - HTML Error Check
-   - Link Check
-   - Spell Check
+1. Go to [Actions..websiteChecker](https://github.com/NHSDigital/IOPS-FHIR-Test-Scripts/actions/workflows/websiteChecker.yml)
+2. Click `Run workflow`.
+3. Enter the website url into the `websiteurl` box and click `Run workflow`.
+4. Click on the action and then click on the following for the results:
+   - html error checker
+   - link checker
+   - spell checker

## HTML Error Checking
Uses the errorChecker.py script. Checks for any html errors on a website using BeautifulSoup's `find_all('div',{'class':"error"})`. This returns the errors for each individual page.
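
To make the idea concrete, here is a minimal sketch of collecting the text of every `<div>` whose class list includes `error`. It is not the code in errorChecker.py: it deliberately swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages, and the names `ErrorDivFinder` and `find_error_divs` are hypothetical. Unlike `find_all`, it reports an error div nested inside another error div only once, as part of the outer one.

```python
from html.parser import HTMLParser


class ErrorDivFinder(HTMLParser):
    """Collects the text of every <div> whose class list contains "error"."""

    def __init__(self):
        super().__init__()
        self.errors = []    # text of each completed error div
        self._depth = 0     # div nesting depth inside the current error div
        self._buffer = []   # text fragments of the current error div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # Already inside an error div: track nesting so we know when it ends.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "error" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.errors.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def find_error_divs(html):
    """Return the text content of each top-level error div in `html`."""
    finder = ErrorDivFinder()
    finder.feed(html)
    finder.close()
    return finder.errors
```

Called on the HTML of a single guide page, `find_error_divs` returns a list with one string per error block, which is empty when the page rendered cleanly.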
4 changes: 2 additions & 2 deletions IGPageContentValidator/linkScraper.py
@@ -5,9 +5,9 @@

from bs4 import BeautifulSoup # this module helps in web scrapping.
import requests # this module helps us to download a web page
+import os

-with open('./IGPageContentValidator/website.txt', 'r') as file:
-    data = file.readline().strip('\n')
+data = os.environ['INPUT_STORE']

'''returns html page of link within website.txt'''
def RequestData(url):
7 changes: 0 additions & 7 deletions IGPageContentValidator/website.txt

This file was deleted.
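
With website.txt removed, the scripts receive the target URL through the `INPUT_STORE` environment variable, which the workflow sets from the `websiteurl` input. The commit reads it directly with `os.environ['INPUT_STORE']`; the sketch below shows one possible, slightly more defensive way to read it. The helper name `read_target_url` and the validation rules are assumptions for illustration and are not part of this commit.

```python
import os
from urllib.parse import urlparse


def read_target_url(environ=os.environ):
    """Return the guide URL from INPUT_STORE, or raise ValueError if unusable.

    `environ` defaults to the process environment; passing a plain dict
    makes the function easy to exercise without touching real variables.
    """
    url = environ.get("INPUT_STORE", "").strip()
    if not url:
        raise ValueError("INPUT_STORE is not set")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"INPUT_STORE is not an http(s) URL: {url!r}")
    return url
```

Locally, the same variable can be set on the command line in the way the workflow does it, for example `INPUT_STORE=<guide url> python ./IGPageContentValidator/errorChecker.py`.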
