
Installation instructions

Niladree Bhattacharjee edited this page Nov 17, 2015 · 54 revisions

These detailed installation instructions are intended for RHEL6.

Pre-requisites

As outlined in the README, w3act requires Java, Play, and by default a PostgreSQL database to connect to.

Java

Java installation is carried out manually and uses the Oracle jdk-7u45-linux-x64.tar.gz, so our 'java -version' reports "build 1.7.0_45-b18". However, any Java 7 JDK is assumed to be appropriate. This version should be the system default Java.

Alternatively, download the JDK directly with the following commands:

# cd /home/ait/Downloads/
# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F"    "http://download.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz"

Extract the JDK archive (the downloaded file name may carry an AuthParam suffix, as below; adjust it to match your download):

# tar xzf jdk-7u45-linux-x64.tar.gz\?AuthParam\=1389315645_23d791afdb9b9b12eac71a5702d01ed1 

Register this Java version and select it as the default:

# alternatives --install /usr/bin/java java /home/ait/Downloads/jdk1.7.0_45/bin/java 2
# alternatives --config java

Check the result:

# java -version
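The version check can also be scripted, e.g. in a provisioning script. A minimal sketch, assuming the pre-Java-9 banner format (`java version "1.7.0_45"`) that `java -version` prints on standard error; the helper name is illustrative:

```shell
#!/bin/bash
# Sketch: report the Java major version from `java -version` output.
# Assumes the pre-Java-9 format `java version "1.7.0_45"`, printed on stderr.

# java_major_version TEXT: print the major version (e.g. 7) parsed from TEXT,
# or nothing if TEXT does not use the 1.x version scheme.
java_major_version() {
    printf '%s\n' "$1" | sed -n 's/.*version "1\.\([0-9]*\)\..*/\1/p' | head -n 1
}

# Check the installed default (java writes its version banner to stderr).
if command -v java >/dev/null 2>&1; then
    major=$(java_major_version "$(java -version 2>&1)")
    if [ "$major" = "7" ]; then
        echo "Java 7 is the default: OK"
    else
        echo "Default Java major version is '${major:-unknown}', expected 7" >&2
    fi
fi
```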

Play

Play is installed manually, using 'activator-dist-1.3.5.zip' from http://www.playframework.com/download. Note that the Play version used is 2.3.x, and this installation should be the system default.
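For `activator` to be the system default, its directory has to be on the PATH. A minimal sketch, assuming the archive was unzipped to /home/ait/Downloads/activator-dist-1.3.5 (the location used later on this page). Depending on the Activator release, the `activator` script sits in that directory or in its bin/ subdirectory, so ACTIVATOR_BIN is an assumption to check. Add the lines to e.g. ~/.bashrc to make the change permanent:

```shell
#!/bin/bash
# Sketch: put the Activator launcher directory on the PATH.
# ACTIVATOR_BIN is an assumption: the directory that contains the `activator` script.
ACTIVATOR_BIN=${ACTIVATOR_BIN:-/home/ait/Downloads/activator-dist-1.3.5}

# path_prepend DIR: prepend DIR to PATH unless it is already present.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

path_prepend "$ACTIVATOR_BIN"
# Afterwards `command -v activator` should print "$ACTIVATOR_BIN/activator".
```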

Bootstrap

To fetch the bootstrap directory (a Git submodule), run the following from the root of the cloned repository:

# git submodule init
# git submodule update

PostgreSQL

Our PostgreSQL installation follows internal procedures, but in principle:

# yum install http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/pgdg-redhat93-9.3-1.noarch.rpm
# yum install postgresql93-server
# chkconfig postgresql-9.3 on

PostgreSQL needs to be initialised:

# service postgresql-9.3 initdb
# service postgresql-9.3 start
  • this should start PostgreSQL on default port 5432

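PostgreSQL user authentication also needs to be amended. Edit pg_hba.conf in the PostgreSQL data directory (with the PGDG 9.3 packages on RHEL6 this is typically /var/lib/pgsql/9.3/data/pg_hba.conf) so that local connections use 'trust' authentication, which requires no password: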

local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust

After editing, restart the service:

# service postgresql-9.3 restart
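To verify that the edit took effect, a quick check of the active (uncommented) rules can help. This is a sketch; the default path assumes the PGDG 9.3 data directory on RHEL6, so pass another path if your installation differs:

```shell
#!/bin/bash
# Sketch: check that pg_hba.conf contains the two "trust" rules shown above.
# The default path is an assumption (PGDG 9.3 packages on RHEL6).

# has_trust_rules [FILE]: succeed if FILE has an active "local all all trust"
# rule and an active "host all all 127.0.0.1/32 trust" rule.
has_trust_rules() {
    local file=${1:-/var/lib/pgsql/9.3/data/pg_hba.conf}
    # Drop commented-out lines, then compare whitespace-separated fields.
    grep -v '^[[:space:]]*#' "$file" |
        awk '$1=="local" && $2=="all" && $3=="all" && $4=="trust" {l=1}
             $1=="host" && $2=="all" && $3=="all" && $4=="127.0.0.1/32" && $5=="trust" {h=1}
             END {exit !(l && h)}'
}

# Usage (as root): has_trust_rules && echo "pg_hba.conf OK"
```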

Check that PostgreSQL is running on port 5432:

# netstat -ant | grep 5432

Once PostgreSQL is installed, create the user 'training' and then the 'w3act' database, with 'training' as its owner. Run:

$ su - postgres -c "createuser --superuser training"
$ su - postgres -c "createdb --owner=training --username=training w3act"
$ su - postgres -c "psql -c 'grant all on database w3act to training' "
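To confirm the result, the 'training' user can connect over TCP (which exercises the 127.0.0.1/32 rule above) and read the database owner from the system catalog. A minimal sketch using standard psql options (-t tuples only, -A unaligned, -c command); the function names are illustrative:

```shell
#!/bin/bash
# Sketch: check that 'training' can reach the 'w3act' database over TCP
# and that it owns the database.

# db_owner DB USER: print the owner of database DB, connecting as USER.
db_owner() {
    psql -h 127.0.0.1 -p 5432 -U "$2" -d "$1" -tAc \
        "SELECT pg_catalog.pg_get_userbyid(datdba) FROM pg_catalog.pg_database WHERE datname = current_database()"
}

check_w3act_db() {
    local owner
    owner=$(db_owner w3act training) || { echo "cannot connect to w3act as training" >&2; return 1; }
    if [ "$owner" = "training" ]; then
        echo "w3act is reachable and owned by training"
    else
        echo "w3act is owned by '$owner', expected 'training'" >&2
        return 1
    fi
}

# Usage: check_w3act_db
```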

Maxmind

W3ACT also requires a MaxMind GeoIP2 database. This is pre-packaged in this repository, so no further configuration is required.

Whois

Whois lookup is a service that maps domain names to countries. It is pre-packaged in this repository in the "lib" folder, so no further configuration is required. If necessary, installation guidelines can be found in the README.

Crawl Permission Request via E-mail

Sending e-mail via SMTP requires configuration in the project configuration file 'w3act.properties'. The settings are listed below; the // annotations are explanations only and are not part of the file:

# host=0.0.0.0              // The IP address of SMTP connection
# user=Domain\\User         // The username for login in SMTP
# password=1234             // The password for SMTP connection
# [email protected]   // The e-mail address of the sender
# port=25                   // The port for SMTP connection
# server_name=www.webarchive.org.uk // The server name e.g. for crawl permission request
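Because w3act.properties is a Java properties file, only lines starting with # or ! are comments; an inline // note would become part of the value. A sketch that prints the SMTP entries in a clean form, using the placeholder values from the listing above. The sender-address entry is left out because its key is not legible above, and the conf/w3act.properties location is taken from the Troubleshooting section:

```shell
#!/bin/bash
# Sketch: print the SMTP settings as valid properties entries (no inline "//" notes).
# Values are the placeholders from the listing; replace them before use.
print_smtp_settings() {
    cat <<'EOF'
# SMTP host (IP address) and port
host=0.0.0.0
port=25
# SMTP login; a backslash must be escaped as \\ in a properties file
user=Domain\\User
password=1234
# Server name used e.g. in crawl permission requests
server_name=www.webarchive.org.uk
EOF
}

# Usage, from the project root (review the file afterwards for duplicate keys):
#   print_smtp_settings >> conf/w3act.properties
```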

Archiving via rabbitMQ

The queue endpoint settings for the RabbitMQ library are defined in the project configuration file 'w3act.properties':

# queue_host=www.webarchive.org.uk
# queue_port=5762
# queue_name=w3actqueue
# routing_key=w3actroutingkey
# exchange_name=w3actexchange
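Before starting the application it can help to confirm that the queue endpoint accepts TCP connections. A sketch that reads queue_host and queue_port from w3act.properties and probes them with bash's built-in /dev/tcp redirection and coreutils `timeout`; the helper names are illustrative. RabbitMQ's standard AMQP port is 5672, so if the endpoint does not answer, the queue_port value above is worth double-checking:

```shell
#!/bin/bash
# Sketch: read the queue endpoint from a key=value properties file and test
# that a TCP connection can be opened to it.

# prop_value FILE KEY: print the last value assigned to KEY in FILE.
# KEY is used as a regular expression, so keys with dots would need escaping.
prop_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# tcp_reachable HOST PORT: succeed if a TCP connection opens within 5 seconds.
tcp_reachable() {
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Usage, from the project root:
#   host=$(prop_value conf/w3act.properties queue_host)
#   port=$(prop_value conf/w3act.properties queue_port)
#   tcp_reachable "$host" "$port" && echo "queue endpoint reachable"
```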

Setting up a development instance

To create a development instance of the application, clone the sources from the GitHub repository:

# git clone https://github.com/ukwa/w3act.git
# cd w3act

Adjust the code and configuration as necessary:

# change parameters in the configuration files
# create a configuration file for production mode
# modify initial-data.yml to configure roles and permissions
# modify templates.yml to configure templates
# modify users.yml to configure users
# modify contact-persons.yml to configure contact persons
# modify flags.yml to configure the predefined list of flags (a default version is included)
# modify tags.yml to configure open tags

Modify the w3act.properties configuration file parameters (e.g. for Drupal access):

# drupal_user=username
# drupal_password=pwd

The file initial-data.yml is a text file containing initial data for several database tables, e.g. users, roles and permissions.

Installation-specific users and their roles should be defined in the users.yml file, for example:

roles:

- !!models.Role        &sys_admin
    name:              sys_admin
    permissions:       create_roles_and_organisations, create_user, administer_user

- !!models.Role        &archivist
    name:              archivist
    permissions:       create_user, administer_user, administer_collections

users:

- !!models.User
    email:             [email protected]
    name:              Max Mustermann
    password:          secret
    field_affiliation: COM
    role_to_user:      [*sys_admin, *archivist]

The file templates.yml is a text file containing the e-mail templates. Installation-specific templates should be defined in this file, for example:

- !!models.MailTemplate
    name:         General
    type:         Permission Request
    subject:      Our Archive
    placeHolders: url, name
    fromEmail:    [email protected]
    text:         default.txt

where the "text" field defines a path to the file containing the email template text.

The file flags.yml contains the predefined list of attention flags:

- !!models.Flag
    name:         PRIORITY_PERMISSION
    description:  This flag marks priority permission

- !!models.Flag
    name:         PRIORITY_CRAWL_AND_QA
    description:  This flag marks priority crawl and QA

- !!models.Flag
    name:         PRIORITY_QA
    description:  This flag marks priority QA

- !!models.Flag
    name:         QA_ISSUE_APPEARANCE
    description:  This flag marks QA issue appearance

- !!models.Flag
    name:         QA_ISSUE_FUNCTIONALITY
    description:  This flag marks QA issue functionality

- !!models.Flag
    name:         QA_ISSUE_CONTENT
    description:  This flag marks QA issue content

- !!models.Flag
    name:         FOLLOW_UP_PEMISSION
    description:  This flag marks follow up permission

- !!models.Flag
    name:         GENERAL_CHANGE_REQUEST
    description:  This flag marks general change request

The file tags.yml contains the initial set of open tags, for example:

- !!models.Tag
    name:         science
    description:  This site is related to science

- !!models.Tag
    name:         sport
    description:  This site is related to sport

To run the application in development mode, use the run command:

# activator run

or for debugging

# activator debug run

If the sbt cache causes problems, use:

# activator clean-all

or one of the provided scripts:

# ./cleanup.sh - removes previously compiled code and DB data
# ./cleanup-evolutions.sh - removes the evolution SQL for DB table creation/destruction

Switching on/off data import in application.conf

# application.data.import=true|false

Running data import

# ./data_import.sh - found in the project root
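Switching the import flag can also be scripted, e.g. before a deployment. A sketch using GNU sed's in-place editing (`-i`); it assumes the key sits on its own line in the form application.data.import=true or =false, and the function name is illustrative:

```shell
#!/bin/bash
# Sketch: switch data import on or off in a Play configuration file.
# Requires GNU sed for in-place editing.

# set_data_import FILE true|false
set_data_import() {
    local file=$1 value=$2
    case "$value" in
        true|false) ;;
        *) echo "usage: set_data_import FILE true|false" >&2; return 2 ;;
    esac
    # Refuse to silently do nothing if the key is missing.
    grep -q '^[[:space:]]*application\.data\.import[[:space:]]*=' "$file" ||
        { echo "application.data.import not found in $file" >&2; return 1; }
    sed -i "s/^\([[:space:]]*application\.data\.import[[:space:]]*=\).*/\1$value/" "$file"
}

# Example: disable the import before starting the application
#   set_data_import conf/application.conf false
```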

Testing

To run the tests, use:

# cd w3act
# activator test

Various tests are implemented in this project and can be found in the test/ directory:

# Integration tests check the content of the generated web pages and run the application in a browser. For this we also employ Selenium WebDriver, which automatically opens the different W3ACT pages in a browser, starting with the login page.
# Application tests evaluate general functionality and HTML page contents.
# Model tests check the domain model and its connection to the database.
# Additionally, [Travis](https://travis-ci.org/ukwa/w3act/) was set up for automated testing of GitHub submissions to this project.

Setting up a production instance

To separate your development application from production mode, you can create a configuration file for production mode, e.g. conf/prod.conf. In this file you import the values from conf/application.conf and override the fields that should differ in production mode, e.g. the database name or the evolutions flag ("-DapplyDownEvolutions.<database>=true") required to start the production application. For production mode, the database evolution scripts must be in place before starting, e.g. conf/evolutions/default/1.sql. A prod.conf could contain:

# include "application.conf"

# db.default.driver=org.postgresql.Driver
# created database 'w3act' with user 'training'
# db.default.url="postgres://training:(password)@127.0.0.1/w3act"

# applyDownEvolutions.w3act=true
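In the real file the lines must not start with '#', because HOCON treats such lines as comments. A sketch that writes a clean conf/prod.conf; the password is a placeholder. One assumption worth checking: in Play 2.3 the suffix after applyEvolutions. and applyDownEvolutions. is the datasource name configured under db.<name> (here default, which also matches the conf/evolutions/default/ folder), so this sketch uses default rather than w3act:

```shell
#!/bin/bash
# Sketch: create a production config that overrides application.conf.
# CHANGE_ME and the "default" datasource suffix are assumptions.
write_prod_conf() {
    cat > "$1" <<'EOF'
include "application.conf"

# Database 'w3act' owned by user 'training'
db.default.driver=org.postgresql.Driver
db.default.url="postgres://training:CHANGE_ME@127.0.0.1/w3act"

# Apply evolutions (including down evolutions) automatically in production
applyEvolutions.default=true
applyDownEvolutions.default=true
EOF
}

# Usage, from the project root:
#   write_prod_conf conf/prod.conf
```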

Download the project sources from GitHub and use the stage command to prepare your application to be run in place:

# git clone https://github.com/ukwa/w3act.git
# cd w3act

If activator is on your PATH, run:

# activator clean stage

or call it directly from the Activator installation:

# /home/ait/Downloads/activator-dist-1.3.5/w3act/activator clean stage

This cleans and compiles your application and copies it to the target/universal/stage directory. It also creates a start-up script, target/universal/stage/bin/w3act, where 'w3act' is the project's name. Run the generated script:

# target/universal/stage/bin/w3act -Dconfig.file=/home/ait/projects/w3act/conf/prod.conf -Dlogger.file=/home/ait/projects/w3act/conf/prod-logger.xml

or for Windows:

# target/universal/stage/bin/w3act.bat

When running this script you can pass your configuration file as a parameter; the default is application.conf. For production you can use either -Dconfig.file, which takes a file system path, or -Dconfig.resource=prod.conf, which loads the file from the classpath and therefore finds it in the project's conf/ directory. A third option is to load the configuration from a URL, e.g. "-Dconfig.url=http://www.webarchive.org.uk/conf/prod.conf", in which case the file must be served at that URL.
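To avoid retyping these options, the start command can be wrapped in a small function. This is a sketch: STAGE_DIR and CONF_DIR default to the paths used above and are assumptions, and the function only adds the -D options before handing over to the generated start script (extra arguments are passed through):

```shell
#!/bin/bash
# Sketch: start the staged application with production config and logging.
STAGE_DIR=${STAGE_DIR:-target/universal/stage}
CONF_DIR=${CONF_DIR:-/home/ait/projects/w3act/conf}

# start_staged [extra args...]: run bin/w3act with prod.conf and prod-logger.xml.
start_staged() {
    local launcher="$STAGE_DIR/bin/w3act"
    if [ ! -x "$launcher" ]; then
        echo "start script not found: $launcher (run 'activator clean stage' first)" >&2
        return 1
    fi
    "$launcher" \
        -Dconfig.file="$CONF_DIR/prod.conf" \
        -Dlogger.file="$CONF_DIR/prod-logger.xml" \
        "$@"
}

# Usage, from the project root: start_staged
```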

Setting Wayback URL

# application.wayback.url="http://www.webarchive.org.uk/wayback/archive/xmlquery.jsp?url="

Switching off importing accounts/user.yml

# use.accounts=false

Logging configuration

There are two ways to configure logging. The first is to set log levels with the logger keys in your conf/application.conf file. Play defines a default application logger, which is used automatically when you call the default Logger operations.

# Root logger:
logger=ERROR

# Logger used by the framework:
logger.play=INFO

# Logger provided to your application:
logger.application=DEBUG

Another possibility is a logback configuration. The default configuration file (logger.xml) is bundled with Play and, in production mode, defines two appenders: one writing to the standard output stream and one writing to the logs/application.log file. To fully customize logback, create an alternative logback configuration file, e.g. prod-logger.xml, in the conf/ directory of your application. In this file you can specify the logging output, e.g. /var/log/w3act.log:

# <appender name="FILE" class="ch.qos.logback.core.FileAppender">
#     <file>/var/log/w3act.log</file>
#     <encoder>
#         <pattern>%date - [%level] - from %logger in %thread %n%message%n%xException%n</pattern>
#     </encoder>
# </appender>

With the "-Dlogger.file" property you can load a different logback configuration file from the file system, e.g.

# target/universal/stage/bin/w3act -Dconfig.file=/home/ait/projects/w3act/conf/prod.conf -Dlogger.file=/home/ait/projects/w3act/conf/prod-logger.xml
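A complete prod-logger.xml built around the FILE appender above could be generated during deployment as follows. It is a sketch: the logger levels mirror the application.conf example above, and the user running W3ACT must be able to write /var/log/w3act.log:

```shell
#!/bin/bash
# Sketch: generate a logback configuration that logs to /var/log/w3act.log.
# Levels mirror the application.conf example (root ERROR, play INFO, application DEBUG).
write_logger_xml() {
    cat > "$1" <<'EOF'
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>/var/log/w3act.log</file>
    <encoder>
      <pattern>%date - [%level] - from %logger in %thread %n%message%n%xException%n</pattern>
    </encoder>
  </appender>

  <logger name="play" level="INFO" />
  <logger name="application" level="DEBUG" />

  <root level="ERROR">
    <appender-ref ref="FILE" />
  </root>
</configuration>
EOF
}

# Usage, from the project root: write_logger_xml conf/prod-logger.xml
```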

Create a binary distribution

To deploy your application to a server without any dependency on Play itself, use the dist task. It builds a binary version of your application as a ZIP file, target/universal/w3act-1.0.zip, containing all JAR files needed to run it.

# activator dist

On Windows, the start script has a .bat extension. On Linux, the start script must be made executable after the ZIP file is expanded:

$ unzip target/universal/w3act-1.0.zip
$ chmod +x w3act-1.0/bin/w3act

where 'w3act' is the project name. Unzipping creates a w3act-1.0 directory containing a bin/ folder with the start scripts.

For more on creating a binary distribution, see the Play documentation on Production Configuration and Production Distribution, and the sbt native packager documentation.

Deploy a binary distribution

Use the package created above with the generated start script: w3act.bat on Windows or the shell script on Linux. Pass the necessary parameters, e.g. for evolutions:

$ cd w3act-1.0
$ ./bin/w3act -DapplyDownEvolutions.w3act=true

Here w3act in the applyDownEvolutions parameter stands for the database name.

RHEL deployment

The RHEL scripts are located in the conf/sysv folder.

Script create-distribution-package

To deploy the application into the /opt/ folder on RHEL without internet access, use the 'create-distribution-package' script, which documents and wraps the 'play dist' command and is run in the root directory of the project. It creates the RHEL deployment package for the W3ACT project with the following structure:

$     sysv
$     w3act-1.0
$         |_bin
$         |_conf
$         |_lib
$         |_share

The resulting distribution package 'w3act-dist.zip' should contain:

   1. Project sources (a ZIP file resulting from “play dist” command) e.g. w3act-1.0.zip
   2. Configuration files (*.yml, *.conf …) in folder "conf/" of the zip
   3. SysV init scripts in folder "sysv/"

This script should be executed in the root directory of the project. We assume that the zip and unzip programs are installed and that Play is installed in /etc/default/activator-dist-1.3.5.

Main definitions:

$ PROJECT_NAME=w3act
$ VERSION="$PROJECT_NAME"-1.0
$ SOURCES_ZIP="$VERSION".zip
$ PLAY_DIR=/etc/default/activator-dist-1.3.5
$ SOURCE_DIR=target/universal
$ DIST_ZIP="$PROJECT_NAME"-dist.zip
$ SYSV_DIR=sysv

The script first removes the old distribution package. It then builds a binary version of the application with 'play dist', so that it can be deployed to the server without any dependency on Play itself, and extracts the generated sources ZIP. Finally, it adds the SysV init scripts and creates the distribution package as a ZIP. This package can be copied to the /opt folder on RHEL and unzipped there.

$ Usage: ./create-distribution-package
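The steps described above can be sketched as a shell function built on the listed definitions. This is not the project's script: it assumes activator is on the PATH (`activator dist` standing in for 'play dist') and that the SysV scripts live in conf/sysv; the working directory name dist-work is hypothetical:

```shell
#!/bin/bash
# Sketch of the create-distribution-package steps; run from the project root.
PROJECT_NAME=w3act
VERSION="$PROJECT_NAME"-1.0
SOURCES_ZIP="$VERSION".zip
SOURCE_DIR=target/universal
DIST_ZIP="$PROJECT_NAME"-dist.zip
SYSV_DIR=sysv

create_distribution_package() {
    local work="$SOURCE_DIR/dist-work"
    # 1. Remove the previous distribution package and working directory.
    rm -rf "$DIST_ZIP" "$work"
    # 2. Build the binary distribution (target/universal/w3act-1.0.zip).
    activator clean dist || return 1
    # 3. Extract the sources ZIP into a working directory.
    mkdir -p "$work" && unzip -q "$SOURCE_DIR/$SOURCES_ZIP" -d "$work" || return 1
    # 4. Add the SysV init scripts next to w3act-1.0/.
    cp -r conf/sysv "$work/$SYSV_DIR" || return 1
    # 5. Zip sysv/ and w3act-1.0/ into the package in the project root.
    (cd "$work" && zip -qr "../../../$DIST_ZIP" "$SYSV_DIR" "$VERSION") || return 1
    echo "created $DIST_ZIP"
}
```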

Script w3act-rhel-deployment

All services are managed via the SysV scripts in the 'sysv' folder of the distribution package. The SysV init script from the distribution is installed to /etc/init.d/w3act and supports /etc/init.d/w3act [start|stop].

The script 'w3act-rhel-deployment' supports W3ACT RHEL deployment with the default run level 3 (/etc/inittab). The location of the W3ACT project after unzipping of the distribution ZIP 'w3act-dist.zip' should be /opt/w3act-1.0. The structure of the W3ACT project related files under the "opt" folder should be the same as described above.

We assume that PostgreSQL, Java and the Play Framework are already installed in /etc/default. The distribution package contains:

   1. Project sources (a ZIP file resulting from “play dist” command) e.g. w3act-1.0.zip
   2. Configuration files (*.yml, *.conf …) in folder "conf/"
   3. SysV init script in folder "sysv/"
   4. This script, which extracts supporting software and configuration files into the required directories (/etc/init.d, /etc/default/ and /etc/sysconfig)

This script removes the old configuration files and copies the new ones to /etc/sysconfig/w3act/, which isolates installation settings from the code. It then removes the old SysV init script and copies the new one to /etc/init.d. The last step creates the run level link.

$ Usage: ./w3act-rhel-deployment 
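A minimal sketch of these steps, assuming the layout described above (/opt/w3act-1.0 and /opt/sysv after unzipping w3act-dist.zip in /opt) and run level 3. This is not the project's own script; DESTDIR is a hypothetical prefix, empty by default, that lets the steps be rehearsed in a scratch directory:

```shell
#!/bin/bash
# Sketch of the w3act-rhel-deployment steps; run as root after unzipping
# w3act-dist.zip in /opt. DESTDIR is a hypothetical test prefix.
DESTDIR=${DESTDIR:-}
PROJECT_ROOT=/opt/w3act-1.0
SYSV_SRC=/opt/sysv/w3act

deploy_w3act() {
    local conf_dst="$DESTDIR/etc/sysconfig/w3act"
    local init_dst="$DESTDIR/etc/init.d/w3act"
    # 1. Replace the configuration files, kept apart from the code.
    rm -rf "$conf_dst" && mkdir -p "$conf_dst" || return 1
    cp "$DESTDIR$PROJECT_ROOT"/conf/*.conf "$DESTDIR$PROJECT_ROOT"/conf/*.yml "$conf_dst"/ || return 1
    # 2. Replace the SysV init script and make it executable.
    mkdir -p "$DESTDIR/etc/init.d" && rm -f "$init_dst" || return 1
    install -m 755 "$DESTDIR$SYSV_SRC" "$init_dst" || return 1
    # 3. Create the start link for run level 3.
    mkdir -p "$DESTDIR/etc/rc3.d" || return 1
    ln -sf /etc/init.d/w3act "$DESTDIR/etc/rc3.d/S99w3act"
}

# Usage (as root): deploy_w3act
```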

Script w3act

This SysV init script 'w3act' manages the w3act service on RHEL.

  1. If necessary, change the Linux run level, e.g. to 3, with the command

    $ init 3

  2. Copy this script from the /opt/sysv/ folder to the /etc/init.d/ folder using the command

    $ cp /opt/sysv/w3act /etc/init.d/

  3. Make it executable using the command

    $ chmod 755 /etc/init.d/w3act

  4. Create a symlink to /etc/init.d/w3act in the folder for the required run level (/etc/rcN.d), e.g. for run level 3 use the commands

    $ cd /etc/rc3.d
    $ ln -s /etc/init.d/w3act S99w3act

  5. To start the W3ACT application use e.g.

    $ service w3act start

    For application start we provide two parameters:

    a. Database name -DapplyDownEvolutions.<databasename>=true e.g. w3act
    b. Location for the file that contains the process id of the started application e.g. -Dpidfile.path=/var/run/play.pid
    
  6. To stop the W3ACT application use e.g.

    $ service w3act stop

Expected locations for project and for play framework are:

$ PROJECT_ROOT=/opt/w3act-1.0
$ PLAY_DIR=/etc/default/activator-dist-1.3.5
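The steps above can be captured in an init script like the following minimal sketch. It assumes the configuration files were copied to /etc/sysconfig/w3act by w3act-rhel-deployment and that prod.conf is the file to load; the pid file and evolutions parameters follow step 5. The chkconfig header lets `chkconfig --add w3act` create the run-level links instead of step 4. It starts the binary distribution under PROJECT_ROOT, which does not need PLAY_DIR:

```shell
#!/bin/bash
# chkconfig: 345 99 01
# description: W3ACT Play application (sketch)
#
# Minimal SysV init script sketch. CONF_DIR/prod.conf is an assumption.
PROJECT_ROOT=${PROJECT_ROOT:-/opt/w3act-1.0}
CONF_DIR=${CONF_DIR:-/etc/sysconfig/w3act}
PIDFILE=${PIDFILE:-/var/run/play.pid}
EVOLUTIONS_DB=${EVOLUTIONS_DB:-w3act}   # suffix for -DapplyDownEvolutions, as in step 5a

w3act_is_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

w3act_start() {
    if w3act_is_running; then
        echo "w3act is already running (pid $(cat "$PIDFILE"))"
        return 0
    fi
    rm -f "$PIDFILE"   # a stale pid file would make Play refuse to start
    # Play writes its process id to -Dpidfile.path once it is up.
    nohup "$PROJECT_ROOT/bin/w3act" \
        -Dconfig.file="$CONF_DIR/prod.conf" \
        -DapplyDownEvolutions."$EVOLUTIONS_DB"=true \
        -Dpidfile.path="$PIDFILE" >/dev/null 2>&1 &
    echo "w3act starting"
}

w3act_stop() {
    if ! w3act_is_running; then
        echo "w3act is not running"
        rm -f "$PIDFILE"
        return 0
    fi
    local pid
    pid=$(cat "$PIDFILE")
    kill "$pid"
    # Give the JVM up to 30 seconds to shut down cleanly.
    for _ in $(seq 1 30); do
        kill -0 "$pid" 2>/dev/null || break
        sleep 1
    done
    rm -f "$PIDFILE"
    echo "w3act stopped"
}

w3act_status() {
    if w3act_is_running; then echo "w3act is running"; return 0; fi
    echo "w3act is stopped"; return 3
}

# Without arguments the script only defines the functions.
if [ $# -gt 0 ]; then
    case "$1" in
        start)   w3act_start ;;
        stop)    w3act_stop ;;
        restart) w3act_stop; w3act_start ;;
        status)  w3act_status ;;
        *)       echo "Usage: $0 {start|stop|restart|status}" >&2; exit 2 ;;
    esac
    exit $?
fi
```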

Logging

Single-line log output, with a datestamp on each line, is defined in the Play logging configuration.

Password Encryption

For password storage we employ secure hashing with a random salt (PBKDF2), following the method proposed by Taylor Hornby:

$ Password Hashing With PBKDF2 (http://crackstation.net/hashing-security.htm).
$ Copyright (c) 2013, Taylor Hornby
$ All rights reserved.

Git version retrieval

On Linux, run 'get-last-version.sh' at the beginning of deployment. It creates the file 'last-version.txt' in the root of the project, from which the About page retrieves the last version.

Troubleshooting

Problem: Console prints only a few info messages and Drupal data is not imported into the local database

possible cause: the path to PostgreSQL\9.3\bin\psql in cleanup.bat may be wrong and should be adjusted according to the settings on your machine

Problem: Console reports the following Exception:

java.net.UnknownHostException: www.webarchive.org.uk

possible cause: Login information in conf/w3act.properties is not correct

drupal_user=... drupal_password=...

After correcting the login information it may be necessary to wait 30 minutes before another attempt to login can succeed.

Problem: Browser reports:

Configuration error Cannot connect to database [default]

possible causes:

  • database user name or password does not match with information from conf/application.conf
  • database w3act does not exist or is not owned by the database user given in conf/application.conf

Problem: Browser reports:

Unexpected exception NoSuchElementException: key not found: SOURCE

solution: This is a problem rooted in the play framework. Stop the server and start it again.