The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
It runs on Windows, Linux, and macOS.
SolrWayback bundle version 5+ requires Java 11 or Java 17 and no longer runs under Java 8. Tomcat has been upgraded from version 8.5 to version 9 and Solr from version 7 to version 9.
The SolrWayback webapp is backwards compatible with a Solr 7 index. If you have a large index built under Solr 7, just keep your Solr 7 installation and do not use the new Solr 9 folder.
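The Java requirement above can be checked before starting the bundle. The following is a minimal sketch, not part of the bundle: the function names are hypothetical, and it assumes that `java -version` prints a line containing `version "..."`, as OpenJDK and Oracle builds do.

```shell
# Extract the quoted version string from `java -version` (printed on stderr).
# Assumption: the output contains a line such as: openjdk version "17.0.9" ...
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Map a version string to its major version:
# legacy scheme "1.8.0_392" -> 8, modern scheme "11.0.21" -> 11, "17" -> 17.
java_major() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
    *)   printf '%s\n' "$1" | cut -d. -f1 ;;
  esac
}

# Succeed only for the Java versions named in these release notes (11 or 17).
java_ok_for_bundle() {
  case "$(java_major "$1")" in
    11|17) return 0 ;;
    *)     return 1 ;;
  esac
}
```

For example, `java_ok_for_bundle "$(installed_java_version)" || echo "SolrWayback bundle 5+ needs Java 11 or 17"` prints a warning when the Java found on the PATH is not one of the two versions listed above.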
Download: https://github.com/netarchivesuite/solrwayback/releases/download/5.2.1/solrwayback_package_5.2.1.zip
How to install:
Unzip the bundle and read the 'install guide' section in the README.md file in the root of the zip file.
Solr must now be started with a -c (for cloud) argument:
solr-9/bin/solr start -c -m 4g
How to upgrade from a previous version:
Replace the solrwayback folder with the new folder, but keep the solr7 folder if you have already built an index and do not want to reindex.
Compare the properties in solrwayback.properties and solrwaybackweb.properties with yours and add any new properties that are missing from your files.
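The property comparison can be done by hand, or with a small helper along these lines. This is only a sketch: the function names and the paths in the usage note are hypothetical, and it assumes simple `key=value` lines with `#` or `!` comment lines (keys separated by `:` or continued across lines are not handled).

```shell
# Print the sorted, unique property keys of a .properties file.
# Comment lines are dropped; the key is the text before the first '='.
prop_keys() {
  sed -n -e '/^[[:space:]]*[#!]/d' \
         -e 's/^[[:space:]]*\([^=[:space:]]*\)[[:space:]]*=.*/\1/p' "$1" | sort -u
}

# List keys that exist in the new file but are missing from your file.
# Usage: missing_props NEW_FILE YOUR_FILE
missing_props() {
  yours=$(mktemp)
  prop_keys "$2" > "$yours"
  # -F fixed strings, -x whole-line match, -v keep only keys not found in yours.
  prop_keys "$1" | grep -F -x -v -f "$yours" || true
  rm -f "$yours"
}
```

A call such as `missing_props new_bundle/solrwayback.properties my_config/solrwayback.properties` (both paths are placeholders) prints one missing key per line; run it again for solrwaybackweb.properties.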
Changelog:
See https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md
Changes since the 5.0.0 release:
5.2.1
- Fixed memento null-pointer for revisits.
5.2.0
- Upgraded solr dependencies from v9.1.0 to v9.4.1
- HTML pages with a geo tag will no longer be found in image geo search.
- Fixed a Gephi export regression bug: not all results were extracted because the Gephi export was also limited by the CSV export size limit in the property file.
- Added a SolrWayback ASCII logo to the log file on successful startup.
- Added support for the Memento API, including timegates and timemaps. Memento properties added to solrwayback.properties (thanks @VictorHarbo).
- Two new Memento properties added in solrwayback.properties. The same default values are used if they are not defined in the property file.
- Removed Jetty 'mvn jetty:run' as a development option and switched to 'mvn cargo:run', which starts a Tomcat instead. Routing was not working in Jetty. See README.md for details on how to use it.
- Upgraded from the deprecated HttpSolrClient to HttpJdkSolrClient, which is compatible with HTTP/1 and HTTP/2.
- Added a download button to the n-gram toolbox for downloading data in CSV format.
5.1.2
- Bug fix: chunking was not removed in all cases. This was only relevant for WARC files created with chunking (not Heritrix).
- The Dockerfile has been updated to build SolrWayback bundle 5.1.0 (it will be upgraded with each release). See #456, implemented by @c-vandendyck-kbr.
- Geo search was not working for Solr 9.4 in cloud mode. A rewrite of the Solr function query syntax was required; the new syntax is also backwards compatible with Solr 7.
5.1.1
- Minor cleanup of log messages caused by shard splitting, to avoid repeated stack traces.
- Temporary fix for a Solr 9 bug caused by invalid JSON from Solr. See #449.
5.1.0
- Substantial speed-up when exporting (CSV, WARC, etc.) from large multi-sharded collections. See #329 (thanks Toke Eskildsen). This feature still needs a little more testing; feedback is welcome.
- Minor tweaking of info/debug logging. Fewer log lines in the default solrwayback.log when running with log level INFO.
- Fixed a regression bug where "page resources" did not show missing resources for the webpage.
- Updated the bundle install documentation. Added a new section on how to redeploy the Solr configuration.
5.0.0
- Upgraded Java 1.8 → 11, Tomcat 8.5 → 9 and Solr 7 → 9. SolrWayback 5.0.0 is backwards compatible with existing Solr 7 installations.
- Better guide for using the start and stop scripts.
- Fixed CSV/JSON export when more than one facet was selected (regression bug... sorry).
- warc-indexer now also finds ARC files when searching recursively (thanks to @fedorw).
- Frontend third-party dependencies updated.