
Releases: Pometry/Raphtory

v0.3.1

10 May 13:00
91ae71c

Raphtory 0.3.0 Release - In Rust we trust 🦀

Thanks to a whole bunch of feedback on our Python alpha, over the last 6 months Raphtory has been through its second full makeover, with a total rewrite of the core engine in Rust! There are WAY too many changes to list here, but below you can see a few highlights.

If you would like to give this new version a go, you can check out the docs as well as our examples, available in both Python and Rust. Good places to start are the Jupyter notebooks for the Reddit SNAP dataset and the Enron emails.

Highlights 🏆

Revamped API

The first thing you will notice about the new Raphtory is that the API is A LOT friendlier. The example notebooks above show this off nicely, but as a snippet, this is how you could grab Gandalf from our Lord of the Rings graph and plot his degree over rolling windows of 1000 time units:

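import matplotlib.pyplot as plt
import seaborn as sns

# Assumes `g` is a Raphtory graph already loaded with the Lord of the Rings dataset
timestamps = []
degree = []
# Slide a window of 1000 time units over Gandalf's history
for gandalf in g.vertex("Gandalf").rolling(window=1000):
    # Record a timestamp for this window and Gandalf's degree inside it
    timestamps.append(gandalf.latest_time())
    degree.append(gandalf.degree())

sns.set_context()
ax = plt.gca()
plt.xticks(rotation=45)
ax.set_xlabel("Time")
ax.set_ylabel("Degree")
sns.lineplot(x=timestamps, y=degree, ax=ax)
plt.show()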
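To make the windowing concrete, here is a pure-Python model of roughly what the loop above computes, written without Raphtory at all. It assumes that `rolling(window=1000)` advances by one full window at a time (non-overlapping windows) and that degree means the number of distinct neighbours with an edge inside the window; it also reports each window's end time rather than the vertex's latest activity. `rolling_degree` and the toy edge list are hypothetical illustrations, not Raphtory's implementation.

```python
def rolling_degree(edges, vertex, window, step=None):
    """Return (window_end, degree) pairs for consecutive windows over `vertex`'s history.

    edges  -- iterable of (src, dst, timestamp) tuples with integer timestamps
    window -- window length in time units; each window covers [start, start + window)
    step   -- how far each window advances (assumed here to default to `window`)
    Degree here means the number of distinct neighbours with an edge in the window.
    """
    step = step or window
    # (neighbour, timestamp) for every edge touching `vertex`.
    incident = [(dst if src == vertex else src, t)
                for src, dst, t in edges if vertex in (src, dst)]
    if not incident:
        return []
    start = min(t for _, t in incident)
    last = max(t for _, t in incident)
    results = []
    while start <= last:
        end = start + window
        neighbours = {n for n, t in incident if start <= t < end}
        results.append((end, len(neighbours)))
        start += step
    return results


# Toy edge list, made up for illustration (not the real Lord of the Rings data).
edges = [("Gandalf", "Frodo", 10), ("Gandalf", "Bilbo", 500),
         ("Frodo", "Gandalf", 1200), ("Sam", "Frodo", 1300),
         ("Gandalf", "Aragorn", 2500)]
print(rolling_degree(edges, "Gandalf", window=1000))  # [(1010, 2), (2010, 1), (3010, 1)]
```

Passing a `step` smaller than `window` gives overlapping windows, which smooths the resulting degree curve at the cost of more windows to evaluate.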

Performance 📈

  • The memory usage of the Rust version is over an order of magnitude lower than that of the Scala version: one test graph (~11 million nodes, ~33 million edges) that previously required ~120GB of RAM now needs only 7GB! This means that datasets which required spinning up an EC2 instance or cluster can now easily be run on your laptop 💪🏻
  • Similarly, complex algorithms like Temporal Triadic Motifs run in seconds instead of hours - even when running locally instead of on a beefy box 🍖
  • Querying for specific vertices/edges runs 1000x faster and can be done directly on the graph via the new APIs instead of having to write a full algorithm.
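As a quick sanity check on the memory figures quoted above, using only those numbers and taking 1 GB = 10^9 bytes (the second ratio is an average only and says nothing about how memory is actually split between nodes, edges and their histories):

```latex
\frac{120\,\text{GB}}{7\,\text{GB}} \approx 17.1 > 10,
\qquad
\frac{7 \times 10^{9}\ \text{bytes}}{(11 + 33) \times 10^{6}\ \text{nodes and edges}}
  \approx 159\ \text{bytes per node or edge}
```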

Python 🐍

  • Raphtory can now be easily installed with all dependencies via pip install raphtory.
  • Via PyO3, Raphtory can now be run and called directly in Python without secretly running a Java Runtime Environment in the background 🥷. This also means it now works perfectly with all other Python libraries!
  • The vast majority of the heavy lifting is done in Rust, meaning you get all the ease of Python with very little slowdown!

What's next ⏭️

We are currently tinkering away with several parts of the new APIs, adding more helper functions, and expanding the algorithmic suite. However, alongside this we have some larger pieces of work in the pipeline:

  • We are building a GraphQL extension to Raphtory, codename flux-capacitor, alongside the functionality for compiling Raphtory to WASM, making it possible to interact with Raphtory through JavaScript/web UIs.
  • We are working on out-of-core analysis, allowing Raphtory to work on graphs many times larger than the memory of the machine it is running on.
  • Finally, we are running a suite of standard benchmarks on this new version to give some concrete numbers that can be compared with other graph systems.

If you have any other suggestions, please do let us know via GitHub issues, discussions or on Slack!

Full Changelog: v0.2.0...v0.3.1

raphtory-scala-0.2.2

24 Mar 14:32
v0.2.2

bumped to v0.2.2

raphtory-scala-0.2.1

13 Feb 13:40

Release v0.2.1

raphtory-scala-0.2.0

07 Feb 12:08
v0.2.0

bumped to v0.2.0

raphtory-scala-0.1.6

19 Jan 15:33
v0.1.6

bumped to v0.1.6

raphtory-scala-0.1.5

09 Jan 14:17
v0.1.5

bumped to v0.1.5

raphtory-scala-0.1.4

12 Dec 10:26
v0.1.4

bumped to v0.1.4

raphtory-scala-0.1.3

09 Dec 11:49
v0.1.3

bumped to v0.1.3

raphtory-scala-0.1.0

22 Jun 10:42
955a277

Raphtory 0.1.0 Release ⏳

Over the last 6 months Raphtory has gone through a full rebuild, with the majority of the original project deprecated as raphtory-akka. This includes replacing the entire underlying tech stack, remaking both the ingestion and analysis APIs, totally reworking local and distributed deployment, and adding a host of wonderful bells and whistles to boot. As such, we have had a bit of a rebrand and are considering this the official Raphtory 0.1.0 release!

Below is a short summary of the new features that have been introduced. We will soon be following up with individual blog posts on each. You can also find an in-depth dive in our documentation.

Thanks to everyone who helped bring this together! Looking forward to the many exciting things we have planned for the future of Raphtory 🚀 ✨

Getting data in and out of Raphtory 📥 📤

  • Spouts and Graph Builders have been rewritten to be more flexible.
    • They now interact via any serialisable class instead of just strings.
    • Spouts now extend Iterable, requiring only next() and hasNext() functions, which makes it far easier to bring in data from non-standard sources.
  • Output of algorithms is now handled by the Sink and Format interfaces. These define how to output your results to a location and the serialised format this should take.
    • This allows code that connects to, say, AWS S3 to be defined once and used in combination with a format for CSV, TSV, JSON, XML, etc.
    • This also enables us to define global formats that span all perspectives, i.e. a single valid JSON object for your whole query instead of individual JSON objects per vertex as before.
  • We have added a new connectors sub-project to the repository containing all the different Spouts and Sinks we support out of the box. These can be imported into your project as required and support a variety of things from AWS S3 to the Twitter Firehose.
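The point of splitting output into a Sink (where results go) and a Format (how they are serialised) is that each can be written once and then mixed freely. Below is a small Python analogue of that idea, plus a Spout modelled as a plain iterator. Raphtory 0.1.0's actual Spout, Sink and Format are Scala interfaces, and every class and function here (`LineSpout`, `CsvFormat`, `JsonFormat`, `MemorySink`, `write_results`) is hypothetical.

```python
from __future__ import annotations

import csv
import io
import json
from typing import Iterable, Iterator, Protocol


class LineSpout:
    """Spout-like source: yields one cleaned record at a time from any iterable of lines.

    Python's iterator protocol (__iter__/__next__) plays the role of next()/hasNext().
    """

    def __init__(self, lines: Iterable[str]):
        self._it = iter(lines)

    def __iter__(self) -> Iterator[str]:
        return self

    def __next__(self) -> str:
        return next(self._it).strip()  # StopIteration ends the stream


class Format(Protocol):
    def serialise(self, rows: list[dict]) -> str: ...


class Sink(Protocol):
    def write(self, payload: str) -> None: ...


class CsvFormat:
    def serialise(self, rows: list[dict]) -> str:
        if not rows:
            return ""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()


class JsonFormat:
    def serialise(self, rows: list[dict]) -> str:
        # One JSON document for the whole result set, i.e. a "global" format.
        return json.dumps(rows)


class MemorySink:
    """Stand-in for a real destination such as a file or an S3 bucket."""

    def __init__(self):
        self.written: list[str] = []

    def write(self, payload: str) -> None:
        self.written.append(payload)


def write_results(rows: list[dict], fmt: Format, sink: Sink) -> None:
    # Any Format can be combined with any Sink without either knowing about the other.
    sink.write(fmt.serialise(rows))


records = list(LineSpout(["a,b,1\n", "b,c,2\n"]))
print(records)  # ['a,b,1', 'b,c,2']

rows = [{"vertex": "a", "degree": 2}, {"vertex": "b", "degree": 1}]
sink = MemorySink()
write_results(rows, JsonFormat(), sink)
write_results(rows, CsvFormat(), sink)
```

Adding a new destination means writing one more Sink; every existing Format works with it immediately, and vice versa.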

Graph and Algorithmic Engine 🧮

  • The main algorithmic flow operators, step(), iterate(), select() and tabularise(), can now be performed in any sequence.
  • Algorithms may be composed together with the -> operator.
  • Small amounts of global graph state can be stored within the algorithmic flow as aggregators such as sum, product, min, max, any and all.
  • Histogram API for storing distributions of vertex and edge quantities for algorithms and extracting quantiles.
  • Introduction of vertex and edge filters for creating subgraph views of a perspective.
  • Many new algorithms including three-node motifs, temporal taint tracking, max flow, prisoners' dilemma and more.
  • Temporal Multilayer View for modelling a temporal graph as a multilayer graph of snapshots with interlayer edges.
  • Support and convenience functions for weighted networks and merging strategies for converting multiple temporal edges into a single weighted edge within a perspective.
  • A host of convenience functions within the Vertex and Edge visitor objects, enabling cleaner algorithm code.
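To illustrate the idea of merging strategies for weighted networks, the sketch below collapses every temporal update of an edge inside a time range into a single weight. The strategy names (`sum`, `count`, `earliest`, `latest`) and the `merge_edges` helper are hypothetical choices for illustration, not the names or signatures Raphtory uses.

```python
from collections import defaultdict

# Hypothetical merge strategies: each reduces the (timestamp, weight) updates
# recorded for one edge inside the perspective to a single weight.
MERGE_STRATEGIES = {
    "sum": lambda updates: sum(w for _, w in updates),
    "count": lambda updates: len(updates),
    "earliest": lambda updates: min(updates, key=lambda u: u[0])[1],
    "latest": lambda updates: max(updates, key=lambda u: u[0])[1],
}


def merge_edges(temporal_edges, start, end, strategy="sum"):
    """Collapse (src, dst, timestamp, weight) events with start <= timestamp < end
    into one weighted edge per (src, dst) pair."""
    merge = MERGE_STRATEGIES[strategy]
    grouped = defaultdict(list)
    for src, dst, t, weight in temporal_edges:
        if start <= t < end:
            grouped[(src, dst)].append((t, weight))
    return {pair: merge(updates) for pair, updates in grouped.items()}


events = [("alice", "bob", 1, 5.0), ("alice", "bob", 4, 2.0),
          ("bob", "carol", 3, 1.0), ("alice", "bob", 12, 9.0)]
print(merge_edges(events, 0, 10, "sum"))     # {('alice', 'bob'): 7.0, ('bob', 'carol'): 1.0}
print(merge_edges(events, 0, 10, "latest"))  # {('alice', 'bob'): 2.0, ('bob', 'carol'): 1.0}
```

Edges are keyed by ordered (src, dst) pairs, so this treats the graph as directed; an undirected variant could key on `frozenset((src, dst))` instead.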

Graph perspective API 🔍

  • More flexible ways of time slicing for expressing temporal queries using function composition. (see the documentation page for a full description of the new API on this).
  • pointquery(), rangequery() and livequery() have been renamed to at(), climb() and depart() respectively.
  • Support for natural language time descriptors on top of existing long timestamp specification. E.g. 25 June 2022, windowsize = 1 day.
    • Increments and windows may be composed of multiple time frames with commas and 'and' to allow natural text. For instance, the interval "1 month 1 week 3 days" can equally be rewritten as "1 month, 1 week, and 3 days"
  • Full handling of time including leap years, different month lengths and time zones.
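The natural-language intervals described above need two pieces: a parser that tolerates commas and "and", and date arithmetic that respects month lengths and leap years. The Python sketch below shows one way to do both with the standard library. It is not Raphtory's (Scala) parser: `parse_interval`, `add_months` and `add_interval` are hypothetical names, and time zones are deliberately left out (naive datetimes only).

```python
import calendar
import re
from datetime import datetime, timedelta

UNITS = {"year", "month", "week", "day", "hour", "minute", "second"}
TOKEN = re.compile(r"(\d+)\s+([a-z]+)")


def parse_interval(text):
    """Parse e.g. "1 month, 1 week, and 3 days" into {"months": 1, "weeks": 1, "days": 3}."""
    text = text.lower()
    parts = {}
    for amount, unit in TOKEN.findall(text):
        singular = unit[:-1] if unit.endswith("s") else unit
        if singular not in UNITS:
            raise ValueError(f"unknown time unit: {unit!r}")
        key = singular + "s"
        parts[key] = parts.get(key, 0) + int(amount)
    # Whatever is left after removing "<n> <unit>" pairs may only be commas and "and".
    leftover = TOKEN.sub(" ", text).replace(",", " ").split()
    if not parts or any(word != "and" for word in leftover):
        raise ValueError(f"cannot parse interval: {text!r}")
    return parts


def add_months(dt, months):
    """Move dt by whole calendar months, clamping the day to the target month's length."""
    index = dt.month - 1 + months
    year, month = dt.year + index // 12, index % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])  # handles Feb 28/29
    return dt.replace(year=year, month=month, day=day)


def add_interval(dt, parts):
    """Apply calendar units (years, months) first, then fixed-length units."""
    dt = add_months(dt, 12 * parts.get("years", 0) + parts.get("months", 0))
    return dt + timedelta(weeks=parts.get("weeks", 0), days=parts.get("days", 0),
                          hours=parts.get("hours", 0), minutes=parts.get("minutes", 0),
                          seconds=parts.get("seconds", 0))


interval = parse_interval("1 month, 1 week, and 3 days")
print(interval)                                       # {'months': 1, 'weeks': 1, 'days': 3}
print(add_interval(datetime(2022, 6, 25), interval))  # 2022-08-04 00:00:00
print(add_months(datetime(2024, 1, 31), 1))           # 2024-02-29 00:00:00
```

Months and years are applied before the fixed-length units because their length depends on where in the calendar you start; a week or a day is always the same number of seconds in this naive, time-zone-free model.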

Raphtory Internals ⚙️

  • Raphtory now runs predominantly on top of Apache Pulsar, with Akka being used for analysis control messages.
    • This means that all components are decoupled and can message each other without fear of data being lost or of a crash caused by huge amounts of backpressure.
    • This messaging is built on top of a communication layer abstraction which allows the medium for each topic (communication between two component types) to be set in the configuration. In later versions of Raphtory this will be exposed to the user, allowing Raphtory to run on other message brokers/technology stacks.
  • Cluster management is now handled by ZooKeeper, which provides a central location to track partition IDs and store addresses for service discovery.
  • We have begun a transition into Typelevel Cats 😻 for better state management and execution. This will be finalised in the next version.
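The communication layer abstraction described above can be pictured as a router: components publish and subscribe by topic, and a configuration mapping decides which medium carries each topic. The Python sketch below shows that shape with two toy in-process media standing in for real brokers; it uses neither Pulsar nor Akka, and the class names and topic names are hypothetical.

```python
from collections import defaultdict, deque


class DirectTransport:
    """Toy medium: hands each message to its subscribers immediately."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


class QueuedTransport:
    """Toy medium: buffers messages per topic until drain() is called,
    loosely standing in for a persistent broker."""

    def __init__(self):
        self._queues = defaultdict(deque)
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        self._queues[topic].append(message)

    def drain(self):
        for topic in list(self._queues):
            queue = self._queues[topic]
            while queue:
                message = queue.popleft()
                for handler in self._subscribers[topic]:
                    handler(message)


class CommsLayer:
    """Components only talk to this layer; a config mapping decides which
    medium carries each topic, so media can be swapped without touching them."""

    def __init__(self, config, media):
        self._route = {topic: media[name] for topic, name in config.items()}

    def subscribe(self, topic, handler):
        self._route[topic].subscribe(topic, handler)

    def publish(self, topic, message):
        self._route[topic].publish(topic, message)


media = {"direct": DirectTransport(), "queued": QueuedTransport()}
# Hypothetical topic names mapped to a medium, as a per-topic configuration would do.
config = {"spout-to-builder": "queued", "query-control": "direct"}
comms = CommsLayer(config, media)

received = []
comms.subscribe("spout-to-builder", received.append)
comms.subscribe("query-control", received.append)
comms.publish("spout-to-builder", "edge a->b @ 1")
comms.publish("query-control", "start query 42")  # delivered straight away
media["queued"].drain()                            # now the buffered edge arrives
print(received)  # ['start query 42', 'edge a->b @ 1']
```

Because the components only see `publish` and `subscribe`, switching a topic to a different medium is a one-line change to `config` rather than a code change in every component.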

Deployment :shipit:

  • Raphtory has a new deployment API where local deployments are created through either Raphtory.load() for closed datasets or Raphtory.stream() for streaming datasets.
    • This returns a Temporal Graph where queries can be built up in a much more expressive fashion as explored above.
  • The Raphtory Service for distributed deployments has been fully overhauled, allowing all components to be easily spun up on bare-metal or as a container. For those wanting to give this a try, the process is fully documented on our ReadTheDocs page described below.
  • To support automation and large scale deployments Raphtory is fully integrated with Kubernetes.
    • We have even created a Deploy sub-project in the repo that will allow you to automatically spin up and shut down Raphtory components via fabric8.
  • Whether you are running a local deployment or a distributed deployment you can now spin up a Client which can attach and submit new queries, with the results output to any Sink specified.
    • This client interacts with Raphtory through the same TemporalGraph API as the local deployments.
  • Raphtory 0.1.0 is now available on Maven, meaning you no longer need to build any of the jars. This includes the core, connectors and deploy packages.

Testing, Logging and Metrics 📈

  • Testing suite for algorithmic correctness and integration of all Raphtory components.
  • CI/CD pipeline for all PRs to Raphtory, complete with testing and building of core, connectors and the examples.
  • The scourge of println statements has been replaced with logging throughout all packages, appropriately levelled and configurable through environment variables.
  • Metrics have been re-enabled and expanded, encompassing the full ingestion and analysis pipeline. These are handled by Telemetry and scraped via Prometheus.

Documentation and Examples 📖

  • Getting set up with Raphtory, understanding the underlying frameworks and creating your own projects on top is fully explained in our new ReadTheDocs tutorials.
  • Several example projects are now available within the repository covering social networks, cryptocurrencies, interaction networks and more. All of these can be used as a basis for your own applications.
  • Every user facing function is fully commented and searchable via ScalaDocs.
  • All algorithms included as part of Raphtory core have their purpose and parameters fully explained in the documentation.

Python Raphtory Client (Alpha) 🐍 👷

  • A Raphtory client for Python has been created for submitting established queries to a running Raphtory instance, meaning it is not necessary to interact with the Scala code after the initial setup, if preferred.
  • Algorithm results can be output to this client for postprocessing and visualisation in Python.
  • This is a new feature and will become more established with functionality and stability in the next release.

Bug fixes 🐛

  • Far too many to list here due to the full rewrite. But trust me, there were a lot.

raphtory-akka-0.4.1

29 Mar 16:43
63a9d2b
raphtory-akka-0.4.1 (Pre-release)

Prior to the rebuild of Raphtory on top of Pulsar, several fixes and updates were made to the previous 0.4.0 version.

Primarily these changes consisted of:

  • Porting 0.4.0 to the new composable analysis API.

  • Updating the algorithms to fit the new functions.

  • Minor refactors and bug fixes.