This package, as the name suggests, will become a “sort of” port of ggplot2 for Nim.
It is based on the ginger package.
It is still in a heavy WIP state, but the features mentioned below are at least working. However, some specific use cases of them may be broken. The implementation of features may still change dramatically.
On the plus side, thanks to Nim’s macro system even formula creation from raw identifiers is possible. See below for more details.
For simple examples of some of the currently existing features, see Examples below.
If you’re unfamiliar with the Grammar of Graphics to create plots, one of the best resources is probably Hadley Wickham’s book on ggplot2, for which an online version also exists at:
https://ggplot2-book.org/
In general this library tries (and will continue to do so) to stay mostly compliant with the ggplot2 syntax. So searching for a solution in ggplot2 should hopefully be applicable to this (unless the feature isn’t implemented yet of course).
For a more nimish approach, check out the recipes, which should give you examples for typical use cases and things I encountered and the solutions I found. Please feel free to add examples to this file to help other people!
The documentation is found at:
https://vindaar.github.io/ggplotnim
Installation should be just a
nimble install ggplotnim
away. Maybe consider installing #head (nimble install ggplotnim@#head), since a new version probably won’t be released after every change while rapid development is still ongoing.
Since this library is written from scratch there is only a single external dependency, which is cairo.
Geoms:
- geom_point
- geom_line
- geom_histogram
- geom_freqpoly
- geom_bar
Facets:
- facet_wrap
Scales:
- size (both for discrete and continuous data)
- color (both for discrete and continuous data)
Shape as a scale is not properly implemented, simply because ginger only provides two marker shapes (circle, cross) so far. Feel free to add more!
The library implements a naive dynamic and column based data frame. Each column is represented as a persistent vector of Values. A Value is a variant object, similar to a JsonNode of the standard library.
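To illustrate what such a variant object can look like, here is a minimal sketch using only plain Nim. The type below is a simplified, hypothetical stand-in and not the library's actual Value definition; the names ValueKind, toValue and the field names are assumptions made for this example, and a plain seq is used in place of the persistent vector.

```nim
type
  ValueKind = enum
    vkInt, vkFloat, vkString

  # Hypothetical, simplified stand-in for the library's `Value` type.
  Value = object
    case kind: ValueKind
    of vkInt: intVal: int
    of vkFloat: floatVal: float
    of vkString: strVal: string

# Constructors for each supported kind (names are assumptions).
proc toValue(x: int): Value = Value(kind: vkInt, intVal: x)
proc toValue(x: float): Value = Value(kind: vkFloat, floatVal: x)
proc toValue(x: string): Value = Value(kind: vkString, strVal: x)

# String conversion dispatches on the runtime kind of the value.
proc `$`(v: Value): string =
  case v.kind
  of vkInt: $v.intVal
  of vkFloat: $v.floatVal
  of vkString: v.strVal

# A column can then hold dynamically typed entries in a single container.
let column = @[toValue(1), toValue(2.5), toValue("compact")]
for v in column:
  echo v.kind, ": ", v
```

The price of this flexibility is a runtime check on the kind for every access, which is one reason why performance is not a priority of such a design.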
NOTE: Due to the dynamic nature and naive implementations performance is not a priority. Heavy calculations should be done before creation of the data frame. Simple arithmetic, filtering, reducing etc. is the main aim.
The data frame provides the “5 verbs” of dplyr. Implemented functions:
- filter
- mutate, transmute
- select, rename
- arrange
- summarize
and also group_by, which are all based on the FormulaNode object. Basically they all receive varargs[FormulaNode], which is evaluated in the context of the given data frame.
Creation of a FormulaNode can be done in two ways. The first is directly via untyped templates acting on +, -, *, /, ~. Using the mpg data set as an example:
let f = displ ~ hwy / cty
would describe the dependence of the displacement (displ) on the ratio of the highway to the city mpg.
Echoing this formula prints it as a Lisp-like tree:
(~ displ (/ hwy cty))
Note that the ~ in the untyped templates always acts as the root node of the resulting tree. The LHS of it is always considered the dependent quantity.
In these templates however, the identifiers are converted to strings and must match the names in the data frame!
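The tree structure behind the printed formula above can be sketched in a few lines of plain Nim. This is a simplified illustration and not the library's FormulaNode; the names Node, NodeKind, col and term are hypothetical and exist only in this example.

```nim
type
  NodeKind = enum
    nkTerm, nkColumn

  # Hypothetical formula node: either a binary term or a column name.
  Node = ref object
    case kind: NodeKind
    of nkTerm:
      op: string
      lhs, rhs: Node
    of nkColumn:
      name: string

proc col(name: string): Node = Node(kind: nkColumn, name: name)

proc term(op: string, lhs, rhs: Node): Node =
  Node(kind: nkTerm, op: op, lhs: lhs, rhs: rhs)

# Print the tree in prefix notation: operator first, then both children.
proc `$`(n: Node): string =
  case n.kind
  of nkColumn: n.name
  of nkTerm: "(" & n.op & " " & $n.lhs & " " & $n.rhs & ")"

# `displ ~ hwy / cty` built by hand, with `~` as the root node.
let f = term("~", col("displ"), term("/", col("hwy"), col("cty")))
echo f  # (~ displ (/ hwy cty))
```

In this sketch the tree is built by hand; in the library the untyped templates do this work, which is why the raw identifiers end up as strings.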
The second way to create a FormulaNode is via the f{} macro. This provides a little more flexibility:
let f = f{ "displ" ~ "hwy" / mean("cty") }
Note that here all keys must be explicit strings. Everything that is not a string will be interpreted in the calling scope.
If the identifier is the first element of an nnkCall, e.g. as in mean("cty"), it will be stored in a FormulaNode of kind fkFunction. An fkFunction itself may contain two different kinds of functions, as evident by the implementation:
# storing a function to be applied to the data
fnName: string
arg: FormulaNode
case fnKind*: FuncKind
of funcVector:
  fnV: proc(s: PersistentVector[Value]): Value
  res: Option[Value] # the result of fn(arg), so that we can cache it
                     # instead of recalculating it for every index potentially
of funcScalar:
  fnS: proc(s: Value): Value
We store the name of the function as a string for debugging and echoing. The function must only take a single argument (this may be changed in the future / we may wrap a function with multiple arguments in a template in the future). It can either be a procedure taking a vector of Values, corresponding to a proc working on a whole column as the input (e.g. mean), or a scalar function taking a single Value (e.g. abs). In the latter case the function is applied to each index of the key of the data frame given by arg.
Lifting templates are provided to lift any:
- liftVector[T]Proc: proc (s: seq[T]): T to proc(s: PersistentVector[Value]): Value
- liftScalar[T]Proc: proc (s: T): T to proc(s: Value): Value
where T may be float, int, string.
The PersistentVector is an implementation detail of the data frame at the moment and may be changed back to seq soon.
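The idea of lifting can be sketched as follows. This is not the library's implementation: the real lifting helpers are templates, while this example uses ordinary procs returning closures, a float-only stand-in for Value, and a plain seq instead of PersistentVector. All names in the block (liftScalar, liftVector, toValue, twice, mean) are assumptions made for illustration.

```nim
type
  # Hypothetical, float-only stand-in for the library's `Value` type.
  Value = object
    fnum: float

proc toValue(x: float): Value = Value(fnum: x)

# Lift a scalar proc on floats to a proc working on a single `Value`.
proc liftScalar(fn: proc(x: float): float): proc(v: Value): Value =
  result = proc(v: Value): Value =
    result = toValue(fn(v.fnum))

# Lift a proc working on a whole `seq[float]` to one working on `seq[Value]`.
proc liftVector(fn: proc(s: seq[float]): float): proc(s: seq[Value]): Value =
  result = proc(s: seq[Value]): Value =
    var raw = newSeq[float](s.len)
    for i, v in s:
      raw[i] = v.fnum          # unwrap every entry of the column
    result = toValue(fn(raw))  # wrap the single result again

# Two ordinary procs that know nothing about `Value`.
proc twice(x: float): float = 2.0 * x

proc mean(s: seq[float]): float =
  var total = 0.0
  for x in s:
    total += x
  result = total / s.len.float

let liftedTwice = liftScalar(twice)
let liftedMean = liftVector(mean)

let column = @[toValue(1.0), toValue(2.0), toValue(6.0)]
echo liftedMean(column).fnum    # vector function: one result per column
for v in column:
  echo liftedTwice(v).fnum      # scalar function: applied to each entry
```

The vector variant produces one result for the whole column, which is why caching it (as the res field above does) avoids recomputing it for every index.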
On the other hand, if an identifier is not part of an nnkCall it is interpreted as a variable declared in the calling scope and will be converted to a Value using % and stored as a fkVariable.
Literal integer and float values are also allowed.
Using a lifted vector valued function and local variables as keys and integer values:
let val = 1000
let key = "cty"
let f = f{"cty_norm" ~ "cty" / mean(key) * val}
Using a lifted scalar valued function and local variables as keys and float literal values for a random calculation:
let g = f{"cty_by_2ln_hwy" ~ "cty" / (ln("hwy") * 2)}
The following are just the first plots I reproduced. The mpg dataset being used has to be read via the readCsv proc and be converted to a dataframe via toDf. The file is located at data/mpg.csv in the repository. So the header of all examples below is simply:
import ggplotnim
let mpg = toDf(readCsv("data/mpg.csv"))
where it is assumed the current working directory is the ggplotnim dir.
Simple scatter plot of two quantities "displ" vs. "hwy" of a dataframe.
ggplot(mpg, aes(x = "displ", y = "hwy")) +
  geom_point() +
  ggsave("scatter.pdf")
Note: if the ggsave call is omitted, the return value will be a GgPlot object, which can either be inspected or modified or called upon with ggsave at a later time.
Same scatter plot as above, but with a grouping by a third quantity "class" encoded in the dot color. Also adds a title to the plot.
ggplot(mpg, aes(x = "displ", y = "cty", color = "class")) +
  geom_point() +
  ggtitle("ggplotnim - or I Suck At Naming Things™") +
  ggsave("scatterColor.pdf")
We may now also perform some operations on the data frame, before we plot it. For instance we can filter on a string (or a number) and perform calculations on columns:
mpg.filter(f{"class" == "suv"}) # comparison via `f{}` macro
  .mutate(ratioHwyToCity ~ hwy / cty # raw untyped template function definition
  ) # <- note that we have to use normal UFCS to hand to `ggplot`!
  .ggplot(aes(x = "ratioHwyToCity", y = "displ", color = "class")) +
  geom_point() +
  ggsave("scatterFromDf.pdf")
And eeehm, I guess the legend is broken if we only have a single entry…
In addition we can use locally defined procedures in the f{} macro as well (see above for caveats). For instance we can normalize a column by dividing by the mean:
mpg.mutate(f{"cty_norm" ~ "cty" / mean("cty")}) # divide cty by mean
  .ggplot(aes(x = "displ", y = "cty_norm", color = "class")) +
  geom_point() +
  ggsave("classVsNormCty.pdf")
Note that calculations involving explicit numbers or constants are not supported yet. For that the implementation of FormulaNode must be changed to use Value as well.
A simple histogram of one quantity "hwy" of a dataframe.
ggplot(mpg, aes("hwy")) +
  geom_histogram() +
  ggsave("simpleHisto.pdf")
Same as the histogram above, but as a frequency line.
ggplot(mpg, aes("hwy")) +
  geom_freqpoly() +
  ggsave("freqpoly.pdf")
A combination of a histogram and a frequency line plot. Also showcases the ability to set aesthetics of specific geoms to a constant value (in this case change line width and color of the freqpoly line).
Note that the order in which the geom_* functions are called is also the order in which they are drawn.
ggplot(mpg, aes("hwy")) +
  geom_histogram() +
  geom_freqpoly(color = parseHex("FD971F"),
                size = 3.0) +
  ggsave("histoPlusFreqpoly.pdf")
Although still somewhat ugly, because the scaling is off, facet wrapping is working in principle:
ggplot(mpg, aes("displ", "hwy")) +
  geom_point(aes(color = "manufacturer")) +
  facet_wrap(~ class) +
  ggsave("facet_wrap_manufacturer.pdf")
A simple bar plot of a variable with discrete data (typically a column of strings, bools or a small subset of ints).
ggplot(mpg, aes(x = "class")) +
  geom_bar() +
  ggsave("bar_example.pdf")
From the beginning one of my goals for this library was to provide not only a Cairo backend, but also to support Vega-Lite (or possibly Vega) as a backend. To share plots and data online (and possibly add support for interactive features) is much easier in such a way.
For now only a proof of concept is implemented in vega_utils.nim. That is, only geom_point with the "x", "y", "color" scales set on the main aesthetic is supported. Generalizing this is mostly a tedious process, since the GgPlot object fields etc. have to be mapped to the appropriate Vega-Lite JSON nodes.
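To give an impression of what such a mapping involves, here is a rough sketch that builds a comparable Vega-Lite specification by hand with the standard library's json module. It does not use vega_utils.nim and is not how the library does it; the variable names rows and spec are made up for this example, and the two data rows are written out manually rather than taken from a data frame.

```nim
import std/json

# Two hand-written data rows standing in for a data frame.
let rows = %*[
  {"displ": 1.8, "cty": 18.0, "class": "compact"},
  {"displ": 1.8, "cty": 21.0, "class": "compact"}
]

# Each aesthetic (x, y, color) becomes one entry of the "encoding" node.
let spec = %*{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "width": 640,
  "height": 480,
  "data": {"values": rows},
  "mark": "point",
  "encoding": {
    "x": {"field": "displ", "type": "quantitative"},
    "y": {"field": "cty", "type": "quantitative"},
    "color": {"field": "class", "type": "nominal"}
  }
}

echo spec.pretty
```

Every additional geom, scale or theme setting needs its own translation into nodes like these, which is where the tedious part comes from.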
A simple example:
let vegaJson = ggplot(mpg, aes(x = "displ", y = "cty", color = "class")) +
  geom_point() +
  ggtitle("ggplotnim - or I Suck At Naming Things") +
  ggvega()
show(vegaJson)
creates the equivalent plot from above using Vega-Lite. Note that it still uses the Vega-Lite default theming.
It generates the following Vega-Lite JSON:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "Vega-lite plot created by ggplotnim",
  "width": 640,
  "height": 480,
  "title": "ggplotnim - or I Suck At Naming Things",
  "data": {"values": [
    {"displ": 1.8, "cty": 18.0, "class": "compact"},
    {"displ": 1.8, "cty": 21.0, "class": "compact"},
    {"displ": 2.0, "cty": 20.0, "class": "compact"},
    ...
  ]},
  "mark": "point",
  "encoding": {
    "x": {"field": "displ", "type": "quantitative"},
    "y": {"field": "cty", "type": "quantitative"},
    "color": {"field": "class", "type": "nominal"}
  }
}
And results in the following Vega-Lite plot:
Or if you want to look at the interactive version in your browser, see here:
- customization is very limited (font size, point sizes, line widths etc.). ginger provides the functionality, but it’s not exposed in ggplotnim atm. Extend the Theme object for this, add args to procs where applicable.
- log10 plots force x and y range to be of orders of 10
- facet wrap layout is quite ugly still
- …
- legend is not always centered (easy to fix)
- plots with two legends produce overlapping legends (easy to fix)
- plots with continuous color scale produce no legend