This repository has been archived by the owner on Nov 30, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 10
/
CHANGES
549 lines (357 loc) · 17.9 KB
/
CHANGES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
3.4.2
- Fixed dependencies.
3.4.1
- Significantly improved performance of HyperBall on graphs with a highly
skewed (e.g., heavy-tailed) outdegree distribution (e.g., transposed web
graphs).
- Fixed wrong estimation of memory used.
- Now ConnectedComponents writes results using "wcc" instead of "scc".
3.4.0
- Fixed problem with obsolete BitSet used to store buckets in Stats.
- New parallel classes to compute geometric centralities and betweenness.
3.3.3
- Regressed to fastutil's quicksort calls in case of array fragments. Java
7's Arrays.sort() has a memory bug that was killing the performance of a
number of methods.
3.3.2
- We now distribute SpeedTest, hoping to improve the quality of benchmarks
in the literature.
3.3.1
- Adapted to new DSI utilities.
3.3.0
- HyperBall sports a new adaptive decomposition scheme that
is based on the number of arcs to be scanned, rather than
on the number of nodes.
- Fixed bug in the computation of the buckets. If you have used the new
iterative implementation of Tarjan's algorithm
(StronglyConnectedComponents) to compute buckets please recompute them.
3.2.1
- New iterative implementation of Tarjan's algorithm.
- HyperBall can now compute Nieminen's centrality.
3.2.0
- New selectable upper bound for EFGraph makes it possible to build
"fake" graphs in whcih successors are greater than or equal to
the number of nodes (this was already possible with BVGraph). Useful
for incremental graph construction.
- New IncrementalImmutableSequentialGraph adapter, which provides an
inversion of control for storing graphs: you supply, one at a time,
the successor list of each node.
3.1.0
- New HyperBall implementation of the HyperANF idea. It computes several
kind of geometric centrality and once in systolic local mode uses time
proportional to the number of edges causing a modification, setting in
practice the expected run time to the theoretical bound O(m log n).
- Now terminal nodes have closeness equal to zero (terminal nodes already
had Lin's centrality equal to one).
- The DecimalFormat object used to print data is has now a fixed US locale.
- New EFGraph implementation (backported from the big version) using the
Elias-Fano representation of monotone sequences. Compression is not so
good, but successor enumeration is blazingly fast and the implementation
returns a skippable iterator which provides constant-time search of
nodes by lower bound.
- Both BVGraph and EFGraph have outdegree caching and exact unwrapping
of successorArray(). This should bring performance improvements.
3.0.9
- We switched to SLF4J for logging.
3.0.8
- Now Webgraph includes an adapter towards the Jung graph analysis
framework. The main method of the class can write immutable graphs
into Pajek format.
3.0.7
- Now ScatteredArcsASCIIGraph accepts a translation function from
node identifiers to node numbers.
3.0.5
- New ImmutableGraph.outdegrees() method that exposes the degrees of a graph
as an IntIterator.
- RandomGraph removed.
- ErdosRenyiGraph and almost all transformed graphs now support copy().
- New Transform.NodeClassFilter.
3.0.2
- A new class performs parallel breadth-first visits. NeighbourhoodFunction
now uses it.
- A new class DoubleSweepFringeDiameter computes heuristically the
diameter of symmetric graphs using parallel breadth-first visits.
- A new class ConnectedComponents computes the connected components of
symmetric graphs using parallel breadth-virst visits.
- Fixed in ScatteredArcsASCIIGraph a bug (inherited from fastutil) that
was generating spurious nodes for graphs with more than 100 million
nodes.
- Moved unit tests to Junit4.
- Fixed bug in mapOffline() that was causing some spurious zero-degree
nodes to be part of the graph if a tail of nodes of the original graph
was erased.
- Fixed rare race condition in HyperANF external executions using less
than 64 registers.
- New method to compute the median of all distances.
- HyperApproximateNeighbourhoodFunction now handles graphs that do not
implement numArcs().
- Major revamp of ImmutableSubgraph, which now uses an additional integer
per supergraph node, but it's much faster.
- New subclass of ImmutabeSubgraph that automatically builds a subgraph
formed by nodes with outdegree in a specified range.
- ErdosRenyiGraph is now an ImmutableGraph.
- New class for checks.
- HyperApproximateNeighbourhoodFunction can now be reused by calling
init(seed).
3.0.1
- We now try to adapt classes from the big version if possible.
- Offset deltas can now be long (in case you have really crazy graphs).
3.0
- WARNING: This release has minor binary incompatibilities with previous
releases, mainly due to the move from the interface
it.unimi.dsi.util.LongBigList to the now standard
it.unimi.dsi.fasutil.longs.LongBigList. It is part of a parallel release
of fastutil, the DSI Utilities, Sux4J, MG4J, WebGraph, etc. that were
all modified to fit the new interface, and that prepare the way for our
"big" versions, that is, supporting >2^31 entries in arrays (simulated),
elements in lists, terms, documents, nodes, etc. Please read our (short)
"Moving Java to Big Data" document (JavaBig.pdf) for details.
- We now require Java 6.
- New mapOffline method for mapping large graphs.
- All offline transformation methods now compress their batches; the
resulting batch size is comparable to the size of the BVGraph
representation with copying disabled.
- Now we never resort to the ImmutableGraph implementation of
NodeIterator when iterating over an ImmutableSubgraph if
the supergraph does not implement random access. We used to
do it when the graph was very sparse, but not checking for
random access was not a good idea.
- The computation of the ratio w.r.t. the information-theoretical
lower bound (associated to the key "compratio" in the property file of a
graph) was wrong and has been fixed.
- A number of classes deal with exact and approximate computation of
the neighbourhood function of a graph, its distance density function,
and derived values. See our paper about HyperANF (this stuff was actually
introduced in 2.4.5, but we forgot to mention).
- Fixed an occasional infinite loop in ErdosRenyiGraph.
- New class for reading scattered arc lists (ids need to be contiguous).
- BVGraph.store() now sets up the node iterator before starting the
progress logger. This should provide more sensible estimates of time to
completion in case of offline methods.
- BVGraph node iterators have now a finalize() method that will close the
underlying bit stream (and thus possibly the underlying file handle)
when the iterator is no longer used.
- HyperApproximateNeighbourhoodFunction would not work with an offline
graph even with a single thread (thanks to Lars Backstrom for reporting
this bug).
- HyperApproximateNeighbourhoodFunction supports now 16 (-l4) or 32 (-l5)
registers per counter.
- Fixed old-standing synchronization bug in
HyperApproximateNeighbourhoodFunction.
- New static method NeighbourhoodFunction.harmonicDiameter().
2.4.5
- WebGraph is now distributed under the GNU General Public License 3.
- WARNING: A small modification to the coding makes it possible to
compress graphs with more than 1B nodes (always up to 2B nodes). This,
however, means that such graphs will not be readable by previous
versions, which will crash. We felt this was not such a big issue, as
such graphs were not previously compressible at all, so the version
number has not been bumped.
- StronglyConnectedComponents no longer uses a separate thread to
set the stack size. The process was not guaranteed (by contract)
to set the stack size at all. The computation now runs in the main
thread, and we suggest using suitable JVM options to set a large
stack size.
- BVGraph now computes a wealth of statistical data related to the
behaviour of the compression algorithm.
- A number of classes deal with estimating efficiently the neighbourhood
function of a graph, the effective diameter, and the spid
(shortest-paths index of dispersion).
- A caching mechanism has been put in place to make offsets loading
orderds of magnitudes faster. You can generate a cached, serialised
EliasFanoMonotoneLongBigList with the option -L of BVGraph, and then it
will be loaded instead of scanning the offsets file.
- Fixed bug in the definition of in/out trees in ArrayListMutableGraph.
- Now Stats computes loops.
- BitStreamArcLabelledGraph was not supporting offset steps any longer,
but constructors and static methods still made it possible to pass
an offset step. This has been fixed.
- Some residual documentation about offset steps has been removed.
- A new cutoff option makes it possible to eliminate from a graph
generated by a map operation on the command line (see Transform) all
nodes whose index is too large. This is useful in conjunction with maps
that quotient a graph (e.g., to get just large strongly connected
components).
2.4.4
- The empty Formatter constructor was causing problems on localised systems.
Now we use Locale.ROOT.
- offsetStep > 1 no longer exist. It is not necessary with the new Elias-Fano
offset list.
- Speed improvements in random access to a BVGraph.
- Fixed semantics of ImmutableGraph.successorArray(): implementations are
now forced to return a new array at each call. All implementations in
WebGraph are now compliant.
- Now nodeIterator(int) in ImmutableSequentialGraph is implemented so that
it calls nodeIterator() and then skips to the desired node.
- Fixed bad bug in UnionImmutableGraph: the node for which the cache was
active was not set by successors().
- We now output some basic, exponentially binned stats for the distribution
of successor gaps and residual gaps. From these data we also compute an
approximation of the average gap for successors and for residuals.
- We now record how much space is used by every component of the compression
algorithm.
- Following some research, the default minimum interval length in BVGraph
is now 4.
2.4.3
- Fixed ArrayListMutableGraph.addNodes() (thanks to Erik Lumer for
finding and fixing this bug).
- New options to shift the output of ASCII graphs.
- RemappedImmutableGraph.successorArray(x) was providing the same array on every
call, thus making the inherited successors(x) method unusable to scan in
parallel different lists. Fixed (now it returns a copy of the array, instead).
- New random transformation that permutes randomly a graph.
2.4.2
- Transform was not derelativising underlying-graph filenames.
- New classes to support flexible filtering of arc-labelled
graphs. See the new action "larcfilter" of Transform and
the interface LabelledArcFilter.
- StronglyConnectedComponents now uses a filter for labelled arcs, in case
you want to compute components of a subgraph.
- Fixed old bug in StronglyConnectedComponents: the renumber
option was not working.
- New Transform.compose() transformation that composes graphs
(i.e., it computes the graph represented by the product
of the Boolean matrices representing two graphs). You can
even compose labelled graphs by providing a semiring to
compose labels.
- Now label files can be longer than 2GiB.
2.4.1
- Fixed stupid null-pointer bug in BitStreamImmutableArcLabelledGraph.
2.4
- WARNING: There are more general relabelling strategies, but older
code must be slightly refactored.
- Now BitStreamArcLabelledImmutableGraph supports contextual labels.
They accept an additional directory as context, to resolve relative
names.
2.3
- Fixed bug in BitStreamArcLabelledImmutableGraph: labels longer
than 2Gi would have caused overflows.
- The new pointer loading system has been extended to arc-labelled graphs,
too.
2.2
- New pointer loading system based on succinct representations. Now on
typical web graphs pointers occupy 8-9 [sic] bits per element, thus
almost halving the memory footprint.. The performance drop is about
10-15% (measured in ns/link on an Opteron) for reference chains of length
3 (and it decreases for shorter chains).
- New greyPerm transform to just get the permutation.
- ArcLabelledImmutableGraph now strengthens the implementation of
nodeIterator() based on the random-access methods.
- Fixed lack of checks in integer key labels.
- New defensive check in BVGraph against badly implemented ImmutableGraphs.
2.1
- WARNING: Refactored to be based on dsiutils and Sux4J. This will cause
some incompatibilities, in particular with loggers.
- Moved DocumentSequenceImmutableGraph to LAW, to avoid dependency
on MG4J and vice versa.
2.0
- WARNING: WebGraph 2+ is not fully compatible with previous versions, and
requires some minor refactoring: due to the new lazy architecture, the
semantics of successors() has radically changed; in particular, a
LazyIntIterator is returned instead of an IntIterator. Please refer to
the ImmutableGraph documentation.
- New customised class parser that will prepend it.unimi.dsi.webgraph.
and it.unimi.dsi.webgraph.labelling. to classes specified on
the command line (at last!).
- New ArcListASCIIGraph that specifies one arc per line and guesses
the number of nodes. A special implementation can be used when
nodes are numbered from one.
- New --spec switch that makes it possible to specify graphs as
class names with arguments. Most useful to turn MG4J's document
sequences into graphs using a VirtualDocumentResolver.
- Slightly relaxed contract for numNodes() (to make ArcListASCIIGraph
conforming).
- New classes for union and transposition of labelled graphs. Transform
has been adapted to use automatically BitStreamArcLabelledImmutableGraph
to save arc-labelled graph, but the class is settable.
- Arc-labelled graphs must expose a prototype of their labels.
- New store() suggested methods for arc-labelled graphs.
- New Stats class for computing basic statistical data.
- Very, very, old bug in BVGraph has been fixed. nodeIterator(from)
with from>1 was not working properly. Thanks to Francesco Zumpano
and Pierluigi Origlia for finding this bug.
- New example class to interface your data with arc-labelled graph classes.
- Integer labels have a public value fields.
- Load methods of BVGraph now look for an offsetstep property to set
the offset step externally.
- New extension for label offsets (.labeloffsets) and new property
key for the underlying graph (underlyinggraph). Watch out!
- New relabelling wrapper to change the labels of a graph.
- New class implementing a variant of the Tarjan algorithm.
- All standard extensions and property keys are now defined by string constants.
- New algo package. We start with strongly connected components.
1.7
- Brand new ArcLabelledGraph
- Deprecated classes and methods have been removed.
- Revamped OutdegreeStats class.
- New loadOnce() method for loading graphs on-the-fly. Very useful for
generating an ASCIIGraph to standard output can compressing it without
actually storing it.
- New randomAccess() method that tells you whether a
graph supports random access.
- A number of new packages containing unit tests.
- Fixed bug in ImmutableSubgraph: the property subgraphnodes
was not actually read.
- Implemented a workaround for bug #6478546 (you can't do read() on large
arrays when you have a lot of heap--bizarre, isn't it?).
1.6
- Most load() static methods now override the return type and
declare the actual returned type, usually more specific (e.g.,
BVGraph.load() returns a BVGraph).
- Graphs can now be transposed with an offline method. It is
slower than the in-memory method, but it can transpose arbitrarily
large graphs.
1.5
- IMPORTANT: WebGraph requires now Java 5.
- New ArrayListMutableGraph class that makes it easy to create
dynamically graphs, and then exposes them as an ImmutableGraph.
- New documentation and example on how to import your data in
WebGraph.
- All code moved from ProgressMeter to ProgressLogger. All old
methods are deprecated.
- Command line parsing entirely handled by JSAP.
- The default maximum reference count for BVGraph is now 3.
- ASCIIGraph has been revamped to be usable to convert offline
large graphs.
- The basename property was never used, and it is no longer saved.
1.4.1
- New method writeOffsets() and corresponding -O option in BVGraph
which writes the offsets of a graph computing them from the graph
representation (.graph file). This allows to distribute directly
just the .graph and the .properties files.
- Incompatible ImmutableSubgraph, with more (hopefully) sensible
method names.
1.4
- Now various classes use the ImmutableGraph reflection methods.
- New ImmutableSubgraph class for storing and manipulating subgraphs
holding just a reference to the node subset.
- New Transform static container with common constructions, and
computation of Gray code ordering.
- Fixed lack of error message when accessing randomly successor
in a sequentially loaded BVGraph.
1.2.4
- The graph class name is now obtained using getName(), and
kluges have been placed that make also old graphs work.
- New explicit convention for storing the graph class name in a property file.
- New static methods in ImmutableGraph that load a graph using reflection
and the convention above.
- Fixed lack of check or null pm.
- Fixed lack of loadOffline() method in BVGraph (causing infinite recursion).
1.2.2
- Aligned usage of iterators with fastutil 3.1.
1.2.1
- Fixed a stupid bug (in one case we forgot to reallocate a new
FastMultiByteArrayInputStream).
- Fixed another stupid bug (using a standard, memory-stored
graph would have not worked!).
1.2
- BVGraph now supports graphs larger than 2 GiB (in fact, up to 256 PiB)
using (transparently) FastMultiByteArrayInputStream.
1.1
- The return type of the load method has been changed to ImmutableGraph,
so to make it possible to override it in subclasses. This might require some
additional type casting in existing code.
1.0r2
- Updated to new fastutil class set.
1.0
- First public release.