BRJS Memoization Strategy Strawman
NOTE: This is currently being refactored. See https://github.com/BladeRunnerJS/brjs/wiki/Memoized-Value-refactor-proposal
Although the object-oriented model within BladeRunnerJS has been hugely beneficial compared to what we had in BladeRunner, it is, by default, detrimental to performance since the same operations are performed time and time again, causing expensive disk I/O to be repeated unnecessarily. If each class were to memoize the computed values it is responsible for creating, then performance would be significantly better than what we had in BladeRunner, since disk I/O would be limited to cases where the underlying files had actually changed.
Unfortunately there are a number of reasons why doing this is problematic:
- Memoization breaks encapsulation, since it requires each class to understand the inner workings of its dependent classes, both immediate and transitive (specifically, the files they will read).
- There's no easy way to detect caching bugs caused by memoization, and since encapsulation has been broken, such bugs will be commonplace.
- Any caching bugs will severely damage our reputation: they are hard for developers to track down, and they make the development environment unusable, since the server will need to be restarted after every file change.
For memoization to work, we therefore need a solution that either doesn't break encapsulation, or that removes the harm associated with breaking it, and that doesn't require additional testing. The solution I am proposing is to give each memoizing class an API through which it declares which files and directories may be read while a new result is being computed, so that an exception is thrown if a file is read outside of the permitted directories and files of all of the classes currently memoizing. Although this approach still breaks encapsulation, since classes make assumptions about the directories that other classes will read from, developers will be informed immediately whenever those assumptions turn out to be incorrect.
To be absolutely sure that we are able to limit all APIs that read from the disk, a Java `SecurityManager` object will be installed that can universally vet all file-system access, regardless of the API responsible for it. Because there is a performance implication to installing a security manager, installation will be optional within the model, and may only be done when running our unit and spec tests.
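As a rough sketch of how that universal vetting could work, the checker might be built on `java.lang.SecurityManager`; the `FileAccessCheckingSecurityManager` class, its `permit()` method and the `permittedPaths` set below are illustrative assumptions, not part of any existing BRJS API:

```java
import java.security.Permission;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

public class FileAccessCheckingSecurityManager extends SecurityManager {
    private final Set<String> permittedPaths = new CopyOnWriteArraySet<String>();

    // Called by memoizing classes to declare a directory or file that may be read.
    public void permit(String path) {
        permittedPaths.add(path);
    }

    @Override
    public void checkRead(String file) {
        // Throws as soon as a read falls outside every permitted path; a fuller version
        // would only enforce this while at least one class is actually memoizing.
        for (String permittedPath : permittedPaths) {
            if (file.startsWith(permittedPath)) {
                return;
            }
        }

        throw new SecurityException("file read outside of the permitted memoization scope: " + file);
    }

    @Override
    public void checkPermission(Permission perm) {
        // Deliberately a no-op: only file reads are vetted by this checker.
    }
}
```

Under this sketch, `brjs.io().installFileAccessChecker()` could boil down to `System.setSecurityManager(new FileAccessCheckingSecurityManager())`, with `uninstallFileAccessChecker()` restoring the previous manager.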
Here is an example of a unit test that chooses to enable file checking:
```java
@Before
public void setup() {
    brjs.io().installFileAccessChecker();
}

@After
public void tearDown() {
    brjs.io().uninstallFileAccessChecker();
}
```
And here is what memoizing model/plug-in code will look like:
```java
private Foo foo;
private final MemoizationGuard fooGuard = brjs.io().createMemoizationGuard(dir());

private Bar bar;
private final MemoizationGuard barGuard = brjs.io().createMemoizationGuard(file("dir1"), file("dir2"));

public Foo getFoo() {
    if (fooGuard.requiresMemoization()) {
        try (fooGuard.memoize()) {
            foo = createNewFoo();
        }
    }

    return foo;
}

public Bar getBar() {
    if (barGuard.requiresMemoization()) {
        try (barGuard.memoize()) {
            bar = createNewBar();
        }
    }

    return bar;
}
```
Notice how we are making use of Java 7's try-with-resources feature so we don't need to remember to signal the end of the memoization block, and don't need to worry about any exceptions that might be thrown.
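One small wrinkle is that Java 7's try-with-resources only accepts a declared resource, so the calls above would in practice read something like `try (MemoizedScope scope = fooGuard.memoize())`. A rough sketch of the guard shape this implies is shown below; the `MemoizedScope` type and the change-tracking stub are assumptions rather than the actual design:

```java
import java.io.File;

// Illustrative sketch only: the real MemoizationGuard would register its watched
// locations with the file-access checker and detect changes to the underlying files.
public class MemoizationGuard {
    private final File[] watchedLocations;
    private boolean upToDate = false;

    public MemoizationGuard(File... watchedLocations) {
        this.watchedLocations = watchedLocations;
    }

    // Stub: a real implementation would return true whenever a watched location has
    // changed since the last memoization, rather than relying on a simple flag.
    public boolean requiresMemoization() {
        return !upToDate;
    }

    // Returns an AutoCloseable so that try-with-resources signals the end of the
    // memoization block, even if the value computation throws.
    public MemoizedScope memoize() {
        return new MemoizedScope();
    }

    public class MemoizedScope implements AutoCloseable {
        @Override
        public void close() {
            upToDate = true;
        }
    }
}
```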
The `BRJS.getFileIterator()` convenience method will also be made available via `BRJS.io()`, but will be re-implemented on top of the `createMemoizationGuard()` method, so we will end up with:

```java
brjs.io().getFileIterator(dir1);
```
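For illustration, that re-implementation could look something like the fragment below; the per-directory caches and the assumption that the iterator walks a directory's immediate children are mine, not the actual BRJS behaviour:

```java
private final Map<File, MemoizationGuard> listingGuards = new HashMap<>();
private final Map<File, List<File>> cachedListings = new HashMap<>();

public Iterator<File> getFileIterator(File dir) {
    // One guard per directory, so repeated calls reuse the cached listing.
    MemoizationGuard guard = listingGuards.get(dir);
    if (guard == null) {
        guard = createMemoizationGuard(dir);
        listingGuards.put(dir, guard);
    }

    if (guard.requiresMemoization()) {
        try (MemoizationGuard.MemoizedScope scope = guard.memoize()) {
            File[] children = dir.listFiles();
            cachedListings.put(dir, (children == null) ? Collections.<File>emptyList() : Arrays.asList(children));
        }
    }

    return cachedListings.get(dir).iterator();
}
```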
Given that most methods within most classes will benefit from memoization, converting them all into memoizing methods will be a large job, and will leave us with much more complicated classes because of all the memoization boiler-plate. A potentially quicker approach, which would also give us simpler and easier-to-maintain code, would be to use an AOP compiler like AspectJ.
By using the AspectJ compiler our model code would instead look like this:
```java
private Foo foo;
private Bar bar;

@Memoizable
public Foo getFoo() {
    return createNewFoo();
}

@Memoizable(dirs={"dir1", "dir2"})
public Bar getBar() {
    return createNewBar();
}
```
Here, the `Memoizable` annotation could be applied to any class that implements a new `Directory` interface (which `Node` would automatically implement), having the following method:
```java
interface Directory {
    File dir();
}
```
Then, classes that applied the `@Memoizable` annotation without providing any attributes would default to using `dir()`, whereas if the `dirs` attribute was provided, its values would be relative to `dir()`.
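To make the intent concrete, a sketch of an around-advice aspect that could back the annotation follows; the pointcut, the cache keying and the omission of the guard wiring are all assumptions (and the `@annotation()` binding used here would require `@Memoizable` to have runtime retention):

```java
import java.util.HashMap;
import java.util.Map;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

// Illustrative sketch of an aspect that memoizes the result of @Memoizable methods.
@Aspect
public class MemoizationAspect {
    // Cache keyed on the target instance plus the invoked method's signature.
    private final Map<String, Object> cache = new HashMap<>();

    @Around("@annotation(memoizable) && execution(* *(..))")
    public Object memoize(ProceedingJoinPoint joinPoint, Memoizable memoizable) throws Throwable {
        String key = System.identityHashCode(joinPoint.getTarget()) + ":" + joinPoint.getSignature().toLongString();

        // A fuller version would build a MemoizationGuard from the target's dir() and
        // the annotation's dirs attribute, and recompute whenever the guard reports that
        // the watched files have changed; here the first computed value is simply reused.
        if (!cache.containsKey(key)) {
            cache.put(key, joinPoint.proceed());
        }

        return cache.get(key);
    }
}
```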
Although using AspectJ provides many advantages, it also imposes some limitations that must be taken into account:
- We become limited to using the AspectJ compiler (a modified Eclipse compiler) to compile our source code.
- Support for new versions of Java (e.g. Java 8) will lag behind the normal release date (historically, by between 12 and 15 months).
- The AspectJ compiler is not supported by Gradle out-of-the-box, so we would need to use a third-party plug-in or recipe.
- There may be some refactorings that can only be done if you have an AspectJ plug-in installed for your IDE, though plug-ins are available for Eclipse, IntelliJ and NetBeans, among others.
By making use of Java 8's lambda syntax, we can remove almost all of the boilerplate associated with memoization, without the need for AOP:
```java
private MemoizedValue<Foo> foo = new MemoizedValue<>(dir());
private MemoizedValue<Bar> bar = new MemoizedValue<>(file("dir1"), file("dir2"));

public Foo getFoo() {
    return foo.value(() -> {
        return new Foo(getBar());
    });
}

public Bar getBar() {
    return bar.value(() -> {
        return new Bar(getBaz());
    });
}
```
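A `MemoizedValue` class of roughly the following shape would be enough to support that usage; the last-modified change check is just one possible strategy, and the real class would presumably also cooperate with the file-access checker:

```java
import java.io.File;
import java.util.function.Supplier;

// Sketch of a MemoizedValue that recomputes only when a watched file or directory's
// last-modified time changes.
public class MemoizedValue<T> {
    private final File[] watchedLocations;
    private long lastSeenModificationTime = -1;
    private T value;

    public MemoizedValue(File... watchedLocations) {
        this.watchedLocations = watchedLocations;
    }

    public T value(Supplier<T> valueSupplier) {
        long latestModificationTime = latestModificationTime();

        if (latestModificationTime != lastSeenModificationTime) {
            value = valueSupplier.get();
            lastSeenModificationTime = latestModificationTime;
        }

        return value;
    }

    // Takes the most recent last-modified time across all watched locations; a real
    // implementation would recurse into directories rather than looking at them directly.
    private long latestModificationTime() {
        long latest = 0;
        for (File location : watchedLocations) {
            latest = Math.max(latest, location.lastModified());
        }
        return latest;
    }
}
```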
Additionally, switching to Java 8 has a number of associated advantages:
- We can start making use of streams to parallelize our workload across the available cores provided by the CPU.
- Plug-ins can make use of the Nashorn compiler to run JavaScript efficiently.
- We can use lambdas to simplify the many `Commander` classes used within our spec tests.
Given that some CaplinTrader users will not be able to install Java 8 on their developer machines for some time yet, we could, for the time being, use Retrolambda (via the Gradle retrolambda plug-in) to allow the use of lambdas and the Java 8 compiler now, with all tests being run against the transpiled Java 7 bytecode.