Rule Base Generator
The Rule-Base-Generator is a tool which assists the integration designer in designing and ordering cleaning rules which are then applied during the Cleaning phase.
The tool should be used for one source at a time since rules depend on the specific way in which attribute names are encoded at each source.
The tool is equipped with a simple command-line interface which allows the designer to:
- Visualize all keys available at the current state for a given source;
- Write rules which follow the syntax: <regular_expression>=>pattern_matching_replacement_strategy;
- Evaluate the simulated result of adding the inserted rule to the current RuleBase;
- Reject the rule, without any change of the current RuleBase;
- Accept the rule, with consequent insertion in the RuleBase based on rule precedence and application to all keys which are matched by the rule left-hand-side;
- Visualize at all times an updated list of transformed keys which are already handled by rules currently contained in the RuleBase;
- Visualize at all times an updated list of transformed keys which are NOT handled by rules currently contained in the RuleBase;
- Update the RuleBase behavior with respect to a specific regular expression by rule substitution.
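The evaluate step and the handled / not-handled lists described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the tool's actual implementation: the names `apply_rule` and `simulate` are hypothetical, a rule is assumed to have to match the whole key, the first matching rule in list order is assumed to win (the page does not spell out how rule precedence is computed), and `file__size` is a made-up key.

```python
import re

def apply_rule(rule, key):
    """Return the rewritten key, or None if the rule does not match.

    Assumption: the left-hand side must match the entire key.
    """
    lhs, rhs = (part.strip() for part in rule.split("=>", 1))
    match = re.fullmatch(lhs, key)
    if match is None:
        return None
    # Expand $1, $2, ... with the content of the corresponding group.
    return re.sub(r"\$(\d+)",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  rhs)

def simulate(rule_base, keys):
    """Split keys into those handled by some rule and those left unhandled.

    Assumption: rules are tried in list order and the first match wins.
    """
    handled, unhandled = {}, []
    for key in keys:
        for rule in rule_base:
            result = apply_rule(rule, key)
            if result is not None:
                handled[key] = result
                break
        else:
            unhandled.append(key)
    return handled, unhandled

rule = "replicates(__[0-9]__)library__biosample__(donor)__(age|sex)(.*)=> $2$1$3$4"
keys = ["replicates__1__library__biosample__donor__age", "file__size"]
handled, unhandled = simulate([rule], keys)
print(handled)
print(unhandled)
```

Because the simulation works on a copy of the rule list, a rejected rule leaves the RuleBase untouched, while an accepted rule would simply be kept in the list at its position.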
As a simple example of a rule, consider the following:
replicates(__[0-9]__)library__biosample__(donor)__(age|sex)(.*)=> $2$1$3$4
Note that the special "dollar" references ($1, $2, ...) identify the content of the corresponding parenthesized groups in the left-hand side of the rule.
When applied to the transformed key replicates__1__library__biosample__donor__age, the rule produces the concatenation of the content of the second group (donor), the first (__1__), the third (age), and the fourth, which is an empty string in this particular case. The result is donor__1__age.
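The group-by-group reading of this rule can be checked with a few lines of Python. This is a minimal sketch using the standard `re` module, with the whole-key match being an assumption; the tool itself may rely on a different regular-expression engine.

```python
import re

pattern = r"replicates(__[0-9]__)library__biosample__(donor)__(age|sex)(.*)"
key = "replicates__1__library__biosample__donor__age"

match = re.fullmatch(pattern, key)
groups = match.groups()
print(groups)  # ('__1__', 'donor', 'age', '')

# $2$1$3$4: second group, then first, third and fourth.
result = groups[1] + groups[0] + groups[2] + groups[3]
print(result)  # donor__1__age
```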
For the commands used to run this application, please refer to the User Manual.
The implementation is available at this link