Saturday, 3 November 2007

Benchmarking Drools

In my earlier post I said that I hadn't had a chance to run a speed test comparing Drools with plain Java for sorting values. Last week I did just that, and I learned a lot about tuning Drools along the way.

I have included all the source code in an appendix at the end of this post, along with a link to all the zipped source.

Objective


The objective is to determine how to best use the Drools rule engine depending on the number of Facts, type of rules, and requirements of the system including performance, scalability and maintainability.

The Approach


A simple problem will be passed to the Drools rule engine for solution. The different approaches used in the solution will include the use of stateful and stateless sessions, using the rule engine to sort and order facts, passing the rule engine pre-ordered and sorted facts, and modifying the rules themselves to measure changes in performance.

The Problem


The problem is basically to aggregate credit and debit cashflows into accounting periods and determine the account balance at the end of each period after all the cashflows have been applied.

Caveats


This problem is quite a simple one and using the Drools rule engine for such a task is very much taking a sledgehammer to crack a nut - unless we were expecting to introduce more rules and facts into the equation later.

Expectations


  • I would expect a stateful session to be faster if the RuleBase is cached

  • I would expect a stateless session (which can be applied to this problem) to be faster than a stateful session

  • I would expect better performance if the cashflows are aggregated into accounting periods and passed to the rule engine for each accounting period in turn.

The Tests


The tests are broken down into nine test sets (numbered 1 to 9 below). Each test set provides a different solution to the problem and all test sets provide exactly the same output.

Each test set is run several times with a different number of facts each time. For each set of facts supplied, the test set is run through several iterations and the fastest, slowest, and average time to complete processing is measured in milliseconds.
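The measurement loop described above can be sketched as follows. This is a minimal illustration, not the TimedResults class from the Lib project: the names TimingSketch, Timings and time() are hypothetical, and the workload in main() is only a stand-in for building a session, inserting facts and firing the rules.

```java
// Hypothetical timing helper; not the TimedResults class from the post.
final class TimingSketch {

    /** Fastest, slowest and average wall-clock time of a series of runs, in milliseconds. */
    static final class Timings {
        final long fastestMs;
        final long slowestMs;
        final double averageMs;

        Timings(long fastestMs, long slowestMs, double averageMs) {
            this.fastestMs = fastestMs;
            this.slowestMs = slowestMs;
            this.averageMs = averageMs;
        }
    }

    /** Runs the task the given number of times and records how long each run took. */
    static Timings time(int iterations, Runnable task) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long fastest = Long.MAX_VALUE;
        long slowest = Long.MIN_VALUE;
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            // Truncate to whole milliseconds, the unit used in the results.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            fastest = Math.min(fastest, elapsedMs);
            slowest = Math.max(slowest, elapsedMs);
            total += elapsedMs;
        }
        return new Timings(fastest, slowest, (double) total / iterations);
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption); a real run would exercise the rule engine here.
        Timings t = time(50, () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        });
        System.out.println("fastest=" + t.fastestMs + "ms, slowest=" + t.slowestMs
                + "ms, average=" + t.averageMs + "ms");
    }
}
```

Because the first few iterations on a JVM include class loading and JIT warm-up, the slowest time is usually one of the early runs, which is one reason to report fastest and slowest alongside the average.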

The tests evolved with findings and feedback; the final set of nine is listed below:

  1. Stateful Session

    A new RuleBase is created for each iteration and then all the facts are inserted together into a stateful session.

    The rule engine orders the accounting periods and aggregates the facts into each accounting period in turn before calculating the end of period balance by applying the credits and debits to the relevant bank account.

  2. Stateful Session with cached RuleBase

    A new RuleBase is created for each different set of Facts and is then cached and reacquired for each iteration. All the facts are inserted together into a stateful session.

    The rule engine orders the accounting periods and aggregates the facts into each accounting period in turn before calculating the end of period balance by applying the credits and debits to the relevant bank account.

  3. Stateful, Cached, Condition elements grouped within rules

    A new RuleBase is created for each different set of Facts and is then cached and reacquired for each iteration. All the facts are inserted together into a stateful session.

    The rule engine orders the accounting periods and aggregates the facts into each accounting period in turn before calculating the end of period balance by applying the credits and debits to the relevant bank account.

    Within the rules themselves, the Condition elements are grouped together as per the example below:

    This condition set shows two AccountingPeriod conditions separated by a Cashflow condition:

    AccountingPeriod( $start : start, $end : end )
    $cashflow : Cashflow( $account : account, $date : date <= $end
    && date >= $start, $amount : amount, type==Cashflow.DEBIT )
    not AccountingPeriod( start < $start)

    In the condition set below, we have grouped the AccountingPeriod conditions:

    AccountingPeriod( $start : start, $end : end )
    not AccountingPeriod( start < $start)
    Cashflow( $account : account, $date : date <= $end
    && date >= $start, $amount : amount, type==Cashflow.CREDIT )


  4. Stateful, Cached, Grouped, long primitive for sorting

    A new RuleBase is created for each different set of Facts and is then cached and reacquired for each iteration. All the facts are inserted together into a stateful session.

    The rule engine orders the accounting periods and aggregates the facts into each accounting period in turn before calculating the end of period balance by applying the credits and debits to the relevant bank account.

    Within the rules themselves, the Condition elements are grouped together.

    When ordering the AccountingPeriods and aggregating the Cashflows, this is done using a long primitive representation of the period start and end dates and the cashflow date instead of a Date object.

  5. Stateful, Cached, Grouped, Long object for sorting

    A new RuleBase is created for each different set of Facts and is then cached and reacquired for each iteration. All the facts are inserted together into a stateful session.

    The rule engine orders the accounting periods and aggregates the facts into each accounting period in turn before calculating the end of period balance by applying the credits and debits to the relevant bank account.

    Within the rules themselves, the Condition elements are grouped together.

    When ordering the AccountingPeriods and aggregating the Cashflows, this is done using a Long object representation of the period start and end dates and the cashflow date instead of a Date object.

  6. Stateful, Cached, Grouped, Facts inserted by Accounting Period

    A new RuleBase is created for each different set of Facts and is then cached and reacquired for each iteration.

    The accounting periods are ordered, and the cashflows are aggregated for each accounting period in turn.

    Each accounting period is then processed in turn and all the facts for each accounting period are inserted into a stateful session and the results for that period are obtained.

    Within the rules themselves, the Condition elements are grouped together.

  7. Stateless, Cached, Grouped, Facts inserted by Accounting Period

    A new RuleBase is created for each different set of Facts and is then cached and reacquired for each iteration.

    The accounting periods are ordered, and the cashflows are aggregated for each accounting period in turn.

    Each accounting period is then processed in turn and all the facts for each accounting period are inserted into a stateless session and the results for that period are obtained.

    Within the rules themselves, the Condition elements are grouped together.

  8. Stateless, Cached, Grouped, Facts inserted by Accounting Period, using the accumulate method

    A new RuleBase is created for each different set of Facts and is then cached and reacquired for each iteration.

    The accounting periods are ordered, and the cashflows are aggregated for each accounting period in turn.

    Each accounting period is then processed in turn and all the facts for each accounting period are inserted into a stateless session and the results for that period are obtained.

    Within the rules themselves, the Condition elements are grouped together, and the accumulate method is used to total the credits and debits respectively.

  9. Plain Java method

    All the sorting, aggregating and balance calculations are done in Java.
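For comparison with the rule-based test sets, here is a minimal sketch of what a plain Java solution to the problem might look like. It is not the code from the SpeedTest project: the class and field names are assumptions, amounts are held as whole minor units (for example pence), opening balances are assumed to be zero, and the inner loop scans every cashflow for every period for the sake of brevity.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for the post's fact classes; names and fields are assumptions.
final class PlainJavaSketch {

    enum Type { CREDIT, DEBIT }

    static final class AccountingPeriod {
        final LocalDate start;
        final LocalDate end;

        AccountingPeriod(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }
    }

    static final class Cashflow {
        final String account;
        final LocalDate date;
        final long amount; // minor units, e.g. pence
        final Type type;

        Cashflow(String account, LocalDate date, long amount, Type type) {
            this.account = account;
            this.date = date;
            this.amount = amount;
            this.type = type;
        }
    }

    /** Maps each period start (in date order) to the account balances at the end of that period. */
    static Map<LocalDate, Map<String, Long>> endOfPeriodBalances(
            List<AccountingPeriod> periods, List<Cashflow> cashflows) {
        List<AccountingPeriod> ordered = new ArrayList<>(periods);
        ordered.sort(Comparator.comparing((AccountingPeriod p) -> p.start));

        Map<String, Long> running = new TreeMap<>(); // balances carried forward between periods
        Map<LocalDate, Map<String, Long>> result = new LinkedHashMap<>();
        for (AccountingPeriod period : ordered) {
            for (Cashflow c : cashflows) {
                boolean inPeriod = !c.date.isBefore(period.start) && !c.date.isAfter(period.end);
                if (inPeriod) {
                    long signed = c.type == Type.CREDIT ? c.amount : -c.amount;
                    running.merge(c.account, signed, Long::sum);
                }
            }
            result.put(period.start, new TreeMap<>(running)); // snapshot at period end
        }
        return result;
    }

    /** Small made-up data set: two quarters supplied out of order, four cashflows on one account. */
    static Map<LocalDate, Map<String, Long>> demo() {
        List<AccountingPeriod> periods = Arrays.asList(
                new AccountingPeriod(LocalDate.of(2007, 4, 1), LocalDate.of(2007, 6, 30)),
                new AccountingPeriod(LocalDate.of(2007, 1, 1), LocalDate.of(2007, 3, 31)));
        List<Cashflow> cashflows = Arrays.asList(
                new Cashflow("ACC-1", LocalDate.of(2007, 1, 12), 100, Type.CREDIT),
                new Cashflow("ACC-1", LocalDate.of(2007, 2, 2), 30, Type.DEBIT),
                new Cashflow("ACC-1", LocalDate.of(2007, 5, 10), 50, Type.CREDIT),
                new Cashflow("ACC-1", LocalDate.of(2007, 4, 1), 20, Type.DEBIT));
        return endOfPeriodBalances(periods, cashflows);
    }

    public static void main(String[] args) {
        demo().forEach((start, balances) -> System.out.println(start + " -> " + balances));
    }
}
```

With the made-up data in demo(), the first quarter ends with a balance of 70 and the second with 100. Everything the rules express declaratively (earliest period first, date range matching, credit versus debit) appears here as explicit control flow, which is why this version is fast but grows harder to maintain as rules are added.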

Test Results


The results below are for 588 facts inserted into the rule engine, with each test set repeated for 50 iterations to obtain the average time in milliseconds to process the cashflows and obtain the account balance for each accounting period.

1 - 79 ms - Stateful
2 - 27 ms - Stateful, Cached
3 - 9 ms - Stateful, Cached, group conditions
4 - 21 ms - Stateful, Cached, group conditions, long primitive
5 - 21 ms - Stateful, Cached, group conditions, Long
6 - 6 ms - Stateful, Cached, group conditions, aggregate cashflows
7 - 6 ms - Stateless
8 - 4 ms - Stateless, Accumulate
9 - 1 ms - Java

All the results for each set of tests, grouped by number of Facts and number of iterations, are detailed in Appendix A: Results.

Discussion of the Results


  • Caching v Non-Caching RuleBase

    Test1 and Test2 both use the rule file test01.drl. All the facts are asserted in one go and sorted within the rules. The Facts are then retracted once they have been used. This makes writing the rules easier, as you don’t have to worry about Facts hanging around that could influence other rules or even re-fire completed rules.

    Caching the RuleBase does show a marked improvement, but that improvement becomes less marked as the number of Facts increases.

  • Grouping Condition Elements

    Test3 groups the conditions in the rule file. For this it uses test02.drl.

    In test01.drl we have:

    AccountingPeriod( $start : start, $end : end )
    $cashflow : Cashflow( $account : account, $date : date <= $end
    && date >= $start, $amount : amount, type==Cashflow.DEBIT )
    not AccountingPeriod( start < $start)

    In test02.drl we have:

    AccountingPeriod( $start : start, $end : end )
    not AccountingPeriod( start < $start)
    Cashflow( $account : account, $date : date <= $end
    && date >= $start, $amount : amount, type==Cashflow.CREDIT )

    Simply changing the order from AccountingPeriod, Cashflow, not AccountingPeriod to AccountingPeriod, not AccountingPeriod, Cashflow reduced the average time from 27 ms to 9 ms!
    Note that this performance improvement holds across all the test sets, regardless of the number of Facts.

  • Using long and Long for Sorting

    Test4 and Test5 use test03.drl and test04.drl respectively, sorting on a long primitive and a Long object instead of a Date. This does not show an improvement over sorting with Dates: at 588 facts both average 21 ms, against 9 ms for Test3.

    Note, however, that the compareTo() and equals() methods in the Cashflow object use the Date object for testing, so these results may have been rendered invalid, depending on any use of the Collections API within the rule engine. I may change those methods later and try it again.

  • Aggregating Cashflows before Inserting into Drools

    Test6 uses test05.drl. It collates all the Cashflows for each period in Java and then inserts them into the rule engine one accounting period at a time. This again shows a marked improvement, from 21 ms to 6 ms, and this scale of improvement is consistent across all the test sets.

  • Using a Stateless Session

    Test7 uses test05.drl with a stateless session. Not surprisingly, this test has consistently similar results to Test6 across all the test sets.

  • Using the accumulate() method

    Test8 uses test06.drl, which again uses a stateless session but uses the accumulate() method within the rules and does not retract the cashflows. This shows an improvement over Test7 which becomes more marked as the number of Facts increases.

  • Plain Java method

    Test9 is a simple Java method that performs all the cashflow processing. Not surprisingly, this is the fastest approach. As the number of rules increases, however, we may see a difference. Certainly, the complexity of the Java code would grow much faster as rules were added, and Drools does offer other facilities over the Java solution, including the BPM.
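One plausible reading of the grouping result (27 ms down to 9 ms) is that the order of condition elements changes how many intermediate matches the engine has to build. The post does not confirm this, and the sketch below is only a crude model, not a description of Drools internals. It assumes conditions are joined left to right and simply counts period-and-cashflow pairs under each ordering. Periods are hypothetical {start, end} day numbers and cashflows are plain day numbers.

```java
// Crude, hypothetical model of left-to-right condition matching; not Drools code.
final class JoinOrderSketch {

    /** Order: AccountingPeriod, Cashflow, not AccountingPeriod. Pairs are built before the "not" is checked. */
    static long ungrouped(int[][] periods, int[] cashflowDays) {
        long pairs = 0;
        for (int[] p : periods) {
            for (int day : cashflowDays) {
                if (day >= p[0] && day <= p[1]) {
                    pairs++; // pair exists even if p is not the earliest period
                }
            }
        }
        return pairs;
    }

    /** Order: AccountingPeriod, not AccountingPeriod, Cashflow. Only the earliest period reaches the join. */
    static long grouped(int[][] periods, int[] cashflowDays) {
        long pairs = 0;
        for (int[] p : periods) {
            boolean earliest = true;
            for (int[] q : periods) {
                if (q[0] < p[0]) {
                    earliest = false; // models "not AccountingPeriod( start < $start )"
                    break;
                }
            }
            if (!earliest) {
                continue;
            }
            for (int day : cashflowDays) {
                if (day >= p[0] && day <= p[1]) {
                    pairs++;
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up data: 4 periods of 30 days, 10 cashflows in each period.
        int[][] periods = new int[4][];
        int[] days = new int[40];
        for (int p = 0; p < 4; p++) {
            periods[p] = new int[] {p * 30, p * 30 + 29};
            for (int i = 0; i < 10; i++) {
                days[p * 10 + i] = p * 30 + i;
            }
        }
        System.out.println("ungrouped pairs: " + ungrouped(periods, days));
        System.out.println("grouped pairs:   " + grouped(periods, days));
    }
}
```

On this made-up data the ungrouped ordering builds 40 pairs up front while the grouped ordering builds 10. A real engine indexes and shares nodes, so the counts should be read as an intuition for the direction of the effect rather than a prediction of timings.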

Summary


It is important to decide when to use Stateful sessions and when to use Stateless sessions. For this problem, Stateless was sufficient. If facts are modified during the rule process or new rules are generated or inserted, then a Stateful session is required.

Collating and sorting Facts before inserting them into the rule engine can show a marked improvement. This will have to be thought through beforehand as there are times when this is not practical.
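As a rough illustration of collating facts before they reach the engine, the sketch below groups cashflows into per-period batches in Java. It is not the code used in Test6 to Test8: the class names are hypothetical, periods are assumed not to overlap, and cashflows that fall outside every period are silently dropped. Each batch would then be inserted into its own session, which is left out here because it depends on the Drools version in use.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical pre-collation step; class and field names are assumptions.
final class BatchByPeriodSketch {

    static final class Period {
        final LocalDate start;
        final LocalDate end;

        Period(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }
    }

    static final class Cashflow {
        final LocalDate date;
        final long amount;

        Cashflow(LocalDate date, long amount) {
            this.date = date;
            this.amount = amount;
        }
    }

    /** Groups cashflows by the period that contains them; the result iterates in period-start order. */
    static Map<Period, List<Cashflow>> batch(List<Period> periods, List<Cashflow> cashflows) {
        TreeMap<LocalDate, Period> byStart = new TreeMap<>();
        for (Period p : periods) {
            byStart.put(p.start, p);
        }
        Map<Period, List<Cashflow>> batches = new LinkedHashMap<>();
        for (Period p : byStart.values()) {
            batches.put(p, new ArrayList<>());
        }
        for (Cashflow c : cashflows) {
            // Latest period starting on or before the cashflow date, if any.
            Map.Entry<LocalDate, Period> candidate = byStart.floorEntry(c.date);
            if (candidate != null && !c.date.isAfter(candidate.getValue().end)) {
                batches.get(candidate.getValue()).add(c);
            }
        }
        return batches;
    }

    /** Made-up data: two quarters supplied out of order, one cashflow outside both. */
    static List<Integer> demoBatchSizes() {
        List<Period> periods = Arrays.asList(
                new Period(LocalDate.of(2007, 4, 1), LocalDate.of(2007, 6, 30)),
                new Period(LocalDate.of(2007, 1, 1), LocalDate.of(2007, 3, 31)));
        List<Cashflow> cashflows = Arrays.asList(
                new Cashflow(LocalDate.of(2007, 1, 12), 100),
                new Cashflow(LocalDate.of(2007, 3, 31), 30),
                new Cashflow(LocalDate.of(2007, 4, 1), 20),
                new Cashflow(LocalDate.of(2007, 7, 15), 5));
        List<Integer> sizes = new ArrayList<>();
        for (List<Cashflow> b : batch(periods, cashflows)) {
            sizes.add(b.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(demoBatchSizes());
    }
}
```

The trade-off is the one noted above: the ordering logic moves out of the rules and into Java, so it only works when the grouping key is known before the rules run.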

Grouping condition elements within the rules themselves can markedly improve performance, and using newer features such as the accumulate method shows a further marked improvement.

Appendix A: Results


Average time in ms to process 16 Facts,
JVM arguments: -Xms512M -Xmx512m



Average time in ms to process 588 Facts,
JVM arguments: -Xms512M -Xmx512m



Average time in ms to process 4020 Facts,
JVM arguments: -Xms512M -Xmx512m



Average time in ms to process 96112 Facts,
JVM arguments: -Xms512M -Xmx512m



Average time in ms to process 1921936 Facts,
JVM arguments: -Xms1512M -Xmx1512m



The Source


The source code for this benchmark test is split into three projects:

  • The Lib Project

    This project contains common classes used by all projects. SimpleDate extends the Date class to allow instantiation from Strings, and TimedResults is used for timing the rule engine, providing methods to retrieve fastest, slowest and average times.

  • The RuleRunner Project

    This project contains the classes that interface with the Drools engine. You will need to download the latest version of Drools bin and include the jars in this project.

    This project contains four classes so far. The Fact interface must be implemented by any facts that are to be inserted into the rule engine, and Globals must be inserted as Global objects. To use a stateful session within Drools, instantiate a StatefulRunner; to use a stateless session, instantiate a StatelessRunner.

  • The SpeedTest Project
    This project contains all the code and rules for the benchmark tests. The main class is net.tplusplus.test.drools.speedtest.SpeedTest1.

For more discussion of the source, refer to the next post - Banking on Drools Part II.

Download the Source


The source code for these tests can be downloaded here


Comments:

akhem said...

Hi John,

This is an extremely insightful post. Since I'm using drools to develop a BR solution these results would be extremely helpful and would give me some direction on tuning my rule base.

Good work !

Also I would pligg a link to your article at http://www.brplatform.com if that's okay with you.

Thnx.
Amit

johndunning said...

Thanks Amit

link away

regards

John

akhem said...

Hey John,

I did post a link to your article at www.brplatform.com. It's a collaborative site that we at Bizense have started to organize, consolidate and share all article links related to Business Rule Technology.

Please check it out and do let me know your feedback on the same so I can improve it further.

Is there any other benchmark study available comparing drools with other major commercial rule engines (Haley, Corticon, ILog, Blaze) ?

Regards,
Amit

justwish said...

I badly need some help in drools. My issue is that the rules that i have coded are working real slow.
here is what i have done (sample rule)

rule "oID"
when
o : NewOrderReport()
exists ($o : NewOrderReport(oID == null || oID == ""))
exists ($o : NewOrderReport(account=="P" || account=="R"))

then
o.setValid(false);
System.out.println("Fired oID");
end

I have 25 such rules. I simply call the ksession.execute(Object) method to execute this, and just the execute call takes around 110 ms for one object (fact), which is way too much. Can I get some pointers as to what I am doing wrong here?

Hussain said...

Hello John,

The link to your source code is no longer valid. Please share the correct one.

Thanks,
Hussain
