Examples
There are two examples included with the current Banjo release; they can be found in the data directory of the downloadable zip file. This page shows how to run Banjo on the provided data files, how to change parameters via command line options, and how to interpret the output.
In the examples below, all framed areas are taken directly from the output of Banjo, so they serve as simulated screenshots obtained by running Banjo in a terminal window.
The two examples are a search for a static Bayesian network and a search for a dynamic Bayesian network.
Example: Searching for a Static Bayesian Network
The underlying problem for this example is a static Bayesian network with 33 variables and 320 observations. Suppose that we have slightly modified the provided settings file to permit a longer run (1 hour) and made a few other changes, saving the result as static.settings.long.txt. We then execute Banjo by typing:
java -jar banjo.jar settingsFile=data/static/static.settings.long.txt
The application will provide immediate feedback on the settings that were supplied:
-----------------------------------------------------------------------------
- Banjo                        Bayesian Network Inference with Java Objects -
- Release 1.0.0                                                  1 Aug 2005 -
- Licensed from Duke University                                             -
- Copyright (c) 2005 by Alexander J. Hartemink                              -
- All rights reserved                                                       -
-----------------------------------------------------------------------------
- Project: banjo static example
- User: demo
- Data set: 33-vars-320-observations
- Notes: static bayesian network inference
-----------------------------------------------------------------------------
- Searcher: SearcherSimAnneal
-----------------------------------------------------------------------------
- Settings file: data/static/static.settings.long.txt
- Input directory: data/static/input
- Observations file: static.data.txt
- Output directory: data/static/output
- Report file: static.report.txt
- Variable count: 33
- Number of observations: 320
- Discretization policy: none
- Discretization exceptions: none
- Min. Markov lag: 0
- Max. Markov lag:
- DBN mandatory lag(s):
- Equiv. sample size: 1.0
- Max. parent count: 5
- Initial structure file:
- 'Must be present' edges file:
- 'Must not be present' edges file:
- Max. time: 60.0 m
- Max. proposed networks:
- Max. restarts:
- Min. networks before checking: 100
- Number of best networks tracked: 1
- Number of progress reports: 20
- Write to file interval: 10.0 m
- Statistics: RecorderStandard
- Proposer: ProposerRandomLocalMove
- Evaluator: defaulted to EvaluatorBDe
- Cycle checker: CycleCheckerDFS
- Decider: defaulted to DeciderMetropolis
-----------------------------------------------------------------------------
- Initial temperature: 10000
- Cooling factor: 0.7
- Reannealing temperature: 800
- Max. accepted networks before cooling: 2500
- Max. proposed networks before cooling: 10000
- Min. accepted networks before reannealing: 500
-----------------------------------------------------------------------------
The settings feedback is organized into several sections. Below the header, which records the version of Banjo being used, the general project information is listed. This includes the four free-form settings ‘project’, ‘user’, ‘data set’, and ‘notes’. Next come the values of the general parameters that set up a search, including the names and locations of input and output files, optional discretization of the supplied data, stopping criteria (in terms of time, number of networks, or number of restarts), number of high-scoring networks tracked, frequency of intermediate feedback and of writing results to file, and the names of the core components. Finally, there is a section for parameters that are specific to the selected search strategy. In our case, Simulated Annealing is governed by the values for the initial temperature, the cooling factor, the reannealing temperature, the maximum number of accepted networks before cooling, the maximum number of proposed networks before cooling, and the minimum number of accepted networks before reannealing.
Banjo also displays a “discretization report” that lists some of the core characteristics of the data supplied in the observations file. In our case, the data were already discrete, with values from 0 to 3, so no discretization was necessary. For this reason the discretization policy setting was set to ‘none’. The report simply describes the original data and what was done to them by the discretization policy, if anything. The discretization options are described in more detail in the Banjo User Guide.
-----------------------------------------------------------------------------
- Pre-processing                                       Discretization report
-----------------------------------------------------------------------------
Variable | Discr. | Min. Val. | Max. Val. | Orig.  | Used   |
         |        |           |           | points | points |
-----------------------------------------------------------------------------
   0     | none   |    0.0    |    1.0    |   2    |   2    |
   1     | none   |    0.0    |    3.0    |   4    |   4    |
   2     | none   |    0.0    |    3.0    |   4    |   4    |
   3     | none   |    0.0    |    3.0    |   4    |   4    |
   .     |  .     |     .     |     .     |   .    |   .    |
   .     |  .     |     .     |     .     |   .    |   .    |
   .     |  .     |     .     |     .     |   .    |   .    |
  29     | none   |    0.0    |    3.0    |   4    |   4    |
  30     | none   |    0.0    |    3.0    |   4    |   4    |
  31     | none   |    0.0    |    3.0    |   4    |   4    |
  32     | none   |    0.0    |    3.0    |   4    |   4    |
-----------------------------------------------------------------------------
Banjo then starts the execution of the search, and provides periodic feedback on its progress: in our case, the maximum time allotted for the search was 60 minutes, and we had requested 20 progress reports. In addition, we requested that the intermediate results be saved to file every 10 minutes.
Starting search at 8/1/05 12:49:57 AM
Prep. time used: 375 ms

Search 5.0% completed. Elapsed time: 3.0 m. Time remaining: 57.0 m. Networks examined: 25061600.
Search 10.0% completed. Elapsed time: 6.0 m. Time remaining: 54.0 m. Networks examined: 50386500.
Search 15.0% completed. Elapsed time: 9.0 m. Time remaining: 51.0 m. Networks examined: 76142900.

-----------------------------------------------------------------------------
- Intermediate report                                     Best scores so far
-----------------------------------------------------------------------------
Network score: -8452.086766, first found at iteration 70090842
33
0 1 30
1 1 14
2 1 17
. . .
. . .
. . .
30 1 8
31 1 13
32 1 13

Search 20.0% completed. Elapsed time: 12.0 m. Time remaining: 48.0 m. Networks examined: 101888400.
Search 25.0% completed. Elapsed time: 15.0 m. Time remaining: 45.0 m. Networks examined: 127666700.
Search 30.0% completed. Elapsed time: 18.0 m. Time remaining: 42.0 m. Networks examined: 153367400.
...
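The timing of the reports above can be checked with a little arithmetic. The following is a minimal sketch, assuming that progress reports are evenly spaced over the maximum search time (which the output suggests, but which is not stated as a rule here); the variable names are illustrative and are not Banjo settings keys.

```python
# Values taken from the settings feedback of the static example.
max_time_min = 60.0        # "Max. time: 60.0 m"
progress_reports = 20      # "Number of progress reports: 20"
write_interval_min = 10.0  # "Write to file interval: 10.0 m"

# Assumption: reports are evenly spaced over the allotted time.
report_every_min = max_time_min / progress_reports
percent_per_report = 100.0 / progress_reports

progress_times = [report_every_min * i for i in range(1, progress_reports + 1)]
write_times = [
    write_interval_min * i
    for i in range(1, int(max_time_min // write_interval_min) + 1)
]

print(f"one progress report every {report_every_min} m ({percent_per_report}%)")
print(f"first intermediate report due at {write_times[0]} m")
```

Under this assumption the first intermediate report falls between the 15% (9.0 m) and 20% (12.0 m) progress lines, which is where it appears in the output.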
Finally, when the maximum allotted search time or number of search loops is reached, Banjo prints out the search results, which include the set of best networks (in our case the single best network) and a set of statistical information collected by the search components. Since each data set has its own unique characteristics, the statistics can be helpful in tuning a search strategy.
Search 100.0% completed. Elapsed time: 60.0 m. Time remaining: 0 ms. Networks examined: 513865800.

-----------------------------------------------------------------------------
- Best Structure
-----------------------------------------------------------------------------
Network score: -8445.6013344, first found at iteration 302362683
33
0 2 5 25
1 1 14
2 1 17
3 1 5
. . .
. . .
. . .
29 1 13
30 1 8
31 1 29
32 1 13

-----------------------------------------------------------------------------
- Search Statistics
-----------------------------------------------------------------------------
Statistics collected in searcher 'SearcherSimAnneal':
  Search completed at 8/1/05 1:49:57 AM
  Number of networks examined: 513865800
  Total time used: 60.0 m
  High score: -8445.6013344, first found at iteration 302362683
  Number of restarts: 5384

Statistics collected in proposer 'ProposerRandomLocalMove':
  Additions -- proposed: 171956379
  Deletions -- proposed: 170972594
  Reversals -- proposed: 170936826

Statistics collected in cycle checker 'CycleCheckerDFS':
  Additions -- considered: 171956379, acyclic: 155971285
  Deletions -- considered: 170972594, acyclic: 170972594
  Reversals -- considered: 170936826, acyclic: 168215107

Statistics collected in evaluator 'EvaluatorBDe':
  Scores computed: 15291416
  Scores (cache)        placed      fetched
    with 0 parents:         33    283065857
    with 1 parents:       1056    152618369
    with 2 parents:      16368    207375207
    with 3 parents:   13966810      4927714
    with 4 parents:    1191187       107129
    with 5 parents:     115962         5891

Statistics collected in decider 'DeciderMetropolis':
  Additions -- considered: 155971285, better score: 22476537, other accepted: 41396587
  Deletions -- considered: 170972594, better score: 41449859, other accepted: 22423242
  Reversals -- considered: 168215107, better score: 47932130, other accepted: 21712937
  Average permissivity: 0.223
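As one way of reading the evaluator statistics, the sketch below sums the cache columns reported above. It assumes that "placed" counts scores that were computed and then stored in the cache, and that "fetched" counts scores retrieved from the cache instead of being recomputed; this interpretation is inferred from the column names and is not confirmed here. The derived "hit rate" is an illustrative quantity, not something Banjo reports.

```python
# Numbers copied from the 'EvaluatorBDe' statistics of the static example.
scores_computed = 15291416

# parent count -> (placed, fetched)
cache = {
    0: (33, 283065857),
    1: (1056, 152618369),
    2: (16368, 207375207),
    3: (13966810, 4927714),
    4: (1191187, 107129),
    5: (115962, 5891),
}

total_placed = sum(placed for placed, _ in cache.values())
total_fetched = sum(fetched for _, fetched in cache.values())

# Assumed interpretation: every score request is either a fetch or a new placement.
hit_rate = total_fetched / (total_fetched + total_placed)

print("placed entries match scores computed:", total_placed == scores_computed)
print(f"fraction of score requests served from the cache: {hit_rate:.3f}")
```

The placed counts add up exactly to the reported number of scores computed, which supports the assumed reading. Most fetches involve node configurations with at most two parents, while most newly computed scores involve three parents.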
Explanation of the Results
Banjo supplies the obtained high-scoring Bayesian network in the following form (the data for nodes id = 4 to id = 28 are omitted):
Network score: -8445.6013344, first found at iteration 302362683
33
0 2 5 25
1 1 14
2 1 17
3 1 5
. . .
. . .
. . .
29 1 13
30 1 8
31 1 29
32 1 13
The first line indicates the score (-8445.6013344) and when it was first encountered (iteration 302362683).
Line 2 indicates that the number of variables in the network is 33.
Lines 3 to 35 (one for each of the 33 variables) first list the id of a variable, then the number of parents, and then a listing of the parents. For example, ‘0 2 5 25’ means that variable with id = 0 has 2 parents, namely the variables with id = 5 and id = 25.
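The format described above is simple to read programmatically. The following is a minimal sketch of such a reader; the function names are illustrative and not part of Banjo, the three-variable listing is a made-up example rather than Banjo output, and the representation of a parentless variable as ‘id 0’ is an assumption based on the description above.

```python
def parse_variable_line(line):
    """Parse one line such as '0 2 5 25' into (variable id, [parent ids])."""
    fields = [int(token) for token in line.split()]
    node, parent_count, parents = fields[0], fields[1], fields[2:]
    if len(parents) != parent_count:
        raise ValueError(f"variable {node}: expected {parent_count} parents, got {len(parents)}")
    return node, parents


def parse_static_network(text):
    """Parse a full listing: the variable count, then one line per variable."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    variable_count = int(lines[0])
    parents = dict(parse_variable_line(ln) for ln in lines[1:])
    if len(parents) != variable_count:
        raise ValueError(f"expected {variable_count} variables, got {len(parents)}")
    return parents


# A line taken from the output above.
print(parse_variable_line("0 2 5 25"))

# Hypothetical 3-variable network, for illustration only.
example = """
3
0 2 1 2
1 0
2 1 1
"""
print(parse_static_network(example))
```

The score line is not handled here; it would need to be stripped before the listing is passed to `parse_static_network`.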
The graphical representation of the obtained network, generated using the Banjo dot format output, looks like this:
Example: Searching for a Dynamic Bayesian Network
The second example is a search for a dynamic Bayesian network (DBN) on a problem with 20 variables and 2000 observations. The minimum and maximum Markov lag in this example are both equal to 1, which means that no links between nodes of Markov lag 0 are permitted: every edge runs from a variable at the previous time step to a variable at the current one. As a result, you may notice in the resulting statistics that no reversals were considered as possible changes to the Bayesian network, since a reversed edge would point backward in time. For the same reason, an added edge can never close a cycle, so there was no need for the search algorithm to perform any cycle checking when proposing a change. We run the search with:
java -jar banjo.jar settingsFile=data/dynamic/dynamic.settings.txt
As above, the application will provide immediate feedback on the settings that were supplied:
-----------------------------------------------------------------------------
- Banjo                        Bayesian Network Inference with Java Objects -
- Release 1.0.0                                                  1 Aug 2005 -
- Licensed from Duke University                                             -
- Copyright (c) 2005 by Alexander J. Hartemink                              -
- All rights reserved                                                       -
-----------------------------------------------------------------------------
- Project: banjo dynamic example
- User: demo
- Data set: 20-vars-2000-temporal-observations
- Notes: dynamic bayesian network inference
-----------------------------------------------------------------------------
- Searcher: SearcherGreedy
-----------------------------------------------------------------------------
- Settings file: data/dynamic/dynamic.settings.txt
- Input directory: data/dynamic/input
- Observations file: dynamic.data.txt
- Output directory: data/dynamic/output
- Report file: dynamic.report.txt
- Variable count: 20
- Number of observations: 2000
- 'Effective' number of observations with DBN: 1999
- Discretization policy: none
- Discretization exceptions: none
- Min. Markov lag: 1
- Max. Markov lag: 1
- DBN mandatory lag(s): 1
- Equiv. sample size: 1.0
- Max. parent count: 5
- Initial structure file:
- 'Must be present' edges file:
- 'Must not be present' edges file:
- Max. time:
- Max. proposed networks: 1000000
- Max. restarts:
- Min. networks before checking: 1000
- Number of best networks tracked: 5
- Number of progress reports: 10
- Write to file interval:
- Statistics: RecorderStandard
- Proposer: ProposerAllLocalMoves
- Evaluator: defaulted to EvaluatorBDe
- Cycle checker: CycleCheckerDFS
- Decider: defaulted to DeciderGreedy
-----------------------------------------------------------------------------
- Min. proposed networks after high score: 1000
- Min. proposed networks before restart: 3000
- Max. proposed networks before restart: 5000
- Restart method: use random network
- with max. parent count: 3
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
- Pre-processing                                       Discretization report
-----------------------------------------------------------------------------
Variable | Discr. | Min. Val. | Max. Val. | Orig.  | Used   |
         |        |           |           | points | points |
-----------------------------------------------------------------------------
   0     | none   |    0.0    |    2.0    |   3    |   3    |
   1     | none   |    0.0    |    2.0    |   3    |   3    |
   2     | none   |    0.0    |    2.0    |   3    |   3    |
   3     | none   |    0.0    |    2.0    |   3    |   3    |
   4     | none   |    0.0    |    2.0    |   3    |   3    |
   5     | none   |    0.0    |    2.0    |   3    |   3    |
   6     | none   |    0.0    |    2.0    |   3    |   3    |
   7     | none   |    0.0    |    2.0    |   3    |   3    |
   8     | none   |    0.0    |    2.0    |   3    |   3    |
   9     | none   |    0.0    |    2.0    |   3    |   3    |
  10     | none   |    0.0    |    2.0    |   3    |   3    |
  11     | none   |    0.0    |    2.0    |   3    |   3    |
  12     | none   |    0.0    |    2.0    |   3    |   3    |
  13     | none   |    0.0    |    2.0    |   3    |   3    |
  14     | none   |    0.0    |    2.0    |   3    |   3    |
  15     | none   |    0.0    |    2.0    |   3    |   3    |
  16     | none   |    0.0    |    2.0    |   3    |   3    |
  17     | none   |    0.0    |    2.0    |   3    |   3    |
  18     | none   |    0.0    |    2.0    |   3    |   3    |
  19     | none   |    0.0    |    2.0    |   3    |   3    |
-----------------------------------------------------------------------------
Banjo then provides periodic feedback on its progress and, when the search is completed, supplies the final results. In our case these include the five highest-scoring networks, the statistical information about the search, a basic output of the best network for generating a graph in dot, and the list of influence scores.
Starting search at 8/1/05 1:21:39 PM
Prep. time used: 1171 ms

Search 10.0% completed. Networks examined: 100321. Elapsed time: 16391 ms.
Search 20.0% completed. Networks examined: 200641. Elapsed time: 32.67 s.
Search 30.0% completed. Networks examined: 300961. Elapsed time: 49.1 s.
Search 40.0% completed. Networks examined: 400141. Elapsed time: 65.17 s.
Search 50.0% completed. Networks examined: 500461. Elapsed time: 81.01 s.
Search 60.0% completed. Networks examined: 600781. Elapsed time: 96.34 s.
Search 70.1% completed. Networks examined: 701101. Elapsed time: 112.54 s.
Search 80.0% completed. Networks examined: 800281. Elapsed time: 2.13 m.
Search 90.0% completed. Networks examined: 900601. Elapsed time: 2.38 m.
Search 100.0% completed. Networks examined: 1000921. Elapsed time: 2.64 m.

-----------------------------------------------------------------------------
- Best 5 Structures
-----------------------------------------------------------------------------
Network #1, score: -15935.2860609, first found at iteration 4941
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
4 0: 0 1: 2 1 4
5 0: 0 1: 2 4 5
6 0: 0 1: 1 6
7 0: 0 1: 2 3 7
8 0: 0 1: 2 3 8
9 0: 0 1: 3 5 6 9
10 0: 0 1: 3 8 9 10
11 0: 0 1: 2 10 11
12 0: 0 1: 1 12
13 0: 0 1: 1 13
14 0: 0 1: 1 14
15 0: 0 1: 1 15
16 0: 0 1: 1 16
17 0: 0 1: 1 17
18 0: 0 1: 1 18
19 0: 0 1: 1 19

Network #2, score: -15939.3820273, first found at iteration 4561
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
4 0: 0 1: 2 1 4
5 0: 0 1: 2 4 5
6 0: 0 1: 1 6
7 0: 0 1: 2 3 7
8 0: 0 1: 2 3 8
9 0: 0 1: 2 5 9
10 0: 0 1: 3 8 9 10
11 0: 0 1: 2 10 11
12 0: 0 1: 1 12
13 0: 0 1: 1 13
14 0: 0 1: 1 14
15 0: 0 1: 1 15
16 0: 0 1: 1 16
17 0: 0 1: 1 17
18 0: 0 1: 1 18
19 0: 0 1: 1 19

Network #3, score: -15986.7728978, first found at iteration 4181
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
4 0: 0 1: 2 1 4
5 0: 0 1: 2 4 5
6 0: 0 1: 1 6
7 0: 0 1: 2 3 7
8 0: 0 1: 2 3 8
9 0: 0 1: 2 5 9
10 0: 0 1: 2 9 10
11 0: 0 1: 2 10 11
12 0: 0 1: 1 12
13 0: 0 1: 1 13
14 0: 0 1: 1 14
15 0: 0 1: 1 15
16 0: 0 1: 1 16
17 0: 0 1: 1 17
18 0: 0 1: 1 18
19 0: 0 1: 1 19

Network #4, score: -15996.6277017, first found at iteration 3801
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
4 0: 0 1: 2 1 4
5 0: 0 1: 2 4 5
6 0: 0 1: 1 6
7 0: 0 1: 2 3 7
8 0: 0 1: 2 3 8
9 0: 0 1: 2 5 9
10 0: 0 1: 1 10
11 0: 0 1: 2 10 11
12 0: 0 1: 1 12
13 0: 0 1: 1 13
14 0: 0 1: 1 14
15 0: 0 1: 1 15
16 0: 0 1: 1 16
17 0: 0 1: 1 17
18 0: 0 1: 1 18
19 0: 0 1: 1 19

Network #5, score: -16061.5421473, first found at iteration 3421
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
4 0: 0 1: 2 1 4
5 0: 0 1: 2 4 5
6 0: 0 1: 1 6
7 0: 0 1: 2 3 7
8 0: 0 1: 2 3 8
9 0: 0 1: 1 9
10 0: 0 1: 1 10
11 0: 0 1: 2 10 11
12 0: 0 1: 1 12
13 0: 0 1: 1 13
14 0: 0 1: 1 14
15 0: 0 1: 1 15
16 0: 0 1: 1 16
17 0: 0 1: 1 17
18 0: 0 1: 1 18
19 0: 0 1: 1 19

-----------------------------------------------------------------------------
- Search Statistics
-----------------------------------------------------------------------------
Statistics collected in searcher 'SearcherGreedy':
  Search completed at 8/1/05 1:24:01 PM
  Number of networks examined: 1000921
  Total time used: 2.64 m
  High score: -15935.2860609, first found at iteration 4941
  Number of restarts: 175

Statistics collected in proposer 'ProposerAllLocalMoves':
  Additions -- proposed: 982684
  Deletions -- proposed: 18236
  Reversals -- proposed: 0 (min. Markov lag = 1)

Statistics collected in cycle checker 'CycleCheckerDFS':
  Additions -- no cyclicity test necessary
  Deletions -- no cyclicity test necessary
  Reversals -- none proposed

Statistics collected in evaluator 'EvaluatorBDe':
  Scores computed: 267143
  Scores (cache)        placed      fetched
    with 0 parents:          0            0
    with 1 parents:         20        17906
    with 2 parents:        380       696969
    with 3 parents:     236638        22836
    with 4 parents:      30105         2620
    with 5 parents:          0            0

Statistics collected in decider 'DeciderGreedy':
  Additions -- considered: 2284, better score: 2284
  Deletions -- considered: 350, better score: 0
  Reversals -- considered: 0 (min. Markov lag = 1)

-----------------------------------------------------------------------------
- Post-processing                                 DOT graphics format output
-----------------------------------------------------------------------------
digraph abstract {
label = "Banjo Version 1.0.0\n High scoring network, score: -15935.29\n Project: banjo dynamic example\n User: demo\n Data set: 20-vars-2000-temporal-observations\n Networks searched: 1000921";
labeljust = "l";
7->0;
0->2;
1->2;
2->3;
1->4;
4->5;
3->7;
3->8;
5->9;
6->9;
8->10;
9->10;
10->11;
}

-----------------------------------------------------------------------------
- Post-processing                                           Influence scores
-----------------------------------------------------------------------------
Influence score for (7,1) -> (0,0) -0.4377
Influence score for (0,1) -> (0,0) 0.7398
Influence score for (1,1) -> (1,0) 0.8321
Influence score for (2,1) -> (2,0) 0.7182
Influence score for (1,1) -> (2,0) -0.2764
Influence score for (0,1) -> (2,0) 0.1788
Influence score for (3,1) -> (3,0) 0.7487
Influence score for (2,1) -> (3,0) 0.4088
Influence score for (4,1) -> (4,0) 0.831
Influence score for (1,1) -> (4,0) 0.2829
Influence score for (5,1) -> (5,0) 0.7699
Influence score for (4,1) -> (5,0) -0.3771
Influence score for (6,1) -> (6,0) 0.8502
Influence score for (7,1) -> (7,0) 0.7327
Influence score for (3,1) -> (7,0) 0.4538
Influence score for (8,1) -> (8,0) 0.759
Influence score for (3,1) -> (8,0) -0.4027
Influence score for (9,1) -> (9,0) 0.693
Influence score for (6,1) -> (9,0) -0.1581
Influence score for (5,1) -> (9,0) 0.3163
Influence score for (10,1) -> (10,0) 0.6724
Influence score for (9,1) -> (10,0) 0.1913
Influence score for (8,1) -> (10,0) 0.3029
Influence score for (11,1) -> (11,0) 0.7665
Influence score for (10,1) -> (11,0) -0.4201
Influence score for (12,1) -> (12,0) 0.8469
Influence score for (13,1) -> (13,0) 0.8396
Influence score for (14,1) -> (14,0) 0.8457
Influence score for (15,1) -> (15,0) 0.8427
Influence score for (16,1) -> (16,0) 0.8625
Influence score for (17,1) -> (17,0) 0.8746
Influence score for (18,1) -> (18,0) 0.8476
Influence score for (19,1) -> (19,0) 0.8608
Explanation of the Results
The obtained high-scoring Bayesian network is supplied in the following form (the data for nodes id = 4 to id = 17 are omitted):
Network #1, score: -15935.2860609, first found at iteration 4941
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
. . . . . .
. . . . . .
. . . . . .
18 0: 0 1: 1 18
19 0: 0 1: 1 19
The first line indicates the ranking of the network (here: #1), and its associated score (here: -15935.2860609), first encountered at iteration 4941.
The second line indicates that the number of variables is 20.
Since we have a dynamic network with maximum Markov lag 1, the parents of each node are listed separately for each Markov lag, in a “block” that starts with the respective Markov lag and a colon (‘:’). That is, each of lines 3 to 22 (one for each of the 20 variables) lists the id of a variable, then the block for Markov lag 0 (here, ‘0: 0’ indicates that there is no parent of Markov lag 0), and then the block for Markov lag 1 (here, for variable id = 0, ‘1: 2 0 7’ indicates that variable id = 0 has 2 parents of Markov lag 1, namely variable id = 0 and variable id = 7).
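The block structure just described can be read with a short loop. The following is a minimal sketch; the function names are illustrative and not part of Banjo. The second helper lists the links between different variables, on the observation (not a documented rule) that the DOT output in this example shows links such as 7->0 but no link from a variable to itself.

```python
def parse_dbn_line(line):
    """Parse a line such as '0 0: 0 1: 2 0 7'.

    Returns (variable id, {Markov lag: [parent ids]}).
    """
    tokens = line.split()
    node = int(tokens[0])
    parents_by_lag = {}
    i = 1
    while i < len(tokens):
        if not tokens[i].endswith(":"):
            raise ValueError(f"expected a lag marker like '1:', got {tokens[i]!r}")
        lag = int(tokens[i][:-1])
        count = int(tokens[i + 1])
        parents = [int(t) for t in tokens[i + 2 : i + 2 + count]]
        if len(parents) != count:
            raise ValueError(f"variable {node}, lag {lag}: expected {count} parents")
        parents_by_lag[lag] = parents
        i += 2 + count
    return node, parents_by_lag


def cross_variable_edges(line):
    """List 'parent->child' strings, skipping links from a variable to itself."""
    node, parents_by_lag = parse_dbn_line(line)
    return [
        f"{parent}->{node}"
        for lag in sorted(parents_by_lag)
        for parent in parents_by_lag[lag]
        if parent != node
    ]


# Lines taken from Network #1 above.
print(parse_dbn_line("0 0: 0 1: 2 0 7"))
print(cross_variable_edges("2 0: 0 1: 3 0 1 2"))
```

Keeping the parents in a dictionary keyed by Markov lag means the same reader would also work for listings with more than two lag blocks, provided they follow the same ‘lag: count parents’ pattern.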
The graphical representation of the network, obtained using dot, is this: