Examples
There are two examples included with the current Banjo release; they can be found in the data directory of the downloadable zip file. This page shows how to run Banjo on the provided data files, how to change parameters via command line options, and how to interpret the output.
In the examples below, all framed areas are taken directly from the output of Banjo, so they serve as simulated screenshots obtained by running Banjo in a terminal window.
The two examples are a search for a static Bayesian network and a search for a dynamic Bayesian network.
Example: Searching for a Static Bayesian Network
The underlying problem for this example is a static Bayesian network with 33 variables and 320 observations. Suppose we have slightly modified the provided settings file to permit a longer run (1 hour), made a few other changes, and saved the result as static.settings.long.txt. We then run Banjo by typing:
java -jar banjo.jar settingsFile=data/static/static.settings.long.txt
The application will provide immediate feedback on the settings that were supplied:
-----------------------------------------------------------------------------
- Banjo Bayesian Network Inference with Java Objects -
- Release 1.0.0 1 Aug 2005 -
- Licensed from Duke University -
- Copyright (c) 2005 by Alexander J. Hartemink -
- All rights reserved -
-----------------------------------------------------------------------------
- Project: banjo static example
- User: demo
- Data set: 33-vars-320-observations
- Notes: static bayesian network inference
-----------------------------------------------------------------------------
- Searcher: SearcherSimAnneal
-----------------------------------------------------------------------------
- Settings file: data/static/static.settings.long.txt
- Input directory: data/static/input
- Observations file: static.data.txt
- Output directory: data/static/output
- Report file: static.report.txt
- Variable count: 33
- Number of observations: 320
- Discretization policy: none
- Discretization exceptions: none
- Min. Markov lag: 0
- Max. Markov lag:
- DBN mandatory lag(s):
- Equiv. sample size: 1.0
- Max. parent count: 5
- Initial structure file:
- 'Must be present' edges file:
- 'Must not be present' edges file:
- Max. time: 60.0 m
- Max. proposed networks:
- Max. restarts:
- Min. networks before checking: 100
- Number of best networks tracked: 1
- Number of progress reports: 20
- Write to file interval: 10.0 m
- Statistics: RecorderStandard
- Proposer: ProposerRandomLocalMove
- Evaluator: defaulted to EvaluatorBDe
- Cycle checker: CycleCheckerDFS
- Decider: defaulted to DeciderMetropolis
-----------------------------------------------------------------------------
- Initial temperature: 10000
- Cooling factor: 0.7
- Reannealing temperature: 800
- Max. accepted networks before cooling: 2500
- Max. proposed networks before cooling: 10000
- Min. accepted networks before reannealing: 500
-----------------------------------------------------------------------------
The settings feedback is organized into several sections. Below the header, which identifies the version of Banjo being used, comes the general project information: the four free-form settings ‘project’, ‘user’, ‘data set’, and ‘notes’. Next come the general parameters that set up a search: the names and locations of input and output files, optional discretization of the supplied data, stopping criteria (time, number of networks, or number of restarts), the number of high-scoring networks tracked, the frequency of progress reports and of writing results to file, and the names of the core components. The final section lists parameters specific to the selected search strategy. In our case, simulated annealing is governed by the initial temperature, the cooling factor, the reannealing temperature, the maximum number of accepted networks before cooling, the maximum number of proposed networks before cooling, and the minimum number of accepted networks before reannealing.
Banjo also displays a “discretization report” that lists some of the core characteristics of the data supplied in the observations file. In our case, the data were already discrete, with values from 0 to 3, so no discretization was necessary. For this reason the discretization policy setting was set to ‘none’. The report simply describes the original data and what was done to them by the discretization policy, if anything. The discretization options are described in more detail in the Banjo User Guide.
-----------------------------------------------------------------------------
- Pre-processing Discretization report
-----------------------------------------------------------------------------
Variable | Discr. | Min. Val. | Max. Val. | Orig. | Used |
| | | | points | points |
-----------------------------------------------------------------------------
0 | none | 0.0 | 1.0 | 2 | 2 |
1 | none | 0.0 | 3.0 | 4 | 4 |
2 | none | 0.0 | 3.0 | 4 | 4 |
3 | none | 0.0 | 3.0 | 4 | 4 |
. | . | . | . | . | . |
. | . | . | . | . | . |
. | . | . | . | . | . |
29 | none | 0.0 | 3.0 | 4 | 4 |
30 | none | 0.0 | 3.0 | 4 | 4 |
31 | none | 0.0 | 3.0 | 4 | 4 |
32 | none | 0.0 | 3.0 | 4 | 4 |
-----------------------------------------------------------------------------
Banjo then starts the search and reports periodically on its progress. In our case, the maximum time allotted for the search was 60 minutes and we requested 20 progress reports, so a report appears every 3 minutes. We also requested that intermediate results be saved to file every 10 minutes.
Starting search at 8/1/05 12:49:57 AM
Prep. time used: 375 ms
Search 5.0% completed. Elapsed time: 3.0 m. Time remaining: 57.0 m.
Networks examined: 25061600.
Search 10.0% completed. Elapsed time: 6.0 m. Time remaining: 54.0 m.
Networks examined: 50386500.
Search 15.0% completed. Elapsed time: 9.0 m. Time remaining: 51.0 m.
Networks examined: 76142900.
-----------------------------------------------------------------------------
- Intermediate report Best scores so far
-----------------------------------------------------------------------------
Network score: -8452.086766, first found at iteration 70090842
33
0 1 30
1 1 14
2 1 17
. . .
. . .
. . .
30 1 8
31 1 13
32 1 13
Search 20.0% completed. Elapsed time: 12.0 m. Time remaining: 48.0 m.
Networks examined: 101888400.
Search 25.0% completed. Elapsed time: 15.0 m. Time remaining: 45.0 m.
Networks examined: 127666700.
Search 30.0% completed. Elapsed time: 18.0 m. Time remaining: 42.0 m.
Networks examined: 153367400.
...
Finally, when the maximum allotted search time or number of search loops is reached, Banjo prints the search results, which include the set of best networks (in our case the single best network) and a set of statistics collected by the search components. Since each data set has its own characteristics, these statistics can be helpful in tuning a search strategy.
Search 100.0% completed. Elapsed time: 60.0 m. Time remaining: 0 ms.
Networks examined: 513865800.
-----------------------------------------------------------------------------
- Best Structure
-----------------------------------------------------------------------------
Network score: -8445.6013344, first found at iteration 302362683
33
0 2 5 25
1 1 14
2 1 17
3 1 5
. . .
. . .
. . .
29 1 13
30 1 8
31 1 29
32 1 13
-----------------------------------------------------------------------------
- Search Statistics
-----------------------------------------------------------------------------
Statistics collected in searcher 'SearcherSimAnneal':
Search completed at 8/1/05 1:49:57 AM
Number of networks examined: 513865800
Total time used: 60.0 m
High score: -8445.6013344, first found at iteration 302362683
Number of restarts: 5384
Statistics collected in proposer 'ProposerRandomLocalMove':
Additions -- proposed: 171956379
Deletions -- proposed: 170972594
Reversals -- proposed: 170936826
Statistics collected in cycle checker 'CycleCheckerDFS':
Additions -- considered: 171956379, acyclic: 155971285
Deletions -- considered: 170972594, acyclic: 170972594
Reversals -- considered: 170936826, acyclic: 168215107
Statistics collected in evaluator 'EvaluatorBDe':
Scores computed: 15291416
Scores (cache) placed fetched
with 0 parents: 33 283065857
with 1 parents: 1056 152618369
with 2 parents: 16368 207375207
with 3 parents: 13966810 4927714
with 4 parents: 1191187 107129
with 5 parents: 115962 5891
Statistics collected in decider 'DeciderMetropolis':
Additions -- considered: 155971285, better score: 22476537, other accepted: 41396587
Deletions -- considered: 170972594, better score: 41449859, other accepted: 22423242
Reversals -- considered: 168215107, better score: 47932130, other accepted: 21712937
Average permissivity: 0.223
Explanation of the Results
Banjo supplies the obtained high-scoring Bayesian network in the following form (the data for nodes id = 4 to id = 28 are omitted):
Network score: -8445.6013344, first found at iteration 302362683
33
0 2 5 25
1 1 14
2 1 17
3 1 5
. . .
. . .
. . .
29 1 13
30 1 8
31 1 29
32 1 13
The first line indicates the score (-8445.6013344) and when it was first encountered (iteration 302362683).
Line 2 indicates that the number of variables in the network is 33.
Lines 3 to 35 (one for each of the 33 variables) first list the id of a variable, then the number of parents, and then a listing of the parents. For example, ‘0 2 5 25’ means that variable with id = 0 has 2 parents, namely the variables with id = 5 and id = 25.
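The listing format described above is simple enough to read programmatically. The following is a minimal Python sketch, not part of Banjo: the function name parse_static_network is hypothetical, and the sketch assumes that the text starts at the variable-count line (the score line is left out) and that elided rows such as ‘. . .’ can simply be skipped.

```python
def parse_static_network(text):
    """Return (variable_count, {node_id: [parent_ids]}) for a static listing.

    Hypothetical helper, not part of Banjo. Expects the text to begin with
    the variable-count line, followed by one 'id count parents...' row per node.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    variable_count = int(rows[0][0])
    parents = {}
    for fields in rows[1:]:
        if not fields[0].isdigit():
            continue  # skip elision rows such as ". . ."
        node_id, parent_count = int(fields[0]), int(fields[1])
        parent_ids = [int(f) for f in fields[2:]]
        if len(parent_ids) != parent_count:
            raise ValueError(f"node {node_id}: expected {parent_count} parents")
        parents[node_id] = parent_ids
    return variable_count, parents


# A shortened excerpt of the listing shown above.
listing = """33
0 2 5 25
1 1 14
2 1 17
3 1 5
. . .
29 1 13
"""
variable_count, parents = parse_static_network(listing)
print(variable_count)  # 33
print(parents[0])      # [5, 25]
```

The check that the declared parent count matches the number of ids on the row is a cheap way to catch a truncated or misaligned line.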
[Figure: graphical representation of the obtained network, generated using the Banjo dot format output]
Example: Searching for a Dynamic Bayesian Network
The second example is a search for a dynamic Bayesian network (DBN) with 20 variables and 2000 observations. The minimum and maximum Markov lag in this example are both equal to 1, so every link runs from a node at Markov lag 1 to a node at Markov lag 0, and no links between nodes of Markov lag 0 are permitted. As a result, the statistics below show that no reversals were proposed (a reversed link would point backward in time), and the search algorithm did not need to perform any cycle checking when proposing a change, since such links cannot form a cycle. We run the search with:
java -jar banjo.jar settingsFile=data/dynamic/dynamic.settings.txt
As above, the application will provide immediate feedback on the settings that were supplied:
-----------------------------------------------------------------------------
- Banjo Bayesian Network Inference with Java Objects -
- Release 1.0.0 1 Aug 2005 -
- Licensed from Duke University -
- Copyright (c) 2005 by Alexander J. Hartemink -
- All rights reserved -
-----------------------------------------------------------------------------
- Project: banjo dynamic example
- User: demo
- Data set: 20-vars-2000-temporal-observations
- Notes: dynamic bayesian network inference
-----------------------------------------------------------------------------
- Searcher: SearcherGreedy
-----------------------------------------------------------------------------
- Settings file: data/dynamic/dynamic.settings.txt
- Input directory: data/dynamic/input
- Observations file: dynamic.data.txt
- Output directory: data/dynamic/output
- Report file: dynamic.report.txt
- Variable count: 20
- Number of observations: 2000
- 'Effective' number of observations with DBN: 1999
- Discretization policy: none
- Discretization exceptions: none
- Min. Markov lag: 1
- Max. Markov lag: 1
- DBN mandatory lag(s): 1
- Equiv. sample size: 1.0
- Max. parent count: 5
- Initial structure file:
- 'Must be present' edges file:
- 'Must not be present' edges file:
- Max. time:
- Max. proposed networks: 1000000
- Max. restarts:
- Min. networks before checking: 1000
- Number of best networks tracked: 5
- Number of progress reports: 10
- Write to file interval:
- Statistics: RecorderStandard
- Proposer: ProposerAllLocalMoves
- Evaluator: defaulted to EvaluatorBDe
- Cycle checker: CycleCheckerDFS
- Decider: defaulted to DeciderGreedy
-----------------------------------------------------------------------------
- Min. proposed networks after high score: 1000
- Min. proposed networks before restart: 3000
- Max. proposed networks before restart: 5000
- Restart method: use random network
- with max. parent count: 3
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
- Pre-processing Discretization report
-----------------------------------------------------------------------------
Variable | Discr. | Min. Val. | Max. Val. | Orig. | Used |
| | | | points | points |
-----------------------------------------------------------------------------
0 | none | 0.0 | 2.0 | 3 | 3 |
1 | none | 0.0 | 2.0 | 3 | 3 |
2 | none | 0.0 | 2.0 | 3 | 3 |
3 | none | 0.0 | 2.0 | 3 | 3 |
4 | none | 0.0 | 2.0 | 3 | 3 |
5 | none | 0.0 | 2.0 | 3 | 3 |
6 | none | 0.0 | 2.0 | 3 | 3 |
7 | none | 0.0 | 2.0 | 3 | 3 |
8 | none | 0.0 | 2.0 | 3 | 3 |
9 | none | 0.0 | 2.0 | 3 | 3 |
10 | none | 0.0 | 2.0 | 3 | 3 |
11 | none | 0.0 | 2.0 | 3 | 3 |
12 | none | 0.0 | 2.0 | 3 | 3 |
13 | none | 0.0 | 2.0 | 3 | 3 |
14 | none | 0.0 | 2.0 | 3 | 3 |
15 | none | 0.0 | 2.0 | 3 | 3 |
16 | none | 0.0 | 2.0 | 3 | 3 |
17 | none | 0.0 | 2.0 | 3 | 3 |
18 | none | 0.0 | 2.0 | 3 | 3 |
19 | none | 0.0 | 2.0 | 3 | 3 |
-----------------------------------------------------------------------------
Banjo then provides periodic feedback on its progress, and, when the search is completed, it supplies the final results. In our case this includes the 5 highest scoring networks, the statistical information about the search, a basic output of the best network for generating a graph in dot, and the list of influence scores.
Starting search at 8/1/05 1:21:39 PM
Prep. time used: 1171 ms
Search 10.0% completed. Networks examined: 100321. Elapsed time: 16391 ms.
Search 20.0% completed. Networks examined: 200641. Elapsed time: 32.67 s.
Search 30.0% completed. Networks examined: 300961. Elapsed time: 49.1 s.
Search 40.0% completed. Networks examined: 400141. Elapsed time: 65.17 s.
Search 50.0% completed. Networks examined: 500461. Elapsed time: 81.01 s.
Search 60.0% completed. Networks examined: 600781. Elapsed time: 96.34 s.
Search 70.1% completed. Networks examined: 701101. Elapsed time: 112.54 s.
Search 80.0% completed. Networks examined: 800281. Elapsed time: 2.13 m.
Search 90.0% completed. Networks examined: 900601. Elapsed time: 2.38 m.
Search 100.0% completed. Networks examined: 1000921. Elapsed time: 2.64 m.
-----------------------------------------------------------------------------
- Best 5 Structures
-----------------------------------------------------------------------------
Network #1, score: -15935.2860609, first found at iteration 4941
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
4 0: 0 1: 2 1 4
5 0: 0 1: 2 4 5
6 0: 0 1: 1 6
7 0: 0 1: 2 3 7
8 0: 0 1: 2 3 8
9 0: 0 1: 3 5 6 9
10 0: 0 1: 3 8 9 10
11 0: 0 1: 2 10 11
12 0: 0 1: 1 12
13 0: 0 1: 1 13
14 0: 0 1: 1 14
15 0: 0 1: 1 15
16 0: 0 1: 1 16
17 0: 0 1: 1 17
18 0: 0 1: 1 18
19 0: 0 1: 1 19
Network #2, score: -15939.3820273, first found at iteration 4561
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
4 0: 0 1: 2 1 4
5 0: 0 1: 2 4 5
6 0: 0 1: 1 6
7 0: 0 1: 2 3 7
8 0: 0 1: 2 3 8
9 0: 0 1: 2 5 9
10 0: 0 1: 3 8 9 10
11 0: 0 1: 2 10 11
12 0: 0 1: 1 12
13 0: 0 1: 1 13
14 0: 0 1: 1 14
15 0: 0 1: 1 15
16 0: 0 1: 1 16
17 0: 0 1: 1 17
18 0: 0 1: 1 18
19 0: 0 1: 1 19
Network #3, score: -15986.7728978, first found at iteration 4181
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
4 0: 0 1: 2 1 4
5 0: 0 1: 2 4 5
6 0: 0 1: 1 6
7 0: 0 1: 2 3 7
8 0: 0 1: 2 3 8
9 0: 0 1: 2 5 9
10 0: 0 1: 2 9 10
11 0: 0 1: 2 10 11
12 0: 0 1: 1 12
13 0: 0 1: 1 13
14 0: 0 1: 1 14
15 0: 0 1: 1 15
16 0: 0 1: 1 16
17 0: 0 1: 1 17
18 0: 0 1: 1 18
19 0: 0 1: 1 19
Network #4, score: -15996.6277017, first found at iteration 3801
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
4 0: 0 1: 2 1 4
5 0: 0 1: 2 4 5
6 0: 0 1: 1 6
7 0: 0 1: 2 3 7
8 0: 0 1: 2 3 8
9 0: 0 1: 2 5 9
10 0: 0 1: 1 10
11 0: 0 1: 2 10 11
12 0: 0 1: 1 12
13 0: 0 1: 1 13
14 0: 0 1: 1 14
15 0: 0 1: 1 15
16 0: 0 1: 1 16
17 0: 0 1: 1 17
18 0: 0 1: 1 18
19 0: 0 1: 1 19
Network #5, score: -16061.5421473, first found at iteration 3421
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
4 0: 0 1: 2 1 4
5 0: 0 1: 2 4 5
6 0: 0 1: 1 6
7 0: 0 1: 2 3 7
8 0: 0 1: 2 3 8
9 0: 0 1: 1 9
10 0: 0 1: 1 10
11 0: 0 1: 2 10 11
12 0: 0 1: 1 12
13 0: 0 1: 1 13
14 0: 0 1: 1 14
15 0: 0 1: 1 15
16 0: 0 1: 1 16
17 0: 0 1: 1 17
18 0: 0 1: 1 18
19 0: 0 1: 1 19
-----------------------------------------------------------------------------
- Search Statistics
-----------------------------------------------------------------------------
Statistics collected in searcher 'SearcherGreedy':
Search completed at 8/1/05 1:24:01 PM
Number of networks examined: 1000921
Total time used: 2.64 m
High score: -15935.2860609, first found at iteration 4941
Number of restarts: 175
Statistics collected in proposer 'ProposerAllLocalMoves':
Additions -- proposed: 982684
Deletions -- proposed: 18236
Reversals -- proposed: 0 (min. Markov lag = 1)
Statistics collected in cycle checker 'CycleCheckerDFS':
Additions -- no cyclicity test necessary
Deletions -- no cyclicity test necessary
Reversals -- none proposed
Statistics collected in evaluator 'EvaluatorBDe':
Scores computed: 267143
Scores (cache) placed fetched
with 0 parents: 0 0
with 1 parents: 20 17906
with 2 parents: 380 696969
with 3 parents: 236638 22836
with 4 parents: 30105 2620
with 5 parents: 0 0
Statistics collected in decider 'DeciderGreedy':
Additions -- considered: 2284, better score: 2284
Deletions -- considered: 350, better score: 0
Reversals -- considered: 0 (min. Markov lag = 1)
-----------------------------------------------------------------------------
- Post-processing DOT graphics format output
-----------------------------------------------------------------------------
digraph abstract {
label = "Banjo Version 1.0.0\n
High scoring network, score: -15935.29\n
Project: banjo dynamic example\n
User: demo\n
Data set: 20-vars-2000-temporal-observations\n
Networks searched: 1000921";
labeljust = "l";
7->0;
0->2;
1->2;
2->3;
1->4;
4->5;
3->7;
3->8;
5->9;
6->9;
8->10;
9->10;
10->11;
}
-----------------------------------------------------------------------------
- Post-processing Influence scores
-----------------------------------------------------------------------------
Influence score for (7,1) -> (0,0) -0.4377
Influence score for (0,1) -> (0,0) 0.7398
Influence score for (1,1) -> (1,0) 0.8321
Influence score for (2,1) -> (2,0) 0.7182
Influence score for (1,1) -> (2,0) -0.2764
Influence score for (0,1) -> (2,0) 0.1788
Influence score for (3,1) -> (3,0) 0.7487
Influence score for (2,1) -> (3,0) 0.4088
Influence score for (4,1) -> (4,0) 0.831
Influence score for (1,1) -> (4,0) 0.2829
Influence score for (5,1) -> (5,0) 0.7699
Influence score for (4,1) -> (5,0) -0.3771
Influence score for (6,1) -> (6,0) 0.8502
Influence score for (7,1) -> (7,0) 0.7327
Influence score for (3,1) -> (7,0) 0.4538
Influence score for (8,1) -> (8,0) 0.759
Influence score for (3,1) -> (8,0) -0.4027
Influence score for (9,1) -> (9,0) 0.693
Influence score for (6,1) -> (9,0) -0.1581
Influence score for (5,1) -> (9,0) 0.3163
Influence score for (10,1) -> (10,0) 0.6724
Influence score for (9,1) -> (10,0) 0.1913
Influence score for (8,1) -> (10,0) 0.3029
Influence score for (11,1) -> (11,0) 0.7665
Influence score for (10,1) -> (11,0) -0.4201
Influence score for (12,1) -> (12,0) 0.8469
Influence score for (13,1) -> (13,0) 0.8396
Influence score for (14,1) -> (14,0) 0.8457
Influence score for (15,1) -> (15,0) 0.8427
Influence score for (16,1) -> (16,0) 0.8625
Influence score for (17,1) -> (17,0) 0.8746
Influence score for (18,1) -> (18,0) 0.8476
Influence score for (19,1) -> (19,0) 0.8608
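The influence score lines have a regular shape, so they can be collected with a short script. The sketch below is not part of Banjo: the name parse_influence_scores is hypothetical, and reading each pair as (variable id, Markov lag) is an interpretation inferred from the network listing above rather than something stated in the report itself.

```python
import re

# Matches lines such as "Influence score for (7,1) -> (0,0) -0.4377".
# Assumption: each pair is (variable id, Markov lag).
LINE = re.compile(
    r"Influence score for \((\d+),(\d+)\) -> \((\d+),(\d+)\)\s+(-?\d+(?:\.\d+)?)"
)


def parse_influence_scores(text):
    """Return a list of ((parent_id, parent_lag), (child_id, child_lag), score)."""
    scores = []
    for match in LINE.finditer(text):
        p_id, p_lag, c_id, c_lag = (int(g) for g in match.groups()[:4])
        scores.append(((p_id, p_lag), (c_id, c_lag), float(match.group(5))))
    return scores


# Three lines copied from the report above.
report = """
Influence score for (7,1) -> (0,0) -0.4377
Influence score for (0,1) -> (0,0) 0.7398
Influence score for (1,1) -> (2,0) -0.2764
"""
scores = parse_influence_scores(report)

# Keep only the links between different variables.
cross_links = [s for s in scores if s[0][0] != s[1][0]]
for parent, child, score in cross_links:
    print(parent, "->", child, score)
```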
Explanation of the Results
The obtained high-scoring Bayesian network is supplied in the following form (the data for nodes id = 4 to id = 17 are omitted):
Network #1, score: -15935.2860609, first found at iteration 4941
20
0 0: 0 1: 2 0 7
1 0: 0 1: 1 1
2 0: 0 1: 3 0 1 2
3 0: 0 1: 2 2 3
. . . . . .
. . . . . .
. . . . . .
18 0: 0 1: 1 18
19 0: 0 1: 1 19
The first line indicates the ranking of the network (here: #1), and its associated score (here: -15935.2860609), first encountered at iteration 4941.
The second line indicates that the number of variables is 20.
Since we have a dynamic network with maximum Markov lag 1, the parents of each node are listed in a separate “block” for each Markov lag, where each block starts with the respective Markov lag and a colon (‘:’). Lines 3 to 22 (one for each of the 20 variables) thus first list the id of a variable, then the block for Markov lag 0 (here, ‘0: 0’ indicates that there is no parent of Markov lag 0), and then the block for Markov lag 1 (here, for variable id = 0, ‘1: 2 0 7’ indicates that variable id = 0 has 2 parents of Markov lag 1, namely variable id = 0 and variable id = 7).
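The block structure of each row can be read with a small loop over its tokens. The following Python sketch is illustrative only and not part of Banjo; the function names parse_dynamic_row and cross_variable_edges are hypothetical. The second function lists parent-to-child links while leaving out a variable's link to itself, which is how the edges in the dot output above appear to be chosen.

```python
def parse_dynamic_row(row):
    """Parse a row such as '0 0: 0 1: 2 0 7' into (node_id, {lag: [parent_ids]})."""
    tokens = row.split()
    node_id = int(tokens[0])
    parents_by_lag = {}
    i = 1
    while i < len(tokens):
        lag = int(tokens[i].rstrip(":"))  # block header, e.g. '1:'
        count = int(tokens[i + 1])        # number of parents in this block
        parents_by_lag[lag] = [int(t) for t in tokens[i + 2 : i + 2 + count]]
        i += 2 + count
    return node_id, parents_by_lag


def cross_variable_edges(rows):
    """List 'parent->child' edges, leaving out a variable's link to itself."""
    edges = []
    for row in rows:
        node_id, parents_by_lag = parse_dynamic_row(row)
        for lag in sorted(parents_by_lag):
            for parent in parents_by_lag[lag]:
                if parent != node_id:
                    edges.append(f"{parent}->{node_id}")
    return edges


# The first four node rows of network #1.
rows = ["0 0: 0 1: 2 0 7", "1 0: 0 1: 1 1", "2 0: 0 1: 3 0 1 2", "3 0: 0 1: 2 2 3"]
first_row = parse_dynamic_row(rows[0])
edges = cross_variable_edges(rows)
print(first_row)  # (0, {0: [], 1: [0, 7]})
print(edges)      # ['7->0', '0->2', '1->2', '2->3']
```

For these four rows, the resulting edge list matches the first four edges of the dot output shown above.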
[Figure: graphical representation of the network, obtained using dot]