Emping 0.6 is a prototype tool for discovering and analysing predictive relations in nominal data. Its use is best explained by example.
The source of the example is the well-known data set from J.R. Quinlan, shown in the screenshot from an Open Office Calc spreadsheet. All data and all results in Emping, except for the graphs, are in the .csv (comma-separated values) format, which OO Calc can read and save.
When Emping is started (for installation, see the end) you can load a table like the one above with 'Open'. The first row should contain the attribute names, and each column should contain the values of the attribute at its top. You can use numbers, but they are treated like any other name, so a value of 1 is no different from a value of A. Emping automatically loads the names in the table into its menu.
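Because Emping is written in Haskell, the table format just described can be sketched in Haskell as well. This is a minimal illustration, not code from Emping: the names `splitCommas` and `parseTable` are hypothetical, and the splitter assumes that cells contain no commas, quotes or line breaks.

```haskell
-- Sketch: read a table such as the one Emping loads. The first row holds
-- attribute names; every later row holds one value per attribute.
-- Assumption: fields contain no embedded commas, quotes or line breaks.

-- | Split one line on commas, keeping empty fields.
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitCommas rest

-- | Parse .csv text into (attribute names, data rows). Carriage returns are
-- dropped and blank lines skipped. Every cell stays a String, so "1" is
-- treated exactly like "A".
parseTable :: String -> ([String], [[String]])
parseTable txt = case filter (not . null) (lines (filter (/= '\r') txt)) of
  []           -> ([], [])
  (hdr : rows) -> (splitCommas hdr, map splitCommas rows)
```

Loaded in GHCi, `parseTable "Type,Fishing\n1,good\nA,bad"` keeps the cell `"1"` as a string, just like `"A"`.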
The next steps are shown by the greyed-out items in the toolbar, which become active one by one. Once the table has been loaded, the 'Select' function becomes active, and you can select one of the attributes as the consequent of the rules. In this case we select 'Fishing'.
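Selecting a consequent can be pictured as turning every row into a rule: the value of the selected attribute becomes the prediction, and the remaining attribute-value pairs become the antecedent. The sketch below assumes this reading. `Rule`, `rowsToRules` and the field names are hypothetical, and the antecedent is stored as a `Data.Set`, in line with the set-based representation mentioned in the version notes.

```haskell
import qualified Data.Set as Set

type Attribute = String
type Value = String

-- | A rule: a set of (attribute, value) conditions and the predicted value
-- of the selected consequent attribute. (Hypothetical representation.)
data Rule = Rule
  { antecedent :: Set.Set (Attribute, Value)
  , consequent :: Value
  } deriving (Eq, Ord, Show)

-- | Turn each data row into a rule for the selected consequent attribute.
-- Returns Nothing when the attribute is not in the header.
rowsToRules :: Attribute -> [Attribute] -> [[Value]] -> Maybe [Rule]
rowsToRules target header rows
  | target `notElem` header = Nothing
  | otherwise =
      Just
        [ Rule (Set.fromList [p | p@(a, _) <- pairs, a /= target]) c
        | row <- rows
        , let pairs = zip header row
        , Just c <- [lookup target pairs] -- skip rows too short to hold a value
        ]
```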
Once this is done, the 'Reduce All' function becomes active. You can now save the result to a file (a .csv extension is appended to your file name automatically). The reduction for this example looks like this in OO Calc.
The result of the reduction is a list of shortest rules which predict the values of the selected consequent. These reduced rules are partially ordered by entailment. When the reduction has been completed, the 'Choose' function becomes active, and you can choose one of the values of the attribute, in this case 'good' or 'bad'. After that, 'Abduce' becomes active.
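The idea of a reduction can be pictured with a brute-force sketch. This is not Emping's algorithm, which the text does not describe. It assumes that a "shortest rule" is a smallest subset of a row's conditions that no row with a different consequent also satisfies. All names are hypothetical, and the search is exponential in the number of attributes, so it only suits toy tables.

```haskell
import Data.List (nub, subsequences)
import qualified Data.Set as Set

type Pair = (String, String)        -- (attribute, value)
type Rule = (Set.Set Pair, String)  -- (antecedent, consequent value)

-- | A candidate antecedent predicts consequent c consistently if no rule with
-- a different consequent contains all of its conditions.
consistent :: [Rule] -> String -> Set.Set Pair -> Bool
consistent rules c cand =
  not (any (\(ant, c') -> c' /= c && cand `Set.isSubsetOf` ant) rules)

-- | All consistent sub-antecedents of minimal size for one rule. An empty
-- result means even the full antecedent clashes (an ambiguous rule).
reduceRule :: [Rule] -> Rule -> [Rule]
reduceRule rules (ant, c) =
  case dropWhile null bySize of
    (best : _) -> [(s, c) | s <- best]
    []         -> []
  where
    candidates = map Set.fromList (subsequences (Set.toList ant))
    bySize =
      [ filter (\s -> Set.size s == k && consistent rules c s) candidates
      | k <- [0 .. Set.size ant]
      ]

-- | Reduce every rule and remove duplicate results.
reduceAll :: [Rule] -> [Rule]
reduceAll rules = nub (concatMap (reduceRule rules) rules)
```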
'Abduce' produces two files: a directed graph of all the entailments in .dot format and a legend in .csv format (the extensions are again appended automatically). The .dot files can be viewed with dotty or another GraphViz viewer, as shown next (the screenshot has been zoomed out and resized).
This graph shows all entailments between the rules that imply good fishing. The first number is the node number, as shown in the legend below; the second is the number of equivalents, which here is one in every case. If x implies y and y also implies x, the rules are equivalent (or equal, for short).
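The equivalence test and the count of equivalents follow directly from that definition. In this hypothetical sketch the entailment relation is passed in as a function `implies`. It assumes the relation is reflexive, so that a rule with no other equivalents gets a count of one, as in the graph.

```haskell
-- | Two rules are equivalent when each implies the other.
equivalent :: (a -> a -> Bool) -> a -> a -> Bool
equivalent implies x y = implies x y && implies y x

-- | For each rule, the number of rules equivalent to it, counting itself.
-- Assumption: `implies` is reflexive, so an isolated rule gets count 1.
equivalentCounts :: (a -> a -> Bool) -> [a] -> [(a, Int)]
equivalentCounts implies rules =
  [(x, length [y | y <- rules, equivalent implies x y]) | x <- rules]
```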
Note that the direction of graphs is from least general to most general, and from top to bottom. The menu icons also indicate this direction.
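A .dot file of this kind is plain text. The hypothetical `toDot` below writes one labelled node per rule ("rule number, number of equivalents") and one edge per entailment; Emping's actual output format may differ. GraphViz lays out a digraph top to bottom by default (rankdir=TB), so edges from less general to more general rules point downwards.

```haskell
-- | Render nodes (rule number, number of equivalents) and entailment edges
-- (x, y), read as "rule x implies rule y", as GraphViz DOT text. Listing the
-- nodes separately keeps rules without any entailment in the drawing.
toDot :: [(Int, Int)] -> [(Int, Int)] -> String
toDot nodes edges =
  unlines (["digraph entailments {"] ++ map node nodes ++ map edge edges ++ ["}"])
  where
    -- label each node "rule number, number of equivalents"
    node (n, k) = "  " ++ show n ++ " [label=\"" ++ show n ++ ", " ++ show k ++ "\"];"
    edge (x, y) = "  " ++ show x ++ " -> " ++ show y ++ ";"
```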
You may be especially interested in the most general rules, which do not imply any others, or the least general rules, which are not implied by any others. You can extract either set with the 'M/L General' item in the 'Abduce' menu; which of the two you get is determined by the 'Most General' checkbox (checked by default).
The most general rules for good fishing are:
and the least general rules are:
Note that rule number 8, the unconnected one in the graph, appears twice. It is both most and least general.
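The two selections can be expressed with the same hypothetical `implies` function. A rule is most general if it implies no rule outside its own equivalence class, and least general if no rule outside its class implies it; an unconnected rule, like rule 8, passes both tests. This is a sketch of the definitions given above, not Emping's implementation.

```haskell
-- | Most general: implies no rule outside its own equivalence class.
-- `implies x y` means "rule x implies rule y" (hypothetical interface).
mostGeneral :: (a -> a -> Bool) -> [a] -> [a]
mostGeneral implies rules =
  [x | x <- rules, not (any (\y -> implies x y && not (implies y x)) rules)]

-- | Least general: implied by no rule outside its own equivalence class.
leastGeneral :: (a -> a -> Bool) -> [a] -> [a]
leastGeneral implies rules =
  [x | x <- rules, not (any (\y -> implies y x && not (implies x y)) rules)]
```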
Of course, you can also get the relations between the rules for Fishing: bad by choosing that attribute value.
The choice of 'Fishing' as the consequent attribute is natural for a human, but technically any attribute can be selected as the consequent.
The 'New Rules' item in the 'Rule' menu lets you reset the selection and then select another attribute, for example 'Temperature'. The 'Rule' menu also has a 'Check Rule' checkbox, which is checked by default; its purpose is demonstrated here.
It appears that for this consequent attribute there are ambiguous rules: rules which have the same antecedent but different consequents. These cancel each other out, and if all rules for a value are ambiguous, there will be no reductions for that value. For Temperature, however, several shortest rules remain for each of its three values.
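Removing ambiguous rules can be sketched by collecting, for each antecedent, the set of consequent values it occurs with, and keeping only rules whose antecedent has exactly one. The names and the tuple representation are hypothetical; Emping may detect and report ambiguity differently.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Rule = (Set.Set (String, String), String) -- (antecedent, consequent)

-- | Keep only rules whose antecedent occurs with exactly one consequent value.
dropAmbiguous :: [Rule] -> [Rule]
dropAmbiguous rules =
  [r | r@(ant, _) <- rules, Map.findWithDefault 0 ant consequentCounts == 1]
  where
    -- number of distinct consequent values seen with each antecedent
    consequentCounts =
      Map.map Set.size (Map.fromListWith Set.union [(ant, Set.singleton c) | (ant, c) <- rules])
```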
You can do an abduction for any of the three attribute values 'hot', 'mild' and 'cool'. It appears there are no entailments for 'mild', and therefore there is no graph.
There may still be equivalences, however, and that turns out to be the case here, as applying 'M/L General' from the 'Abduce' menu shows.
The 5 rules from the reduced table can be grouped into 3 clusters of equivalent rules.
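Grouping rules into clusters of equivalents is a partition by mutual implication. The hypothetical `clusters` below builds each cluster around its first remaining member; this is only correct if equivalence is transitive, which holds when `implies` itself is transitive.

```haskell
import Data.List (partition)

-- | Group rules into clusters of mutually implying (equivalent) rules.
-- Each cluster is formed around its first member; this assumes equivalence
-- is transitive, as it is when `implies` is a transitive relation.
clusters :: (a -> a -> Bool) -> [a] -> [[a]]
clusters _ [] = []
clusters implies (x : xs) = (x : same) : clusters implies rest
  where
    (same, rest) = partition (\y -> implies x y && implies y x) xs
```

With a toy relation in which rules 1 and 2 imply each other and rules 3 and 4 imply each other, `clusters` splits five rules into three clusters: `[[1,2],[3,4],[5]]`.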
You can load a new data source with 'New Table' from the 'File' menu. This menu also has a 'Table Check' box, which is off by default. It lets you check the data source for duplicate rows, which don't change the results, but do slow things down. If checked, duplicates are automatically removed from the data (but not from your source file). You can also save any duplicates, with their frequencies of occurrence, in a separate file.
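Finding duplicate rows and their frequencies amounts to counting rows in a map. Here is a sketch with the `containers` library, where `rowFrequencies`, `duplicates` and `dedupe` are hypothetical names; Emping's own 'Table Check' may work differently.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | How often each row occurs in the table.
rowFrequencies :: Ord r => [r] -> Map.Map r Int
rowFrequencies rows = Map.fromListWith (+) [(r, 1) | r <- rows]

-- | Rows that occur more than once, with their frequencies.
duplicates :: Ord r => [r] -> [(r, Int)]
duplicates = filter ((> 1) . snd) . Map.toList . rowFrequencies

-- | The rows with duplicates removed, keeping first occurrences in order.
dedupe :: Ord r => [r] -> [r]
dedupe = go Set.empty
  where
    go _ [] = []
    go seen (r : rs)
      | r `Set.member` seen = go seen rs
      | otherwise = r : go (Set.insert r seen) rs
```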
The above example runs in a few seconds, but depending on the number of rows, and in particular the number of values of the consequent attribute, calculations can take minutes or longer. You cannot skip any of the steps described, even if you only want the abduction for a single attribute value, but you can cancel saving results. Because Haskell, the language Emping is written in, evaluates results only when they are needed, calculations for values you do not need are skipped automatically. So not saving what you don't want will generally speed things up. Feedback to the user is limited, however, and until the next toolbar item becomes active again the program may seem to hang.
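The effect of lazy evaluation can be illustrated with `Data.Map.Lazy`, whose values stay unevaluated until they are demanded. In this hypothetical sketch a per-value result that is never looked up is never computed, even if computing it would fail. It shows the general principle only, not Emping's internal structure.

```haskell
import qualified Data.Map.Lazy as Map

-- | Map each consequent value to its (possibly expensive) result.
-- Data.Map.Lazy stores each result as an unevaluated thunk, so a result is
-- computed only when it is actually demanded, for example when it is saved.
resultsPerValue :: [String] -> (String -> r) -> Map.Map String r
resultsPerValue values compute = Map.fromList [(v, compute v) | v <- values]
```

Looking up "good" in a map built with a `compute` function that calls `error` for "bad" succeeds, because the "bad" entry is never forced.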
Emping 0.6 is written in Haskell with GHC 6.8.2 and Gtk2Hs 0.9.12.1 on Fedora Core 8 Linux. The compiled program may also run on other GNOME-based Linux systems. It can be compiled from the source modules (unpacked from a tar.gz file) on the command line with ghc --make -O Main.hs -o yourfilename. Although Emping is a standalone application rather than a library, it can also be installed from a Cabal package, the Haskell standard.
To handle the comma-separated files you need the Open Office Calc spreadsheet or another program that can read the .csv format. Everything in Emping is just a text string, but OO Calc writes labels between quotes and numbers as they are. To view the .dot graphs you need dotty or another GraphViz viewer, for example ZRGViewer.
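When a spreadsheet writes text cells between double quotes, the quotes have to be removed before cells can be compared as plain names. Here is a minimal sketch that follows the common CSV convention of writing a quote inside a quoted cell twice; `unquote` and `undouble` are hypothetical names.

```haskell
-- | Strip the surrounding double quotes from a quoted cell and turn each
-- doubled quote inside it back into a single one. Unquoted cells, such as
-- numbers, are returned unchanged.
unquote :: String -> String
unquote ('"' : rest)
  | not (null rest) && last rest == '"' = undouble (init rest)
unquote cell = cell

-- | Replace every "" with a single ".
undouble :: String -> String
undouble ('"' : '"' : xs) = '"' : undouble xs
undouble (x : xs) = x : undouble xs
undouble [] = []
```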
Emping Version 0.6 has been completely reviewed and optimized in many ways. The most important difference is the implementation of rule antecedents as Haskell sets instead of lists. This did not have much effect on the reductions, but did speed up the abductions considerably.
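Why sets can help is easiest to see on one operation a rule-based tool plausibly needs: checking whether all of a rule's conditions occur in a row. The sketch contrasts a list-based and a set-based version under that assumption; it is an illustration, not Emping's code, and `coversList` and `coversSet` are hypothetical names.

```haskell
import qualified Data.Set as Set

type Pair = (String, String) -- (attribute, value)

-- | List version: every `elem` walks the row's whole list of conditions.
coversList :: [Pair] -> [Pair] -> Bool
coversList ant row = all (`elem` row) ant

-- | Set version: Set.isSubsetOf works on the ordered trees of both sets
-- instead of scanning a list once per condition.
coversSet :: Set.Set Pair -> Set.Set Pair -> Bool
coversSet ant row = ant `Set.isSubsetOf` row
```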
Building on experience with previous versions, Emping 0.6 has been redesigned to be a usable tool. Its purpose is to help discover and analyse predictive relationships in real nominal data, in a practical and timely manner. However, it is still experimental and relatively untested, as its version number below one indicates.
In newer versions of Gtk2Hs the function comboBoxGetActive returns an Int rather than a Maybe Int. This causes a build error at line 214 of Main.hs. The function is first used on line 209 and again on line 266 of module Main. Changing the source code to handle an Int instead of a Maybe Int should be trivial, but unfortunately I only have access to GHC 6.8 and Gtk2Hs 0.9.12, so I cannot test this fix, or check whether the API has changed for other Gtk2Hs functions too. The modules which do not use the GUI appear to build with ghc-6.10.3 as well as 6.8.2.
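A minimal, untested sketch of such a fix is to wrap the new Int result back into the Maybe Int the existing code expects. It assumes that, as with the underlying GTK function gtk_combo_box_get_active, -1 means that no item is active. `activeIndex` is a hypothetical helper, and the call site in the comment is illustrative only.

```haskell
-- | Newer Gtk2Hs versions return a plain Int from comboBoxGetActive.
-- Assumption: -1 (any negative value) means "no active item", as in GTK.
-- This converts the result back to the Maybe Int that Emping 0.6 expects.
activeIndex :: Int -> Maybe Int
activeIndex i
  | i < 0 = Nothing
  | otherwise = Just i

-- Illustrative call site (untested; requires Gtk2Hs, `combo` is hypothetical):
--   active <- fmap activeIndex (comboBoxGetActive combo)
```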
The program hangs in two corner cases. The first is when the consequent attribute has only one value (reduction makes no sense in that case). The second is when you try to do an abduction on a value for which all rules are ambiguous: although the notification is correct, it is not possible to continue with another attribute value. A workaround is to check the reduction spreadsheet for missing attribute values first (e.g. with the AutoFilter tool).
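Both corner cases can be checked before reducing or abducing. In this sketch with hypothetical names, `worthReducing` tests whether the consequent column has more than one distinct value, and `missingValues` lists the values that occur in the data but not among the consequents of the reduction output, which is what the AutoFilter check looks for.

```haskell
import Data.List (nub, (\\))

-- | Reduction makes no sense unless the consequent has at least two values.
worthReducing :: [String] -> Bool
worthReducing consequentColumn = length (nub consequentColumn) > 1

-- | Consequent values present in the data but absent from the reduction
-- output; abducing on one of these is the second corner case.
missingValues :: [String] -> [String] -> [String]
missingValues consequentColumn reducedConsequents =
  nub consequentColumn \\ nub reducedConsequents
```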
Emping is open source and licensed under the General Public Licence (GPL). The Haskell libraries it uses are licensed by their respective authors as listed in the GHC documentation.
© 2006-2009 Hans van Thiel
Last updated June 25, 2009