version 3.22

Interpret SOP/DNF expressions: compute, simplify, expand

Description

These functions interpret an expression written in sum of products (SOP) or in canonical disjunctive normal form (DNF), for both crisp and multivalue QCA. The function compute() calculates set membership scores based on a SOP expression applied to a calibrated data set.
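As a rough illustration of what such a computation involves, the base R sketch below applies the usual fuzzy-set operators (minimum for intersection, maximum for union, 1 - x for negation) to a small made-up data frame. The scores are hypothetical and are not taken from the LF data; this is not the package's own code.

```r
# Hypothetical calibrated scores for three cases (not the LF data)
dat <- data.frame(
  DEV = c(0.8, 0.3, 0.6),
  IND = c(0.1, 0.7, 0.9),
  URB = c(0.2, 0.9, 0.4),
  STB = c(0.5, 0.6, 0.1)
)

# Negation is 1 - x, intersection is pmin(), union is pmax()
path1 <- pmin(dat$DEV, 1 - dat$IND)   # DEV*~IND
path2 <- pmin(dat$URB, dat$STB)       # URB*STB
score <- pmax(path1, path2)           # DEV*~IND + URB*STB

print(data.frame(path1, path2, score))
```

The same pattern is visible in the examples below, where each overall score equals the larger of the two path scores returned with separate = TRUE.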

A function similar to compute() was initially written by Lewandowski (2015), but the code in these functions has since been completely rewritten and extended with additional functionality (see Details and Examples below).

The function simplify() transforms a SOP expression into a simpler equivalent through a process of Boolean minimization.

The function expand() performs a Quine expansion to the complete DNF, or a partial expansion to a SOP expression with equally complex terms.

Usage

compute(expression = "", data, separate = FALSE)
simplify(expression = "", snames = "", noflevels, ...)
expand(expression = "", snames = "", noflevels, partial = FALSE, implicants = FALSE, ...)

Arguments

expression String: a QCA expression written in sum of products form.
snames A string containing the sets' names, separated by commas.
noflevels Numerical vector containing the number of levels for each set.
data A dataset containing binary crisp-set (cs), multivalue (mv) and fuzzy-set (fs) data.
separate Logical, perform computations on individual, separate paths.
partial Logical, perform a partial Quine expansion.
implicants Logical, return an expanded matrix in the implicants space.
... Other arguments, mainly for backwards compatibility.

Details

An expression written in sum of products (SOP) form is a "union of intersections", for example A*B + B*~C. The disjunctive normal form (DNF) is also a sum of products, with the restriction that each product must contain all literals. The equivalent DNF expression is: A*B*~C + A*B*C + ~A*B*~C
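The relation between the two forms can be checked with a truth table. The base R sketch below, which is only an illustration and not how expand() is implemented, keeps the rows where A*B + B*~C is true and writes each such row as a product containing all three literals.

```r
# Truth table for three crisp conditions, ordered 000, 001, ..., 111
tt <- expand.grid(C = 0:1, B = 0:1, A = 0:1)[, c("A", "B", "C")]

# Rows where the SOP expression A*B + B*~C is true
keep <- with(tt, (A & B) | (B & !C))

# Write each remaining row as a product of all three literals
terms <- apply(tt[keep, ], 1, function(r) {
  paste0(ifelse(r == 1, "", "~"), names(r), collapse = "*")
})

dnf <- paste(terms, collapse = " + ")
print(dnf)
```

The three products obtained this way are the same as those listed above, in a different order.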

The same expression can be written in multivalue notation: A{1}*B{1} + B{1}*C{0}.

Expressions can contain multiple values for the same condition, separated by a comma. If B was a multivalue causal condition, an expression could be: A{1} + B{1,2}*C{0}.

Whether crisp or multivalue, expressions are treated as Boolean. In this last example, all values in B equal to either 1 or 2 will be converted to 1, and the rest of the (multi)values will be converted to 0.
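A minimal base R sketch of this conversion, using a made-up data frame in which B has three levels, could look as follows. It only mirrors the Boolean treatment described above and is not the package's code.

```r
# Hypothetical multivalue data: B has three levels (0, 1, 2)
dat <- data.frame(
  A = c(1, 0, 0, 0),
  B = c(0, 1, 2, 2),
  C = c(0, 0, 1, 0)
)

# B{1,2} is TRUE where B equals 1 or 2, FALSE otherwise
b12 <- dat$B %in% c(1, 2)

# A{1} + B{1,2}*C{0}
result <- as.numeric(dat$A == 1 | (b12 & dat$C == 0))
print(result)
```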

Negating a multivalue condition requires a known number of levels (see examples below). Improvements from version 2.5 allow for intersections between multiple levels of the same condition. For a causal condition with 3 levels (0, 1 and 2), the expression ~A{0,2}*A{1,2} is equivalent to A{1}, while A{0}*A{1} results in the empty set.
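The two results can be reproduced with plain set operations on the levels. The base R sketch below is an illustration under the stated assumption of three levels, not the package's internal algorithm.

```r
levels_A <- 0:2                          # a condition with 3 levels

# ~A{0,2}: the levels that remain after removing 0 and 2
negated <- setdiff(levels_A, c(0, 2))

# ~A{0,2}*A{1,2}: levels common to both factors
both <- intersect(negated, c(1, 2))
print(both)

# A{0}*A{1}: no level satisfies both factors
empty <- intersect(0, 1)
print(length(empty))
```

Without the number of levels, the set difference in the negation step could not be computed, which is why noflevels (or a dataset) is needed.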

The number of levels, as well as the set names, can be automatically detected from a dataset via the argument data. When specified, the arguments snames and noflevels take precedence over data.
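One plausible way to derive these two arguments from a dataset is sketched below in base R, assuming crisp or multivalue conditions coded as consecutive integers starting at 0. The data frame is made up, and the package may well detect names and levels differently.

```r
# Hypothetical crisp and multivalue data (B has levels 0, 1, 2)
dat <- data.frame(
  A = c(0, 1, 1, 0),
  B = c(0, 2, 1, 2),
  C = c(1, 0, 0, 1)
)

# Set names as a single comma-separated string
snames <- paste(names(dat), collapse = ", ")

# Number of levels, assuming values run from 0 to the column maximum
noflevels <- sapply(dat, function(x) max(x) + 1)

print(snames)
print(noflevels)
```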

The product operator * should always be used, but it can be omitted when the data is multivalue (where product terms are separated by curly brackets), and/or when the set names are single letters (for example AD + B~C), and/or when the set names are provided via the argument snames.
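For the single-letter case, the omitted operator can be restored with a simple tokenization. The base R sketch below is a hypothetical illustration that handles only single-letter set names with an optional tilde, and it is not the parser used by the package.

```r
expr <- "AD + B~C"

# Split the sum into product terms
terms <- strsplit(expr, "+", fixed = TRUE)[[1]]

# A literal is an optional tilde followed by exactly one letter
literals <- regmatches(terms, gregexpr("~?[A-Za-z]", terms))

# Rebuild the expression with explicit product operators
explicit <- paste(sapply(literals, paste, collapse = "*"), collapse = " + ")
print(explicit)
```

With longer set names this kind of splitting is ambiguous, which is why the names then have to be supplied via snames.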

When expressions are simplified, their simplest equivalent can result in the empty set, if the conditions cancel each other out.
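A truth table makes this visible. In the base R sketch below, which is only a check and not the minimization performed by simplify(), the illustrative expression A*~B*(~A + B) is false in every row, so it denotes the empty set, whereas (A + B)(A + ~B) has the same truth values as A.

```r
# All combinations of two crisp conditions
tt <- expand.grid(A = c(FALSE, TRUE), B = c(FALSE, TRUE))

# A*~B intersected with (~A + B): the two factors contradict each other
contradiction <- with(tt, (A & !B) & (!A | B))
is_empty <- !any(contradiction)
print(is_empty)

# (A + B)(A + ~B) agrees with A on every row
same_as_A <- identical(with(tt, (A | B) & (A | !B)), tt$A)
print(same_as_A)
```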

Value

For the function compute(), a vector of set membership values. For the function simplify(), a character expression.

References

Ragin, C.C. (1987) The Comparative Method: Moving beyond Qualitative and Quantitative Strategies. Berkeley: University of California Press.

Lewandowski, J. (2015) QCAtools: Helper functions for QCA in R. R package version 0.1.

Examples

# for compute()
compute("DEV*~IND + URB*STB", data = LF)
 [1] 0.27 0.89 0.91 0.16 0.58 0.19 0.31 0.09 0.13 0.72 0.34 0.99 0.02 0.01 0.03
[16] 0.20 0.33 0.98

compute("DEV*~IND + URB*STB", data = LF, separate = TRUE)
   DEV*~IND URB*STB
1      0.27    0.12
2      0.00    0.89
3      0.10    0.91
4      0.16    0.07
5      0.58    0.03
6      0.19    0.03
7      0.04    0.31
8      0.04    0.09
9      0.07    0.13
10     0.72    0.05
11     0.34    0.10
12     0.06    0.99
13     0.02    0.00
14     0.01    0.01
15     0.01    0.03
16     0.03    0.20
17     0.33    0.13
18     0.00    0.98

# for simplify()
simplify("(A + B)(A + ~B)")
S1: A

# to force a certain order of the set names
simplify("(URB + LIT*~DEV)(~LIT + ~DEV)", snames = "DEV, URB, LIT")
S1: ~DEV*LIT + URB*~LIT

# multilevel conditions can also be specified (and negated)
simplify("(A{1} + ~B{0})(B{1} + C{0})", snames = "A, B, C", noflevels = c(2, 3, 2))
S1: B{1} + A{1}C{0} + B{2}C{0}

# Ragin's (1987) book presents the equation E = SG + LW as the result
# of the Boolean minimization for the ethnic political mobilization.

# intersecting the reactive ethnicity perspective (R = ~L~W)
# with the equation E (page 144)
simplify("~L~W(SG + LW)", snames = "S, L, W, G")
S1: S~L~WG

# resources for size and wealth (C = SW) with E (page 145)
simplify("SW(SG + LW)", snames = "S, L, W, G")
S1: SLW + SWG

# and factorized
factorize(simplify("SW(SG + LW)", snames = "S, L, W, G"))
F1: SW(G + L)

# developmental perspective (D = L~G) and E (page 146)
simplify("L~G(SG + LW)", snames = "S, L, W, G")
S1: LW~G

# subnations that exhibit ethnic political mobilization (E) but were
# not hypothesized by any of the three theories (page 147)
# ~H = ~(~L~W + SW + L~G) = GL~S + GL~W + G~SW + ~L~SW
simplify("(GL~S + GL~W + G~SW + ~L~SW)(SG + LW)", snames = "S, L, W, G")
S1: ~SLWG + SL~WG

Author

Adrian Dusa