version 3.22
Functions to find a list of implicants that satisfy some restrictions (see details), or to find the corresponding row numbers in the implicant matrix, for all subsets, or supersets, of a (prime) implicant or an initial causal configuration.
superSubset(data, outcome = "", conditions = "", relation = "necessity",
    incl.cut = 1, cov.cut = 0, ron.cut = 0, pri.cut = 0, depth = NULL,
    use.letters = FALSE, use.labels = FALSE, add = NULL, ...)

findSubsets(input, noflevels = NULL, stop = NULL, ...)

findSupersets(input, noflevels = NULL, ...)
data | A data frame with crisp (binary and multi-value) or fuzzy causal conditions.
outcome | The name of the outcome.
conditions | A string containing the conditions' names, separated by commas.
relation | The set relation to outcome, either "necessity", "sufficiency", "necsuf" or "sufnec". Partial words like "suf" are accepted.
incl.cut | The minimal inclusion score of the set relation.
cov.cut | The minimal coverage score of the set relation.
ron.cut | The minimal score for the RoN (relevance of necessity).
pri.cut | The minimal score for the PRI (proportional reduction in inconsistency).
use.letters | Logical, use simple letters instead of original conditions' names.
use.labels | Logical, use category labels if present.
noflevels | A vector containing the number of levels for each causal condition, plus 1 (all subsets are located in the higher-dimensional implicant matrix).
input | A vector of row numbers where the (prime) implicants are located, or a matrix of configurations (only for supersets).
stop | The maximum row number (subset) to stop at; subsets up to and including that row are returned.
depth | Integer, an upper number of causal conditions to form expressions with.
add | A function, or a list containing functions, to add more parameters of fit.
... | Other arguments, mainly for backward compatibility.
The function superSubset() finds a list of implicants that satisfy some restrictions referring to the inclusion and coverage with respect to the outcome, under given assumptions of necessity and/or sufficiency.
Ragin (2000) posits that under the necessity relation, instances of the outcome constitute a subset of the instances of the cause(s). Conversely, under the sufficiency relation, instances of the outcome constitute a superset of the instances of the cause(s).
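The subset relations above translate directly into the usual inclusion scores. The following is a minimal sketch in base R, assuming hypothetical membership scores `x` (cause) and `y` (outcome) that are not taken from any dataset in this page; it does not call superSubset() itself.

```r
# Hypothetical fuzzy membership scores for five cases (illustration only)
x <- c(0.9, 0.8, 0.6, 0.7, 1.0)   # membership in the cause
y <- c(0.8, 0.6, 0.5, 0.7, 0.9)   # membership in the outcome

# Size of the intersection between cause and outcome
overlap <- sum(pmin(x, y))

# Necessity: the outcome is a subset of the cause,
# so the overlap is measured against the outcome
incl_nec <- overlap / sum(y)

# Sufficiency: the cause is a subset of the outcome,
# so the overlap is measured against the cause
incl_suf <- overlap / sum(x)

# Coverage swaps the denominators: necessity coverage uses sum(x),
# sufficiency coverage uses sum(y)
cov_nec <- overlap / sum(x)

print(c(incl_nec = incl_nec, incl_suf = incl_suf, cov_nec = cov_nec))
```

In this made-up example every `y` score is at most equal to the matching `x` score, so the outcome is fully included in the cause and the necessity inclusion is 1, while the sufficiency inclusion is lower.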
When relation = "necessity", the function finds all implicants which are supersets of the outcome, then eliminates the redundant ones and returns the surviving (minimal) supersets, provided they pass the inclusion and coverage thresholds. If none of the surviving supersets pass these thresholds, the function will find disjunctions of causal conditions, instead of conjunctions.
When relation = "sufficiency", it finds all implicants which are subsets of the outcome, and similarly eliminates the redundant ones and returns the surviving (minimal) subsets.
When relation = "necsuf", the relation is interpreted as necessity, and cov.cut is automatically set equal to the inclusion cutoff incl.cut. The same automatic equality is made for relation = "sufnec", when the relation is interpreted as sufficiency.
The argument outcome specifies the name of the outcome and, if the outcome is multi-value, it can also specify the level to explain, using curly brackets notation. Outcomes can be negated using a tilde operator, as in ~X. The logical argument neg.out is now deprecated, but still backwards compatible. Replaced by the tilde in front of the outcome name, it controls whether the outcome or its negation is to be explained. If outcome is from a multivalent variable, negating it has the effect that the disjunction of all remaining values becomes the new outcome to be explained. neg.out = TRUE and a tilde ~ in the outcome name don't cancel each other out: either one (or even both) signals that the outcome should be negated.
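To make the notation concrete, here is a small base R sketch of how an outcome string with a tilde and a curly-bracket level could be interpreted. The helper `parse_outcome()` and the vector `surv` are hypothetical names introduced for illustration; this is not the parsing code used by the package.

```r
# Hypothetical helper, NOT part of the QCA package: splits an outcome
# string into its name, its (optional) level and a negation flag
parse_outcome <- function(outcome) {
  outcome <- gsub("[[:space:]]", "", outcome)
  negated <- startsWith(outcome, "~")
  outcome <- sub("^~", "", outcome)
  has_level <- grepl("[{][^}]*[}]$", outcome)
  level <- if (has_level) sub("^.*[{]([^}]*)[}]$", "\\1", outcome) else NA_character_
  list(name = sub("[{][^}]*[}]$", "", outcome), level = level, negated = negated)
}

# Hypothetical multi-value outcome with levels 0, 1 and 2
surv <- c(0, 1, 2, 1)

spec <- parse_outcome("~SURV{1}")

# Crisp membership in the requested level
member <- as.numeric(surv == as.numeric(spec$level))

# Negation: the disjunction of all remaining levels becomes the outcome
if (spec$negated) member <- 1 - member

print(spec)
print(member)
```

For a fuzzy outcome without a level, the same negation step (`1 - member`) would apply directly to the membership scores.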
If the argument conditions is not specified, all other columns in data are used.
Along with the standard measures of inclusion and coverage, the function also returns PRI for sufficiency and RoN (relevance of necessity, see Schneider & Wagemann, 2012) for the necessity relation.
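As an illustration of these two additional parameters of fit, the sketch below computes RoN and PRI in base R using the formulas commonly given in the literature (Schneider & Wagemann, 2012). The vectors `x` and `y` are hypothetical membership scores, and the code is an assumption about how the measures are defined, not an extract from the package.

```r
# Hypothetical fuzzy membership scores (illustration only)
x <- c(0.9, 0.8, 0.6, 0.7, 1.0)   # cause
y <- c(0.8, 0.6, 0.5, 0.7, 0.9)   # outcome

# RoN, relevance of necessity: low values flag trivially large causes
ron <- sum(1 - x) / sum(1 - pmin(x, y))

# PRI, proportional reduction in inconsistency: discounts the part of
# the overlap that is shared with the negation of the outcome
xy   <- sum(pmin(x, y))
xyny <- sum(pmin(x, y, 1 - y))
pri  <- (xy - xyny) / (sum(x) - xyny)

print(round(c(RoN = ron, PRI = pri), 3))
```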
A subset is a conjunction (an intersection) of causal conditions, with respect to a larger (super)set, which is another (but more parsimonious) conjunction of causal conditions.
All subsets of a given set can be found in the so-called implicant matrix, which is an $n^k$ space, understood as all possible combinations of values in any combination of bases $n$, each causal condition having three or more levels (Dusa, 2007, 2010).
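For every two levels of a binary causal condition (values 0 and 1), there are three levels in the implicant matrix:
0 | marks a minimized literal (shown as "-" in the normal notation)
1 | replaces the value 0 of the original binary condition
2 | replaces the value 1 of the original binary condition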
A prime implicant is a superset of an initial combination of causal conditions, and the reverse is also true: the initial combination is a subset of a prime implicant.
Any normal implicant (not prime) is a subset of a prime implicant, and at the same time a superset of some initial causal combinations.
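The row numbering of the implicant matrix can be sketched as a mixed-radix conversion. The helpers `row_to_implicant()` and `to_normal()` below are hypothetical base R functions written for illustration (the examples further down use getRow() for the same purpose); they assume rows are numbered from 1 and that the last condition varies fastest.

```r
# Hypothetical helper: implicant matrix row number -> row values,
# where noflevels already holds the number of levels plus 1
row_to_implicant <- function(row, noflevels) {
  k <- length(noflevels)
  digits <- numeric(k)
  rest <- row - 1                      # rows are numbered from 1
  for (i in k:1) {                     # last condition varies fastest
    digits[i] <- rest %% noflevels[i]
    rest <- rest %/% noflevels[i]
  }
  digits
}

# Hypothetical helper: implicant matrix values -> normal notation,
# 0 becomes "-" (minimized literal), other values are shifted down by 1
to_normal <- function(digits) {
  ifelse(digits == 0, "-", as.character(digits - 1))
}

noflevels <- c(2, 2, 2)                # three binary conditions

r2  <- row_to_implicant(2, noflevels + 1)    # 0 0 1, i.e. - - 0
r17 <- row_to_implicant(17, noflevels + 1)   # 1 2 1, i.e. 0 1 0

print(r2);  print(to_normal(r2))
print(r17); print(to_normal(r17))
```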
Functions findSubsets() and findSupersets() find:
- all possible such subsets for a given (prime) implicant, or
- all possible supersets of an implicant or initial causal combination
in the implicant matrix.
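The two searches can be illustrated with a brute-force sketch in base R that enumerates the whole implicant matrix for three binary conditions. The functions `sketch_subsets()` and `sketch_supersets()` are hypothetical and only mimic the row numbers shown in the examples; the package functions very likely compute them without building the full matrix.

```r
noflevels <- c(3, 3, 3)   # three binary conditions, levels plus 1

# Full implicant matrix; expand.grid() varies its first column fastest,
# so the conditions are passed reversed and the columns reversed back
grid <- expand.grid(lapply(rev(noflevels), function(n) seq_len(n) - 1))
imat <- unname(as.matrix(grid[, rev(seq_along(noflevels))]))

# A superset keeps each literal of the target or minimizes it (value 0);
# the all-zero first row contains no literal at all and is dropped
sketch_supersets <- function(row, imat) {
  target <- imat[row, ]
  keep <- apply(imat, 1, function(r) all(r == 0 | r == target))
  rows <- unname(which(keep))
  rows[rows != 1]
}

# A subset keeps every literal of the target and may add further ones;
# the target row itself is dropped, as in the examples
sketch_subsets <- function(row, imat) {
  target <- imat[row, ]
  keep <- apply(imat, 1, function(r) all(target == 0 | r == target))
  rows <- unname(which(keep))
  rows[rows != row]
}

print(sketch_subsets(2, imat))
print(sketch_supersets(14, imat))
print(sketch_supersets(17, imat))
```

Enumerating all $3^k$ rows is only practical for a handful of conditions, which is why this is a sketch of the idea rather than a usable replacement.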
The argument depth can be used to impose an upper number of causal conditions to form expressions with; it is the complexity level where the search is stopped. Depth is set to a maximum by default, and the algorithm will always stop at the maximum complexity level where no new, non-redundant prime implicants are found. Reducing the depth below that maximum will also reduce computation time.
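A rough way to see why a lower depth saves time is to count the candidate conjunctions. The sketch below assumes `k` binary conditions, each of which can enter an expression in its present or negated form; `n_conjunctions()` is a hypothetical helper that only measures the size of the search space and says nothing about how the package actually traverses it.

```r
# Hypothetical helper: number of distinct conjunctions that use at most
# `depth` out of `k` binary conditions (each present or negated)
n_conjunctions <- function(k, depth) {
  sizes <- seq_len(depth)
  sum(choose(k, sizes) * 2^sizes)
}

k <- 5
shallow <- n_conjunctions(k, depth = 2)   # expressions with 1 or 2 conditions
full    <- n_conjunctions(k, depth = k)   # every non-empty conjunction

print(c(shallow = shallow, full = full))
```

At full depth the count equals $3^k - 1$, that is every row of the implicant matrix except the all-minimized first one.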
For examples on how to add more parameters of fit via argument add, see the function pof().
The result of the superSubset() function is an object of class "ss", which is a list with the following components:
incl.cov | A data frame with the parameters of fit.
coms | A data frame with the (m)embership (s)cores of the resulting (co)mbinations.
For findSubsets() and findSupersets(), a vector with the row numbers corresponding to all possible subsets, or supersets, of a (prime) implicant.
Cebotari, V.; Vink, M.P. (2013) A Configurational Analysis of Ethnic Protest in Europe. International Journal of Comparative Sociology vol.54, no.4, pp.298-324, DOI: 10.1177/0020715213508567.
Cebotari, Victor; Vink, Maarten Peter (2015) Replication Data for: A configurational analysis of ethnic protest in Europe, Harvard Dataverse, V2, DOI: http://doi.org/10.7910/DVN/PT2IB9
Dusa, Adrian (2007) Enhancing Quine-McCluskey. COMPASSS: Working Paper 2007-49. URL: http://www.compasss.org/wpseries/Dusa2007b.pdf.
Dusa, Adrian (2010) A Mathematical Approach to the Boolean Minimization Problem. Quality & Quantity vol.44, no.1, pp.99-113, DOI: http://doi.org/10.1007/s11135-008-9183-x.
Lipset, S. M. (1959) Some Social Requisites of Democracy: Economic Development and Political Legitimacy, American Political Science Review vol.53, pp.69-105.
Schneider, Carsten Q.; Wagemann, Claudius (2012) Set-Theoretic Methods for the Social Sciences: A Guide to Qualitative Comparative Analysis (QCA). Cambridge: Cambridge University Press.
# Lipset binary crisp sets
ssLC <- superSubset(LC, "SURV")

require(venn)
x = list("SURV" = which(LC$SURV == 1),
         "STB" = which(ssLC$coms[, 1] == 1),
         "LIT" = which(ssLC$coms[, 2] == 1))
venn(x, cexil = 0.7)

# Lipset multi-value sets
superSubset(LM, "SURV")

                         inclN  RoN    covN
--------------------------------------------
1  LIT{1}                1.000  0.500  0.615
2  STB{1}                1.000  0.700  0.727
3  LIT{1}*STB{1}         1.000  0.900  0.889
4  DEV{1}+IND{1}         1.000  0.800  0.800
5  URB{0}+IND{1}         1.000  0.000  0.444
6  DEV{2}+URB{1}+IND{0}  1.000  0.100  0.471
--------------------------------------------

# Cebotari & Vink (2013) fuzzy data
# all necessary combinations with at least 0.9 inclusion and 0.6 coverage cut-off
ssCVF <- superSubset(CVF, outcome = "PROTEST", incl.cut = 0.90, cov.cut = 0.6)
ssCVF

                                       inclN  RoN    covN
----------------------------------------------------------
1   GEOCON                             0.904  0.492  0.624
2   DEMOC+ETHFRACT+~GEOCON             0.930  0.470  0.626
3   DEMOC+~ETHFRACT+POLDIS             0.918  0.506  0.637
4   DEMOC+ETHFRACT+POLDIS              0.906  0.502  0.630
5   DEMOC+~ETHFRACT+~NATPRIDE          0.905  0.527  0.641
6   DEMOC+ETHFRACT+~NATPRIDE           0.935  0.530  0.656
7   DEMOC+~GEOCON+POLDIS               0.920  0.539  0.654
8   DEMOC+~GEOCON+~NATPRIDE            0.908  0.584  0.671
9   DEMOC+POLDIS+~NATPRIDE             0.916  0.596  0.682
10  ~ETHFRACT+POLDIS+~NATPRIDE         0.911  0.554  0.657
11  ~DEMOC+ETHFRACT+POLDIS+~NATPRIDE   0.913  0.532  0.647
12  ETHFRACT+~GEOCON+POLDIS+~NATPRIDE  0.911  0.613  0.688
----------------------------------------------------------

# the membership scores for the first minimal combination (GEOCON)
ssCVF$coms$GEOCON

 [1] 0.95 0.35 0.35 0.78 0.35 0.78 0.78 0.78 0.78 0.05 0.78 0.35 0.95 0.95 0.35 0.95
[17] 0.78 0.35 0.95 0.35 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95

# same restrictions, for the negation of the outcome
superSubset(CVF, outcome = "~PROTEST", incl.cut = 0.90, cov.cut = 0.6)

                      inclN  RoN    covN
-----------------------------------------
1  NATPRIDE           0.932  0.622  0.693
2  ~DEMOC+~ETHFRACT   0.951  0.548  0.663
3  ~ETHFRACT+~POLDIS  0.927  0.443  0.603
-----------------------------------------

# to find subsets or supersets, a hypothetical example using
# three binary causal conditions, having two levels each: 0 and 1
noflevels <- c(2, 2, 2)

# second row of the implicant matrix: 0 0 1
# which in the "normal" base is:      - - 0
# the prime implicant being: ~C
(sub <- findSubsets(input = 2, noflevels + 1))

[1]  5  8 11 14 17 20 23 26

getRow(sub, noflevels + 1)

     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    0    2    1
[3,]    1    0    1
[4,]    1    1    1
[5,]    1    2    1
[6,]    2    0    1
[7,]    2    1    1
[8,]    2    2    1

#    implicant matrix        normal values
#      a  b  c         |        a  b  c
#   5  0  1  1         |     5  -  0  0       ~b~c
#   8  0  2  1         |     8  -  1  0       b~c
#  11  1  0  1         |    11  0  -  0       ~a~c
#  14  1  1  1         |    14  0  0  0       ~a~b~c
#  17  1  2  1         |    17  0  1  0       ~ab~c
#  20  2  0  1         |    20  1  -  0       a~c
#  23  2  1  1         |    23  1  0  0       a~b~c
#  26  2  2  1         |    26  1  1  0       ab~c

# stopping at maximum row number 20
findSubsets(input = 2, noflevels + 1, stop = 20)

[1]  5  8 11 14 17 20

# for supersets
findSupersets(input = 14, noflevels + 1)

[1]  2  4  5 10 11 13 14

findSupersets(input = 17, noflevels + 1)

[1]  2  7  8 10 11 16 17

# input as a matrix
(im <- getRow(c(14, 17), noflevels + 1))

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    2    1

#    implicant matrix        normal values
#  14  1  1  1         |    14  0  0  0       ~a~b~c
#  17  1  2  1         |    17  0  1  0       ~ab~c

sup <- findSupersets(input = im, noflevels + 1)
sup

 [1]  2  4  5  7  8 10 11 13 14 16 17

getRow(sup, noflevels + 1)

      [,1] [,2] [,3]
 [1,]    0    0    1
 [2,]    0    1    0
 [3,]    0    1    1
 [4,]    0    2    0
 [5,]    0    2    1
 [6,]    1    0    0
 [7,]    1    0    1
 [8,]    1    1    0
 [9,]    1    1    1
[10,]    1    2    0
[11,]    1    2    1

#    implicant matrix        normal values
#      a  b  c         |        a  b  c
#   2  0  0  1         |     2  -  -  0       ~c
#   4  0  1  0         |     4  -  0  -       ~b
#   5  0  1  1         |     5  -  0  0       ~b~c
#   7  0  2  0         |     7  -  1  -       b
#   8  0  2  1         |     8  -  1  0       b~c
#  10  1  0  0         |    10  0  -  -       ~a
#  11  1  0  1         |    11  0  -  0       ~a~c
#  13  1  1  0         |    13  0  0  -       ~a~b
#  14  1  1  1         |    14  0  0  0       ~a~b~c
#  16  1  2  0         |    16  0  1  -       ~ab
#  17  1  2  1         |    17  0  1  0       ~ab~c