Well, I’ve been back from the conference for several weeks, but only just found time to post again. The conference was nice, though I wasn’t able to see all the presentations and papers I would have liked – par for the course, I guess. The conference organizers do a great job, all things considered, but somehow the topics I want to listen to get scheduled at about the same time, meaning I only get about half the utility out of these conferences that I’d like. I think I may have to shell out for the audio recordings they make of the sessions.

Anyhow…

To follow up on an earlier post, I thought people might like to see a bit more about how one can use network data (social networks, proximity data, etc.) to model social influence in groups. The basic question these models address is pretty straightforward: Given a set of actors and data on their ideas, attitudes, and opinions, do people who share certain types of ties tend to have more similar attributes, preferences, attitudes, etc.? As it turns out, there are actually several different ways we can go about addressing this basic question.

For example, we can do it using simulation-based methods (I’ve seen this more often in epidemiology and marketing research), where we take a network with certain structural characteristics and plot how long it takes for ideas, behaviors, etc. to spread to certain percentages of nodes in the network, conditional on certain assumptions about the parameters of influence/infection and node “recovery” (see this previous post on SI/SIR/SIS-type influence models for a rough overview of some of these parameters). Like most simulation-based methods, this is nice because it allows us to see the implications of different sorts of assumptions about the way influence happens. Thus, we can vary our assumptions, change the model, and see whether we reproduce the dynamics that occur in real-world networks: for social influence models, this means we want our models to be able to predict things like convergence (and non-convergence) of opinions over time. Simulation-based techniques like this can be quite powerful, but they make strong assumptions about the way that influence must occur, and they are hard for most typical hard-working social scientists to test. (Although that is slowly changing.)
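To make the simulation idea concrete, here is a minimal sketch of an SI-type spread on a random network. Everything in it is an assumption for illustration: the function names `simulate_si` and `random_graph`, the discrete-time update rule, the network size, and the two transmission probabilities are all made up, and there is no recovery step.

```python
import numpy as np

def simulate_si(adj, beta, seed_node=0, steps=30, rng=None):
    """Discrete-time SI spread (an assumed rule): at each step, every infected
    neighbor independently infects a susceptible node with probability beta."""
    rng = np.random.default_rng() if rng is None else rng
    n = adj.shape[0]
    infected = np.zeros(n, dtype=bool)
    infected[seed_node] = True
    history = [infected.mean()]
    for _ in range(steps):
        # Number of currently infected neighbors of each node
        exposure = adj @ infected.astype(int)
        # Probability that at least one infected neighbor transmits
        p_infect = 1.0 - (1.0 - beta) ** exposure
        new_cases = (~infected) & (rng.random(n) < p_infect)
        infected |= new_cases
        history.append(infected.mean())
    return np.array(history)

def random_graph(n, p, rng):
    """Undirected random graph: each possible tie is present with probability p."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(42)
adj = random_graph(50, 0.1, rng)

# Compare how quickly the idea spreads under a weak and a strong influence parameter
for beta in (0.05, 0.3):
    curve = simulate_si(adj, beta, rng=rng)
    print(f"beta={beta}: share reached after 5, 10, 30 steps =", np.round(curve[[5, 10, 30]], 2))
```

Changing `beta`, the tie density `p`, or the update rule itself is exactly the kind of "vary the assumptions and see what happens" exercise described above.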

Another way of modeling social influence is through statistical models that take into account the interdependence between units, and estimate the extent to which that interdependence results in more similar (or dissimilar) attitudes or opinions. Again, there are several ways of doing this, but one of the more widely used models (at least, to my knowledge) is the *network effects autoregression model*. I’ve sometimes seen this model and related models under slightly different names. For example, geostatistics researchers use names like “spatial regression model”, “spatial error model”, etc., but it comes down to the same thing – predicting values at a point based on the values of that point’s neighbors. Geostatistics people define neighbors using geographical contiguity (for example, neighboring counties), but there’s nothing stopping us from using alternative definitions of neighbors – for example, based on social networks.

The basic network effects model is given by the equation

$$y = \rho W y + \varepsilon,$$

where $y$ is a vector of individual outcomes (for example, organizational attitudes, turnover, etc.), $W$ is the $n \times n$ weight matrix which defines the way in which individual scores are believed to be interdependent, $\varepsilon$ is a vector of random errors, and $\rho$ is the autoregressive parameter which defines the extent of social influence, given the matrix $W$. Similarly, we can define the *mixed regressive-autoregressive model*

$$y = \rho W y + X\beta + \varepsilon,$$

where $\beta$ and $X$ are typical regression parameters and predictors, respectively. (Note that in this model, if there is no social influence effect, i.e., if $\rho = 0$, then the model reduces to a standard regression.) A third type of autoregression involves modeling social influence effects through interdependence in the error term associated with the regression of the dependent variable on the exogenous predictors $X$; this is sometimes referred to as the *network disturbances* model, because it provides for an estimate of the interdependence in individual deviations from their predicted score, based on other predictors:

$$y = X\beta + \varepsilon, \qquad \varepsilon = \rho W \varepsilon + \nu,$$

where $\nu$ is a vector of independent random errors.

We can also mix these models – for example, by creating a mixed regressive-autoregressive model where interdependencies exist between both the levels of the y-scores themselves (as in the network effects model) AND the deviations from the expected y-score, thus:

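$$y = \rho_1 W_1 y + X\beta + \varepsilon, \qquad \varepsilon = \rho_2 W_2 \varepsilon + \nu,$$

where $W_1$ and $W_2$ are different weight matrices.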
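One way to get a feel for these models is to generate data from one of them. The sketch below does this for the mixed regressive-autoregressive model, writing `rho` for the autoregressive parameter and solving the model for y, which gives the reduced form y = (I − ρW)⁻¹(Xβ + ε). The ring-shaped network, the row-normalized weight matrix, and all parameter values are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Hypothetical undirected ring network: each actor is tied to two neighbors
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = 1
    A[(i + 1) % n, i] = 1

# Assumed weight matrix: each row of the adjacency matrix rescaled to sum to 1
W = A / A.sum(axis=1, keepdims=True)

rho = 0.4                                   # assumed strength of social influence
beta = np.array([1.0, 2.0])                 # assumed regression coefficients
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
eps = rng.normal(size=n)                    # independent errors

# Reduced form: y = (I - rho W)^{-1} (X beta + eps)
y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)

# The simulated y satisfies the structural equation y = rho W y + X beta + eps
gap = y - (rho * W @ y + X @ beta + eps)
print("max deviation from structural equation:", np.abs(gap).max())
```

Setting `rho = 0` here reproduces an ordinary regression data-generating process, which matches the note above about the model reducing to a standard regression.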

The exact interpretation of the model depends on the way in which you define $W$; given a non-directional binary social network, for example, you may define the weight matrix to be the adjacency matrix, where the entries $w_{ij} = 1$ if i and j share a tie, otherwise $w_{ij} = 0$. In a network effects model using this definition, $\rho$ is the extent to which an individual y score is related to the sum of the y scores of that individual’s neighbors in the social network. The most common $W$ matrix I am familiar with is a *row-standardized* matrix, where the standard adjacency matrix/sociomatrix (i.e., the matrix that defines the connections in your social network, usually binary, but possibly valued) is altered so that each row sums to 1. Using this definition of $W$, we are essentially estimating the relationship between an individual’s y score and the mean of their friends’ scores; this represents the assumption that influence upon an actor is evenly divided among an individual’s partners in the network. Obviously, the specific definition of $W$ used makes a big difference in the way that the autocorrelation parameter is interpreted.
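The difference between the two definitions of W is easy to see on a toy example. In the sketch below, the four-person network, the y scores, and the helper name `row_standardize` are all made up for illustration; leaving isolates (actors with no ties) as all-zero rows is my own assumption, not a rule from any particular package.

```python
import numpy as np

def row_standardize(adj):
    """Rescale each row of an adjacency matrix to sum to 1.
    Rows with no ties (isolates) are left as zeros, by assumption."""
    adj = np.asarray(adj, dtype=float)
    row_sums = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)

# Hypothetical undirected network: ties 0-1, 0-2, 2-3
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
])
y = np.array([2.0, 4.0, 6.0, 8.0])  # made-up attitude scores

W = row_standardize(A)

print("sum of neighbors' scores: ", A @ y)   # raw adjacency matrix as W
print("mean of neighbors' scores:", W @ y)   # row-standardized W
```

Actor 0 has two neighbors with scores 4 and 6, so the raw adjacency matrix gives a sum of 10 while the row-standardized matrix gives a mean of 5; an autoregressive parameter estimated against one of these quantities is not on the same scale as one estimated against the other.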

The nice thing about these models is that they are flexible (for example, they allow for many different ways of defining actor interdependencies, such as direct ties or shared social positions) and are pretty easily estimated using maximum likelihood procedures in programs like R, or using MCMC estimation in software like WinBUGS/OpenBUGS. However, some of the statistical properties of these estimators are still a little uncertain; one of the research projects I’m currently finishing up explores what these properties are, and how they are likely to impact research into organizational networks. (Time permitting, I’ll post a draft of the paper – dissertation comes first, though.)
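For readers curious what the maximum likelihood step looks like, here is a bare-bones sketch for the mixed regressive-autoregressive model with normal errors. It uses the standard concentrated log-likelihood, ln|I − ρW| − (n/2) ln σ̂²(ρ), where σ̂²(ρ) is the residual variance from regressing (I − ρW)y on X, and maximizes it with a simple grid search. The function names, the grid, and the simulated ring network are my own assumptions; this is not how any particular R routine is implemented, and it returns no standard errors.

```python
import numpy as np

def concentrated_loglik(rho, y, X, W):
    """Log-likelihood of y = rho*W*y + X*beta + eps with beta and sigma^2
    profiled out (additive constants dropped)."""
    n = len(y)
    A = np.eye(n) - rho * W
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        return -np.inf  # rho outside the admissible range for this W
    Ay = A @ y
    beta_hat, *_ = np.linalg.lstsq(X, Ay, rcond=None)
    resid = Ay - X @ beta_hat
    sigma2 = resid @ resid / n
    return logdet - 0.5 * n * np.log(sigma2)

def fit_rho(y, X, W, grid=None):
    """Grid-search estimate of rho (crude, but easy to follow)."""
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)
    loglik = np.array([concentrated_loglik(r, y, X, W) for r in grid])
    return grid[np.argmax(loglik)]

# Simulated example data: hypothetical ring network with a row-standardized W
rng = np.random.default_rng(1)
n = 300
idx = np.arange(n)
A = np.zeros((n, n))
A[idx, (idx + 1) % n] = 1
A[(idx + 1) % n, idx] = 1
W = A / A.sum(axis=1, keepdims=True)

X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_rho, beta = 0.5, np.array([1.0, 2.0])
y = np.linalg.solve(np.eye(n) - true_rho * W, X @ beta + rng.normal(size=n))

rho_hat = fit_rho(y, X, W)
print(f"true rho = {true_rho}, estimated rho = {rho_hat:.2f}")
```

The log-determinant term is what separates this from simply regressing y on Wy and X by ordinary least squares, since Wy is correlated with the error term. Repeating the simulation over many random draws and different network structures is one simple way to probe how the estimator behaves.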

As with any regression-based/correlational method, the limitations are pretty obvious, of course – if you don’t have longitudinal data, you can’t really say for certain whether you are dealing with a true “social influence” effect, or whether the apparent interdependencies in responses are being driven by some third shared variable. Still, it’s a useful model for exploring possible influence effects in a network.

Finally, it’s worth pointing out (again) that this is only one way of estimating potential social influence effects in social networks; a variety of other procedures are available that provide slightly different ways of conceptualizing and estimating social influence effects, using both cross-sectional and longitudinal data. For example, if you wished to study social influence effects between husbands’ and wives’ purchasing decisions, and had groups of independent husband-wife dyads, you could use something like HLM pretty easily. If you had longitudinal data on both networks and actor attributes/attitudes, you could use something like Tom Snijders’ actor-oriented random graph models to separate out social influence and social selection effects across time. There are also the simulation-based techniques described at the top of the post, and even agent-based modeling for people who want to get really fancy in terms of modeling network dynamics.