Communities, Blogs, and Social Networks

This page describes threads of research carried out by my collaborators and me. The goal is to place our work in a meaningful sequence rather than present a bare list of papers; it is not to give an overview of the field. At some later time this page may turn into a survey, but for now the many wonderful contributions of others are not represented here.

Social network studies

Several colleagues and I have studied the structure of the social network of an online blogging community, drawn from the LiveJournal blog hosting organization. A high-level paper appeared in the Communications of the ACM detailing the connections between the social network and member attributes such as age, gender, interests, and geography.

Since then, we've been working on models to capture the specific relationship between geography and friendship. There's a paper in PNAS describing the model, and a paper in ESA with a more detailed presentation of theorems.

Identity

We have studied bulletin board postings to understand how effectively privacy mechanisms keep different aliases of the same person distinct. Our findings indicate that there are significant privacy concerns: content analysis of posts can link different aliases of the same person. See the paper on anti-aliasing for details.

Trust and Reputation

We worked on propagation of trust and distrust through a social network. Introducing distrust presents an added challenge: standard iterative techniques may produce complex entries in the principal eigenvector, because the Perron-Frobenius theorem does not hold when distrust is modeled as negative trust.
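A small numerical illustration of this effect, offered as a sketch rather than as the method from our paper: the two 2x2 matrices below are made-up examples in which entry (i, j) is assumed to be the trust that person i places in person j, with a negative value standing for distrust. The helper name principal_eigenpair_2x2 is hypothetical. With only non-negative entries the dominant eigenpair is real, as Perron-Frobenius guarantees; flipping one entry to negative makes it complex.

```python
import cmath

def principal_eigenpair_2x2(m):
    """Return (eigenvalue, eigenvector) of largest modulus for a 2x2 matrix."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    # Roots of the characteristic polynomial x^2 - trace*x + det.
    root = cmath.sqrt(trace * trace - 4 * det)
    candidates = [(trace + root) / 2, (trace - root) / 2]
    lam = max(candidates, key=abs)
    # For b != 0, the vector (b, lam - a) satisfies (M - lam*I) v = 0.
    return lam, (b, lam - a)

# Made-up matrices: non-negative trust only, then one entry turned into distrust.
trust_only = [[0.5, 0.5], [0.5, 0.5]]
with_distrust = [[0.5, 0.5], [-0.5, 0.5]]

lam_t, vec_t = principal_eigenpair_2x2(trust_only)
lam_d, vec_d = principal_eigenpair_2x2(with_distrust)

print("trust only:   ", lam_t, vec_t)   # real eigenvalue and eigenvector
print("with distrust:", lam_d, vec_d)   # complex eigenvalue and eigenvector
```

In the second case the two eigenvalues form a complex-conjugate pair of equal modulus, so there is no single dominant real direction for an iterative method to converge to.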

Visualizing communities

Within a social network graph, we often wish to understand the "connection" between two individuals. All too often, this connection is taken to be simply an edge between the individuals, or the shortest path between them. In fact, two people who are "nearby" in a social network are typically connected by a complex web of interrelationships, and the problem of finding this web is more accurately cast as a subgraph discovery problem. This paper gives such a formulation, with a set of algorithms to address it.

Blogs

In late 2002 we wished to understand the growth of blogspace, the set of all blogs and their relationships. We introduced a new combinatorial object called a time graph to perform this study, and showed how to track the evolution of both macroscopic properties of blogspace (like its connectivity) and microscopic properties (like the burstiness of a particular community) over time. The results are given here.
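To give a rough feel for the idea, here is a minimal sketch that makes simplifying assumptions not stated above: each edge carries only a creation time, edges are never removed, and links are treated as undirected. The names snapshot and largest_component_size and the toy edge list are hypothetical and are not taken from the paper. The sketch tracks one macroscopic property, the size of the largest connected component, as time advances.

```python
from collections import defaultdict

def largest_component_size(edges):
    """Size of the largest connected component of an undirected edge list."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    sizes = defaultdict(int)
    for node in parent:
        sizes[find(node)] += 1
    return max(sizes.values(), default=0)

def snapshot(timed_edges, t):
    """Edges of the (assumed) time graph that exist at time t."""
    return [(u, v) for (when, u, v) in timed_edges if when <= t]

# Toy data: (creation_time, blog, blog). Purely illustrative.
timed_edges = [(1, "a", "b"), (2, "c", "d"), (3, "b", "c"), (4, "e", "f")]
growth = [largest_component_size(snapshot(timed_edges, t)) for t in range(1, 5)]
print(growth)
```

The same per-snapshot loop could compute any other property of interest, such as the number of edges added within a community during each time step.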

Subsequently, we considered information flow through these blogs, showing some simple techniques for tracking and factoring "memes" flowing from one blog to another, and gave an approach for learning the pathways through blogspace that are most commonly taken: the blogs which most effectively introduce and disseminate ideas. The results are here.

In a follow-on paper, we moved this analysis from the influence of blogs on other blogs to the influence (or at least predictiveness) of blogs on the outside world. We showed that spikes in blog postings about a particular book may predict spikes in sales of the same book. Results are here.
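One simple way to probe for this kind of lead-lag relationship is lagged correlation. The sketch below is an assumption-laden illustration and not necessarily the analysis used in the paper: the series are synthetic, and the function names pearson and best_lag are hypothetical. It shifts the sales series back by each candidate lag and reports the lag at which blog mentions correlate most strongly with later sales.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def best_lag(mentions, sales, max_lag):
    """Return (lag, scores): the lag at which mentions best match later sales."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = mentions[: len(mentions) - lag]  # mentions at time t
        ys = sales[lag:]                      # sales at time t + lag
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get), scores

# Synthetic daily counts: each sales spike follows a mention spike by two days.
mentions = [1, 1, 5, 1, 1, 1, 6, 1, 1, 1]
sales = [2, 2, 2, 2, 9, 2, 2, 2, 10, 2]

lag, scores = best_lag(mentions, sales, max_lag=3)
print(lag, scores)
```

A high correlation at a positive lag is only suggestive; it does not by itself show that the blog postings caused the sales.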

More recently, we looked at the dynamics of some global graph structures in the social network graphs of Flickr and Yahoo! 360. We showed that, over time, a significant number of users reside in small but non-trivial components of the graph, and that these components are well modeled as stars rather than as more densely connected structures. The results are shown here.
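For concreteness, a star here means a component with a single hub joined to every other member and no other edges. The check below is a minimal sketch under stated assumptions: the component is given as a node list plus an undirected edge list whose endpoints all appear in that list, and the function name is_star is hypothetical.

```python
def is_star(nodes, edges):
    """True if the nodes and undirected edges form a star (one hub, all others leaves)."""
    nodes = set(nodes)
    # Drop self-loops and treat (u, v) and (v, u) as the same edge.
    undirected = {frozenset(e) for e in edges if e[0] != e[1]}
    if len(nodes) < 2 or len(undirected) != len(nodes) - 1:
        return False
    degree = {v: 0 for v in nodes}
    for edge in undirected:
        for v in edge:
            degree[v] += 1
    # With exactly n - 1 distinct edges, a node of degree n - 1 touches all of them.
    return max(degree.values()) == len(nodes) - 1

star = is_star(["h", "a", "b", "c"], [("h", "a"), ("h", "b"), ("c", "h")])
path = is_star(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
triangle = is_star(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
print(star, path, triangle)
```

Real components are rarely perfect stars, so a practical analysis would more likely measure how close a component is to this shape, for example by the fraction of edges that touch the highest-degree node.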

Back to Andrew Tomkins homepage.