| Home | All papers | Authors | Tags | Topics |
A survey paper from PODS 2000 covers some early work in analysis of the web graph.
In 1999, researchers from Altavista, Compaq and IBM collaborated to perform a large-scale measurement study of the world wide web graph. The resulting paper uncovered a "bow tie" structure in the web, and won the best paper award in WWW9. See a nice summary in Nature on the topic.
There's a long and interesting line of work on measuring the size of corpora like the web, into which we have only limited visibility. We have a recent contribution along these lines, following on to this interesting body of work.
A follow-on paper in VLDB 2001 extended the bow tie structure to communities within the web, showing that any "thematically-unified community" contained similar structure to the web at large, and showing how the bow ties of many smaller communities connected together to form a navigational backbone of the web.
A number of researchers in the physics and theoretical computer
science community and elsewhere have advanced models to explain some
of these observations. We did some early work in the area, developing
a model based on copying, which is strongly related to a model called
"preferential attachment" introduced by Barabasi and Albert. The
model and other general aspects of the area are discussed in this paper from COCOON99. A more
theoretical treatment is given in this paper from FOCS 2000
. There have since been a number of fantastic results on these
and related models that in many cases extend the results in our paper.
Dense subgraphs
A number of algorithms have been proposed for finding "communities" in
the web graph using approaches based on dense subgraphs. The first
such paper covered a methodology known as trawling for
enumerating all communities based on a certain type of signature.
This methodology was first described in WWW8 in this paper, and then extended
to a general framework creating knowledge bases based on particular
types of graphical signatures in a follow-on paper in VLDB 99.
A similar technique was employed to find communities in blogs, using a signature based on a greedily-constructed dense subgraph rather than a bipartite clique. Results are given in this paper.
There are fewer algorithms known for finding large dense subgraphs in graphs. One such algorithm based on iterated shingling of subgraphs is given here.
All of those approaches are offline in the sense that a large graph is processed over a period of days or weeks, and a large number of communities are extracted. There are also online approaches that try to approximate a particular community based on a user query. We have worked on this problem in social networks using a technique called connection subgraphs.
Similarly, early work on the CLEVER project covered finding dense bipartite graphs based on a user query; some results are given in this paper. A more complete summary of the algorithms is here.
Back to Andrew Tomkins homepage.