Web algorithms
This page describes threads of research performed by myself and my
collaborators. The goal is to put our work into a meaningful
sequence, instead of just a list of papers; the goal is not to give an
overview of the field. At some later time, this page may turn into a
survey, but at this point the many wonderful contributions of others
are not represented here.
Iterative methods and linear algebra
This section covers a set of largely-unrelated papers that study
different iterative methods.
- We have looked into treating pagerank as a system of linear
equations rather than an eigenvalue computation. Results are here.
- We have developed a model of the decay of a webpage that can
be computed efficiently based only on probes of the out-neighbors of a
page. Results are in this
paper.
- We study an extendsion of the standard PageRank random walk model
to introduce a notion of back buttons. Because the back button
operates as a stack with potentially infinite history, the dynamics of
the process change considerably. Results are here.
- We also looked at iterative methods for propagating trust through a
social network. The results are in this paper.
- The CLEVER search system uses
iteration to compute web search results.
- We've worked recently on modifying the LSI formulation to find a
low-dimensional subspace optimized for a particular query
distribution. Read more in this
paper.
Combinatorial extraction algorithms
This paper considers
extractions of templates from web pages using combinatorial
techniques. It is focused primarily on measurements of the prevalence
and evolution of web templating behavior.
In some followup work, we
considered algorithms for segmenting websites in order to determine
the most topically cohesive regions of the site.
Algorithms for dense subgraph and community finding
See this description of dense
subgraph extraction for details.
Back to Andrew Tomkins
homepage.