I just updated this site, and haven't had a chance yet to move this page and the corresponding non-web-research page over to the new format, so the links to paper content are broken. Please take a look at the full list to find any particular paper; sorry for the inconvenience.
This article from Scientific American gives an overview of search and related techniques that make use of link analysis over hyperlinked corpora.
S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Hypersearching the web. Scientific American, June 1999. ( html )
This is a brief summary of some results on web structure, covering
HITS, the bow tie model, and the fractal structure of the web.
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins:
The Web and Social Networks. IEEE Computer 35(11): 32-36 (2002) (pdf)
This is a slightly more detailed view of applications that make use of the web's link structure; it's getting a little dated.
S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Mining the web's link structure. IEEE Computer, August 1999. ( pdf )
This paper in CACM
2004 gives an analysis of the bloggers on the LiveJournal
blog-hosting site: their interests, locations, and demographics, from
the perspective of the social network.
R. Kumar, P. Raghavan, S. Rajagopalan, and
A. Tomkins. Social networks: From the web to knowledge
management. Book chapter in Web Intelligence, editors: Ning
Zhong, Jiming Liu, Yiyu Yao, by Springer-Verlag, pages 367--379,
January 2003. (pdf)
These slides
are from part of an AMS tutorial. They cover some basic introductory
remarks about power laws and related heavy-tailed distributions, and
discuss a set of generative models for these distributions. Slides
from an AMS tutorial on power laws and generative models.
R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Recommendation systems: a probabilistic analysis. Journal of Computer and System Sciences (JCSS), 63(1):42--61, August 2001. Appeared in Proc. 39th Symposium on Foundations of Computer Science, 1998. ( pdf )
R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. On targeting markov segments. In Proceedings of the ACM Symposium on Theory of Computing, 1999. ( pdf )
A. Broder, M. Fontoura, V. Josifovski, R. Kumar, R. Motwani,
S. Nabar, R. Panigrahy, A. Tomkins and Y. Xu. Estimating Corpus Size
via Queries. In Conference on Information and Knowledge Management (CIKM), 2006. (pdf)
R. Fagin, R. Guha, R. Kumar, J. Novak, D. Sivakumar, and
A. Tomkins. Multi-Structural Databases. In Proceedings of the
24th ACM Symposium on Principles of Database Systems, 2005. ( pdf )
R. Fagin, P. Kolaitis, R. Kumar, J. Novak, D. Sivakumar, and
A. Tomkins. Efficient Implementation of Large-Scale Multi-Structural
Databases. In IEEE International Conference
on Very Large Databases (VLDB), 2005. ( pdf )
Soumen Chakrabarti, Byron E. Dom, David Gibson, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. Topic distillation and spectral filtering. Artificial Intelligence Review, 13:409--435, 1999. ( pdf )
S. Chakrabarti, B. Dom, D. Gibson, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Spectral filtering for resource discovery. In SIGIR 98 Workshop on Hypertext Analysis, 1998. ( postscript )
R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large scale knowledge bases from the web. In IEEE International conference on Very Large Databases (VLDB), Edinburgh, Scotland, September 1999. ( pdf )
R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the web for emerging cyber-communities. Computer Networks, 31:1481--1493, 1999. Conference version at Eighth International World Wide Web Conference, 1999. ( html )
Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web (winner best paper award). In Proceedings of the Ninth International World Wide Web Conference, 2000. ( html )
A. Arasu, J. Novak, A. Tomkins, and J. Tomlin. Pagerank computation and the structure of the web: Experiments and algorithms, 2002. ( pdf )
Stephen Dill, Ravi Kumar, Kevin S. McCurley, Sridhar Rajagopalan, D. Sivakumar, and Andrew Tomkins. Self-similarity in the web. ACM Transactions on Internet Technology (TOIT), 2(3):205--223, 2002. Appeared in IEEE International conference on Very Large Databases (VLDB) 2001, Rome, Italy. ( pdf )
J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. The web as a graph: Measurements, models and methods. In Proceedings of the International Conference on Combinatorics and Computing, number 1627 in LNCS. Springer-Verlag, July 1999. ( pdf )
Z. Bar-Yossef, A. Broder, R. Kumar and A. Tomkins. Sic Transit Gloria Telae: Towards an Understanding of the Web's Decay. In Proceedings of the Thirteenth International World Wide Web Conference, New York, New York, 2004. ( html , pdf )
S.R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. The web as a graph. In Proceedings of the 19th ACM Symposium on Principles of Database Systems, pages 1--10, 2000. ( pdf )
R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. Stochastic models for the web graph. In Proc. 41st Symposium on Foundations of Computer Science, 2000. ( pdf )
R. Fagin, A. Karlin, J. Kleinberg, P. Raghavan, S. Rajagopalan, R. Rubinfeld, M. Sudan, and A. Tomkins. Random walks with "back buttons". In Proceedings of the ACM Symposium on Theory of Computing, 2000. ( pdf )
R. Kumar, J. Novak and A. Tomkins. Structure and Evolution of
Online Social Networks. In Proceedings of the Twelfth ACM SIGKDD
Conference on Knowledge Discovery and Data Mining (KDD), poster
track, 2006. (pdf)
J. Novak, P. Raghavan and A. Tomkins. Anti-Aliasing on the Web. In
Proceedings of the Thirteenth International World Wide Web
Conference, New York, New York, 2004. ( pdf )
R. Guha, R. Kumar, P. Raghavan and A. Tomkins. Propagation of Trust and Distrust. In Proceedings of the Thirteenth International World Wide Web Conference, New York, New York, 2004. ( html , pdf )
C. Faloutsos, K. McCurley and A. Tomkins. Fast Discovery of Connection Subgraphs. In Tenth ACM SIGKDD Conference, Seattle, WA, 2004. ( pdf )
R. Kumar, D. Liben-Nowell, J. Novak, P. Raghavan, and A. Tomkins.
Geographic routing in social networks. In Proceedings of the
National Academy of Sciences 102(33):11623-11628 (2005). (pdf)
R. Kumar, D. Liben-Nowell and A. Tomkins. Navigating Low-Dimensional and Hierarchical Population Networks. In European Symposium on Algorithms (ESA), 2006. (pdf)
D. Gruhl, R. Guha, D. Liben-Nowell and
A. Tomkins. Information Diffusion Through Blogspace. In
Proceedings of the Thirteenth International World Wide Web
Conference, New York, New York, 2004. ( pdf )
R. Kumar, J. Novak, P. Raghavan, and
A. Tomkins. On the bursty evolution of blogspace. In
Proceedings of the Twelfth International World Wide Web
Conference, Budapest, Hungary, 2003. ( html , pdf )
D. Gruhl and R. Guha and R. Kumar and J. Novak and A. Tomkins. The
Predictive Power of Online Chatter. In Proceedings of the
Eleventh ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD), Chicago, IL, 2005. (pdf)
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew
Tomkins. The Web and Social Networks. In IEEE Computer 35(11):32-36
(2002). (pdf)
D. Chakrabarti, R. Kumar and A. Tomkins. Evolutionary Clustering.
In Proceedings of the Twelfth ACM SIGKDD Conference on Knowledge
Discovery and Data Mining (KDD), poster track, 2006. (pdf)
D. Gruhl, L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A. Tomkins and J. Zien. How to build a WebFountain: An
architecture for very large-scale text analytics. In IBM Systems
Journal, vol 43, number 1, 2004. (pdf)
K. McCurley and A. Tomkins. Mining and knowledge discovery from the
Web. In 7th International Symposium on Parallel Architectures,
Algorithms and Networks, Hong Kong, 2004. (pdf)
A. Dasgupta, R. Kumar, P. Raghavan and A. Tomkins. Variable Latent
Semantic Indexing. In Proceedings of the Eleventh ACM SIGKDD
Conference on Knowledge Discovery and Data Mining (KDD), Chicago,
IL, 2005. (pdf)
D. Gibson, K. Punera, and A. Tomkins. The Volume and Evolution of Web
Page Templates. In Proceedings of the Fourteenth International
World Wide Web Conference (WWW), Chiba, Japan, 2005. (pdf)
R. Kumar, K. Punera and A. Tomkins. Hierarchical Topic
Segmentation of Websites. In Proceedings of the Twelfth ACM SIGKDD
Conference on Knowledge Discovery and Data Mining (KDD), 2006. (pdf)
S. Dill, N. Eiron, D. Gibson, D. Gruhl,
R. Guha, A. Jhingran, T. Kanungo, S. Rajagopalan,
A. Tomkins, J. Tomlin, and J. Zien. Bootstrapping the
semantic web via automated semantic annotation (winner best paper
award). In Proceedings of the Twelfth International World Wide Web
Conference, Budapest, Hungary, 2003. (pdf)
R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. On semi-automated web taxonomy construction. In Fourth International Workshop on the Web and Databases (WebDB'2001), Santa Barbara, CA, May 24--25, 2001. ( pdf )
Tutorials
This is a tutorial that Jon Kleinberg and I gave at PODS99 on linear algebra techniques in information retrieval -- it's a broad introduction to the area that assumes little background beyond basic linear algebra.
J. Kleinberg and A. Tomkins. Applications of linear algebra in information retrieval and hypertext analysis. In Proceedings of the 18th ACM Symposium on Principles of Database Systems, 1999. ( pdf )
User targeting
Searching and querying large-scale data
Web graph analysis
D. Gibson, R. Kumar and A. Tomkins. Discovering Large Dense Subgraphs
in Massive Graphs. In IEEE International Conference on Very Large
Databases (VLDB), Trondheim, Norway, September 2005. (pdf)
Communities, blogs and social networks
WebFountain and content analysis
Website Analysis
Web taxonomies