
Web Research

This page covers research that's at least peripherally related to the web. For non-web-related work, look here.

I just updated this site, and haven't had a chance yet to move this page and the corresponding non-web-research page over to the new format, so the links to paper content are broken. Please take a look at the full list to find any particular paper; sorry for the inconvenience.

General audience surveys

This article from Scientific American gives an overview of search and related techniques that make use of link analysis over hyperlinked corpora. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Hypersearching the web. Scientific American, June 1999. ( html )

This is a brief summary of some results on web structure, covering HITS, the bow tie model, and the fractal structure of the web. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins: The Web and Social Networks. IEEE Computer 35(11): 32-36 (2002) (pdf)

This is a slightly more detailed view of applications that make use of the web's link structure; it's getting a little dated. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Mining the web's link structure. IEEE Computer, August 1999. ( pdf )

This paper in CACM 2004 gives an analysis of the bloggers on the LiveJournal blog-hosting site: their interests, locations, and demographics, from the perspective of the social network.

Tutorials

This is a tutorial that Jon Kleinberg and I gave at PODS99 on linear algebra techniques in information retrieval -- it's a broad introduction to the area that assumes little background beyond basic linear algebra. J. Kleinberg and A. Tomkins. Applications of linear algebra in information retrieval and hypertext analysis. In Proceedings of the 18th ACM Symposium on Principles of Database Systems, 1999. ( pdf )

R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Social networks: From the web to knowledge management. Book chapter in Web Intelligence, editors: Ning Zhong, Jiming Liu, Yiyu Yao, by Springer-Verlag, pages 367-379, January 2003. (pdf)

These slides are from part of an AMS tutorial. They cover some basic introductory remarks about power laws and related heavy-tailed distributions, and discuss a set of generative models for these distributions. Slides from an AMS tutorial on power laws and generative models.

User targeting

R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Recommendation systems: a probabilistic analysis. Journal of Computer and System Sciences (JCSS), 63(1):42-61, August 2001. Appeared in Proc. 39th Symposium on Foundations of Computer Science, 1998. (pdf)

R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. On targeting Markov segments. In Proceedings of the ACM Symposium on Theory of Computing, 1999. (pdf)

Searching and querying large-scale data

A. Broder, M. Fontoura, V. Josifovski, R. Kumar, R. Motwani, S. Nabar, R. Panigrahy, A. Tomkins and Y. Xu. Estimating Corpus Size via Queries. In Conference on Information and Knowledge Management (CIKM), 2006. (pdf)

R. Fagin, R. Guha, R. Kumar, J. Novak, D. Sivakumar, and A. Tomkins. Multi-Structural Databases. In Proceedings of the 24th ACM Symposium on Principles of Database Systems, 2005. ( pdf )

R. Fagin, P. Kolaitis, R. Kumar, J. Novak, D. Sivakumar, and A. Tomkins. Efficient Implementation of Large-Scale Multi-Structural Databases. In IEEE International Conference on Very Large Databases (VLDB), 2005. (pdf)

Soumen Chakrabarti, Byron E. Dom, David Gibson, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. Topic distillation and spectral filtering. Artificial Intelligence Review, 13:409-435, 1999. (pdf)

S. Chakrabarti, B. Dom, D. Gibson, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Spectral filtering for resource discovery. In SIGIR 98 Workshop on Hypertext Analysis, 1998. ( postscript )

Web graph analysis

D. Gibson, R. Kumar and A. Tomkins. Discovering Large Dense Subgraphs in Massive Graphs. In IEEE International Conference on Very Large Databases (VLDB), Trondheim, Norway, September 2005. (pdf)

R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large scale knowledge bases from the web. In IEEE International Conference on Very Large Databases (VLDB), Edinburgh, Scotland, September 1999. (pdf)

R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the web for emerging cyber-communities. Computer Networks, 31:1481-1493, 1999. Conference version at Eighth International World Wide Web Conference, 1999. (html)

Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web (winner best paper award). In Proceedings of the Ninth International World Wide Web Conference, 2000. ( html )

A. Arasu, J. Novak, A. Tomkins, and J. Tomlin. Pagerank computation and the structure of the web: Experiments and algorithms, 2002. ( pdf )

Stephen Dill, Ravi Kumar, Kevin S. McCurley, Sridhar Rajagopalan, D. Sivakumar, and Andrew Tomkins. Self-similarity in the web. ACM Transactions on Internet Technology (TOIT), 2(3):205-223, 2002. Appeared in IEEE International Conference on Very Large Databases (VLDB) 2001, Rome, Italy. (pdf)

J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. The web as a graph: Measurements, models and methods. In Proceedings of the International Conference on Combinatorics and Computing, number 1627 in LNCS. Springer-Verlag, July 1999. ( pdf )

Z. Bar-Yossef, A. Broder, R. Kumar and A. Tomkins. Sic Transit Gloria Telae: Towards an Understanding of the Web's Decay. In Proceedings of the Thirteenth International World Wide Web Conference, New York, New York, 2004. (html, pdf)

S.R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. The web as a graph. In Proceedings of the 19th ACM Symposium on Principles of Database Systems, pages 1-10, 2000. (pdf)

R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. Stochastic models for the web graph. In Proc. 41st Symposium on Foundations of Computer Science, 2000. ( pdf )

R. Fagin, A. Karlin, J. Kleinberg, P. Raghavan, S. Rajagopalan, R. Rubinfeld, M. Sudan, and A. Tomkins. Random walks with "back buttons". In Proceedings of the ACM Symposium on Theory of Computing, 2000. (pdf)

Communities, blogs and social networks

R. Kumar, J. Novak and A. Tomkins. Structure and Evolution of Online Social Networks. In Proceedings of the Twelfth ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), poster track, 2006. (pdf)

J. Novak, P. Raghavan and A. Tomkins. Anti-Aliasing on the Web. In Proceedings of the Thirteenth International World Wide Web Conference, New York, New York, 2004. ( pdf )

R. Guha, R. Kumar, P. Raghavan and A. Tomkins. Propagation of Trust and Distrust. In Proceedings of the Thirteenth International World Wide Web Conference, New York, New York, 2004. ( html , pdf )

C. Faloutsos, K. McCurley and A. Tomkins. Fast Discovery of Connection Subgraphs. In Tenth ACM SIGKDD Conference, Seattle, WA, 2004. ( pdf )

R. Kumar, D. Liben-Nowell, J. Novak, P. Raghavan, and A. Tomkins. Geographic routing in social networks. In Proceedings of the National Academy of Sciences 102(33):11623-11628 (2005). (pdf)

R. Kumar, D. Liben-Nowell and A. Tomkins. Navigating Low-Dimensional and Hierarchical Population Networks. In European Symposium on Algorithms (ESA), 2006. (pdf)

D. Gruhl, R. Guha, D. Liben-Nowell and A. Tomkins. Information Diffusion Through Blogspace. In Proceedings of the Thirteenth International World Wide Web Conference, New York, New York, 2004. ( pdf )

R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. In Proceedings of the Twelfth International World Wide Web Conference, Budapest, Hungary, 2003. (html, pdf)

D. Gruhl and R. Guha and R. Kumar and J. Novak and A. Tomkins. The Predictive Power of Online Chatter. In Proceedings of the Eleventh ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Chicago, IL, 2005. (pdf)

Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins. The Web and Social Networks. In IEEE Computer 35(11):32-36 (2002). (pdf)

WebFountain and content analysis

D. Chakrabarti, R. Kumar and A. Tomkins. Evolutionary Clustering. In Proceedings of the Twelfth ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), poster track, 2006. (pdf)

D. Gruhl, L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A. Tomkins and J. Zien. How to build a WebFountain: An architecture for very large-scale text analytics. In IBM Systems Journal, vol 43, number 1, 2004. (pdf)

K. McCurley and A. Tomkins. Mining and knowledge discovery from the Web. In 7th International Symposium on Parallel Architectures, Algorithms and Networks, Hong Kong, 2004. (pdf)

A. Dasgupta, R. Kumar, P. Raghavan and A. Tomkins. Variable Latent Semantic Indexing. In Proceedings of the Eleventh ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Chicago, IL, 2005. (pdf)

Website Analysis

D. Gibson, K. Punera, and A. Tomkins. The Volume and Evolution of Web Page Templates. In Proceedings of the Fourteenth International World Wide Web Conference (WWW), Chiba, Japan, 2005. (pdf)

R. Kumar, K. Punera and A. Tomkins. Hierarchical Topic Segmentation of Websites. In Proceedings of the Twelfth ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2006. (pdf)

Web taxonomies

S. Dill, N. Eiron, D. Gibson, D. Gruhl, R. Guha, A. Jhingran, T. Kanungo, S. Rajagopalan, A. Tomkins, J. Tomlin, and J. Zien. Bootstrapping the semantic web via automated semantic annotation (winner best paper award). In Proceedings of the Twelfth International World Wide Web Conference, Budapest, Hungary, 2003. (pdf)

R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. On semi-automated web taxonomy construction. In Fourth International Workshop on the Web and Databases (WebDB'2001), Santa Barbara, CA, May 24-25, 2001. (pdf)