Greg Linden
About: Currently Microsoft Live Labs, ex-Findory, ex-Amazon. Writes the weblog Geeking with Greg.
Blog Posts
KDD 2008 panel on the future of social networks
--A panel at KDD 2008 called "Social Networks: Looking Ahead" was more remarkable for the questions it failed to answer than the ones it did. One ques...
KDD talk on the Future of Image Search
--Jitendra Malik from UC Berkeley gave an enjoyable and often quite amusing invited talk at KDD 2008 on "The Future of Image Search" where he argued ...
Column versus row stores
--Daniel Abadi, Samuel Madden, and Nabil Hachem had a paper at SIGMOD 2008, "Column-Stores vs. Row-Stores: How Different Are They Really?" (PDF), wit...
Going to KDD 2008
--I'll be at KDD 2008 next week. If you make it there, please say hello!
Cassandra data store at Facebook
--Avinash Lakshman, Prashant Malik, and Karthik Ranganathan presented a talk at SIGMOD 2008, "Cassandra: Structured Storage System over a P2P Network...
Y Combinator's list of startup ideas
--Paul Graham at Y Combinator posts an interesting list of "Startup Ideas We'd Like to Fund". My favorites are Fix Advertising, Fixing E-mail Overload...
Clever method of near duplicate detection
--Martin Theobald, Jonathan Siddharth, and Andreas Paepcke from Stanford University have a cute idea in their SIGIR 2008 paper, "SpotSigs: Robust and...
BrowseRank: Ranking pages by how people use them
--Liu et al. from Microsoft Research Asia had the best student paper at SIGIR 2008, "BrowseRank: Letting Web Users Vote for Page Importance" (PDF), t...
Caching, index pruning, and the query stream
--A SIGIR 2008 paper out of Yahoo Research, "ResIn: A Combination of Results Caching and Index Pruning for High-performance Web Search Engines" (ACM ...
To personalize or not to personalize
--Jaime Teevan, Sue Dumais, and Dan Liebling had a paper at SIGIR 2008, "To Personalize or Not to Personalize: Modeling Queries with Variation in Use...
Modeling how searchers look at search results
--Georges Dupret and Benjamin Piwowarski from Yahoo Research had a great paper at SIGIR 2008, "A User Browsing Model to Predict Search Engine Click D...
Easy processing of massive data sets
--Ronnie Chaiken, Bob Jenkins, Per-Ake Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou have an upcoming paper at VLDB 2008, "SCOPE...
Kai-Fu Lee keynote at SIGIR
--Googler Kai-Fu Lee gave a keynote talk yesterday at SIGIR 2008 on "The Google China Experience". The Google China experience has been fraught with d...
Learning diversity when learning to rank
--Filip Radlinski, Robert Kleinberg, and Thorsten Joachims have an ICML 2008 paper, "Learning Diverse Rankings with Multi-Armed Bandits" (PDF), that a...
Automatic optimization on large Hadoop clusters
--Chris Olston, Benjamin Reed, Adam Silberstein, and Utkarsh Srivastava at Yahoo Research had a USENIX 2008 paper, "Automatic Optimization of Paralle...
Social peer-to-peer and Tribler
--I finally got to a fun paper I have been meaning to read for some time, "Tribler: A social-based peer-to-peer system" (PDF). What is interesting abo...
Going to SIGIR
--I will be at SIGIR 2008 next week in Singapore. If you make it to what is very nearly the antipode for those of us in the US, please say hello!
Google Toolbar data and the actual surfer model
--There were a few interesting developments in the past couple weeks from Google that appear to relate to their ability to track much of the movement...
Black, white, and gray spam
--Scott Yih, Robert McCann, and Alek Kolcz had a paper at CEAS 2007 on "Improving Spam Filtering by Detecting Gray Mail" (PDF). The paper focuses on t...
Video recommendations on YouTube
--A WWW 2008 paper out of Google, "Video Suggestion and Discovery for YouTube: Taking Random Walks Through the View Graph" (PDF) presents "a novel me...
Google, the press, and tearing down your heroes
--The press seems to have a pattern in reporting on successful technology companies. First, these companies can do no wrong, the heroes of our time, br...
Digg recommendation engine
--Kevin Rose writes that Digg is launching a recommendation engine that "uses your past digging activity to identify what we call Diggers Like You .....
Hadoop and scheduling
--A VLDB 2008 paper out of Yahoo Research, "Scheduling Shared Scans of Large Data Files" (PDF) looks at how "to maximize the overall rate of processi...
Amazon page recommendations
--Brady Forrest at O'Reilly Radar points out Amazon's new page recommendation widget in his post, "Amazon's Page Recommender: Foreshadowing A New Web...
Microsoft, big computation, and big data
--In his post, "Inside Microsoft's Internet Infrastructure & Its Plans for the Future", Om Malik highlights some interesting "facts about Microsoft-o...
Website personas
--Netflix has a feature called Profiles that allows multiple people to use the same Netflix account while keeping their queue and recommendations sep...
Advertising auctions and modeling externalities
--Googlers Gagan Aggarwal, Jon Feldman, S. Muthukrishnan, and Martin Pal have an upcoming paper, "Sponsored Search Auctions with Markovian Users", wi...
Getting things smart
--I was not going to post about this, but I cannot seem to get Steve Yegge's post, "Done, and Get Things Smart", out of my head. It is a clever piece o...
Hal Varian on advertising auctions
--Google Chief Economist Hal Varian has a post on the Official Google Blog on "How auctions set ad prices". What is particularly interesting about the...
Stages of Web 2.0 startups
--Stacey Higginbotham at GigaOM has an amusing post, "The 5 Stages of a Consumer Web Startup". Some excerpts: One day an entrepreneur ... gets an idea ...
Jeff Dean on Google infrastructure
--Google Fellow Jeff Dean gave a talk at Google I/O called "Underneath the Covers at Google: Current Systems and Future Directions". Slides (PDF) al...
Sample programs in DryadLINQ
--A new technical report out of Microsoft Research, "Some sample programs written in DryadLINQ" (PDF), shows off some examples of large scale distrib...
The value of fanatical customer service
--Mike Masnick at TechDirt has an insightful post about how businesses should treat customer service as the face of the company, not as a cost center...
Machines versus humans at Google
--A curious revelation from Googler Peter Norvig appears in a recent post by Anand Rajaraman: [To execute a web search] a subset of documents is ident...
Collective intelligence requires more than voting
--Giles Bowkett has an insightful point on voting schemes at sites like Digg: When you build a system where you get points for the number of people wh...
Marissa on personalized search
--Google VP Marissa Mayer had a brief tidbit on personalized search at the Google I/O Conference: There will be a bigger personalization piece [in cor...
Udi on Google search quality
--Google VP Udi Manber offers a high level description of what goes into Google's relevance rank in his recent post, "Introduction to Google Search Q...
Yahoo builds two petabyte PostgreSQL database
--James Hamilton writes about Yahoo's "over 2 petabyte repository of user click stream and context data with an update rate for 24 billion events per...
Yahoo, Hadoop, and Pig Latin
--Chris Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins from Yahoo have an upcoming paper at SIGMOD 2008, "Pig Latin: A Not...
Advertising, search, and drive-by malware
--Googlers Niels Provos, Panayiotis Mavrommatis, Moheeb Rajab, and Fabian Monrose have an amusingly titled 2008 tech report, "All Your iFrames Are Po...
Starting Findory: Funding
--[This is a continuation of the posts in my Starting Findory series describing my experiences building my first company, Findory.] I had the wrong st...
Scaling Facebook's databases
--Impressive numbers from Facebook on their architecture: 1,800 MySQL servers (900 pairs of master/slave) holding a heavily partitioned data set mana...
Netflix-KDD Workshop
--The "Second KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition" will be on August 24 in Las Vegas. Last year's worksh...
Learning to love customers like you
--Michael Schrage at MIT Technology Review writes a fun article, "Recommendation Nation: Learning to love customers like you". Some excerpts: Recommend...
Crawling is harder than it looks
--The best paper award at WWW 2008 went to a paper on large-scale crawling titled "IRLbot: Scaling to 6 Billion Pages and Beyond" (PDF) by Hsin-Tsang...
Finding the location of interests and objects from search logs
--A paper at WWW 2008, "Spatial Variation in Search Engine Queries" (PDF), by Lars Backstrom, Jon Kleinberg, Ravi Kumar, and Jasmine Novak offered ma...
Random walks of the click graph
--I recently have been coming across several interesting efforts that use random walks of a click graph for relevance or keyword suggestions in searc...
Size matters? Or simplicity?
--An amusing WWW 2008 poster by Joshua Blumenstock, "Size Matters: Word Count as a Measure of Quality on Wikipedia" (PDF) found that a very simple me...
Questioning Yahoo Answers
--Zoltan Gyongyi, Georgia Koutrika, Jan Pedersen, and Hector Garcia-Molina published a paper at WWW 2008, "Questioning Yahoo Answers", that has an in...
Search trails and relevance
--Misha Bilenko and Ryen White from Microsoft Research had a paper at WWW 2008, "Mining the Search Trails of Surfing Crowds: Identifying Relevant Web...
Keynotes at WWW 2008
--The keynote talks at WWW 2008 promised to be exciting. With names like Google VP Kai-Fu Lee (whose departure from Microsoft famously revealed Ballm...
GoogleBot starts on the deep web
--Jayant Madhavan and Alon Halevy have a post on the Official Google Blog, "Crawling through HTML forms", announcing that the crawler will start su...
Mousetracking in web search
--Googlers Kerry Rodden, Xin Fu, Anne Aula, and Ian Spiro had a short paper last week at CHI 2008, "Eye-mouse coordination patterns on web search res...
Udi Manber interview in Popular Mechanics
--Google VP Udi Manber has an interesting interview in Popular Mechanics. An excerpt on learning to rank from searcher behavior: Our goal is very simpl...
Detecting near duplicates in big data
--I finally got to a WWW 2007 paper out of Google I have been meaning to read, "Detecting Near-Duplicates for Web Crawling" (PDF) by Gurmeet Manku, A...
Going to WWW
--I will be at the WWW 2008 Conference in Beijing next week. Very much looking forward to it! If you see me there, please say hello!
Google engEdu talk on Bloom filters
--Ely Porat gave a Google engEdu talk, "The Bloom filter", with a good survey of Bloom filters and variants on Bloom filters. Bloom filters let you q...
Contextual advertising and social networks
--Anand Rajaraman makes a counter-intuitive but compelling argument that having a close relationship between people who view a page reduces the value...
Cheap personalization using the referrer
--Danny Sullivan at Search Engine Land writes:Previous Query refinement is now coming to unpaid or "organic" search results, [Google VP Marissa Mayer...
Replication, caching, and partitioning
--Brian Aker (who is a Director at MySQL) posts about "The Death of Read Replication". It is a bit of a ramble, but Brian's thoughts are always worth ...
Massively distributed Ajax profiling
--Emre Kiciman and Ben Livshits have a SOSP 2007 paper, "AjaxScope: A Platform for Remotely Monitoring the Client-Side Behavior of Web 2.0 Applicatio...
The death and life of newspapers
--Eric Alterman writes an article, "Out of Print", in the March 31 New Yorker about "the death and life of the American newspaper." An excerpt: Until r...
Hadoop Summit notes
--James Hamilton posts detailed notes ([1] [2] [3] [4] [5]) on the talks at the Hadoop Summit. Hadoop is the open source version of Google GFS and Map...
Talk on disk as the new RAM
--Northeastern Professor Gene Cooperman recently gave a curious Google engEdu tech talk, "Disk-Based Parallel Computation, Rubik's Cube, and Checkpoi...
Using dwell times for search relevance
--UNC Professor Diane Kelly gave a Google engEdu talk recently titled "Relevance Feedback: Getting the Most out of Your User". I found the discussion ...
Using IMDb data for Netflix Prize
--In a post titled "More data usually beats better algorithms", Anand Rajaraman talks about the success a team from his data mining class at Stanford...
Designing for internet scale
--James Hamilton wrote a LISA 2007 paper, "On Designing and Deploying Internet-Scale Services" (PDF) with a remarkable brain dump of good advice on b...
Attending ICWSM 2008
--I will be at the "International Conference on Weblogs and Social Media" March 30 to April 2 here in Seattle. If you are interested in weblogs, the n...
Geeking with Greg inside Microsoft
--I have started an internal version of Geeking with Greg inside Microsoft. It will be similar to this blog, but cover more day-to-day work, crazy...
Interview on MIX Online
--I have a video interview on MIX Online titled "Greg Linden: Trends in Collective Intelligence and Centralization". I ramble on for about 20 minutes...
Re-finding in Firefox 3
--It appears we will be seeing some interesting new support for re-finding in the next version of Firefox. From the release notes for Firefox 3 Beta...
Talk on Hadoop at Google Fremont
--UW CS graduate student Aaron Kimball is giving a talk, "Welcome to the new area of cloud computing", here in Seattle on Hadoop and tools for large ...
People who read this article also read
--I have an article in the March 2008 issue of IEEE Spectrum titled "People Who Read This Article Also Read...".The article is on personalized news. ...
Does the entropy of search logs indicate that search should be easy?
--Qiaozhu Mei had a fun talk at WSDM 2008 with dramatic, sometimes outlandish, but always thought-provoking statements about web search based on an a...
Back from CI Foo
--I got back from Collective Intelligence Foo Camp late last night. As promised, here is a summary of some of the highlights, the major thoughts I t...
Yahoo deploys large scale Hadoop cluster
--Yahoo's Eric Baldeschwieler reports that Yahoo is now running a 10k+ core Hadoop cluster that holds over 5 petabytes of data. Very cool. It appears...
The madness of a growing crowd
--Nick Carr makes an observation about problems that occur in "self-regulating, super-democratic communities" as they grow. An excerpt: What we've s...
Clever exploit of DRAM to attack disk encryption
--Security guru Ed Felten posts about "Cold Boot Attacks on Disk Encryption", a sideways attack on BitLocker, FileVault, and other disk encryption pr...
Going to CI Foo
--I am very much looking forward to going to Collective Intelligence Foo Camp this weekend. CI Foo is a Tim O'Reilly gathering hosted by Google to dis...
Ranking using Indiana University's user traffic
--Mark Meiss gave a great talk at WSDM 2008 on a fascinating project where they tapped and anonymized all traffic into and out of Indiana University...
Oren Etzioni at WSDM 2008
--University of Washington Professor Oren Etzioni gave a fun and well-done keynote talk on the second day of WSDM 2008 titled "Machine Reading at Web...
Hector Garcia-Molina at WSDM 2008
--Stanford Professor and search guru Hector Garcia-Molina gave a keynote talk yesterday at WSDM 2008 titled "Web Information Management: Past, Presen...
Network
Dragos Manolescu (mutual) fan
Joe McCarthy (mutual) fan
Marti Hearst (mutual) fan, want-to-meet
Alexey Maykov (mutual) fan
Natalie Glance (mutual) fan
Matthew Hurst (mutual) fan
Comments
I never miss a "Geeking with Greg" post.





