Monday, July 28, 2008

Cuil vs. Me.dium vs. Google

I love being able to pit startups against the 800 lb gorilla - people always love an underdog and you never know when the little guys might just hit the right pressure point and topple the King.

There have been a few notable new efforts lately in the evolving search wars.

Today Cuil launched its new search engine, following closely on the heels of Me.dium, which announced its search alpha two weeks ago, and a few months after Powerset and Searchme.

Let's look at the two recent competitors and do a quick comparison of Cuil vs. Me.dium. I think these two companies approach the problem of finding information in very different ways, not only compared to each other, but also compared to existing players. Do the search results from either one hint at a potential to beat Google?

A little background on the companies

Cuil.com, which has an impressive staff of search experts and ex-Googlers, has come bounding out of the gate taking several direct potshots at Google. The biggest one is the size of the Cuil index and how quickly they were able to create it. The founders include Tom Costello, Anna Patterson and Russell Power. The company, which has raised $33 million so far, claims to have indexed 120 billion pages prior to launch, and has decided to change the search results page (how dare they?!) from the well-known, tried and true single-column layout to a multiple-column format.

Me.dium.com, a startup founded by Robert Reich (me), Peter Newcomb and David Mandell, and led by Kimbal Musk, launched its own search alpha to the public two weeks ago. The company, which bases its secret sauce on the browsing activity of real people, has publicly stated it has half a million unique users surfing the web with the Me.dium sidebar and vetting a half billion web pages per month. The company has raised $20 million so far.

The Difference: People vs. Robots

The big difference between the two companies is how they crawl the web. Cuil uses Twiceler, a robotic crawler, to build its index. Me.dium uses the actual browsing activity of people, captured with a proprietary sensor, along with a partnership with Yahoo. These two approaches produce very different results. I ran several tests and selected the following examples to illustrate the difference. The first query is topical and focused on Cuil itself; the second, "Iran nuclear talks", blends long tail and big head.

Search 1

Cuil.com - query "cuil new search engine"

me.dium.com - query "cuil new search engine"



Search 2

Cuil.com - query "iran nuclear talks"

me.dium.com - query "iran nuclear talks"

Conclusion

Me.dium's social search did a significantly better job of returning results for both types of queries. I am sure that given enough time the Cuil engine will get better, but measuring Cuil's official release against Me.dium's alpha does not seem to be a contest. Yes, I am biased, but round one goes to Me.dium. Power to the people.

Wednesday, July 23, 2008

Why personalization is going to be the next big thing in Web Search

Intent is the holy grail of search. Crawlers and ranking algorithms are continuously being updated to try to squeeze more from the 2+ words people enter into a search box. Google has added web history within the past year, and they are getting much more aggressive with attention data, but as of today no one is leveraging personalization (this feels like an opportunity for a startup).

The big 4 are talking about it. In a recent interview, "Search 2010: Thoughts on the Future of Search", many of the participants, including Marissa Mayer (Google), Larry Cornett (Yahoo), Justin Osmer (Microsoft) and Daniel Read (Ask), named personalization as one of the top areas for innovation over the next few years.

Why Is Personalization Important?

The search problem is always fuzzy. A web search engine does not have enough information to return the perfect result, and the perfect result for one person may be different from the perfect result for another. For example, searching for the single word 'apple' at any of the top 4 web search engines would produce results that included 'Apple Computer', 'Apple Vacations' and 'Fiona Apple'. Depending on the intent of the user, the search term 'Apple' could be expanded to 'Apple Washington State' or 'Apple iPhone' to produce significantly more relevant results.

Personalization is one method for accomplishing this goal and if done correctly can significantly reduce the number of results a single user has to scan to find the correct information.
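To make the 'apple' example concrete, here is a minimal sketch of interest-based query expansion. The expanded terms come from the example above; the interest labels ("technology", "agriculture"), the lookup table and the function name are my own assumptions for illustration, not how any of the big 4 actually do it.

```python
# Hypothetical table mapping an ambiguous query to per-interest expansions.
# The interest labels are invented for this sketch.
EXPANSIONS = {
    "apple": {
        "technology": "apple iphone",
        "agriculture": "apple washington state",
    }
}


def expand_query(query, interests):
    """Rewrite the query for the first matching interest, else return it unchanged."""
    options = EXPANSIONS.get(query.lower().strip(), {})
    for interest in interests:
        if interest in options:
            return options[interest]
    return query
```

A user whose profile lists "technology" would have 'apple' rewritten to 'apple iphone', while a user with no matching interest gets the original query back.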

How Does Web Search Personalization Work?

Personalization requires the user to give information about their likes and dislikes. This can be done explicitly, like Facebook during the sign-up process; implicitly, like Google with search history and cookies; or implicitly, like Me.dium with a browser extension.

Both explicit and implicit data collection can be misleading.
  • To be explicit, you would tell a web search engine all of your likes and dislikes. This would be time consuming, only partially complete and quickly out of date
  • Implicit data capture has a tendency to weight informal but highly repetitive actions as important
My experience has shown that combining the two yields the best results, because you are able to gather initial data about the user's interests explicitly and then continually refine it implicitly based on their behavior.
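A rough sketch of that combination might look like the following. The class name, the weights, the learning rate and the decay factor are all assumptions I picked for illustration. Explicit interests seed the profile, each observed action nudges a topic toward a cap of 1.0 (so sheer repetition cannot dominate), and a periodic decay lets stale interests fade.

```python
from collections import defaultdict


class InterestProfile:
    """Hypothetical user profile: explicit seed, implicit refinement."""

    def __init__(self, explicit_interests, explicit_weight=1.0):
        self.weights = defaultdict(float)
        for topic in explicit_interests:
            self.weights[topic] = explicit_weight

    def observe(self, topic, learning_rate=0.1):
        """Move the topic's weight a fraction of the way toward 1.0."""
        current = self.weights[topic]
        self.weights[topic] = current + learning_rate * (1.0 - current)

    def decay(self, factor=0.9):
        """Shrink every weight so interests that stop recurring fade out."""
        for topic in self.weights:
            self.weights[topic] *= factor

    def top(self, n=3):
        """Return the n strongest topics, ties broken alphabetically."""
        return sorted(self.weights, key=lambda t: (-self.weights[t], t))[:n]
```

Someone who signs up saying they like "music" starts with that as their top topic, but if their browsing is all about cycling for the next few weeks, the decay and observe steps will eventually put "cycling" on top.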

Once a web search engine decides to personalize its results, rather than keep them consistent for all users, it must modify its core ranking algorithms and, in Me.dium's case, also its crawling policies. Personalization, when done correctly, feels like magic; when done poorly, it can be unbelievably confusing.

Let's review a few services that personalize results:

Amazon.com
Amazon uses implicit historical purchase data to recommend additional items.
"people who purchased this also purchased this"


Pandora.com
Pandora uses explicit songs and/or band names to create custom radio stations

Facebook.com
Facebook uses explicit social graph data to assist in ranking people's search results

Why haven't the big 4 web search engines adopted personalization?

When I run a search using today's top 4 search engines, I pretty much get the same answers from each. Try it; I was personally surprised. Google, Yahoo, Live and Ask all use a publisher-centric model. This produces consistent results day-to-day, week-to-week and sometimes year-to-year. A query like 'Bill Clinton' produces results from his presidency, instead of his campaign issues with Hillary.

If I look at this from a financial and historical perspective, not from user value, I believe I understand how and why the web search industry has evolved. My hypothesis: consistent, non-personalized results were an appropriate way to monetize and to implement systems at scale. These challenges became the requirements for the systems we use today. I think we should label these web search engines as Stage 2. Stage 1 systems were developed prior to Google and Teoma. Next generation web search engines, or shall we say Stage 3, will need to tackle the personalization challenge.

Thursday, July 17, 2008

Why build a robot when you can empower people?

Me.dium's Search Alpha


Me.dium's Social Search allows non Me.dium toolbar users to access the wisdom of the Me.dium community via a search box.

A half a million unique users are currently surfing the web with the Me.dium sidebar and vetting a half billion web pages per month. They scour the internet by searching, selecting links, chatting with friends and reading web pages together. The Me.dium servers then process the user actions into behaviors, which are used to rank the results in real-time.
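To give a feel for the idea (and only the idea: this is not Me.dium's actual algorithm), here is a toy sketch of turning user actions into a ranking. The action names, their weights and the event format are invented for illustration. Each user's action on a page is counted once, so one very busy user cannot swamp the crowd.

```python
from collections import Counter

# Hypothetical weights: how much each kind of action says about a page.
ACTION_WEIGHTS = {"visit": 1.0, "click": 2.0, "chat": 3.0}


def rank_pages(events, query_matches):
    """Rank URLs by crowd activity.

    events: iterable of (user_id, url, action) tuples.
    query_matches: set of URLs that match the query text.
    """
    scores = Counter()
    seen = set()
    for user_id, url, action in events:
        if url not in query_matches:
            continue
        key = (user_id, url, action)
        if key in seen:
            continue  # count each user's action on a page only once
        seen.add(key)
        scores[url] += ACTION_WEIGHTS.get(action, 0.0)
    # Highest score first; ties broken alphabetically for stable output.
    return [url for url, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Pages nobody in the community has touched never get a score, so they never appear in the list, which is the "vetted by people" property in miniature.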

What's different about Me.dium's Search is the results: every web page you see after clicking “I Feel Social” has been vetted by actual Me.dium users. Me.dium's approach generates a very different result set than Google's publisher-driven model. For example, a recent search for "Hillary Clinton" on Google.com produced www.hillaryclinton.com as the number 1 result, while that same query on Me.dium produced YouTube - Barack Obama Hillary Clinton - Umbrella as the number 1 result. The Hillary Clinton result was in the top 10 for Me.dium but it was not number 1.

Side note: if you run the same query today, the results would be different on Me.dium but most likely the same on Google. This is another important difference: the crowds constantly evaluate what's interesting, and Me.dium continually updates its ranking based on the activity of the crowds. I ran the following search, "G8", on July 9, 2008 at Me.dium and at Google.

Me.dium has flipped the search model upside down and empowered the users of the information, rather than the publishers, to decide what is important. The dynamic nature of the internet is not the only area where Me.dium Search excels. The pages being vetted include everything from technical documentation to sports scores, from movie reviews to YouTube videos and from gossip to the best cures for most health problems.

The Business

Web search is all about ads these days, but the business actually starts with web crawlers. Leveraging another company's crawler limits your ability to do something different, because you are bound by the metadata they collect. Crawling the internet yourself requires a large infrastructure. Each of the top search engines maintains its own snapshot of the internet. Me.dium is doing something different: our 2 million and growing editors, who have downloaded the sidebar, add metadata to our own indexes daily.

Me.dium also has a partnership with Yahoo. Yahoo is trying to disrupt the search industry. I always loved the old Snapple ad campaign from the '80s, "We are number 2, yay". Being number 2 means you can take risks. This is exactly what Yahoo is doing, first with modifications to the results page and SearchMonkey, and now with the introduction of BOSS (Build Your Own Search Service). BOSS has its technical challenges, but its real power comes from a license change. The ranking algorithm has always been part of the crown jewels of any search engine, and the licensing of search results always comes with a big disclaimer, "You may not modify our results". Yahoo is changing the rules with its release of BOSS, and in the process trying to disrupt Google's search/ad domination. Anyone with a license key, provided by Yahoo, can now leverage Yahoo's infrastructure.

The idea is simple enough: empower multiple startups by providing them with infrastructure in exchange for ad revenue. The hope is that one or more of the companies sticks and fractures the search/ad business in Yahoo's favor.

Why Is This Significant?

The search engine business seems simple enough from the outside: a user enters a few keywords and the system finds some matching results. If you look under the covers, you quickly realize this is not the case. Web search comprises 5 key dimensions: ranking, comprehensiveness, freshness, presentation and speed. Yahoo, Google, Microsoft and Ask all maintain large data centers and staff to continually refine their search offerings. The majority of startups cannot raise the amount of capital necessary to compete, but BOSS provides the opportunity to change the landscape. The only thing missing is the announcement of a new venture fund.

Me.dium was excited by the opportunity and the challenge. The key question we had to answer was how to blend our social search results with Yahoo's traditional web search. The end result, which is still in alpha, launches today. Give it a try and let me know what you think.
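For readers wondering what "blending" two result lists can mean, here is one simple possibility, sketched under my own assumptions rather than as a description of what we shipped. Each URL earns a reciprocal-rank score from the web list and a weighted reciprocal-rank score from the social list, and the sums decide the final order. The function name and the `social_weight` dial are hypothetical, and both inputs are plain lists of URLs (no BOSS API calls are shown).

```python
def blend(web, social, social_weight=2.0, limit=10):
    """Merge two ranked URL lists using weighted reciprocal-rank scores."""
    scores = {}
    for rank, url in enumerate(web, start=1):
        scores[url] = scores.get(url, 0.0) + 1.0 / rank
    for rank, url in enumerate(social, start=1):
        scores[url] = scores.get(url, 0.0) + social_weight / rank
    # Highest combined score first; ties broken alphabetically.
    ordered = sorted(scores, key=lambda u: (-scores[u], u))
    return ordered[:limit]
```

With `social_weight` above 1.0, a page the community ranks highly can leapfrog a page the traditional index puts first, while pages that appear in both lists get credit from each.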

Wednesday, July 16, 2008

Why SEO is destined to change within the next 2 years

Search Engine Optimization (SEO) has numerous techniques for getting URLs into the top spots at the different search engines.

  • Clean URLs
  • Sitemaps
  • Paid blogging
  • Link purchasing

SEO experts are starting to master the process. For example, in the past week several news stories have covered people poking fun at Google’s core algorithms publicly via their web trends service.

Social media optimization (SMO) is the first new kid on the block to emerge after SEO and it’s a set of methods for generating publicity through social media. These techniques are proving effective at driving traffic from sites like Digg.

The underlying link structure of the web is most likely not going to change anytime soon, but the engines that interpret it are evolving quickly and new dimensions to the internet are starting to emerge. The Social Graph is the best example in the past few years, but others are coming:

These alternative dimensions or indexes are creating buzz. Me.dium, for example, launched a search engine last week that leverages attention data to determine which pages should be indexed and in what order they should be displayed. The attention data replaces the need for a web crawler in Me.dium's case, which means the global link structure of the internet is less important.

Obviously, for Me.dium to succeed it has to provide a method for SEO to exist. Software is all about information ecosystems. Ecosystems made Microsoft, Oracle, SAP, Facebook and Google. If Me.dium is to succeed as a standalone brand, it must find a way to demonstrate its value to its customers and then create an ecosystem. Ecosystems also create viral loops or double viral loops. When this is accomplished, the network is able to grow at an alarming rate.

What might Attention Data Search Optimization look like?

Attention Data Search Optimization (ADSO) is in its early days. I do not think anyone can predict what it will look like, but if I were to guess, I would suggest something similar to Google Adwords. The core algorithms of companies like Me.dium could be designed with dials that can be tweaked at run-time. These dials would enable an ADSO to increase the relevance of one URL over another in real-time.

A few possible ways this could be accomplished:

  • ADSO idea 1 - use good old-fashioned money: An auction market is created and the highest bidders influence the results.
  • ADSO idea 2 - use influence credits: Active participants of the system get credits. The credits can be used to influence the weight of one URL over another in real-time.
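As a thought experiment, the "dials" and the influence-credit idea could be sketched like this. Everything here is a guess: the function name, the value of a credit and the cap on how far credits can move a result are assumptions made up for illustration. The cap matters, because without it credits would simply buy the top spot.

```python
def adjust_scores(base_scores, credits_spent, credit_value=0.01, max_boost=0.5):
    """Apply a capped, credit-funded boost to organic relevance scores.

    base_scores: dict of url -> organic score.
    credits_spent: dict of url -> credits spent on that URL.
    """
    adjusted = {}
    for url, score in base_scores.items():
        # The boost grows with credits spent but never exceeds max_boost.
        boost = min(credits_spent.get(url, 0) * credit_value, max_boost)
        adjusted[url] = score * (1.0 + boost)
    return adjusted
```

The money-based auction in idea 1 could reuse the same shape, with bids in place of credits; `credit_value` and `max_boost` are the run-time dials.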

What do you think ADSO might look like?

Tuesday, July 15, 2008

Why is web search such a hard problem?

Modern web search engines can be traced back to AltaVista, which was originally developed in 1995 by 2 DEC employees, Mike Burrows (Google) (recently moved from Powerset to Microsoft) and Louis Monier (now with Cuil).

Historically when you think about web search, you had to think about 4 dimensions:
  1. Ranking
    • Put the best answer in the first few positions
      • (As the indexes get bigger and data types keep growing, finding the best answer gets complicated)
  2. Comprehensiveness
    • The web keeps getting bigger and bigger every day
      • (Can search engines find, access and index data fast enough?)
  3. Freshness
    • Just like the weather, web pages change at an alarmingly fast pace
      • (Can search engines update their own indexes fast enough?)
  4. Presentation
    • Make it attractive, quick to scan and support the query process
      • (Out of the gate, a search engine needs: spell check, also try, titles and abstracts and highlighting)
Google has added a fifth dimension: speed
  • When fast page load times are combined with fast page scans, users are able to spend more time focused on the problem and less time waiting on technology.
18 years ago Louis Monier, while working at DEC, designed today's popular Search Engine Results Pages (SERPs), and times are finally changing. Startups as well as the big 4 are trying to accomplish a lot more with their SERPs. The historical layout appears to be breaking down.

Blended Results pages are coming

The presentation of blended information is forcing a change and potentially providing opportunity for those willing to take on the 800 lb gorillas. Below is one of the gorilla's current attempts at a blended SERP.

Google.com

The loss of the straight line down the left and the inclusion of a single photo dramatically change the usability of this SERP. When you include a partial second column for ads and a sponsored link section at the top of the page, focus becomes an issue. You can see from the heat map below that we lose the golden triangle and eye movement seems to focus around the photo.


How are startups tackling this problem?

SearchMe's beta UI leverages Apple's cover flow concept along with several other UI elements.


Kosmix.com

Kosmix's alpha UI presents a larger snippet and tons of related links.


Quintura.com

Quintura adds a tag cloud UI element to the left of the traditional search results

Viewzi has multiple UIs depending on the type of SERP you think you need.



Danny Sullivan reviews several of these sites at searchengineland.com.

What to do?

A single query today may yield hundreds of videos, thousands of images and millions of URLs. A new design grid has to be created, one that is organic enough to handle multiple data types, rigid enough to convey a consistent structure and visual enough to work across multiple demographics.

I have attached a wireframe to start the discussion. Let me know your thoughts.


The wireframe above supports multiple data types on a single page and forces them into containers based on their data or semantic type. The containers themselves are also dynamic, visible only when appropriate. All containers can be minimized and maximized with a single click.
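In code terms, the container idea might be modeled roughly as follows. This is only a sketch of the data structure behind the wireframe; the class name, field names and the `(data_type, item)` input format are assumptions. A container only exists when at least one result of its type came back, which is one way to read "visible only when appropriate".

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """One box on the results page, holding results of a single data type."""
    data_type: str
    items: list = field(default_factory=list)
    minimized: bool = False

    def toggle(self):
        """Minimize or maximize the container, as a single click would."""
        self.minimized = not self.minimized


def build_page(results):
    """Group (data_type, item) pairs into containers, in first-seen order."""
    containers = {}
    for data_type, item in results:
        if data_type not in containers:
            containers[data_type] = Container(data_type)
        containers[data_type].items.append(item)
    return containers
```

A query that returns two videos and one image produces a video container and an image container, and no empty news or maps container cluttering the grid.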

Monday, July 14, 2008

Firefox 3's extra large back button

Firefox Back Button

Why did Firefox 3 modify the forward and back buttons?

Historical research into browsing activity shows the back button was clicked about 35.7% of the time; hyperlinks being the only thing clicked more often (45.7%). This research was conducted in 1994 by Catledge and Pitkow.

Since that time the web has changed significantly. Research in this area is limited, which is surprising given the popularity of modern web browsers. The most recent study I found, by Weinreich, Obendorf, Herder and Mayer [2004-2005], suggests modern web site design has minimized the need for the back button. New browser features like tabbed browsing, new programming techniques and better site navigation have changed the way we navigate the web.

According to Weinreich, Obendorf, Herder and Mayer, the back button has decreased in usage from 35.7% to 14.3%. My informal study suggests this number has decreased even more. I find that I, along with those I polled for this article, open tabs or windows instead of maintaining a single window and using the back button.

Esthetically, I like the look of the larger button and its reset style, but I feel it is distracting to the browsing experience. When I am viewing popular websites like cnn.com or nytimes.com, it becomes a dominant element for my eye.

The functionality is obvious and well executed once activated, but I am still wondering why they chose to make it so LARGE and contrasting by default.