Mar 08 2013
 

Along the same lines as Google Flu Trends, researchers at Microsoft, Stanford and Columbia University are investigating whether search data can be used to find interactions between drugs. They recently found an interaction.

Using automated software tools to examine queries by six million Internet users taken from Web search logs in 2010, the researchers looked for searches relating to an antidepressant, paroxetine, and a cholesterol-lowering drug, pravastatin. They were able to find evidence that the combination of the two drugs caused high blood sugar.

The idea is that people search for symptoms and medications, and these queries are stored in anonymized search logs. The researchers followed up on a suspicion that taking the two drugs at the same time might cause hyperglycemia. Users who searched for both drugs were more likely to search for hyperglycemia-related terms than the control group (presumably those who searched for only one of the two drugs).
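As a rough sketch of the kind of comparison described above (not the researchers' actual method), the code below groups a search log by user and compares how often a symptom search appears among users who searched for both drugs versus users who searched for only one. The log entries, the symptom term list and the choice of comparison group are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy, made-up search log of (user_id, query) pairs. Not real data.
LOG = [
    ("u1", "paroxetine side effects"),
    ("u1", "pravastatin dosage"),
    ("u1", "high blood sugar symptoms"),
    ("u2", "paroxetine"),
    ("u2", "pravastatin"),
    ("u3", "paroxetine withdrawal"),
    ("u3", "hyperglycemia"),
    ("u4", "pravastatin"),
    ("u5", "paroxetine"),
    ("u6", "pravastatin and grapefruit"),
]

DRUG_A, DRUG_B = "paroxetine", "pravastatin"
# Hypothetical list of symptom terms; a real study would need a far richer one.
SYMPTOM_TERMS = ("hyperglycemia", "high blood sugar")


def symptom_rates(log):
    """Return (rate among users who searched both drugs,
    rate among users who searched only one of them)."""
    queries = defaultdict(list)
    for user, query in log:
        queries[user].append(query.lower())

    # group -> [number of users, number of users with a symptom search]
    counts = {"both": [0, 0], "one": [0, 0]}
    for user_queries in queries.values():
        has_a = any(DRUG_A in q for q in user_queries)
        has_b = any(DRUG_B in q for q in user_queries)
        if not (has_a or has_b):
            continue  # user searched for neither drug
        group = "both" if (has_a and has_b) else "one"
        has_symptom = any(t in q for q in user_queries for t in SYMPTOM_TERMS)
        counts[group][0] += 1
        counts[group][1] += int(has_symptom)

    return tuple(
        hits / users if users else 0.0
        for users, hits in (counts["both"], counts["one"])
    )


both_rate, one_rate = symptom_rates(LOG)
print(f"both drugs: {both_rate:.2f}, one drug: {one_rate:.2f}")
```

With real logs the interesting part is whether the gap between the two rates is larger than chance would explain, which this sketch does not attempt to test.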

The work is still in its infancy, but it'll be interesting to see how this sort of data can be used to supplement existing work by the Food and Drug Administration.

Better Beaches

Jan 27 2013
 

Having recently returned from a trip to Kauai where I used my beach search engine with middling success, I've now got a few updates out on the site.

Firstly, there is a full map showing either all the beaches in a location, or all the beaches from a search within a location. This was a pretty obvious missing feature.

HeroMap

Secondly, as this is an active map, you can zoom and pan, which interactively restricts the set of results to the beaches in view.

ZoomedMap
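The zoom-and-pan behaviour amounts to filtering results by the map's current bounding box. Here is a minimal sketch of that idea, not the site's actual code. The `Beach` class, the function names and the coordinates (which are only approximate) are assumptions for illustration, and the filter ignores viewports that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beach:
    name: str
    lat: float
    lon: float


# Approximate coordinates, for illustration only.
BEACHES = [
    Beach("Anini Beach", 22.22, -159.45),
    Beach("Poipu Beach", 21.87, -159.45),
    Beach("Kaanapali Beach", 20.92, -156.69),
]


def in_viewport(beach, south, west, north, east):
    """True if the beach lies inside the bounding box (no antimeridian handling)."""
    return south <= beach.lat <= north and west <= beach.lon <= east


def visible(beaches, south, west, north, east):
    """Restrict a result set to the beaches inside the current map view."""
    return [b for b in beaches if in_viewport(b, south, west, north, east)]


# A viewport roughly covering the north shore of Kauai.
print([b.name for b in visible(BEACHES, 22.1, -159.7, 22.3, -159.2)])
```

Re-running the filter on every zoom or pan event is enough for a few thousand points; a spatial index would be the natural next step for larger sets.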

There are some minor improvements to other elements of the site as well.

Note - something that always interests me is the relationship between back-end data quality and the presentation of the data. A complete map of beaches highlights cases where there are duplicates in the results (a topic for another post).

If you are heading to Hawaii - give it a try and let me know how you get on.

Beach Search Engine Demo

Nov 29 2012
 

I've written recently about building a perfect beach search engine. Here is a brief example of using the site.

Let's imagine you want to find a beach that offers snorkeling, but you want to find one that is shallow because you have small children with you. A query for 'snorkeling AND shallow' brings up the following results:

Beach

(currently the data on the site is limited to Kauai and Maui).
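For the curious, a query like 'snorkeling AND shallow' can be answered by intersecting posting lists from an inverted index. The sketch below shows that general technique, not the site's actual implementation. The descriptions are made up, two of the beaches are hypothetical, and the function names are my own.

```python
import re
from collections import defaultdict

# Made-up descriptions for illustration; the B and C beaches are hypothetical.
DOCS = {
    "Anini Beach": "A shallow lagoon behind a long fringing reef, good for snorkeling.",
    "Hypothetical Beach B": "Good snorkeling, but the water gets deep quickly.",
    "Hypothetical Beach C": "Shallow and calm, with no reef to explore.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Map each token to the set of beaches whose description contains it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def search_and(index, query):
    """Return beaches matching every term of a 'term AND term ...' query."""
    terms = [t.lower() for t in re.split(r"\s+AND\s+", query.strip()) if t]
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


index = build_index(DOCS)
print(search_and(index, "snorkeling AND shallow"))
```

Exact token matching like this misses variants such as 'snorkel' or 'shallows'; stemming or synonym expansion would be needed for better recall.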

From these, I might decide that I'm more interested in Kauai than Maui (it does appear to have more of the beaches that I'm looking for after all).

Clicking on Kauai, then selecting Anini beach leads me to the following:

Beach2

Here I can see that there is plenty of content describing the beach. Some of the content contains highlighted terms that lead me to confirm that it is indeed described as a beach that offers snorkeling and that it is shallow. Content from both Fodor's and FourSquare supports this:

A great family park, Anini is unique in that it features one of the longest and widest fringing reefs in all Hawaii, creating a shallow lagoon that is good for snorkeling and following the occasional turtle.... [Fodors]

A nice, shallow place for snorkeling, but can get tiring to swim around if the current is too strong.... [FourSquare]

There are many issues with the site that I'm slowly addressing, but the basic task of finding beaches seems to be up and running. If you are interested, please take a look and let me know what you think - the site is available here.
Nov 12 2012
 

Currently, as I've mentioned in previous posts, beaches are a strangely under-served segment of the local search space. Searches on Google and Bing for beaches are fielded by entities such as resorts and restaurants that happen to be matches for certain beach related terms. If you search for 'beaches in kauai' you will get hits for beach resorts, etc.

There is plenty of content about beaches, from the many dedicated locale sites to general travel related community sites (like Trip Advisor) and editorial sites (like Fodor's). In addition, there are a number of resources that aggregate structural data about beaches. These include open data resources like GeoNames and GNIS but also proprietary resources like Foursquare.

Unfortunately, there is nothing that brings all these things together. There is no product which provides an aggregate view of the set of beaches or the collection of things said or otherwise reported about them.

With an upcoming trip to Hawai'i at the end of the year, I wanted to make sure I was getting the best value for my travel dollars. I've built a prototype beach search engine which provides the following.

  • a partly curated set of beach data covering approximately 12,000 international beaches
  • aggregation of beach related content
  • search functionality (so you can search for kid friendly beaches that offer good snorkeling)
  • summarization of Flickr images so that an impression of what it's like to be at the beach can be formed

I believe there is plenty of potential for such a system. I've already found some hidden beaches that I wasn't aware of at our destination that I'm excited to check out when we get there. My goal is to make the system public in the next few weeks (my trip will be a forcing function for this!).

For now, here is a screen shot of part of the experience.

Beachgeek

Sep 30 2012
 

[I work at Microsoft where I work on projects that drive data quality in our local search experiences on Bing and other clients.]

Most of the civilized world, by this time, has heard about Apple's fumble with their new mapping and local search capabilities in iOS. Apple replaced Google's application - which is possibly the largest investment in cartography, imagery and local data ever made - with a home grown solution reportedly rolled out of maps from a number of providers including TomTom and local data from providers including Yelp.

As Apple has realised, there is a lot to learn for an entrant in this space. The hardest lesson they are learning now is actually not about data sources but about metrics and how to assess the quality of the product - something in which they don't appear to have invested in a manner befitting their global user base.

Apple will soon learn another lesson. Once the fog has lifted over the state of their entity data set (e.g. fixing the location of cities and ensuring coverage for local businesses), Apple will have to start worrying about ranking search results. When a user asks for {kid friendly sushi in seattle}, which of the many sushi places ought they to return? They will be presented with a choice between specialized providers - with whom they will actually be in competition - or creating the resources required for relevance ranking themselves.

A key aspect of providing appropriate indexing and ranking features is the association of content with the entities. Where does this content come from? The web. How is it acquired? Through large scale crawling, understanding and indexing.

Apple will likely find that as they pull on the thread of local search, their scope will have to open up to quite a different world, another world which - like local - they haven't yet the expertise in.

Aug 26 2012
 

We will soon be embarking on a short trip to Hawai'i. Naturally, I'm turning to search engines to find out about the best beaches to go to. However, it turns out that this simple problem - where to go on vacation - is terribly under supported by today's search engines.

Firstly, there is the problem with the Web Proposition. The web proposition - the reason for traditional web search engines to exist at all - states that there is a page containing the information you seek somewhere online. While there are many pages that list the 'best beaches in Hawai'i', as the analysis below demonstrates, these are just sets of opinions - often very different in nature. An additional problem with the Web Proposition is that information and monetization don't always align. Many of the 'best' beaches pages are really channels through which hotel and real estate commerce is done. Thus a balance is needed between objective information and commercial interests.

Secondly, beaches are not considered local entities by search engines. While the query {beaches in kauai} is very similar in form to the query {restaurants in kauai}, the latter generates results of entities of type <restaurant> while the former generates results of entities of type <businesses that have beach or kauai in their name or associated content>. While local search sounds like search over entities which have location, it is largely limited to local entities with commercial intent.

Finally, there is general confusion due to the fact that the state of Hawai'i contains a sub-region (an island) called Hawai'i.

To get to the answer to my original search query, I reviewed 8 sites which resulted from a search on Bing or Google for the query {best beaches hawaii}. I then reviewed each of these and created a spreadsheet tabulating all the beaches and whether they were voted for by each site.

Of the 57 beaches that were mentioned on at least one site, the average number of mentions was 1.89. This indicates a general lack of consensus regarding which are the best beaches. In fact, most beaches (38 out of 57) have only a single vote. Consequently, while there might be a set of pages returned by search engines for queries looking for such information, a user will be reading - in isolation - very different opinions with no aggregate view summarizing them.
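The tallying itself is simple enough to sketch in a few lines. The code below is not the spreadsheet I used, and the site names and picks are invented purely for illustration. It counts one vote per site per beach and then reports the average number of mentions and the number of beaches with only a single vote.

```python
from collections import Counter

# Made-up picks for illustration: site -> beaches it lists as 'best'.
SITE_PICKS = {
    "site_a": ["Beach 1", "Beach 2", "Beach 3"],
    "site_b": ["Beach 1", "Beach 4"],
    "site_c": ["Beach 1", "Beach 2", "Beach 5"],
}


def summarize(site_picks):
    """Return (votes per beach, mean mentions per beach, single-vote count)."""
    # set() ensures a site contributes at most one vote per beach.
    votes = Counter(
        beach for picks in site_picks.values() for beach in set(picks)
    )
    total_beaches = len(votes)
    mean_mentions = sum(votes.values()) / total_beaches if total_beaches else 0.0
    single_vote = sum(1 for count in votes.values() if count == 1)
    return votes, mean_mentions, single_vote


votes, mean_mentions, single_vote = summarize(SITE_PICKS)
print(votes.most_common(3))
print(f"mean mentions: {mean_mentions:.2f}, single-vote beaches: {single_vote}")
```

A mean close to 1 with many single-vote beaches is what low consensus looks like; if the sites agreed, the mean would approach the number of sites.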

The top beaches are summarized in the following table showing the beach and the total votes.

Beachtable

Search engines could do a far better job by:

  1. Generalizing local search to include any entity which has location, not just commercial entities.
  2. Leveraging editorial content (like that reviewed in this post) so that variance may be exposed to the user but aggregates can also be synthesized.

In addition, there is a very large opportunity here in analyzing the content associated with these local entities to determine which beaches are best for different activities, their accessibility, and so on.

Apr 19 2012
 

A colleague brought to my attention a post on the influential search blog Search Engine Land which makes claims about the quality of local data found on search engines and local verticals: Yellow Pages Sites Beat Google In Local Data Accuracy Test. The author describes surprise at the outcome reported - that Yellow Pages sites are better at local search than Google. Rather, we should express surprise at how poorly this article is written and at the intentionally misleading nature of the title.

The article describes an analysis done by Implied Intelligence. The analysis looks at 1,000 local businesses in the US. Here is the first problem - these businesses exclude chains and franchises. In addition, if a website wasn't known for the business, it too was excluded. With some general assumptions about the definition of local business, it is safe to assert that firstly there are many instances of chains and franchises out there and secondly that many (if not most) businesses don't have a website (the distribution varies by category of course). Quite where the original sample of 1,000 came from is not reported.

This biases the analysis - Google, like Bing, is interested in all local entities.

The initial part of the analysis is reasonable - looking at coverage (% in the sample found on the site) and quality (duplicates, phone number errors and address errors). Note, however, that this is a measure of the local data, not of local search. A search product includes a relevance component and it is quite possible that a well tuned relevance algorithm might suppress duplicates.

The last table in the analysis sees us swinging back to bad reporting. It describes the percentage of records that have a certain attribute: URL, Hours of Operation and 'additional info'. Did you see what they did there? This is what we call the coverage of an attribute, and it tells us nothing as to the quality of the value. I can quite easily populate a local database with 100% coverage for all attributes. They might all be wrong, but the coverage could be 100%. Consequently, this table is reasonably close to meaningless. If they had included the precision of these values, then coverage could have been used to compute recall, but that wasn't done.
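To make the distinction concrete, here is a small sketch under assumed definitions: coverage is the share of records that have any value for an attribute, precision is the share of those values that are correct, and recall is the share of all records carrying a correct value (so recall = coverage × precision when every business truly has a value). The records and ground truth below are made up.

```python
def attribute_metrics(records, truth, attribute):
    """Return (coverage, precision, recall) for one attribute.

    records: id -> attributes as listed in the database under test
    truth:   id -> known-correct attributes (assumed to exist for every id)
    """
    total = len(records)
    populated = [
        (rid, rec[attribute]) for rid, rec in records.items() if rec.get(attribute)
    ]
    correct = sum(
        1 for rid, value in populated if truth[rid].get(attribute) == value
    )
    coverage = len(populated) / total if total else 0.0
    precision = correct / len(populated) if populated else 0.0
    recall = correct / total if total else 0.0
    return coverage, precision, recall


# Made-up example: every record has a URL, but only one of them is right.
TRUTH = {
    1: {"url": "http://a.example"},
    2: {"url": "http://b.example"},
    3: {"url": "http://c.example"},
    4: {"url": "http://d.example"},
}
RECORDS = {
    1: {"url": "http://a.example"},
    2: {"url": "http://wrong.example"},
    3: {"url": "http://wrong.example"},
    4: {"url": "http://wrong.example"},
}

print(attribute_metrics(RECORDS, TRUTH, "url"))
```

Coverage alone would report a perfect score for this database, which is exactly the problem with the table.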

In summary, an important search publication has either written an intentionally misleading article, or has demonstrated that it doesn't really get data.

Feb 14 2012
 

One of the options available to NINES users is the ability to tag objects with their own keywords – as many as they would like. These keywords can be used for personal reference (that is, while researching, preparing class materials or exhibit building) but they are also shared anonymously with the entire NINES community to facilitate browsing. And, when examined more closely, the most frequently used tags tell us a bit about the interests of those most active in the NINES community.

The bar graph above shows the top 25 most often used keywords in NINES, and the frequency of their use. Unlike the great majority of NINES tags, which may be idiosyncratic and the result of one or two users’ interests, these keywords have been used as many as 150 times to describe NINES objects, and are more likely to reflect multiple users’ research and interests. Below is a glimpse of the top 3!

Frontispiece (150 objects, as of February 13, 2012)

         

Maps (141)

     

Portrait (128)

         

Browse the tag cloud or search and add your own tags to the NINES community today!