Showing posts with label semantic web. Show all posts

Monday, September 14, 2009

Heuristic Search Algorithms : StumbleUpon.com

Amazon.com > Books > Computers & Internet > Computer Science > Artificial Intelligence > Heuristic & Constrained Search >
"Good decisions are born of experience. Experience is born of bad decisions." -- Unknown

Don't let the techie title throw you -- a heuristic is just a ten-dollar word for a rule of thumb, the kind you learn from experience, and algorithms are just collections of computer commands that accomplish a certain task, much like a recipe. In fact, computer algorithm books are often called cookbooks. So what we're talking about here are computer searches that improve as you use them.

As an example, take Amazon.com's recommendation feature. (If you've read this blog before, you probably saw that coming!) Amazon tracks your browsing behavior on their site using magic browser cookies, and based on the information they collect, guesses what other pages might interest you. BTW, the recommendations I get aren't particularly focused, since I crawl all over Amazon looking for niche market products. The heuristic algorithm therefore assumes that I'm interested in just about anything. Come to think of it, that's exactly what it's supposed to do.

Another familiar example is the Yahoo! Search suggestions feature. If, for example, you enter the search term "router," a box of suggestions will appear after a brief delay. These might include "woodworking routers" or "ethernet routers" or "Netgear routers". I'm not using actual examples from Yahoo! since those are subject to change, but these examples should be sufficient to illustrate the point. I like the fact that Yahoo! merely suggests possible refinements -- Microsoft has an irritating tendency to assume that they know what you want better than you do.
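For the curious, here is a minimal sketch of how a suggestion box like this could work. It is not Yahoo!'s actual method, which I don't know; the `suggest` function and the `PHRASES` list are made up for illustration, and the phrases are the same invented examples used above.

```python
def suggest(term, phrases, limit=5):
    """Return up to `limit` phrases containing a word that starts with `term`.

    Hypothetical helper for illustration only; a real search engine would
    draw on query logs and popularity data rather than a fixed list.
    """
    term = term.lower().strip()
    if not term:
        return []
    matches = [
        phrase for phrase in phrases
        if any(word.startswith(term) for word in phrase.lower().split())
    ]
    # Sort alphabetically, ignoring case, so the output order is predictable.
    return sorted(matches, key=str.lower)[:limit]


# Invented phrases, not actual Yahoo! suggestions.
PHRASES = [
    "woodworking routers",
    "ethernet routers",
    "Netgear routers",
    "rowing machines",
]

print(suggest("router", PHRASES))
```

Note that the function only offers refinements and leaves the original search term alone, which is the behavior I said I liked.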

StumbleUpon.com is a social-networking site and search engine that allows you to select up to 127 interests that will be used to customize your search results. As you use the service, you can click one of two buttons: "I like this" (Thumbs Up) or Thumbs Down. Simple. If you're ambivalent, you don't have to rate a page. I've seen a lot of rating systems, but this is probably my favorite because you're not ruining somebody's day by giving them the thumbs down -- and they can't ruin yours. All you are doing is demoting the page and others like it in your own search results. People who think Dick Cheney and Donald Rumsfeld are great American patriots are unaffected.
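Here is a rough sketch of the idea that a thumbs down only affects your own results. I have no inside knowledge of how StumbleUpon actually scores pages; the `Stumbler` class, the topic tags, and the plus-or-minus-one weighting are all assumptions made up for this example.

```python
class Stumbler:
    """One user's private interest weights (a hypothetical model)."""

    def __init__(self, interests):
        # Every selected interest starts with the same weight.
        self.weights = {topic: 1.0 for topic in interests}

    def rate(self, page_topics, thumbs_up):
        """Nudge this user's weights up or down for the page's topics."""
        delta = 1.0 if thumbs_up else -1.0
        for topic in page_topics:
            self.weights[topic] = self.weights.get(topic, 0.0) + delta

    def score(self, page_topics):
        """Sum this user's weights over the page's topics."""
        return sum(self.weights.get(topic, 0.0) for topic in page_topics)

    def rank(self, pages):
        """Return page names ordered from best to worst for this user."""
        return sorted(pages, key=lambda name: self.score(pages[name]), reverse=True)


# Invented pages, each tagged with invented topics.
PAGES = {
    "birdhouse plans": ["woodworking"],
    "home network setup": ["networking"],
    "cable news recap": ["politics"],
}

me = Stumbler(["woodworking", "networking"])
me.rate(["politics"], thumbs_up=False)    # demotes politics pages for me only
me.rate(["networking"], thumbs_up=True)

someone_else = Stumbler(["politics"])

print(me.rank(PAGES))
print(someone_else.rank(PAGES)[0])
```

Because each `Stumbler` keeps its own weights, my thumbs down moves the politics page to the bottom of my list while it stays at the top of the other user's.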

I looked at this service a couple of years ago and wasn't very impressed, but I don't recall the search feature being present then. I may have overlooked it, but I think you just had to "stumble" from page to page. That might be all right for casual surfing, but it's not very focused. The database is now a lot bigger too. To try Stumbling, go to the Getting Started page, where you will learn all you need to know. If you have trouble adding pages with the toolbar because of your firewall settings, old or weird browser, or whatever, you can use the form below:

Submit Page to StumbleUpon.com

Last, but certainly not least, your StumbleUpon history generates your own personal StumbleUpon blog, which is visible to the major search engines if you're just looking for backlinks, and to other stumblers, of course, if you are more interested in the social networking aspects of the site. You are cordially invited to subscribe to my StumbleUpon blog at any time.

Monday, August 24, 2009

Lijit Search

I've added a new Lijit Search form to this blog which I hope will make it easier to find content from a variety of places where I publish. I've been struggling to be found by the well-known search engines for some time now, but this service is a new one to me. Apparently it allows you to build a custom search engine using your social bookmarks.

It's a very different kind of crawler, and I'm still trying to sort out exactly how it works. It seems to rely primarily on RSS data, another technology I need to learn more about. I did an initial set-up about a week ago, and it's probably too soon to expect much from the search results, but it is possible to get search hits for specific keywords known to be near the top of the queue -- such as "vaporizer", for example. So I know it's working, I'm just not sure how well.

Additionally, I set up the embedded search results feature today, which requires a bit of JavaScript running on an assigned landing page. If you're reading this from a feed, that should explain the previous cryptic post. You have to be at the actual Whole Ed Cata-Blog site and enter a search term into the Lijit Search form in the blog template for results to be displayed.

I don't care much for blogs about nothing ("I'm sitting at my computer now, writing a post about the post I'm writing for my blog..."), so I'll cut this short. I really just wanted to explain that last post to subscribers, and explain why the Lijit search results are as bad as they are right now. It just takes some time for me to figure out how to set it up efficiently, and for Lijit to crawl the network.

Saturday, January 05, 2008

Featured Classical Music

This article really is about Classical Music, but first I'm going to "entertain" you with a brief diatribe on the semantic web and search engine optimization. People like choices, but they don't like to be overwhelmed with a very large number of choices. Amazon.com's approach to this is a logical one: They arrange their products into broad categories, which are divided into subcategories, and further into sub-subcategories for as many levels as they find appropriate, as in the familiar outline form.

This is reasonable, but it presents a problem to those searching for products. Without a key to Amazon's taxonomy (or outline), one might easily take a wrong turn in the process of "drilling down" to the specific target classification. Search engines don't adequately address this problem. Attempts to search for a specific category simply return all items containing the search keywords instead of limiting results to category headings.
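To show what I mean by limiting results to category headings, here is a minimal sketch. The `TAXONOMY` tree is a tiny invented sample, not Amazon's real data (the Books branch simply follows the category trail quoted in an earlier post), and `find_categories` is a hypothetical helper, not anything Amazon or a search engine provides.

```python
# A tiny, made-up slice of a category outline: each key is a heading,
# each value is a dict of its subcategories.
TAXONOMY = {
    "Books": {
        "Computers & Internet": {
            "Computer Science": {
                "Artificial Intelligence": {
                    "Heuristic & Constrained Search": {},
                },
            },
        },
    },
    "Music": {},
    "Classical Music": {},
}


def find_categories(tree, keyword, path=()):
    """Return the full path of every heading whose name contains `keyword`.

    Only category headings are searched, never the products filed under them.
    """
    keyword = keyword.lower()
    hits = []
    for name, children in tree.items():
        here = path + (name,)
        if keyword in name.lower():
            hits.append(" > ".join(here))
        # Keep drilling down so deeply nested headings are found too.
        hits.extend(find_categories(children, keyword, here))
    return hits


print(find_categories(TAXONOMY, "heuristic"))
print(find_categories(TAXONOMY, "music"))
```

A search like this returns the whole path to the matching heading, so you can jump straight to it instead of guessing which turn to take at each level.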

Google's PageRank algorithm (and other search engine ranking schemes that are less well-documented) only compounds this problem. Briefly, search engines assume that the more specific "lower" levels of the outline are less significant than the "higher," more general headings. In fact, we humans already have our own taxonomy in our rich and diverse languages (English, in this case). It is far more likely that we will select highly specific terms in our very first search attempt than extremely broad ones. PageRank is upside-down.

What does this have to do with classical music? Only this: Amazon's Classical Music category is a recent spin-off of their more general Music category and has relatively few subcategories. Therefore, I hope to achieve my goal of creating more useful search engine listings sooner and more directly in these top-level categories: