Recently, there has been a lot of buzz about real-time search, but is it necessary? First, let’s look at the current state of search and crawl.*
Unless your site is decidedly authoritative, like CNN.com, you’re most likely to get crawled when Google indexes more authoritative sites that link to yours. Over time, your site will end up on a particular crawling schedule.
The lengthening or shortening of the crawl schedule, with blogs especially, is largely determined by the amount of new content found on the site each time it’s crawled. In the chart below, the diagonal lines represent getting crawled by the search engine and the ominous black spots represent posting new content. In this case, if you haven’t posted in a while, you’ve probably worked up a fairly large interval between crawls. If you suddenly return to posting on a consistent schedule, over time the crawl interval will be narrowed until your content gets indexed soon after posting.
In essence, you can and should train Google to index your site more frequently by posting new content regularly or by getting new backlinks to your site.
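To make the training idea concrete, here is a toy simulation. Search engines don't publish their scheduling rules, so everything in it is an assumption for illustration: the function name `simulate_crawls`, the halve-or-double rule, and the interval limits are all made up. It only shows the general shape of the behavior described above.

```python
def simulate_crawls(post_days, total_days, start_interval=16,
                    min_interval=1, max_interval=32):
    """Toy model of an adaptive crawl schedule (not any engine's real algorithm).

    post_days: days on which new content was published.
    Returns a list of (crawl_day, next_interval) pairs.
    """
    posts = sorted(post_days)
    interval = start_interval
    last_crawl = 0
    day = interval  # first crawl happens after one full interval
    history = []
    while day <= total_days:
        # Did anything new appear since the previous crawl?
        found_new = any(last_crawl < p <= day for p in posts)
        if found_new:
            # Assumed rule: new content halves the wait until the next crawl.
            interval = max(min_interval, interval // 2)
        else:
            # Assumed rule: nothing new doubles the wait.
            interval = min(max_interval, interval * 2)
        history.append((day, interval))
        last_crawl = day
        day += interval
    return history
```

Calling `simulate_crawls(range(1, 61), 60)` models a blog that posts daily for two months, and the interval shrinks from 16 days down to 1. Calling `simulate_crawls([], 60)` models a dormant blog, where the interval grows to the 32-day cap.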
Real-time indexation is just what it sounds like. Content is indexed and searchable immediately upon publication. None of the big three engines are there yet.
Is real-time indexing by search engines (and hence real-time search) inevitable? It’s starting to appear so.
Twitter is already considered to be real-time, though it’s far from a genuine search engine. Microsoft seems to have tweaked Bing to place higher value on more recent news. In tests, Google Caffeine, the search giant’s new underlying infrastructure, seems to index a lot more pages and give higher placement to the newest content than the current version does. And Facebook’s FriendFeed acquisition suggests they’re definitely eyeing the real-time search space.
Real-time search helps anybody who reads or writes content with a short shelf-life. If you post about an in-progress disaster, a celebrity death, or a limited-time offer, your content is hot one minute, cold the next, so quick indexation by search engines means that your content will be found while it’s still relevant. You would probably gain a good amount of site traffic just by riding the wave and capitalizing on long-tail searches, regardless of how frequently you post.
The real-time search goal has plenty of obstacles. Real-time indexation takes a mountain of computing power. Plus, algorithmically, how do you consistently showcase an on-scene Twitterer’s play-by-play updates over the Huffington Post’s side commentary during a crisis? Or do you? You can’t use backlinks as a determinant, because brand-new content hasn’t had time to earn any. Authority is negligible. One practical solution would be to house real-time search separately from regular search, just like Google News is separate from the primary index. Regardless, real-time search is only as valuable as the relevance of the top-ranking content and is likely to look different from today’s version.
Until we get there, the most important thing you can do is get your site as close as possible to real-time indexation using the available SEO techniques:
Create good content on a consistent schedule, apply other relevant SEO tactics to optimize your site, and build up your authority
Create sitemaps for your site so search engines know which pages to crawl
Use nofollow attributes on links to non-critical pages as a way of shining a light on the more important ones
Submit your site and content to directories and social bookmarking sites
Work on building links from more authoritative sites pointing to your own
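For the sitemap tip, a minimal sketch of generating one with Python's standard library might look like the following. The `urlset`, `url`, `loc`, and `lastmod` elements and the namespace come from the sitemaps.org protocol; the function name `build_sitemap` and the example.com URLs are placeholders, not anything from a real site.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified is a 'YYYY-MM-DD' string. Returns the XML as a str.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
sitemap_xml = build_sitemap([
    ("https://example.com/", "2009-09-01"),
    ("https://example.com/blog/real-time-search", "2009-09-15"),
])
```

You would typically save the result as sitemap.xml at the root of your site and submit it through each engine's webmaster tools.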
*For clarification, crawling (or spidering) is the method search engines use to populate their data repositories so people can search using their websites. It involves running programs called bots (or spiders) that go from link to link scouring web pages and returning information to be indexed.
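As a rough illustration of that link-to-link process, here is a tiny sketch that "crawls" an in-memory link graph instead of the real web. The page names and the `crawl` function are invented for the example, and real bots also deal with fetching, politeness rules, and scheduling, none of which is modeled here.

```python
from collections import deque


def crawl(link_graph, seed):
    """Follow links breadth-first from seed; return pages in discovery order.

    link_graph maps a page name to the list of pages it links to.
    """
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # stand-in for "return information to be indexed"
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# Hypothetical web: an authoritative site links to a small blog.
toy_web = {
    "cnn": ["your-blog", "news"],
    "news": ["your-blog"],
    "your-blog": ["your-post"],
    "orphan": [],
}
discovered = crawl(toy_web, "cnn")
```

In this toy web, "your-blog" gets found only because "cnn" links to it, and "orphan" is never reached because nothing links to it.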