Diagrams for Solving Crawl Priority & Indexation Issues
December 28, 2009
Google’s Indexation Cap
December 28, 2009
Google (very likely) has a limit it places on the number of URLs it will keep in its main index and potentially return in the search results for domains.
Let’s examine some of the potential metrics Google looks at to determine indexation:
- Importance on the Web’s Link Graph
We’ve talked previously about metrics like a domain-level calculation of PageRank (Domain mozRank is an example of this). It’s likely that Google would make this a backbone of the indexation cap estimate, as sites that tend to be more important and well-linked-to by other important sites tend to also have content worthy of being in the index.
- Backlink Profile of the Domain
The profile of a site’s links can look at metrics like where those links come from, the diversity of the different domains sending links (more is better) and why those links might exist (methods that violate guidelines are often getting caught and filtered so as not to provide value).
- Trustworthiness of the Domain
Calculations like TrustRank (or Domain mozTrust in Linkscape) may make their way into the determination. You may not have as many links, but if they come from sites and pages that Google trusts heavily, your chances for raising the indexation cap likely go up.
- Rate of Growth in Pages vs. Backlinks
If your site’s content is growing dramatically, but you’re not earning many new links, this can be a signal to the engine that your content isn’t “worthy” of ongoing attention and inclusion.
- Depth & Frequency of Linking to Pages on the Domain
If your home page and a few pieces of link-targeted content are earning external links while the rest of the site flounders in link poverty, that may be a signal to Google that although users like your site, they’re not particularly keen on the deep content – which is why the index may toss it out.
- Content Uniqueness
Uniqueness is a constantly moving target and hard to nail down, but basically, if you don’t have a solid chunk of words and images that are uniquely found on one URL (ignoring scrapers and spam publishers), you’re at risk. Google likely runs a number of sophisticated calculations to help determine uniqueness, and they’re also, in my experience, much tougher on pages and sites that don’t earn high quantities of external links to their deep content with this analysis.
- Visitor, CTR and Usage Data Metrics
If Google sees that clicks to your site frequently result in a click of a back button, a return to the SERPs and the selection of another result (or another query) in a very short time frame, that can be a negative signal. Likewise, metrics they gather from the Google toolbar, from ISP data and other web surfing analyses could enter into this mix. While CTR and usage metrics are noisy signals (one spammer with a Mechanical Turk account can swing the usage graph pretty significantly), they may be useful to decide which sites need higher levels of scrutiny.
- Search Quality Rater Analysis + Manual Spam Reports
If your content is consistently reported as being low value or spam by users and/or quality raters, expect a visit from the low indexation cap fairy. This may even be done on a folder-by-folder basis if certain portions of your site are particularly egregious while other material is index-worthy (and that phenomenon probably holds true for all of the criteria above as well).
Now let’s talk about some leading indicators that can help to show if you’re at risk:
- Deep pages rarely receive external links – if you’re producing hundreds or thousands of pages of new content and fewer than “dozens” earn any external link at all, you’re in a sticky situation. Sites like Wikipedia, the NYTimes, About.com, Facebook, Twitter and Yahoo! have millions of pages, but they also have dozens to hundreds of millions of links, and relatively few pages that have no external links. Compare that against your 10 million page site with 400K pages in the index (which is more pages than what Google reports indexing on Adobe.com, one of the best linked-to domains on the web).
- Deep pages don’t appear in Google Alerts – if Google Alerts is consistently passing you by (not reporting them), this can be (but isn’t universally) an indication that they’re not perceiving your pages as being unique or worthy enough of the main index in the long run.
- Rate of crawling is slow – if you’re updating content, links and launching new pages multiple times per day, and Google’s coming by every week, you’re likely in trouble. XML Sitemaps might help, but it’s likely you’re going to need to improve some of those factors described above to get in good graces for the long term.
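To put rough numbers on the first indicator, here is a minimal sketch of two ratios you could track. The function names and the per-page link counts are hypothetical; the page totals come from the 10 million page example above, and you would have to pull real figures from your own crawl and link data.

```python
def indexation_ratio(indexed_pages: int, total_pages: int) -> float:
    """Share of a site's pages that appear in the index."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return indexed_pages / total_pages


def linked_page_share(external_links_per_page: dict) -> float:
    """Share of pages that have earned at least one external link."""
    if not external_links_per_page:
        return 0.0
    linked = sum(1 for count in external_links_per_page.values() if count > 0)
    return linked / len(external_links_per_page)


# The example from the text: a 10 million page site with 400K pages indexed.
ratio = indexation_ratio(400_000, 10_000_000)
print(f"{ratio:.1%} of pages indexed")  # prints "4.0% of pages indexed"

# Made-up link counts for four deep pages, purely for illustration.
sample = {"/a": 3, "/b": 0, "/c": 0, "/d": 1}
print(f"{linked_page_share(sample):.0%} of sampled pages have external links")
```

Neither ratio is something Google publishes a threshold for; they are only a way to watch the trend on your own site over time.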
New Keyword Tools: Grouping and Niche Term Discovery
December 16, 2009
Wordstream has released two new “Free” Keyword Tools for both SEO and SEM uses. These tools are actually a bit more refined and, at a glance at least, produce effective results when you test them out.
1. Keyword Grouping – this tool allows you to dump in a list of keywords and have them grouped into the most common combined searches. It also spits out more related keyword streams that supposedly get traffic within each keyword group.
View screen shots below to see an example:
I basically took a list of the top entrance keywords that we lost the most visits for through Analytics, and dropped the list into the box.
Here are the results: You can see suggested groups and tail endings to discover long tail grouped keywords.
2. Keyword Niche Finder: This tool works similarly to the keyword grouping tool, but instead of submitting a list of keywords to group, you submit a single head term to discover common tail endings, plus more long tail niche keywords within each grouped tail ending of the main head term.
These tools seem to work pretty well for keyword discovery and grouping, but in order to have the Keyword Lists e-mailed back to you, or to filter the results, you need to sign up for a Paid Account.
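If you want a feel for what keyword grouping does before signing up, here is a rough sketch of the general idea. This is not Wordstream's algorithm (which isn't documented here); it simply assumes each keyword belongs to the group of whichever of its words is shared by the most keywords in the list. The function name and sample keywords are made up.

```python
from collections import Counter, defaultdict


def group_keywords(keywords):
    """Group keywords under the word each one shares with the most other keywords."""
    # Count how many keywords contain each word (once per keyword).
    word_counts = Counter()
    for keyword in keywords:
        word_counts.update(set(keyword.lower().split()))

    groups = defaultdict(list)
    for keyword in keywords:
        words = keyword.lower().split()
        if not words:
            continue
        # max() returns the first maximal item, so ties go to the earliest word.
        head = max(words, key=lambda word: word_counts[word])
        groups[head].append(keyword)
    return dict(groups)


sample = [
    "car insurance quotes",
    "cheap car insurance",
    "home insurance rates",
    "mortgage rates",
    "mortgage calculator",
]
for head, members in group_keywords(sample).items():
    print(head, "->", members)
```

With this sample, the first three keywords land under "insurance" and the last two under "mortgage". A real tool would also weigh search volume and multi-word phrases, which this sketch ignores.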
Keyword Discovery for New Content
December 11, 2009
One effective way to discover new content ideas and keywords for your website’s SEO campaign is to analyze the entrance keywords for relatively new content that you add to your website. For example, take articles that you loaded within the past 30 days that are beginning to get some good traction. It seems as if Google tests out your content for different keyword sets, then continues sending you traffic for keywords it deems a good match for your content and stops sending you traffic for keywords it does not deem to be a good fit.
View the screen shots below to see an example of how this works.
Step 1 – View Top landing pages and look at the traffic traction of new content that you have added. Identify content that first picked up traffic then lost it. Compare 2-3 weeks to a month vs. previous time line.



Step 2 – Now view the entrance keywords. You’ll notice the first few pages are newer keywords you are getting traffic for, and the pages towards the end are keywords you lost traffic for. Look at those keywords to identify common phrases/head terms.
Step 3 – Check Traffic for head terms you identify
You can find several new head terms/phrases that receive quite a bit of traffic and warrant individual topics to build content around through this process.
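If you export the entrance keywords for both time periods, the comparison in Step 2 can be sketched in a few lines. This assumes you have each period as a simple keyword-to-visits mapping; the function names and the sample numbers below are hypothetical, not an Analytics API.

```python
from collections import Counter


def compare_periods(previous, current):
    """Return (gained, lost) keyword lists between two keyword -> visits mappings."""
    gained = sorted(k for k, v in current.items() if v > 0 and previous.get(k, 0) == 0)
    lost = sorted(k for k, v in previous.items() if v > 0 and current.get(k, 0) == 0)
    return gained, lost


def common_terms(keywords, min_count=2):
    """Words that appear in at least min_count of the given keywords."""
    counts = Counter()
    for keyword in keywords:
        counts.update(set(keyword.lower().split()))
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Made-up visit counts, purely for illustration.
previous = {"refinance calculator": 40, "refinance rates today": 25, "home loan tips": 10}
current = {"home loan tips": 12, "fha loan limits": 8}

gained, lost = compare_periods(previous, current)
print("gained:", gained)
print("lost:", lost)
print("candidate head terms:", common_terms(lost))
```

Here the lost keywords share the word "refinance", which is the kind of head term you would then check traffic for in Step 3.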
How fake sites trick search engines to hit the top
December 9, 2009
The key to getting ranked well in a search result is to do everything that a search engine bot would deem “quality content”, which will give the site a higher ranking.
In this example, Stickley created a fake site (creditunionofsc.org) with the consent of Credit Union of Southern California (cusocal.org), which focused on tricking the search engines into believing that the crawlers were scanning a legitimate site. All Stickley did was put link after link inside the site to create the appearance of “depth”, even if the links led to only the same picture of the credit union’s front page.
He ranked #2 on Yahoo search and #1 on Bing search (both already removed). However, he never made it past the 6th page on Google (which handles over 2/3rds of all search traffic in the US).
An extremely well trafficked site such as Bank of America would always outrank a fake site, but hackers have been known to hack into education websites such as university sites, stuff them with links to the scam site, and then via “link building”, make the search engine interpret the scam site as a legit site.
Original Source: http://news.yahoo.com/s/ap/us_tec_search_engine_safety
Google Search Engine Updates
November 24, 2009
Here is an overview of some recent updates and refinements to Google’s search engine. It seems as if Google is collecting more and more click data across many verticals, ramping up local results further with future monetization objectives in mind, and last but not least, dipping into the product search query market.
1. For both universal and vertical search terms, the search result page is starting to display the local 10 pack listing above organic results. Now, the top 75% of the 1st page is primarily Google directed clicks. This makes things more difficult for SEO campaigns and some industry experts such as Matt Cutts are suggesting focusing on more and more long tail searches would be the way to go. Google would not want to dilute long tail search queries with universal/local listings displayed ahead of organic results. This also signals the importance of being #1 for your main keywords on search result pages that primarily display the local pack ahead of organic listings. #2 wouldn’t do much good.
2. Google is moving more into the display product ad market. Their eCommerce search engine has just recently opened up, and following a similar concept to the one already in effect for Book and Loan related searches (e.g. Lending Tree model), Google portal results will be displayed ahead of organic. With their display ads, if you search any product, not only will the eCommerce results be listed above organic, but on the right hand corner you will see actual images of the products in sponsored listings. Fortunately, this won’t affect lead generation SEO campaigns, but there is speculation that Google is collecting more and more click data for lead gen verticals and will eventually move into that market.
3. Google Caffeine Put on Hold. Small elements of Caffeine may have been implemented, but this should have little to no effect for most webmasters just yet. Matt Cutts announced Google will roll out Caffeine at one data center after the holidays, and slowly build it up over time. It is speculated that Google moving into more affiliate verticals, such as product and lead gen, is a bigger threat than Caffeine is.
source: seobook.com/blog
Google Adwords go Local!
November 23, 2009
Google appears to be testing a new local AdWords presentation that includes phone number, address and also maps. Please view http://searchengineland.com/google-appears-to-be-testing-new-local-adwords-presentation-30250
Advanced Filtering in Analytics
November 23, 2009
I just noticed this weekend that there is a new filtering feature in Analytics that allows you to filter URLs by more than one include or exclude filter. Pretty handy when you’re trying to drill down into one group of pages and filter out exceptions.
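If you work with an exported list of URLs, the same include/exclude logic is easy to sketch yourself. This is not the Analytics feature itself, just an illustration that assumes a URL must match every include pattern and none of the exclude patterns; the function name and sample paths are made up.

```python
import re


def filter_urls(urls, include=(), exclude=()):
    """Keep URLs that match all include patterns and no exclude pattern."""
    include_patterns = [re.compile(p) for p in include]
    exclude_patterns = [re.compile(p) for p in exclude]
    return [
        url
        for url in urls
        if all(p.search(url) for p in include_patterns)
        and not any(p.search(url) for p in exclude_patterns)
    ]


urls = ["/blog/seo-tips", "/blog/tag/seo", "/products/widget", "/blog/ppc-basics"]

# Drill down into the blog but filter out the tag pages.
print(filter_urls(urls, include=[r"^/blog/"], exclude=[r"/tag/"]))
# prints ['/blog/seo-tips', '/blog/ppc-basics']
```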

Common SEO Problems and Tools to Solve Them
November 20, 2009
1. View Source OR Valuing a potential link
a. http://www.seomoz.org/mozbar
b. has this spiffy “Analyze Page” button that opens a visual overlay with critical stats like meta data, link counts, rel=”canonical,” Hx tags, and even counts of characters in content areas.
c. The mozRank and mozTrust factors help you understand the value of a link
2. Determining a PageRank Penalty
a. http://www.seomoz.org/toolbox/pagerank
b. This tool gives you a history of reported PageRank values
c. When PageRank has been lowered more than one point, particularly in a timeframe that doesn’t correlate with a standard PR update, you can feel relatively confident that some sort of PR penalty was incurred.
d. When PR is significantly lower than mozRank, particularly on the homepage of a website, there’s a potential that a PR penalty may exist
3. Watching rankings over time
a. http://www.seomoz.org/rank-tracker OR http://www.advancedwebranking.com/
b. You can watch rankings across multiple engines and geographies, and the interface is simple + easy to use.
4. Comparing Page Metrics
a. http://www.seomoz.org/labs/lsvisualize
b. The visual shapes represent the degree to which the page is meeting that metric’s potential
5. Finding Competitors’ links
a. http://www.seomoz.org/labs/link-intersect
b. enter your site plus at least two competitors. The tool results will show you a list of domains that contain links to pages from your competitors but don’t point to you
6. Tracking links and mentions in the fresh web
a. http://www.seomoz.org/labs/blogscape
b. Gives a graph of what’s been happening in the blogosphere/twitosphere with a list of URLs where the action’s taking place
7. Backlink analysis
a. http://www.seomoz.org/labs/backlinks
b. Not only do you get a list of links ordered by relative importance in just a few seconds (slightly longer if the URL/domain has many thousands of links), you also retrieve an ordered list of anchor text distribution pointing to the page, subdomain or root domain.
8. Metrics from different sources
a. http://www.seomoz.org/trifecta
b. Use this if you need a long list of metrics from a variety of sources – Compete, Alexa, Google PageRank, Yahoo! Link Counts, Google News mentions, etc.
9. Finding Competitors’ Most Successful Linkbait
a. http://www.seomoz.org/labs/toppages
b. Gives data about which pages on a given subdomain or root domain have earned the most links.
10. Identifying Pages that Can Flow Link Juice Internally
a. http://www.seomoz.org/labs/toppages
b. Not only can we see which pages have earned link juice, but we can also identify potential problems (302s and blocking w/ robots.txt being two of the big ones)
11. Social Media monitoring
a. http://www.seomoz.org/labs/blogscape_prototype
12. Find link search queries
a. http://www.seomoz.org/labs/link-finder/index.php
b. Enter a few pieces of data about your site and the link campaign you’re running and it will spit back links to tons of relevant search queries and link lists. While it doesn’t automate everything, it can also be a huge boost in exposing ways to find and earn links you might not have considered.
13. Determine a Keyword’s Relative SEO Competitiveness
a. http://www.seomoz.org/keyword-difficulty
b. Provides a quick view into metrics that have historically helped SEOs determine potential competitiveness, as well as a percentage score that gives a sense of relative competition level.
14. On Page Optimization
a. http://www.seomoz.org/term-target
b. just plug in the keyword you’re targeting and the page you want to rank and it sends back an analysis of the keyword usage, along with recommendations for where and how to employ the query term.
Source: http://www.seomoz.org/blog/30-seo-problems-the-tools-to-solve-them-part-1-of-2
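The idea behind item 5 (finding competitors’ links) boils down to set arithmetic, and can be sketched as follows. This is not how the SEOmoz tool is implemented; it assumes you already have a list of linking domains for your site and for each competitor, and it treats "linked to by at least two competitors" as the cutoff. The function name, the `min_competitors` parameter and the example domains are all hypothetical.

```python
def link_intersect(own_linking_domains, competitor_linking_domains, min_competitors=2):
    """Domains that link to at least min_competitors competitors but not to you."""
    counts = {}
    for domains in competitor_linking_domains.values():
        for domain in set(domains):
            counts[domain] = counts.get(domain, 0) + 1
    own = set(own_linking_domains)
    return sorted(
        domain
        for domain, n in counts.items()
        if n >= min_competitors and domain not in own
    )


# Made-up linking domains, purely for illustration.
own = {"localnews.example"}
competitors = {
    "rival-a.example": {"dir.example", "blog.example", "localnews.example"},
    "rival-b.example": {"dir.example", "blog.example"},
    "rival-c.example": {"forum.example"},
}
print(link_intersect(own, competitors))
# prints ['blog.example', 'dir.example']
```

Domains that already link to several of your competitors are usually the most realistic link prospects, which is why the cutoff is set at two rather than one.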
Eight Local SEO Signals You May Not Have Considered
November 19, 2009
I’ve been flipping through all the interviews by Michael Gray, found here, and tried to pull out some ideas that aren’t already floating all around the web regarding local search signals and rankings. I came up with eight items I thought would provide both value for people landing on a local website or profile page, as well as adding a lot of unique local signals to tell Google and others your site caters to locals.
1. Directions
Obviously, having your address and phone number seems to be one of the most common (and likely necessary) signals on a local page.
Beyond that, an interesting idea is to have more creative “human” directions like, across the street from McDonalds, or right next to the USCIS building on Sansome in San Francisco. As much as possible, try to correlate your business geographically to landmarks near it.
It might be helpful to have a dedicated directions page.
2. Hours of Operation
Just like your storefront or office door, list your operating hours. Unlike websites, local businesses open and close daily.
3. Emergency Contact Info
This was not in the list, but for criminal attorneys or others where urgent help might be required, it would be helpful for visitors to know how to contact you in emergency situations.
4. Local Affiliations
Add information regarding any local groups, chamber of commerce, bar association, clubs or any other local affiliations you may have. Not only is this another potential signal for search engines, but it also establishes that you’re an active member of the searcher’s community.
5. Relevant Local Resources
Offer up some recommended local resources that your potential clients might find helpful. For example, a criminal attorney might recommend a bail bondsman or the location of the DA’s office. An auto accident or personal injury lawyer probably knows some good bodyshops or even doctors.
6. Local Events Your Business is Part of
Many local businesses partake in “main street” gatherings, or have local speaking engagements. List out the events you are, have been and are going to be part of.
7. Capture Niche Regional Phrases
Every geography has its own geographical lingo. If you’re in Santa Monica, you serve the “West Side”. Oakland lawyers service the “East Bay”. San Jose intellectual property lawyers are part of “Silicon Valley”. Honestly, I don’t have any data on how frequently people use terms like these when searching, but it’s one more way to establish your local “relevance”.
8. Include the Names of Local Cities and Towns in Which You Operate
Often times, a business may be located in “Bumsville”, but gets most of its business from “The City”. Make sure to have all those local cities and towns included in your site content.
Local search is coming, so make sure your site or profile page is ready to take advantage of it.
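For items 7 and 8, a quick way to audit a page is to check which of your regional phrases and city names never show up in its text. Here is a minimal sketch, assuming you have the page copy as a plain string; the function name and sample text are made up, and the check is a simple case-insensitive substring match, so it says nothing about how a search engine weighs those terms.

```python
def missing_local_terms(page_text, terms):
    """Return the terms that do not appear anywhere in the page text."""
    text = page_text.lower()
    return [term for term in terms if term.lower() not in text]


page_text = "Our Oakland office serves clients across the East Bay."
terms = ["Oakland", "East Bay", "Silicon Valley"]

print(missing_local_terms(page_text, terms))
# prints ['Silicon Valley']
```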










