Monday, February 1, 2016

Search Interface

Search Interface

The search algorithm and search interface are used to find the most relevant document in the index based on the search query. First the search engine tries to determine user intent by looking at the words the searcher typed in.

These terms can be stripped down to their root level (e.g., dropping ing and other suffixes) and checked against a lexical database to see what concepts they represent. Terms that are a near match will help you rank for other similarly related terms. For example, using the word swims could help you rank well for swim or swimming.

Search engines can try to match keyword vectors with each of the specific terms in a query. If the search terms occur near each other frequently, the search engine may understand the phrase as a single unit and return documents related to that phrase.


WordNet is the most popular lexical database. At the end of this chapter there is a link to a Porter Stemmer tool if you need help conceptualizing how stemming works.
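
If you want to see stemming in action, the short sketch below uses the Porter stemmer from the third-party NLTK library; that library choice is an assumption for illustration, and any Porter stemmer implementation will behave similarly.

# A minimal stemming sketch, assuming the third-party nltk package is
# installed (pip install nltk). Any Porter stemmer behaves similarly.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

for word in ["swim", "swims", "swimming"]:
    # "swims" and "swimming" reduce to the same root as "swim", which is
    # how a page about swimming can match a query for "swim".
    print(word, "->", stemmer.stem(word))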

Searcher Feedback

Some search engines, such as Google and Yahoo!, have toolbars and systems like Google Search History and My Yahoo!, which collect information about a user. Search engines can also look at recent searches, or what the search process was for similar users, to help determine what concepts a searcher is looking for and what documents are most relevant for the user’s needs.

As people use such a system it takes time to build up a search query history and a click-through profile. That profile could eventually be trusted and used to

       aid in search personalization

       collect user feedback to determine how well an algorithm is working

       help search engines determine if a document is of decent quality (e.g., if many users visit a document and then immediately hit the back button, the search engines may not continue to score that document well for that query).

I have spoken with some MSN search engineers and examined a video about MSN search. Both experiences strongly indicated a belief in the importance of user acceptance. If a high-ranked page never gets clicked on, or if people typically quickly press the back button, that page may get demoted in the search results for that query (and possibly related search queries). In some cases, that may also flag a page or website for manual review.

As people give search engines more feedback and as search engines collect a larger corpus of data, it will become much harder to rank well using only links. The more satisfied users are with your site, the better your site will do as search algorithms continue to advance.

Real-Time versus Prior-to-Query Calculations

In most major search engines, a portion of the relevancy calculations are stored ahead of time. Some of them are calculated in real time.

Some processes that are computationally expensive and slow, such as calculating overall inter-connectivity (Google calls this PageRank), are done ahead of time.
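
To give a sense of why this kind of calculation is done ahead of time, here is a simplified power-iteration sketch of PageRank over a tiny, made-up link graph; it is an illustration only, not Google's production method.

# A simplified PageRank sketch (power iteration). The three-page link
# graph, damping factor, and iteration count are illustrative assumptions.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # Each page passes its rank to the pages it links to, split
            # evenly among its outbound links.
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1 - damping) / len(pages) + damping * incoming
        rank = new_rank
    return rank

# Hypothetical link graph: page -> set of pages it links out to.
links = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
print(pagerank(links))  # "c", with the most inbound links, ends up with the highest score

Doing this over billions of pages is slow, which is exactly why it is computed before any query arrives rather than in real time.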

Many search engines have different data centers, and when updates occur, they roll from one data center to the next. Data centers are placed throughout the world to minimize network lag time. Assuming it is not overloaded or down for maintenance, you will usually get search results from the data center nearest you. If that data center is down or experiencing heavy load, your search query might be routed to a different data center.

Search engines such as Google and Yahoo! may update their algorithm dozens of times per month. When you see rapid changes in your rankings, it is usually due to an algorithmic shift, a search index update, or something else outside of your control. SEO is a marathon, not a sprint, and some of the effects take a while to kick in.

Usually, if you change something on a page, it is not reflected in the search results that same day. Linkage data also may take a while to have an effect on search relevancy as search engines need to find the new links before they can evaluate them, and some search algorithms may trust links more as the links age.

The key to SEO is to remember that rankings are always changing, but the more you build legitimate signals of trust and quality, the more often you will come out on top.


The more times a search leads to desired content, the more likely a person is to use that search engine again. If a search engine works well, a person does not just come back, they also tell their friends about it, and they may even download the associated toolbar. The goal of all major search engines is to be relevant. If they are not, they will fade (as many already have).


Search engines make money when people click on the sponsored advertisements. In the search result below you will notice that both Viagra and Levitra are bidding on the term Viagra. The area off to the right displays sponsored advertisements for the term Viagra. Google gets paid whenever a searcher clicks on any of the sponsored listings.

The white area off to the left displays the organic (free) search results. Google does not get paid when people click on these. Google hopes to make it hard for search engine optimizers (like you and me) to manipulate these results, both to keep relevancy as high as possible and to encourage people to buy ads.

Later in this e-book we will discuss both organic optimization and pay-per-click marketing.


Index website

The Index

The index is where the spider-collected data are stored. When you perform a search on a major search engine, you are not searching the web, but the cache of the web provided by that search engine’s index.

Search engines organize their content in what is called a reverse index. A reverse index sorts web documents by words. When you search Google and it displays 1-10 out of 143,000 websites, it means that there are approximately 143,000 web pages that either have the words from your search on them or have inbound links containing them. Also, note that search engines do not store punctuation, just words.

The following is an example of a reverse index and how a typical search engine might classify content. While this is an oversimplified version of the real thing, it does illustrate the point. Imagine each of the following sentences is the content of a unique page:

The dog ate the cat.

The cat ate the mouse.
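
To make that concrete, here is a toy reverse index built from those two example pages in Python; the structure is a simplified sketch (each word mapped to the set of pages containing it), not any engine's actual storage format.

# A toy reverse index built from the two example pages above. Punctuation
# is stripped and words are lowercased, since engines store words, not
# punctuation.
import re
from collections import defaultdict

pages = {
    "page1": "The dog ate the cat.",
    "page2": "The cat ate the mouse.",
}

index = defaultdict(set)
for doc_id, text in pages.items():
    for word in re.findall(r"[a-z]+", text.lower()):
        index[word].add(doc_id)

print(dict(index))  # e.g. "cat" -> {"page1", "page2"}, "mouse" -> {"page2"}

def search(query):
    """Return the pages that contain every word in the query."""
    word_sets = [index.get(w, set()) for w in re.findall(r"[a-z]+", query.lower())]
    return set.intersection(*word_sets) if word_sets else set()

print(search("ate mouse"))  # {"page2"}

Answering a query is then mostly a matter of intersecting the page sets for each query word, which is far faster than scanning the documents themselves.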

Since search engines view pages from their source code in a linear format, it is best to move JavaScript and other extraneous code to external files to help move the page copy higher in the source code.


Some people also use Cascading Style Sheets (CSS) or a blank table cell to place the page content ahead of the navigation. As far as how search engines evaluate what words are first, they look at how the words appear in the source code. I have not done significant testing to determine if it is worth the effort to make your unique page code appear ahead of the navigation, but if it does not take much additional effort, it is probably worth doing. Link analysis (discussed in depth later) is far more important than page copy to most search algorithms, but every little bit can help.

Google has also hired some people from Mozilla and is likely working on helping their spider understand how browsers render pages. Microsoft has published research on visually segmenting pages, which may help them understand what page content is most important.

As well as storing the position of a word, search engines can also store how the data are marked up. For example, is the term in the page title? Is it a heading? What type of heading? Is it bold? Is it emphasized? Is it in part of a list? Is it in link text?

Words that are in a heading or are set apart from normal text in other ways may be given additional weighting in many search algorithms. However, keep in mind that it may be an unnatural pattern for your keyword phrases to appear many times in bold and headings without occurring in any of the regular textual body copy. Also, if a page looks like it is aligned too perfectly with a topic (i.e., overly-focused so as to have an abnormally high keyword density), then that page may get a lower relevancy score than a page with a lower keyword density and more natural page copy.


By storing where the terms occur, search engines can understand how close one term is to another. Generally, the closer the terms are together, the more likely the page with matching terms will satisfy your query.
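
As a rough sketch of that idea (the data structure and distance measure below are simplified assumptions, not any engine's actual scoring), storing term positions lets the engine measure how close two query terms appear in a document.

# A positional index sketch: term -> page -> list of word positions.
import re
from collections import defaultdict

def build_positional_index(pages):
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in pages.items():
        for position, word in enumerate(re.findall(r"[a-z]+", text.lower())):
            index[word][doc_id].append(position)
    return index

def min_distance(index, doc_id, term_a, term_b):
    """Smallest gap between any occurrence of the two terms in one page."""
    positions_a = index.get(term_a, {}).get(doc_id, [])
    positions_b = index.get(term_b, {}).get(doc_id, [])
    if not positions_a or not positions_b:
        return None
    return min(abs(a - b) for a in positions_a for b in positions_b)

pages = {"page1": "The dog ate the cat.", "page2": "The cat ate the mouse."}
index = build_positional_index(pages)
print(min_distance(index, "page2", "cat", "mouse"))  # 3 words apart

A page where the query terms sit a few words apart would generally be treated as a closer match than one where they are separated by whole paragraphs.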

If you only use an important group of words on the page once, try to make sure they are close together or right next to each other. If words also occur naturally, sprinkled throughout the copy many times, you do not need to try to rewrite the content to always have the words next to one another. Natural sounding content is best.


Words that are common do not help search engines understand documents. Exceptionally common terms, such as the, are called stop words. While search engines index stop words, they are not typically used or weighted heavily to determine relevancy in search algorithms. If I search for the Cat in the Hat, search engines may insert wildcards for the words the and in, so my search will look like

* cat * * hat.

Search engines also differ in how large a percentage of page copy they will tolerate being composed of a few keyword phrases. Thus, there is no magical page copy length that is best for all search engines.

The uniqueness of page content is far more important than the length. Page copy has three purposes above all others:

       To be unique enough to get indexed and ranked in the search results

       To create content that people find interesting enough to want to link to

       To convert site visitors into subscribers, buyers, or people who click on ads

Not every page is going to make sales or be compelling enough to link to, but if, in aggregate, many of your pages are of high quality, over time that will help boost the rankings of nearly every page on your site.


Term Frequency (TF) is a weighted measure of how often a term appears in a document. Terms that occur frequently within a document are thought to be some of the more important terms of that document.

If a word appears in every (or almost every) document, then it tells you little about how to discern value between documents. Words that appear frequently will have little to no discrimination value, which is why many search engines ignore common stop words (like the, and, and or).

Rare terms, which only appear in a few or limited number of documents, have a much higher signal-to-noise ratio. They are much more likely to tell you what a document is about.

Inverse Document Frequency (IDF) can be used to further discriminate the value of term frequency to account for how common terms are across a corpus of documents. Terms that are in a limited number of documents will likely tell you more about those documents than terms that are scattered throughout many documents.
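
Here is a minimal TF-IDF sketch using a standard textbook formulation; the exact weighting each engine uses is not public, so treat the numbers as illustrative only.

# TF-IDF sketch: common words score near zero, rare words that concentrate
# in a few documents score high. The formulation is a textbook variant.
import math
import re
from collections import Counter

docs = {
    "page1": "the dog ate the cat",
    "page2": "the cat ate the mouse",
    "page3": "search engines index the web",
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

doc_terms = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

def tf_idf(term, doc_id):
    term_counts = doc_terms[doc_id]
    tf = term_counts[term] / sum(term_counts.values())            # term frequency
    df = sum(1 for counts in doc_terms.values() if term in counts)
    idf = math.log(len(docs) / df) if df else 0.0                 # inverse document frequency
    return tf * idf

print(tf_idf("the", "page1"))    # ~0.0: "the" appears in every document
print(tf_idf("mouse", "page2"))  # ~0.22: "mouse" appears in only one document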

When people measure keyword density, they are generally missing some other important factors in information retrieval such as IDF, index normalization, word proximity, and how search engines account for the various element types. (Is the term bolded, in a header, or in a link?)

Search engines may also use technologies like latent semantic indexing to mathematically model the concepts of related pages. Google is scanning millions of books from university libraries. As much as that process is about helping people find information, it is also used to help Google understand linguistic patterns.

If you artificially write a page stuffed with one keyword or keyword phrase without adding many of the phrases that occur in similar natural documents, you may not show up for many of the related searches, and some algorithms may see your document as being less relevant. The key is to write naturally, using various related terms, and to structure the page well.


Search engines may use multiple reverse indexes for different content. Most current search algorithms tend to give more weight to page title and link text than page copy.

For common broad queries, search engines may be able to find enough quality matching documents using link text and page title without needing to spend the additional time searching through the larger index of page content. Anything that saves computer cycles without sacrificing much relevancy is something you can count on search engines doing.
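
As a toy illustration of that kind of field weighting (the field names and weights below are made up for the example, not real engine values):

# Field-weighted matching sketch: a hit in the title or inbound link text
# counts for more than a hit in the body copy. Weights are invented.
FIELD_WEIGHTS = {"title": 3.0, "link_text": 2.5, "body": 1.0}

page = {
    "title": "discount swim goggles",
    "link_text": "swim goggles review",
    "body": "we tested ten pairs of goggles in the pool over two weeks",
}

def score(page, query):
    total = 0.0
    query_terms = query.lower().split()
    for field, weight in FIELD_WEIGHTS.items():
        words = page[field].lower().split()
        total += weight * sum(words.count(term) for term in query_terms)
    return total

print(score(page, "swim goggles"))  # title and link text hits dominate the score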

After the most relevant documents are collected, they may be re-sorted based on interconnectivity or other factors.

Around 50% of search queries are unique, and with longer unique queries, there is greater need for search engines to also use page copy to find enough relevant matching documents (since there may be inadequate anchor text to display enough matching documents).

Goal of Search Engines & How They Work

Goal of Search Engines & How They Work


Many people think search engines have a hidden agenda. This simply is not true. The goal of the search engine is to provide high-quality content to people searching the Internet.

Search engines with the broadest distribution network sell the most advertising space. As I write this, Google is considered the search engine with the best relevancy. Their technologies power the bulk of web searches.


The biggest problem new websites have is that search engines have no idea they exist. Even when a search engine finds a new document, it has a hard time determining its quality. Search engines rely on links to help determine the quality of a document. Some engines, such as Google, also trust websites more as they age.

The following bits may contain a few advanced search topics. It is fine if you do not necessarily understand them right away; the average webmaster does not need to know search technology in depth. Some might be interested in it, so I have written a bit about it with those people in mind. (If you are new to the web and uninterested in algorithms, you may want to skip past this section.)

I will cover some of the parts of the search engine in the next few pages while trying to keep it somewhat basic. It is not important that you fully understand all of it (in fact, I think it is better for most webmasters if they do not worry about things like Inverse Document Frequency, as I ranked well for competitive SEO-related terms without knowing anything about the technical bits of search); however, I would not feel right leaving the information out.



The phrase vector space model, which search algorithms still heavily rely upon today, goes back to the 1970s. Gerard Salton was a well-known expert in the field of information retrieval who pioneered many of today’s modern methods. If you are interested in learning more about early information retrieval systems, you may want to read A Theory of Indexing, which is a short book by Salton that describes many of the common terms and concepts in the information retrieval field.

Mike Grehan’s book, Search Engine Marketing: The Essential Best Practices Guide, also discusses some of the technical bits of information retrieval in more detail than this book. My book was created to be a current how-to guide, while his is geared more toward explaining how information retrieval works.


While there are different ways to organize web content, every crawling search engine has the same basic parts:

       a crawler

       an index (or catalog)

       a search interface


The crawler does just what its name implies. It scours the web following links, updating pages, and adding new pages when it comes across them. Each search engine has periods of deep crawling and periods of shallow crawling. There is also a scheduler mechanism to prevent a spider from overloading servers and to tell the spider what documents to crawl next and how frequently to crawl them.
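
To make those parts concrete, below is a minimal, hypothetical crawler sketch in Python using only the standard library. The seed URL, page cap, and delay are illustrative assumptions; a real spider also honors robots.txt, prioritizes important and fast-changing documents, and distributes work across many machines.

# Minimal crawler sketch: a frontier queue, a fetch step, link extraction,
# and a politeness delay so the spider does not overload any one server.
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=10, delay=1.0):
    frontier = [seed_url]   # URLs scheduled to be crawled next
    seen = {seed_url}
    crawled = 0
    while frontier and crawled < max_pages:
        url = frontier.pop(0)
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                html = response.read().decode("utf-8", errors="ignore")
        except Exception:
            continue  # skip pages that fail to fetch
        extractor = LinkExtractor()
        extractor.feed(html)
        for link in extractor.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
        crawled += 1
        print("crawled:", url)
        time.sleep(delay)  # politeness delay between requests

# crawl("https://example.com/")  # hypothetical seed URL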

Rapidly changing or highly important documents are more likely to get crawled frequently. The frequency of crawl should typically have little effect on search relevancy; it simply helps the search engines keep fresh content in their index. The home page of CNN.com might get crawled once every ten minutes. A popular, rapidly growing forum might get crawled a few dozen times each day. A static site with little link popularity and rarely changing content might only get crawled once or twice a month.

The best benefit of having a frequently crawled page is that you can get your new sites, pages, or projects crawled quickly by linking to them from a powerful or frequently changing page.


Sunday, January 31, 2016

The Social Elements of Relevancy

The Social Elements of Relevancy





Since many of you who have bought this e-book will not read all of it, I need to make sure I deliver great value in the first few pages to ensure you get your money’s worth.

Relevancy is never static. Due to commercial market forces, search is CONSTANTLY broken. Thus, if you think of this e-book as a literal guide, it too will always be broken. Instead of thinking of the web and search in terms of algorithms, it helps to think of the web as a large social network. Ask yourself questions like:

       What are people talking about?

       What stories are spreading?

       Why are they spreading?

       Who is spreading them?

       How are they spreading them?

The reason search relies so heavily on the social elements is that page content and site structure are so easy to manipulate. It takes a mind well tuned to marketing to be able to influence or manipulate people directly.


There are ways to fake authority, and when you are new it may make sense to push the envelope on some fronts. But invariably, anything that is widely manipulated is not a strong signal of authority.

Google wants to count real editorial votes. Consider the following:

       It is not common for news sites to link section-wide to an online bingo site.

       Most of the ads are irrelevant to the content of the pages.

       There are a large number of paid links right next to each other.

       The site has amazing authority.

Given all the above, it makes sense that Google would not want to count those links. When I posted about how overt that PageRank selling was, Matt Cutts, a leading Google engineer, hinted that Google had already taken care of not counting those links.

And since UPI is a slow-moving, 100-year-old company, the fact that they are selling PageRank should also tell you that Google’s relevancy algorithms have moved far beyond just considering PageRank. I have PageRank 5 sites that get 100 times the traffic that some of my PageRank 7 sites do, because they have better content and a more natural link profile.

If you do buy links, think of the page as though you were an editor for a search engine. Does the link look like it is a natural part of the page? Or is it an obviously purchased link?

What if instead of thinking of ways to try to create false authority, you looked at the web in terms of a social network, where the best ideas and the best marketed ideas spread? Now that might get you somewhere.


What if you are starting with nothing? Can you still compete? Of course you can.

At the end of 2002, I got kicked out of the military for using drugs. At that point, I was experiencing a number of things:

       Suicidal depression

       Financial bankruptcy, living on credit cards

       Social isolation

       Ignorance of the web, SEO, and marketing (slightly less serious, I know)

Within 4 years, I had pulled myself out of this emotional and psychological slump, and had achieved success:

       I was fairly knowledgeable about the web, SEO, and marketing.

       I had made lots of friends.

       I was getting mentioned in the Wall Street Journal (and many other newspapers).

       I was speaking at colleges about SEO (one college even wanted to hire me to become a professor).

       I had venture capitalists offering to invest in this site.

       I had a mainstream publisher offer to publish this book.

       I got married to the most wonderful woman in the world.

What did I have that allowed me to do well? I had a passion for learning. That passion helped me attract great friends who took me under their wing and helped me far more than I could have ever expected. It takes time to do well, but if you keep pushing, keep learning, keep sharing, and are trying to help others, eventually you will do well on the web.

Many true web authorities started out as topical hubs. People who had no intent of creating a business would just freely talk about a subject they loved, and linked out to related websites they found useful. Every web marketer should read this post:

You become a platform worthy of attention by talking about others who are worthy of attention. Getting people to pay attention is a real cost. You have to get people to pay attention before you can extract value from your work.

To most people, the single most relevant and important thing in the world is themselves.

Here is a quote from Radiohead’s Meeting People is Easy:

If you have been rejected many times in your life, then one more rejection isn't going to make much difference. If you're rejected, don't automatically assume it's your fault. The other person may have several reasons for not doing what you are asking her to do: none of it may have anything to do with you. Perhaps the person is busy or not feeling well or genuinely not interested in spending time with you. Rejections are part of everyday life. Don't let them bother you. Keep reaching out to others. When you begin to receive positive responses then you are on the right track. It's all a matter of numbers. Count the positive responses and forget about the rejections.

You are not always going to be able to predict what will work and what won’t, but the more you keep learning and the more things you try, the better the odds are that something will stick. Internet marketing is just like offline marketing, but cheaper, faster, and more scalable.

Social scientists have studied why things become popular, and many things are popular only because they are already popular. In "Is Justin Timberlake a Product of Cumulative Advantage?" Duncan J. Watts wrote about how groups tended to like the same things, but random different things in each group. Even if success is random and unpredictable, there is a self-reinforcing effect to marketing.

If you keep reaching out to people you will be successful. It might take 3 months. It might take 5 years. But eventually it will happen.


Saturday, January 30, 2016

Link Building Website

Make sure your site has something that other webmasters in your niche would be interested in linking to.

Create content that people will be willing to link to even if it is not directly easy to monetize. These linkworthy pages will lift the authority and rankings of all pages on your site.

When possible, get your keywords in the link text pointing to your site.

Register with, participate in, or trade links with topical hubs and related sites. Be in the discussion or at least be near the discussion.

Look for places to get high-quality free links from (like local libraries or chambers of commerce).

Produce articles and get them syndicated to more authoritative sites.

Participate in forums to learn about what your potential consumers think is important.

Issue press releases with links to your site.

Leave glowing testimonials for people and products you really like.

Start an interesting and unique blog and write about your topics, products, news, and other sites in your community.

Comment on other sites with useful, relevant, and valuable comments.

Sponsor charities, blogs, or websites related to your site.

Consider renting links if you are in an extremely competitive industry.

Mix your link text up, if you can.


Survey your vertical and related verticals. What ideas/tools/articles have become industry standard tools or well-cited information? What ideas are missing?

Read Brett Tabke’s quick couple-page guide http://www.webmasterworld.com/forum3/2010.htm




Friday, January 29, 2016

SEO Quick-Start Checklist

SEO Quick-Start Checklist







Analyze your product. Are you interested in it yourself?

Analyze your market. Is it oversaturated? Is it growing or changing?

Is it easy to order your product from the web? Or are you selling commodity dog food that is expensive to ship?

What can you do to be unique in the market?

What creative and original ideas can you add to your site?


FOR NEW SITES: Ponder your domain name choice. Depending on your brand strategy, it should either be highly brandable or have your primary keywords in it. Consider buying different domain names for each targeted language or niche in your market.



Choose an ICANN-accredited registrar. Register a .com as soon as possible.

Register a country’s top-level domain if your primary market is local in nature.

Choose a host that supports the technology you will be using (ASP or PHP, etc.).



Use keyword tools and customer feedback to find the most targeted keyword phrases for your site.

Develop grouped themes of keywords that reflect the different sections of your site.

Keeping within a grouped theme, choose different keywords to target each page.


Put your chosen words for each page in your page title tags. Make sure your page title tag text is unique to each page.

Write a description for the meta description tag. Make sure your description is unique to each page.

Use only one H1 header per page, and target similar keyword phrases as the ones you targeted when writing the page title.

Use subheaders H2 and H3 on your page when necessary.

Use bulleted lists and bolding to make content easier to read.

Make sure your text is written for human consumption, not bots.



Make sure your home page builds credibility and directs consumers to the most important parts of your site.

Target your most competitive keyword on your home page or a page that is well integrated into your site.


Link to major theme pages from your home page.

Link to your home page from every sub page.


Use text-based navigation.

If you already have, or insist on using, graphic navigation, use descriptive alt text on the images, and link to every primary page from your sub pages in the footer of the sub pages.

Use descriptive keyword breadcrumb navigation.

Make a site map.

Check the text that links pages of your site to make sure it’s descriptive whenever possible.

Link to resources outside your own site that improve each user's experience. Deep link to related articles and content from your page copy.

Rely as little as possible on the site navigation. Instead, guide your visitor through your site with links in the active content portion of the site.

Link to, and use, a cascading style sheet from every page.

Avoid duplicate content issues. Ensure that each page has significantly unique content that does not exist on other pages on your site or other sites.



Register your site with the major directories.

Register your site with a couple of the better second-tier directories.

Register with a couple of local or niche-specific directories.


