Monday, February 1, 2016

Tags 2015

Meta Tags





Meta tags were used to help search engines organize the Web. Documents listed keywords and descriptions that were used to match user queries. Initially these tags were somewhat effective, but over time, marketers exploited them and they lost their relevancy.

People began to stuff incredibly large amounts of data (which was frequently off topic) into these tags to achieve high search engine rankings. Porn and other high-margin websites published meta tags like “free, free, free, free, Disney, free.” Getting a better ranking simply meant you repeated your keywords a few more times in the meta tags.


It did not help that, during the first Web bubble, stocks were valued on eyeballs rather than profits. People were busy trying to buy any type of exposure they could, which made it exceptionally profitable to spam search engines and show off-topic banners on websites.


The Internet bubble burst. What caused such a fast economic recovery was the shift from selling untargeted ad impressions to selling targeted leads. This meant that webmasters lost much of their incentive for trying to get any kind of traffic they could. Suddenly it made far greater sense to try to get niche-targeted traffic.

In 1998, Overture pioneered the pay-per-click business model that nearly all major search engines now rely on. Google AdWords enhanced the model by adding a few more variables to the equation, the most important being the factoring of ad click-through rate (CTR) into the ad ranking algorithm.

Google extended the targeted advertisement marketing by delivering relevant contextual advertisements on publisher websites via the Google AdSense program.

More and more ad spending is coming online because it is easy to track the return on investment. As search algorithms continue to improve, the value of having well-cited, original, useful content increases daily.


Instead of relying exclusively on page titles and meta tags, search engines now index the entire page content. Since search engines can view entire pages, hidden inputs (such as meta tags) have lost much of their importance in relevancy algorithms.


The best way for search engines to provide relevant results is to emulate a user and rank the page based on the same things users see and do (Do users like this website? Do they quickly hit the back button?), and on what other people are saying about the document (For example, does anybody link to this page or site? Who is linking at it? What is the link text? And so on.).



Search engines make billions of dollars each year selling ads. Most search engine traffic goes to the free, organically listed sites. The ratio of traffic distribution is going to be keyword dependent and search engine dependent, but I believe about 85% of Google’s traffic clicks on the organic listings. Most other search engines display ads a bit more aggressively than Google does. In many of those search engines, organic listings get around 70% of the traffic. Some sites rank well on merit, while others are there due exclusively to ranking manipulation.

In many situations, a proper SEO campaign can provide a much greater ROI than paid ads do. This means that while search engine optimizers—known in the industry as SEOs—and search engines have business models that may overlap, they may also compete with one another for ad dollars. Sometimes SEOs and search engines are friends with each other, and, unfortunately, sometimes they are enemies.

When search engines return relevant results, they get to deliver more ads. When their results are not relevant, they lose market share. Beyond relevancy, some search engines also try to bias the search results to informational sites such that commercial sites are forced into buying ads.

I have had a single page that I have not actively promoted randomly send me commission checks for over $1,000. There is a huge sum of money in manipulating search results. There are ways to improve search engine placement that go with the goals of the search engines, and there are also ways that go against them. Quality SEOs aim to be relevant, whether or not they follow search guidelines.

Many effective SEO techniques may be considered somewhat spammy.
Like anything in life, you should make an informed decision about which SEO techniques you want to use and which ones you do not (and the odds are, you care about learning the difference, or you wouldn’t be reading this).

You may choose to use highly aggressive, “crash and burn” techniques, or slower, more predictable, less risky techniques. Most industries will not require extremely aggressive promotional techniques. Later on I will try to point out which techniques are which.


While there will always be ways to manipulate the search engines, there is no telling if you will eventually get caught and lose your rankings if you optimize your site using overtly deceptive techniques. In any business such as SEO, there will be different risk levels.

Search engines try hard not to flag false positives (label good sites as spam), so there is usually a bunch of slack to play with, but many people also make common mistakes, like incorrectly using a 302 redirect, not using specific page titles on their pages, or allowing spiders to index multiple URLs with the same content. If you are ever in doubt whether you are making technical errors, feel free to search a few SEO forums or ask me.
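To make the redirect point concrete, here is a minimal Python sketch (standard library only) of issuing a permanent 301 redirect instead of a temporary 302. The paths are hypothetical placeholders, and a real site would normally configure this at the web server level rather than in application code.

```python
# Minimal sketch of a permanent (301) redirect with Python's standard library.
# The paths "/old-page" and "/new-page" are hypothetical examples.
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old-page":
            # 301 tells search engines the move is permanent, so ranking signals
            # should consolidate onto the new URL; a 302 signals a temporary move
            # and may leave the old URL lingering in the index.
            self.send_response(301)
            self.send_header("Location", "/new-page")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"canonical content")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), RedirectHandler).serve_forever()
```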

The search engines aim to emulate users. If you design good content for users and build a smart linking campaign, eventually it will pay off.

New aggressive techniques pop up all the time. As long as they are available, people will exploit them. People will force the issue until search engines close the loophole, and then people will find a new one. The competitive nature of web marketing forces search engines to continuously improve their algorithms and filters.

In my opinion, the ongoing effort of keeping up with the latest SEO tricks is usually not worth it for most webmasters. Some relational database programmers and people with creative or analytical minds may always be one step ahead, but the average business owner probably does not have the time to dedicate to keeping up with the latest tricks.

Tying ethics to SEO techniques is a marketing scam. Either a technique is effective, or it is not. There is nothing unethical about being aggressive. You probably do not want to take big risks with domains you cannot afford to have blacklisted, but there is nothing wrong with owning a few test sites.


Some sites that are not aggressively promoted still fall out of favor on occasion. Even as a webmaster following Google’s guidelines, you cannot expect Google to owe you free traffic. You have to earn it by getting others to cite your website.


The effects of SEO do take time to kick in. At any given time, considering how dynamically the web changes, there will be some holes in search algorithms that make certain SEO techniques exceptionally effective.

I have spoken with engineers currently working at major search engines in regard to this e-book. I have also spoken with database programmers who later became some of the world’s most technically advanced SEOs. Some of these programmers have told me what some would consider tricks that work really well, but they only work really well because few people know about them.

I do not try to promote the latest search spamming techniques in this e-book for the following reasons:

       They are the most likely to quickly change. Some things that are cutting-edge and effective today can become ineffective and actually hurt you tomorrow.
       Some of them can be damaging to your brand.

       Aggressive techniques are some of the most likely techniques to get your site banned.

       Some things are told to me as a secret, and if they are made openly available to anyone (including search engine engineers—some who have read this e-book), then they lose their value, and I lose my friends and resources.

       I do not have a lot of experience with exceptionally aggressive promotional techniques, as I have not needed them to rank well in most of the markets I have worked in.

       People who use aggressive techniques are not evil or bad, but I cannot possibly put accurate, current, useful, and risky information out to everyone in an e-book format and expect it to not cause problems for some people.

       To me, effective web promotion is balancing risk versus reward. SEOBook.com got on the first page of Google for SEO within nine months of making the site, with less than $5,000 spent on promotion. Most sites do not need to use overly aggressive and risky promotional techniques. SEO works so well because most sites on the web do not actively practice effective SEO.



Origins of the Web

The Web started off behind the idea of the free flow of information as envisioned by Tim Berners-Lee. He was working at CERN in Europe. CERN had a somewhat web-like environment in that many people were coming and going and worked on many different projects.

Tim created a site that described how the Web worked and placed it live on the first server at info.cern.ch. Europe had very little backing or interest in the Web back then, so U.S. colleges were the first groups to set up servers. Tim added links to their server locations from his directory known as the Virtual Library.

Current link popularity measurements usually show that college web pages have higher value than most other pages do. This is simply a function of the following:

       The roots of the WWW started in lab rooms at colleges. It was not until the mid to late 1990s that the Web became commercialized.

       The web contains self-reinforcing social networks.

       Universities are pushed as sources of authority.

       Universities are heavily funded.


       Universities have quality controls on much of their content.


The Web did not have sophisticated search engines when it began. The most advanced information gatherers of the day primitively matched file names. You had to know the name of the file you were looking for to find anything. The first file that matched was returned. There was no such thing as search relevancy. It was this lack of relevancy that led to the early popularity of directories such as Yahoo!.

Search engines such as AltaVista, and later Inktomi, were industry leaders for a period of time, but the rush to market and the lack of sophistication in search and online marketing prevented these primitive machines from having functional business models.

Overture was launched as a pay-per-click search engine in 1998. While the Overture system (now known as Yahoo! Search Marketing) was profitable, most portals were still losing money. The targeted ads they delivered grew in popularity and finally created a functional profit generating business model for large-scale general search engines.

As the Internet grew in popularity, people realized it was an incredibly cheap marketing platform. Compare the price of spam (virtually free) to direct mail (~ $1 each). Spam fills your inbox and wastes your time.

Information retrieval systems (search engines) must also fight off aggressive marketing techniques to keep their search results relevant. Search engines market their problems as spam, but the problem is that they need to improve their algorithms.

It is the job of search engines to filter through the junk to find and return relevant results.

There will always be someone out there trying to make a quick buck. Who can fault some marketers for trying to find holes in parasitic search systems that leverage others’ content without giving any kickback?


Though I hate to quote a source I do not remember, I once read that one in three people believe the top search result is the most relevant document relating to their search. Imagine the power associated with people finding your view of the world first. Whatever you are selling, someone is buying!

I have been quoted as a source of information on Islam simply because I wrote about a conversation I had with a person from Kuwait who called me for help on the web. I know nothing about Islam, but someone found my post in a search engine…so I was quoted in their term paper. College professors sourced some sites I am embarrassed to admit I own.

Sometimes good things happen to you and sometimes the competition gets lucky. Generally the harder you work, and the more original and useful your site is, the more often you will get lucky.


As easy as it is to get syndicated with useful, interesting, and unique information, it is much harder to get syndicated with commercial ideas, especially if the site does not add significant value to a transaction. Oftentimes, links associated with commercial sites are business partnerships.

Many people do well to give information away and then attach a product to their business model. You probably would have never read this e-book if I did not have a blog associated with it. On the same note, it would also be significantly easier for me to build links to SEOBook.com if I did not sell this e-book on it.

Depending on your skills, faults, and business model, sometimes it is best to make your official voice one site and then sell stuff on another, or add the commercial elements to the site after it has gained notoriety and trust. Without knowing you, it is hard to advise you which road to take, but if you build value before trying to extract profits, you will do better than if you do it the other way around.

If my site was sold as being focused on search and I wrote an e-book or book about power searching, it would be far easier for me to get links than running a site about SEO. For many reasons, the concept of SEO is hated in many circles. The concept of search is much easier to link at.

Sometimes by broadening, narrowing, or shifting your topic it becomes far easier for people to reference you.


As the Web grew, content grew faster than technology did. The primitive nature of search engines promoted the creation of content, but not the creation of quality content. Search engines had to rely on the documents themselves to state their purpose. Most early search engines did not even use the full page content, relying instead on page title and document name to match results. Then along came meta tags.




Search Interface

The search algorithm and search interface are used to find the most relevant document in the index based on the search query. First the search engine tries to determine user intent by looking at the words the searcher typed in.

These terms can be stripped down to their root level (e.g., dropping ing and other suffixes) and checked against a lexical database to see what concepts they represent. Terms that are a near match will help you rank for other similarly related terms. For example, using the word swims could help you rank well for swim or swimming.

Search engines can try to match keyword vectors with each of the specific terms in a query. If the search terms occur near each other frequently, the search engine may understand the phrase as a single unit and return documents related to that phrase.


WordNet is the most popular lexical database. At the end of this chapter there is a link to a Porter Stemmer tool if you need help conceptualizing how stemming works.
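If you want to see stemming in action, the short Python sketch below runs a few words through NLTK’s Porter stemmer (this assumes the nltk package is installed). It is only an illustration of the suffix-stripping idea, not of how any particular engine implements it.

```python
# Illustrative only: collapse query and document terms to a shared root form.
from nltk.stem import PorterStemmer  # assumes nltk is installed

stemmer = PorterStemmer()
for word in ["swim", "swims", "swimming"]:
    print(word, "->", stemmer.stem(word))
# All three map to the root "swim", so a page using "swims" can be
# matched against a query for "swimming".
```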

Searcher Feedback

Some search engines, such as Google and Yahoo!, have toolbars and systems like Google Search History and My Yahoo!, which collect information about a user. Search engines can also look at recent searches, or what the search process was for similar users, to help determine what concepts a searcher is looking for and what documents are most relevant for the user’s needs.

As people use such a system it takes time to build up a search query history and a click-through profile. That profile could eventually be trusted and used to

       aid in search personalization

       collect user feedback to determine how well an algorithm is working

       help search engines determine if a document is of decent quality (e.g., if many users visit a document and then immediately hit the back button, the search engines may not continue to score that document well for that query).

I have spoken with some MSN search engineers and examined a video about MSN search. Both experiences strongly indicated a belief in the importance of user acceptance. If a high-ranked page never gets clicked on, or if people typically quickly press the back button, that page may get demoted in the search results for that query (and possibly related search queries). In some cases, that may also flag a page or website for manual review.

As people give search engines more feedback and as search engines collect a larger corpus of data, it will become much harder to rank well using only links. The more satisfied users are with your site, the better your site will do as search algorithms continue to advance.

Real-Time versus Prior-to-Query Calculations

In most major search engines, a portion of the relevancy calculations are stored ahead of time. Some of them are calculated in real time.

Some things that are computationally expensive and slow processes, such as calculating overall inter-connectivity (Google calls this PageRank), are done ahead of time.
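For the curious, here is a toy Python sketch of that kind of prior-to-query calculation: PageRank computed offline by power iteration over a tiny made-up link graph. Real systems operate on billions of pages and layer many refinements on top of this.

```python
# Toy PageRank by power iteration over a hypothetical three-page link graph.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# Hypothetical site: pages A and B link to C, and C links back to A.
print(pagerank({"A": ["C"], "B": ["C"], "C": ["A"]}))
```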

Many search engines have different data centers, and when updates occur, they roll from one data center to the next. Data centers are placed throughout the world to minimize network lag time. Assuming it is not overloaded or down for maintenance, you will usually get search results from the data centers nearest you. If those data centers are down or if they are experiencing heavy load, your search query might be routed to a different data center.

Search engines such as Google and Yahoo! may update their algorithm dozens of times per month. When you see rapid changes in your rankings, it is usually due to an algorithmic shift, a search index update, or something else outside of your control. SEO is a marathon, not a sprint, and some of the effects take a while to kick in.

Usually, if you change something on a page, it is not reflected in the search results that same day. Linkage data also may take a while to have an effect on search relevancy as search engines need to find the new links before they can evaluate them, and some search algorithms may trust links more as the links age.

The key to SEO is to remember that rankings are always changing, but the more you build legitimate signals of trust and quality, the more often you will come out on top.


The more times a search leads to desired content, the more likely a person is to use that search engine again. If a search engine works well, a person does not just come back, they also tell their friends about it, and they may even download the associated toolbar. The goal of all major search engines is to be relevant. If they are not, they will fade (as many already have).


Search engines make money when people click on the sponsored advertisements. In the search result below you will notice that both Viagra and Levitra are bidding on the term Viagra. The area off to the right displays sponsored advertisements for the term Viagra. Google gets paid whenever a searcher clicks on any of the sponsored listings.

The white area off to the left displays the organic (free) search results. Google does not get paid when people click on these. Google hopes to make it hard for search engine optimizers (like you and me) to manipulate these results, both to keep relevancy as high as possible and to encourage people to buy ads.

Later in this e-book we will discuss both organic optimization and pay-per-click marketing.



The Index

The index is where the spider-collected data are stored. When you perform a search on a major search engine, you are not searching the web, but the cache of the web provided by that search engine’s index.

Search engines organize their content in what is called a reverse index. A reverse index sorts web documents by words. When you search Google and it displays 1-10 out of 143,000 websites, it means that there are approximately 143,000 web pages that either have the words from your search on them or have inbound links containing them. Also, note that search engines do not store punctuation, just words.

The following is an example of a reverse index and how a typical search engine might classify content. While this is an oversimplified version of the real thing, it does illustrate the point. Imagine each of the following sentences is the content of a unique page:

The dog ate the cat.

The cat ate the mouse.
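Here is a rough Python sketch of the reverse index those two pages might produce. It lowercases the words, drops punctuation, and records word positions; the page names are just placeholders.

```python
# Build a tiny inverted index (word -> {page_id: [positions]}) from the
# two example "pages" above. Punctuation is dropped and words lowercased.
import re
from collections import defaultdict

pages = {
    "page1": "The dog ate the cat.",
    "page2": "The cat ate the mouse.",
}

index = defaultdict(dict)
for page_id, text in pages.items():
    words = re.findall(r"[a-z]+", text.lower())
    for position, word in enumerate(words):
        index[word].setdefault(page_id, []).append(position)

print(dict(index))
# e.g. 'cat' -> {'page1': [4], 'page2': [1]}, 'mouse' -> {'page2': [4]}
```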

Since search engines view pages from their source code in a linear format, it is best to move JavaScript and other extraneous code to external files to help move the page copy higher in the source code.


Some people also use Cascading Style Sheets (CSS) or a blank table cell to place the page content ahead of the navigation. As far as how search engines evaluate what words are first, they look at how the words appear in the source code. I have not done significant testing to determine if it is worth the effort to make your unique page code appear ahead of the navigation, but if it does not take much additional effort, it is probably worth doing. Link analysis (discussed in depth later) is far more important than page copy to most search algorithms, but every little bit can help.

Google has also hired some people from Mozilla and is likely working on helping their spider understand how browsers render pages. Microsoft has published research on visual page segmentation that may help them understand which page content is most important.

As well as storing the position of a word, search engines can also store how the data are marked up. For example, is the term in the page title? Is it a heading? What type of heading? Is it bold? Is it emphasized? Is it in part of a list? Is it in link text?

Words that are in a heading or are set apart from normal text in other ways may be given additional weighting in many search algorithms. However, keep in mind that it may be an unnatural pattern for your keyword phrases to appear many times in bold and headings without occurring in any of the regular textual body copy. Also, if a page looks like it is aligned too perfectly with a topic (i.e., overly-focused so as to have an abnormally high keyword density), then that page may get a lower relevancy score than a page with a lower keyword density and more natural page copy.
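As a purely hypothetical illustration, the short sketch below scores a term by where it appears in the markup. The weights are invented for the example; real engines tune such factors and blend them with many other signals.

```python
# Hypothetical element weights for illustration only.
ELEMENT_WEIGHTS = {"title": 4.0, "h1": 3.0, "link_text": 2.5, "bold": 1.5, "body": 1.0}

def term_score(occurrences):
    """occurrences: list of element names where the term was found."""
    return sum(ELEMENT_WEIGHTS.get(element, 1.0) for element in occurrences)

# The term appears once in the title, once in an h1, and twice in body copy.
print(term_score(["title", "h1", "body", "body"]))  # 9.0
```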


By storing where the terms occur, search engines can understand how close one term is to another. Generally, the closer the terms are together, the more likely the page with matching terms will satisfy your query.

If you only use an important group of words on the page once, try to make sure they are close together or right next to each other. If words also occur naturally, sprinkled throughout the copy many times, you do not need to try to rewrite the content to always have the words next to one another. Natural sounding content is best.
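Building on the position lists from the earlier index sketch, here is a tiny example of how proximity might be scored. The numbers are made up; the point is simply that stored positions let an engine compare how close query terms sit to one another.

```python
# Smaller minimum distance between two terms generally suggests a better match.
def min_distance(positions_a, positions_b):
    return min(abs(a - b) for a in positions_a for b in positions_b)

# Hypothetical position lists for two query terms in two documents.
print(min_distance([1], [4]))        # 3  -- terms fairly close together
print(min_distance([2, 40], [90]))   # 50 -- terms far apart
```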


Words that are common do not help search engines understand documents. Exceptionally common terms, such as the, are called stop words. While search engines index stop words, they are not typically used or weighted heavily to determine relevancy in search algorithms. If I search for the Cat in the Hat, search engines may insert wildcards for the words the and in, so my search will look like

* cat * * hat.
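A quick sketch of that stop-word handling, using a tiny illustrative stop-word list (real engines use much larger lists and more nuanced handling):

```python
# Common terms are replaced with wildcards before matching.
STOP_WORDS = {"the", "in", "of", "and", "a", "an", "or"}

def normalize_query(query):
    terms = query.lower().split()
    return ["*" if term in STOP_WORDS else term for term in terms]

print(normalize_query("the Cat in the Hat"))
# ['*', 'cat', '*', '*', 'hat']
```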

Different search engines weight page copy differently, and each tolerates a different percentage of page copy composed of a few keyword phrases. Thus, there is no magical page copy length that is best for all search engines.

The uniqueness of page content is far more important than the length. Page copy has three purposes above all others:

       To be unique enough to get indexed and ranked in the search result

       To create content that people find interesting enough to want to link to

       To convert site visitors into subscribers, buyers, or people who click on ads

Not every page is going to make sales or be compelling enough to link to, but if, in aggregate, many of your pages are of high-quality over time, it will help boost the rankings of nearly every page on your site.


Term Frequency (TF) is a weighted measure of how often a term appears in a document. Terms that occur frequently within a document are thought to be some of the more important terms of that document.

If a word appears in every (or almost every) document, then it tells you little about how to discern value between documents. Words that appear frequently will have little to no discrimination value, which is why many search engines ignore common stop words (like the, and, and or).

Rare terms, which only appear in a few or limited number of documents, have a much higher signal-to-noise ratio. They are much more likely to tell you what a document is about.

Inverse Document Frequency (IDF) can be used to further discriminate the value of term frequency to account for how common terms are across a corpus of documents. Terms that are in a limited number of documents will likely tell you more about those documents than terms that are scattered throughout many documents.
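To see TF and IDF interact, here is a compact Python sketch over a tiny made-up corpus. Real scoring functions add length normalization and other refinements, but the intuition is the same: terms that are frequent in a document yet rare across the corpus carry the most information.

```python
import math

corpus = [
    "the dog ate the cat",
    "the cat ate the mouse",
    "the mouse ran away",
]

def tf(term, document):
    words = document.split()
    return words.count(term) / len(words)

def idf(term, documents):
    # no smoothing: assumes the term appears in at least one document
    containing = sum(1 for doc in documents if term in doc.split())
    return math.log(len(documents) / containing)

for term in ["the", "mouse"]:
    print(term, round(tf(term, corpus[1]) * idf(term, corpus), 3))
# "the" appears in every document, so its weight drops to zero;
# "mouse" is rarer across the corpus and scores higher.
```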

When people measure keyword density, they are generally missing some other important factors in information retrieval such as IDF, index normalization, word proximity, and how search engines account for the various element types. (Is the term bolded, in a header, or in a link?)

Search engines may also use technologies like latent semantic indexing to mathematically model the concepts of related pages. Google is scanning millions of books from university libraries. As much as that process is about helping people find information, it is also used to help Google understand linguistic patterns.

If you artificially write a page stuffed with one keyword or keyword phrase, without adding many of the phrases that occur in similar natural documents, you may not show up for many of the related searches, and some algorithms may see your document as being less relevant. The key is to write naturally, using various related terms, and to structure the page well.


Search engines may use multiple reverse indexes for different content. Most current search algorithms tend to give more weight to page title and link text than page copy.

For common broad queries, search engines may be able to find enough quality matching documents using link text and page title without needing to spend the additional time searching through the larger index of page content. Anything that saves computer cycles without sacrificing much relevancy is something you can count on search engines doing.

After the most relevant documents are collected, they may be re-sorted based on interconnectivity or other factors.

Around 50% of search queries are unique, and with longer unique queries, there is greater need for search engines to also use page copy to find enough relevant matching documents (since there may be inadequate anchor text to display enough matching documents).


Goal of Search Engines & How They Work


Many people think search engines have a hidden agenda. This simply is not true. The goal of the search engine is to provide high-quality content to people searching the Internet.

Search engines with the broadest distribution network sell the most advertising space. As I write this, Google is considered the search engine with the best relevancy. Their technologies power the bulk of web searches.


The biggest problem new websites have is that search engines have no idea they exist. Even when a search engine finds a new document, it has a hard time determining its quality. Search engines rely on links to help determine the quality of a document. Some engines, such as Google, also trust websites more as they age.

The following bits may contain a few advanced search topics. It is fine if you do not necessarily understand them right away; the average webmaster does not need to know search technology in depth. Some might be interested in it, so I have written a bit about it with those people in mind. (If you are new to the web and uninterested in algorithms, you may want to skip past this part.)

I will cover some of the parts of the search engine in the next few pages while trying to keep it somewhat basic. It is not important that you fully understand all of it (in fact, I think it is better for most webmasters if they do not worry about things like Inverse Document Frequency, as I ranked well for competitive SEO-related terms without knowing anything about the technical bits of search); however, I would not feel right leaving the information out.



The phrase vector space model, which search algorithms still heavily rely upon today, goes back to the 1970s. Gerard Salton was a well-known expert in the field of information retrieval who pioneered many of today’s modern methods. If you are interested in learning more about early information retrieval systems, you may want to read A Theory of Indexing, which is a short book by Salton that describes many of the common terms and concepts in the information retrieval field.
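As a bare-bones illustration of the vector space idea, the sketch below turns documents and a query into term-count vectors and ranks them by cosine similarity. It is a teaching example only, not how any engine is actually built.

```python
import math
from collections import Counter

def cosine(a, b):
    # a and b are term-count vectors (Counter objects)
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = ["the dog ate the cat", "the cat ate the mouse", "stock markets fell today"]
query = Counter("cat mouse".split())

for doc in docs:
    print(round(cosine(Counter(doc.split()), query), 3), "-", doc)
# The cat/mouse document scores highest; the finance document scores 0.
```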

Mike Grehan’s book, Search Engine Marketing: The Essential Best Practices Guide, also discusses some of the technical bits to information retrieval in more detail than this book. My book was created to be a current how-to guide, while his is geared more toward giving information about how information retrieval works.


While there are different ways to organize web content, every crawling search engine has the same basic parts:

       a crawler

       an index (or catalog)

       a search interface


The crawler does just what its name implies. It scours the web following links, updating pages, and adding new pages when it comes across them. Each search engine has periods of deep crawling and periods of shallow crawling. There is also a scheduler mechanism to prevent a spider from overloading servers and to tell the spider what documents to crawl next and how frequently to crawl them.

Rapidly changing or highly important documents are more likely to get crawled frequently. The frequency of crawl should typically have little effect on search relevancy; it simply helps the search engines keep fresh content in their index. The home page of CNN.com might get crawled once every ten minutes. A popular, rapidly growing forum might get crawled a few dozen times each day. A static site with little link popularity and rarely changing content might only get crawled once or twice a month.
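Here is a toy sketch of that crawl loop in Python: a frontier queue, a politeness delay, and a naive regex for link discovery. Production crawlers add robots.txt handling, per-host scheduling, revisit priorities for fresh pages, and much more; the seed URL is just a placeholder.

```python
# Minimal breadth-first crawl sketch using only the standard library.
import re
import time
import urllib.request
from collections import deque

def crawl(seed, max_pages=10, delay=1.0):
    frontier, seen = deque([seed]), {seed}
    while frontier and len(seen) <= max_pages:
        url = frontier.popleft()
        try:
            html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue                      # skip unreachable documents
        for link in re.findall(r'href="(https?://[^"]+)"', html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
        time.sleep(delay)                 # politeness: do not overload servers
    return seen

print(crawl("https://example.com/"))
```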

The biggest benefit of having a frequently crawled page is that you can get your new sites, pages, or projects crawled quickly by linking to them from a powerful or frequently changing page.


