
Learn Everything About Website Crawling, Indexing, Ranking And Rendering

In this post, you will gain a sound knowledge of what crawling, indexing, ranking and rendering mean. You will also learn how crawling, rendering and indexing affect Google and Bing SEO.

In Episode 5 of my blogging series, I promised to cover how to make your blog known to robots as well as popular among humans. However, it is imperative that you have an idea of what crawling, indexing, ranking and rendering are all about so you can enjoy the golden Episode 6.

In summary, this is a prerequisite for “Learn Blogging and SEO Episode 6”. You may want to follow the blogging series by clicking here.

Now, if you are yet to jump into the blogging world, click here to start your personal blog. Also, if you have up to $35 or ₦15,000 but are having issues starting a blog, feel free to drop a comment or click here to visit the WordPress blog designing and hosting episode. I will help you start your blog today.

Back to the topic: the difference between crawling, indexing, ranking and rendering. What really are these terms about, and how does knowing them help you? You will see right here. But note that this topic is more about technical concepts and as such may introduce you to strange terms. However, I will simplify things as much as I can so that life will be easy for you.
Crawling, Ranking, Indexing And Rendering

CRAWLING

Talking about crawling, spiders and other crawling creatures come to mind, right? You are not wrong at all. However, this topic takes you to another kind of crawling mechanism.

Now, what Is Crawling? Crawling is the process by which search engines discover updated content on the web, such as new blogs or pages, changes to existing sites or blog posts, and dead links.

Let me simplify this: crawling is the process whereby search engines like Google, Bing and Yahoo use their crawlers or bots to go through the web to find new posts, new blogs, new links and updated posts, as well as to update the cached versions they keep.

After crawling your pages, the Google crawler will decide whether to index them or not. But when you use robots.txt to stop bots from crawling your webpage, they will not be able to search, index or rank your posts. The program that visits your blog pages can be referred to as a ‘crawler’, ‘bot’ or ‘spider’.
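To picture how a crawler obeys robots.txt, here is a small sketch using Python’s built-in robots.txt parser. The rules and URLs are made up purely for illustration; your own blog will have its own rules.

```python
# Illustrative sketch: how a crawler checks robots.txt before
# fetching a page. The rules and URLs below are invented examples.
from urllib import robotparser

rules = """User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A disallowed path: the bot should not fetch it
print(rp.can_fetch("Googlebot", "https://example.com/private/page"))  # False
# An ordinary blog post: the bot is free to crawl it
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))     # True
```

This is why a careless `Disallow: /` line can make your whole blog invisible to search engines.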

INDEXING

What Is Indexing? Once a search engine processes each of the pages it crawls, it compiles a massive index of all the words it sees and their locations on each page. It is essentially a database of billions of web pages.

This extracted content is then stored, with the information then organised and interpreted by the search engine’s algorithm to measure its importance compared to similar pages.

Servers based all around the world allow users to access these pages almost instantaneously. Storing and sorting this information requires significant space, and Microsoft and Google each have over a million servers.
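To make the idea of “all the words it sees and their locations” concrete, here is a toy sketch in Python of an inverted index. The pages and their text are invented for illustration; a real search engine’s index is vastly larger and more sophisticated.

```python
# Toy inverted index: for each word, record which page it appears
# on and at which position. The pages and text are made up.
pages = {
    "page1": "crawling finds new pages",
    "page2": "indexing stores the words on pages",
}

index = {}  # word -> list of (page, position)
for page, text in pages.items():
    for pos, word in enumerate(text.split()):
        index.setdefault(word, []).append((page, pos))

print(index["pages"])  # [('page1', 3), ('page2', 5)]
print(index["words"])  # [('page2', 3)]
```

Looking a word up in the index is what lets the engine answer a query without re-reading the whole web.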

RANKING

What Is Ranking? Every day, you say things like, “I want to rank higher on Google and Bing search results”. But do you really know what ranking is about and how Google ranks Flashlearners? Let’s see…

Once a keyword is entered into a search box, search engines will check for pages within their index that are the closest match; a score will be assigned to these pages based on an algorithm consisting of hundreds of different ranking signals.

These pages (or images & videos) will then be displayed to the user in order of score.

So in order for your site to rank well in search results pages, it’s important to make sure search engines can crawl and index your site correctly – otherwise they will be unable to appropriately rank your website’s content in search results.
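As an illustration only, here is a toy Python sketch of scoring pages by a few signals and sorting by score. Real engines combine hundreds of signals with weights nobody outside the company knows; the signal names and weights below are entirely invented.

```python
# Toy ranking sketch: score each page from a few made-up signals,
# then sort results by score. Signals and weights are invented.
pages = {
    "pageA": {"keyword_matches": 5, "inbound_links": 20, "load_time_s": 1.2},
    "pageB": {"keyword_matches": 8, "inbound_links": 3,  "load_time_s": 0.8},
}

def score(signals):
    # Positive weight for relevance and links, penalty for slowness
    return (3.0 * signals["keyword_matches"]
            + 1.0 * signals["inbound_links"]
            - 2.0 * signals["load_time_s"])

ranked = sorted(pages, key=lambda p: score(pages[p]), reverse=True)
print(ranked)  # ['pageA', 'pageB']
```

Notice that pageA wins despite fewer keyword matches, because its other signals outweigh them – which is why SEO is about more than stuffing keywords.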

What Is Googlebot, Crawling, And Indexing?

Kissmetrics was able to simplify the terms as regards Googlebot crawling and indexing. If Googlebot hadn’t crawled and indexed this Flashlearners page, you wouldn’t have been able to see it as a result in Google search. It is crawling and indexing that make your blog visible in search engines.

The Googlebot is simply the search bot software that Google sends out to collect information about documents on the web to add to Google’s searchable index.

Crawling is the process where the Googlebot goes around from website to website, finding new and updated information to report back to Google. The Googlebot finds what to crawl using links.

Indexing is the processing of the information gathered by the Googlebot from its crawling activities. Once documents are processed, they are added to Google’s searchable index if they are determined to be quality content. During indexing, the Googlebot processes the words on a page and where those words are located. Information such as title tags and ALT attributes is also analyzed during indexing.
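As a rough sketch of what “processing title tags and ALT attributes” could look like, here is a small Python example using the standard library’s HTML parser. The HTML snippet is made up; this is not Google’s actual pipeline, just the general idea.

```python
# Sketch: pull the <title> text and every img ALT attribute out of
# a page, the way an indexer might. The HTML below is invented.
from html.parser import HTMLParser

class TitleAltExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "img":
            for name, value in attrs:
                if name == "alt":
                    self.alts.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

parser = TitleAltExtractor()
parser.feed('<html><head><title>Crawling 101</title></head>'
            '<body><img src="bot.png" alt="a search engine spider"></body></html>')
print(parser.title)  # Crawling 101
print(parser.alts)   # ['a search engine spider']
```

This is also why descriptive titles and ALT text matter for SEO: they are among the few pieces of text a bot can reliably attach to your images and pages.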

RENDERING

What Is Rendering? Rendering produces what you see on your screen while surfing the internet. The rendering engine communicates with the networking layer of the browser to grab HTML code and other items passed from a remote server. The majority of web pages crawled are now being rendered by Google. This page you are reading now is a rendered webpage.

Most times, in fact all the time, Google’s bots block AdSense code, so it cannot be rendered. You can see this when you fetch and render your webpage using Google Search Console (Webmaster Tools).

How A Web Page Is Rendered

With every passing day, you search for things on Google and Bing, get results and then click to view them. Then Flashlearners opens and you read what you searched for. Nice, right? But have you ever wondered about the process it takes for a webpage to open (render)? That is what you are about to learn in the steps below, as adapted from Friendlybit:

  1. You want to search for something on Flashlearners, so you type a URL into the address bar of your Opera Mini, Internet Explorer or Google Chrome.

  2. The browser parses the address you entered to find the protocol, host, port, and path.

  3. It forms an HTTP request. To reach the host, it first needs to translate the human-readable host into an IP number, and it does this by doing a DNS lookup on the host.

  4. Then a socket needs to be opened from the user’s computer to that IP number, on the port specified (most often port 80).

  5. When a connection is open, the HTTP request is sent to the host.

  6. The host forwards the request to the server software (most often Apache) configured to listen on the specified port.

  7. The server inspects the request (most often only the path), and launches the server plugin needed to handle the request (corresponding to the server language you use: PHP, Java, .NET, Python?).

  8. The plugin gets access to the full request, and starts to prepare an HTTP response.

  9. To construct the response, a database is (most likely) accessed. A database search is made, based on parameters in the path (or data) of the request.

  10. Data from the database, together with other information the plugin decides to add, is combined into a long string of text (probably HTML).

  11. The plugin combines that data with some metadata (in the form of HTTP headers), and sends the HTTP response back to the browser.

  12. The browser receives the response, and parses the HTML (which with 95% probability is broken) in the response.

  13. A DOM tree is built out of the broken HTML.

  14. New requests are made to the server for each new resource that is found in the HTML source (typically images, style sheets, and JavaScript files). Go back to step 4 and repeat for each resource.

  15. Stylesheets are parsed, and the rendering information in each gets attached to the matching node in the DOM tree.

  16. JavaScript is parsed and executed, and DOM nodes are moved and style information is updated accordingly.

  17. The browser renders the page on the screen according to the DOM tree and the style information for each node.

  18. You see the page on the screen.

  19. You get annoyed that the whole process was too slow.
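The first few steps above (typing a URL, parsing it, the DNS lookup) can be sketched with Python’s standard library. The URL is made up, and “localhost” is used so the DNS-lookup step works without a network connection.

```python
# Sketch of the early rendering steps: parse an address into
# protocol, host, port and path, then resolve the host to an IP
# number. The URL is an invented example.
import socket
from urllib.parse import urlsplit

parts = urlsplit("http://localhost:8080/search?q=crawling")
print(parts.scheme)    # http  (the protocol)
print(parts.hostname)  # localhost  (the host)
print(parts.port)      # 8080  (the port)
print(parts.path)      # /search  (the path)

# The DNS-lookup step: turn the human-readable host into an IP number
ip = socket.gethostbyname(parts.hostname)
print(ip)  # e.g. 127.0.0.1
```

Only after this resolution can the browser open a socket to that IP and send the actual HTTP request.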

The 7 Key Components Of Your Web Browser

Path Interactive has the following to say about the 7 components of your web browser. You will really enjoy it.

  1. Layout Engine: This is what you click or type into to expect a result. For example, you type into the search box or the space to enter your web address, and it passes this to the rendering engine. This is what the layout engine does.

  2. Rendering Engine: This converts mere code into beautiful pictures and visual displays.

  3. User Interface: This is what you see while using a browser. It is the interface through which you communicate with your browser. It is in the user interface that you search for things or check your bookmarks and browsing history.

  4. JavaScript Engine: This engine takes JavaScript code, parses it, executes it, and returns the results.

  5. Network Layer: This is a function of the browser that happens behind the scenes and handles network functions such as encryption, HTTP and FTP requests, and all network settings such as timeouts and the handling of HTTP status codes.

  6. Storage: Browsers must store some data, which can include cached files and cookies; recently, browsers have been updated to be able to store data and objects created with JavaScript.

  7. Operating System Interface: The browser must interact with the operating system to draw several elements of the page, like drop-down boxes and the chrome of a window (close, maximize, and minimize buttons).

What Does This Mean For SEO?

The fact that Google looks at the fully rendered version of a webpage means that you can no longer look solely at the source code of a site to understand how it is perceived by a search engine spider. You should assume that search engine spiders see the same page you see in your browser as it appears on page load.

SUMMARY: Google crawls your site and then indexes what it sees as a cached version of the page. The page’s design may change between one crawl and the next reindex, hence the term “cache” is used almost as a caveat to say the page may have changed since Google crawled it.

If your webpages aren’t crawled, then they can’t be indexed. Making sure your site can be crawled by bots is a priority. Set up a Google Webmaster Tools account and then submit an XML sitemap to help Google crawl and index your site.
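For illustration, a minimal XML sitemap can be put together with Python’s standard library; the URLs below are placeholders for your own posts, and most blogging platforms will generate this file for you.

```python
# Sketch: build a minimal XML sitemap in the sitemaps.org format.
# The URLs are placeholders for your own blog posts.
import xml.etree.ElementTree as ET

urls = ["https://example.com/", "https://example.com/blog/crawling"]

ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
urlset = ET.Element("urlset", xmlns=ns)
for u in urls:
    url_el = ET.SubElement(urlset, "url")
    ET.SubElement(url_el, "loc").text = u

sitemap = ET.tostring(urlset, encoding="unicode")
print(sitemap)
```

Save the output as `sitemap.xml` at your site’s root, then submit its address in Webmaster Tools so the bots know exactly which pages to crawl.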


About Amakvitaa (5794 Articles)
Web Developer, Educationalist And A Lover Of The Nigerian Students
