Monday, July 16, 2007

Getting Started

To get started, we first need to learn how browsers get web pages. Then we will set up our test machine (a virtual machine) and load several different browsers on it. Let’s start with how browsers get a web page by looking at the protocol they use.

HTTP Protocol

What’s HTTP

Before we start troubleshooting browser issues, we should take a quick look at HTTP. HTTP stands for Hypertext Transfer Protocol. It is the basic protocol that all browsers use to send requests and receive responses in order to display web pages. This request/response exchange happens between a client (the web browser) and the web server. The client sends a request to the web server for a URL (Uniform Resource Locator), for example http://www.mydigitalsplendor.com. The web server takes that request and responds by giving the browser that page (see figure 1.1).

Figure 1.1
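
As a rough sketch of what the browser does with that URL, the Python snippet below turns it into the text of a minimal request. The function name build_get_request is made up for this illustration, and a real browser sends many more header lines than the two shown here.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for url (illustrative helper)."""
    parts = urlsplit(url)
    # An empty path means "the site's default page", which is requested as "/".
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
    ]
    # Every line ends with CRLF, and a blank line marks the end of the request head.
    return "\r\n".join(lines) + "\r\n\r\n"


request_text = build_get_request("http://www.mydigitalsplendor.com")
print(request_text)
```

The host name from the URL ends up on its own line, and the part after the host name (here nothing, so "/") becomes the path the browser asks for.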

HTTP is a stateless protocol, which means that once the request and response are done, the server has no idea what is on the client’s browser and the browser has no idea the server exists, until the next request.

HTTP Cookies

HTTP cookies, sometimes known as web cookies or just cookies, are small blocks of text sent by a server to a web browser and then sent back unchanged by the browser each time it accesses that server. HTTP cookies are used for authenticating, tracking, and maintaining specific information about users, such as site preferences and the contents of their electronic shopping carts.

Allowing users to log in to a website is another use of cookies. Users typically log in by inserting their credentials into a login page; cookies allow the server to know that the user is already authenticated, and therefore is allowed to access services or perform operations that are restricted to logged-in users.

Another use for cookies is to maintain session state with the web browser. But wait a minute, didn’t we just say that HTTP is stateless? So what is this session state stuff? Web application frameworks like ASP.NET (what Online Banking uses) or PHP have mechanisms for storing session information on the server. They give that information a unique ID, the session ID, and hand it to the browser in a cookie as a way of working around the fact that HTTP is stateless.
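
To make the idea concrete, here is a minimal sketch of a session store in Python. It is not how ASP.NET or PHP implement sessions; the names sessions, start_session and handle_request are invented for this example, and the store is just a dictionary in memory.

```python
import secrets
from typing import Optional

# Hypothetical in-memory session store: session ID -> data kept for that visitor.
sessions = {}


def start_session() -> str:
    """Create an empty session and return its hard-to-guess ID."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = {}
    return session_id


def handle_request(session_id_from_cookie: Optional[str]):
    """Return (session ID, session data), creating a session if the ID is unknown."""
    if session_id_from_cookie not in sessions:
        session_id_from_cookie = start_session()
    return session_id_from_cookie, sessions[session_id_from_cookie]


# First request: the browser has no cookie yet, so the server creates a session.
sid, data = handle_request(None)
data["visits"] = 1

# Second request: the browser sends the ID back, so the server finds the same data.
sid_again, data_again = handle_request(sid)
data_again["visits"] += 1
print(sid_again == sid, sessions[sid]["visits"])
```

Each request is still independent as far as HTTP is concerned. The only thing tying them together is the ID the browser sends back.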

So what does a cookie look like? Here is one as the server sends it in a response header:

Set-Cookie: SessionId=732423sdfs73242; expires=Fri, 13-Jul-2007 23:59:59 GMT; path=/; domain=mydigitalsplendor.com;
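
If you want to pick a header like this apart, Python's standard http.cookies module can do it. The snippet below is only a sketch that feeds it the example line from above; the variable names are arbitrary.

```python
from http.cookies import SimpleCookie

header_line = (
    "Set-Cookie: SessionId=732423sdfs73242; "
    "expires=Fri, 13-Jul-2007 23:59:59 GMT; path=/; "
    "domain=mydigitalsplendor.com;"
)

# Drop the "Set-Cookie: " header name; SimpleCookie only wants the value part.
_, _, cookie_text = header_line.partition(": ")

cookie = SimpleCookie()
cookie.load(cookie_text)

morsel = cookie["SessionId"]
print("value:  ", morsel.value)
print("expires:", morsel["expires"])
print("path:   ", morsel["path"])
print("domain: ", morsel["domain"])
```

The first name=value pair is the cookie itself. Everything after it (expires, path, domain) tells the browser how long to keep the cookie and which requests to send it back with.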

HTTP Headers

HTTP headers are how HTTP carries the details of each request and response. Both web browsers and web servers use headers to communicate what they want and what they are giving each other. For example, a request from a web browser will look like the following:

GET / HTTP/1.1
Host: www.mydigitalsplendor.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

Okay, great, that’s what it looks like, but what does it mean? Let’s go through each field and talk about what it means and does.

GET / HTTP/1.1: Hey web server, I want something, and this is the protocol I’m using to communicate with you. The / is the path being requested, which here is the root of the site.

Host: www.mydigitalsplendor.com: This is where I want the page from. The server is set to serve up a specific page when just the domain is given, as in this example, but we could ask for www.mydigitalsplendor.com/blog/default.aspx and get the same thing.

The rest of the header tells the server what the client is and what it can do.

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4: This is simply the type of browser you are using and what operating system you are on. From this User-Agent we can see that the user is on Windows XP (Windows NT 5.1), that the browser is a US English (en-US) build, and that it is Firefox 2.0.0.4, which uses the Gecko HTML rendering engine.
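
The same reading of the User-Agent string can be sketched in Python with a few regular expressions. This is only an illustration for Firefox-style strings like the one above: describe_user_agent and the small WINDOWS_NAMES table are made up for the example, and real User-Agent strings vary far too much for patterns this simple.

```python
import re

user_agent = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) "
    "Gecko/20070515 Firefox/2.0.0.4"
)

# Partial, illustrative mapping of Windows NT version numbers to product names.
WINDOWS_NAMES = {"5.0": "Windows 2000", "5.1": "Windows XP", "6.0": "Windows Vista"}


def describe_user_agent(ua: str) -> dict:
    """Pull a few well-known tokens out of a Firefox-style User-Agent string."""
    nt = re.search(r"Windows NT (\d+\.\d+)", ua)
    locale = re.search(r";\s*([a-z]{2}-[A-Z]{2})\s*[;)]", ua)
    firefox = re.search(r"Firefox/([\d.]+)", ua)
    gecko = re.search(r"Gecko/(\d+)", ua)
    return {
        "os": WINDOWS_NAMES.get(nt.group(1), "unknown") if nt else "unknown",
        "locale": locale.group(1) if locale else None,
        "firefox": firefox.group(1) if firefox else None,
        "gecko_build": gecko.group(1) if gecko else None,
    }


info = describe_user_agent(user_agent)
print(info)
```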

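Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5: These are the content types the browser is willing to receive. This one says it can handle Flash, XML, XHTML, HTML, plain text, and PNG images, and the final */* means it will take anything else as a last resort. The q values rank the browser's preference from 0 to 1, and a type with no q value counts as 1.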

Accept-Language: en-us,en;q=0.5: This is just what you think it is. It tells the server what language the browser is set to use. From this line we can see that the language is English, and it also tells us the country, so not only is it English, it is English as spoken in the US. This helps the server serve the correct content. A good example of this is currency. If the server sees this field and it is en-US, it can use US dollars as the currency, but if it is en-GB (Great Britain), it can use the British pound.
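
Here is a small Python sketch of how a server might read those q values and act on the result. The function names and the CURRENCY_BY_LANGUAGE table are assumptions made up for this example, not part of any framework.

```python
def parse_quality_list(header_value: str) -> list:
    """Split a header such as 'en-us,en;q=0.5' into (name, q) pairs, best first."""
    items = []
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, *params = [piece.strip() for piece in part.split(";")]
        quality = 1.0  # an entry with no q value counts as 1
        for param in params:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                quality = float(value)
        items.append((name, quality))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(items, key=lambda item: item[1], reverse=True)


# Hypothetical lookup table a shop might use.
CURRENCY_BY_LANGUAGE = {"en-us": "USD", "en-gb": "GBP"}


def pick_currency(accept_language: str, default: str = "USD") -> str:
    """Return the currency for the most preferred language that has an entry."""
    for language, _quality in parse_quality_list(accept_language):
        currency = CURRENCY_BY_LANGUAGE.get(language.lower())
        if currency:
            return currency
    return default


print(parse_quality_list("en-us,en;q=0.5"))
print(pick_currency("en-gb,en;q=0.5"))
```

The same parser works on the Accept header, since it uses the same name;q=value layout.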

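Accept-Encoding: gzip,deflate: This tells the server which compression methods the browser can decode, in this case gzip and deflate, so the server may compress the response with either one.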

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7: This is the list of character sets the browser will accept. This one is especially helpful when you see a square box or question mark instead of the character you expected. ISO-8859-1 is the character set known as Western (Latin-1). UTF-8 is a variable-width encoding of Unicode built from 8-bit units. So this browser is telling us it can handle both of them.

Keep-Alive: 300: This one says how long, in seconds, the browser is willing to keep an idle connection open so that it can be reused.
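
The character set problem mentioned for Accept-Charset is easy to reproduce. The Python sketch below encodes a word one way and decodes it another, which is essentially what happens when the server and the browser disagree about the character set. The sample text is arbitrary.

```python
text = "café"

# The server sends UTF-8 bytes...
utf8_bytes = text.encode("utf-8")

# ...the browser reads them with the right character set,
decoded_correctly = utf8_bytes.decode("utf-8")

# ...or with the wrong one, which turns one character into two wrong ones.
decoded_as_latin1 = utf8_bytes.decode("iso-8859-1")

# The opposite mistake: ISO-8859-1 bytes read as UTF-8 are invalid, and the
# decoder substitutes U+FFFD, the "question mark" replacement character.
latin1_bytes = text.encode("iso-8859-1")
decoded_with_replacement = latin1_bytes.decode("utf-8", errors="replace")

# A character that ISO-8859-1 simply does not have becomes a plain "?".
euro_in_latin1 = "€".encode("iso-8859-1", errors="replace")

print(decoded_correctly, decoded_as_latin1, decoded_with_replacement, euro_in_latin1)
```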

Connection: keep-alive: This means the browser would like the connection kept open after the response, so it can be reused for later requests instead of opening a new one each time.
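
Putting the request fields together, here is a minimal Python sketch that splits a raw request into its request line and headers. The parse_request helper is invented for this example and skips everything a real server has to worry about (folded headers, repeated headers, malformed input); the request text is a shortened copy of the one above.

```python
raw_request = (
    "GET / HTTP/1.1\r\n"
    "Host: www.mydigitalsplendor.com\r\n"
    "Accept-Language: en-us,en;q=0.5\r\n"
    "Accept-Encoding: gzip,deflate\r\n"
    "Keep-Alive: 300\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)


def parse_request(raw: str):
    """Return (method, target, version, headers) for a simple request head."""
    # The blank line separates the head from any body.
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


method, target, version, headers = parse_request(raw_request)
print(method, target, version)
print(headers["host"], headers["keep-alive"])
```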

Now that we've sent our request, the server responds with an HTTP header of its own, which looks like the following:

HTTP/1.x 200 OK
Cache-Control: private
Date: Fri, 13 Jul 2007 14:58:43 GMT
Content-Type: text/html
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Encoding: gzip
Vary: Accept-Encoding
Transfer-Encoding: chunked

Let's go over the response like we did the request. It's full of HTTP goodness, so let's check it out.

HTTP/1.x 200 OK: This is the server saying, hey, we're using the same protocol and everything is A-OK with your request. The code 200 is what tells us that.
Cache-Control: private: This tells us that the response is meant for a single user, so the browser may cache it but a shared cache such as a proxy must not.
Date: Fri, 13 Jul 2007 14:58:43 GMT: This is the date and time that the page was served up and sent to the browser.
Content-Type: text/html: Here we have the content type of what is being served, so it's giving the browser an HTML page.
Server: Microsoft-IIS/6.0: This is the kind of server that is giving us the data. Microsoft is the software creator, and IIS/6.0 is the web server product's name and version number.
X-Powered-By: ASP.NET: This field tells us what framework the server is using, which on this server happens to be ASP.NET, which is what you'd expect from an IIS server.
Content-Encoding: gzip: Content-Encoding is telling us that the content coming from the server has been compressed with gzip (HTTP compression) and that the browser is going to have to decompress it to be able to read it.
Vary: Accept-Encoding: This field tells caches that the response varies depending on the Accept-Encoding header of the request, so a copy compressed for one browser should not be handed to a browser that did not ask for that encoding.
Transfer-Encoding: chunked: This field means the message body is sent to the client in chunks, each one stamped with its size. A final chunk of size zero marks the end, so the client can tell when it has received all of the data even though the server never sent a total length.
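
To see how Content-Encoding: gzip and Transfer-Encoding: chunked fit together, here is a small Python sketch that compresses a page, cuts it into chunks the way a server would, and then undoes both steps the way a browser would. The helper names and the tiny HTML page are made up for this example, and the decoder ignores optional extras such as trailers.

```python
import gzip


def encode_chunked(data: bytes, chunk_size: int) -> bytes:
    """Frame data as HTTP chunks: hex size, CRLF, chunk bytes, CRLF."""
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    # A zero-length chunk tells the client that the body is complete.
    out += b"0\r\n\r\n"
    return bytes(out)


def decode_chunked(body: bytes) -> bytes:
    """Reassemble the original bytes from a chunked body."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line may carry ";extensions", which this sketch ignores.
        size_text = body[pos:line_end].split(b";")[0].decode("ascii")
        size = int(size_text, 16)
        if size == 0:
            break
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


html = b"<html><body>Hello from the server</body></html>"

# Server side: compress first, then send the compressed bytes in chunks.
wire_body = encode_chunked(gzip.compress(html), chunk_size=16)

# Browser side: undo the chunking first, then decompress.
received = gzip.decompress(decode_chunked(wire_body))
print(received == html)
```

Note the order: the transfer encoding is the outer layer, so the browser removes the chunk framing before it decompresses the content.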

In Part II we'll go over browsers: the good, the bad, and the ugly.
7/16/2007 5:32:47 PM (Pacific Daylight Time, UTC-07:00)