Saturday, 10 March 2012

Why are cookies used on web sites? (Part 2)

Web pages are transferred from the web server to the client’s browser using the Hypertext Transfer Protocol (HTTP). The pages themselves are coded in Hypertext Markup Language (HTML), which the browser renders to create the web page on the screen. When Sir Tim Berners-Lee developed HTTP at CERN, the particle physics laboratory on the French-Swiss border, it was a stateless protocol: each transfer of a single web page was a self-contained transaction, independent of any other, so it was not possible to carry information from one web page to another.

It was not long before information was being exchanged between web pages by using Uniform Resource Locator (URL) encoding in a GET request or in the body of a POST request. GET and POST are two of the HTTP request methods used by the client to request a resource from the web server. The URL of the requested resource is contained in the request line, which is the first line of the request.

Sample GET Request showing URL encoding

GET /path/script.cgi?field1=value1&field2=value2 HTTP/1.0

From: someuser@internetuserl.com

User-Agent: HTTPTool/1.0

[blank line here]

Sample POST Request showing data in the body of the request

POST /path/script.cgi HTTP/1.0

From: someuser@internetuserl.com

User-Agent: HTTPTool/1.0

Content-Type: application/x-www-form-urlencoded

Content-Length: 32

[blank line here]

home=Cosby&favorite+flavor=flies

When using a GET request the transferred data is visible in the address bar of the browser; in a POST request the data is carried in the body of the request and is not shown there.
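
As a rough sketch of the two samples above, the same URL encoding can be reproduced with Python's standard urllib.parse module. The field names and values are the ones from the samples; the variable names are only illustrative.

```python
from urllib.parse import urlencode, parse_qs

# Form fields taken from the two sample requests above.
get_fields = {"field1": "value1", "field2": "value2"}
post_fields = {"home": "Cosby", "favorite flavor": "flies"}

# GET: the encoded data is appended to the URL after a "?".
get_request_line = f"GET /path/script.cgi?{urlencode(get_fields)} HTTP/1.0"

# POST: the same encoding is used, but the data travels in the body,
# and Content-Length is the number of bytes in that body.
post_body = urlencode(post_fields)
content_length = len(post_body.encode("ascii"))

# The receiving script reverses the encoding to recover the fields.
decoded = parse_qs(post_body)

print(get_request_line)
print(post_body, content_length)
print(decoded)
```

Note that the space in "favorite flavor" is encoded as "+", which is why the body in the sample reads favorite+flavor.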

However, these methods of transferring data are transient and don’t provide the persistence of data that is required for a more complex web application and for a personalised experience. As web pages are rendered on the client machine, a technique was developed of using variables that are stored in the client’s browser; these variables are known as cookies.

The document Request for Comments (RFC) 2109, February 1997, deals with the HTTP State Management Mechanism and describes the two new headers introduced to HTTP, Cookie and Set-Cookie. The Cookie header is used in a request to send a cookie to the server; the Set-Cookie header is used in a response to set a cookie on the client browser.
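
The exchange of the two headers can be sketched with Python's standard http.cookies module. This is only an illustration of the header shapes; the cookie name session_id and the value abc123 are made up for the example.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for a response.
# The cookie name "session_id" and value "abc123" are made up for illustration.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
set_cookie_header = outgoing.output()

# Client side: on later requests the browser returns name=value pairs
# in a single Cookie header.
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in outgoing.items()
)

# Server side again: parse the Cookie header value sent by the client.
incoming = SimpleCookie()
incoming.load(cookie_header.split(": ", 1)[1])
session_id = incoming["session_id"].value

print(set_cookie_header)
print(cookie_header)
```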

In RFC 2109, 3rd party cookies were not allowed; however, this was ignored by some companies, and RFC 2965 (October 2000) and RFC 6265 (April 2011) have since redefined the HTTP State Management Mechanism.

There are a number of controls built into session management with cookies to try to protect the user, such as the rule that a cookie should only be read by the domain that created it. However, these controls can be bypassed. The newer attributes introduced into the cookie headers in later RFCs are meant to limit the exploitation of cookies, but the browsers themselves can still be exploited to give up cookie information.

There are a number of types of cookies:

Session cookie

A session cookie only lasts whilst the website that created it is in use. A session cookie is created when no Expires or Max-Age attribute is provided at its creation; a browser should delete session cookies when it quits.

Persistent cookie

A persistent cookie outlasts its session, retaining information until its Expires date or Max-Age is reached, allowing information to be exchanged across multiple sessions with the same domain.

Third party cookie

A third party cookie is one set with a domain that is not the same as the domain of the web site visited.
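
The three types can be told apart with a few lines of Python. This is a simplified sketch, not how any particular browser does it: the Cookie class and both helper functions are hypothetical, and the third party test is a plain suffix comparison, whereas real browsers compare registrable domains.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cookie:
    """Hypothetical minimal cookie record used only for this illustration."""
    name: str
    domain: str
    max_age: Optional[int] = None  # lifetime in seconds, if given
    expires: Optional[str] = None  # expiry date string, if given


def is_session_cookie(cookie: Cookie) -> bool:
    # No Expires and no Max-Age means the cookie ends with the session.
    return cookie.max_age is None and cookie.expires is None


def is_third_party(cookie: Cookie, visited_host: str) -> bool:
    # Simplification: first party if the visited host equals the cookie
    # domain or is a sub-domain of it; anything else is third party.
    cookie_domain = cookie.domain.lstrip(".").lower()
    host = visited_host.lower()
    return not (host == cookie_domain or host.endswith("." + cookie_domain))


basket = Cookie("basket", "www.example.com")
prefs = Cookie("prefs", "example.com", max_age=30 * 24 * 3600)
tracker = Cookie("uid", "ads.example.net", max_age=365 * 24 * 3600)

for c in (basket, prefs, tracker):
    kind = "session" if is_session_cookie(c) else "persistent"
    party = "third party" if is_third_party(c, "www.example.com") else "first party"
    print(c.name, kind, party)
```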

Attributes of cookies

Domain & Path

These set the scope of the cookie; it can be a single host, all the hosts in a domain, or a folder and its sub folders within a host if the path is set to a folder other than the root of the domain.

Setting the domain to a top level domain (TLD) such as .com, or to a public suffix such as .co.uk, is not allowed.
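
The scoping rules can be sketched as two small matching functions, loosely following the domain-match and path-match rules in RFC 6265. The function names and the host_only flag are assumptions made for this example, and the sketch leaves out details such as IP address hosts and the check against top level domains.

```python
def domain_matches(host: str, cookie_domain: str, host_only: bool = False) -> bool:
    """Return True if a request to `host` falls within the cookie's domain scope."""
    host = host.lower()
    cookie_domain = cookie_domain.lstrip(".").lower()
    if host_only:
        # No Domain attribute was given: the cookie is limited to a single host.
        return host == cookie_domain
    # Domain attribute given: the host itself and all hosts below it match.
    return host == cookie_domain or host.endswith("." + cookie_domain)


def path_matches(request_path: str, cookie_path: str) -> bool:
    """Return True if `request_path` is the cookie's folder or one of its sub folders."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end on a folder boundary, so "/shop" covers
        # "/shop/cart" but not "/shopping".
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


print(domain_matches("www.example.com", "example.com"))
print(domain_matches("www.example.com", "example.com", host_only=True))
print(path_matches("/shop/cart", "/shop"))
print(path_matches("/shopping", "/shop"))
```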

Expires & Max-Age

These set the persistence of a cookie. If neither is set, the cookie expires at the end of the session; otherwise it is possible to set an exact date for the expiry of the cookie (Expires) or how long in seconds it will last (Max-Age).

Secure cookie

When set, limits the cookie to being transmitted over secure connections only, i.e. HTTPS. It goes without saying that the cookie should only be created within a secure connection.

HttpOnly cookie

Only allows access to the cookie via the HTTP protocol and prevents access from within scripts through the document object model (DOM), e.g. document.cookie.
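
The attributes described above all end up in the Set-Cookie header. As a minimal sketch, Python's standard http.cookies module can build such a header; the cookie name theme, its value and the domain example.com are placeholders chosen for the example.

```python
from http.cookies import SimpleCookie

# Placeholder cookie: the name, value and domain are illustrative only.
cookie = SimpleCookie()
cookie["theme"] = "dark"
cookie["theme"]["domain"] = "example.com"  # scope: all hosts in the domain
cookie["theme"]["path"] = "/"              # scope: whole site
cookie["theme"]["max-age"] = 3600          # persistent for one hour
cookie["theme"]["secure"] = True           # only sent over HTTPS
cookie["theme"]["httponly"] = True         # hidden from document.cookie

header = cookie.output()
print(header)
```

Leaving out max-age (and expires) in this sketch would produce a session cookie instead of a persistent one.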

Cookies are used on web sites to allow session management, personalisation and tracking. Session management allows interaction between web pages to create a web application; typically session cookies that expire at the end of the session are used. Personalisation allows data about the settings used on a web site to be retained by the client, allowing for personalisation without having to get a user to authenticate to the web site every time; persistent cookies are used with a suitable expiry limit. The final use, and the one that causes problems with privacy, is tracking a user: which pages are visited, and the sequence in which they are visited, is logged on every visit to the site; again persistent cookies are used.

Cookies are created by a web server sending the Set-Cookie header to the browser. From then onwards, every time the browser requests a page from that domain the Cookie header is sent as part of the request; this continues until the cookie expires. However, cookies can also be set by a script on a web page manipulating the DOM, if scripting is supported and enabled on the client’s browser.
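
The browser's side of this cycle can be sketched as a very small cookie store. The TinyCookieJar class below is hypothetical and deliberately incomplete: it honours Max-Age only, ignores Expires, Domain and Path, and takes the current time as a parameter so that expiry can be shown without waiting.

```python
import time
from http.cookies import SimpleCookie


class TinyCookieJar:
    """Hypothetical, much simplified model of a browser's cookie store."""

    def __init__(self):
        self._cookies = {}  # name -> (value, expiry timestamp or None)

    def store(self, set_cookie_value, now=None):
        """Record the cookie(s) from the value of a Set-Cookie header."""
        now = time.time() if now is None else now
        parsed = SimpleCookie()
        parsed.load(set_cookie_value)
        for name, morsel in parsed.items():
            max_age = morsel["max-age"]
            expiry = now + int(max_age) if max_age else None
            self._cookies[name] = (morsel.value, expiry)

    def cookie_header(self, now=None):
        """Build the Cookie header value for the next request, dropping expired cookies."""
        now = time.time() if now is None else now
        self._cookies = {
            name: (value, expiry)
            for name, (value, expiry) in self._cookies.items()
            if expiry is None or expiry > now
        }
        return "; ".join(f"{name}={value}" for name, (value, _) in self._cookies.items())


jar = TinyCookieJar()
jar.store("sid=abc", now=0)               # session cookie, no Max-Age
jar.store("visits=1; Max-Age=60", now=0)  # persistent for 60 seconds
print(jar.cookie_header(now=10))   # both cookies are sent
print(jar.cookie_header(now=120))  # the persistent cookie has expired
```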
