How the browser renders a web page
My thinking: if I'm going to build websites that are fast and reliable, I need to really understand the mechanics of each step a browser goes through to render a web page, so that each can be considered and optimised during development. This post summarises what I learned about the end-to-end process, at a fairly high level.
A very helpful resource was the article "How Browsers Work: Behind the scenes of modern web browsers" by Paul Irish and Tali Garsiel. It's from 2011, but many of the fundamentals of how browsers work remain relevant at the time of writing this blog post.
Ok, here we go. The process can be broken down into these main stages:
- Start to parse the HTML
- Fetch external resources
- Parse the CSS and build the CSSOM
- Merge DOM and CSSOM to construct the render tree
- Calculate layout and paint
1. Start to parse the HTML
The Document Object Model (DOM) is the data representation of the objects that comprise the structure and content of a document on the web.
The first step of the parsing process is to break the HTML down into tokens that represent start tags, end tags, and their contents. From those tokens the browser can construct the DOM.
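To make the tokenise-then-build idea concrete, here is a rough sketch in TypeScript. It is a toy model under heavy assumptions: it only handles simple, well-formed markup, and the names tokenize, buildDom, Token, and DomNode are my own inventions for illustration. A real browser follows the far larger state machine defined in the HTML specification, including error recovery and void elements.

```typescript
// Toy model only: real HTML parsing handles attributes, void elements,
// comments, and malformed markup, none of which are covered here.
type Token =
  | { kind: "start"; tag: string }
  | { kind: "end"; tag: string }
  | { kind: "text"; text: string };

interface DomNode {
  tag: string; // "#document" and "#text" are used for the root and text nodes
  text: string;
  children: DomNode[];
}

// Split a simple HTML string into start-tag, end-tag, and text tokens.
function tokenize(html: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < html.length) {
    if (html.charAt(i) === "<") {
      const close = html.indexOf(">", i);
      if (close === -1) break; // malformed input: stop rather than guess
      const inner = html.slice(i + 1, close).trim();
      if (inner.charAt(0) === "/") {
        tokens.push({ kind: "end", tag: inner.slice(1).trim().toLowerCase() });
      } else {
        // Keep only the tag name; attributes are ignored in this sketch.
        tokens.push({ kind: "start", tag: inner.split(/\s+/)[0].toLowerCase() });
      }
      i = close + 1;
    } else {
      let next = html.indexOf("<", i);
      if (next === -1) next = html.length;
      const text = html.slice(i, next).trim();
      if (text.length > 0) tokens.push({ kind: "text", text: text });
      i = next;
    }
  }
  return tokens;
}

// Build a tree from the tokens using a stack of currently open elements.
function buildDom(tokens: Token[]): DomNode {
  const root: DomNode = { tag: "#document", text: "", children: [] };
  const open: DomNode[] = [root];
  for (const token of tokens) {
    const parent = open[open.length - 1];
    if (token.kind === "start") {
      const node: DomNode = { tag: token.tag, text: "", children: [] };
      parent.children.push(node);
      open.push(node);
    } else if (token.kind === "end") {
      if (open.length > 1) open.pop(); // never pop the document root
    } else {
      parent.children.push({ tag: "#text", text: token.text, children: [] });
    }
  }
  return root;
}

const sampleDom = buildDom(
  tokenize("<html><body><h1>Hello</h1><p>World</p></body></html>")
);
console.log(JSON.stringify(sampleDom, null, 2));
```

The stack of open elements is what turns a flat list of tokens into a hierarchy: a start tag pushes a new parent, and an end tag pops back to the previous one.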
2. Fetch external resources
The defer attribute on a <script> tag means that the execution of the file will be delayed until the parsing of the document is complete. If multiple files have the defer attribute, they will be executed in the order that they were discovered in the HTML.
The async attribute means that the file will be executed as soon as it loads, which could be during or after the parsing process, and therefore the order in which async scripts are executed cannot be guaranteed.
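The difference in ordering is easier to see with numbers. The sketch below is a simplified timing model in TypeScript, not how a browser is actually implemented: the file names, the millisecond values, and the executionOrder function are all hypothetical, and it ignores ordinary blocking scripts and main-thread contention.

```typescript
type LoadMode = "defer" | "async";

interface ScriptTag {
  src: string;          // hypothetical file name
  mode: LoadMode;
  discoveredAt: number; // ms into parsing when the parser reaches the tag
  downloadMs: number;   // assumed network time for the file
}

interface Execution {
  src: string;
  at: number;       // modelled time at which the script runs
  docOrder: number; // position in the document, used to break ties
}

// Scripts must be passed in document order.
function executionOrder(scripts: ScriptTag[], parseDoneAt: number): Execution[] {
  const runs: Execution[] = [];
  let lastDeferAt = parseDoneAt; // deferred scripts never run before parsing ends
  for (let i = 0; i < scripts.length; i++) {
    const s = scripts[i];
    const readyAt = s.discoveredAt + s.downloadMs;
    if (s.mode === "async") {
      // async: runs as soon as its own download finishes
      runs.push({ src: s.src, at: readyAt, docOrder: i });
    } else {
      // defer: waits for parsing and for every earlier deferred script
      lastDeferAt = Math.max(lastDeferAt, readyAt);
      runs.push({ src: s.src, at: lastDeferAt, docOrder: i });
    }
  }
  runs.sort((a, b) => a.at - b.at || a.docOrder - b.docOrder);
  return runs;
}

const order = executionOrder(
  [
    { src: "a.js", mode: "defer", discoveredAt: 10, downloadMs: 200 },
    { src: "b.js", mode: "defer", discoveredAt: 20, downloadMs: 30 },
    { src: "c.js", mode: "async", discoveredAt: 30, downloadMs: 20 },
    { src: "d.js", mode: "async", discoveredAt: 40, downloadMs: 300 },
  ],
  100 // assume parsing finishes at 100 ms
);
console.log(order.map((run) => run.src));
// [ 'c.js', 'a.js', 'b.js', 'd.js' ]
```

In this model b.js finishes downloading long before a.js but still runs after it, because deferred scripts keep their document order. The two async scripts simply run whenever they arrive: c.js during parsing and d.js well after it.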
3. Parse the CSS and build the CSSOM
The CSS Object Model (CSSOM) is a map of all CSS selectors and the relevant properties for each selector in the form of a tree, with a root node and sibling, descendant, child, and other relationships. The CSSOM is very similar to the Document Object Model (DOM). Both of them are part of the critical rendering path, which is the series of steps that must happen to properly render a website.
The CSSOM, together with the DOM, is used to build the render tree, which the browser in turn uses to lay out and paint the web page.
Similar to HTML files and the DOM, when CSS files are loaded they must be parsed and converted to a tree - this time the CSSOM. It describes all of the CSS selectors on the page, their hierarchy and their properties.
Where the CSSOM differs from the DOM is that it cannot be built incrementally, as CSS rules can override each other at different points due to specificity. This is why CSS blocks rendering: until all CSS is parsed and the CSSOM built, the browser can't know where and how to position each element on the screen.
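Here is a small TypeScript sketch of why a partially read stylesheet can give the wrong answer. It is a deliberately reduced model of the cascade: it only understands simple selectors such as p, .intro, p.intro, and #lead, it ignores !important, inheritance, and origins, and the function and type names are hypothetical.

```typescript
interface CssRule {
  selector: string; // simple selectors only, e.g. "p", ".intro", "p.intro", "#lead"
  property: string;
  value: string;
}

interface ElementInfo {
  tag: string;
  id: string;
  classes: string[];
}

// Specificity as [ids, classes, types] for a single simple selector.
function specificity(selector: string): number[] {
  const ids = (selector.match(/#[\w-]+/g) || []).length;
  const classes = (selector.match(/\.[\w-]+/g) || []).length;
  const types = /^[a-zA-Z]/.test(selector) ? 1 : 0;
  return [ids, classes, types];
}

function compareSpecificity(a: number[], b: number[]): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function matches(selector: string, el: ElementInfo): boolean {
  const typeMatch = selector.match(/^[a-zA-Z][\w-]*/);
  if (typeMatch && typeMatch[0] !== el.tag) return false;
  const idMatch = selector.match(/#([\w-]+)/);
  if (idMatch && idMatch[1] !== el.id) return false;
  const classMatches: string[] = selector.match(/\.[\w-]+/g) || [];
  for (const c of classMatches) {
    if (el.classes.indexOf(c.slice(1)) === -1) return false;
  }
  return true;
}

// Pick the winning declaration: highest specificity, then latest in source order.
function computedValue(rules: CssRule[], el: ElementInfo, property: string): string | null {
  let winner: CssRule | null = null;
  for (const rule of rules) {
    if (rule.property !== property || !matches(rule.selector, el)) continue;
    if (
      winner === null ||
      compareSpecificity(specificity(rule.selector), specificity(winner.selector)) >= 0
    ) {
      winner = rule; // on equal specificity the later rule wins
    }
  }
  return winner === null ? null : winner.value;
}

const stylesheet: CssRule[] = [
  { selector: "p", property: "color", value: "black" },
  { selector: ".intro", property: "color", value: "blue" },
  { selector: "p", property: "color", value: "red" },
  { selector: "#lead", property: "color", value: "green" },
];
const paragraph: ElementInfo = { tag: "p", id: "lead", classes: ["intro"] };

// The answer changes depending on how much of the stylesheet has been read.
for (let n = 1; n <= stylesheet.length; n++) {
  console.log(n, computedValue(stylesheet.slice(0, n), paragraph, "color"));
}
// 1 black
// 2 blue
// 3 blue
// 4 green
```

If the browser painted after reading only the first rule, the paragraph would be black and then change colour as later rules arrived, which is the flash of incorrect styling that render blocking is meant to avoid.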
4. Merge DOM and CSSOM to construct the render tree
The render tree is a combination of the DOM and CSSOM, and represents everything that will be rendered onto the page. That does not necessarily mean every node in the render tree will be visually present. For example, nodes styled with opacity: 0 or visibility: hidden are included (and content hidden with opacity: 0 may still be read by a screen reader), whereas those set to display: none are not. Additionally, tags such as <head> that do not contain any visual information are always omitted.
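The filtering rules above can be sketched as a recursive walk over nodes that already have their styles attached. This is an illustrative TypeScript model only: StyledNode, RenderNode, buildRenderTree, and the short list of non-visual tags are assumptions of mine, and a real engine creates richer render objects than this.

```typescript
interface StyledNode {
  tag: string;
  style: { [property: string]: string }; // assumed to be the already computed style
  children: StyledNode[];
}

interface RenderNode {
  tag: string;
  children: RenderNode[];
}

// Illustrative subset of tags that carry no visual information.
const NON_VISUAL_TAGS: string[] = ["head", "script", "meta"];

// Returns null when the node, and therefore its whole subtree, is left out.
function buildRenderTree(node: StyledNode): RenderNode | null {
  if (NON_VISUAL_TAGS.indexOf(node.tag) !== -1) return null;
  if (node.style["display"] === "none") return null;
  // opacity: 0 and visibility: hidden fall through: the node still takes part.
  const children: RenderNode[] = [];
  for (const child of node.children) {
    const rendered = buildRenderTree(child);
    if (rendered !== null) children.push(rendered);
  }
  return { tag: node.tag, children: children };
}

// Flatten a render tree into a list of tags, depth first, for easy inspection.
function listTags(node: RenderNode): string[] {
  let tags: string[] = [node.tag];
  for (const child of node.children) {
    tags = tags.concat(listTags(child));
  }
  return tags;
}

const page: StyledNode = {
  tag: "html",
  style: {},
  children: [
    { tag: "head", style: {}, children: [{ tag: "title", style: {}, children: [] }] },
    {
      tag: "body",
      style: {},
      children: [
        { tag: "h1", style: {}, children: [] },
        { tag: "p", style: { visibility: "hidden" }, children: [] },
        {
          tag: "div",
          style: { display: "none" },
          children: [{ tag: "span", style: {}, children: [] }],
        },
        { tag: "footer", style: { opacity: "0" }, children: [] },
      ],
    },
  ],
};

const renderTree = buildRenderTree(page);
console.log(renderTree === null ? [] : listTags(renderTree));
// [ 'html', 'body', 'h1', 'p', 'footer' ]
```

Note that the span disappears along with its display: none parent, while the invisible p and footer keep their place in the tree.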
5. Calculate layout and paint
Now that we have a complete render tree the browser knows what to render, but not where to render it. Therefore the layout of the page (i.e. every node's position and size) must be calculated. The rendering engine traverses the render tree, starting at the top and working down, calculating the coordinates at which each node should be displayed.
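As a rough illustration of that top-down traversal, the TypeScript sketch below lays out a tree of block boxes by stacking children vertically inside their parent. It is a big simplification and every name in it is hypothetical: there is no margin, padding, inline content, or floats, leaf heights are given up front rather than measured, and every box simply takes the full width it is offered.

```typescript
interface LayoutBox {
  name: string;
  ownHeight: number; // assumed intrinsic height in px, used for leaf boxes only
  children: LayoutBox[];
  x: number;         // the four fields below are filled in by layout()
  y: number;
  width: number;
  height: number;
}

// Convenience constructor so the example tree stays readable.
function box(name: string, ownHeight: number, children: LayoutBox[]): LayoutBox {
  return {
    name: name, ownHeight: ownHeight, children: children,
    x: 0, y: 0, width: 0, height: 0,
  };
}

// Place `node` at (x, y) with the given width, stack its children
// vertically inside it, and return the resulting height.
function layout(node: LayoutBox, x: number, y: number, width: number): number {
  node.x = x;
  node.y = y;
  node.width = width;
  let cursorY = y;
  for (const child of node.children) {
    cursorY += layout(child, x, cursorY, width);
  }
  // A parent is as tall as its stacked children; a leaf uses its own height.
  node.height = node.children.length > 0 ? cursorY - y : node.ownHeight;
  return node.height;
}

const footerBox = box("footer", 20, []);
const mainBox = box("main", 0, [box("p", 60, []), box("p", 30, [])]);
const bodyBox = box("body", 0, [box("h1", 40, []), mainBox, footerBox]);

layout(bodyBox, 0, 0, 800); // assume an 800px wide viewport

console.log(mainBox.y, mainBox.height);     // 40 90
console.log(footerBox.y, footerBox.height); // 130 20
console.log(bodyBox.width, bodyBox.height); // 800 150
```

Positions flow down the tree (each child learns its coordinates from its parent) while heights flow back up, which is why the whole render tree has to be available before layout can finish.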
Once that is complete, the final step is to take that layout information and paint the pixels to the screen.
And voila! After all that, we have a fully rendered web page!