Building the web platform by Anne van Kesteren
All right. Thanks a lot for having me. I've been having a lot of fun so far, also with the jam sessions we had earlier. The talks have been great.
I guess my talk might be somewhat similar to what Mathias talked about yesterday: just the little things that actually make up the platform. That's also what I do as a...I was going to say, "for a living," but I'm actually unemployed at the moment. I worked for Opera Software for seven years and I decided I wanted a change. I haven't quite figured out what the change is, but I've been unemployed since August 1 and I'm trying to figure it out. [laughs]
Yes, first, I wanted to pitch a thing. A friend of mine who works at Apple now, he worked at the W3C before, he had this goal. He had it quite a while back. He posted this in March or so, or May, I don't remember. But then my registrar sent me this email that they had these new top-level domains available. You could have .tf. I sort of thought, "What can you make with TF?" Of course you can make WTF, but they don't do single letter. Then I recalled, "Oh, Dean had that thing."
I registered this domain and you can all go there and you can retweet his tweet. That's pretty much the only purpose for it now. I kind of like just setting up domains, they're pretty cheap to get and you can do fun things with them.
That brings me to the next thing. I made sure this exists, as well, so if you go there you get something, but the main point is to discuss URLs a little bit. URLs are the thing I've been working on during my free time, basically, for the last month or two, because there are quite a few problems.
But before we get into that, let's talk a little bit about the terminology. People always call these things different names. The bit that comes before the first colon, I'm going to call the scheme. Then there's the bit that's a domain name; I guess everyone sort of realizes that. And then the path. After the path, you have the query. And then you get the fragment, the last bit. There's a whole lot of different terminology.
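As a rough illustration of that terminology, Python's standard `urllib.parse.urlsplit` breaks a URL into the same pieces. The URL below is a made-up example rather than the one on the slide, and Python calls the domain part `netloc`.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only to show the parts named in the talk.
parts = urlsplit("https://example.org/some/path?key=value#section")

print(parts.scheme)    # the bit before the first colon
print(parts.netloc)    # the domain name
print(parts.path)      # the path
print(parts.query)     # the query, after "?"
print(parts.fragment)  # the fragment, after "#"
```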
But URLs are fun. They're quite complex to comprehend and I had a lot of fun figuring it out. But before we dive into that, let's talk briefly about bytes and code points, which is a thing I researched in the first half of this year. And the thing is, bytes and code points, they're important for URLs and we'll get to that later. This is the basic concept.
This is something I personally struggled with. I couldn't really comprehend that when you have characters on the screen, there's actually an underlying representation of them on the disk, and you get exposed to it in various ways.
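A minimal sketch of that distinction, using the Euro sign that comes up later in the talk: one character on screen is one code point, but UTF-8 stores it as three bytes. The variable names are illustrative only.

```python
text = "€"  # one character on screen

# The abstract identity of the character: a single Unicode code point.
code_points = [hex(ord(c)) for c in text]

# The underlying representation on disk or on the wire: bytes.
utf8_bytes = text.encode("utf-8")

print(len(text), code_points)
print(len(utf8_bytes), utf8_bytes.hex())

# Decoding with the same encoding gets the character back.
print(utf8_bytes.decode("utf-8") == text)
```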
Then there is all the encoding mess and transcoding things. So I researched that subject and I wrote this document called "The Encoding Standard." It's hosted on the WHATWG site, at the URL you can see above. Let's dismiss this thing.
It basically explains that there's a whole ton of legacy encodings, all quite complex, and how to go from one of those encodings to Unicode and back. These are the things that browsers do. It's one of the main themes of my talk: it's important to understand one level of abstraction lower than what you're working with.
So if you work a lot with HTML and writing it out, it's important to have some understanding of bytes and how you get to the HTML, as well, because if you understand that then it's much easier to spot bugs as you encounter them. You have a fuller understanding of what's actually going on in the browser with your product.
Let's not go into this in too much detail. Actually, the most interesting thing is the indexes. In the Asian part of the world, they had all of these legacy encodings. Since they have so many more characters than us, they couldn't fit them all in single-byte encodings.
Here in Europe, we usually have single-byte encodings. The first 128 byte values were mapped to ASCII, and what the remainder was used for depends on the country. In Russia, it was usually filled with Cyrillic. Here in the Netherlands, we would have accented characters. Greece would also have its own variant, et cetera.
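To make the single-byte idea concrete, here is a small sketch that decodes the same two bytes under three legacy encodings. The choice of byte `0xE4` and of these three encoding labels is mine, picked only for illustration.

```python
# 0x41 is in the ASCII half; 0xE4 is in the upper half, whose meaning
# depends on which legacy encoding the bytes are assumed to be in.
raw = bytes([0x41, 0xE4])

decoded = {}
for label in ("windows-1252", "windows-1251", "iso-8859-7"):
    decoded[label] = raw.decode(label)
    print(label, decoded[label])
```

The first byte comes out as "A" every time; only the second one changes with the encoding.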
But there they had their own encodings, multi-byte encodings. When they designed Unicode, they didn't actually map them straight, so you need to have these huge mapping tables. Let's see if I can load a small one. I think this one is not too big. That actually works, and that's quite nice.
You have all these indexes, which you get from Unicode. You go from a couple of bytes, and you get an index via some calculation, which is the traditional index in that encoding, and then it maps to a character. Is it just me, or is it not...yeah, I guess it's just me, because I'm standing so close. It doesn't look that sharp.
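The "bytes to index to character" step can be sketched for one encoding. This assumes the two-byte pointer formula used for EUC-JP, and `INDEX_EXCERPT` is a hypothetical two-entry stand-in for the real table, which has thousands of entries.

```python
# Tiny excerpt of a two-byte index (pointer -> character), for
# illustration only; a real browser ships the full table.
INDEX_EXCERPT = {282: "ぁ", 283: "あ"}

def euc_jp_pointer(lead: int, trail: int) -> int:
    """Compute the index pointer for a two-byte EUC-JP sequence.

    Assumes both bytes are in the range 0xA1..0xFE.
    """
    return (lead - 0xA1) * 94 + (trail - 0xA1)

# Use Python's own codec to obtain the bytes, then redo the lookup by hand.
raw = "あ".encode("euc_jp")
pointer = euc_jp_pointer(raw[0], raw[1])
print(raw.hex(), pointer, INDEX_EXCERPT[pointer])
```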
You have these huge maps, and browsers have all these things built in. There's quite a bit of complexity there already.
Back to URLs. Here's a simple example of a URL. If we inspect it, the first thing you get back is the Euro characters, because once the parser hits them, it replaces those things and puts the result in the DOM. It's pretty clear.
But then if you take a look at the href IDL attribute, you actually get the resolved path. As you can see, a lot of things have been going on there. The URL parser has been going over what you actually put in, and transcoded those characters. Just look at the first bit, it's basically three bytes, and it's the UTF-8 representation of the Euro character, those three bytes.
That's because URLs work with UTF-8 as their representation for characters that are outside of ASCII. You get these kinds of things a lot. Once you go outside of ASCII, you typically get these results.
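The three bytes can be reproduced with Python's `urllib.parse.quote`, which percent-encodes using UTF-8 by default. This shows the encoding convention only; it is not the browser's URL parser.

```python
from urllib.parse import quote, unquote

# The Euro sign becomes its three UTF-8 bytes, each written as %XX.
encoded = quote("€")
print(encoded)

# Reversing the process yields the original character again.
print(unquote(encoded))
```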
Except, it's not quite that simple, of course. You can get weirder things, like this. What happened here is that, the path is actually, as always, UTF-8. But the query string depends. That is because of a lot of the legacy. We didn't always have the UTF-8 convention for URLs.
Form submission and things also went through URLs, so once URLs came to the point that we had to standardize on these characters outside of ASCII, there's already a lot of legacy that was using the local encoding of the server that was serving the document, so if you submitted forms, it had to be encoded in that encoding.
This is in Windows-1252, which you might also know as Latin-1 or ISO-8859-1. It actually encodes the Euro sign as byte 0x80.
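A minimal sketch of that path-versus-query split, assuming the behaviour described above: the path is always percent-encoded as UTF-8, while the query follows the document encoding. `sketch_url` and its parameters are hypothetical names, and this is not the real URL parsing algorithm.

```python
from urllib.parse import quote

def sketch_url(path: str, query: str, document_encoding: str) -> str:
    """Percent-encode a path as UTF-8 and a query in the document encoding."""
    encoded_path = quote(path, encoding="utf-8")
    # Keep "=" and "&" readable so the query structure survives.
    encoded_query = quote(query, safe="=&", encoding=document_encoding)
    return f"{encoded_path}?{encoded_query}"

utf8_page = sketch_url("/€", "q=€", "utf-8")
legacy_page = sketch_url("/€", "q=€", "windows-1252")
print(utf8_page)
print(legacy_page)
```

The path is identical in both results; only the query differs, which is the mismatch a server has to guess its way around.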
It's best as a front end developer to always do your stuff in UTF-8. You won't hit any of these problems. Always send UTF-8 over the wire, always declare it in your document, and then you never get stuff like this, where you have mismatches going on and the server has no clue what's going on.
This matters especially if you have an API endpoint on the server and you want to talk to it from another domain, which you can do these days with CORS and stuff. If your site is encoded in Windows-1252 and you try to talk to this endpoint, it will get the wrong results.
Except, of course, that's not always the case. It's more complicated. It depends on the context. In some contexts, like URLs generated from form submission, URLs in the a element, and most HTML elements, I think, image, script, they all follow the same pattern. There the query component will be encoded according to the document encoding, which should be UTF-8 but isn't always.
In different contexts, UTF-8 is actually forced. If you use XMLHttpRequest, your URLs will always be treated as if they were UTF-8. The same goes for CSS and things like that. They're a special kind of fun.
The reason I've been researching URLs, I guess I should maybe explain it a little bit, is that there are standards for them of course. You have the RFCs for URI and IRI, which is a crappy term. I wanted to improve the term. They actually don't define things.
Like if you have a space in your URL, if you have an a href and you put "hello world" there, with a space in it, it will work everywhere. The browser will parse it, and it will replace that space with %20, and that will go to the server. But those things are not defined, so every browser sort of guesses and has different results.
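The space-to-`%20` replacement can be seen with Python's `urllib.parse.quote`. This is Python's behaviour, shown as an analogy; what each browser does with other odd characters is exactly the undefined part.

```python
from urllib.parse import quote

href = "hello world"      # what the author typed into the attribute
on_the_wire = quote(href) # what ends up being sent to the server
print(on_the_wire)
```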
Those things are also very ill-defined, and they differ greatly between different browser vendors. That's because there's not a single definition for them so everyone is unclear and unsure what to do.
That's why I decided to try to fix that problem once and for all and make sure that URLs work properly and that they work the same in all browsers. It's kind of like a fundamental piece of infrastructure. Twenty years after their inception, we still don't quite know how they work. I think it's a sad thing, but it's a fun challenge too, to make it work, of course.
So this is another side track again. I have a few different things. I want to talk a little bit about copyright because I kind of figured out earlier, after I left Opera, that I felt quite strongly about it. What is this? Not now.
So I'm not a lawyer, and it's not legal advice or anything. I was asked to join the W3C again as an invited expert. I read the document through, and the thing that sort of stung me was that the W3C documents have a fairly restrictive license.
You cannot actually reuse their text in open source software completely. You cannot fork their documents to write other specifications. And being able to do that is exactly, I think, what is important for the web.
The web is for the whole world, and its documentation should be too. People should be able to take a specification and rewrite it or improve it and fix it. If their version is the one that ends up mattering, the one that implementers and people start listening to, then that is the one...
I mean, then it proves itself. Right? It shouldn't be proven by the fact that it's tied to some kind of organization that has restrictions on it. I don't think that is the way it should go.
Actually a lot of the work I've done on the DOM and XMLHttpRequest, it's all based on prior art. It's just researching the subject and improving what already exists. Like URLs, I didn't make them up. I'm just trying to fix their documentation.
I feel pretty strongly that standards should be in the public domain and easily forkable. On the WHATWG, we put most of our stuff on GitHub, so people can actually go to github.com/WHATWG and fork the specifications. You can write new text and do pull requests, the whole thing.
So I declined for now. I'm going to talk with the W3C about this, they have a thing coming up later this month. But, for now, I decided to not join as an invited expert and, instead, do my work at WHATWG, and publish in the public domain. These are the standards for the Internet. They should be free. Like the Internet itself, they should be forkable and everyone should be able to influence them. [applause]
Thanks. Right. I care about forking. All right. That was my little sidetrack there on copyright. We can go back to the complexity stuff now. Actually, I wanted to do the complexity talk about something simple. Otherwise, it gets too complex.
This is a little web page. It has a title, an image, a picture I took. I'm quite proud of this picture. I think it's quite nice. It is somewhere in Greenland, near a small island called Umanak, I recommend visiting it, 1,200 people live there. You get there by helicopter, after a couple of planes. It's really beautiful. You can take this boat tour, at the local hotel, where I don't recommend staying. I recommend taking a tent and camping somewhere, because it's way cheaper.
You can camp for free. The hotel is €150 a night. But the boat tour is really cool. They take you along all these huge ass icebergs. These are way, way tall. I have this thing now. Let me try this. Over there, that is a bird. These things are 10 meters high or so. I have another picture somewhere and you can go with the boat underneath. It's a pretty big boat. Anyway. Iceland is cool and Greenland is too. [laughs]
Let's see. I hope the next slide...yeah, it is simple and there are a lot of layers.
This is a document. As Mathias has explained to you...let me just demonstrate that as well, since we are here. You should have a simple DOM like this. You have the markup to test at the top. Then there is the DOM view there and the render view underneath. This is called the Live DOM Viewer. Mathias probably has a short URL for it, but you can also Google it or you can try to read that. Oh, you can't read the thing at the top, because there is some kind of obscured stuff.
Anyway, what happens here is that the browser just parses this. It's already converted to characters here. The input here is characters, not bytes.
It gets the Doctype. It turns it into a token and the token gets handed off to the parser. The parser does the rest of the work. The parser builds up a tree. Here it sees the Doctype. All right. We'll add the Doctype to the tree. Then it sees an image element and it's like, "Oh, what's going on? That's too soon." It starts popping these other elements in. It's like, "We first need to create an HTML element." That's what the HTML stage does. Then the HTML stage says, "We are done."
Then it gets kicked off to the head insertion phase, which is like, "Hey, that is an image element. We can't do anything with that. But let's insert a head element." Then it gets kicked on. The body says, "Hey, there is no body. Let's insert a body." Then you get to the image thing. It inserts the image and it renders.
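The hand-off between stages can be sketched as a toy tree builder. The mode names follow the HTML parsing algorithm's insertion modes, but everything else here is a heavy simplification of my own: tokens are bare tag names, only start tags are handled, and body content is kept flat.

```python
# Elements this toy treats as belonging in the head.
HEAD_ONLY = {"title", "meta", "link", "style", "script", "base"}

def build_tree(tokens):
    """Return (depth, name) pairs, inserting implied html/head/body."""
    nodes = []
    mode = "initial"
    for token in tokens:
        if mode == "initial":
            mode = "before html"
            if token == "!doctype":
                nodes.append((0, "DOCTYPE"))
                continue
        if mode == "before html":
            nodes.append((0, "html"))      # created even if never written
            mode = "before head"
            if token == "html":
                continue
        if mode == "before head":
            nodes.append((1, "head"))      # implied head element
            mode = "in head"
            if token == "head":
                continue
        if mode == "in head":
            if token in HEAD_ONLY:
                nodes.append((2, token))
                continue
            mode = "after head"            # e.g. an img cannot live in head
        if mode == "after head":
            nodes.append((1, "body"))      # implied body element
            mode = "in body"
            if token == "body":
                continue
        nodes.append((2, token))           # "in body": insert the element
    return nodes

tree = build_tree(["!doctype", "img"])
for depth, name in tree:
    print("  " * depth + name)
```

A real parser also handles end tags, text, end of input and error recovery; this only mirrors the doctype-then-image walkthrough.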
This is not actually valid, because you need a title, as Mathias explained. It doesn't really matter. It works. It also needs an alt attribute. I'm sorry to all the blind people, and Google. [laughs] That is basically what happens in the HTML parser. You get this DOM. Therefore, what I showed here works fine.
What I wanted to talk about were the layers. A browser is very complex software. I wanted to illustrate that a little bit, by just going through and talking a little bit about all these things. Do people want to guess how many layers there are? Who thinks it's more than 10? Who thinks it's more than 20? 30? 40? I don't really know, to be honest. [laughter]
There is a lot of code. Browsers are pretty complex software.
Someone, yesterday, said that Photoshop was pretty complex. I think browsers are more complex. For browsers, you need an engineering team of at least 200 people or so. There is a bunch of code attached to it that is open source. All these people are freaking smart, too. It's a whole lot of work to make a browser and to keep it competitive as well.
For simplicity, we will forget about the network for now. Of course, there's a lot of stuff there. It is nice. But let's just talk about the rendering.
I wanted to skip rendering too, because it's kind of complicated. But, you know, I have to talk about something. [laughs]
All right. If we start at the top, do you see the page? You don't have to see it. You remember it, right? There is a heading and an image. It is pretty simple. If you go backwards in the process, the last thing that happened, or the first, depending on how you count, is the painting. You have some kind of input. It gets painted and you see it on screen. That is simple enough. But it's not exactly that simple, because there is hardware acceleration and all these other things.
Before that, you had the layout tree. That is the kind of thing that was painted. What is important to understand is that the layout tree is not equal to the DOM tree. It is a separate thing that was created. Let me just check. Yeah. CSS and the DOM created the layout tree.
The CSS specs confusingly talk about markup all the time, but the DOM is what matters. That is what is being styled. If you update the DOM, it is reflected in the page. Let me see. The thing is, if you have a "display none" element, it doesn't end up in the layout tree. Otherwise, it would.
The layout tree is kind of like the DOM tree. It consists of a lot of boxes. But it is also not like it. If you have just a single inline element that wraps, in the layout tree you will end up with several inline boxes, one for each line. You get the idea. Right. Yeah.
I wanted to point this out. The layout tree is not quite like the CSS Object Model. The CSS Object Model is what you actually get once the browser fetches the style sheet and parses it: it goes through this process where you basically get a representation of the style sheet. That is what the CSS Object Model is. It is quite similar to the DOM, which is a representation of the markup.
But the layout tree is separate and it's not actually scriptable in any instance. You can query it in various ways, as Peter Paul showed yesterday. You can query it using offsetWidth and offsetHeight, and you can do it on the elements as well. But the interaction between that and CSS is kind of sketchy. Part of that is because the offset stuff came from the DHTML era and it never reconciled with the CSS work very well.
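A toy model of those two differences, under invented assumptions: `Node`, its `lines` field (how many lines an inline element wraps over) and `layout_boxes` are hypothetical, and the result is a flat list rather than a real tree.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    display: str = "block"
    lines: int = 1  # hypothetical: lines an inline element wraps across
    children: list = field(default_factory=list)

def layout_boxes(node):
    """List the boxes a DOM subtree would contribute to layout."""
    if node.display == "none":
        return []  # no box for the element or anything inside it
    if node.display == "inline":
        boxes = [f"{node.name} (inline box {i + 1})" for i in range(node.lines)]
    else:
        boxes = [f"{node.name} ({node.display} box)"]
    for child in node.children:
        boxes.extend(layout_boxes(child))
    return boxes

dom = Node("body", children=[
    Node("h1"),
    Node("p", children=[Node("em", display="inline", lines=2)]),
    Node("div", display="none", children=[Node("img")]),
])
boxes = layout_boxes(dom)
print(boxes)
```

The DOM here has six elements, but the hidden `div` and its `img` produce no boxes, while the single wrapped `em` produces two.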
So, the kind of work that happens in CSS and the kind of work that happens on exposing what happens in CSS is not really...there is not a lot of work going on in that area, actually. I think it is improving now in the CSS working group. I believe Google is investing effort in that area. But, actually, the layout tree is an unknown thing, although it's actually quite important, because that is what we end up seeing.
I'm not sure why I put this away. [laughs] Once the browser has built the style sheet and it has all these rules in it, the rules are matched to the nodes in the DOM, using the selectors. All these little things seem very simple, but they are not. Selectors themselves are already quite complex.
Do people know how selectors work? Could people raise their hands? All right. Quite a few people know how selectors work. Quite a few didn't either, or at least they didn't raise their hands. I will briefly...what happens is that the browser goes through the DOM tree. For each node it sees, it goes through all the selectors in the style sheet and checks whether it matches or not.
To make this a bit more obvious, let's look at the example here and make a selector. All right. We have a style sheet. We have one selector. How many nodes are matched or how many nodes does it traverse through? Let's do that.
It actually goes through five. First it checks whether it matches the HTML element. Doesn't. Head, doesn't. Style, doesn't. Body, doesn't. Image, does. So you have a match. Of course, there is optimization. Browsers have hash tables for tag names and those things. Effectively, we go through all of those and check whether it matches or not.
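The walk can be sketched as a nested loop, ignoring the hash-table optimizations mentioned above. Elements are reduced to tag names and selectors to plain type selectors, which is a simplification for illustration.

```python
def count_checks(elements, selectors):
    """Try every selector against every element; count the attempts."""
    checks = 0
    matches = []
    for element in elements:        # document order
        for selector in selectors:  # every selector in the style sheet
            checks += 1
            if selector == element: # type selectors only in this sketch
                matches.append((selector, element))
    return checks, matches

elements = ["html", "head", "style", "body", "img"]
checks, matches = count_checks(elements, ["img"])
print(checks, matches)
```

The work grows as elements times selectors, which is why large documents with large style sheets get expensive.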
Also, there are not really many other ways you can do it. We add one. Let's add more borders. Now, it also goes through all these elements and checks whether they are matched or not. There could be a second body element. So it doesn't quite know, whether that is the first.
If we insert another body element, it will still have to work. Maybe I should give it a different style, because otherwise I guess you can't see it. We can do something like this.
There you go. There it is. [laughs]
You can do all kinds of weird stuff with the DOM. You can't do this in HTML. There was a question about that yesterday. What happens, if you omit the end tags? Is it faster or not? It does, indeed, not matter much. I think it is actually a little bit faster, because it ends up being ignored. If I add this body here, and then add this tags test, you see it still ends up in the document. So the body end tag is kind of useless. It doesn't do anything. The moment you insert text, it just gets there.
Anyway, now we have a quite complex document. The selectors have started multiplying. Now we have seven elements and three selectors, so you get 21 checks. This is fairly simple, but once you start building Gmail-like applications with thousands and thousands of DOM nodes and thousands of selectors, there are really big style sheets. I guess you guys all know these things. You do the same kinds of things. It gets really, really hard to stay performant. That is the kind of reason why...
How many people have heard of matches? It is a proposed pseudo-class to select parents. You can select the parent nodes and you can make decisions, based on what parent it is and how you want to style the child. So, you would have something like, "I want to style this image element. But only if it has a P as a parent." That can let you do really powerful stuff. You can select elements and you don't have to have all these classes and stuff. It makes it a little bit simpler. But it makes it a lot harder for the browser to stay performant and to do all these computations effectively and quickly.
The other thing is, once you get to... the main thing that is hard with the browser is that it is dynamic. Each time you might make changes to the DOM and through the parsing stage, everything needs to be updated and all those selectors need to be recomputed.
So, if you have your DOM with 10,000 nodes and your 10,000 selectors and you start updating things and the user starts hovering around with the mouse and interacting, all these things start spinning. You get a lot of work and you understand why something quite simple is actually very complex, because all these processes are going on.
Let me just get some water.
CSS syntax is kind of cool. I think Mathias actually showed us this already. You can basically escape anything. In the selector or the property names, you can use all kinds of weird escapes. Nobody does it and I don't really understand why it was added. I think it was partially because CSS was grammar-based.
You have got all this weird stuff. You can write the background line like this and you can write it in a lot of different ways. If you want to do CSS-based filtering, you really have to do whitelist things. Otherwise you are going to be in trouble. It's kind of weird.
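To show why a blacklist is fragile, here is a simplified sketch of CSS escape handling: a backslash followed by up to six hex digits (plus one optional whitespace character), or a backslash followed by any other single character. `css_unescape` is a hypothetical helper and skips some details of the real syntax, such as treating CR LF as one whitespace.

```python
import re

# Backslash + 1-6 hex digits + optional whitespace, or backslash + any
# single character that is not a newline.
_ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})[ \t\n\r\f]?|([^\n\r\f]))")

def css_unescape(text: str) -> str:
    """Resolve CSS escapes in an identifier (simplified sketch)."""
    def replace(match):
        hex_digits, literal = match.groups()
        if hex_digits is None:
            return literal
        code_point = int(hex_digits, 16)
        # Null, surrogates and out-of-range values become U+FFFD.
        if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return "\ufffd"
        return chr(code_point)
    return _ESCAPE.sub(replace, text)

variants = [r"\62 ackground", r"back\67round", r"bac\kground", r"b\000061ckground"]
for variant in variants:
    print(variant, "->", css_unescape(variant))
```

None of the four spellings contains the literal text "background", so a filter that searches for that string misses all of them; checking the unescaped name against a whitelist does not have that problem.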
Even this is a tremendous simplification, because there are so many things going on in CSS. Many of you have seen the standard.
It is a whole lot of pages. You have to deal with inherit and initial values, specificity and complex syntax. There are a whole lot of pieces of code involved in doing all that work. There are 10 people working on the layout team, optimizing it and making sure that it works fast. This is for each browser, of course. You can have external style sheets, just multiple style sheets.
There is an interaction with scripts. All that works together and tries to render this page to make this really simple thing possible for us. I think that is really quite cool.
We covered the layout part a little bit. There is the DOM as well. Everyone knows about the DOM, of course. Nobody really likes it, but it is a core part of the web infrastructure. It is the API, against which script and layout happens. It is basically what the web page is. It's where the semantics are at. The DOM is what everyone reads off. The assistive technology. It is what is being updated by scripts.
So, the semantics are actually at the DOM. They are not really at the markup level. That is why the whole discussion about HTML versus XML, years ago, was not very interesting. In the end, what you get is a DOM. That is what is important. Everyone builds up some kind of representation of the markup. The markup is just a convenient syntax you have created. You can also create a DOM from script or something.
We had to make this thing work for multiple languages. It had to work for Server and Client. So they designed this really weird thing around the, "Oh My God" IDL. It's not actually "Oh my God," but it's OMG IDL. It is a kind of syntax to define all these interfaces.
They also designed a DOM, not just for doing scripts and stuff, but also for editing applications, to preserve things in the DOM, like comments. You can find processing instructions in the DOM. The doctype is there. There's not really any reason for a doctype to be in the DOM. It's just a syntactic detail. But yet it is there.
So you need to skip it, if you traverse the document tree, which is kind of annoying. They even had this thing, where attributes were nodes. We are actually trying to fix that. The new DOM standard makes attributes no longer inherit from Node. We are not quite sure if it's going to work out or if it breaks the Web. The Mozilla guys are brave enough to pioneer this for us and test it out in Firefox, slowly deprecating the methods over time. If you use getAttributeNode, you will see warnings in Firefox, which I think is quite awesome. Hopefully, the other browsers move in this direction too.
The main reason is that it is already complex enough. If you can remove some of the complexity and all that it takes, you can focus that on other efforts and improve other areas of the Web.
Another thing that is really important is that by keeping it simple, or simple enough, we at least make it possible for other players to enter the market. That is, in part, why I am writing standards and I'm trying to write them at such a level of detail that anyone can implement them and they will work the same way as Chrome or Opera. Otherwise, the Web will end up being a locked up space, like it was a couple of years ago.
Not too long ago, when IE6 was still dominating, the other browsers had to actively reverse engineer IE6 behavior. Of course, they did it wrong. They didn't know what to do. But, if they didn't, they couldn't compete with IE effectively, because a lot of sites wouldn't work. It's really important to keep the Web an open marketplace.
I'm a little bit afraid of the whole WebKit dominance thing as well, because what you get is that WebKit gets really big, people start coding towards WebKit, people start coding towards WebKit bugs and WebKit bugs get enshrined in the platform.
I also think and the people at Google agree that it is a problem for the WebKit project itself, because if people start relying on their bugs, they cannot fix them. You end up with all these extra holes and problems in the platform. That's why it's really important that we have solid standards and tests to keep the Web relatively open and progressing.
To get back on the DOM itself, this is quite a complex thing. There are many, many objects. Just for HTML, I think there are about 100 different interfaces. All these interfaces and all these objects have a lot of attributes. These attributes have their own algorithms to set and read.
How many people have actually looked at the HTML spec and read an interface definition, for instance? How many people understand Web IDL? One hand goes up. I agree with Alex Russell I think. He doesn't like Web IDL. Let me just load the HTML spec.
Here we have the image element. Let me just explain how to read this, because I think it is useful to know. Then, if you're dealing with some scripts and you are wondering what you can do with this script, you can actually look up what you can do. In the image element, you have the named constructors.
Once you have the image element, these attributes are the ones you can set. This is actually different. There is a difference between the set of properties you can set.
These are the properties that the image element has. You have "alt", "src", "srcset". There is a new one, "crossorigin", for doing CORS. There is "usemap". There is "ismap". You can read out the natural width, which is the original, intrinsic width of the image. It's not actually the rendered width. I have five minutes left? Oh, geez. That goes quick. I thought I had 50 minutes.
You can take 50 minutes.
It is daunting at first. For me, it was daunting at first to learn about these things and, "How do I get hold of this object?" But, as you read more, it gets more understandable. Maybe, actually, if Alex goes through, we can make it even better and do it in a slightly different way.
Let's see. I have a few slides left, I think. Yeah. I already explained this. Like the DOM. You create it from a large string, from markup and code points, which are transferred as bytes originally. That's how you end up with the DOM.
There are many, many layers. These were not all of them. I think it's hard to make a coherent picture of that, as you have seen. [laughs] Here are a few of them. You have the network, the inputs to the parser, the parser itself, the DOM, the scripts, the style, layout, the tree, the painting. All these concepts have a whole lot of different code paths and complexity tied to them. What it comes down to is that this is why we have bugs.
All these systems are created by people, including the standards and the software. They are not quite perfect. So there are bugs everywhere. But knowing a little bit about how these things fit together, you can realize where the bugs are and make a more coherent bug report maybe.
It's not all bad. There is a little progress as well, from figuring out all this old stuff, like the HTML parser. HTML was invented in '94. Up until 2011, we didn't have interoperable HTML parser implementations. All the browsers did their own thing. Now, they are finally interoperable.
This is based on work from Ian Hickson, who, in 2006, started figuring out HTML, wrote down the parser algorithm and got all the vendors on board to implement it. Of course, this helps browsers as well. The code base becomes more maintainable. The upside is that we can now make changes to the HTML parser and add new features, without introducing all kinds of problems. This is why I'm doing the work on encodings and URLs. By fixing all the groundwork, we can slowly move ahead with fewer problems in the future. So, I'm on time. Chris told me.
Actually, before you applaud, I wanted to just raise one thing. We had the jam sessions. I felt that was really cool with the 10-minute things. I was wondering how people feel about doing that same kind of thing for a full conference. There are so many of you here. You have obviously way more interesting things to talk about than I do. I was just invited here, because I worked on standards for seven years. But it would be way cooler to have everyone here just give a five or ten minute pitch on whatever is their passion.
I was just throwing it out there and I'll read on Twitter or something what people think about that idea. Instead of having just 10 people share their thing for two days, have 400 people, or maybe a little bit less share their passion and have some kind of collaboration going on. [applause]
These things happen. We call them BarCamp. [laughter]
Yeah. I was thinking it would probably be BarCamps, yeah.
Yeah. They are totally happy to do it. You can organize them. I did three of them. They are really easy to organize, rather than these big ones here that cost a lot of money and places. People want food and it's really annoying.
The thing with BarCamps though is that there are a lot of different rooms usually and a lot of different tracks. It would be really nice to just give everyone an audience of 400.
It scares a lot of people as well.
I guess, yeah.
That was a lot of words and kitten pictures, I think, as well.
I have this wonderful question here. Please explain the [inaudible]. [applause]
I'll have to read that again. [laughs]
Yeah, please don't. [laughs]
Oh, OK. [laughs]
There are a few that actually have a more interesting meaning. Does "querySelectorAll" use the same matching algorithm as internal browser methods and is it very fast, because of that?
querySelectorAll uses the same algorithm as the CSS parser does. I would assume that it is faster, yeah. Yeah. It uses the same optimized code path. Selectors are very complex pieces of code actually, in browsers, these days. They are extremely optimized, because layout itself, because it's so complex, because CSS is at such a high level of abstraction, there is a lot of time dedicated to it.
We also made CSS much more complex. Now we have animations and transitions and shaders and effects.
Exactly, yeah. Yeah. We keep adding this stuff, while we do not actually understand what we have so far. Yeah.
Here is a good one here. If you could replace the DOM with something better, what would it look like and what are the biggest things in the DOM that need replacing? Is it something like the jQuery selector engine?
I guess we would have a better API and I would drop the redundant stuff. The DOM has a lot of methods that we don't really need. There are a lot of interfaces we don't really need. We don't need doctype. We don't need attributes to be nodes. Attributes could just be strings on the element object, I think. But we have to keep them as objects, to make those attributes work, unfortunately.
I would remove comments. I wouldn't keep that in the tree. That is not needed in the tree. Maybe you could even drop the document node itself. You would just have elements to start. Yeah. You make it much more like the APIs you have in Python like etree or whatever it's called.
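The shape described here (elements as the starting point, attributes as plain strings, no comments in the tree) is roughly what Python's standard `xml.etree.ElementTree` module gives you. A minimal sketch of that, where the markup and variable names are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Illustrative markup only; the comment is there to show what the parser drops.
markup = '<ul class="games"><!-- note --><li id="a">one</li><li id="b">two</li></ul>'

# fromstring() hands back the root element directly, with no document node around it.
root = ET.fromstring(markup)

# Attributes are plain strings in a dict, not node objects.
class_name = root.get("class")
attribute_type = type(root.attrib["class"]).__name__

# The default parser does not put comments in the tree, so only the li elements remain.
child_tags = [child.tag for child in root]
item_ids = [li.get("id") for li in root.iter("li")]
```

This only shows the API shape on a well-formed XML snippet; it says nothing about how a browser DOM would have to change to get there.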
One thing about this that ails me a lot, being somebody who came to the Web in 1996, because of passion, coming from another medium, radio, to the Web: I look at HTML and it was simple rules and it was wonderful and easy to understand what I do with this. I mark up text with a few commands.
I start with a p and I end with a p and I know it is a paragraph.
That a rendering engine doesn't need this doesn't mean that we should bastardize HTML that way, I think.
Oh, no. I was trying to explain how the parser works. If you work on a team, you should probably always use quotes. Closing your tags makes sense, unless you have a common understanding on the team and everyone knows what they are doing. Personally, I don't usually bother closing an LI, because LIs are only going to be in an OL anyway. So there's not much point in closing them.
Or a UL.
Yeah. But they are always going to be in a container.
But they have a different meaning. One of them has a defined order for the list, and the other one could be in any order. That's what I'm saying. It's great that these things are there, but...
No. No. What I meant is that there are always going to be LIs. There are always going to be siblings. I guess that's what I meant. So you don't really need to close them, because you know...
It's the same with the DOM.
I'm fine with people closing all their tags. It's probably better and it's more approachable. Yeah. It makes it easier to understand. But I think it's also good to have an understanding of what is actually going on. If you trip up, you realize, "Oh, OK. I forgot this closing tag and, therefore, this thing happened."
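To make "what is actually going on" a bit more concrete, here is a toy sketch of the idea that a new `li` start tag, or the end of the list, implicitly closes an `li` that is still open. It is built on Python's `html.parser` tokenizer, which does not do this on its own. The class name `ImpliedLiCloser` and the `trace` helper are invented for the example, and this is a heavy simplification (it ignores nested lists, for one), not the parsing algorithm browsers implement.

```python
from html.parser import HTMLParser


class ImpliedLiCloser(HTMLParser):
    """Toy wrapper: records parse events and closes an open <li> implicitly."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.li_open = False

    def handle_starttag(self, tag, attrs):
        # A new <li> while one is still open implies the end of the previous one.
        if tag == "li" and self.li_open:
            self.events.append("</li> (implied)")
        if tag == "li":
            self.li_open = True
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # The end tag of the list also closes an <li> that is still open.
        if tag in ("ul", "ol") and self.li_open:
            self.events.append("</li> (implied)")
            self.li_open = False
        elif tag == "li":
            self.li_open = False
        self.events.append(f"</{tag}>")

    def handle_data(self, data):
        # Skip whitespace-only text to keep the trace short.
        if data.strip():
            self.events.append(data.strip())


def trace(markup):
    """Feed markup through the toy parser and return the recorded events."""
    parser = ImpliedLiCloser()
    parser.feed(markup)
    parser.close()
    return parser.events


unclosed = trace("<ul><li>one<li>two</ul>")
closed = trace("<ul><li>one</li><li>two</li></ul>")
```

Both inputs end up with the same two list items; in the first trace the closing tags are marked as implied, in the second they come from the markup.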
But most of the display glitches, when you build websites, happen just because you forgot something, and browsers inject something you didn't even know existed. So, for me, the predictability of an error is always a confusing point. That's why I like the idea of XHTML. I know it's not popular, but I like code breaking when I make mistakes.
Rather than browsers doing things to fix it for me.
That is great. The thing is that, with XHTML in particular, if you start using it on a production website, and you don't use... a lot of people use string concatenation to make their websites. We don't have a DOM on the server that we serialize to make sure it is well formed. Then you have your typical web shop, where you sell games, and you search for Black & White, which has an ampersand. The guys that made the web shop used XHTML, but they didn't quite realize what they were doing, so they didn't escape the ampersand. I opened it up in Opera, I got the well-formedness error, and I couldn't order my game. This actually happened back in 2004 or so.
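The web shop failure can be reproduced in a few lines. This is a minimal sketch using Python's standard XML parser as a stand-in for a browser in XML mode; the page fragment, the `query` value, and the `is_well_formed` helper are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

query = "Black & White"  # hypothetical search term typed by a shopper


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# String concatenation without escaping: the bare ampersand breaks the page.
naive_page = "<p>Results for " + query + "</p>"

# Same concatenation, but the text is escaped first (& becomes &amp;).
escaped_page = "<p>Results for " + escape(query) + "</p>"

naive_ok = is_well_formed(naive_page)
escaped_ok = is_well_formed(escaped_page)
```

With a strict XML parser, one forgotten `escape()` call anywhere in the output is enough to turn the whole page into an error.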
Well, it's much like when people forget a semi-colon in their PHP and the PHP doesn't run. This is what code is like.
Forgiving code is very dangerous for teaching people things.
Yeah. PHP runs on one product, whereas your website runs in millions of people's browsers. What is going to happen is out of your control. So it's better to use a forgiving format than to throw these errors at users.
Your PHP is what you control. But, the moment you start serving a website and you include content from third parties, there are going to be unknown variables in that. Unless you have a perfectly airtight system, which is really hard to write, it gets really complex to make sure that all the XHTML and all your output actually is always going to be well formed. Then you need to really, really test your backend system.
I can understand it for a slide set or a simple HTML page. You can just use XHTML, because it's going to be static. But the moment you get the dynamic things going on and you get all this input from sources, you need to have a whole XML tool chain on your server that does all these things.
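One way to picture the "XML tool chain on your server" is to build the output as a tree and let a serializer do the escaping, instead of concatenating strings. A small sketch with Python's `xml.etree.ElementTree`, where `render_results` and the sample titles are hypothetical:

```python
import xml.etree.ElementTree as ET


def render_results(query, titles):
    """Build the page as a tree and let the serializer do the escaping."""
    body = ET.Element("div")
    heading = ET.SubElement(body, "h1")
    heading.text = "Results for " + query  # raw text, no manual escaping
    listing = ET.SubElement(body, "ul")
    for title in titles:
        item = ET.SubElement(listing, "li")
        item.text = title
    return ET.tostring(body, encoding="unicode")


# Hypothetical third-party input containing characters that are special in XML.
page = render_results("Black & White", ["Black & White", "<Untitled>"])

# Parsing the serialized output again gives back the original text.
round_tripped = ET.fromstring(page).find("h1").text
```

This only covers the output side. Every piece of third-party markup would still have to be parsed into the tree first, which is where the complexity mentioned above comes in.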
I mean, no disrespect, but it's a complex thing. There are a few people who tried it. Like Sam Ruby tried this on his blog and tried to write this perfect blog system, and there are still people every now and then who found cracks in it and made it give well-formedness errors to users. And giving errors to users is like a big no-no I think.
Yeah, it is, but at the same time allowing anybody to write any code is a big no-no as well because we don't evolve as a craft.
Well, we don't allow it. Right? That's why we say, "Use validation and if you get errors, understand what happened and fix those." I think that's our thing. We have validators and there are checkers. I think most people here, they care. Right? We care about that kind of stuff. We want to fix the validation errors, and we want to understand what the fuck we're doing.
I mean, there are some people that don't really care, and they just want to put out content and I think that's cool, too. It's really important to put out content, that's the whole thing the web is about, to share your stuff. If you don't really know HTML that well, you made an error, it's better that you can share your content than that you get an error or something I think.
It's more about tooling, that the tools actually allow you to create clean things without having to think about it.
Yes, I guess, in part it might be about tooling. It depends where you're from, I guess. If you don't have a strong background in HTML, then you might want to use a tool. If you have a strong background in HTML, you might want to do it yourself. Even if you don't have a strong background, you might just want to play around with HTML and see.
There's not really any point I'm trying to make here, I'm just saying, yes.
Yes, I'm getting around to the idea. I still find it kind of scary to release the semantics to the crowd. It might be a good idea. It's not opening the floodgates or anything because, as Alex has said, people are already doing this, right? People are already creating their own elements, so we might as well give them a way to do it properly. I think the Shadow DOM itself plus the web components is a really interesting way.
I embedded a Google Map a couple of months ago. I was playing around with the Google Maps API. I had this simple page set up with a div. I had given the div some styling. I don't really use IDs for divs usually, I do div and then give it a width. I gave it a red background.
This Google Maps thing wasn't working and I didn't understand why, until after a while I realized that the Google Maps API had inserted a whole bunch of divs in my page. I was styling those divs and therefore nothing was happening. That was very annoying because I couldn't use generic styles because they were clashing with this Google Maps thing. That was really weird.
And with components, the Google Maps thing becomes its own separate component and its styles are not affected by yours, unless you want them to be. There is some kind of bridging thing there. I think that makes that kind of thing, where you have a lot of different components, and different people make those, or you want to embed a map on your page and you don't want to mess around with the styling, it makes it a lot easier to work together and do things.
It can also, for UI stuff, like making new buttons and all those things, like if you want to make a button that consists of ten divs, go ahead, and then you just hide it from the DOM. The DOM stays semantic.
I think that's the other nice thing, the DOM is sort of clean, and in the background you have all these different elements to do your tricks. You sort of hide them from the people that script and try to get content out of your page.
Good, lunch. Thanks Anne. [applause]