Why Do We Encode?


Hi! In this section, you will learn why encoding is a very good thing to do, and you will learn to use XML and see that it’s not that difficult.

So why on earth do people encode their texts? Why do we think that encoding is a very good thing to do for editing? Because encoding allows you to describe the text for what it is, and NOT for what it looks like, which is a smart thing to do if you then want to process the features of the text and not the way they look. It also allows you to separate the text and the content of your edition from the way it looks in the end, on display, so that you can process it in many ways. It’s very good for long-term sustainability, because it is a way of recording what you mean to do and not just what you want to do right now, in the present. And it is not tied to a specific piece of software: at least the kind of encoding we will show you is something that can be used over and over again.

Markup and encoding were not born with computers. The concept was actually born with printing: markup is the kind of symbols, the little things that editors and publishers used to put at the side of the text to show what they wanted done with it, whether to make it bigger, smaller, italic and so on. You know the little symbols you put around the page when you copy-edit or proofread a text and want to correct things? Yes, that’s markup actually! You can also call punctuation and page layout a sort of markup, because they allow you to understand how to segment your text and what to do with it.

There is also a more specialized type of markup. For instance, some of you may become familiar with the so-called Leiden Conventions, a series of codes used by scholars editing papyri, and in epigraphy, to show damage to the material and to show their interpretation of it. You’ll see some of the symbols they use on the screen right now.

Here the example is a page from the first version of “Les Fleurs du mal” by Baudelaire, with corrections made by the author, who was telling his publisher how to modify the page to his liking and, in some cases, how to correct the text. All the symbols are tags. All the examples at the side of the tags are markup. But markup, as I said, is not just that: it’s also everything we use to make text more readable. If we had to read text the way the Romans wrote it, that would be very difficult, because over centuries, actually millennia, we have added layers and layers of markup to our texts to make them more readable: spacing, capital letters, bold, italics, punctuation, everything we use to make a text readable.

XML is the technology we will show you. Why? Because it is very easy to learn, because it’s very portable, and because it’s been around for a while now, so we have some hope that it will last for a long time.
XML was created by the W3C, the World Wide Web Consortium, which began work on it in 1996 and still maintains it, with a strong contribution from the Digital Humanities community and particularly from the Text Encoding Initiative.

XML is made of two main components. The first component is called elements. An element is everything you can put a label on, everything you want to describe somehow, everything that you think is important to annotate in your text. What it looks like is what you see on the screen: there is a start tag, which is made up of an opening angle bracket, the name of your element and a closing angle bracket; then the content of your element; and then your end tag, which is identical to the start tag except that there is a slash just before the element name. Sometimes elements can be empty: they don’t have any content. In that case you have only one tag, with a slash immediately after the element name, which in a way condenses the start and the end tag into one.

How do you use elements? Elements can come one after the other, in a sequence, or they can nest, meaning that you can use an element within another element. But one basic rule that can never be bent is that they cannot overlap. If you open element A and then open element B, you have to close B before you can close A. This is always true. You also need one element to wrap the entire file, which we call the root element: the beginning and the end of everything is within the same element.

The second component of XML is attributes. What is an attribute? It is something that we use to specify something about an element: to classify it, to give a detail, to make it clearer. Here in the example we have the sentence “the Prime Minister during WWII”. If you are an expert on the history of World War II, and you know that the context we are talking about is the United Kingdom, you will know that there was more than one prime minister during the war, and you may know which moment in the history of the war we are talking about, so you may say: I know who it is! It’s Winston Churchill. So there you have it: you use the markup to add information to your text, something that you as an expert in the field know and the computer cannot know automatically.

In this example we have two attributes. The attribute @reg is used to regularize the reference to the name, which is Winston Churchill. But the order is: Churchill, comma, Winston, because later we will use that for creating an automatic index of names. We also have an attribute @type that says which type of reference it is, here a reference to a person. You can also see the syntax: after the element’s name there is a white space (obligatory), then the name of your attribute, followed by an equals sign, and then the value of the attribute in between quotes. You can have as many attributes as you want on your element, but the same attribute can only be used once per element.

So that’s it, you have learned XML! Didn’t I tell you it was easy?
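If you would like to try these rules out for yourself, here is a minimal sketch in Python, using the XML parser from its standard library. It is only an illustration under a few assumptions: the element names `text`, `p`, `rs` and `pb` are invented for this sketch (the transcript does not name the element shown on screen), and `is_well_formed` is a hypothetical helper. Only the sentence and the `reg` and `type` attributes come from the example above.

```python
import xml.etree.ElementTree as ET

# A small sample document. Element names are assumptions for illustration;
# the sentence and the reg/type attributes come from the lecture example.
SAMPLE = """<text>
  <p>He was <rs type="person" reg="Churchill, Winston">the Prime Minister during WWII</rs>.</p>
  <pb/>
</text>"""

root = ET.fromstring(SAMPLE)   # <text> is the root element wrapping everything
ref = root.find("p/rs")        # <rs> is nested inside <p>, which is inside <text>
page_break = root.find("pb")   # <pb/> is an empty element: one tag, no content

print(root.tag)                # text
print(ref.text)                # the Prime Minister during WWII
print(ref.get("reg"))          # Churchill, Winston
print(ref.get("type"))         # person
print(page_break.text)         # None


def is_well_formed(xml_string):
    """Return True if the parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


# Nesting is fine: B is closed before A.
NESTED = "<a><b>text</b></a>"
# Overlapping is not: A is closed while B is still open.
OVERLAPPING = "<a><b>text</a></b>"
# The same attribute used twice on one element is rejected.
DUPLICATE_ATTRIBUTE = '<rs type="person" type="place">someone</rs>'
# Two top-level elements means there is no single root element.
NO_SINGLE_ROOT = "<a>one</a><b>two</b>"

for label, candidate in [
    ("nested", NESTED),
    ("overlapping", OVERLAPPING),
    ("duplicate attribute", DUPLICATE_ATTRIBUTE),
    ("no single root", NO_SINGLE_ROOT),
]:
    print(label, is_well_formed(candidate))
```

Only the first of the four test strings should be accepted. The parser refuses the other three, which is exactly the point of the rules: no overlap, no repeated attribute, one root element.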
