Extensible Markup Language (XML)
Even if you’re not a programmer, you’ve seen markup language. It contains tags and data in forms such as <name>John Doe</name>.
A human can read it and understand what it means. A markup language you’ve likely seen is HyperText Markup Language (HTML), which is used to display content on a web page.
XML is similar to HTML, but it’s not the same. It also declares how data is to be interpreted. It uses a similar syntax with slashes and brackets.
However, it’s used to transmit data between systems, often systems in different companies. It allows different organizations to share data without requiring their internal files to be in the same format.
When you format data using XML, you can send a data stream to any system, anywhere. All the recipient needs is an XML parser. Electronic communication between organizations would be severely constrained without a tool such as XML.
Whether you want to maintain B2B communications or launch a website, there’s a good chance your company will use XML somewhere along the way.
What is XML?
Extensible markup language (XML) is a file format that both human beings and computers can read. An XML file contains data, and it also holds the rules that govern the data.
When you’re thinking about types of files and when to use them, consider that a conventional data file contains only data. If you want to read it, you have to know the position of every field and its allowable values.
For example, a customer's name might be in positions 11-30. If the file ever changes or expands, every system that reads it needs to know the new file definition.
With XML, each data value is contained within an element that tells you what it is. If XML elements are changed and added, the XML document itself tells you what they are and how to deal with them.
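As a rough illustration of the difference, the sketch below reads the same customer name from a fixed-position record and from an XML record. The record layout (an ID in positions 1-10, the name in positions 11-30) and the element names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width record: positions 1-10 hold an ID,
# positions 11-30 hold the customer's name.
fixed_record = "0000012345" + "John Doe".ljust(20)

# The reader has to know the layout in advance to slice out the name.
name_from_fixed = fixed_record[10:30].strip()

# The same data as XML: each value is labeled by the element around it.
xml_record = "<customer><id>12345</id><name>John Doe</name></customer>"
customer = ET.fromstring(xml_record)
name_from_xml = customer.findtext("name")

print(name_from_fixed)
print(name_from_xml)
```

If a new field were added to the XML record, the lookup by element name would keep working, while the fixed-position slice would need a new file definition.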
So, in short, what is XML and what does it do? XML makes it possible to exchange data between disparate systems such as databases, websites, and applications.
Importance of XML in modern technology
When two companies do business electronically, they need to communicate. They also need to understand each other. Sometimes they need to make decisions and close deals without human intervention.
XML ensures data integrity by passing the data rules right along with the data itself. An XML document can be interpreted multiple times for presentation to a human user and for processing by a computer system. XML ensures that data being consumed is identical across platforms.
Search engines work more easily with XML than with other file formats. XML facilitates data transfer between databases, websites and legacy computer systems such as accounting systems.
A wealth of tools support the creation and reading of XML files. XML support is built into modern programming languages. XML is frequently the most flexible and extensible way to move data between systems, both within and between companies.
Understanding the basics
XML is a markup language. That means it’s a text document with symbols that control how it’s structured and formatted. It contains text that can be displayed or processed, and text that determines what to do with that data.
A traditional data file contains a run-on of numbers, letters and special characters. There’s no way to use the file itself to determine where one value ends and the next begins. An XML document tells you what its data stands for.
XML vs. HTML
Some people confuse XML with HTML (HyperText Markup Language). The latter is used to read data and render it to a display, typically on a web page.
At a glance, XML and HTML look similar. They both contain descriptions of data elements, called tags, and characters such as <, > and / that define the tags and their values. However, there are significant differences, and they’re not just limited to the fact that HTML is only for display.
The “X” in XML stands for “extensible.” A user can’t extend HTML because it has a finite number of predefined elements. With XML, you can create your own elements and give them the definition that’s appropriate to your file.
Structure of an XML file
An XML file begins with an optional XML declaration and an optional document type declaration. The latter is needed for validation against a DTD, but it’s not required for an XML document to be well formed.
The interesting part of the file is the body. The basic component of the body is an element. The element begins with a start tag and finishes with an end tag. The start tag can carry the element’s attributes, and the content between the two tags can include text and other embedded XML elements.
An XML file could contain a large number of elements, for example one customer element per customer record, and each element might have multiple attributes and multiple child or embedded XML elements.
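To make the structure concrete, here is a minimal sketch of a document with an XML declaration, a root element and one customer element that has attributes and nested child elements. The element and attribute names (customers, customer, id, status, address, city) are invented for illustration and are not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: declaration, root element, one customer element.
document = """<?xml version="1.0"?>
<customers>
  <customer id="1001" status="active">
    <name>John Doe</name>
    <address>
      <city>Springfield</city>
    </address>
  </customer>
</customers>"""

root = ET.fromstring(document)
customer = root.find("customer")

# Attributes live in the start tag; child elements sit between the tags.
attributes = dict(customer.attrib)
child_tags = [child.tag for child in customer]
city = customer.findtext("address/city")

print(attributes)
print(child_tags)
print(city)
```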
Real-world XML use cases
Because XML users can define their own elements, XML is ideal for establishing a communication standard within an industry.
Industries can use their own markup language and create XML structures both for communicating data and storing it. News and weather services are just two examples of industry-specific XML definitions.
XML is particularly important in web services, which are services offered by one device to another. XML is an ideal way to flexibly label data in a service so it can be processed by many devices.
Businesses have created XML formats for lots of industries, not only obvious choices such as e-commerce and finance, but also for mathematics, health care and all sorts of B2B communication.
How to create and parse XML documents
Because XML docs are readable text, you can create them in any text editor. Parsers that can read them are plentiful.
Steps for creating a basic XML document structure
You can create a basic XML document simply by opening a text editor and typing. If you choose, start with an XML declaration and document type declaration. Then set up a tree structure with your root element, which is the first element and contains all the others, and define the names of the elements that will hold your data. Fill in the tree with the child elements of the root.
When you’re done, save the file with a .xml extension. With this method, there’s nothing to ensure that your completed file is syntactically correct or will make sense to a recipient. However, if you’ve made no mistakes, this file can be processed just like one created with a specialized tool.
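The same steps can also be carried out in code rather than by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the element names, the sample values and the file name customers.xml are assumptions chosen for the example, and the file is written to a temporary folder so the sketch leaves nothing behind.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_customers() -> ET.Element:
    """Build a small tree: a root element with one customer underneath."""
    root = ET.Element("customers")  # root element
    customer = ET.SubElement(root, "customer", {"id": "1001"})
    ET.SubElement(customer, "name").text = "John Doe"
    ET.SubElement(customer, "email").text = "john.doe@example.com"
    return root


with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "customers.xml"  # saved with a .xml extension
    # xml_declaration=True writes the optional XML declaration first.
    ET.ElementTree(build_customers()).write(
        str(path), encoding="utf-8", xml_declaration=True
    )
    saved_text = path.read_text(encoding="utf-8")
    # Reading the file back confirms that it is well formed.
    reparsed = ET.parse(str(path)).getroot()

print(saved_text)
```

Because the library writes the tags, a file produced this way cannot have mismatched or unclosed elements, which is the main risk of typing a document by hand.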
Tools and software for creating and editing XML documents
The market offers a variety of tools specifically designed for creating and editing XML files.
With these, you don’t have to worry about syntax errors. The tools flag them. They also fill in some file structure and produce XML that a person can easily read. Some of the choices are Oxygen, Emacs for XML, Stylus Studio, XML Notepad and Komodo.
Parsing XML documents using programming languages
Because XML syntax adheres to strict rules, developers are able to write parsers that extract the data and use it in applications. Parsers also check an XML file for valid syntax, and they’ll flag an error if, for example, a tag is missing or the file doesn’t conform to the rules as defined in the schema.
Today’s web browsers have built-in XML parsers. Some of the better-known parsers include Microsoft's MSXML, System.Xml.XmlDocument (part of .NET), Xerces and Saxon. Java has a built-in XML parser, but it can be swapped out for Xerces or Saxon.
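As an illustration of a parser flagging a syntax error, the sketch below uses Python's standard xml.etree.ElementTree module (not one of the parsers named above) to check whether a string is well-formed XML. The helper name find_syntax_error is made up for this example. Note that this standard-library parser checks syntax only; it does not validate a document against a schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_syntax_error(text: str) -> Optional[str]:
    """Return the parser's error message, or None if the text is well formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        return str(error)
    return None


good = "<customer><name>John Doe</name></customer>"
bad = "<customer><name>John Doe</customer>"  # the </name> end tag is missing

good_error = find_syntax_error(good)
bad_error = find_syntax_error(bad)

print(good_error)
print(bad_error)
```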
Techniques for handling large XML files
XML files can become gigantic, especially when they’re used for data-intensive tasks such as reading large databases and formatting them for export.
A lot of XML text editors read an entire XML file into memory before processing, and with XML files that are sometimes many gigabytes, that just doesn’t work. You need an XML parser that can process XML files in place and use subroutines to handle specific XML elements. Some parsers include large file viewers that create indexes in memory rather than reading in the entire file contents.
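One common way to do this is streaming (event-based) parsing, where the parser hands each element to your code as soon as it has been read. The sketch below uses iterparse from Python's standard library; the customer element, its status attribute and the function name are assumptions for illustration, and a small in-memory sample stands in for a large file.

```python
import io
import xml.etree.ElementTree as ET


def count_active_customers(source) -> int:
    """Stream through the document and count customers marked active."""
    active = 0
    # An "end" event fires once an element has been read completely.
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == "customer":
            if element.get("status") == "active":
                active += 1
            # Drop the element's text, attributes and children after use
            # so memory does not grow with the size of the file.
            element.clear()
    return active


# Small in-memory stand-in for what would normally be a very large file.
sample = io.BytesIO(
    b"<customers>"
    b'<customer status="active"><name>A</name></customer>'
    b'<customer status="inactive"><name>B</name></customer>'
    b'<customer status="active"><name>C</name></customer>'
    b"</customers>"
)

active_count = count_active_customers(sample)
print(active_count)
```

In this simple form the root element still keeps a reference to each emptied customer element, so a production version for multi-gigabyte files would likely also detach processed elements from the root.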
Security and encryption in XML
Security is important, both guarding against malicious attack and protecting intellectual property. An XML document structure isn’t inherently more or less secure than any other file. If it contains sensitive information, it needs to be encrypted.
That being said, XML encryption has a characteristic that sets it apart. You can encrypt just a portion of an XML file. For example, you can encrypt an element and all of its sub-elements. Select the part of the document you wish to encrypt, encrypt that text and send the document to its intended recipients.
What are XML schema and namespaces?
Schema and namespaces are used to clarify element names and to establish rules about their attributes and their relationship to other elements.
XML schema in definition and validation
An XML schema defines the allowable structure of an XML file. For example, it can determine the order of the elements, their permissible attributes and what’s required for the file to be complete. When an XML file is parsed, it’s validated against the schema to ensure that required data is present and data values are acceptable.
Many industries and organizations have created standardized XML formats, and most are defined by XML schemas.
Organizing elements and attributes with namespaces
There are only so many reasonable names for elements in the world. A common one, such as “name” or “date,” is used in many XML files, and a date in one context must be distinguished from a date in another. With namespaces, element and attribute names can be assigned to a group and be differentiated from one another.
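Here is a minimal sketch of two elements that share the name date but belong to different namespaces. The namespace URIs and the prefixes ord and ship are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: one for order data, one for shipping data.
document = """<order xmlns:ord="http://example.com/orders"
       xmlns:ship="http://example.com/shipping">
  <ord:date>2024-01-15</ord:date>
  <ship:date>2024-01-20</ship:date>
</order>"""

# Map the prefixes used in queries to the namespace URIs.
namespaces = {
    "ord": "http://example.com/orders",
    "ship": "http://example.com/shipping",
}

root = ET.fromstring(document)
order_date = root.findtext("ord:date", namespaces=namespaces)
ship_date = root.findtext("ship:date", namespaces=namespaces)

# Internally, the parser stores each name together with its namespace URI.
qualified_tags = [child.tag for child in root]

print(order_date)
print(ship_date)
print(qualified_tags)
```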
DTD vs XSD
Document Type Definition (DTD) and XML Schema Definition (XSD) are both used to define the structure of an XML file. DTD derives from Standard Generalized Markup Language (SGML) syntax, while XSD is actually written in XML. XSD offers some advantages.
XSD can define the contents of an XML file as well as the structure. It supports namespaces. It’s easy to learn for someone who already knows XML. Like other forms of XML, XSD is extensible.
XSLT and XPath
Extensible Stylesheet Language Transformations (XSLT) is used to transform an XML document into another markup language document, most frequently HTML or XHTML for a browser. As it transforms, it can add, remove and rearrange elements and attributes. XSLT uses XPath to navigate through the elements in an XML file and find the parts of the document that require transformation.
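To give a feel for path-based navigation, the sketch below selects elements with path expressions using Python's standard xml.etree.ElementTree module. That module supports only a limited subset of XPath and does not run XSLT, so this shows the navigation idea rather than a full transformation. The sample data is invented for the example.

```python
import xml.etree.ElementTree as ET

customers = ET.fromstring("""<customers>
  <customer status="active"><name>John Doe</name></customer>
  <customer status="inactive"><name>Jane Roe</name></customer>
  <customer status="active"><name>Ann Lee</name></customer>
</customers>""")

# Select the name of every customer whose status attribute is "active".
active_names = [
    name.text
    for name in customers.findall("./customer[@status='active']/name")
]

# Select every name element anywhere below the root.
all_names = [name.text for name in customers.findall(".//name")]

print(active_names)
print(all_names)
```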
Advanced XML technologies to know
XML is widely used in today’s technologies, including web services, databases, search engines and APIs.
SOAP and REST
An application programming interface (API) is a set of functions and procedures that define how one application will interact with another. Simple Object Access Protocol (SOAP) provides a rigorous and secure way to build APIs that encode data in XML. It’s a communications protocol that uses XML to provide a messaging framework. It’s particularly used in decentralized systems that run on different operating systems.
Representational State Transfer (REST) is an architectural style rather than a protocol. REST APIs recognize requests for a resource and return results to the requester in a format suitable to the requester.
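To show what a SOAP message can look like on the receiving end, the sketch below parses a small SOAP 1.1 envelope and pulls a value out of its body. The envelope namespace is the SOAP 1.1 one; the application namespace, the GetCustomerResponse element and its contents are hypothetical and do not belong to any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "http://example.com/customers"  # hypothetical application namespace

# A made-up response message as a service might return it.
message = f"""<soap:Envelope xmlns:soap="{SOAP_NS}">
  <soap:Header/>
  <soap:Body>
    <c:GetCustomerResponse xmlns:c="{APP_NS}">
      <c:name>John Doe</c:name>
    </c:GetCustomerResponse>
  </soap:Body>
</soap:Envelope>"""

prefixes = {"soap": SOAP_NS, "c": APP_NS}
envelope = ET.fromstring(message)

# The payload always sits inside the Body element of the envelope.
body = envelope.find("soap:Body", prefixes)
customer_name = body.findtext("c:GetCustomerResponse/c:name", namespaces=prefixes)

print(customer_name)
```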
XML in web services
A web service is a software functionality hosted at a location that can be addressed on a network. It’s a machine-to-machine function that presents an interface but hides the details of its implementation. XML is frequently the format for sending messages between systems. It gives those messages the self-descriptive ability that’s the hallmark of XML.
Without XML, the client requesting a web service would need domain knowledge to understand and process the data stream it’s receiving. With XML, the web service provider can describe the data within the XML documents themselves, and the client can interpret the XML files using their preferred XML parser.
XML in databases
CRM databases are a key component in managing customer data. There are a number of advantages to doing that management in a database that stores XML documents. For one thing, both people and machines can read the data.
That’s not always true of relational databases, which require a human to be familiar with the database and its structure. Some databases contain both XML and other data formats. Often XML is used for the metadata that defines the contents of the database.
XML in a database can be read, created, edited and deleted the same as tables in a relational database. Data retrieved from an XML database has the same self-describing advantages as any other XML files.
XML in search engines
One example of XML in search is the Programmable Search XML format used by Google’s Programmable Search Engine. A Programmable Search Engine has a great deal of control and flexibility in deciding what sites to search and how to rank the results.
An XML file called the context file defines a search engine’s most basic features. It determines some of the global features, for example, whether image search or promotions are enabled.
Another XML file, the annotations file, designates which websites and pages within websites will be searched. It also defines how the sites should be ranked on the results page.
Future outlook for XML
When you read an XML document, you can tell what the data means without needing outside documentation to tell you. You might wonder why we haven’t been creating data files like these all along. The fact is, the ideas behind XML aren’t all that new. Markup languages have been around since the 1970s, and XML itself became a W3C Recommendation in 1998.
The increased use of web browsers in the 1990s made markup languages take off. As developers learned how effective XML was for the internet world, they began to extend its use to file transfer, web design, database management, search engines, web services and just about any area where a flexible, self-documenting file structure is needed.
It’s no mystery why XML has become pervasive. Just think about these advantages:
- Humans can read XML. It consists of elements and attributes that can be given appropriate and understandable names. Sometimes it’s possible for a person to understand a customer issue by simply reading a display of that customer’s entry in an XML document.
- Computers can read XML, and programmers don’t have to create new code to do so. There are parsers that read an XML file, determine whether it’s properly formatted, then extract the values to be used in a program or to create another data format.
- XML documents itself. There’s no need to keep a reference book to tell a user or programmer what the data means. It’s right there in the XML document.
- XML facilitates B2B communications, making it easy for one organization to create a file that the other understands.
- XML ensures data integrity. The rules about the data are enforced in the document. There’s no misunderstanding of what the data means when it’s passed between systems. XML encryption provides a powerful and flexible method of protecting data.
- XML has found its way into every aspect of modern technology. That includes databases, web design, web services, APIs and search engines. Technology continues to create tools that take advantage of XML as well as tools that make XML documents easier to create and process.
- Most importantly, XML supports what business needs to do. It facilitates interaction between companies. It stores documents for use in data-driven marketing. It makes it easier than ever for the average business owner to understand the data that’s most important to them.
There’s no reason for XML to slow down. Anywhere data is created, read, updated and processed, XML will have a role.
Its versatility and ease of use make it an essential component in a wide range of applications, including web development, data storage, and business processes. By understanding the basics of XML, one can take advantage of its benefits and utilize it effectively to meet the demands of the ever-evolving digital landscape.
Whether you are a seasoned developer or just starting out, it is important to have a solid understanding of XML documents to stay ahead in today's competitive and fast-paced technological environment.