Make It Semantic from the Start

The publishing industry would be in much better shape if it could just stop concentrating on publishing things. Newspaper companies should be tremendous engines feeding a ridiculous variety of publishing vehicles, but they can’t do that as long as their tools are focused on delivery.

If you’re still lucky enough to have a hometown print newspaper, the reason you can’t get late sports scores has nothing to do with newsroom staffing and everything to do with how far away a truck can get from a print facility before morning rush hour hits. The inverted pyramid, where a reporter crams the most significant facts into the top of a story, became a standard because it allowed copy to be sliced from the bottom by hand just before pages went to press. It remains the default news story format for print pages composed on computer screens (where any section of the story could easily be trimmed) as well as for online stories, where presses don’t even exist.

Despite the emergence of the Internet, the tools for putting ink on newsprint and delivering that newsprint to customers continue to define the newspaper industry. The hierarchies and organizational structures of newsrooms still reflect the days of manual typewriters and hot lead, and the relationships that developed between advertising and editorial departments in the last century have extended almost unchanged into this century.

On a larger scale, you can trace a line from today’s dangerous concentration of media ownership directly back to print production innovations of the 1950s. The Linotype machine was invented by a German watchmaker, had over 10,000 moving parts, took more than four years to master, and produced a great majority of U.S. newspapers for much of the 20th century. The Patriot Ledger in Quincy, Massachusetts installed the first commercial photocomposition machine in 1953. The Photon set type using film rather than hot lead and did so at a pace six times faster than the Linotype.

Inexpensive clerical workers could operate the Photon, and the combination of an increased demand for photocomposition and a decreased demand for skilled Linotype operators permanently crippled the International Typographical Union. Savings on production costs fueled a sharp increase in profits, and that, along with changes to estate tax codes that disrupted family ownership, led to the dominance of corporate newspaper chains like Gannett and Tribune.

Production costs got cheaper, but they were in no way cheap and the substantial investment required to maintain print production has hampered the evolution of online newspapers. Web content is still a second-class citizen. The attitude in the early days of the Web was “We’ve put our stories in tonight’s paper; do whatever you want with them online.” Eventually, organizations found ways to deliver content for Web distribution at the same time they sent the content to their presses. It was more efficient, but online newspapers remained shadow versions of print. Now most news organizations supplement their products with original content, but that online content still has to fit within a structure designed for repurposed print material.

The lack of real differentiation between online and print news products has helped cause a lack of real innovation in online advertising. Despite the introduction of animation and low-level interaction, online banner ads offer little more than print display ads (except, of course, that online we can measure ineffectiveness while for print we can only guess about it). And that lack of innovation may be a factor in the industry’s stalled advertising sales.

It was thrilling when online newspaper advertising expenditures went over a billion dollars in 2003, but it’s less exciting now that they are still just eight percent of overall newspaper advertising dollars and that the overall pot is dwindling (down $8.3 billion, or 18%, since 2003, according to the Newspaper Association of America). That’s a sign of trouble when Internet penetration for the U.S. has increased nine percentage points (up to 74%) in the same timeframe.

Until organizations break free of their print foundations, content will continue to be underutilized. When it’s structured for delivery through a specific print mechanism, like a newspaper, you have to extract the content and strip away its original structure before using it elsewhere. The better solution is to structure content only once, but in a way that it can be used anywhere. One way to do that is to set aside all display-related information and base structure solely on the meaning of the content. Semantic content structures allow applications to read, understand, and compare content without a human doing the translation.

With no need for extraction or distillation, the same “machine-readable” content fuels all kinds of mechanisms, including print newspapers, Web sites, blogs, microblogs, and other online social objects. Simultaneous use replaces re-use.

Traditional publishing and content management systems bind content to display and delivery mechanisms, which forces a recycling approach for multi-platform publishing. A semantic content publishing system, on the other hand, creates well-defined chunks of content that can be combined in whatever way is most appropriate for a particular platform. Applications do the combining of chunks based on their understanding of a universal content structure. All display and workflow issues are addressed by delivery applications, rather than by a content management system earlier in the process.
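As a rough illustration of that division of labor, the sketch below stores one story as chunks labeled only by meaning and lets three small delivery applications combine them. The chunk names (headline, summary, detail), the sample story, and the three functions are all hypothetical; a real system would define its own structure.

```python
from html import escape

# Hypothetical story: chunks are labeled by meaning and carry no display information.
story = {
    "headline": "Council approves new library",
    "summary": "The city council voted 7-2 to fund a downtown library.",
    "detail": "Construction is expected to begin next spring.",
}

def print_app(chunks):
    """Print delivery: headline in capitals, then the full text."""
    return f"{chunks['headline'].upper()}\n{chunks['summary']} {chunks['detail']}"

def web_app(chunks):
    """Web delivery: wrap each chunk in HTML, escaping special characters."""
    return (
        f"<h1>{escape(chunks['headline'])}</h1>\n"
        f"<p>{escape(chunks['summary'])}</p>\n"
        f"<p>{escape(chunks['detail'])}</p>"
    )

def microblog_app(chunks, limit=140):
    """Microblog delivery: headline only, cut to the length limit."""
    return chunks["headline"][:limit]

# The same content feeds every mechanism at once.
for app in (print_app, web_app, microblog_app):
    print(app(story))
    print()
```

Each function knows only the content structure and its own delivery rules, so one of them can change without touching the others.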

An individual application is developed for each delivery mechanism needed. Content management systems tend to have as many moving parts as the old Linotypes, but with semantic publishing, each application is only as complex as its specific delivery mechanism requires. Applications can be as modest as a few lines of code or as robust as a full-featured publishing solution. Each application evolves at its own pace rather than having the rate of change dictated by the needs of other applications, but since all solutions are based on the same content structure, applications evolving at a different pace remain aligned.

The semantic publishing concept also reflects a strategy that more organizations will be adopting in the future, when a significant percentage of online content distribution will happen through aggregators and other third parties and not necessarily in ways intended by content creators. Any application (regardless of its author) that understands a chunk of structured content has the ability to distribute that content. Despite creators’ justifiable squeamishness and huge copyright issues, the online environment is going to get more chaotic over time rather than less, and there will be opportunities in chaos for clever content creators who profit more from distribution than ownership.

Semantic publishing will take advantage of developments in XML (rules for encoding documents), RDF (a framework for describing resources and data on the Web), and OWL (a semantic markup language for sharing sets of concepts within a domain), along with any useful standards that come out of efforts to create the Semantic Web (an extension of the World Wide Web that would use structured content to understand and automatically satisfy what people need).

The difficulty in getting enough people to agree to standards may hinder the emergence of the Semantic Web, but semantic publishing doesn’t have to wait for the same consensus. While standardized markup would offer greater distribution, eager organizations can tag their content however works best for the distribution applications they build. (Developers of applications outside of an organization could always request a key for translating the content’s structure.)
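Such a key could be as simple as a lookup table. The sketch below assumes a hypothetical in-house tag set (hed, dek, graf) and a hypothetical outside vocabulary; neither comes from any standard.

```python
# Hypothetical key an organization might hand to outside developers:
# in-house tag name -> the name the outside application expects.
TRANSLATION_KEY = {
    "hed": "headline",
    "dek": "summary",
    "graf": "paragraph",
}

def translate(chunks, key):
    """Rename each chunk's tag using the key; tags missing from the key pass through unchanged."""
    return [(key.get(tag, tag), text) for tag, text in chunks]

in_house = [
    ("hed", "Council approves new library"),
    ("dek", "Vote clears the way for downtown construction."),
    ("note", "Embargoed until 6 a.m."),
]

for tag, text in translate(in_house, TRANSLATION_KEY):
    print(f"{tag}: {text}")
```

The content itself is never rewritten; only the labels change, which is why an organization can start with its own tags and still reach outside applications later.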

In order to be as flexible as possible, semantic publishing will be built with the dumbest open source technology available. There’s no reason, for example, that semantic markup needs to be any more complicated than simple text:

<thesis>The publishing industry would be in much better shape if it could just stop concentrating on publishing things.</thesis>

<history>On a larger scale, you can trace a line from today’s dangerous concentration of media ownership directly back to production innovations of the 1950s.</history>
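Markup this plain can be read with very little code. The following is a minimal sketch, assuming flat, non-nested tags like the ones above; the sample text, pattern, and function name are illustrative only, and real XML would call for a proper parser.

```python
import re

# Hypothetical sample in the simple-text markup shown above.
SAMPLE = (
    "<thesis>The publishing industry should stop concentrating on publishing things.</thesis>\n"
    "<history>Media concentration traces back to production innovations of the 1950s.</history>\n"
)

# Matches <tag>text</tag>, tolerating stray spaces before either closing bracket.
CHUNK = re.compile(r"<(\w+)\s*>(.*?)</\1\s*>", re.DOTALL)

def parse_chunks(text):
    """Return (meaning, content) pairs in document order."""
    return [(tag, body.strip()) for tag, body in CHUNK.findall(text)]

chunks = parse_chunks(SAMPLE)
for meaning, content in chunks:
    print(f"{meaning}: {content}")
```

Once the chunks are pairs of meaning and text, any delivery application can select the ones it needs, for example only the thesis for a teaser.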

Switching from print-dominant production to semantic content publishing will transform newspaper companies into content engines and allow them to leverage their real main export (news content) as never before. Newspaper companies have always been in the data management business. Now it’s time they get good at it.