For our purposes, a 'blob' can be defined as a lump of web content that has not been properly content modelled, which is to say that it is composed of a myriad of different content types all smashed together and stored and retrieved as a single unit. Typically a blob manifests itself as a generic WYSIWYG field in a content management system, but it can also take the form of PDF documents (the blobbiest of blobs we ever blobbed on to the web), poorly executed infographics and banners, or PowerPoint presentations.
Most likely, you call your blobs 'pages', and they are neatly organised into hierarchical blobby trees with blobby branches reflecting the routes you want your visitors to take while enduring your blobby website.
This hierarchical blob approach to web content is ubiquitous. It also triggers a chain of toxic problems with far-reaching effects, some of which we don't even know about yet. At best, you'll have unengaging, generic content that no-one ever sees; at worst, you'll find yourself in a never-ending artificial content forest infested with legacy quirks, complicated workarounds and near-duplicates, and slowly but surely you'll start to dread the prospect of touching your web content.
Those problems stem from two related causes:
- The arbitrary hierarchical organisation of your blobs, and
- The singular nature of the blobs themselves.
I'll address each in turn.
What's wrong with hierarchies?
Nothing is inherently wrong with hierarchical content trees — they are a cornerstone of good information architecture, and allow content owners to impose rationality and structure on otherwise chaotic data. Without at least some form of basic hierarchy, most websites would be a confusing mess.
The problems emerge when those hierarchies are directly and arbitrarily defined by collections of standalone generic pages. What do I mean by that? Let's delve deeper…
1. How deep do you want to go?
"One of the biggest problems with categorizing things in advance is that it forces the categorizers to take on two jobs that have historically been quite hard: mind reading, and fortune telling. It forces categorizers to guess what their users are thinking, and to make predictions about the future."
Clay Shirky, Ontology is Overrated
For one of our clients, this problem had become painfully obvious. Their 'about us' section alone had grown to four levels of nested page blobs and was larger than most small websites. (Their US counterpart was even worse, reaching six levels at one point.) Every time a page was added to the site, an editor would need to define a single spot where the page should live in the enormous site hierarchy, and it would appear in the site navigation automatically (unless they ticked a box to hide it).
Visitors were expected to drill down dutifully through these levels in order to learn 'about us'. It should be no surprise that none of them did. Site analytics showed pitifully low traffic for all but the top level 'about us' page. Bounce rates were high, and time-on-page was low.
At this stage you might be thinking that the hierarchy is not the problem here - perhaps it's just a case of poor quality content and inward-looking information architecture. But you'd be surprised how easily and how quickly a team of marketers can slip into this trap without really noticing it. We've seen it happen on lots of websites and it's really no-one's fault: the ability for site editors to automatically create page hierarchies facilitates and even encourages poor quality content.
The public-facing side of these troubled websites quickly becomes a direct reflection of the internal workings of the organisation - complete with departmental silos, jargon, convoluted lines of management, and decades' worth of corporate quirks.
2. There is
Rigid hierarchies that are directly defined by their child items suffer from a few inescapable problems:
- What happens when an item needs to exist in more than one place in the hierarchy? Typically you create cross-linking placeholders or pointers or redirects (or - perish the thought - duplicate items!) so that you can achieve this whilst still maintaining a single definitive location for the actual thing itself. But you now need to manage and keep track of all of those cross-links in addition to everything else. Once the cross-links become numerous, this becomes a fragile and complex part of the editorial system.
- In many cases of hierarchical abuse, certain parent nodes only really exist in order to group their children into a common category: they are, in effect, metadata held against the real content. Yet these kinds of hierarchies demand homogeneous, navigable blocks of content at every level, even if it's worthless content consisting of a single sentence. Some systems attempt to "solve" this problem by adding further technological complexity in the form of options to make those nodes different, but this additional complication really just makes things worse and doesn't address the core problem.
- Except in rare circumstances, hierarchies are highly subjective and arbitrary. There are no right or wrong answers when it comes to categorising most items, which inevitably fuels the problem of creating introspective information architectures.
"The problem is, because the cataloguers assume their classification should have force on the world, they underestimate the difficulty of understanding what users are thinking, and they overestimate the amount to which users will agree, either with one another or with the catalogers, about the best way to categorize"
Clay Shirky, Ontology is Overrated
- URLs are sacrosanct. They are one of the most fundamental base components of the web, and they deserve your constant attention to ensure they remain clean, readable, short, user-hackable, and relevant... and critically, they should very rarely change and content should never be duplicated across multiple URLs.
Without the right technological solutions, content hierarchies that are directly cemented to content items can permanently ruin your site's URL scheme: typically a child item's URL includes the path segments of all of its ancestors, even where these make no contextual sense. For example, if a site editor decides to place your 'contact us' page under 'about us', the default URL will likely be `acme.com/about-us/contact-us`. In order to make it simply `acme.com/contact` you'll need to manage a separate sub-system of URL aliases or routing overrides (in addition to your system for cross-links!).
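To make the URL problem concrete, here is a minimal sketch of the two approaches. The `Page` class and both URL functions are hypothetical and written for illustration only; they are not taken from any particular CMS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    slug: str                         # URL segment belonging to this page alone
    parent: Optional["Page"] = None   # position in the editorial tree


def hierarchical_url(page: Page) -> str:
    """Build the URL by walking up the tree, one segment per ancestor."""
    segments = []
    node: Optional[Page] = page
    while node is not None:
        segments.append(node.slug)
        node = node.parent
    return "/" + "/".join(reversed(segments))


def stable_url(page: Page) -> str:
    """Build the URL from the page's own slug, ignoring its place in the tree."""
    return "/" + page.slug


about = Page(slug="about-us")
company = Page(slug="company")
contact = Page(slug="contact", parent=about)

print(hierarchical_url(contact))  # /about-us/contact
print(stable_url(contact))        # /contact

# An editor drags the page under a different parent.
contact.parent = company

print(hierarchical_url(contact))  # /company/contact (the old URL no longer resolves)
print(stable_url(contact))        # /contact (unchanged)
```

With the tree-derived scheme, a single drag-and-drop silently changes a public URL. With the slug-only scheme, the tree can be rearranged freely.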
As part of the information architecture phase of a website redesign, we build navigational hierarchies to be flexible and "throwaway". Much like the colours, fonts, and layouts of a website, a website's navigation is something that needs to be highly changeable. It deserves and requires constant attention (A/B and multivariate testing, tweaking, revisiting, seasonal variations, and often a complete overhaul) to ensure that it's helping your visitors to fulfil their goals and your business to fulfil its digital objectives.
On all but the simplest of sites, the navigation usually benefits from being divorced from the actual content of the site, and managed separately. This allows for the content to be flattened, which in turn allows for unlimited flexibility in how that content is displayed, and how each item relates to the others. When those content relationships are defined based on the semantic nature of the content items, they will endure through years of site redesigns. This paves the way for future tools, techniques, technologies, protocols, environments, systems, and devices that haven't been invented yet to consume and repurpose that content, and allows your content to scale indefinitely.
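One way to picture this separation is a flat content store plus any number of navigation trees that merely point into it. The sketch below is illustrative only: the `content` dictionary, the `NavItem` class and the example IDs are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Flat content store: every item lives at one level, keyed by a stable ID.
content: Dict[str, dict] = {
    "contact": {"title": "Contact us"},
    "team": {"title": "Our team"},
    "history": {"title": "Our history"},
}


@dataclass
class NavItem:
    label: str
    content_id: Optional[str] = None   # None means a pure grouping label with no page behind it
    children: List["NavItem"] = field(default_factory=list)


def render(items: List[NavItem], depth: int = 0) -> List[str]:
    """Flatten a navigation tree into indented text lines."""
    lines = []
    for item in items:
        target = f" -> {content[item.content_id]['title']}" if item.content_id else ""
        lines.append("  " * depth + item.label + target)
        lines.extend(render(item.children, depth + 1))
    return lines


# Two navigation trees over the same content; neither one touches the content store.
main_nav = [
    NavItem("About", children=[NavItem("Team", "team"), NavItem("History", "history")]),
    NavItem("Contact", "contact"),
]
footer_nav = [NavItem("Contact", "contact"), NavItem("Team", "team")]

print("\n".join(render(main_nav)))
print("\n".join(render(footer_nav)))
```

Note that "About" is only a grouping label, so no filler page is needed for it, and the contact item appears in two places without any cross-link or duplicate.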
"...we should always be looking to make a clean system with an interface ready to be used by a system which hasn't yet been invented"
Your navigation is too important to be rigidly cemented to the internal arrangement of your web pages, and your URLs are too important to be automatically generated and at risk of automatically changing when an editor drags and drops a page from one place in the hierarchy to another. More often than not, content-generated hierarchies go against the grain of the web.
Ok, hierarchical ranting over, let's move on to those blobs…
What's wrong with blobs?
Since the dawn of the hyperlink, websites have been arranged into pages. This is the fundamental delivery mechanism of the web and it works well.
Unfortunately this has led to a conflation of 'pages' with 'content'. 'Pages' are no more 'content' than plates are food.
When your content is stored as singular page blobs, its scope and longevity are severely limited: the content probably only works in the current context (e.g., a website), and has no awareness of the wider content ecosystem which it could and should be a part of - both internally and externally.
Emerging channels such as Google's "AMP" project, Facebook's "Instant Articles", and Apple's News services are good examples of how content can be externally syndicated in new ways, potentially reaching new and wider audiences. But to do this, your content needs to be readily adaptable and able to be semantically structured according to what it "is".
"...what happens when toaster printers become a reality? Is your content ready to be burned onto delicious toast?"
If you have ever searched the web for a film, a recipe, a book or an album, you'll probably have noticed some very prominent listings at the top or to the side of the ordinary results, which might include star ratings, reviews, photographs, discographies and so on. If it's a new film, there might even be listings of local cinema show times and links to book tickets. If it's an album by an artist that is currently touring nearby, you might see listings of their upcoming shows. All this is made possible by granular content with semantically structured metadata.
Content fragments are more easily syndicated across this multiverse of channels, devices, and services without the need for multiple publishing systems. This is the 'Create Once Publish Everywhere' ideology.
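As a rough illustration of 'Create Once Publish Everywhere', the sketch below stores one structured fragment and renders it three ways. The `Recipe` class and the three renderers are hypothetical. The JSON-LD output uses the schema.org `Recipe` vocabulary, which is my choice of example format and not something this article prescribes.

```python
import json
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class Recipe:
    name: str
    cook_minutes: int
    ingredients: List[str]


def to_html(recipe: Recipe) -> str:
    """Render the fragment for a web page."""
    items = "".join(f"<li>{escape(i)}</li>" for i in recipe.ingredients)
    return f"<h1>{escape(recipe.name)}</h1><ul>{items}</ul>"


def to_json_ld(recipe: Recipe) -> str:
    """Render the same fragment as schema.org structured data."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": recipe.name,
        "cookTime": f"PT{recipe.cook_minutes}M",   # ISO 8601 duration
        "recipeIngredient": recipe.ingredients,
    })


def to_plain_text(recipe: Recipe) -> str:
    """Render for a channel with no markup at all."""
    return f"{recipe.name}: {', '.join(recipe.ingredients)} ({recipe.cook_minutes} min)"


toast = Recipe(name="Cheese on toast", cook_minutes=5, ingredients=["bread", "cheddar"])
print(to_html(toast))
print(to_json_ld(toast))
print(to_plain_text(toast))
```

None of the renderers could be written against a WYSIWYG blob, because the cooking time and ingredients would be buried in undifferentiated markup.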
"The goal of any CMS should be to gather enough information to present the content on any platform, in any presentation, at any time"
Container-in vs. Content-out
Conversely, with the page-based approach, because each blob is semantically indistinguishable from all the other blobs, you have little choice but to adopt a homogeneous 'container-in' approach to the design of your site. Many thousands of one-size-fits-all turnkey website templates thrive on the container-in approach - you pay your $30 and get a very pretty looking container into which you pump your content.
But this does a terrible disservice to your content: if your content is worth publishing, then it should also be worth designing. Designing a web page from the content outwards instead of the container inwards gives a more engaging result and a better user experience. With the right automation systems, such an approach doesn't even need to take any extra effort on the part of the publisher.
Solving blobs: content modelling
By exploding your blobs into smaller fragments and creating strong relationships between them, you imbue your content ecosystem with an exciting prescient self-awareness: each fragment knows what it "is", and also has knowledge about what other fragments are related to it, along with the proximity and nature of those relationships.
Let's say you're reading a typical client case study on a corporate website: with granularly defined content fragments, that case study could, for example, completely automatically:
- display contact details and biographies of the most relevant staff members who worked with that client
- display testimonials from other clients who are in the same sector as the client in the case study
- link to a related set of products that the client has purchased
If one of those related products is renamed several years from now, all the case studies will automatically update to reflect the new name. If an editor adds a new staff member in a particular sector, that staff member could automatically appear on 30 different sector case studies without anyone having to edit each one.
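The automatic updating described here falls out naturally when a case study stores references to other fragments and resolves them at render time. The following is a minimal sketch under that assumption; every class, ID and name in it is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Product:
    name: str


@dataclass
class StaffMember:
    name: str
    sectors: List[str]


@dataclass
class CaseStudy:
    title: str
    sector: str
    product_ids: List[str] = field(default_factory=list)  # references, not copied names


products: Dict[str, Product] = {"p1": Product("Widget Pro")}
staff: List[StaffMember] = [StaffMember("Asha", ["retail"])]
study = CaseStudy("How Acme cut costs", sector="retail", product_ids=["p1"])


def render(cs: CaseStudy) -> dict:
    """Resolve every relationship at render time so the output reflects current data."""
    return {
        "title": cs.title,
        "products": [products[pid].name for pid in cs.product_ids],
        "staff": [s.name for s in staff if cs.sector in s.sectors],
    }


print(render(study))

# Rename the product and add a staff member; the case study itself is never edited.
products["p1"].name = "Widget Max"
staff.append(StaffMember("Ben", ["retail", "finance"]))

print(render(study))
```

The second `render` call picks up the new product name and the new staff member, even though `study` was not touched.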
Creating strong relationships between content fragments also allows for relationship traversals to be exploited: for example, you could generate a list of case studies related to a single staff member, or display a gallery of images related to a particular industry sector, or you could export all staff members who work with a particular client to a spreadsheet that gets emailed to you once a day, or you could automatically generate PDF product sheets per industry sector – all with minimal extra editorial effort.
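Traversals like these can be sketched by treating each relationship as a simple triple and indexing it in both directions. The triples, relation names and helper functions below are assumptions made up for this example, not a description of any specific system.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Each relationship is a (subject, relation, object) triple.
relationships: List[Tuple[str, str, str]] = [
    ("case-study:acme", "worked_on_by", "staff:asha"),
    ("case-study:acme", "in_sector", "sector:retail"),
    ("case-study:globex", "worked_on_by", "staff:asha"),
    ("case-study:globex", "in_sector", "sector:finance"),
    ("image:storefront", "in_sector", "sector:retail"),
]

# Index the triples in both directions so any fragment can be a starting point.
forward: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
backward: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
for subject, relation, obj in relationships:
    forward[(subject, relation)].add(obj)
    backward[(obj, relation)].add(subject)


def related_to(target: str, relation: str, kind: str) -> List[str]:
    """Return fragments of the given kind that point at `target` via `relation`."""
    return sorted(s for s in backward[(target, relation)] if s.startswith(kind + ":"))


def staff_for_sector(sector: str) -> List[str]:
    """Two-hop traversal: sector -> its case studies -> the staff on those case studies."""
    found: Set[str] = set()
    for study in related_to(sector, "in_sector", "case-study"):
        found |= forward[(study, "worked_on_by")]
    return sorted(found)


print(related_to("staff:asha", "worked_on_by", "case-study"))
print(related_to("sector:retail", "in_sector", "image"))
print(staff_for_sector("sector:retail"))
```

Each new report or listing becomes another small query over relationships that already exist, which is why the extra editorial effort stays minimal.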
As your content grows, that granularity allows new content relationships to blossom, and old relationships to automatically update. It allows external services that might not yet even exist to consume your content and provide rich extra functionality that helps you reach new audiences.
Building the kind of sophisticated content systems that avoid rigid hierarchies and sidestep content blobbyness is hard and complicated. These ideas can easily be taken too far (typically characterised by having content fragments that are too granular), making the editorial experience slow and frustrating, so care must be taken to strike the right balance.
It's also easy to over-simplify this procedure by creating content fragments that are directly bound to the current design/layout of the website - for example, having fields for 'sidebar content' and 'callout colour'. This just makes your fragments even less portable and future-proof than before!
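The difference between layout-bound and semantic fields can be shown with a small, hypothetical example. Both classes and the theme mapping below are invented for illustration; the first class exists only as a contrast.

```python
from dataclasses import dataclass


@dataclass
class LayoutBoundNotice:
    """Fields describe where and how the text appears in today's design."""
    sidebar_content: str
    callout_colour: str


@dataclass
class SemanticNotice:
    """Fields describe what the text is; presentation is decided elsewhere."""
    body: str
    severity: str   # "info" or "warning"


# Presentation lives in a theme layer that can be swapped during a redesign.
THEME_COLOURS = {"info": "blue", "warning": "orange"}


def render_web(notice: SemanticNotice) -> str:
    colour = THEME_COLOURS[notice.severity]
    return f'<aside style="border-color: {colour}">{notice.body}</aside>'


def render_voice(notice: SemanticNotice) -> str:
    prefix = "Warning. " if notice.severity == "warning" else ""
    return prefix + notice.body


notice = SemanticNotice(body="Offices are closed on Friday.", severity="warning")
print(render_web(notice))
print(render_voice(notice))
```

A voice channel has no sidebar and no colours, so `LayoutBoundNotice` has nothing useful to offer it, whereas the semantic version can still convey that the message is a warning.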
"There's no perfect content model, only the one that works for your project"
This is the content ideology we pursue, but (as with any ideology) it's important to recognise and accept that in practice you will never achieve 100% purity, and shouldn't strive to. A perfectly pure system of flattened and normalised content would take a huge amount of effort to construct, would lack the flexibility to quickly react to changing content requirements, and would be plain nonsensical in places. But the benefits of getting most of the way there are easily worth the effort.