Requirements for an Accessible PDF: Part 1

When asked what the most important PDF accessibility requirements are, most people will tell you that in order to be considered accessible, PDF documents need to:

Be tagged using semantic markup so they become easily navigable by assistive technologies such as screen readers and refreshable braille displays
Have their logical reading order specified independently of their visual appearance so their accessibility is enhanced for people who use screen readers or must navigate through documents using a keyboard, a keyboard emulation device, or speech recognition software
Provide content that can be reflowed efficiently to optimize both text magnification on the screen and changes to text and background color, which makes it easier for some users to consume content

While these are all important, the most efficient way to produce accessible PDF documents is not to use remediation techniques after the PDF document has been created, but rather to prevent the need for remediation altogether. If you integrate accessibility early enough in the process, it is directly handled at the document source level, and a much more practical approach to PDF accessibility can be established.

This is especially true for draft documents that are meant to be edited and converted back to PDF format multiple times as they evolve. In many such cases, dealing with accessibility after the PDF conversion process will mean repeating the same efforts over and over again every time the document is reconverted to PDF format.

What this boils down to is that the most important PDF accessibility requirement is not tagging the PDF, working on its logical reading order, or even making sure it reflows naturally. It is building accessibility directly into the source and following an efficient conversion process, so that accessibility features are built right into the document rather than bolted on every time the document is converted to PDF format.

Working from an accessible source

It has often been argued that testing and remediating PDF files that come from an accessible source is generally much simpler than testing and remediating PDF documents coming from a source that isn’t accessible. The same goes for PDF documents that are improperly converted to PDF format, as these always provide more challenges than documents that are converted following an accessible “recipe”.

No matter where these documents originate or how they are created, some remediation is generally required for a document to be considered accessible. This is especially true when the document includes rich layout, data tables, complex images, or interactive form controls. When accessibility is integrated directly at the authoring source level, however, most of the effort required to make a document usable by people with disabilities can be built in automatically, by applying simple authoring best practices like using styles or other basic built-in software features.

Let’s take documents created in Microsoft Word or OpenOffice, for example. Documents created in either of these authoring tools can have tremendous accessibility potential if the proper authoring process is followed. On the flip side, if these tools are used poorly, the resulting PDF document may very well end up a complete accessibility failure.

One of the reasons why it is so important to integrate accessibility prior to the conversion process is to have a chance to integrate as much structure and semantics as possible inside the document. As with accessible webpages, accessible PDF documents use tags to indicate the structural elements of the document’s content (chapters, headings, text, multi-column text, graphics, paragraphs, tables, footnotes, etc.) and how these elements relate to each other.

Tagging the different elements of content in a PDF document is what makes it navigable by assistive technologies and what provides the required semantics for computers and software programs to understand how the document is organized and what it is about. This is one of the most important aspects to consider for PDF accessibility.
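
To make the idea of a tag tree more concrete, here is a minimal sketch in Python. It does not read a real PDF and does not use any PDF library: the `Tag` class, the `walk` and `heading_outline` helpers, and the sample document are all hypothetical, made up for illustration. The role names (Document, H1, P, Table, TR, TD) mirror standard PDF tag names. The sketch only shows how software could walk a tree of tags in logical reading order and build a list of headings to navigate by, which is roughly what tagging makes possible for assistive technologies.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tag:
    """One node in a simplified, hypothetical tag tree (not a real PDF API)."""
    role: str                                  # e.g. "Document", "H1", "P", "Table"
    text: str = ""
    children: list[Tag] = field(default_factory=list)


def walk(tag):
    """Yield tags depth-first, which stands in for the logical reading order."""
    yield tag
    for child in tag.children:
        yield from walk(child)


def heading_outline(root):
    """Collect (level, text) pairs for every H1..H6 tag, in reading order."""
    outline = []
    for tag in walk(root):
        role = tag.role
        if len(role) == 2 and role[0] == "H" and role[1] in "123456":
            outline.append((int(role[1]), tag.text))
    return outline


# Illustrative sample content only.
document = Tag("Document", children=[
    Tag("H1", "Requirements for an Accessible PDF"),
    Tag("P", "Introductory paragraph."),
    Tag("H2", "Working from an accessible source"),
    Tag("P", "Body text."),
    Tag("Table", children=[
        Tag("TR", children=[Tag("TD", "Cell text.")]),
    ]),
])

# Print the headings as an indented outline.
for level, text in heading_outline(document):
    print("  " * (level - 1) + text)
```

If every element in the sample were tagged as a plain paragraph, `heading_outline` would return an empty list, and a reader relying on headings to get around would have nothing to navigate by.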

This post is intended as the first post in a series of four that will walk you through the general concepts of creating source documents with accessibility in mind so they can easily be converted to accessible PDF format.

In the upcoming posts, we will discuss some of the most common accessibility barriers encountered by people with disabilities when using PDF documents and how implementing best practices in the authoring software can help bring those barriers down. We will then conclude by going over the basics of an efficient conversion process to PDF, so all the accessibility features included in the authoring source are automatically and reliably transferred to PDF.

The blog posts in this series include the following and will be published over the next few weeks:

Requirements for an Accessible PDF part one (this post)
Requirements for an Accessible PDF part two: Common Accessibility Barriers
Requirements for an Accessible PDF part three: Authoring Best Practices
Requirements for an Accessible PDF part four: Proper Conversion Process

We hope you enjoyed this first part of our series on PDF Accessibility Requirements. Don’t hesitate to leave a comment so we can keep this conversation going.

About Denis Boudreau

Denis is a Principal Web Accessibility Consultant and currently acts as Deque's training lead. He actively participates in the W3C, the international body that writes web accessibility standards, as a member of the Education and Outreach Working Group, where he leads the development of a framework to break down accessibility responsibilities by roles in the development lifecycle. He is also involved in the Silver Task Force, where he contributes to the development of the W3C's next generation of accessibility standards.

Comments 7 responses

Vincent François says:

March 21, 2013 at 11:05 am

Great initiative to come up with solutions for PDF accessibility.

Do you plan to go beyond the three requirements in your next posts? Because, while they are surely very important ones, when we talk about a11y compliance (WCAG, SGQRI…), a lot of little checks have to be done and the devil is in the details…

dboudreau says:

March 21, 2013 at 12:19 pm

Thanks. 🙂

In future posts, probably. There’s so much to be said about PDF accessibility. These first four posts will only be scratching the surface, I already know that much…

The toughest part at the moment is really trying to stick to the most important message I want to convey at this point, which is that the best way to create accessible PDF is to start by learning to work efficiently in the authoring source, whatever that may be (in this case, Word). And building a11y in the document prior to conversion is the most efficient way I know to do this. Much more than bolting a11y on top of an already converted PDF document. While this is sometimes easy to do, it can quickly turn into a nightmare.

Once I’ve covered this part, then I’ll certainly want to move on to other aspects. It would be interesting to drill down into more specific related topics in the upcoming months: building semantics into the PDF document, controlling the reading order, playing with the navigation, order and tags panes, using the touchup reading order dialog box, all that good stuff.

Time will tell. ;p

Olaf Drümmer says:

March 28, 2013 at 12:11 pm

Sometimes I do not get why so many magic dances are danced around tagging, as if it were almost as challenging as rocket science.

All that needs to be done in an authoring tool like Word is to use styles that map to paragraphs (P) or headings at the right heading level (H1..H6), and to use list formatting for lists, the table feature for tables, the footnote feature for footnotes, and the link feature for links. Make sure to use a good PDF export path (Adobe Acrobat Pro – while not free of charge – does fix a couple of things that Microsoft Word alone still doesn’t seem to master in its save as tagged PDF function).

That will cover 98% or more of the tags necessary for all static content (forms would be a slightly more challenging adventure, and yes, one could also think about captions or block quotes, but you know what: if that just comes across as a regular paragraph, hardly anything substantial is lost).

That’s it, when it comes to tagging in Word. P and H1..H6. Not more.

Or am I missing something?

Olaf

dboudreau says:

March 28, 2013 at 4:39 pm

Olaf,

You’ve covered most of the basics indeed. I agree it really isn’t rocket science, but like many other things, even if it’s obvious once you know about these things, the average content author is usually clueless about such best practices.

I don’t know about you, but most Word documents I get to look at are usually very weak when it comes to semantics and accessibility. Which is one of the main reasons why it’s so important to educate authors about these best practices.

Yves Hudon says:

April 11, 2013 at 2:36 pm

Denis, Denis, Denis!!!

Well done!!! What more is there to say!

Brendan Ryan says:

April 22, 2013 at 6:55 am

The very essence of any decent DAMS requires, in the case of PDF, well-constructed and tagged documents. While software exists to read and extract metadata and content from PDF, the onus to create documents where such data CAN be extracted still rests with the originator. Therein lies the source of many frustrations: those individuals who create PDFs for dissemination of data do not understand the importance of document structure and have not been trained / have not bothered learning how to construct documents correctly. A case in point would be creative persons at ad agencies. On many occasions, newspapers have been forced to provide credits for adverts printing incorrectly. However, when performing a technical evaluation of a supplied document, one finds basic errors that can easily be avoided with simple QC and training… E.g., why use Photoshop to lay out an advert when we SHOULD be using InDesign?… Enough of my rant… Typing on a Blackberry should be illegal!

Peter Abrahams says:

June 18, 2013 at 6:01 am

It is worth pointing out that a well-structured Word document which uses styles effectively is not only a good basis for accessible PDF but is more accessible in its own right. Word files are often used as early drafts of PDF documents and these drafts should be accessible.