Requirements for an Accessible PDF: Part 2 – Common Accessibility Barriers

In the first post of this series, we covered why integrating accessibility at the very first steps of a project can be beneficial to the process of PDF remediation. We also covered why it is always better to plan accessibility in advance, by building it into the source document rather than bolting it onto the PDF document.

The most efficient way to build in accessibility and ensure properly tagged content in a PDF document is to bring the proper semantics into the document before it is converted to PDF. This means these efforts need to happen in the authoring tool rather than in Acrobat Pro. It is of course always possible to wait and do everything in Acrobat Pro, but at a significantly higher cost in time and complexity.

Unfortunately, due to poor authoring techniques and an even poorer conversion process, most PDF documents end up untagged, meaning that they lack the markup that makes content accessible to assistive technologies. As with HTML that lacks semantic markup, untagged PDF documents are frustrating and difficult to use for people who cannot see and rely on screen readers to discover their content.

Some of the most common, and perhaps easiest to understand, accessibility problems that untagged PDF documents impose on users relate to missing or poor:

  1. Document titles
  2. Alternate text for images
  3. Content structure
  4. Language indicators
  5. Navigation patterns

The following is meant as a reminder of why these elements are crucial to help users better experience PDF content.

1. Missing or poor document titles – The title is the first piece of information conveyed to screen reader users when they open a PDF document. It lets them confirm which document they’re currently viewing, especially when more than one document is open at the same time. Confusion can often arise when that information is not properly disclosed or when the title of the document does not clearly reflect its content.

2. Missing or poor alternate text for images – Despite what the adage says, a picture isn’t always worth a thousand words. Graphical elements like pictures, icons, illustrations, charts, and the like convey information that only exists in a graphical format. When informative images are not described in text, this content is lost to screen reader users. On the other hand, decorative images need to be ignored by screen readers, in order to reduce noise. Unless the relevant information is efficiently transferred to screen readers, users will run into major problems trying to make sense of the content.

3. Missing or poor content structure – The general structure of a document, including such components as paragraphs, lists, and headings, is normally represented visually, which helps users understand how a document is organized. Screen reader users, being unable to see the visual formatting of a document, cannot rely on this information to understand its organization. To be able to do so, they need that visual organization conveyed semantically, using PDF tags. When such information is missing or poorly implemented, screen readers cannot reliably convey the structure to users. As a result, it can easily become very difficult, if not impossible, for a user to consume the content.

4. Missing or poor language indicators – Screen readers cannot guess which language the content of a PDF is in. For screen reading software to read a document in the proper language, this information needs to be part of the document. When the main language of a document is undefined, screen readers will default to reading the content using their base settings, which may or may not use the appropriate speech synthesis for the content. When the default setting is not appropriate (for instance, a French voice synthesis attempting to read a document written in English), the content will be for the most part unintelligible to screen reader users, no matter how much accessibility has been built into it.

5. Missing or poor navigation patterns – Users normally rely on their sight to quickly scan a document, isolate the section they’re interested in, and jump directly to it. Screen reader users cannot see the pages, so they cannot jump directly to a precise position on a page without first listening to all the preceding content. To make it easier for them to move through a document, navigation patterns need to be integrated into it, in the form of bookmarks and internal links. Lacking such patterns makes quick navigation much more difficult for non-sighted users.
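
As a rough illustration, the five barriers above can be treated as a checklist. The Python sketch below runs that checklist against a hand-filled summary of a document. The `PdfSummary` and `Image` types and the `find_barriers` function are hypothetical names invented for this example; they are not part of Acrobat Pro or of any PDF library, and the sketch does not read a real PDF file. It also only detects missing information, since judging whether a title or a description is poor still takes a human reviewer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Image:
    """Hypothetical record for one image in the document."""
    alt_text: str = ""        # empty means no description was provided
    decorative: bool = False  # decorative images should be ignored by screen readers


@dataclass
class PdfSummary:
    """Hypothetical, hand-filled summary of a PDF (not a real library type)."""
    title: str = ""
    language: str = ""        # e.g. "en-US"; empty means undefined
    is_tagged: bool = False   # True when the content carries PDF tags
    images: List[Image] = field(default_factory=list)
    bookmark_count: int = 0


def find_barriers(doc: PdfSummary) -> List[str]:
    """Return which of the five barriers are present, in the order listed above."""
    barriers = []
    if not doc.title.strip():
        barriers.append("document title")
    # Only informative images need a description; decorative ones are skipped.
    if any(not img.decorative and not img.alt_text.strip() for img in doc.images):
        barriers.append("alternate text for images")
    if not doc.is_tagged:
        barriers.append("content structure")
    if not doc.language.strip():
        barriers.append("language indicator")
    if doc.bookmark_count == 0:
        barriers.append("navigation patterns")
    return barriers


# An untagged document with one undescribed image hits every barrier.
untagged = PdfSummary(images=[Image()])
print(find_barriers(untagged))
```

In practice the values in the summary would come from inspecting the document in the authoring tool or in Acrobat Pro; the point of the sketch is only that each barrier maps to one concrete piece of information that is either present or absent.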

These are just a few of the common accessibility barriers people with disabilities often run into when trying to consume untagged PDF content. The main reasons are a lack of knowledge and awareness on the author’s part, as well as unreliable tools and techniques. Hopefully, through our next posts, we will be able to cover some of these issues and help improve PDF navigation.

In the upcoming posts, we will go over best practices to help prevent these barriers in Microsoft Word and OpenOffice. We will then conclude by going over the basics of an efficient conversion process to PDF, so all the accessibility features included in the authoring source are automatically and reliably transferred to PDF.

The blog posts in this series include the following and will be published over the next few weeks:

We hope you enjoyed this second part of our series on PDF Accessibility Requirements. Do you feel these barriers are the most important ones? Do you feel other barriers should have been included? Don’t hesitate to leave a comment so we can keep this conversation going.

About Denis Boudreau

Denis is a Principal Web Accessibility Consultant and currently acts as Deque's training lead. He actively participates in the W3C, the international body that writes web accessibility standards, as a member of the Education and Outreach Working Group, where he leads the development of a framework to break down accessibility responsibilities by roles in the development lifecycle. He is also involved in the Silver Task Force, where he contributes to the development of the W3C's next generation of accessibility standards.

Comments 2 responses

  1. Hi Denis,

    When the main language of a document is undefined, screen readers will default to reading the content using their base settings

    Actually, in all of our tests we’ve found that the reader will default to PDF19 to determine the language when PDF16 is not available. If neither PDF16 nor PDF19 is available, it will use the software default language to read the documents.

    I only mention this because when using LifeCycle in our environment it is not possible to set the language of the document (PDF16) but everything is properly tagged using PDF19 and we have not encountered issues.

    Great article!

    Roch Lambert

    @rlambert27

  2. Thanks for the note Roch, great catch! That’s really good to know! I am definitely going to have a closer look at this. 🙂

    For those of you who didn’t understand what Roch was referring to, he is talking about the PDF Techniques for WCAG 2.0.

    And techniques PDF19 and PDF16 in particular:

    PDF19: Specifying the language for a passage or phrase with the Lang entry in PDF documents

    PDF16: Setting the default language using the /Lang entry in the document catalog of a PDF document
