The skills needed to tell stories with data responsibly span data processing, journalism, coding and design.
The ideal team for data storytelling, known as the data unit, is made up of professionals such as data wranglers, storytellers, developers, editors and designers.
Each member of the data unit brings a different skill set, and the unit works as a team in which individual strengths contribute to the overall output. The data unit is largely without hierarchy: every member’s contribution is as important and necessary as any other’s in meeting the objectives set at each stage of the data pipeline.
Members must therefore work together and share in decision-making to move the storytelling forward.
“There’s a digital revolution taking place both in and out of government in favor of open-sourced data, innovation, and collaboration.”
- Kathleen Sebelius, Former United States Secretary of Health and Human Services
An organisation or team may face many challenges, including a shortage of available or permanent resources or capacity. Realistically, not every company, organisation or individual will be able to establish a full-scale data unit.
Yet the data unit can take many forms to suit the resources available to the company, organisation or individual. To help with this, there are two perspectives from which you can view the data unit: the jack of all trades and the large-scale data unit. The perspective you choose depends on what best supports your current circumstances.
Data storytelling is made possible by investing in networks, connections and associations, leveraging resources, and tapping into the existing online community. There you can always find a way to simulate or develop a particular skill or competency, or find someone who can help with a task you cannot cover in-house.
It is important to note that the size of the data unit does not prevent an individual, company or organisation from successfully telling stories with data.
The jack of all trades is a lone ranger. From data sourcing to cleaning, analysis and visualisation, this one-person data unit completes every task along the data pipeline alone. The Open Movement has a lot to do with the jack of all trades’ ability to do so.
The international drive towards ‘openness’ has not only made the sourcing of data much easier, but has also been the driving force behind the development of many of the tools that make the cleaning, processing and visualisation of data possible - without the need to be an expert in any single part of the process.
Below are some examples of how data liberation has made it easier for the jack of all trades to work alone:
Stage | Examples
Sourcing | Open Government and Open Data initiatives
Cleaning | OpenRefine, Google Sheets and more
Analysis | Google Sheets, LibreOffice, Tableau Public and more
Visualisation | Datawrapper, Charted, RAWGraphs, Tableau Public, Carto (for map-based visualisation), Google DataStudio, KnightLab Storytelling Tools, Piktochart, Infogr.am and more
The large-scale data unit includes all the roles required for storytelling with data. Large-scale teams often have a long-term vision that includes opportunities for long-form investigative stories and shorter projects, as well as experimentation with methods and techniques.
This data unit is therefore not solely focused on data storytelling, but is also dedicated to advancing the fields of data storytelling, data liberation and open source.
You can see examples of large-scale data units at The Guardian, La Nación, the Australian Broadcasting Corporation (ABC) and the BBC.
The data unit is best understood through the different roles that need to be fulfilled. The typical roles that make up a data unit are the project manager, the data wrangler, the storyteller and the developer/designer.
But what do they actually do? Let’s look at each role in light of its tasks and competencies.
The project manager oversees the management of the complete process from ideation to publication and distribution.
Data wrangling is the most impactful step in the data pipeline.
This is because every other role within the unit is, to a significant degree, dependent on the output of the data wrangler in order to move ahead.
The storyteller conducts the journalism on the numbers. The role of the storyteller is not only to dig around in the data to find the stories, but also to go out into the world to see whether those stories hold true, that is, to report on the data.
Remember the numbers are not the story, just the source of the story. The storyteller will still have to go out, report on it, and weave together a story that best captures the message.
Data storytelling is often digitally driven, which calls for a developer in the unit to assist with the technical elements of the project. These technical elements can take the form of data visualisations, storytelling templates, data-driven tools and web applications.
Ideally, the developer should also have a basic understanding of, and working proficiency in, visual communication and design.
Collaboration is defined as the action of working with someone to produce something, and the success of the data unit depends on each member’s ability to work collaboratively.
But what does it mean to collaborate? And how far does this collaboration extend into the workflow of each of the individual roles making up the data unit?
In order to collaborate effectively as a member of the data unit, or merely to context-switch between skills when working alone, there are a few things you need to consider before getting started:
It is critical that each member of the data unit understands something of the function of the others. Without this understanding, as a member of the data unit, you will be unaware of what each member needs from you and when they need it, or when you will receive what you need from the others.
Different technical skills follow different processes and need to uphold different standards in the fulfilment of their tasks. The workflow for each member in the completion of their objective is different and it may be some time before they are able to deliver the output that another member requires in order to move forward.
Almost all decision-making during the course of the project is done collaboratively. The expertise of every member in collaboration with other members is needed to progress and shape the project. Often, the work of one member is prepared and completed to suit the needs and requirements of the member (or competency) to follow. And this informs the fundamental mode for collaboration.
What does collaboration look like and what process does it follow?
The project manager is assigned to a project, which in this instance is a data-driven story. The project manager then assembles a team made up of a data wrangler, a storyteller, a developer/designer and, of course, the project manager.
The project is introduced to the team. Drawing on their collective expertise, the team members shape the agenda of the project and determine a timeline based on an estimate of the expected tasks, knowledge of individual workflows, and the scale of the project itself.
Once the agenda has been set, the project manager is responsible for allocating resources and capacity (human and other), as well as acquiring whatever is not available internally, in order to see the project through.
At times, you will already be in possession of a dataset when you kick off a data story. When this is not the case, relevant data that fits the agenda set by the project manager and the team will have to be sourced.
The act of sourcing data includes both finding the data and getting the data. In the data unit, this is often done by the storyteller, the data wrangler, or both, depending on the competencies that each of these individuals possesses.
At times the storyteller will have nurtured the right relationships and contacts to source the relevant data directly through domain experts or data custodians, or will be familiar with various data portals and other online sources of information. The data wrangler, on the other hand, may be more skilled in actually getting the data, by downloading it, scraping it or pulling it in from an API, while making sure that it is in a machine-readable format that can be cleaned and analysed.
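As a minimal sketch of what "machine-readable" can mean in practice, the snippet below parses CSV text with Python's standard library and checks that every row has the same number of fields as the header. The sample data, the column names and the helper name `load_records` are illustrative assumptions, not part of the curriculum.

```python
import csv
import io

# Made-up sample standing in for a downloaded file (assumption).
SAMPLE_CSV = """region,year,clinics
North,2019,12
South,2019,8
"""

def load_records(text):
    """Parse CSV text into a list of dicts, rejecting ragged rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    records = []
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_number}: expected {len(header)} fields, got {len(row)}"
            )
        records.append(dict(zip(header, row)))
    return records

records = load_records(SAMPLE_CSV)
print(len(records), "records with columns", list(records[0]))
```

A file that fails a check like this (for example, a table copied out of a PDF) usually needs restructuring before any cleaning or analysis can begin.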
The storyteller may be more skilled in the process of verification when it comes to content-related aspects, while the data wrangler may prove more efficient in verifying the dataset itself, uncovering the source of the data, and authoring the data profile.
Once these two members of the data unit are satisfied with the data they have, the data wrangler can proceed to clean the data and structure it for analysis.
This next stage is performed by the data wrangler. Data can only be mined for stories once it has been cleaned and structured for analysis. Cleaning the data involves getting rid of data errors and false positives, removing or redefining empty cells and duplicates, correcting inconsistencies in spelling or in the presentation of the data, applying the correct formatting to different data types, spotting and understanding the outliers, and much more.
Any one act of data cleaning involves two processes: (1) the detection and (2) the correction of a data entry (or a group of data entries). Depending on the size and quality of a dataset, data cleaning can often be the most time-consuming task for the data wrangler to perform. Only with clean and correctly structured data is the data wrangler able to proceed with the initial exploratory analysis and interpretation of the data.
Note that the data wrangler will also structure the data with the objectives of the project in mind, as established by the project team.
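The two processes of detection and correction can be sketched as two separate functions. This is only an illustration under stated assumptions: the rows are made up, the field names `town` and `population` are hypothetical, and the rules (trim whitespace, title-case names, treat empty cells as missing, drop exact duplicates) are one possible choice among many.

```python
# Made-up rows with typical problems: stray whitespace, inconsistent
# capitalisation, an empty cell, and a duplicate entry.
raw_rows = [
    {"town": " Springfield", "population": "1200"},
    {"town": "springfield", "population": "1200"},
    {"town": "Rivertown", "population": ""},
    {"town": "Lakeside", "population": "950"},
]

def detect_issues(rows):
    """Step 1 (detection): list (row_index, description) pairs; data is unchanged."""
    issues = []
    for i, row in enumerate(rows):
        if row["town"] != row["town"].strip().title():
            issues.append((i, "inconsistent town name"))
        if row["population"].strip() == "":
            issues.append((i, "empty population"))
    return issues

def correct(rows):
    """Step 2 (correction): normalise names, mark empty cells as None, drop duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        town = row["town"].strip().title()
        population = int(row["population"]) if row["population"].strip() else None
        key = (town, population)
        if key in seen:
            continue  # an identical entry has already been kept
        seen.add(key)
        cleaned.append({"town": town, "population": population})
    return cleaned

issues = detect_issues(raw_rows)
cleaned_rows = correct(raw_rows)
```

Keeping detection separate from correction lets the data wrangler review the list of issues, and discuss doubtful cases with the storyteller, before anything in the dataset is changed.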
Analysis happens in two stages and involves both the data wrangler and the storyteller.
Firstly, with the data cleaned, the data wrangler is able to execute the initial exploratory analysis and interpretation of the data. This process seeks to uncover the type of information contained within the dataset, while surveying the opportunities available for storytelling. Additionally, the data wrangler also considers different ways in which the data can be grouped, and what these groupings say about the dataset or topic as a whole.
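Grouping the data in different ways can be sketched as follows. The records are made up, and the helper name `total_by` and the field names are assumptions chosen for illustration; a spreadsheet pivot table would serve the same purpose.

```python
from collections import defaultdict

# Made-up, already-cleaned records (assumption).
records = [
    {"region": "North", "year": 2019, "clinics": 12},
    {"region": "North", "year": 2020, "clinics": 15},
    {"region": "South", "year": 2019, "clinics": 8},
    {"region": "South", "year": 2020, "clinics": 6},
]

def total_by(rows, key, value):
    """Sum the `value` field for each distinct value of the `key` field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

# The same dataset grouped two different ways.
by_region = total_by(records, "region", "clinics")
by_year = total_by(records, "year", "clinics")
```

Each grouping offers a different view of the same dataset: one compares places, the other shows change over time.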
The data wrangler will feed the initial findings back to the rest of the data unit. The data wrangler will need the storyteller’s eye and investigative integrity to lead the next leg of analysis: mining for stories.
Based on the findings that the data wrangler presents, the storyteller will, both alone and in collaboration with the data wrangler, develop a selection of critical or core questions with which to interview the data. The data wrangler will use these questions as a guide to process the data and dig for answers. These answers, presented in the form of data points, will act as a springboard from which the storyteller begins shaping the data story.
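To illustrate what interviewing the data might look like, the sketch below turns one hypothetical question ("which region changed the most between two years?") into a small computation. The records, the field names and the helper `change_between` are all assumptions made for the example.

```python
# Made-up, already-cleaned records (assumption).
records = [
    {"region": "North", "year": 2019, "clinics": 12},
    {"region": "North", "year": 2020, "clinics": 15},
    {"region": "South", "year": 2019, "clinics": 8},
    {"region": "South", "year": 2020, "clinics": 6},
]

def change_between(rows, key, value, start_year, end_year):
    """Return {group: end_value - start_value} for groups present in both years."""
    start = {r[key]: r[value] for r in rows if r["year"] == start_year}
    end = {r[key]: r[value] for r in rows if r["year"] == end_year}
    return {group: end[group] - start[group] for group in start if group in end}

changes = change_between(records, "region", "clinics", 2019, 2020)

# The group with the largest absolute change is a lead, not yet a story.
largest = max(changes, key=lambda group: abs(changes[group]))
```

The resulting data point is a lead for the storyteller to report on, in the same way as any other story lead.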
Now that the data has been mined for storytelling, and various insights have been lifted from the dataset(s), the storyteller can begin their job of bringing the story to life.
In order to communicate and build relevance around the information contained within a dataset, the storyteller must first reach out to the key stakeholders around whom the story centres. It is the storyteller’s role to discover whether the data holds true to the real world and understand the ways in which the data, and its interpretation, will benefit or enrich those for whom the story would have the highest impact. Outliers, peaks, trends and variances found in the data need to be investigated and reported in the same way as any other story leads.
The storyteller will report on the data and ask relevant stakeholders questions surrounding the insights garnered from the data. After this, the storyteller will take this newly acquired information, return to the data and assess the information in the context of the dataset and vice versa. This process of alignment is one of the critical stages in beginning to shape the data story.
At this point you would bring the data wrangler and project manager back into the fold and begin to further develop and carve a story out of the resources, reporting, and data insights acquired up until this point.
Lastly, the developer/designer will be brought in, updated on the direction of the story, and asked to begin brainstorming possibilities for the visualisation or presentation of the data and of the story as a whole.
When we talk about packaging and presenting, we are referring to the final stage in the processing of a dataset. Packaging and presenting the data story includes the story map, which is the blueprint for how the different elements should be structured in order to convey the final message. These elements could include different combinations of written content and data visualisations, as well as any graphics, photographs or other multimedia elements.
The storyteller and the developer/designer collaborate on this process, overseen by the project manager. Initially, the storyteller and developer/designer plan the presentation of the story. Together they define the layout and story map: which elements will be positioned at which points along the story, how the various elements will interact with one another, whether the representations of the data throughout the story work cohesively to convey the insights guiding the story, and how these elements and data points may enrich or distract audience members as they follow the story. The outcome of this session is fed back to the project manager and, once it is approved, the developer/designer proceeds to implement the story map.
In terms of visualising the data, the developer/designer will consult the data wrangler during this phase in order to get access to the relevant data and discuss and share ideas around the ways in which the data is best presented. The storyteller will review the work in terms of its relation to the written content, while the project manager will continue to oversee the entire process, making sure ethical guidelines and publication standards are upheld.
The developer/designer in this case is responsible for packaging the final product to suit online and offline publication.
This curriculum has been developed by OpenUp.
You are free to use, share, and adapt this content to your needs. Do you want to teach others? Let us know how we can help.