OpenUp | The Data Unit

Welcome to the data unit

The skills needed for responsible data storytelling span data processing, journalism, coding, and design.

The ideal team for data storytelling, known as the data unit, is made up of professionals such as data wranglers, storytellers, developers, editors and designers.

Each member of the data unit possesses a different skillset. The data unit acts as a team, with individual strengths contributing to the overall output. The data unit is largely without hierarchy: every member’s contribution is as important and necessary as any other member’s in completing the objectives set out by each stage of the data pipeline.

Thus members must work together and collaborate in the decision-making processes in order to progress the storytelling.

“There’s a digital revolution taking place both in and out of government in favor of open-sourced data, innovation, and collaboration.”

- Kathleen Sebelius, Former United States Secretary of Health and Human Services

The structure of the data unit

An organisation or team may face many challenges, such as a shortage of available or permanent resources or capacity. Realistically, not every company, organisation, or individual will be able to establish a full-scale data unit.

Yet there are many ways in which the data unit can be materialised in order to suit the resources available to the company, organisation, or individual. To help with this, there are two perspectives from which you can view the data unit. The perspective you choose is dependent on what best supports your current circumstance.

Two perspectives for approaching the data unit

A unit made up of a team of experts such as the project manager, the data wrangler, the storyteller, the developer and/or the designer. This perspective is followed when an organisation, or collective of individuals, is able to support the costs and operation of a full-scale data unit.
A collection of competencies or skills that make available the resources necessary to wrangle data, develop and design data visualisations, write responsible and authentic data stories, and manage data projects. This perspective is followed when the availability of funds, resources, and/or capacity is lacking.

Data storytelling is made possible through investing in networks, connections and associations, leveraging resources and tapping into the existing online community where you can always find a way to simulate or develop a particular skill or competency, or find someone who can help out with a task you may not be able to cover in-house.

It is important to note that the size of the data unit does not prevent an individual, company or organisation from successfully telling stories with data.

Two types of data units

Jack of all trades

The jack of all trades is a lone ranger. From data sourcing to cleaning, analysis and visualisation, this one-person data unit completes every task along the data pipeline alone. The Open Movement has a lot to do with the jack of all trades’ ability to do so.

The international drive towards ‘openness’ has not only made the sourcing of data much easier, but has also been the driving force behind the development of many of the tools that make the cleaning, processing and visualisation of data possible - without the need to be an expert in any single part of the process.

Below are some examples of the ways in which data liberation has made it easier for the jack of all trades to work alone:

Sourcing	Open Government and Open Data initiatives
Cleaning	OpenRefine, Google Sheets and more.
Analysis	Google Sheets, LibreOffice, Tableau Public and more.
Visualisation	Datawrapper, Charted, RAWGraphs, Tableau Public, Carto (for map-based visualisation), Google DataStudio, KnightLab Storytelling Tools, Piktochart, Infogr.am and more.

The large-scale data unit

The large-scale data unit includes all the roles required for storytelling with data. Large-scale teams often have a long-term vision that includes opportunity for long form investigative stories, shorter projects as well as experimentation of methods and techniques.

Thus this data unit is not solely focused on data storytelling but also dedicated to progressing or advancing the field of data storytelling, data liberation and open-source.

You can see examples of large-scale data units at The Guardian, La Nación, the Australian Broadcasting Corporation (ABC), and the BBC.

Summary of roles and their competencies

The Data Unit is best understood by the different roles that need to be fulfilled. The typical roles that make up a data unit are:

The Project Manager
The Data Wrangler
The Storyteller
The Developer (and/or The Designer)

But what do they actually do? Let’s look at each role in light of their tasks and competencies.

The Project Manager

The project manager oversees the management of the complete process from ideation to publication and distribution.

Tasks at a glance

The project manager helps in setting the agenda and making editorial decisions, for example on the scale of the storytelling.
The project manager ultimately understands the data unit and the varying workflows of individual members of the team.
The project manager works with individual team members to determine priorities and deadlines for deliverables.
The project manager unblocks individual team members where possible, or finds solutions, resources, or experts that can.
The project manager keeps members informed about any decisions, changes, or updates concerning the project.

Core competencies

General

Experience managing the collaboration of multidisciplinary teams
Coordinating workflow and managing individual resources within the data unit
Establishing goals and expectations while ensuring standards are maintained
Possesses strong negotiation skills
Recruit and manage additional or external resources and relationships
Cooperate and liaise with designers, developers, journalists/storytellers, data wranglers, department heads or other departments and/or stakeholders
Provide high level support to all members of the Data Unit, and the Data Unit as a whole
Pre-empting and managing conflicts of interest
Meet deadlines and budget requirements

Editorial

Develop story or content ideas that consider the reader and audience appeal
Checks content for accuracy and errors and has proven working experience as an editor in terms of proofreading, editing and improving stories or other pieces
Verify facts, dates, and statistics, using standard reference sources
Experience in making judgements on matters of greater editorial sensitivity is essential
Oversee publication to ensure that laws and ethical guidelines, such as constitutional laws and rights to privacy, are abided by
Oversees the layout (artwork, design, visualisations)
Comply with organisational regulations and ethical guidelines

The Data Wrangler

Data wrangling is the most impactful step in the data pipeline.

This is because every other role within the unit is, to a significant degree, dependent on the output of the data wrangler in order to move ahead.

Tasks at a glance

Data wranglers are responsible for sourcing the data, understanding the data, cleaning the data, and conducting analysis on the data.
Data wranglers conduct initial and exploratory analysis on the data that helps to determine its contents and limitations.
Data wranglers guide other members in the interpretation and the presentation of the data.
Data wranglers are responsible for distributing data samples (and insights) to other members of the team, thus allowing them to progress in their respective tasks.
The data wrangler must also consider different ways in which the data could be shaped, and how these different categorisations can find different ‘stories’ in the data.
The data wrangler, in conjunction with the journalist, should probe the data from a journalistic perspective, by asking investigative questions, to identify the ways in which the data can be connected to something of human interest.

Core competencies

General

Strong problem solving and troubleshooting skills with experience exercising mature judgment
Has a curious mind and loves playing with data
Experience working independently with minimal guidance
Strong communication skills and the ability to work in collaboration with storytellers/journalists, designers, developers, editors, and other data wranglers/analysts
Comfortable with a fast paced, quick turnaround work environment
Develop strategies, standards and best practices in the areas of data visualisation, data access and data analysis
Verify facts, dates, and statistics, using standard reference sources

Technical

Experience cleaning data and loading the transformed data into a structure best suited to the project objective and analytic approach
Is able to conduct expert analysis on data
Knowledge of and experience using a scripting language such as Python
Experience writing data scrapers
Experience in data mining, databases and SQL, statistical packages and programming languages such as R, Stata and pandas or scikit-learn in Python
Knowledge of JavaScript, D3.js and GeoJSON
Basic data visualisation skills
Experience working with online mapping tools such as CartoDB as well as QGIS
A basic understanding of SVG (Scalable Vector Graphics)
Spreadsheet fluency with a solid working knowledge of Excel or equivalent
Experience working with Open Source technologies

The Storyteller

The storyteller conducts the journalism on the numbers. The role of the storyteller is not only to dig around in the data to find the stories, but to go out into the world to see if those stories hold true i.e. conduct reporting on the data.

Remember the numbers are not the story, just the source of the story. The storyteller will still have to go out, report on it, and weave together a story that best captures the message.

Tasks at a glance

The storyteller has to conduct adequate research surrounding the topic that governs the dataset in order to understand the value it holds, the limitations that apply, and to formulate relevant questions that the data will try to answer.
Drawing on their shared pool of knowledge (the data wrangler’s experience of working with the dataset(s), the storyteller’s research and pre-existing insight into the topic, and the overarching objective of the project), the data wrangler and the storyteller will sit in discussion and formulate a single core question, or a series of core questions, that will define the story angles.
The storyteller must interview the data and first seek to answer six basic journalistic inquiries: who, what, when, where, why, and how. This will help the storyteller build context and position the data within their knowledge regarding the topic at hand.
The storyteller is responsible for developing the story map. In this session, the storyteller, the project manager and the developer/designer work hand-in-hand with defining the presentation, or packaging, and delivery of the story.
The storyteller prepares all written content for the project, while upholding ethics and standards for reporting on, and writing about certain topics where relevant.

Core competencies

General

Strong communication skills and the ability to work in collaboration with other storytellers/journalists, designers, developers, editors, and data wranglers
A willingness to build strong functional/productive relationships with other resources
Comfortable with a fast paced, quick turnaround work environment
Demonstrate excellence in research and interpreting data, excellent written and spoken communication, numeracy and literacy skills, creativity, journalistic judgement, confidence and good diplomacy skills
Journalistic experience is desirable
Experience reporting and conducting interviews

Technical

Strong data analysis skills and the ability to dig out big-picture stories in empirical research
Able to identify the primary points of interest in a complex dataset, and convey them clearly and concisely for a general audience
Ability to see how socio-economic trends affect consumers in their daily lives
Experience working with Open Source tools and technologies such as Google, Tableau, etc. is desirable
Experience in, or exposure to, working in data-mining/scraping; databases and SQL; statistical packages and programming languages such as R, Stata and pandas or scikit-learn in Python; and visualisation tools like D3.js and GeoJSON
Sometimes, data storytellers will be expected to be able to create their own graphics, in order to allow specialised designer/developers to concentrate on more ambitious visualisations

The Developer (and/or Designer)

Data storytelling, often digitally-driven, presents the need for the inclusion of a developer in the unit who assists with the technical elements of the project. These technical elements can take on the form of data visualisations, storytelling templates, data-driven tools and web applications.

Ideally, the developer in this case should also possess a basic understanding and basic working proficiency in visual communication and design.

Tasks at a glance

The developer is responsible for producing data visualisations, cartography, and infographics while ensuring the best representation of data in all formats (digital and print).
The developer is responsible for building a framework that supports all elements of the story.
The developer must ensure that the highest (and most up to date) standards are upheld in the process of developing online resources.
The developer is responsible for resolving issues related to publishing, embedding, and distributing these graphics.

Core competencies

General

Strong communication skills and the ability to explain complex technical concepts to journalists, designers, developers, editors and others, including a willingness to build strong relationships with other departments
The ability to take responsibility and go into challenging situations with few guardrails with the view to improving processes
Experience building products in multidisciplinary teams
Good at seeking advice from, and consulting the team, in order to make the best decisions

Technical

Story mapping: this includes defining the presentation (or packaging) of the story.
Produce data visualisations, cartography and infographics and ensure the best representation of data in all formats (print, app and web)
Exposure to working with data (large/complex datasets), and sound basic data literacy skills
Visual design principles
Practice storytelling and design
Responsible for creating user-friendly visualisations
Work with and manage different multimedia elements
The ability to pick up new programming languages with ease
Excellent front-end web development skills
Languages: HTML5, CSS/SCSS, JavaScript, Python, Ruby
Good working knowledge of semantic markup patterns
Good understanding of modern techniques such as Responsive Web Design (RWD) and graceful degradation for older browsers
Excellent JavaScript programming skills, with knowledge of developing large, modular applications
Experience with Modern JS libraries (Angular/React/Polymer) and Version control systems (Git, Subversion)
Experience working with open source tools

Workflows and collaboration

Collaboration is defined as the action of working with someone to produce something, and the success of the data unit depends on each member’s ability to work collaboratively.

But what does it mean to collaborate? And how far does this collaboration extend into the workflow of each of the individual roles making up the data unit?

In order to collaborate effectively as a member of the data unit, or merely to context-switch between skills when working alone, there are a few things you need to consider before getting started:

Understanding

It is critical that each member of the data unit understands something of the function of the others. Without this understanding, as a member of the data unit, you will be unaware of what each member needs from you and when they need it, or when you will receive what you need from the others.

Workflow

Different technical skills follow different processes and need to uphold different standards in the fulfilment of their tasks. The workflow for each member in the completion of their objective is different and it may be some time before they are able to deliver the output that another member requires in order to move forward.

Collaboration

Almost all decision-making during the course of the project is done collaboratively. The expertise of every member in collaboration with other members is needed to progress and shape the project. Often, the work of one member is prepared and completed to suit the needs and requirements of the member (or competency) to follow. And this informs the fundamental mode for collaboration.

The unit in action: mapping collaboration along the data storytelling pipeline

What does collaboration look like and what process does it follow?

It begins with the team

The project manager is assigned to a project; in this instance, the project is a data-driven story. The project manager then assembles a team made up of a data wrangler, a storyteller, a developer/designer, and of course, the project manager.

The project is introduced to the team. Drawing on their collective expertise, the team members shape the agenda of the project and determine a timeline based on an estimate of the expected tasks and knowledge of individual workflows, bearing in mind the scale of the project itself.

Once the agenda has been set, the project manager is responsible for allocating resources and capacity (human and other), as well as acquiring that which is not available internally, in order to see the fulfillment of the project at hand.

What are the challenges or obstacles that can be encountered during this stage?

It is important to know that the generic roles (and their respective competencies) mentioned above do not necessarily add up to cover every project's needs. Every storyteller will have different content expertise, sources, and working experience. The same is true of every developer, data wrangler, designer and project manager. At some point during the project, you will most likely have to pull in additional resources, query content experts, or scale down production to suit the budget and the available resources and capacity.

Data stories typically begin in one of two ways: either the project is kicked off by the discovery of some interesting data, or a question (story angle) is identified which could potentially be answered by a database. Whichever way the project begins, it’s important not to be weighed down by these exploratory questions, but to be guided by them. In other words, the initial story angle may change as the data is explored and processed, or as other sources of data are discovered. However, bearing the initial question in mind throughout the process will ensure that the team doesn’t get lost on unnecessary tangents.

Find and Prepare

At times, you will kick off a data story already in possession of a dataset. When this is not the case, data relevant to the agenda set by the project manager in collaboration with the team will have to be sourced.

The act of sourcing data includes both finding the data and getting the data. In the data unit, this is often done by the storyteller, the data wrangler, or both, depending on the competencies that each of these individuals possesses.

At times the storyteller will have nurtured the correct relationships and contacts to source the relevant data directly through domain experts, or data custodians, or be familiar with various data portals and other online sources of information. Whereas the data wrangler may be more skilled in actually getting the data, by means of download and scraping, or pulling it in from an API, while making sure that it is in a machine-readable format that can be cleaned and analysed.
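
As a rough illustration of what "machine-readable" means in practice, the sketch below parses a small CSV extract into a list of records using only Python's standard library. The sample data, the column names, and the `load_records` helper are hypothetical; the text stands in for a file that would normally be downloaded, scraped, or pulled from an API.

```python
import csv
import io

# Hypothetical CSV extract; in practice this text would come from a
# downloaded file, a scraper, or an API response.
SAMPLE_CSV = """region,year,clinics
North,2016,120
South,2016,215
East,2016,98
"""


def load_records(csv_text):
    """Parse CSV text into a list of dicts, one dict per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


records = load_records(SAMPLE_CSV)
for record in records:
    print(record["region"], record["clinics"])
```

Note that every value comes back as a string at this point; converting numbers and dates to proper types belongs to the cleaning stage.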

The storyteller may be more skilled in the process of verification when it comes to content-related aspects, while the data wrangler may prove more efficient in verifying the dataset itself, uncovering the source of the data, and authoring the data profile.

Once these two members of the data unit are satisfied with the data they have, the data wrangler can proceed to clean the data and structure it for analysis.

What are the challenges or obstacles that can be encountered during this stage?

Allowing sufficient time for this stage can be difficult.
You may be unable to source the exact data you were hoping to get.
The data cannot easily be transformed into a machine-readable format.
The source of the data is either unknown or unreachable.
The terms of usage are uncertain.
Access to data that has been sourced via PAIA (Promotion of Access to Information Act) could experience delays and/or rejection.
If you are unsure of where or how to source data online, enquire about our Source and Clean short course!

Cleaning your data

This next stage is performed by the data wrangler. Data can only be mined for stories once it has been cleaned and structured for analysis. Cleaning the data involves getting rid of data errors, false positives, removing or redefining empty cells and duplicates, correcting inconsistencies in spelling or the presentation of the data, applying the correct formatting to different data types, spotting and understanding the outliers, and much more.

Any one act of data cleaning involves two processes: (1) the detection and (2) the correction of a data entry (or a group of data entries). Depending on the size and quality of a dataset, data cleaning can often be the most time-consuming task for the data wrangler to perform. Only with clean and correctly structured data is the data wrangler able to proceed with the initial exploratory analysis and interpretation of the data.

Note, the data wrangler will also structure the data bearing in mind the objectives of the project, as established by the project team.
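
The two processes of detection and correction can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a prescribed method: the sample rows, the field names, and the `detect_issues` and `correct_rows` helpers are all hypothetical, and real datasets will need checks specific to their content.

```python
# Hypothetical raw records with typical problems: stray whitespace,
# inconsistent capitalisation, an empty cell, and a duplicate row.
RAW_ROWS = [
    {"region": " North ", "clinics": "120"},
    {"region": "north", "clinics": "120"},
    {"region": "South", "clinics": ""},
    {"region": "East", "clinics": "98"},
]


def detect_issues(rows):
    """Detection: list (row_index, description) pairs without changing the data."""
    issues = []
    seen = set()
    for index, row in enumerate(rows):
        region = row["region"]
        if region != region.strip():
            issues.append((index, "stray whitespace in region"))
        if row["clinics"].strip() == "":
            issues.append((index, "empty clinics cell"))
        # Compare rows on normalised values so near-duplicates are caught.
        key = (region.strip().lower(), row["clinics"].strip())
        if key in seen:
            issues.append((index, "duplicate row"))
        seen.add(key)
    return issues


def correct_rows(rows):
    """Correction: normalise text, convert numbers, mark empty cells, drop duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        region = row["region"].strip().title()
        raw_count = row["clinics"].strip()
        clinics = int(raw_count) if raw_count else None  # None marks a missing value
        key = (region, clinics)
        if key in seen:
            continue  # skip rows already kept
        seen.add(key)
        cleaned.append({"region": region, "clinics": clinics})
    return cleaned


issues = detect_issues(RAW_ROWS)
cleaned_rows = correct_rows(RAW_ROWS)
```

Keeping detection separate from correction means the list of issues can be reviewed, or checked with the data source, before anything in the dataset is changed.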

What are the challenges or obstacles that can be encountered during this stage?

Lack of metadata can hamper cleaning efforts and decoding processes.
Certain aspects of verification are often dependent on getting in touch with the source of the data, or content-experts, and this may take a little time, if you are in fact successful in doing so.
Sometimes during the cleaning phase the data is found to be of too poor quality to be used, and you will have to return to the sourcing stage of the data pipeline.
Actual cleaning of data can be extremely time-consuming; ensure enough time is allocated to this task.
If you are unsure of where or how to clean data, enquire about our Source and Clean short course!

Analysis

Analysis happens in two stages and involves both the data wrangler and the storyteller.

Firstly, with the data cleaned, the data wrangler is able to execute the initial exploratory analysis and interpretation of the data. This process seeks to uncover the type of information contained within the dataset, while surveying the opportunities available for storytelling. Additionally, the data wrangler also considers different ways in which the data can be grouped, and what these groupings say about the dataset or topic as a whole.
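
To make the idea of regrouping concrete, here is a small sketch that summarises the same records in two different ways using Python's standard library. The records, the field names, and the `summarise_by` helper are hypothetical and only illustrate how different groupings surface different angles.

```python
from collections import defaultdict

# Hypothetical cleaned records used only for illustration.
CLEAN_ROWS = [
    {"region": "North", "facility": "clinic", "patients": 1200},
    {"region": "North", "facility": "hospital", "patients": 5400},
    {"region": "South", "facility": "clinic", "patients": 800},
    {"region": "South", "facility": "clinic", "patients": 950},
]


def summarise_by(rows, group_field, value_field):
    """Group rows by one field; return the row count and value total per group."""
    summary = defaultdict(lambda: {"count": 0, "total": 0})
    for row in rows:
        group = summary[row[group_field]]
        group["count"] += 1
        group["total"] += row[value_field]
    return dict(summary)


# The same data grouped two ways can point to two different stories.
by_region = summarise_by(CLEAN_ROWS, "region", "patients")
by_facility = summarise_by(CLEAN_ROWS, "facility", "patients")
```

Grouping by region here compares places, while grouping by facility type compares kinds of service; which one matters depends on the questions the team has agreed on.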

The data wrangler will feed the initial findings back to the rest of the data unit. The data wrangler will need the storyteller’s eye and investigative integrity to lead the next leg of analysis: mining for stories.

Based on the findings that are presented by the data wrangler to the storyteller, the storyteller will, both alone and in collaboration with the data wrangler, develop a selection of critical or core questions with which to interview the data. The data wrangler will use this as a guide to process the data and dig for answers. These answers, which will be presented in the form of data points, will act as a springboard from which the storyteller will begin shaping the data story.

What are the challenges or obstacles that can be encountered during this stage?

Data around different topics has different limitations for processing and interpreting. A data wrangler, or storyteller, will not always be familiar with the requirements of the topic at hand, and will need to research it and/or contact a content-expert and/or the original data source, as in the case of verification.
Opportunities or requirements to enrich the data with additional data sources at this stage could result in a return to the data sourcing stage. This could in turn require more data cleaning, and so on, in an iterative manner. Note that the collaboration and support of the team at this stage, when production is beginning to feel imminent, will help to negate impatience and potential negativity within the team culture.
If you would like to learn how to analyse data for storytelling, enquire about our Analyse for Storytelling short course!

Storytelling

Now that the data has been mined for storytelling, and various insights have been lifted from the dataset(s), the storyteller can begin their job of bringing the story to life.

In order to communicate and build relevance around the information contained within a dataset, the storyteller must first reach out to the key stakeholders around whom the story centres. It is the storyteller’s role to discover whether the data holds true to the real world and understand the ways in which the data, and its interpretation, will benefit or enrich those for whom the story would have the highest impact. Outliers, peaks, trends and variances found in the data need to be investigated and reported in the same way as any other story leads.

The storyteller will report on the data and ask relevant stakeholders questions surrounding the insights garnered from the data. After this, the storyteller will take this newly acquired information, return to the data and assess the information in the context of the dataset and vice versa. This process of alignment is one of the critical stages in beginning to shape the data story.

At this point you would bring the data wrangler and project manager back into the fold and begin to further develop and carve a story out of the resources, reporting, and data insights acquired up until this point.

Lastly, the developer/designer will be brought in and updated regarding the direction of the story and to begin brainstorming possibilities for the visualisation or presentation of the data, and the story as a whole.

What are the challenges or obstacles that can be encountered during this stage?

The data is found to only capture a small portion of a much larger problem, or doesn’t actually reveal the insights required by your story, leading you back to the drawing board.
When aligning reporting with data insights, you may find that the story you had in mind is either (i) irrelevant, or (ii) you need more data to answer new leads uncovered while reporting.
Key stakeholders are unreachable, uncooperative, or unsure of how to answer questions about findings in the data.
If you would like to learn how to tell stories using data, enquire about our Analyse for Storytelling short course!

Package and Present

When we talk about package and present, we are referring to the final stage in the processing of a dataset. Packaging and presenting the data story includes the story map, which is the blueprint for how the different elements should be structured in order to convey the final message. These different elements could include different combinations of written content, data visualisations, as well as any graphics, photographs or other multimedia elements.

The storyteller and the developer/designer collaborate on this process, overseen by the project manager. Initially, the storyteller and developer/designer will plan the presentation of the story. Together they will define the layout and story map, meaning they will determine what elements will be positioned at which points along the story, how the various elements will interact with one another, whether the representation of the data in its various forms throughout the story is cohesive in its efforts to convey the data insights guiding the story, and assess the ways in which these elements, and data points, may enrich or distract audience members in their following of the story. The outcome of this session will be fed back to the project manager, and once approved, the developer/designer will proceed to implement the story map.

In terms of visualising the data, the developer/designer will consult the data wrangler during this phase in order to get access to the relevant data and discuss and share ideas around the ways in which the data is best presented. The storyteller will review the work in terms of its relation to the written content, while the project manager will continue to oversee the entire process, making sure ethical guidelines and publication standards are upheld.

The developer/designer in this case is responsible for packaging the final product to suit online and offline publication.

What are the challenges or obstacles that can be encountered during this stage?

There can often be lots of trial and error when developing custom visualisations, including preparing the data in the form best suited to the chosen visualisation. This can result in some back and forth exchanges between the developer/designer and the data wrangler.
Custom visualisations can take longer to produce than the time estimated.
Ensure sufficient time has been allowed for user-testing to ensure that the message is being communicated effectively.
A developer may encounter obstacles related to server security and brand identity when preparing and publishing data stories and visualisations in different publications in both online and print conditions.
If you would like to learn how to visualise and communicate your data-driven stories, enquire about our Package and Present for data storytelling short course!

Credit

This curriculum has been developed by OpenUp.

