MERGE Presentation: Brittany Wagenaar and Robyn Steenekamp on Why Everyone’s a Data Scientist

28 January 2020, by Candice Grobler

As they scale heavily, the Luno team increasingly relies on data to inform and validate their decision-making. Every team member needs to be equipped to autonomously evaluate the impact of their choices.

In this talk, Brittany Wagenaar and Robyn Steenekamp share what they’ve learned about making data accessible, understandable and easy to use for everyone in the company.

Presentation transcript

[00:10] Brittany Wagenaar: So, let's start at the beginning. What is a data scientist? Well, technically speaking a data scientist is someone that can reason, read and interpret data and make predictions about the future. We believe that everyone is a data scientist, but what's quite surprising is how many people don't. In the build-up to this talk, we were met with many responses, but a common one was, "Maybe everyone's a data scientist, but I'm certainly not." It's a common misconception people have that they're not good at maths and stats. Every moment of every day people are ingesting huge quantities of data that they store, process and act on.

[01:01] Consider any time you've needed to cross the street. Chances are good that when you got to the edge of the road you looked left and right and checked for cars. If there were cars, you checked to see how fast they were going and whether at any point they turned off the road. What you maybe didn't realise was that your brain was building an extremely sophisticated model to tell you whether it was safe to cross the street. Considering that you're all sitting here today, it's safe to say that your models worked out pretty well. People are always interpreting the data that they ingest in every moment; a data scientist just gets paid to do it.

[01:43] So now that we understand that everyone has this analytic capability, how do we empower people to use it? Well, beyond the obvious "I can't do maths" mental barrier, there's also a structural barrier that needs to be overcome. This is where organisations can help by fostering a data-driven culture. Being data-driven means empowering people to make decisions using hard data evidence, rather than intuition or observation alone. It encompasses the idea of having data science citizens, where people can merge data evidence with their personal domain context to enhance their decision-making ability.

[02:23] This concept is most easily illustrated with a story. So picture this: a boardroom of executives of a popular fast food chain. They've noticed that sales have been going down recently and they need to determine why and how to fix it. The first executive raises his hand and suggests that the company should get rid of all of their chicken products, because he personally doesn't like chicken and no one he knows likes chicken either. The second executive raises her hand. She says that she's looked at the data and there's been a significant increase in the number of ice creams sold over the past few months. She suggests that the company start focusing on selling ice creams. Finally, the third executive raises her hand. She says that she's also seen an increase in the sale of ice creams, but attributes that increase to a recent promotion the company has been running over the same time period. The third executive further suggests that the company's customers might be rewards-driven and that cleverly using a promotion might increase their sales.

[03:39] In this story, the first executive was trying to make a decision using only his personal experience. Although intuition has a part to play, not having any data evidence to back his claim means that the executive had no surety that he was making the right decision. The second executive, although she tried to use data to make a decision, lacked any context or explanation as to why she saw the patterns in the data that she did. Being data-driven is exactly how the third executive approached the problem: find underlying patterns in your data and then overlay context to explain those patterns. If you make sure that you're using data when you make a decision, you have at least some level of surety that you're making the right one.

[04:28] So now we understand what it means to be data-driven. The next question we should answer is: why is this important? There are two main benefits to being data-driven. Firstly, it guards your company against bias and secondly, it ensures the accuracy and reliability of your data. A lot of the mental work that we do is subconscious, which can make it very difficult for us to verify the logic we use when we make a decision. Data has no biases and can act as an unbiased validator of our decisions. So let's look at a simple incentivised referral program.

[05:07] If you saw an increase in your referrals, you would probably think that the program was doing great, right? But what the data can show, and has shown in the past, is that there are always going to be people that abuse your policies. So what you think is an increase in referrals could actually just be an increase in fraudulent sign-ups, where fraudsters send incentives to themselves. Using data can help you do an unbiased behavioural analysis of all of your customers, fraudsters included.

[05:45] The second benefit to being data-driven is that it gives you accurate and reliable data. If you're being data-driven, the idea is that teams should interact with, integrate and really explore your data, which opens the door to very helpful and insightful feedback. If you have people always working with your data, you essentially have real-time credibility checks, always judging the state of your data to ensure that it's honest and reliable. For example, at Luno, all of our teams have access to the data, which they can integrate and explore as they wish. This makes it very easy for our teams to spot any misalignments between what our data reflects and what's actually happening. This helps ensure the integrity of our data.

[06:34] So now we've answered what being data-driven means and why it's important. The next question we should answer is: how does this affect your team? A data-driven organisation should be made up of data science citizens. Let me explain. In your organisation, you'll have many individuals: some will know about operations, some will know about security, some might be able to spot a fraudster from a mile away. These individuals are hugely advantageous to your organisation because they hold a wealth of knowledge and context in their respective areas. For them to be able to do their jobs most efficiently, they need to be able to both ask and answer questions when they arise. These individuals are essentially your data science citizens with a wealth of context. They need to be able to easily and accurately answer questions when they see them, and they can do this best if they're in an organisation that has a data-driven culture. A data-driven culture would provide these data science citizens with data access, data literacy and data fluency.

[07:55] So, data access. Earlier we mentioned that when we give access to teams, we help reduce bias in our company, but we also ensure that the decisions that people are making are impactful. At Luno, we have a central data platform where anyone can access and explore data, and we also have dashboards situated around the office that visualise data in a way that drives good data-driven questions. Data science citizens also need data literacy. Simply having access to data is not much use if people don't understand what it means. If you have any metrics, it's crucial that people know what they are.

[08:36] For example, if you wanted to look at the number of users that opened your app today on Google Analytics, you might want to look at the metric "number of users", but what does that mean? Is it the number of unique users that opened your app today or is it the count of the total number of times someone opened your app? Distinguishing between these meanings is crucial in understanding your data. Finally, data science citizens need some level of data fluency. This encapsulates the idea that people need to be able to read, reason and argue with data in order to make good business decisions.
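
The two readings of a "number of users" metric can be made concrete with a minimal Python sketch. The event list, user names and function names below are hypothetical, made up for illustration; they are not Luno's data or a Google Analytics API.

```python
from datetime import date

# Hypothetical app-open events as (user_id, day) pairs; illustrative data only.
events = [
    ("alice", date(2020, 1, 28)),
    ("alice", date(2020, 1, 28)),
    ("bob", date(2020, 1, 28)),
    ("alice", date(2020, 1, 27)),
]


def total_opens(events, day):
    """Count every app open on the given day, repeat opens included."""
    return sum(1 for _, d in events if d == day)


def unique_users(events, day):
    """Count distinct users who opened the app at least once on the given day."""
    return len({user for user, d in events if d == day})


today = date(2020, 1, 28)
print(total_opens(events, today))   # 3
print(unique_users(events, today))  # 2
```

The same raw events give two different answers, which is why a metric needs an agreed definition before anyone reads it off a dashboard.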

[09:24] Robyn Steenekamp: Thank you, Brittany. At Luno we foster a data-driven culture by ensuring that data is accessible, easy to understand, and easy to use. So first I'm going to speak about accessibility. In recent years, companies have focused extensively on collecting and storing as much data as possible. The sheer volume of recent sources of data creates significant barriers to accessibility and usability. According to a recent study, less than half a percent of company data is ever analysed and far less is used to inform business decisions. It's clear that companies are overwhelmed by the data they collect, so how do we make all this data accessible at Luno?

[10:07] First we ensured that our data is joinable. To create large context rich data sets, our data is in a form where it can be easily joined to other company data when necessary. Breaking down data silos allows us to join data from disparate sources, giving us a holistic view of the data stored. As an example, we use an external ticketing platform to manage customer queries. This platform has its own analytics capabilities, but what we really wanted to do was to be able to overlay our customer and transactional data over the ticket data to get a unified view of what was happening. Once we were able to achieve this, we could conduct a much richer analysis into customer service operations and customer satisfaction.
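
As a rough illustration of what "joinable" means here, the sketch below overlays customer data on ticket data through a shared customer ID. All record layouts, field names and values are assumptions made up for the example; the talk does not describe Luno's actual schema or ticketing platform.

```python
# Hypothetical customer records keyed by customer ID (field names are assumptions).
customers = {
    "c1": {"country": "ZA", "transactions": 12},
    "c2": {"country": "NG", "transactions": 0},
}

# Hypothetical support tickets exported from an external ticketing platform.
tickets = [
    {"ticket_id": 101, "customer_id": "c1", "topic": "withdrawal"},
    {"ticket_id": 102, "customer_id": "c2", "topic": "verification"},
    {"ticket_id": 103, "customer_id": "c9", "topic": "login"},
]


def join_tickets(tickets, customers):
    """Left join: keep every ticket and add customer fields when the ID matches."""
    joined = []
    for ticket in tickets:
        customer = customers.get(ticket["customer_id"], {})
        joined.append({**ticket, **customer})
    return joined


rows = join_tickets(tickets, customers)
for row in rows:
    print(row)
```

A left join is used so that tickets from unknown customers (here `c9`) are kept rather than silently dropped, which matters when the goal is a complete view of support activity.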

[10:54] Secondly, we ensured that our data is sharable. Anyone at Luno is able to explore, utilise, and visualise the data that's most relevant to them. We use a business intelligence tool, Looker, to put actionable data in the hands of people who need it most. Looker allows users to automate schedules for data delivery, giving us greater control over how information is circulated throughout the company. Of course, we need to do this without compromising compliance or increasing risk, so we encrypt or remove any personally identifiable information from our data sets so that our customers remain anonymous.

[11:33] Finally, we make sure that our data is queryable. There are appropriate tools to filter, group and summarise the raw data into subsets that can generate insights. Once again, we use Looker, but there are many platforms out there that offer similar functionality. So now data is accessible, but this is not enough. Access to data alone will not ensure that people use it. Investments in big data integration, machine learning and data science solutions are of little value if the data output doesn't reach its intended audience.

[12:08] So how do we ensure that decision makers use this data to benefit the company? At Luno, we make data understandable. A key challenge facing many companies is the lack of a common data vocabulary. Inconsistent and ambiguous metrics undermine operational efficiency and competitive advantage. For example, what do we mean when we talk about engaged users? Are these customers that have transacted on the Luno platform or do we consider a user engaged when they open the app? If so, do we consider a seven-day rolling window or a 30-day rolling window? Having well-governed data standards will minimise potential misunderstandings and foster better collaboration between business units.
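
To show how much the choice of window changes an "engaged users" figure, here is a minimal Python sketch. The activity data, the `engaged_users` function and the inclusive-window convention are all assumptions for illustration, not Luno's actual metric definition.

```python
from datetime import date, timedelta

# Hypothetical activity log as (user_id, day) pairs; illustrative data only.
activity = [
    ("alice", date(2020, 1, 27)),
    ("bob", date(2020, 1, 10)),
    ("carol", date(2019, 12, 1)),
]


def engaged_users(activity, as_of, window_days):
    """Users with at least one event in the window_days ending on as_of (inclusive)."""
    start = as_of - timedelta(days=window_days - 1)
    return {user for user, day in activity if start <= day <= as_of}


as_of = date(2020, 1, 28)
print(sorted(engaged_users(activity, as_of, 7)))   # ['alice']
print(sorted(engaged_users(activity, as_of, 30)))  # ['alice', 'bob']
```

With identical data, the seven-day definition reports one engaged user and the 30-day definition reports two, so two teams quoting "engaged users" without a shared standard could both be right and still disagree.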

[12:55] Secondly, context. Context is important. Data scientists have a limited understanding of certain areas of the business and we need input from stakeholders to determine which data is relevant, which is noise, and what we would like to achieve from the data. This is incredibly important, as the true power of data is harnessed when we incorporate domain knowledge into data analysis. At Luno we have a data scientist embedded in each engineering pod, and we really try to immerse ourselves in the particular field to fully understand the information and problem contexts. We feed insights back into our pod, generating a conversation around the data and the problems we're trying to solve.

[13:37] Storytelling is a powerful and compelling way of transferring and consolidating information in a meaningful way. Data storytelling is a structured approach for communicating data insights and involves three key elements: data, visuals and narrative. We have found that data storytelling is a much more effective way of communicating data insights than poring over spreadsheets and data tables. So now data is accessible and understandable, but how do we drive self-service data analytics capabilities? We achieve this at Luno by making sure that our data is easy to use.

[14:19] So, data curation. Data is dirty, and cleaning data is a difficult and time-consuming process. You will often hear data scientists say that 80% of their time is spent gathering and cleaning data, whereas the other 20% is spent building and validating models and drawing conclusions. Data specialists need to abstract the underlying data gathering and cleaning processes from the rest of the company. At Luno, the data team takes care of the data curation, and metrics have already been defined and centralised on Looker, where they are available for people to easily reuse. Secondly, data literacy. In the same way that literacy contributed to human progress over the past few hundred years, data literacy is crucial in keeping companies relevant in this century. Data literacy is a new language and all members of our organisation need to be fluent in it.

[15:15] Decision makers don't necessarily need to understand the mechanics of data cleaning and gathering, but they should have a good understanding of experimental design and basic statistical inference. We have found that by giving product owners data literacy training sessions, they are much better equipped to make decisions using data. And to conclude, being data-driven is not an end destination, but a continual and iterative process. At Luno, in addition to giving everyone a voice, we also encourage an inquisitive culture where people can ask for additional information, challenge assumptions and discuss recommendations. Thank you.

NOTE: The questions for this presentation were not transcribed; they start at 15:56.
