In today’s digital marketing landscape, you can’t type “B” into Google without it auto-filling “Big Data”. But despite the quantity of articles you read on this topic, the nature of big data still carries with it an air of mystery. To clear things up on this topic, we sought the opinion of our very own data wizard and lead data scientist, Dr. Melanie Koehler.

How would you define the role of a data manager in the financial sector?

Dr. Melanie Koehler, Lead Data Scientist, StoneShot

“In general, I would say, a data manager in the financial sector has three main responsibilities when it comes to data analysis.

One responsibility is to support and contribute to the development of machine learning techniques and big data analysis, so that we use the best techniques and provide the best solutions to clients. This ensures that clients have the best opportunities for optimization available to them.

Another responsibility lies in ensuring that all results are correct. That sounds obvious and simple, but when using machine learning techniques it’s easy to overfit the data. If machine learning algorithms are used as a black box (a straightforward data-in, results-out algorithm), the results are not necessarily the truth behind the data. This responsibility should be taken very seriously, since the results are used, for example, to derive the creditworthiness of clients through big data analysis. Any mistakes can have detrimental outcomes. This aspect of data management ensures that the analysis of the client’s data is correct and that trust is built on accurate calculations.

The third responsibility, in my opinion, is the protection of collected data. Some of this data is personal to clients and professional investors, and it is stored in databases that are under threat of being hacked. This is why it is of the utmost importance to ensure that clients know we’re managing it properly.”

What is the simplest way to define big data? How is it accumulated? Can everyone “get” it?

“Big data is, in general, data sets that are huge and/or complex. So huge and complex that sharing, transferring, storing, analyzing, visualizing and updating them can be a challenge. MS Excel can only handle so much data.

To illustrate, imagine how many people use Google, Facebook or Twitter every day. Then imagine that every entry of every person is saved and stored; that’s one example of the size of big data. Your mobile phone is already collecting data all the time, for example your location information via GPS. All this data is stored and adds to big data.

A lot of the information we give is saved securely, out of everyone’s reach. However, a lot of information is freely available and anybody can easily get it. Indirectly, we benefit from this vast collection of data when using apps, trends, devices, and advertisements. It can be perceived as both useful and useless, depending on your perspective.”

What’s big data’s role in the financial industry?

“In the financial industry, most information is gained from data patterns or from finding anomalies in these patterns.

The more data that is available, the better the results for pattern recognition and predictions are. These patterns are needed for optimizations and success but are also used to check the creditworthiness of clients. Banks calculate the trustworthiness of their customers not only based on credit history, but also using big data.

Another important task big data performs in the financial industry is finding anomalies in patterns. Again, the more data that is available, the better the recognition of anomalies is. These anomalies can be a sign of fraud and misuse. Fraud detection not only plays an important role for individual clients, it is also central for all credit institutions.

But perhaps the most powerful role of big data in the financial industry comes from being able to detect trends and anomalies in product performance and match them against clients’ needs and goals, to ensure they are invested in the best product that fits into their diversified portfolio.”
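To make the idea of "finding anomalies in patterns" concrete, here is a minimal sketch using a simple z-score rule: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The function name `flag_anomalies`, the threshold, and the transaction amounts are all hypothetical, chosen purely for illustration; real fraud detection uses far richer models and far more data.

```python
from statistics import mean, stdev

def flag_anomalies(values, threshold=2.0):
    """Return the indices of values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; one is far outside the usual range.
amounts = [42, 38, 45, 40, 41, 39, 44, 950, 43, 40]
flagged = flag_anomalies(amounts)
print(flagged)
```

In a sample this small, a single extreme value inflates the standard deviation noticeably, which is one reason more data tends to make anomalies easier to recognize.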

How do you go about pulling insights from large pools of data?

“Big data cannot be handled in a single Excel sheet anymore. A common way to store big data is to use relational database management systems (RDBMS). Data is saved in separate tables, which have one or more columns in common to reconnect the information. One language to reconnect this data is the Structured Query Language (SQL). By writing queries, the tables with the needed information are connected and the data needed for the analysis is collected and preprocessed before it is exported to one table. This one table can be used for further processes or for direct plotting. In some cases, visualizing the data in the appropriate graphs gives the required result. In other cases, further processing of the data using machine learning techniques is applied.”
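As a minimal sketch of the workflow described above, the following Python snippet uses the standard-library `sqlite3` module with an in-memory database: two tables share a column, and one query reconnects and preprocesses them into a single result table. The table names (`clients`, `email_events`), the columns, and the rows are hypothetical and invented for illustration; they do not reflect any real schema.

```python
import sqlite3

def build_analysis_table(conn):
    """Join two tables on their shared column and aggregate into one table."""
    query = """
        SELECT c.client_id, c.name, COUNT(e.event_id) AS opens
        FROM clients AS c
        LEFT JOIN email_events AS e
            ON e.client_id = c.client_id AND e.event_type = 'open'
        GROUP BY c.client_id, c.name
        ORDER BY c.client_id
    """
    return conn.execute(query).fetchall()

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE email_events (
        event_id INTEGER PRIMARY KEY,
        client_id INTEGER,
        event_type TEXT
    );
    INSERT INTO clients VALUES (1, 'Alpha Fund'), (2, 'Beta Capital');
    INSERT INTO email_events VALUES
        (1, 1, 'open'), (2, 1, 'open'), (3, 1, 'click'), (4, 2, 'click');
""")

rows = build_analysis_table(conn)
print(rows)
```

The `LEFT JOIN` keeps clients with no matching events in the result, so a client who never opened an email appears with a count of zero rather than disappearing from the analysis.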

Do you use data from multiple sources to support your insights? If so, how do you combine multiple sets of data?

“For my work, I use SQL to obtain data from our database, where all data is stored. In some cases, we are able to get further information from clients directly, which is then included in the analysis. I mainly use Python (a programming language widely used in data analysis) to combine the data; its pandas library makes it easy to work with tables pulled from databases.

For our clients’ work, however, combining multiple sources of data is key to getting as close to a total view of your campaign as possible. We analyze email, web, and video data, in addition to CRM and BI data, to find the through-line between data sets. We do this to gain deeper insights.”
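Combining sources in this way can be sketched with the pandas `merge` method. This is an illustrative example under stated assumptions, not the actual pipeline: the column names (`contact_id`, `email_opens`, `page_views`), the helper `combine_sources`, and the values are all hypothetical.

```python
import pandas as pd

# Hypothetical per-contact activity from two separate sources.
email = pd.DataFrame({"contact_id": [1, 2, 3], "email_opens": [5, 0, 2]})
web = pd.DataFrame({"contact_id": [2, 3, 4], "page_views": [7, 1, 9]})

def combine_sources(left, right, key="contact_id"):
    """Outer-join two sources on a shared key; treat missing activity as zero."""
    merged = left.merge(right, on=key, how="outer")
    merged = merged.fillna(0).sort_values(key).reset_index(drop=True)
    return merged.astype(int)

combined = combine_sources(email, web)
print(combined)
```

An outer join keeps contacts that appear in only one source, which matters when the goal is a total view rather than only the overlap between data sets.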

Why, in your professional opinion, do the media and tech industry talk as much as they do about “big data” these days?

“Actually, with our clients the topic of the minute is AI, but you can’t drive AI without proper utilization of big data. In fact, many applications we use on a daily basis are based on data. Big data is a big part of our lives. The recommendations for books or other goods we get on the side of our browser, the advertisements which follow us from one web page to another, and the spam filters that partly stop unwanted ads are only a few examples that we encounter every day.

Preventing internet fraud, optimizing business processes and financial trading, improving health care, improving science and research, optimizing machine and device processes, and improving security and law enforcement are other examples. Big data is the basis for all of these applications as well as many new applications that are currently in development, which will have a comparably meaningful impact.

Another reason why people talk about big data might be out of fear that ‘our’ data is being saved and used without our permission. It raises a lot of privacy questions.”

What are 2-3 crucial pieces of advice you would give data analysts in the financial sector?

Don’t let confirmation bias cloud the truth.

“Pattern recognition and predictions from regressions are helpful if there really are patterns in the data. If there’s a trend you can identify, then that’s great. Many algorithms are used as black boxes, so it’s extremely important to question whether the procedure is the right one and whether the results make sense. If your results show that jumping from a plane without a parachute is healthy for you, then there might be a mistake in the analysis.

One final piece of advice is to present the data results without any bias. Don’t plan what you’d like to see from your data. Prepare a graph that shows the results without manipulating it in a way to force a result or trend that isn’t there.”

For more info on data analysis and insight on how to get the most out of your data, email us at!