5 Structured Thinking Techniques for Data Scientists


Structured thinking is a framework for solving unstructured problems — which covers just about all data science problems. Using a structured approach not only helps you solve problems faster but also helps you identify the parts of the problem that may need extra attention.

Think of structured thinking like the map of a city you’re visiting for the first time. Without a map, you’ll probably find it difficult to reach your destination. Even if you did eventually reach your destination, it would probably take you at least double the time.

What Is Structured Thinking?

Here’s where the analogy breaks down: Structured thinking is a framework, not a fixed mindset; you can modify these techniques based on the problem you’re trying to solve. Let’s look at five structured thinking techniques to use in your next data science project.

  • Six Step Problem Solving Model
  • Eight Disciplines of Problem Solving
  • The Drill Down Technique
  • The Cynefin Framework
  • The 5 Whys Technique

More From Sara A. Metwalli: 3 Reasons Data Scientists Need Linear Algebra

1. Six Step Problem Solving Model

This technique is the simplest and easiest to use. As the name suggests, this technique uses six steps to solve a problem, which are:

1. Have a clear and concise problem definition.

2. Study the roots of the problem.

3. Brainstorm possible solutions to the problem.

4. Examine the possible solutions and choose the best one.

5. Implement the solution effectively.

6. Evaluate the results.

This model follows the mindset of continuous development and improvement. So, on step six, if your results didn’t turn out the way you wanted, go back to step four and choose another solution (or to step one and try to define the problem differently).

My favorite part about this simple technique is how easy it is to alter based on the specific problem you’re attempting to solve. 
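To make the feedback loop concrete, here is a minimal Python sketch, assuming each of the six steps is supplied as a plain function; all names here are illustrative, not part of any real library.

```python
# A minimal sketch of the six-step model as a feedback loop: if the
# evaluation fails, go back to step 4 and try another candidate solution.

def six_step_solve(define, find_roots, brainstorm, choose,
                   implement, evaluate, max_rounds=3):
    """Run the six-step model, looping back to step 4 while results fall short."""
    problem = define()                    # 1. define the problem
    roots = find_roots(problem)           # 2. study its roots
    options = brainstorm(problem, roots)  # 3. brainstorm possible solutions
    for _ in range(max_rounds):
        solution = choose(options)        # 4. pick the best candidate
        result = implement(solution)      # 5. implement it
        if evaluate(result):              # 6. evaluate; stop if good enough
            return solution
        options.remove(solution)          # otherwise, back to step 4
    return None  # exhausted: may need to redefine the problem (step 1)
```

In a real project each of these functions would be a meeting, an analysis, or an experiment rather than code, but the control flow is the same.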


2. Eight Disciplines of Problem Solving

The eight disciplines of problem solving offer a practical plan for solving a problem using an eight-step process. You can think of this technique as an extended, more detailed version of the six-step problem-solving model.

Each of the eight disciplines in this process should move you a step closer to finding the optimal solution to your problem. So, once you’ve established the prerequisites of your problem, you can follow disciplines D1 through D8.

D1: Put together your team. Having a team with the skills to solve the problem can make moving forward much easier.

D2: Define the problem. Describe the problem in quantifiable terms: the who, what, where, when, why and how.

D3: Develop a working plan.

D4: Determine and identify root causes. Identify the root causes of the problem using cause-and-effect diagrams to map causes against their effects.

D5: Choose and verify permanent corrections. Based on the root causes, assess the work plan you developed earlier and edit it as needed.

D6: Implement the corrected action plan.

D7: Assess your results.

D8: Congratulate your team. At the end of a project, it’s essential to take a step back and appreciate the work you’ve all done before jumping into a new project.

3. The Drill Down Technique

The drill down technique is more suitable for large, complex problems with multiple collaborators. The whole purpose of using this technique is to break down a problem to its roots to make finding solutions that much easier. To use the drill down technique, you first need to create a table. The first column of the table will contain the outlined definition of the problem, followed by a second column containing the factors causing this problem. Finally, the third column will contain the cause of the second column's contents, and you’ll continue to drill down on each column until you reach the root of the problem.

Once you reach the root causes of the symptoms, you can begin developing solutions for the bigger problem.
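A drill-down table is simple to represent in code. Here is a minimal Python sketch using an invented churn example, where each column drills one level deeper than the one before it:

```python
# A hypothetical drill-down table: column 1 states the problem, column 2
# the contributing factor, column 3 the deeper cause. All entries invented.

drill_down = [
    # (problem,               contributing factor,    deeper cause)
    ("Monthly churn rising",  "More support tickets", "Slow response times"),
    ("Monthly churn rising",  "Price complaints",     "Recent price increase"),
    ("Monthly churn rising",  "Competitor switching", "Missing mobile app"),
]

def root_causes(table):
    """The last column of each row holds the current deepest cause."""
    return [row[-1] for row in table]
```

Adding another column is just widening each tuple: you keep drilling until the last column genuinely cannot be broken down further.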


4. The Cynefin Framework

The Cynefin framework, like the rest of the techniques, works by breaking down a problem into its root causes to reach an efficient solution. We consider the Cynefin framework a higher-level approach because it requires you to place your problem into one of five contexts.

  • Obvious Contexts. In this context, your options are clear, and the cause-and-effect relationships are apparent and easy to point out.
  • Complicated Contexts. In this context, the problem might have several correct solutions. In this case, a clear relationship between cause and effect may exist, but it’s not equally apparent to everyone.
  • Complex Contexts. If it’s impossible to find a direct answer to your problem, then you’re looking at a complex context. Complex contexts are problems that have unpredictable answers. The best approach here is to follow a trial and error approach.
  • Chaotic Contexts. In this context, there is no apparent relationship between cause and effect, and your main goal is to establish a correlation between the causes and effects.
  • Disorder. The final context is disorder, the most difficult of the contexts to categorize. The only way to diagnose disorder is to eliminate the other contexts and gather further information.


5. The 5 Whys Technique

Our final technique is the 5 Whys or, as I like to call it, the curious child approach. I think this is the most well-known and natural approach to problem solving.

This technique follows the simple approach of asking “why” five times — like a child would. First, you start with the main problem and ask why it occurred. Then you keep asking why until you reach the root cause of said problem. (Fair warning, you may need to ask more than five whys to find your answer.)
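As a toy illustration, the why-chain can be sketched in a few lines of Python, assuming the cause links have already been mapped into a dictionary; the example causes are invented.

```python
# A toy 5 Whys: follow "why?" links until no deeper cause is known.
# The cause map below is a made-up example, not real project data.

causes = {
    "Model accuracy dropped": "Input data distribution shifted",
    "Input data distribution shifted": "Upstream schema changed",
    "Upstream schema changed": "No schema-change alerts exist",
}

def five_whys(problem, causes, max_whys=5):
    """Return the chain of causes, ending at the likely root cause."""
    chain = [problem]
    while chain[-1] in causes and len(chain) <= max_whys:
        chain.append(causes[chain[-1]])
    return chain

# five_whys("Model accuracy dropped", causes)[-1]
# ends at "No schema-change alerts exist" — the likely root cause
```

The `max_whys` guard mirrors the warning above: five is a guideline, not a law, so raise it if the chain keeps producing genuine deeper causes.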

MIT Sloan Management Review

Framing Data Science Problems the Right Way From the Start

Data science project failure can often be attributed to poor problem definition, but early intervention can prevent it.


The failure rate of data science initiatives — often estimated at over 80% — is way too high. We have spent years researching the reasons contributing to companies’ low success rates and have identified one underappreciated issue: Too often, teams skip right to analyzing the data before agreeing on the problem to be solved. This lack of initial understanding guarantees that many projects are doomed to fail from the very beginning.

Of course, this issue is not a new one. Albert Einstein is often quoted as having said, “If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute solving it.”


Consider how often data scientists need to “clean up the data” on data science projects, often as quickly and cheaply as possible. This may seem reasonable, but it ignores the critical “why” question: Why is there bad data in the first place? Where did it come from? Does it represent blunders, or are there legitimate data points that are just surprising? Will they occur in the future? How does the bad data impact this particular project and the business? In many cases, we find that a better problem statement is to find and eliminate the root causes of bad data.

Too often, we see examples where people either assume that they understand the problem and rush to define it, or they don’t build the consensus needed to actually solve it. We argue that a key to successful data science projects is to recognize the importance of clearly defining the problem and adhere to proven principles in so doing. This problem is not relegated to technology teams; we find that many business, political, management, and media projects, at all levels, also suffer from poor problem definition.

Toward Better Problem Definition

Data science uses the scientific method to solve often complex (or multifaceted) and unstructured problems using data and analytics. In analytics, the term fishing expedition refers to a project that was never framed correctly to begin with and involves trolling the data for unexpected correlations. This type of data fishing does not meet the spirit of effective data science but is prevalent nonetheless. Consequently, defining the problem correctly needs to be step one. We previously proposed an organizational “bridge” between data science teams and business units, to be led by an innovation marshal — someone who speaks the language of both the data and management teams and can report directly to the CEO. This marshal would be an ideal candidate to assume overall responsibility to ensure that the following proposed principles are utilized.

Get the right people involved. To ensure that your problem framing has the correct inputs, you have to involve, from the beginning, all the key people whose contributions are needed to complete the project successfully. After all, data science is an interdisciplinary, transdisciplinary team sport. This team should include those who “own” the problem, those who will provide data, those responsible for the analyses, and those responsible for all aspects of implementation. Think of the RACI matrix — those responsible, accountable, to be consulted, and to be informed — for each aspect of the project.
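As a rough illustration, a RACI matrix can be kept as a simple mapping and sanity-checked in code; the roles and project aspects below are hypothetical, not from the article.

```python
# A toy RACI matrix: for each project aspect, who is Responsible (R),
# Accountable (A), Consulted (C), and Informed (I). All names invented.

raci = {
    "Problem definition": {"R": ["Data scientist"], "A": "Product owner",
                           "C": ["Domain expert"],  "I": ["Leadership"]},
    "Data collection":    {"R": ["Data engineer"],  "A": "Data scientist",
                           "C": ["DBA"],            "I": ["Product owner"]},
    "Implementation":     {"R": ["ML engineer"],    "A": "Tech lead",
                           "C": ["Data scientist"], "I": ["Leadership"]},
}

def check_accountability(matrix):
    """Return aspects that lack a single, named accountable ('A') owner."""
    return [aspect for aspect, roles in matrix.items()
            if not isinstance(roles.get("A"), str) or not roles.get("A")]
```

The check encodes the usual RACI rule of thumb: every aspect should have exactly one accountable owner, even when several people share the responsible role.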

Recognize that rigorously defining the problem is hard work. We often find that the problem statement changes as people work to nail it down. Leaders of data science projects should encourage debate, allow plenty of time, and document the problem statement in detail as they go. This ensures broad agreement on the statement before moving forward.

Don’t confuse the problem and its proposed solution. Consider a bank that is losing market share in consumer loans and whose leadership team believes that competitors are using more advanced models. It would be easy to jump to a problem statement that looks something like “Build more sophisticated loan risk models.” But that presupposes that a more sophisticated model is the solution to market share loss, without considering other possible options, such as increasing the number of loan officers, providing better training, or combating new entrants with more effective marketing. Confusing the problem and proposed solution all but ensures that the problem is not well understood, limits creativity, and keeps potential problem solvers in the dark. A better statement in this case would be “Research root causes of market share loss in consumer loans, and propose viable solutions.” This might lead to more sophisticated models, or it might not.

Understand the distinction between a proximate problem and a deeper root cause. In our first example, the unclean data is a proximate problem, whereas the root cause is whatever leads to the creation of bad data in the first place. Importantly, “We don’t know enough to fully articulate the root cause of the bad data problem” is a legitimate state of affairs, demanding a small-scale subproject.

Do not move past problem definition until it meets the following criteria:

  • It does no harm. It may not be clear how to solve the defined problem, but it should be clear that solving it will lead to a good business result. If it’s not clear, more refinement may be needed. Consider the earlier bank example. While it might be easy enough to adjust models in ways that grant more loans, this might significantly increase risk — an unacceptable outcome. So the real goal should be to improve market share without creating additional risk, hence the inclusion of “propose viable solutions” in the problem statement above.
  • It considers necessary constraints. Using the bank example, we can recognize that more sophisticated models might require hiring additional highly skilled loan officers — something the bank might be unwilling to do. All constraints, including those involving time, budget, technology, and people, should be clearly articulated to avoid a problem statement misaligned with business goals.
  • It has an accountability matrix (or its equivalent). Alignment is key for success, so ensure that those who are responsible for solving the problem understand their various roles and responsibilities. Again, think RACI matrix.
  • It receives buy-in from stakeholders. Poorly defined or controversial problem statements often produce resistors within the organization. In extreme cases, they may become “snipers,” attempting to ensure project failure. Work to develop a general (not necessarily unanimous) consensus from leadership, those involved in the solution, and the ultimate customers (those who will be affected) on the problem definition.

Taking the time needed to properly define the problem can feel uncomfortable. After all, we live and work in cultures that demand results and are eager to “get on with it.” But shortchanging this step is akin to putting the cart before the horse — it simply doesn’t work. There is no substitute for probing more deeply, getting the right people involved, and taking the time to understand the real problem. All of us — data scientists, business leaders, and politicians alike — need to get better at defining the right problem the right way.

About the Authors

Roger W. Hoerl ( @rogerhoerl ) teaches statistics at Union College in Schenectady, New York. Previously, he led the applied statistics lab at GE Global Research. Diego Kuonen ( @diegokuonen ) is head of Bern, Switzerland-based Statoo Consulting and a professor of data science at the Geneva School of Economics and Management at the University of Geneva. Thomas C. Redman ( @thedatadoc1 ) is president of New Jersey-based consultancy Data Quality Solutions and coauthor of The Real Work of Data Science: Turning Data Into Information, Better Decisions, and Stronger Organizations (Wiley, 2019).


Key skills for aspiring data scientists: Problem solving and the scientific method


This blog is part two of our ‘Data science skills’ series, which takes a detailed look at the skills aspiring data scientists need to ace interviews, get exciting projects, and progress in the industry. You can find the other blogs in our series under the ‘Data science career skills’ tag. 

One of the things that attracts a lot of aspiring data scientists to the field is a love of problem solving, more specifically problem solving using the scientific method. The scientific method has been around for hundreds of years, but the vast volume of data available today offers new and exciting ways to test all manner of different hypotheses – it is called data science, after all.

If you’re a PhD student, you’ll probably be fairly used to using the scientific method in an academic context, but problem solving means something slightly different in a commercial context. To succeed, you’ll need to learn how to solve problems quickly, effectively and within the constraints of your organisation’s structure, resources and time frames. 

Why is problem solving essential for data scientists? 

Problem solving is involved in nearly every aspect of a typical data science project from start to finish. Indeed, almost all data science projects can be thought of as one long problem solving exercise.

To make this clear, let’s consider the following case study: you have been asked to help optimise a company’s direct marketing, which consists of weekly catalogues.

Defining the right question 

The first aim of most data science projects is to properly specify the question or problem you wish to tackle. This might sound trivial, but it can often be one of the most challenging parts of any project, and how successful you are at this stage can come to define how successful you are by the finish.

In an academic context, your problem is usually very clearly defined. But as a data scientist in industry it’s rare for your colleagues or your customer to know exactly which problem they’re trying to solve.  

In this example, you have been asked to “optimise a company’s direct marketing”. There are numerous translations of this problem statement into the language of data science. You could create a model which helps you contact customers who would get the biggest uplift in purchase propensity or spend from receiving direct marketing. Or you could simply work out which customers are most likely to buy and focus on contacting them. 

While most marketers and data scientists would agree that the first approach is better in theory, whether or not you can answer this question through data depends on what the company has been doing up to this point. A robust analysis of the company’s data and previous strategy is therefore required, even before deciding on which specific problem to focus on.

This example makes clear the importance of properly defining your question up front; both options here would lead you on very different trajectories and it is therefore crucial that you start off on the right one.  As a data scientist, it will be your job to help turn an often vague direction from a customer or colleague into a firm strategy.

Formulating and evaluating hypotheses

Once you’ve decided on the question that will deliver the best results for your company or your customer, the next step is to formulate hypotheses to test. These can come from many places, whether it be the data, business experts, or your own intuition.

Suppose in this example you’ve had to settle for finding customers who are most likely to buy. Clearly you’ll want to ensure that your new process is better than the company’s old one – indeed, if you’re making better data-driven decisions than the company’s previous process, you would expect this to be the case.

There is a challenge here though – you can’t directly test the effect of changing historical mailing decisions, because those decisions have already been made. However, you can test it indirectly, by looking at the people who were mailed and then looking at who bought something and who didn’t. If your new process is superior to the previous one, it should suggest mailing most of the people in the first category, as anyone missed here could represent lost revenue. It should also omit most of the people in the second category, as mailing this group is definitely wasted marketing spend.

While these metrics don’t prove that your new process is better, they do provide some evidence that you’re making improvements over what went before.
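A minimal Python sketch of this indirect evaluation, using invented data, might look like the following:

```python
# Among customers who were historically mailed, check how a new model's
# mailing recommendations line up with who actually bought. Toy data only.

mailed = [
    # (customer_id, bought, new_model_would_mail)
    (1, True,  True),
    (2, True,  True),
    (3, True,  False),
    (4, False, False),
    (5, False, False),
    (6, False, True),
]

def coverage_metrics(records):
    """Fraction of actual buyers the model keeps, and of non-buyers it skips."""
    buyers     = [r for r in records if r[1]]
    non_buyers = [r for r in records if not r[1]]
    buyers_kept   = sum(r[2] for r in buyers) / len(buyers)
    waste_avoided = sum(not r[2] for r in non_buyers) / len(non_buyers)
    return buyers_kept, waste_avoided

# On this toy data, both metrics come out to about 0.67.
```

High values on both metrics are the evidence described above: the new process would have mailed most of the people who bought and skipped most of those who didn’t.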

This example is typical of applied data science projects – you often can’t test your model on historical data to the extent that you would like, so you have to use the data you do have as best you can to gather as much evidence as possible for the validity of your hypotheses.

Testing and drawing conclusions

The ultimate test of any data science algorithm is how it performs in the real world. Most data science projects will end by attempting to answer this question, as ultimately this is the only way that data science can truly deliver value to people.

In our example from above, this might look like comparing your algorithm against the company’s current process by running a randomised controlled trial (RCT) and comparing the response rates across the two groups. Of course, one would expect some random variation, and being able to explain the significance (or lack thereof) of any deviations between the two groups would be essential to solving the company’s original problem.
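For illustration, the response rates from such an RCT can be compared with a two-proportion z-test using only the standard library; the counts below are made up, and in practice you would likely reach for a statistics package instead.

```python
# Two-proportion z-test: is the difference between two response rates
# larger than random variation would explain? Stdlib only; toy numbers.
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: the two response rates are equal."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)   # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # 2 * P(Z > |z|)
    return z, p_value

# Hypothetical trial: algorithm gets 120/1000 responses, current process 90/1000.
z, p = two_proportion_z(120, 1000, 90, 1000)
```

With these invented counts the p-value lands below the conventional 0.05 threshold, which is the kind of evidence you would present that the deviation between the two groups is not just noise.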

How successfully you test and draw your final conclusions, as well as how well you account for the limitations of the evaluation, will ultimately decide how impactful the end result of the project is. When addressing a business problem, there can be massive consequences to getting the answer wrong – formulating this final test in a way that is scientifically robust, while still addressing the original problem statement, is therefore paramount, and it is a skill that any data scientist needs to possess.

How to develop your problem solving skills

There are certainly ways you can develop your applied data science problem solving skills. The best advice, as so often is true in life, is to practice. Indeed, one of the reasons that so many employers look for data scientists with PhDs is because this demonstrates that the individual in question can solve hard problems. 

Websites like Kaggle can be a great starting point for learning how to tackle data science problems, and winners of old competitions often publish good posts about how they built their winning models. It’s also important to learn how to translate business problems into a clear data science problem statement. Data science problems found online have often done this bit for you, so try to focus on those that are vague and ill-defined – whilst it might be tempting to stick to those that are more concrete, real life is seldom as accommodating.

As the best way to develop your skills is to practice them, Faculty’s Fellowship programme can be a fantastic way to improve your problem solving skills. As the fellowship gives you an opportunity to tackle a real business problem for a real customer, and take the problem through from start to finish, there are not many better ways to develop, and prove, your skills in this area.

Head to the Faculty Fellowship page to find out more. 


Data Science Solutions: Applications and Use Cases


Data Science is a broad field with many potential applications. It’s not just about analyzing data and modeling algorithms, but it also reinvents the way businesses operate and how different departments interact. Data scientists solve complex problems every day, leveraging a variety of Data Science solutions to tackle issues like processing unstructured data, finding patterns in large datasets, and building recommendation engines using advanced statistical methods, artificial intelligence, and machine learning techniques. 


Data Science helps analyze and extract patterns from corporate data, so these patterns can be organized to guide corporate decisions. Data analysis using Data Science techniques helps companies to figure out which trends are the best fit for businesses during various parts of the year. 

Through data patterns, Data Science professionals can use tools and techniques to forecast future customer needs toward a specific product or service.  Data Science and businesses  can work together closely in understanding consumer preferences across a wide range of items and running better marketing campaigns. 

To enhance the scope of  predictive analytics , Data Science now employs other advanced technologies such as machine learning and deep learning to improve decision-making and create better models for predicting financial risks, customer behaviors, or market trends.

Data Science helps with making  future-proofing decisions,  supply chain predictions, understanding market trends, planning better pricing for products, consideration of automation for various data-driven tasks, and so on.

For example, in sales and marketing, Data Science is mainly used to predict markets, determine new customer segments, optimize pricing structures, and analyze the customer portfolio. Businesses frequently use sentiment analysis and behavior analytics to determine purchase and usage patterns, and to understand how people view products and services. Some businesses like Lowe’s, Home Depot, or Netflix use “hyper-personalization” techniques to match offers to customers accurately via their recommendation engines.

E-commerce companies use recommendation engines, pricing algorithms, customer predictive segmentation, personalized product image searching, and artificially intelligent chat bots to offer transformational customer experience. 

In recent times,  deep learning , through its use of “artificial neural networks,” has empowered data scientists to perform unstructured data analytics, such as image recognition, object categorizing, and sound mapping.  

Data Science Solutions by Industry Applications

Now let’s take a look at how Data Science is powering industry sectors with its cross-disciplinary platforms and tools:

Data Science Solutions in Banking:  Banking and financial sectors are highly dependent on Data Science solutions powered with big data tools for risk analytics, risk management, KYC, and fraud mitigation. Large banks, hedge funds, stock exchanges, and other financial institutions use advanced Data Science (powered by big data, AI, ML) for trading analytics, pre-trade decision-support analytics, sentiment measurements, predictive analytics, and more. 

Data Science Solutions in Marketing:  Marketing departments often use Data Science to build recommendation systems and to analyze customer behavior. When we talk about Data Science in marketing, we are primarily concerned with what we call “retail marketing.” The retail marketing process involves analyzing customer data to inform business decisions and drive revenue. Common data used in retail marketing include customer data, product data, sales data, and competitor data. Customer transactional data is used extensively in AI-powered  data analytics systems  for increased sales and providing excellent marketing services. Chatbot analytics and sales representative response data are used together to improve sales efficiency. 

The retailer can use this data to build customer-targeted marketing campaigns, optimize prices based on demand, and decide on product assortment. The retail marketing process is rarely automated; it involves making business decisions based on the data. Data scientists working in retail marketing are primarily concerned with deriving insights from the data and applying statistical and machine learning methods to inform these decisions.

Data Science Solutions in Finance and Trading:  Finance departments use Data Science to build trading algorithms, manage risk, and improve compliance. A  data scientist  working in finance will primarily use data about the financial markets. This includes data about the companies whose stocks are traded on the market, the trading activity of the investors, and the stock prices. The financial data is unstructured and messy; it’s collected from different sources using different formats. The data scientist’s first task, therefore, is to process the data and convert it into a structured format. This is necessary for building algorithms and other models. For example, the data scientist might build a trading algorithm that exploits the market inefficiencies and generates profits for the company.

Data Science Solutions in Human Resources:  HR departments use Data Science to hire the best talent, manage employee data, and predict employee performance. The data scientist working in HR will primarily use employee data collected from different sources. This data could be structured or unstructured depending on how it’s collected. The most common source is an HR database such as Workday. The data scientist’s first task is to process and clean the data, which is necessary for deriving insights from it. The data scientist might then use methods like machine learning to predict employee performance, for example by training a model on historical employee data and the features it contains.

Data Science in Logistics and Warehousing:  Logistics and operations departments  use Data Science to manage supply chains and predict demand. The data scientist working in logistics and warehousing will primarily use data about customer orders, inventory, and product prices. The data scientist will use data from sensors and IoT devices deployed in the supply chain to track the product’s journey. The data scientist might use methods like machine learning to predict demand.  

Data Science Solutions in Customer Service:  Customer service departments use Data Science to answer customer queries, manage tickets, and improve the end-to-end customer experience. The data scientist working in customer service will primarily use data about customer tickets, customers, and the support team. The most common source is the ticket management system. Here, the data scientist might use methods like machine learning to predict when a customer will stop engaging with the brand, for example by training a model on historical customer data.
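As a rough, invented illustration of the disengagement idea above, a minimal sketch might flag customers by inactivity; a real system would train a proper machine learning model on far richer historical data.

```python
# A toy disengagement flag: learn a recency threshold from historical
# records of who stopped engaging. All data and names are invented.

history = [
    # (customer_id, days_since_last_interaction, churned)
    (1, 5, False), (2, 40, True), (3, 12, False), (4, 55, True), (5, 30, True),
]

def fit_threshold(records):
    """Midpoint between the average recency of active and churned customers."""
    active  = [d for _, d, c in records if not c]
    churned = [d for _, d, c in records if c]
    return (sum(active) / len(active) + sum(churned) / len(churned)) / 2

def likely_to_churn(days_inactive, threshold):
    """Flag a customer whose inactivity exceeds the learned threshold."""
    return days_inactive > threshold
```

Even this crude baseline captures the workflow the paragraph describes: learn from historical customer data, then score current customers before they disengage.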

Big Data with Data Science Solutions Use Cases

While Data Science solutions can be used to get insights into behaviors and processes, big data analytics indicates the convergence of several cutting-edge technologies working together to help enterprise organizations extract better value from the data that they have.

Advanced Data Science and big data analytics techniques are used for increasing online revenue, reducing customer complaints, and enhancing customer experience through personalized services. In the hospitality and food services industries, big data analytics is likewise used to study customer behavior through shopping data, such as wait times at checkout. Statistics show that 38% of companies use big data to improve organizational effectiveness. 

In the insurance sector, big data-powered predictive analytics is frequently used for analyzing large volumes of data at high speed during the underwriting stage. Insurance claims analysts now have access to algorithms that help identify fraudulent behaviors. Across all industry sectors, organizations are harnessing the predictive powers of Data Science to enhance their business forecasting capabilities. 

Big data coupled with Data Science  enables enterprise businesses  to leverage their own organizational data rather than relying on market studies or third-party tools. Data Science practitioners work closely with RPA industry professionals to identify a company's data sources, as well as to build dashboards and visuals for exploring various forms of data analytics in real time. Data Science teams can now train deep learning systems to identify contracts and invoices in a stack of documents and to extract the relevant information from them.

Big data analytics has the potential to unlock great insights into data across social media channels and platforms, enabling marketing, customer support, and advertising to improve and align more closely with corporate goals. Big data analytics also makes research results better and helps organizations use research more effectively by allowing them to identify specific test cases and user settings.

Specialized Data Science Use Cases with Examples

Data Science applications can be used in any industry or area of study, but the majority of examples involve data analytics for  business use cases . In this section, some specific use cases are presented with examples to help you better understand Data Science's potential in your organization.

Data cleansing:  In Data Science, the first step is data cleansing, which involves identifying and correcting incorrect or incomplete data sets. Data cleansing is critical for catching errors and inconsistencies that can skew your data analysis and lead to poor business decisions. Crucially, it is an ongoing process: business data is always changing, so the data you have today might not be correct tomorrow. The best data scientists treat cleansing not as a one-off task but as a continuous practice that starts with the very first data set they collect. 
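As a small illustration, here is what a first cleansing pass might look like in pandas. The customer table, its column names, and its values are all invented for this sketch:

```python
import pandas as pd

# Hypothetical customer records with the kinds of problems data
# cleansing targets: duplicate rows, wrong types, missing values,
# and inconsistent category labels.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age": ["34", "41", "41", None, "29"],
    "country": ["US", "us", "us", "DE", "US"],
})

df = df.drop_duplicates(subset="customer_id")          # drop the repeated customer
df["age"] = pd.to_numeric(df["age"], errors="coerce")  # fix types; bad values -> NaN
df["age"] = df["age"].fillna(df["age"].median())       # impute the missing age
df["country"] = df["country"].str.upper()              # normalize category labels

print(len(df))                  # 4 unique customers remain
print(df["age"].isna().sum())   # 0 missing ages remain
```

Each step here is one pass of an ongoing process: the same script would need to be re-run (or scheduled) as new records arrive.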

Prediction and forecasting:  The next step in Data Science is data analysis, prediction, and forecasting. You can do this at the individual level or at a larger scale across your entire customer base. Prediction and forecasting help you understand how your customers behave and what they may do next, and you can use these insights to create better products, marketing campaigns, and customer support. Common techniques for prediction and forecasting include regression, time series analysis, and artificial neural networks. 
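A minimal forecasting sketch using the first of those techniques, regression, on a hypothetical monthly sales series (the numbers are invented, and real series are rarely this clean):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly sales following a steady upward trend.
months = np.arange(1, 13).reshape(-1, 1)   # months 1..12
sales = 100 + 10 * np.arange(1, 13)        # 110, 120, ..., 220

# Fit a trend line and extrapolate one month ahead.
model = LinearRegression().fit(months, sales)
forecast = model.predict([[13]])[0]

print(round(forecast))  # 230
```

The same fit/predict pattern carries over to the more powerful techniques (time series models, neural networks); only the model object changes.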

Fraud detection:  Fraud detection is a highly specialized use of Data Science aimed at finding transactions that are incorrect or fraudulent. It's an important use case because it can significantly reduce the cost of business operations. The best fraud detection systems are wide-ranging, combining many different techniques to flag inconsistencies and unusual data points that suggest fraud. Because fraud detection is such a specialized use case, it's best to work with a Data Science professional. 
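As a minimal illustration of flagging unusual data points, here is one robust outlier rule on hypothetical transaction amounts. This is a sketch of a single technique, not a production fraud system, which would layer many such signals:

```python
import numpy as np

# Hypothetical transaction amounts; one is far outside the normal range.
amounts = np.array([12.5, 9.9, 11.2, 10.4, 13.1, 980.0, 10.8, 12.0])

# Flag transactions more than 3 median-absolute-deviations from the
# median -- a robust cousin of the classic z-score rule that isn't
# dragged around by the outlier itself.
median = np.median(amounts)
mad = np.median(np.abs(amounts - median))
scores = np.abs(amounts - median) / mad
flagged = amounts[scores > 3]

print(flagged)  # [980.]
```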

Data Science for business growth:  Every business wants to grow, yet many struggle to keep up with their competitors. Data Science can help you understand your potential customers and improve your services. It can also help you identify new opportunities and explore different areas you can expand into. Use Data Science to identify your target audience and their needs, then create products and services that serve those needs better than your competitors can. You can also use Data Science to identify new markets, explore new areas for growth, and expand into new industries. 

Data Science is an interdisciplinary field that uses mathematics, engineering, statistics, machine learning, and other fields of study to analyze data and identify patterns. Data Science applications can be used for any industry or area of study, but most examples involve data analytics for  business use cases . Data Science often helps you understand your potential customers and their buying needs. 



Solving Problems with Data Science


Aakash Tandel , Former Data Scientist

Article Categories: #Strategy , #Data & Analytics

Posted on December 3, 2018

There is a systematic approach to solving data science problems and it begins with asking the right questions. This article covers some of the many questions we ask when solving data science problems at Viget.


A challenge that I’ve been wrestling with is the lack of a widely adopted framework or systematic approach to solving data science problems. In our analytics work at Viget, we use a framework inspired by Avinash Kaushik’s Digital Marketing and Measurement Model. We use this framework on almost every project we undertake at Viget. I believe data science could use a similar framework that organizes and structures the data science process.

As a start, I want to share the questions we like to ask when solving a data science problem. Even though some of the questions are not specific to the data science domain, they help us efficiently and effectively solve problems with data science.

Business Problem

What is the problem we are trying to solve?

That’s the most logical first step to solving any question, right? We have to be able to articulate exactly what the issue is. Start by writing down the problem without going into the specifics, such as how the data is structured or which algorithm we think could effectively solve the problem.

Then try explaining the problem to your niece or nephew, who is a freshman in high school. It is easier than explaining the problem to a third-grader, but you still can’t dive into statistical uncertainty or convolutional versus recurrent neural networks. The act of explaining the problem at a high school stats and computer science level makes your problem, and the solution, accessible to everyone within your or your client’s organization, from the junior data scientists to the Chief Legal Officer.

Clearly defining our business problem showcases how data science is used to solve real-world problems. This high-level thinking provides us with a foundation for solving the problem. Here are a few other business problem definitions we should think about.

  • Who are the stakeholders for this project?
  • Have we solved similar problems before?
  • Has someone else documented solutions to similar problems?
  • Can we reframe the problem in any way?

And don’t be fooled by these deceptively simple questions. Sometimes more generalized questions can be very difficult to answer. But we believe answering these framing questions is the first, and possibly most important, step in the process, because it makes the rest of the effort actionable.  

Say we work at a video game company —  let’s call the company Rocinante. Our business is built on customers subscribing to our massive online multiplayer game. Users are billed monthly. We have data about users who have cancelled their subscription and those who have continued to renew month after month. Our management team wants us to analyze our customer data.

Well, as a company, Rocinante wants to be able to predict whether or not customers will cancel their subscription. We want to be able to predict which customers will churn, in order to address the core reasons why customers unsubscribe. Additionally, we need a plan to target specific customers with more proactive retention strategies.

Churn is the turnover of customers, also referred to as customer death. In a contractual setting, such as when a user signs a contract to join a gym, a customer “dies” when they cancel their membership. In a non-contractual setting, customer death is not observed and is more difficult to model. For example, Amazon does not know when you have decided to never again purchase Adidas products; your death as an Amazon or Adidas customer is implied.


Possible Solutions

What are the approaches we can use to solve this problem?

There are many instances when we shouldn’t be using machine learning to solve a problem. Remember, data science is one of many tools in the toolbox. There could be a simpler, and maybe cheaper, solution out there. Maybe we could answer a question by looking at descriptive statistics around web analytics data from Google Analytics. Maybe we could solve the problem with user interviews and hear what the users think in their own words. This question aims to see if spinning up EC2 instances on Amazon Web Services is worth it. If the answer to “Is there a simple solution?” is no, then we can ask, “Can we use data science to solve this problem?” This yes-or-no question brings about two follow-up questions:

  • “Is the data available to solve this problem?” A data scientist without data is not a very helpful individual. Many of the data science techniques highlighted in the media today — such as deep learning with artificial neural networks — require a massive amount of data. A hundred data points is unlikely to be enough to train and test a model. If the answer to this question is no, then we can consider acquiring more data and pipelining that data to warehouses, where it can be accessed at a later date.
  • “Who are the team members we need in order to solve this problem?” Your initial answer to this question will be, “The data scientist, of course!” But the vast majority of the problems we face at Viget can’t or shouldn’t be solved by a lone data scientist, because we are solving business problems. Our data scientists team up with UXers, designers, developers, project managers, and hardware developers to develop digital strategies, and solving data science problems is one part of that strategy. Siloing your problem and siloing your data scientists isn’t helpful for anyone.

We want to predict when a customer will unsubscribe from Rocinante’s flagship game. One simple approach would be to take the average customer lifetime (how long a gamer remains subscribed) and predict that all customers will churn after that amount of time. Say our data showed that, on average, customers churned after 72 months of subscription. We could then predict a new customer would churn after 72 months. But when we test this hypothesis on new data, we learn that it is wildly inaccurate: the average customer lifetime in our previous data was 72 months, but the new batch of data had an average customer lifetime of 2 months. Users in the second batch churned much faster than those in the first, so our prediction of 72 months didn’t generalize well. Let’s try a more sophisticated approach using data science.
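That naive baseline, and why it fails, fits in a few lines. The lifetimes below are made-up stand-ins for the two batches described above:

```python
import numpy as np

# Hypothetical subscription lifetimes (months) for two batches of customers.
batch_1 = np.array([70, 74, 71, 73, 72])   # old data: average 72 months
batch_2 = np.array([1, 2, 3, 2, 2])        # new data: average 2 months

# Naive model: predict every customer churns at the old average.
baseline = batch_1.mean()                  # 72.0

# Mean absolute error of that prediction on the new batch.
error = np.abs(batch_2 - baseline).mean()  # 70.0 -- wildly inaccurate

print(baseline, error)
```

A single summary statistic fit to one batch carries no information about how new customers differ, which is exactly the gap a learned model tries to close.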

  • Is the data available to solve this problem?  The dataset contains 12,043 rows of data and 49 features. We determine that this sample of data is large enough for our use case. We don’t need to deploy Rocinante’s data engineering team for this project.
  • Who are the team members we need in order to solve this problem?   Let’s talk with Rocinante’s data engineering team to learn more about their data collection process. We could learn about biases in the data from the data collectors themselves. Let’s also chat with the customer retention and acquisition team and hear about their tactics to reduce churn. Our job is to analyze data that will ultimately impact their work. Our project team will consist of the data scientist to lead the analysis, a project manager to keep the project team on task, and a UX designer to help facilitate research efforts we plan to conduct before and after the data analysis.


How do we know if we have successfully solved the problem?

At Viget, we aim to be data-informed, which means we aren’t blindly driven by our data, but we are still focused on quantifiable measures of success. Our data science problems are held to the same standard.  What are the ways in which this problem could be a success? What are the ways in which this problem could be a complete and utter failure?  We often have specific success metrics and Key Performance Indicators (KPIs) that help us answer these questions.

Our UX coworker has interviewed some of the other stakeholders at Rocinante and some of the gamers who play our game. Our team believes if our analysis is inconclusive, and we continue the status quo, the project would be a failure. The project would be a success if we are able to predict a churn risk score for each subscriber. A churn risk score, coupled with our monthly churn rate (the rate at which customers leave the subscription service per month), will be useful information. The customer acquisition team will have a better idea of how many new users they need to acquire in order to keep the number of customers the same, and how many new users they need in order to grow the customer base. 
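The back-of-the-envelope math behind that last point is simple. The numbers below are hypothetical (a 4% monthly churn rate is an assumption for illustration, not Rocinante data):

```python
# How many new users must the acquisition team bring in each month?
subscribers = 50_000
monthly_churn_rate = 0.04              # assume 4% of subscribers cancel per month

expected_churned = subscribers * monthly_churn_rate   # ~2,000 cancellations

# To keep the customer base the same, replace exactly the churned users;
# to grow it, replace them and add the growth target on top.
growth_target = 1_000
needed_to_hold_steady = expected_churned
needed_to_grow = expected_churned + growth_target     # ~3,000 new users
```

A churn risk score per subscriber refines this further, letting the retention team focus effort on the users most likely to be in that churned group.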


Data Science-ing

What do we need to learn about the data, and what analysis do we need to conduct?

At the heart of solving a data science problem are hundreds of questions. I attempted to ask these and similar questions last year in a blog post,  Data Science Workflow . Below are some of the most crucial — they’re not the only questions you could face when solving a data science problem, but are ones that our team at Viget thinks about on nearly every data problem.

  • What do we need to learn about the data?
  • What type of exploratory data analysis do we need to conduct?
  • Where is our data coming from?
  • What is the current state of our data?
  • Is this a supervised or unsupervised learning problem?
  • Is this a regression, classification, or clustering problem?
  • What biases could our data contain?
  • What type of data cleaning do we need to do?
  • What type of feature engineering could be useful?
  • What algorithms or types of models have been proven to solve similar problems well?
  • What evaluation metric are we using for our model?
  • What is our training and testing plan?
  • How can we tweak the model to make it more accurate, increase the ROC/AUC, decrease log-loss, etc. ?
  • Have we optimized the various parameters of the algorithm? Try grid search here.
  • Is this ethical?

That last question raises the conversation about ethics in data science. Unfortunately, there is no Hippocratic oath for data scientists, but that doesn’t excuse the data science industry from acting unethically. We should apply ethical considerations to our standard data science workflow. Ethics in data science as a topic deserves more than a paragraph in this article — but I wanted to highlight that we should be cognizant of it and practice only ethical data science.

Let’s get started with the analysis. It’s time to answer the data science questions. Because this is an example, the answers to these data science questions are entirely hypothetical.

  • We need to learn more about the time series nature of our data, as well as the format.
  • We should look into average customer lifetime durations and summary statistics around some of the features we believe could be important.
  • Our data came from login data and customer data, compiled by Rocinante’s data engineering team.
  • The data needs to be cleaned, but it is conveniently in a PostgreSQL database.
  • This is a supervised learning problem because we know which customers have churned.
  • This is a binary classification problem.
  • After conducting exploratory data analysis and speaking with the data engineering team, we do not see any biases in the data.
  • We need to reformat some of the data and use missing data imputation for features we believe are important but have some missing data points.
  • With 49 good features, we don’t believe we need to do any feature engineering.
  • We have used random forests, XGBoost, and standard logistic regressions to solve classification problems.
  • We will use ROC-AUC score as our evaluation metric.
  • We are going to use a training-test split (80% training, 20% test) to evaluate our model.
  • Let’s remove features that are statistically insignificant from our model to improve the ROC-AUC score.
  • Let’s optimize the parameters within our random forests model to improve the ROC-AUC score.
  • Our team believes we are acting ethically.
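The plan in those answers — random forest, 80/20 training-test split, ROC-AUC — can be sketched end to end with scikit-learn. The dataset here is synthetic, generated only to mimic the 49-feature shape described above, so the score is illustrative and not a Rocinante result:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real table: binary labels, 49 features.
X, y = make_classification(n_samples=2000, n_features=49, random_state=42)

# 80% training / 20% test split, as in the plan above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Evaluate with ROC-AUC on held-out data, using churn probabilities.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

Parameter optimization (the grid search mentioned above) would wrap `model` in scikit-learn's `GridSearchCV` with a grid over settings such as `max_depth` and `n_estimators`.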

This process may look deceptively linear, but data science is often a nonlinear practice. After doing all of the work in our example above, we could still end up with a model that doesn’t generalize well. It could be bad at predicting churn in new customers. Maybe we shouldn’t have assumed this problem was a binary classification problem and instead used survival regression to solve the problem. This part of the project will be filled with experimentation, and that’s totally normal.
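To give a taste of the survival-analysis alternative, here is a hand-rolled Kaplan-Meier estimator on made-up subscription data. Unlike a binary classifier, it properly handles customers who haven't churned yet (censored observations):

```python
import numpy as np

# Hypothetical data: months subscribed, and whether churn was observed
# (1 = churned, 0 = still active, i.e., censored).
durations = np.array([2, 3, 3, 5, 8, 8, 12, 12])
observed = np.array([1, 1, 0, 1, 1, 0, 1, 0])

# Kaplan-Meier product-limit estimate of the survival curve.
survival = 1.0
curve = {}
for t in np.unique(durations[observed == 1]):
    at_risk = np.sum(durations >= t)                  # still subscribed at t
    deaths = np.sum((durations == t) & (observed == 1))
    survival *= 1 - deaths / at_risk
    curve[int(t)] = survival                          # P(customer survives past t)
```

Libraries such as lifelines implement this (and survival regression proper) robustly; the loop above only shows the idea.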


Communication

What is the best way to communicate and circulate our results?

Our job is typically to bring our findings to the client, explain how the process was a success or failure, and explain why. Communicating technical details and explaining them to non-technical audiences is important because not all of our clients have degrees in statistics. There are several ways in which communicating technical details can be advantageous:

  • It can be used to inspire confidence that the work is thorough and multiple options have been considered.
  • It can highlight technical considerations or caveats that stakeholders and decision-makers should be aware of.  
  • It can offer resources to learn more about specific techniques applied.
  • It can provide supplemental materials to allow the findings to be replicated where possible.

We often use blog posts and articles to circulate our work. They help spread our knowledge and the lessons we learned while working on a project to peers. I encourage every data scientist to engage with the data science community by attending and speaking at meetups and conferences, publishing their work online, and extending a helping hand to other curious data scientists and analysts.

Our method of binary classification was in fact incorrect, so we ended up using survival regression to determine there are four features that impact churn: gaming platform, geographical region, days since last update, and season. Our team aggregates all of our findings into one report, detailing the specific techniques we used, caveats about the analysis, and the multiple recommendations from our team to the customer retention and acquisition team. This report is full of the nitty-gritty details that the more technical folks, such as the data engineering team, may appreciate. Our team also creates a slide deck for the less-technical audience. This deck glosses over many of the technical details of the project and focuses on recommendations for the customer retention and acquisition team.

We give a talk at a local data science meetup, going over the trials, tribulations, and triumphs of the project and sharing them with the data science community at large.


Why are we doing all of this?

I ask myself this question daily — and not in the metaphysical sense, but in the value-driven sense. Is there value in the work we have done and in the end result? I hope the answer is yes. But, let’s be honest, this is business. We don’t have three years to put together a PhD thesis-like paper. We have to move quickly and cost-effectively. Critically evaluating the value ultimately created will help you refine your approach to the next project. And, if you didn’t produce the value you’d originally hoped, then at the very least, I hope you were able to learn something and sharpen your data science skills. 

Rocinante has a better idea of how long our users will remain active on the platform based on user characteristics, and can now launch preemptive strikes in order to retain those users who look like they are about to churn. Our team eventually develops a system that alerts the customer retention and acquisition team when a user may be about to churn, and they know to reach out to that user, via email, encouraging them to try out a new feature we recently launched. Rocinante is making better data-informed decisions based on this work, and that’s great!

I hope this article will help guide your next data science project and get the wheels turning in your own mind. Maybe you will be the creator of a data science framework the world adopts! Let me know what you think about the questions, or whether I’m missing anything, in the comments below.

Practice Exams

A selection of practice exams that will test your current data science knowledge. Identify key areas of improvement to strengthen your theoretical preparation, critical thinking, and practical problem-solving skills so you can get one step closer to realizing your professional goals.


Excel Mechanics

Imagine if you had to apply the same Excel formatting adjustments to both Sheet 1 and Sheet 2 (i.e., adjust the font, change the fill color, add a couple of empty rows here and there), each containing thousands of rows. That would cost an unjustifiable amount of time. This is where advanced Excel skills come in handy: they optimize your data cleaning, formatting, and analysis process and shortcut your way to a job well done. Assess your Excel data manipulation skills with this free practice exam.  


Formatting Excel Spreadsheets

Did you know that more than 1 in 8 people on the planet use Excel, and that Office users typically spend a third of their time in it? But how many of them use the popular spreadsheet tool efficiently? Find out where you stand with this free practice exam, in which you are a first-year investment banking analyst at one of the top-tier banks in the world. The dynamic nature of your position will test your skills in quick Excel formatting and various Excel shortcuts.


Hypothesis Testing

Whenever we need to verify the results of a test or experiment, we turn to hypothesis testing. In this free practice exam, you are a data analyst at an electric car manufacturer selling vehicles in the US and Canada. The company currently offers two car models — Apollo and SpeedX. You will need to download a free Excel file containing the car sales of the two models over the last 3 years in order to find interesting insights and test your skills in hypothesis testing. 
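The kind of question that exam poses can be sketched in Python with SciPy. The monthly sales figures below are invented, not the exam's actual dataset:

```python
import numpy as np
from scipy import stats

# Hypothetical monthly unit sales for the two models.
apollo = np.array([120, 135, 128, 140, 132, 138, 125, 131])
speedx = np.array([118, 122, 119, 125, 121, 124, 117, 120])

# Two-sample t-test (Welch's): H0 says the models sell equally well
# on average; a small p-value is evidence against H0.
t_stat, p_value = stats.ttest_ind(apollo, speedx, equal_var=False)

if p_value < 0.05:
    print("Reject H0: average sales of the two models differ.")
```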


Confidence Intervals

A confidence interval is a range of values that, with a given probability, contains a population parameter. In this free practice exam, you lead the research team at a portfolio management company with over $50 billion in total assets under management. You are asked to compare the performance of 3 funds with similar investment strategies and are given a table with the returns of the three portfolios over the last 3 years. You will use the data to answer questions that test your knowledge of confidence intervals. 
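The underlying computation can be sketched with SciPy. The fund returns below are invented for illustration, not the exam's table:

```python
import numpy as np
from scipy import stats

# Hypothetical quarterly returns (%) of one fund over 3 years.
returns = np.array([4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.8, 3.9, 5.0, 4.6, 5.5])

mean = returns.mean()
sem = stats.sem(returns)   # standard error of the mean

# 95% confidence interval for the population mean, using the
# t-distribution with n - 1 degrees of freedom.
low, high = stats.t.interval(0.95, df=len(returns) - 1, loc=mean, scale=sem)
```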


Fundamentals of Inferential Statistics

While descriptive statistics helps us describe and summarize a dataset, inferential statistics allows us to make predictions based on data. In this free practice exam, you are a data analyst at a leading statistical research company. Much of your daily work relates to understanding data structures and processes, as well as applying analytical theory to real-world problems on large and dynamic datasets. You will be given an Excel dataset and tested on the normal distribution, standardizing a dataset, and the Central Limit Theorem, among other inferential statistics questions.   


Fundamentals of Descriptive Statistics

Descriptive statistics helps us understand the actual characteristics of a dataset by generating summaries about data samples. The most popular descriptive statistics are the measures of center: mean, median, and mode. In this free practice exam, you have been appointed Junior Data Analyst at a property development company in the US and asked to evaluate renting prices in 9 key states. You will work with a free Excel dataset file containing rental prices and housing data from recent years.
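The three measures of center can be computed with Python's standard library. The rents below are made-up numbers, not the exam's dataset:

```python
import statistics

# Hypothetical monthly rents (USD) for listings in one state.
rents = [1200, 1350, 1200, 1500, 1275, 1200, 1400, 1600, 1350]

print(statistics.mean(rents))    # arithmetic average
print(statistics.median(rents))  # middle value when sorted: 1350
print(statistics.mode(rents))    # most frequent value: 1200
```

Note how the mode (1200) sits below the median (1350): the measures of center disagree whenever the data is skewed, which is exactly why exams test all three.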


Jupyter Notebook Shortcuts

In this free practice exam, you are an experienced university professor of Statistics looking to upskill in data science who has just joined a data science department. As one of the most popular coding environments for Python, Jupyter Notebook is what your colleagues recommend you learn as a beginner data scientist. In this quick assessment, you will be tested on basic theory about Jupyter Notebook and some of its shortcuts, which will show how efficiently you can use the environment. 


Intro to Jupyter Notebooks

Jupyter is a free, open-source, interactive web-based computational notebook. As one of the most popular coding environments for Python and R, Jupyter is something you will inevitably encounter at some point in your data science journey, if you haven't already. In this free practice exam, you are a professor of Applied Economics and Finance who is learning how to use Jupyter. You will be tested on the very basics of the Jupyter environment, such as how to set it up, along with some Jupyter keyboard shortcuts. 


Black-Scholes-Merton Model in Python

The Black-Scholes formula is one of the most popular financial instruments of the past 40 years. Derived by Fischer Black, Myron Scholes, and Robert Merton in 1973, it has become the primary tool for derivative pricing. In this free practice exam, you are a finance student whose Applied Finance exam is approaching, asked to implement the Black-Scholes-Merton formula in Python on a dataset containing Tesla's stock prices for the period between mid-2010 and mid-2020.  
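The formula itself is compact enough to sketch in a few lines of Python. The inputs below are illustrative round numbers, not Tesla's actual figures:

```python
from math import exp, log, sqrt

from scipy.stats import norm


def black_scholes_call(S, K, T, r, sigma):
    """Black-Scholes-Merton price of a European call option.

    S: spot price, K: strike, T: time to expiry in years,
    r: risk-free rate, sigma: annualized volatility.
    """
    d1 = (log(S / K) + (r + sigma ** 2 / 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)


# Hypothetical at-the-money call: spot 100, strike 100, 1 year,
# 5% risk-free rate, 20% volatility -- prices at about 10.45.
price = black_scholes_call(S=100, K=100, T=1, r=0.05, sigma=0.2)
```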


Python for Financial Analysis

In a heavily regulated industry like fintech, simplicity and efficiency are key, which is why Python is the preferred programming language over the likes of Java or C++. In this free practice exam, you are a university professor of Applied Economics and Finance focused on running regressions and applying the CAPM model to a NASDAQ and The Coca-Cola Company dataset for the period between 2016 and 2020 inclusive. Make sure to have the following packages available to complete your practice test: pandas, numpy, scipy, statsmodels.api, and matplotlib.pyplot (as plt). 


Python Finance

Python has become the ideal programming language for the financial industry, as more and more hedge funds and large investment banks are adopting this general-purpose language to solve their quantitative problems. In this free practice exam on Python for finance, you are part of the IT team of a huge company operating in the US stock market and are asked to analyze the performance of three market indices. The packages you need to have available are numpy, pandas, and matplotlib.pyplot (as plt).   


Machine Learning with KNN

KNN is a popular supervised machine learning algorithm used for solving both classification and regression problems. In this free practice exam, that is exactly what you will be asked to do: create 2 datasets for 2 car dealerships in Jupyter Notebook, fit the models to the training data, find the set of parameters that best classifies a car, construct a confusion matrix, and more.
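A minimal KNN workflow of the kind the exam describes, run on synthetic data rather than the actual dealership datasets:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the dealerships' car data: two classes.
X, y = make_classification(n_samples=400, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Fit KNN to the training data and evaluate on held-out cars.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
cm = confusion_matrix(y_test, knn.predict(X_test))

print(cm)  # 2x2: rows = true class, columns = predicted class
```

Finding "the set of parameters that best classify a car" then amounts to varying `n_neighbors` (and the distance metric) and comparing the resulting matrices.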


Excel Functions

The majority of data comes in spreadsheet format, making Excel the #1 tool of choice for professional data analysts. The ability to work effectively and efficiently in Excel is highly desirable for any data practitioner looking to bring value to a company. As a matter of fact, Excel proficiency has become the new standard, as 82% of middle-skill jobs require competent use of the productivity software. Take this free Excel Functions practice exam and test your knowledge of removing duplicate values, transferring data from one sheet to another, and using the VLOOKUP and SUMIF functions.


Useful Tools in Excel

What Excel lacks in data visualization tools compared to Tableau, or in computational power for analyzing big data compared to Python, it makes up for in accessibility and flexibility. Excel allows you to quickly organize, visualize, and perform mathematical functions on a set of data without the need for any programming or statistical skills. It is therefore in your best interest to learn how to use the various Excel tools at your disposal. This practice exam is a good opportunity to test your Excel knowledge of the text-to-columns feature, Excel macros, row manipulation, and basic math formulas.


Excel Basics

Since its first release in 1985, Excel has remained the most popular spreadsheet application, with approximately 750 million users worldwide, thanks to its flexibility and ease of use. Whether or not you are a data scientist, knowing how to use Excel will greatly improve and optimize your workflow. In this free Excel Basics practice exam, you will work with a dataset from a company in the fast-moving consumer goods sector as an aspiring data analyst and test your knowledge of basic Excel functions and shortcuts.


A/B Testing for Social Media

In this free A/B Testing for Social Media practice exam, you are an experienced data analyst at a new social media company called FilmIt. You are tasked with increasing user engagement by applying the correct modifications to how users move on to the next video, and you decide the best approach is an A/B test in a controlled environment. To complete the task successfully, you will be tested on statistical significance, two-tailed tests, and choosing the right success metrics.
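The statistical machinery behind such a test, a two-tailed z-test comparing two engagement rates, can be sketched in a few lines. The engagement counts below are invented; the exam's own data will differ:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-tailed z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-tailed p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: control vs. the new "next video" behavior.
z, p = two_proportion_z_test(conv_a=420, n_a=5000, conv_b=480, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 5%: {p < 0.05}")
```

The same test is available off the shelf as `statsmodels.stats.proportion.proportions_ztest` if that library is installed.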


Fundamentals of A/B Testing

A/B testing is a powerful statistical tool for comparing two versions of the same marketing asset, such as a webpage or email, in a controlled environment. For example, when Electronic Arts created a variant of the sales page for the popular SimCity 5 simulation game, it performed 40% better than the control page. Staying with video games, in this free practice test you are a data analyst tasked with conducting A/B testing for a game developer. You will be asked to choose the best way to run an A/B test, identify the null hypothesis, pick the right evaluation metrics, and ultimately increase revenue through in-game ads.


Introduction to Data Science Disciplines

The term "data science" dates back to the 1960s, when it described the emerging field of working with large amounts of data to drive organizational growth and decision-making. While the essence has remained the same, the data science disciplines have changed a great deal over the past decades thanks to rapid technological advancements. In this free introduction to data science practice exam, you will test your understanding of the modern data science disciplines and their role within an organization.


Advanced SQL

In this free Advanced SQL practice exam, you are a sophomore business student who has decided to sharpen your coding and analytical skills in relational database management systems. You are given an employee dataset containing information such as titles, salaries, birth dates, and department names, and are required to come up with the correct answers. This free SQL practice test will evaluate your knowledge of MySQL aggregate functions, DML statements (INSERT, UPDATE), and other advanced SQL queries.
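The exam targets MySQL, but the same aggregate and DML statements run against Python's built-in sqlite3 module, which makes a convenient scratchpad. The employee rows below are invented; the exam's dataset is larger:

```python
import sqlite3

# In-memory database standing in for the exam's employee dataset.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")

# DML: INSERT some hypothetical rows.
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "Engineering", 95000), ("Ben", "Engineering", 88000),
     ("Cleo", "Sales", 70000)],
)

# DML: UPDATE a salary.
cur.execute("UPDATE employees SET salary = salary * 1.10 WHERE name = 'Cleo'")

# Aggregate functions: average salary and headcount per department.
rows = cur.execute(
    "SELECT department, AVG(salary), COUNT(*) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()
print(rows)  # Engineering averages 91500.0 across 2 employees
conn.close()
```

MySQL syntax for these particular statements is identical, so the practice transfers directly.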


Data Analytics with R

1 Problem Solving with Data

1.1 Introduction

This chapter will introduce you to a general approach to solving problems and answering questions using data. Throughout the rest of the module, we will reference back to this chapter as you work your way through your own data analysis exercises.

The approach is applicable to actuaries, data scientists, general data analysts, or anyone who intends to critically analyze data and develop insights from data.

This framework, which some refer to as the Data Science Process, includes the following five main components:

  • Data Collection
  • Data Cleaning
  • Exploratory Data Analysis
  • Model Building
  • Inference and Communication


Note that all five steps may not be applicable in every situation, but these steps should guide you as you think about how to approach each analysis you perform.

In the subsections below, we’ll dive into each of these in more detail.

1.2 Data Collection

To solve a problem or answer a question using data, you obviously need some data to start with. It may be pre-existing or newly generated (think surveys). As an actuary, your data will often come from pre-existing sources within your company: querying databases or APIs, receiving Excel files, text files, and so on. You may also find supplemental data online to assist you with your project.

For example, let’s say you work for a health insurance company and you are interested in determining the average drive time for your insured population to the nearest in-network primary care providers to see if it would be prudent to contract with additional doctors in the area. You would need to collect at least three pieces of data:

  • Addresses of your insured population (internal company source/database)
  • Addresses of primary care provider offices (internal company source/database)
  • Google Maps travel time API to calculate drive times between addresses (external data source)

In summary, data collection provides the fundamental pieces needed to solve your problem or answer your question.
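As a sketch of the third bullet, a drive-time lookup against the Google Maps Distance Matrix API might be assembled as below. The addresses and API key are placeholders, and you should check Google's current API documentation for exact parameter names before relying on this:

```python
from urllib.parse import urlencode

# Placeholder inputs; a real analysis would loop over member/provider pairs.
member_address = "123 Main St, Springfield, IL"
provider_address = "456 Clinic Ave, Springfield, IL"
api_key = "YOUR_API_KEY"  # hypothetical; never hard-code real keys

params = urlencode({
    "origins": member_address,
    "destinations": provider_address,
    "mode": "driving",
    "key": api_key,
})
url = f"https://maps.googleapis.com/maps/api/distancematrix/json?{params}"
print(url)  # fetch this URL (e.g., with urllib.request) to get drive times
```

The response is JSON containing a duration for each origin/destination pair, which you would then join back to your internal member and provider data.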

1.3 Data Cleaning

We’ll discuss data cleaning in more detail in later chapters, but this phase generally refers to taking the data you collected in step 1 and turning it into a usable format for your analysis. It can often be the most time-consuming phase, as it may involve handling missing data and pre-processing the data to be as error-free as possible.

Where you source your data from has major implications for how long this phase takes. For example, many of us actuaries benefit from devoted data engineers and resources within our companies who exert much effort to make our data as clean as possible for us to use. However, if you are sourcing your data from raw files on the internet, you may find this phase exceptionally difficult and time-intensive.
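In practice a library such as pandas (with `dropna` and `drop_duplicates`) handles much of this work; purely for illustration, the same ideas can be sketched in plain Python over a few invented records:

```python
# Raw records as they might arrive from several source systems.
raw = [
    {"member_id": "001", "zip": "62704"},
    {"member_id": "001", "zip": "62704"},    # exact duplicate row
    {"member_id": "002", "zip": None},       # missing value
    {"member_id": "003", "zip": " 62711 "},  # stray whitespace
]

cleaned, seen = [], set()
for row in raw:
    if row["zip"] is None:
        continue  # or impute, depending on the analysis
    zip_code = row["zip"].strip()
    key = (row["member_id"], zip_code)
    if key in seen:
        continue  # drop exact duplicates
    seen.add(key)
    cleaned.append({"member_id": row["member_id"], "zip": zip_code})

print(cleaned)  # unique, trimmed, non-missing records remain
```

Real cleaning pipelines add validation (type checks, allowed ranges) and logging so that dropped records can be audited later.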

1.4 Exploratory Data Analysis

Exploratory Data Analysis, or EDA, is an entire subject in itself. In short, EDA is an iterative process whereby you:

  1. Generate questions about your data
  2. Search for answers, patterns, and characteristics by transforming, visualizing, and summarizing your data
  3. Use what you learn in step 2 to generate new questions and insights about your data

We’ll cover some basics of EDA in Chapter 4 on Data Manipulation and Chapter 5 on Data Visualization, but we’ll only be able to scratch the surface of this topic.

A successful EDA approach will allow you to better understand your data and the relationships between variables within your data. Sometimes, you may be able to answer your question or solve your problem after the EDA step alone. Other times, you may apply what you learned in the EDA step to help build a model for your data.
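A typical EDA pass starts with grouped summary statistics of the kind sketched below. The drive-time observations are invented, continuing the earlier health-insurance example:

```python
import statistics

# Invented drive-time observations (minutes) by county, for illustration.
drive_times = {
    "Sangamon": [12, 15, 9, 22, 18],
    "Logan": [35, 41, 28, 39],
}

for county, times in drive_times.items():
    print(
        f"{county}: n={len(times)}, "
        f"mean={statistics.mean(times):.1f}, "
        f"median={statistics.median(times)}, "
        f"stdev={statistics.pstdev(times):.1f}"
    )
```

Summaries like these immediately raise the next round of questions (why is one county's mean so much higher?), which is exactly the iterative loop described above. In R you would reach for dplyr and ggplot2 for the same task.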

1.5 Model Building

In this step, we build a model, often using machine learning algorithms, in an effort to make sense of our data and gain insights that can be used for decision making or communicating to an audience. Examples of models could include regression approaches, classification algorithms, tree-based models, time-series applications, neural networks, and many, many more. Later in this module, we will practice building our own models using introductory machine learning algorithms.

It’s important to note that while model building gets a lot of attention (because it’s fun to learn and apply new types of models), it typically encompasses a relatively small portion of your overall analysis from a time perspective.

It’s also important to note that building a model doesn’t have to mean applying machine learning algorithms. In actuarial science, you may find that more often than not the models you create are Microsoft Excel-based, blending historical data, assumptions about the business, and other factors that allow you to make projections or understand the business better.
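As a minimal illustration of the modeling step, a one-variable projection model, here an ordinary least squares line fit via its closed-form formulas, needs no libraries at all. The data points are invented:

```python
# Fit y = a + b*x by ordinary least squares, using the closed-form formulas.
xs = [1, 2, 3, 4, 5]            # e.g., policy year
ys = [110, 125, 138, 155, 170]  # e.g., observed claims cost (invented)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

projection = a + b * 6  # project one year ahead
print(f"intercept={a:.1f}, slope={b:.1f}, year-6 projection={projection:.1f}")
```

The same fit in R is `lm(y ~ x)`; the point is that even a simple fitted line is a model that turns historical data into a projection.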

1.6 Inference and Communication

The final phase of the framework is to use everything you’ve learned about your data up to this point to draw inferences and conclusions about the data, and to communicate those out to an audience. Your audience may be your boss, a client, or perhaps a group of actuaries at an SOA conference.

In any instance, it is critical for you to be able to condense what you’ve learned into clear and concise insights and convince your audience why your insights are important. In some cases, these insights will lend themselves to actionable next steps, or perhaps recommendations for a client. In other cases, the results will simply help you to better understand the world, or your business, and to make more informed decisions going forward.

1.7 Wrap-Up

As we conclude this chapter, take a few minutes to look at a couple alternative visualizations that others have used to describe the processes and components of performing analyses. What do they have in common?

  • Karl Rohe - Professor of Statistics at the University of Wisconsin-Madison
  • Chanin Nantasenamat - Associate Professor of Bioinformatics and Youtuber at the “Data Professor” channel


Center for Data Innovation

Solving Data Science Problems


Researchers at the University of Hong Kong, Peking University, Stanford University, the University of California, Berkeley, the University of Washington, Carnegie Mellon University, and Meta have created a dataset of 1,000 data science questions from 451 problems found on Stack Overflow, a collective knowledge platform for programmers. Researchers can use the dataset to train AI systems to solve data science problems. 

Get the data.  

Image credit: Flickr user Christiaan Colen

Morgan Stevens


Morgan Stevens is a Research Assistant at the Center for Data Innovation. She holds a J.D. from the Sandra Day O'Connor College of Law at Arizona State University and a B.A. in Economics and Government from the University of Texas at Austin.



Foundations of Mathematical Modelling for Engineering Problem Solving, pp 87–141

Data Science Problems

  • Parikshit Narendra Mahalle
  • Nancy Ambritta P.
  • Sachin R. Sakhare
  • Atul P. Kulkarni

First Online: 11 January 2023


Part of the book series: Studies in Autonomic, Data-driven and Industrial Computing ((SADIC))

Data are elements or information, usually numerical, collected by observation. Data can also be defined as a set of values (qualitative or quantitative) relating to people or things.



Author information

  • Parikshit Narendra Mahalle, Department of Artificial Intelligence and Data Science, Vishwakarma Institute of Information Technology, Pune, India
  • Nancy Ambritta P., Glareal Software Solutions PTE. Ltd., Singapore
  • Sachin R. Sakhare, Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India
  • Atul P. Kulkarni, Department of Mechanical Engineering, Vishwakarma Institute of Information Technology, Pune, India


Corresponding author

Correspondence to Nancy Ambritta P. .


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter.

Mahalle, P.N., Ambritta P., N., Sakhare, S.R., Kulkarni, A.P. (2023). Data Science Problems. In: Foundations of Mathematical Modelling for Engineering Problem Solving. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-19-8828-8_6


Publisher Name : Springer, Singapore

Print ISBN : 978-981-19-8827-1

Online ISBN : 978-981-19-8828-8

eBook Packages: Intelligent Technologies and Robotics (R0)



Common Data Science Challenges of 2024 [with Solution]


Data is the new oil for companies, and it has become a standard input to every major decision. Increasingly, businesses rely on analytics and data to strengthen their brand's position in the market and boost revenue.

Information is now more valuable than many physical commodities. According to a 2017 poll by NewVantage Partners, 85% of businesses are trying to become data-driven, and the worldwide data science platform market was projected to grow from just $19.75 billion in 2016 to $128.21 billion by 2022.

Data science is not a meaningless term with no practical applications, yet many businesses struggle to reorganize their decision-making around data and implement a consistent data strategy. Lack of information is not the issue.

Our daily data production has reached 2.5 quintillion bytes, a volume so huge that the breakneck speed at which we produce new data is hard to fully grasp. Ninety percent of all global data was generated in the previous few years.

The real issue is that businesses cannot properly use the data they already collect to extract insights that improve decision-making, counteract risks, and protect against threats.

With too much data frequently available to make a clear choice, it is vital for businesses to know how to approach a new data science challenge and to understand what kinds of questions data science can answer.

What Are Data Science Challenges?

Data science is an application of the scientific method that uses data and analytics to address issues that are often complex, multiple, and unstructured. The phrase "fishing expedition" comes from the field of analytics and refers to a project that was never structured appropriately to begin with and entails searching through the data for unanticipated connections. This kind of "data fishing" does not adhere to the principles of efficient data science, yet it is still rather common. The first thing that needs to be done, therefore, is to clearly define the issue.

"The study of statistics and data is not a kind of witchcraft. They will not, by any means, solve all of the issues that plague a corporation," according to Seattle Data Guy, a data-driven consulting service. "But they are valuable tools that assist organizations in making more accurate judgments and automating repetitious labor and choices that teams need to make."

The following are some of the categories that may be used to classify the problems that can be solved with the assistance of data science:

  • Finding patterns in massive data sets : Which of the servers in my server farm need the most maintenance? 
  • Detecting deviations from the norm in huge data sets : Is this particular mix of acquisitions distinct from what this particular consumer has previously ordered? 
  • The process of estimating the possibility of something occurring : What are the chances that this person will click on my video? 
  • Illustrating the ways in which things are related to one another : What exactly is the focus of this article I saw online? 
  • Categorizing specific data points : Does this picture depict a cat or a mouse? 

Of course, the aforementioned is in no way a comprehensive list of all the questions that can be answered by data science. Even if it were, the field of data science is advancing at such a breakneck speed that it is quite possible that it would be rendered entirely irrelevant within a year or two of its release. 
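As a concrete instance of the second category above, detecting deviations from the norm, a minimal z-score outlier check can be sketched in a few lines. The order data and threshold are invented for illustration:

```python
import statistics

# Hypothetical order totals for one customer; the last one looks unusual.
order_totals = [42.0, 39.5, 45.0, 41.2, 43.8, 40.6, 310.0]

mean = statistics.mean(order_totals[:-1])    # baseline from purchase history
stdev = statistics.stdev(order_totals[:-1])

latest = order_totals[-1]
z = (latest - mean) / stdev  # how many standard deviations from the baseline
verdict = "anomalous" if abs(z) > 3 else "normal"
print(f"z-score of latest order: {z:.1f} -> {verdict}")
```

Production anomaly detectors use more robust baselines (medians, seasonal models), but the underlying idea, measuring distance from an expected norm, is the same.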

Now that we have determined the categories of questions that data science can reasonably be expected to answer, it is time to write out the stages that most data scientists follow when tackling a new data science challenge.

Common Data Science Problems Faced by Data Scientists

1. Preparation of Data for Smart Enterprise AI

Finding and cleaning up the proper data is a data scientist's priority. Nearly 80% of a data scientist's day is spent on cleaning, organizing, mining, and gathering data, according to a CrowdFlower poll. In this stage, the data is double-checked before undergoing additional analysis and processing. Most data scientists (76%) agree that this is one of the most tedious elements of their work. As part of the data wrangling process, data scientists must efficiently sort through terabytes of data stored in a wide variety of formats and codes on a wide variety of platforms, all while keeping track of changes to such data to avoid data duplication. 

The best way to deal with this issue is to adopt AI-based tools that help data scientists maintain their edge and increase their efficacy. Augmented learning is another flexible workplace AI technology that aids in data preparation and sheds light on the data at hand.

2. Generation of Data from Multiple Sources

Organizations obtain data in a broad variety of forms from the many programs, software, and tools they use, and managing these voluminous amounts of data is a significant obstacle for data scientists. The process often calls for manual data entry and compilation, both of which are time-consuming and can result in unnecessary repetition or erroneous choices. Data is most valuable when exploited effectively in enterprise artificial intelligence.

Companies can now build sophisticated virtual data warehouses with a centralized platform that combines all of their data sources in a single location. Data stored in the central repository can be modified or manipulated to satisfy a company's needs and increase its efficiency. This straightforward change can significantly reduce the time and labor required of data scientists.

3. Identification of Business Issues

Identifying issues is a crucial component of running a solid organization. Before constructing data sets and analyzing data, data scientists should concentrate on identifying enterprise-critical challenges. It is crucial to determine the source of the problem rather than immediately resorting to a mechanical solution.

Data scientists should have a structured workflow in place before commencing analytical operations, and the process must consider all company stakeholders and important parties. Specialized dashboard software offering an assortment of visualization widgets can make the enterprise's data more understandable.

4. Communication of Results to Non-Technical Stakeholders

The primary objective of a data scientist is to enhance the organization's capacity for decision-making in line with the business plan their function supports. The most difficult obstacle for data scientists is effectively communicating their findings and interpretations to business leaders and managers. Because most managers and stakeholders are unfamiliar with the tools and technologies data scientists use, it is vital to give them the conceptual foundation needed to apply the model using business AI.

To build an effective narrative around their analyses and visualizations, data scientists should incorporate concepts such as "data storytelling."

5. Data Security

Needing to scale quickly, businesses have turned to cloud management for the safekeeping of their sensitive information, but cyberattacks and online spoofing have left sensitive data stored in the cloud exposed. Strict regulations have been enacted to protect data in central repositories from hackers, and data scientists now face additional challenges as they work within the restrictions these new rules impose.

To counteract the security threat, organizations must use cutting-edge encryption methods and machine learning security solutions. To maximize productivity, systems should comply with all applicable safety regulations and be designed to avoid lengthy audits.

6. Efficient Collaboration

Data scientists and data engineers commonly collaborate on the same projects for a company, so maintaining strong lines of communication is necessary to avoid potential conflicts. To guarantee that the workflows of both teams stay aligned, the organization should establish clear communication channels. It may also create a chief officer position to monitor whether both departments are working along the same lines.

7. Selection of Non-Specific KPI Metrics

It is a common misunderstanding that data scientists can handle most of the job on their own and arrive with answers to all of the challenges a company encounters. This puts data scientists under a great deal of strain and results in decreased productivity.

Every company needs a defined set of metrics to evaluate the analyses a data scientist presents, and it must also analyze the effects these indicators have on the operation of the company.

The many responsibilities and duties of a data scientist make for a demanding work environment; nevertheless, it is one of the most in-demand occupations in today's market. The challenges data scientists experience are solvable, and addressing them can increase the functionality and efficiency of workplace AI in high-pressure work situations.

Types of Data Science Challenges/Problems

1. Data Science Business Challenges

Listening to important words and phrases is one of the responsibilities of a data scientist during an interview with a line-of-business expert who is discussing a business issue. The data scientist breaks the issue down into a procedural flow that always involves a grasp of the business challenge, a comprehension of the data that is necessary, as well as the many forms of artificial intelligence (AI) and data science approaches that can address the problem. This information, when taken as a whole, serves as the impetus behind an iterative series of thought experiments, modeling methodologies, and assessment of the business objectives. 

The company itself has to remain the primary focus. When technology is used too early in a process, it may lead to the solution focusing on the technology itself, while the original business challenge may be ignored or only partially addressed. 

Artificial intelligence and data science demand a degree of accuracy that must be captured from the beginning: 

  • Describe the issue that needs to be addressed. 
  • Provide as much detail as you can on each of the business questions. 
  • Determine any additional business needs, such as maintaining existing client relationships while expanding potential for upselling and cross-selling. 
  • Specify the predicted advantages in terms of how they will affect the company, such as a 10% reduction in the customer turnover rate among high-value clients. 

2. Real Life Data Science Problems

Data science uses hybrid mathematical and computer science models to address real-world business challenges and derive actionable insights. It ventures into the unknown domain of 'unstructured' data to extract significant insights that help organizations improve their decision-making. Examples include:

  • Managing the placement of digital advertisements with computerized processes 
  • Improving search functions with data science and sophisticated analytics 
  • Using data science to produce data-driven crime predictions 
  • Using data science to avoid violating tax laws 

3. Data Science Challenges in Healthcare, with Examples

It has been calculated that each human being creates around 2 gigabytes of data per day, including measurements of brain activity, stress, heart rate, blood sugar, and more. We now have more sophisticated tools, data science among them, to deal with such massive data volumes, and they help keep tabs on a patient's health by recording relevant information.

The use of Data Science in medicine has made it feasible to spot the first signs of illness in otherwise healthy people. Doctors may now check up on their patients from afar thanks to a host of cutting-edge technology. 

Historically, hospitals and their staffs have struggled to care for large numbers of patients simultaneously. The patients' ailments used to worsen because of a lack of adequate care.

A) Medical Image Analysis:  Focusing on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges, the journal Medical Image Analysis offers a venue for disseminating new research in the area of medical and biological image analysis. It publishes high-quality, original research articles that advance our understanding of how best to process, analyze, and use medical and biological images in these contexts. Methods that use molecular/cellular imaging data as well as tissue/organ imaging data are of interest to the journal. The most common sources for biomedical image databases include: 

  • Magnetic resonance 
  • Ultrasound 
  • Computed tomography 
  • Nuclear medicine 
  • X-ray 
  • Optical and Confocal Microscopy 
  • Video and range data images 

Procedures such as identifying cancers, artery stenosis, and organ delineation use a variety of different approaches and frameworks like MapReduce to determine ideal parameters for tasks such as lung texture categorization. Examples of these procedures include: 

  • The categorization of solid textures is accomplished using machine learning techniques such as support vector machines (SVM), content-based medical image indexing, and wavelet analysis. 

B) Drug Research and Development:  The ever-growing human population brings a plethora of new health concerns, with possible causes including insufficient nutrition, stress, environmental hazards, and disease. Medical research facilities are now under pressure to rapidly discover treatments or vaccines for many illnesses. It may take millions of test cases to uncover a medicine's formula, since scientists must learn the properties of the causal agent; once they have a recipe, researchers must put it through a battery of experiments.

Previously, it took a team of researchers 10–12 years to sift through the information from the millions of test cases mentioned above. With the aid of data science's many medical applications, this process is now much simpler: data from millions of test cases can be processed in a matter of months, if not weeks, and analyzed to show how well a medicine works. If all tests go well, a vaccine or drug may reach the public in less than a year. Data science and machine learning make this a reality, and both have been game-changing for the pharmaceutical industry's R&D departments. Data analytics also played a crucial part in the rapid development of vaccines against the coronavirus during the global pandemic.

C) Genomics and Bioinformatics:  One of the most fascinating parts of modern medicine is genomics. Human genomics focuses on the sequencing and analysis of genomes, the genetic material of living organisms. Genomic studies pave the way for cutting-edge medical interventions. Genomics investigates DNA for its peculiarities and quirks, helps determine the link between a disease's symptoms and the patient's actual condition, and includes analyzing drug response for a particular DNA profile.

Before the development of effective data analysis methods, studying genomes was a laborious and time-consuming process. The human genome contains billions of base pairs, each of which may carry meaningful variation. Recent data science advancements in medicine and genetics have simplified this work: analyzing human genomes now takes much less time and energy thanks to the many data science and big data techniques available. These methods help scientists identify the underlying genetic problem and the corresponding medication.

D) Virtual Assistance:  One excellent illustration of how data science may be put to use is the development of virtual assistant apps. The work of data scientists has resulted in complete platforms that provide patients with individualized experiences. Medical apps that use data science analyze the patient's symptoms to aid in diagnosis: the patient simply inputs his or her symptoms, and the app assesses the likely ailment and the patient's current status. Depending on the patient's condition, it then recommends any necessary precautions, medications, and treatments.

In addition, the software analyzes the patient's data and generates a checklist of treatment steps to be followed. It then reminds the patient to take their medication at regular intervals, preventing the neglect that might otherwise make the illness worse. 

Patients suffering from Alzheimer's disease, anxiety, depression, and other psychological conditions have also benefited from virtual assistance. Because the application consistently reminds these patients to carry out necessary actions, such as taking the appropriate medicine, staying active, and eating well, their therapy is more likely to bear fruit. Woebot, created at Stanford University, is one example: a chatbot that helps individuals with psychiatric conditions obtain appropriate therapy to improve their mental health. 

4. Data Science Problems In Retail

Although the phrase "customer analytics" is relatively new to the retail sector, the practice of analyzing customer data to provide tailored products and services is centuries old. The development of data science has made it simple to manage a growing number of customers. With data science software, discounts and sales can be managed in real time, which can boost sales of previously discontinued items and generate buzz for forthcoming releases. Another use of data science is analyzing the whole social media ecosystem to foresee which items will be popular soon, so that they can be promoted to the market at just the right time. 

Data science is far from finished; it is loaded with practical uses in the world today. The field is still in its infancy, but its applications are already being felt throughout the globe, and we have a long way to go before we reach saturation.

Steps on How to Approach and Address a Solution to Data Science Problems

Step 1: Define the Problem

First things first: it is essential to precisely characterize the data issue to be addressed. The problem at hand needs to be comprehensible, succinct, and measurable. When identifying data challenges, many businesses are far too vague with their language, which makes it difficult, if not impossible, for data scientists to translate such problems into machine code. Below we discuss a few of the most common data science problem statements and challenges. 

The following qualities describe a well-defined data problem: 

  • Solving it is likely to have enough positive impact to warrant the effort. 
  • Sufficient data is accessible in a usable format. 
  • Stakeholders are interested in applying data science to the problem. 

Step 2: Types of Data Science Problems

A wide variety of data science algorithms can be applied to data, and they can be classified, to a certain extent, into the following families, which cover the most common data science problem types: 

  • Two-class classification: Useful for any issue that can only have two responses, the two-class categorization consists of two distinct categories. 
  • Multi-class classification: Providing an answer to a question that might have many different responses is an example of multi-class categorization. 
  • Anomaly detection: The term "anomaly detection" refers to the process of locating data points that deviate from the norm. 
  • Regression: When searching for a number as opposed to a class or category, regression is helpful since it provides an answer with a real-valued result. 
  • Multi-class classification as regression: Useful when questions are posed in the form of rankings or comparisons, multi-class classification may be thought of as regression. 
  • Two-class classification as regression: Useful when a binary question can be reformulated as predicting a continuous score that is then thresholded into the two classes. 
  • Clustering: The term "clustering" refers to the process of answering questions regarding the organization of data by attempting to partition a data set into understandable chunks. 
  • Dimensionality reduction: The process of reducing the number of variables under consideration by deriving a smaller set of principal variables. 
  • Reinforcement learning: Learning algorithms that take actions within an environment so as to maximize some notion of cumulative reward.
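As a minimal, self-contained sketch of the first family above, here is a two-class classifier built on a nearest-centroid rule in pure Python. The 2-D points and labels are invented for illustration:

```python
def centroid(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest_centroid_classify(train, labels, x):
    """Two-class classification: assign x to the class whose centroid is closer."""
    cents = {c: centroid([p for p, l in zip(train, labels) if l == c])
             for c in set(labels)}
    return min(cents, key=lambda c: dist2(cents[c], x))

# Hypothetical 2-D data: class 0 clusters near (0, 0), class 1 near (5, 5).
train = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = [0, 0, 0, 1, 1, 1]
print(nearest_centroid_classify(train, labels, (4.5, 5.2)))  # → 1
```

The same skeleton extends naturally to multi-class classification (more labels, same rule) and to clustering (recompute centroids iteratively, as in k-means).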

Step 3: Data Collection

Now that the issue has been fully articulated and a suitable approach chosen, it is time to gather data. Record all of the data collected in a log, along with the date of each collection and any other pertinent information. 

It is essential to understand that raw data is rarely ready for analysis. The majority of a data scientist's day is dedicated to cleaning the data: eliminating records with missing values, locating duplicate records, and correcting values that are wrong. Data cleaning is one of the most prominent problems data scientists face. 
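A minimal sketch of those cleaning steps on hypothetical records (the field names and the "negative age means sign-entry error" rule are invented for illustration):

```python
def clean(records):
    # 1) eliminate records with missing values
    rows = [r for r in records if all(v is not None for v in r.values())]
    # 2) locate and drop exact duplicate records
    seen, deduped = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(dict(r))
    # 3) correct values that are clearly wrong (assumed sign-entry errors)
    for r in deduped:
        if r["age"] < 0:
            r["age"] = abs(r["age"])
    return deduped

raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},   # missing value -> dropped
    {"id": 1, "age": 34},     # duplicate     -> dropped
    {"id": 3, "age": -5},     # wrong value   -> corrected to 5
]
print(clean(raw))  # → [{'id': 1, 'age': 34}, {'id': 3, 'age': 5}]
```

In practice a library such as pandas does the same work at scale, but the three steps (drop missing, deduplicate, repair) are the same.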

Step 4: Data Analysis

Data analysis comes after data gathering and cleaning. At this point, there is a danger that the chosen data science strategy will fail; this is to be expected. In general, it is advisable to begin by experimenting with the fundamental machine learning algorithms, since they have fewer parameters to adjust. 

There are several good open-source data science libraries available for data analysis; the vast majority of data science tools are developed in Python, Java, or C++. In addition, many data science practice problems are freely available on the web. 
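In that spirit of starting simple, here is a sketch comparing a majority-class baseline against a one-nearest-neighbour rule on invented data, using plain accuracy (everything here is a hypothetical illustration, not a library API):

```python
from collections import Counter

def majority_class(train_y):
    """Baseline: always predict the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]

def knn1(train_X, train_y, x):
    """1-nearest-neighbour by squared Euclidean distance."""
    i = min(range(len(train_X)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(train_X[j], x)))
    return train_y[i]

def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

train_X = [(0, 0), (1, 1), (1, 0), (5, 5), (6, 6)]
train_y = [0, 0, 0, 1, 1]
test_X, test_y = [(0, 1), (6, 5)], [0, 1]

base = majority_class(train_y)
print(accuracy([base] * len(test_y), test_y))                         # → 0.5
print(accuracy([knn1(train_X, train_y, x) for x in test_X], test_y))  # → 1.0
```

A model is only worth keeping if it clearly beats such a trivial baseline; this check catches many failed strategies early.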

Step 5: Result Interpretation

Following the data analysis, the next step is to interpret the findings. Above all, consider whether the primary issue has been resolved. You may find that your model works but generates results that are not very good. One strategy for dealing with this is to add new data and retrain the model until you are satisfied with it.

Finalizing the Problem Statement

After identifying the precise issue type, you should be able to formulate a refined problem statement that includes the model's predictions. For instance: 

This is a multi-class classification problem that predicts whether a picture belongs to one of four classes: "vehicle," "traffic," "sign," or "human." 

Additionally, you should be able to specify a desired result or intended use for the model's predictions. Making a model accurate is one of the most crucial challenges data scientists face. 

The optimal result is to give end users prompt notice when a target class is predicted. You can practice such data science hackathon problem statements on Kaggle. 


When professionals work toward their analytics objectives, they may run into many kinds of data science challenges that slow their progress. The steps discussed in this article for tackling a new data science issue are designed to highlight the general problem-solving attitude businesses need to adopt to meet the problems of our data-centric era.

A competent data science effort will not only seek to make predictions but also aim to inform decisions. Keep this overarching goal in mind as you think about the challenges you face. A detailed, structured approach helps, and engaging with professionals in the field yields insights that ultimately lead to effective project execution. Have a look at KnowledgeHut's Data Science Course Subjects to understand this matter in depth.

Frequently Asked Questions (FAQs)

The discipline of data science aims to answer real challenges faced by businesses by using data to construct algorithms and develop programs that show which issues have workable solutions. Data science applies hybrid mathematical and computer science models to real-world business challenges in order to obtain actionable insights. 

Many platforms are available for this: Kaggle, KnowledgeHut, HackerEarth, MachineHack, Google Colab, DataCamp, etc.

Statistics, Coding, Business Intelligence, Data Structures, Mathematics, Machine Learning, and Algorithms are just a few of the primary subjects covered in a data science curriculum.  

Aspects of this profession may be stressful, but that is true of most jobs. In my experience, deadlines are rarely the source of stress: data science is R&D work, and there is usually enough time to get everything done.  

Stressful? No. Frustrating? Absolutely, yes. It is really annoying when we get stuck on an error for three days or have to ponder a thousand times which metrics to use.

Profile

Ritesh Pratap Arjun Singh

RiteshPratap A. Singh is an AI & DeepTech Data Scientist. His research interests include machine vision and cognitive intelligence. He is known for leading innovative AI projects for large corporations and PSUs. Collaborate with him in the fields of AI/ML/DL, machine vision, bioinformatics, molecular genetics, and psychology.


Student spotlight: Zoe Weinstein explores the problem-solving power of data science


Zoe Weinstein x’25 grew up on the outskirts of Silicon Valley, but her fascination with data science and technology didn’t emerge until she arrived in Wisconsin. And her decision to come to Madison in the first place was based on a fortuitous visit to another Midwest city, during which she learned some good news.

“I happened to find out I got into UW-Madison when I was visiting Chicago, and my mom asked me, ‘do you want to go drive up and check it out?’” Weinstein recalled. “When I got here, I absolutely fell in love with it—the city, the people, the campus. I knew I wanted to be here, even without looking at specific programs very thoroughly at first.”

But once she dove into possible majors, she discovered that UW-Madison had outstanding programs in two of her academic interests: data science and sociology. She decided to double major. Now a junior, Weinstein has embraced data science as more than just a major: she is on the executive board of DotData , the data science student club at UW-Madison, and has secured data-centric internships through the summer and fall.

Discovering data science

Weinstein hadn’t taken much interest in technology prior to college, but she knew she enjoyed and excelled at math. Even so, she was (at the time) unfamiliar with the formal discipline of data science.

“When I came here and found data science, it was something I had never heard of before,” she said. The Data Science major, established in 2020 within the Department of Statistics, teaches students how to apply computational and mathematical skills to data-centric problems in a variety of fields. Now three years into her studies, Weinstein said she particularly enjoys “the problem-solving behind working with data and coding.”

According to Weinstein, becoming adept at coding in Python has been especially rewarding and useful.

“I never felt like I was great at learning new languages like Spanish in high school, but learning Python just felt so natural to me. The code read like English in my mind, and it helped me solve real problems.” -Zoe Weinstein

Weinstein highlighted a few memorable courses and instructors that have shaped her experience at UW-Madison. She said COMP SCI 220, Data Science Programming I, helped her grasp the foundations of coding for data scientists, especially in Python. Meanwhile, LIS 440, Navigating the Data Revolution, got Weinstein “super interested in data ethics and the almost philosophical side of data,” she said.

In addition, Weinstein said, Introduction to Artificial Intelligence (AI), COMP SCI 540, which she is taking this spring, is “absolutely one of my favorite classes.” Taught by Assistant Professor Frederic Sala, the course delves into AI-related concepts like machine learning and probabilistic reasoning with applications for data mining, natural language processing, and more.

“Fred Sala is an amazing professor,” Weinstein said. “I could not recommend his class more.”

On top of technical data- and computing-focused courses, Weinstein’s sociology major has enabled her to explore social and cultural issues through a data-centric lens. After taking SOC 120: Marriage and the Family, Weinstein said, “I was absolutely hooked.” Since then, she noted, “I’ve had the opportunity to use data science and analytical skills in my sociology classes, which has honestly been a huge advantage for me. In that way, the two majors definitely complement each other.”

DotData: a student community

After finding DotData through the Wisconsin Involvement Network, Weinstein decided to try it out. “I started going to some of the meetings and met a new friend, who was on the board. That kept me coming back, knowing there are people here that are friendly and familiar,” she said. When some of the board members announced they would be studying abroad this spring, spots opened up for new board members. She applied and won the position of secretary on the executive board, beginning her term over the winter break.


“I have learned so much about the club and about data science, and I’ve met so many of my peers who are Data Science or Computer Sciences majors,” Weinstein said. “It has been an amazing experience.”

In the secretary role, Weinstein produces the club’s newsletter, which keeps members apprised of meeting topics and other events, including their annual MadData Hackathon . “One of my favorite things about DotData is the annual hackathon,” she said. “This year we had more than 200 participants sign up and 35 projects submitted at the end of the day-long event,” which took place in February.

Participants in MadData work in groups to solve any problem they choose using real-world data. “It was very cool to see how many people are interested in data science and what kinds of ideas people were able to come up with,” Weinstein said. “We had groups of freshmen who had never coded before meeting people who are expert coders and creating the most amazing projects.” The winning project, Tech Trends, was created by students Shlok Desai and Muthu Ramnarayanan. Inspired by the challenge of navigating large amounts of technology news online, Tech Trends is a platform that uses AI-driven personalization to help users navigate the online technology news landscape with ease and enjoyment.

During the fall 2023 semester, Weinstein said she applied to “somewhere between 60 and 80 internships.” The first company to invite her to an in-person interview was Chicago-based CNA , one of the largest commercial property and casualty insurance companies in the country. CNA ultimately offered Weinstein a role as a Data Engineering Intern this coming summer. After that internship ends, she will transition to a role as Data Analytics Intern for the Wisconsin School of Business (WSB), where she will use her data and coding skill sets to uncover useful insights in the School’s datasets.

Looking ahead to after graduation, Weinstein is open to multiple potential paths. “I’ve always wanted to leverage my data science skills to make an impact,” she said, “whether I end up staying in industry or pursuing a career in social science research.” She is hopeful that her upcoming internships will help her discover which aspects of being a data professional she enjoys most.

Zoe Weinstein’s story illustrates how data science enables students to engage in interdisciplinary problem-solving. Through the Data Science major and the DotData club, Weinstein—along with hundreds of fellow students, faculty, and staff—is an active participant in the fast-growing UW-Madison data science community.

To learn more about the Data Science major, visit its website.

For more information about DotData, visit its website.


Semiconductors at scale: New processor achieves remarkable speedup in problem solving

Annealing processors are designed specifically for addressing combinatorial optimization problems, where the task is to find the best solution from a finite set of possibilities. This holds implications for practical applications in logistics, resource allocation, and the discovery of drugs and materials.

In the context of CMOS (a type of semiconductor technology), it is necessary for the components of annealing processors to be fully "coupled." However, the complexity of this coupling directly affects the scalability of the processors.

In a new IEEE Access study led by Professor Takayuki Kawahara from Tokyo University of Science, researchers have developed and successfully tested a scalable processor that divides the calculation across multiple LSI chips. The innovation was also presented at the IEEE 22nd World Symposium on Applied Machine Intelligence and Informatics (SAMI 2024) on 25 January 2024.

According to Prof. Kawahara, "We want to achieve advanced information processing directly at the edge, rather than in the cloud or performing preprocessing at the edge for the cloud. Using the unique processing architecture announced by the Tokyo University of Science in 2020, we have realized a fully coupled LSI (Large Scale Integration) on one chip using 28nm CMOS technology. Furthermore, we devised a scalable method with parallel-operating chips and demonstrated its feasibility using FPGAs (Field-Programmable Gate Arrays) in 2022."

The team created a scalable annealing processor using 36 22-nm CMOS calculation LSI (Large Scale Integration) chips and one control FPGA. This technology enables the construction of large-scale, fully coupled semiconductor systems following the Ising model (a mathematical model of magnetic systems) with 4096 spins.

The processor incorporates two distinct technologies developed at the Tokyo University of Science: a "spin thread method" that enables eight parallel solution searches, and a technique that reduces chip requirements by about half compared to conventional methods. Its power needs are also modest: it operates at 10 MHz with a power consumption of 2.9 W (1.3 W for the core part). This was practically confirmed using a vertex cover problem with 4096 vertices.
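The chip's workload can be pictured with a toy software analogue: simulated annealing on a tiny fully coupled Ising system. This sketch uses a 6-spin ferromagnet, nothing like the 4096-spin hardware, and the cooling schedule and couplings are invented for illustration; a final greedy sweep polishes the result into a local minimum:

```python
import math
import random

def ising_energy(J, s):
    """H = -sum_{i<j} J[i][j] * s_i * s_j for spins s_i in {-1, +1}."""
    n = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))

def flip_cost(J, s, i):
    """Energy change from flipping spin i."""
    return 2 * s[i] * sum(J[i][j] * s[j] for j in range(len(s)) if j != i)

def anneal(J, steps=3000, t_hot=5.0, t_cold=0.01, seed=0):
    rng = random.Random(seed)
    n = len(J)
    s = [rng.choice([-1, 1]) for _ in range(n)]
    for k in range(steps):
        t = t_hot * (t_cold / t_hot) ** (k / steps)  # geometric cooling
        i = rng.randrange(n)
        de = flip_cost(J, s, i)
        # accept downhill moves always, uphill moves with Boltzmann probability
        if de <= 0 or rng.random() < math.exp(-de / t):
            s[i] = -s[i]
    # greedy polish: flip any spin that still lowers the energy
    improved = True
    while improved:
        improved = False
        for i in range(n):
            if flip_cost(J, s, i) < 0:
                s[i] = -s[i]
                improved = True
    return s

n = 6
J = [[0 if i == j else 1 for j in range(n)] for i in range(n)]  # ferromagnet
s = anneal(J)
print(ising_energy(J, s))  # → -15: all six spins aligned
```

For a ferromagnet the ground state is all spins aligned; the hardware's value is doing this kind of search over thousands of fully coupled spins in parallel.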

In terms of power performance ratio, the processor outperformed simulating a fully coupled Ising system on a PC (i7, 3.6GHz) using annealing emulation by 2,306 times. Additionally, it surpassed the core CPU and arithmetic chip by 2,186 times.

The successful machine verification of this processor suggests the possibility of enhanced capacity. According to Prof. Kawahara, who holds a vision for the social implementation of this technology (such as initiating a business, joint research, and technology transfer), "In the future, we will develop this technology for a joint research effort targeting an LSI system with the computing power of a 2050-level quantum computer for solving combinatorial optimization problems."

"The goal is to achieve this without the need for air conditioning, large equipment, or cloud infrastructure using current semiconductor processes. Specifically, we would like to achieve 2M (million) spins by 2030 and explore the creation of new digital industries using this."

In summary, researchers have developed a scalable, fully coupled annealing processor incorporating 4096 spins on a single board with 36 CMOS chips. Key innovations, including chip reduction and parallel operations for simultaneous solution searches, played a crucial role in this development.

More information: Taichi Megumi et al, Scalable Fully-Coupled Annealing Processing System Implementing 4096 Spins Using 22nm CMOS LSI, IEEE Access (2024). DOI: 10.1109/ACCESS.2024.3360034

Provided by Tokyo University of Science

(a) The die photo of a 22nm fully-coupled Ising LSI chip; (b) the front and back views of the board of a 4096-spin scalable fully-coupled Ising LSI system. Credit: Takayuki Kawahara from TUS


Computer Science > Computation and Language

Title: Can Language Models Solve Olympiad Programming?

Abstract: Computing olympiads contain some of the most challenging problems for humans, requiring complex algorithmic reasoning, puzzle solving, in addition to generating efficient code. However, it has been understudied as a domain to evaluate language models (LMs). In this paper, we introduce the USACO benchmark with 307 problems from the USA Computing Olympiad, along with high-quality unit tests, reference code, and official analyses for each problem. These resources enable us to construct and test a range of LM inference methods for competitive programming for the first time. We find GPT-4 only achieves an 8.7% pass@1 accuracy with zero-shot chain-of-thought prompting, and our best inference method improves it to 20.2% using a combination of self-reflection and retrieval over episodic knowledge. However, this is far from solving the benchmark. To better understand the remaining challenges, we design a novel human-in-the-loop study and surprisingly find that a small number of targeted hints enable GPT-4 to solve 13 out of 15 problems previously unsolvable by any model and method. Our benchmark, baseline methods, quantitative results, and qualitative analysis serve as an initial step toward LMs with grounded, creative, and algorithmic reasoning.
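The pass@1 numbers quoted above are conventionally computed with the unbiased pass@k estimator used across code-generation benchmarks (whether this paper uses exactly this estimator is an assumption; the formula itself is standard): given n sampled solutions of which c pass all unit tests, pass@k = 1 - C(n-c, k)/C(n, k).

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k from n samples with c correct."""
    if n - c < k:          # every size-k subset contains a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem, pass@1 is just the fraction solved:
print(round(pass_at_k(10, 3, 1), 6))  # → 0.3
```

Averaging this quantity over all benchmark problems gives the headline pass@1 score.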


Solve calculation by cellphone 4+

Studying problem by calculator. By Mutsumi Suzuki. Designed for iPad.

  • Offers In-App Purchases


Description

【SAT/ACT】【GMAT】【CFA】【USCPA】【FE/PE】... and more! A fill-in-the-blanks learning app specialized for practicing calculation problems with a calculator is now available. For certification exams where you want to practice questions using a calculator, this app makes it easy to start studying on your smartphone: take a picture of a reference book, problem set, or explanation with the camera, process it into fill-in-the-blank format, and start learning right away with the in-app calculator.

Learning methods: Two learning modes are available, Tap mode and Input mode. In Tap mode, tapping a blank reveals the hidden answer; in Input mode, you enter numerical values into the blanks and check your answers at the end. Tap mode is for casually studying how to solve problems; Input mode is for studying in earnest in preparation for the real exam. You can switch between them according to your learning stage and available time.

In-app calculator: In Input mode you can use the in-app calculator to enter numbers, but typing on a smartphone calculator is difficult, so you can register numerical values to the blanks. In Tap mode, tapping a blank instantly reflects the registered value on the calculator; in Input mode, tapping a blank instantly reflects the value you entered. This reduces the amount of typing and lets you focus on learning how to solve the problems.

Start learning instantly: Numerical-value recognition automatically creates blanks where you want to hide numbers. You can also manually add as many blanks as you like, and change their size and position; recognized values are automatically registered as blanks.

Offline support: Once you create a question in the app, you can study it even without an Internet connection (some functions, such as image recognition, will not be available).

About subscription plans: In the free version you can create up to 3 questions, add up to 5 pages per question, and advertisements are displayed at the top of the screen and elsewhere. Subscribing removes ads and allows unlimited questions and pages (note that questions or pages deleted from the menu cannot be restored). When you change devices, you can restore a previously purchased subscription for free while it is still within the subscription period. To check or cancel auto-renewal, open the "Settings" app, select your Apple ID, then "Subscriptions." If you do not cancel at least 24 hours before the end of the subscription period, it renews automatically, with billing within 24 hours of the period's end. Billing is done via your iTunes account; cancellation of the current month's subscription is not accepted, and in-app purchases cannot be cancelled by any other method. For the Privacy Policy and Terms of Use, see https://mutsumi-suzuki-developpage.web.app (Terms of Use: https://mutsumi-suzuki-developpage.web.app?section=terms) or the in-app menu.

Version 3.0.0

  • iPad is now supported. 
  • The design and UI of the study page and question-edit page have changed: on the study page, image pages can be scrolled horizontally, the bottom navigation moves between image pages, and the position of the in-app calculator can be adjusted; on the problem-editing page, fill-in-the-blanks can be resized by dragging their corners, and documents captured by camera can be cropped and their orientation corrected. 
  • Dark mode is now supported and a contact form has been added.

App Privacy

The developer, MUTSUMI SUZUKI , indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy .

Data Used to Track You

The following data may be used to track you across apps and websites owned by other companies:

  • Identifiers
  • Diagnostics

Data Linked to You

The following data may be collected and linked to your identity:

Privacy practices may vary, for example, based on the features you use or your age. Learn More

Information

English, Japanese

  • Unlimited number of problems $1.99
  • Developer Website
  • App Support
  • Privacy Policy





  9. Framing Data Science Problems the Right Way From the Start

    Toward Better Problem Definition. Data science uses the scientific method to solve often complex (or multifaceted) and unstructured problems using data and analytics. In analytics, the term fishing expedition refers to a project that was never framed correctly to begin with and involves trolling the data for unexpected correlations.

  10. Solutions to Data Science Problems

    Two tasks are critical in solving a well-defined problem in data science, namely, the methods to find solutions and the evaluation and assessment of their quality [ 21 ]. This section describes methods and metrics to quantify and evaluate the quality of solutions. Methods include training, testing, and cross-validation.

  11. Doing Data Science: A Framework and Case Study

    Without applications (problems), doing data science would not exist. Our data science framework and research processes are fundamentally tied to practical problem solving and can be used in diverse settings. We provide a case study of using local data to address questions raised by county officials.

  12. Key skills for aspiring data scientists: Problem solving and the

    Key skills for aspiring data scientists: Problem solving and the scientific method. 15 Oct 2020. This blog is part two of our 'Data science skills' series, which takes a detailed look at the skills aspiring data scientists need to ace interviews, get exciting projects, and progress in the industry. You can find the other blogs in our series ...

  13. Data Science Solutions: Applications and Use Cases

    Data Science is a broad field with many potential applications. It's not just about analyzing data and modeling algorithms, but it also reinvents the way businesses operate and how different departments interact. ... Data scientists solve complex problems every day, leveraging a variety of Data Science solutions to tackle issues like ...

  14. Real-World Problems, and How Data Helps Us Solve Them

    Nov 23, 2023. With the constant buzz around new tools and cutting-edge models, it's easy to lose sight of a basic truth: the real value in leveraging data lies in its ability to bring about tangible positive change. Whether it's around complex business decisions or our everyday routines, data-informed solutions are only as good as the ...

  15. Apply Data Science Problem-Solving Techniques

    In data science, solving complex problems is a core aspect of the job. Whether you're trying to interpret data, build predictive models, or create algorithms, problem-solving techniques are your ...

  16. Data Science Projects That Can Help You Solve Real World Problems

    The best way to learn Data Science is by solving real-world problems with the data and building your own portfolio. In this article, we will discuss three projects that you can work on to build your portfolio and impress interviewers. By Nate Rosidi, KDnuggets Market Trends & SQL Content Specialist on November 30, 2022 in Data Science. Image by ...

  17. Solving Problems with Data Science

    The vast majority of the problems we face at Viget can't or shouldn't be solved by a lone data scientist because we are solving business problems. Our data scientists team up with UXers, designers, developers, project managers, and hardware developers to develop digital strategies and solving data science problems is one part of that ...

  18. Free Practice Exams

    A selection of practice exams that will test your current data science knowledge. Identify key areas of improvement to strengthen your theoretical preparation, critical thinking, and practical problem-solving skills so you can get one step closer to realizing your professional goals.

  19. Chapter 1 Problem Solving with Data

    1.1 Introduction. This chapter will introduce you to a general approach to solving problems and answering questions using data. Throughout the rest of the module, we will reference back to this chapter as you work your way through your own data analysis exercises. The approach is applicable to actuaries, data scientists, general data analysts ...

  20. Solving Data Science Problems

    Solving Data Science Problems. by Morgan Stevens December 16, 2022. Researchers at the University of Hong Kong, Peking University, Stanford University, the University of California, Berkeley, the University of Washington, Carnegie Mellon University, and Meta have created a dataset of 1,000 data science questions from 451 problems found on Stack ...

  21. Data Science Problems

    This will provide students with a real-life data science problem. The chapter presents an in-depth description of the required statistics for data science researchers and IT professionals. A step-by-step approach to problem-solving is presented clearly and accurately to the reader with examples for better understanding. Case studies will assist ...

  22. Common Data Science Challenges of 2024 [with Solution]

    Steps on How to Approach and Address a Solution to Data Science Problems. Step 1: Define the Problem. First things first, it is essential to precisely characterize the data issue that has to be addressed. The issue at hand need to be comprehensible, succinct, and quantifiable.

  23. Top Data Scientist Interview Questions and Tips

    Typically, these questions involve data manipulation using code devised to test your programming, problem-solving, and innovation skills. During the interview, you'll likely be required to use a computer or whiteboard to complete the questions, or you may be asked to talk through the problem verbally and explain your thought process.

  24. Student spotlight: Zoe Weinstein explores the problem-solving power of

    The Data Science major, established in 2020 within the Department of Statistics, teaches students how to apply computational and mathematical skills to data-centric problems in a variety of fields. Now three years into her studies, Weinstein said she particularly enjoys " the problem-solving behind working with data and coding."

  25. How to Encode Constraints to the Output of Neural Networks

    Extended Sinkhorn with multi-set marginals. We discover that the Sinkhorn algorithm can generalize to multiple sets of marginals. Recall that Γᵢ ⱼ ∈ [0,1] means the proportion of uⱼ moved to vᵢ.Interestingly, it yields the same formulation if we simply replace u, v with another set of marginal distributions, suggesting the potential of extending the Sinkhorn algorithm to multiple ...

  26. Learning from Offline and Online Experiences: A Hybrid Adaptive

    In many practical applications, usually, similar optimisation problems or scenarios repeatedly appear. Learning from previous problem-solving experiences can help adjust algorithm components of meta-heuristics, e.g., adaptively selecting promising search operators, to achieve better optimisation performance. However, those experiences obtained from previously solved problems, namely offline ...

  27. 7 Things Students Are Missing in a Data Science Resume

    6. Adaptability and Problem Solving Skills. The field of data science is continually evolving, and employers are seeking candidates who can adapt to new challenges and technologies. As a data scientist, you may find yourself jumping from being a data analyst to a machine learning engineer in just a few months.

  28. Semiconductors at scale: New processor achieves remarkable ...

    Semiconductors at scale: New processor achieves remarkable speedup in problem solving. (a) The die photo of a 22nm fully-coupled Ising LSI chip; (b) the front and back views of the board of a 4096 ...

  29. [2404.10952v1] Can Language Models Solve Olympiad Programming?

    Computing olympiads contain some of the most challenging problems for humans, requiring complex algorithmic reasoning, puzzle solving, in addition to generating efficient code. However, it has been understudied as a domain to evaluate language models (LMs). In this paper, we introduce the USACO benchmark with 307 problems from the USA Computing Olympiad, along with high-quality unit tests ...

  30. Solve calculation by cellphone 4+

    Reduce the number of times you have to type on the calculator and focus on learning how to solve math problems. Start learning instantly: Numerical value recognition automatically creates fill-in-the-blanks where you want to hide numbers. You can also manually add as many fill-in-the-blanks as you like, and change their size and position.