Case Study for Python

Using Python for Research

Take your introductory knowledge of Python programming to the next level and learn how to use Python 3 for your research.

Associated Schools

Harvard T.H. Chan School of Public Health

What you'll learn

Python 3 programming basics (a review)

Python tools (e.g., NumPy and SciPy modules) for research applications

How to apply Python research tools in practical settings

Course description

This course bridges the gap between introductory and advanced courses in Python. While there are many excellent introductory Python courses available, most typically do not go deep enough for you to apply your Python skills to research projects. In this course, after first reviewing the basics of Python 3, we learn about tools commonly used in research settings.

Using a combination of a guided introduction and more independent in-depth exploration, you will get to practice your new Python skills with various case studies chosen for their scientific breadth and their coverage of different Python features. This run of the course includes revised assessments and a new module on machine learning.

Course Outline

Python Basics

Review of basic Python 3 language concepts and syntax.

Python Research Tools

Introduction to Python modules commonly used in scientific computation, such as NumPy.

Case Studies

This collection of six case studies from different disciplines provides opportunities to practice Python research skills.

Statistical Learning

Exploration of statistical learning using the scikit-learn library followed by a two-part case study that allows you to further practice your coding skills.

Instructors

Jukka-Pekka Onnela


Python Case Studies

Introduction

This project is a collection of six captivating case studies that use Python and computational techniques to analyse data, build classification models and unravel insights from multifaceted datasets. The topics span several domains of knowledge.

Central to these studies is the application of tools and concepts, ranging from fundamental Python objects like dataframes and NumPy arrays to advanced tools like scikit-learn. The case studies span different topics: working with bird migration data to untangle the mysteries of flight, deciphering social network dynamics among villagers, exploring diverse whisky flavors, and analyzing book data.

This project has led me to acquire new technical skills, develop my analytical thinking and solidify my problem-solving skills while using Python’s potential to extract knowledge from diverse datasets.

Here is the table of contents:

Exploring DNA Sequencing and Analysis
Linguistic Analysis of Books: A Study of Language Variability
Predictive Insights through Classification Analysis
Analysis of Scotch Whisky Production and Flavor Profiles
Analyzing Bird Migration Patterns with GPS Data
Deep Dive into Network Analysis and Social Relationships
Bibliography

Data Science Case Studies: Solved using Python

February 19, 2021
Machine Learning

Solving a data science case study means analyzing a problem statement in depth and working out a solution. Solved case studies let you showcase distinctive, practical data science use cases in your portfolio. In this article, I’m going to introduce you to 3 data science case studies solved and explained using Python.

Data Science Case Studies

If you’ve learned data science by taking a course or certification program, that alone is not enough to land a job easily. The most important part of a data science interview is showing how you can apply your skills to real use cases. Below are 3 data science case studies that will help you understand how to analyze and solve a problem. All of them are solved and explained using Python.

Case Study 1: Text Emotions Detection

If you are interested in natural language processing, this use case is for you. The idea is to train a machine learning model to generate emojis based on input text. This model can then be used in training AI chatbots.

Use Case: People express their emotions in many forms, such as facial expressions, gestures, speech and text. Detecting emotions in text is a content-based classification problem. Detecting a person’s emotions is a difficult task in general, and detecting them from written text alone is even harder, because text captures only one of the many ways people express emotion.

Recognizing emotion in text plays an important role in applications such as chatbots, customer support forums and customer reviews. So you have to train a machine learning model that identifies the emotion in a text and presents the most relevant emoji for the input text.
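The article does not describe the model or dataset behind this case study, so the following is only an illustration. It trains a tiny multinomial Naive Bayes classifier (a common text-classification baseline, swapped in here as an assumption) on a handful of made-up sentences and maps the predicted emotion to an emoji. The training sentences, the emotion labels and the `EMOJI` mapping are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesEmoji:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n_label / n_docs)  # log prior P(label)
            for word in tokenize(text):
                # Smoothed log likelihood P(word | label); unseen words never give log(0).
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical emotion-to-emoji mapping and toy training data.
EMOJI = {"joy": "😂", "sadness": "😢", "anger": "😠"}
train_texts = [
    "I am so happy today", "this is wonderful and fun",
    "I feel sad and lonely", "I miss you and cry",
    "I am furious and angry", "this makes me so mad",
]
train_labels = ["joy", "joy", "sadness", "sadness", "anger", "anger"]

model = NaiveBayesEmoji().fit(train_texts, train_labels)
for sentence in ["what a happy fun day", "I cry and feel sad", "so angry and mad"]:
    label = model.predict(sentence)
    print(f"{sentence!r} -> {label} {EMOJI[label]}")
```

With realistic data you would need far more labeled examples; the smoothing term keeps words that never appeared for a given emotion from zeroing out its score.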

Case Study 2: Hotel Recommendation System

A hotel recommendation system is typically built on collaborative filtering, which makes recommendations based on ratings given by other customers in the same category as the user looking for a hotel.

Use Case: We all plan trips, and the first thing to do when planning a trip is to find a hotel. Many websites recommend the best hotel for a trip. A hotel recommendation system aims to predict which hotel a user is most likely to choose from among all available hotels, helping the user book the best option. We can build this type of system using customer reviews.

For example, if you are going on a business trip, the hotel recommendation system should show you the hotels that other customers have rated best for business travel. Our approach is therefore to build the recommendation system from customer reviews and ratings: use the ratings and reviews given by customers who belong to the same category as the user.
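The category-based approach described above can be sketched as follows. This is a minimal illustration assuming a small, made-up ratings table with hypothetical columns `customer`, `category`, `hotel` and `rating`. It ranks hotels by their average rating among customers in the user's category, which is a simplified stand-in for full collaborative filtering (that would typically compare individual users' rating patterns).

```python
import pandas as pd

# Hypothetical review data: one row per customer review.
ratings = pd.DataFrame({
    "customer": ["c1", "c2", "c3", "c4", "c5", "c6"],
    "category": ["business", "business", "business", "family", "family", "business"],
    "hotel": ["Harbor Inn", "City Suites", "Harbor Inn", "Harbor Inn", "Beach Resort", "City Suites"],
    "rating": [4, 5, 3, 5, 4, 5],
})


def recommend_hotels(ratings: pd.DataFrame, category: str, top_n: int = 3, min_reviews: int = 1) -> list:
    """Rank hotels by mean rating among reviewers in the same traveller category."""
    peers = ratings[ratings["category"] == category]
    stats = peers.groupby("hotel")["rating"].agg(["mean", "count"])
    stats = stats[stats["count"] >= min_reviews]  # drop hotels with too few peer reviews
    # Highest mean first; break ties by number of reviews.
    return stats.sort_values(["mean", "count"], ascending=False).head(top_n).index.tolist()


print(recommend_hotels(ratings, "business"))
print(recommend_hotels(ratings, "family"))
```

Raising `min_reviews` trades coverage for reliability: a hotel with one glowing review from a peer is less trustworthy than one with many.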

Case Study 3: Customer Personality Analysis

Customer analysis is one of the most important tasks for a data scientist working at a product-based company. So if you want to join a product-based company, this data science case study is the best fit for you.

Use Case: Customer Personality Analysis is a detailed analysis of a company’s ideal customers. It helps a business better understand its customers and makes it easier to modify products according to the specific needs, behaviours and concerns of different types of customers.

Your analysis should help a business tailor its product to its target customers across different customer segments. For example, instead of spending money to market a new product to every customer in the company’s database, a company can analyze which customer segment is most likely to buy the product and then market the product only to that segment.
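As a minimal sketch of the targeting idea above, assume customers have already been assigned to segments (the segment names and the `bought_similar` column are hypothetical). A business could compare how often each segment bought a similar product in the past and pick the segment with the highest rate. In a real analysis the segments themselves would first come from profiling or clustering customer attributes, which is not shown here.

```python
import pandas as pd

# Hypothetical campaign history: did each customer buy a similar past product?
customers = pd.DataFrame({
    "segment": ["young_urban", "young_urban", "families", "families",
                "families", "retirees", "retirees"],
    "bought_similar": [1, 0, 1, 1, 0, 0, 0],
})

# Purchase rate per segment = share of customers in the segment who bought.
purchase_rate = (
    customers.groupby("segment")["bought_similar"].mean().sort_values(ascending=False)
)
target_segment = purchase_rate.idxmax()

print(purchase_rate)
print("Market the new product to:", target_segment)
```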

So these three data science case studies are based on real-world problems. The first, Text Emotions Detection, is based entirely on natural language processing, and the machine learning model you train can be used in training an AI chatbot. The second, Hotel Recommendation System, also draws on NLP, but here you will learn how to generate recommendations using collaborative filtering. The last, Customer Personality Analysis, suits someone who wants to focus on the analysis side.

All these data science case studies are solved using Python. Here are the resources where you will find them solved and explained:

Text Emotions Detection
Hotel Recommendation System
Customer Personality Analysis

I hope you liked this article on data science case studies solved and explained using the Python programming language. Feel free to ask your valuable questions in the comments section below.

Aman Kharwal

Data Strategist at Statso. My aim is to decode data science for the real world in the most simple words.


Essential Statistics for Data Science: A Case Study using Python, Part I

Get to know some of the essential statistics you should be very familiar with when learning data science


Our last post dove straight into linear regression. In this post, we'll take a step back to cover essential statistics that every data scientist should know. To demonstrate these essentials, we'll look at a hypothetical case study involving an administrator tasked with improving school performance in Tennessee.

You should already know:

Python fundamentals — learn on dataquest.io

Note: this tutorial is intended solely as an educational tool, not as a scientific explanation of the causes of various school outcomes in Tennessee.

Article Resources

Notebook and Data: Github
Libraries: pandas, matplotlib, seaborn

Introduction

Meet Sally, a public school administrator. Some schools in her state of Tennessee are performing below average academically. Her superintendent, under pressure from frustrated parents and voters, approached Sally with the task of understanding why these schools are under-performing. Not an easy problem, to be sure.

To improve school performance, Sally needs to learn more about these schools and their students, just as a business needs to understand its own strengths and weaknesses and its customers.

Though Sally is eager to build an impressive explanatory model, she knows the importance of conducting preliminary research to prevent possible pitfalls or blind spots (e.g., cognitive biases). Thus, she engages in a thorough exploratory analysis, which includes a lit review, data collection, descriptive and inferential statistics, and data visualization.

Sally has strong opinions as to why some schools are under-performing, but opinions won't do, nor will a handful of facts; she needs rigorous statistical evidence.

Sally conducts a lit review, which involves reading a variety of credible sources to familiarize herself with the topic. Most importantly, Sally keeps an open mind and embraces a scientific world view to help her resist confirmation bias (seeking solely to confirm one's own world view).

In Sally's lit review, she finds multiple compelling explanations of school performance: curricula, income, and parental involvement. These sources will help Sally select her model and data, and will guide her interpretation of the results.

Data Collection

The data we want isn't always available, but Sally lucks out and finds student performance data based on test scores (school_rating) for every public school in middle Tennessee. The data also includes various demographic, school faculty, and income variables (see readme for more information). Satisfied with this dataset, she writes a web-scraper to retrieve the data.

But data alone can't help Sally; she needs to convert the data into useful information.

Descriptive and Inferential Statistics

Sally opens her stats textbook and finds that there are two major types of statistics, descriptive and inferential.

Descriptive statistics identify patterns in the data, but they don't allow for making hypotheses about the data.

Within descriptive statistics, there are two measures used to describe the data: central tendency and deviation (also called dispersion or spread). Central tendency refers to the central position of the data (mean, median, mode), while deviation describes how spread out the data are around the mean. Deviation is most commonly measured with the standard deviation. A small standard deviation indicates that the data are close to the mean, while a large standard deviation indicates that the data are more spread out from the mean.
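A minimal sketch of these measures in pandas, using a made-up Series of reduced-lunch percentages (the values are illustrative, not from Sally's data):

```python
import pandas as pd

# Hypothetical percentages of students on reduced lunch at seven schools.
reduced_lunch = pd.Series([12.0, 25.0, 25.0, 40.0, 58.0, 70.0, 85.0])

# Central tendency
print("mean:  ", reduced_lunch.mean())           # arithmetic average
print("median:", reduced_lunch.median())         # middle value when sorted
print("mode:  ", reduced_lunch.mode().tolist())  # most frequent value(s)

# Deviation: pandas' std() is the sample standard deviation (ddof=1) by default.
print("std:   ", reduced_lunch.std())
```

Passing `ddof=0` to `std()` gives the population standard deviation instead, which is slightly smaller for small samples.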

Inferential statistics allow us to make hypotheses (or inferences) about a population based on a sample. For Sally, this involves developing a hypothesis about her sample of middle Tennessee schools and applying it to her population of all schools in Tennessee.

For now, Sally puts aside inferential statistics and digs into descriptive statistics.

To begin learning about the sample, Sally uses pandas' describe method, as seen below. The column headers in bold text represent the variables Sally will be exploring. Each row header represents a descriptive statistic about the corresponding column.

Looking at the output above, Sally's variables can be put into two classes: measurements and indicators.

Measurements are variables that can be quantified. All data in the output above are measurements. Some of these measurements, such as state_percentile_16, avg_score_16 and school_rating, are outcomes; these outcomes cannot be used to explain one another. For example, explaining school_rating as a result of state_percentile_16 (test scores) is circular logic. Therefore we need a second class of variables.

The second class, indicators, is used to explain our outcomes. Sally chooses indicators that describe the student body (for example, reduced_lunch) or school administration (stu_teach_ratio), hoping they will explain school_rating.
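A minimal sketch of the describe step and the outcome/indicator split, assuming a small made-up DataFrame. The column names school_rating, reduced_lunch and stu_teach_ratio come from the article, but the values are invented; the real data would be loaded from Sally's scraped file, whose name is not given here.

```python
import pandas as pd

# Hypothetical stand-in for Sally's dataset (values are invented).
df = pd.DataFrame({
    "school_rating": [0, 1, 2, 3, 4, 5, 5],
    "reduced_lunch": [85.0, 70.0, 58.0, 40.0, 25.0, 25.0, 12.0],
    "stu_teach_ratio": [15.2, 14.8, 16.1, 15.0, 14.5, 13.9, 14.2],
})

# One column per numeric variable; rows are count, mean, std, min, 25%, 50%, 75%, max.
summary = df.describe()
print(summary)

# Separate the outcome from the indicators used to explain it.
outcome = df["school_rating"]
indicators = df[["reduced_lunch", "stu_teach_ratio"]]
```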

Sally sees a pattern in one of the indicators, reduced_lunch. reduced_lunch is a variable measuring the average percentage of students per school enrolled in a federal program that provides lunches for students from lower-income households. In short, reduced_lunch is a good proxy for household income, which Sally remembers from her lit review was correlated with school performance.

Sally isolates reduced_lunch and groups the data by school_rating using pandas' groupby method and then uses describe on the re-shaped data (see below).

Below is a discussion of the metrics from the table above and what each result indicates about the relationship between school_rating and reduced_lunch:

count: the number of schools at each rating. Most of the schools in Sally's sample have a 4- or 5-star rating, but 25% of schools have a 1-star rating or below. This confirms that poor school performance isn't merely anecdotal, but a serious problem that deserves attention.

mean: the average percentage of students on reduced_lunch among all schools at each school_rating. As school performance increases, the average percentage of students on reduced lunch decreases. Schools with a 0-star rating have 83.6% of students on reduced lunch, while at the other end of the spectrum, 5-star schools on average have 21.6% of students on reduced lunch. We'll examine this pattern further in the graphing section.

std: the standard deviation of the variable. For a school_rating of 0, the standard deviation is 8.813498. If the values are roughly normally distributed, about 68.2% of observations (refer to readme) fall within 8.81 percentage points on either side of the average of 83.6%. Note that the standard deviation increases as school_rating increases, indicating that reduced_lunch loses explanatory power as school performance improves. As with the mean, we'll explore this idea further in the graphing section.

min: the minimum value of the variable. This represents the school with the lowest percentage of students on reduced lunch at each school rating. For 0- and 1-star schools, the minimum percentage of students on reduced lunch is 53%. The minimum for 5-star schools is 2%. The minimum value tells a similar story as the mean, but looking at it from the low end of the range of observations.

25%: the first quartile, or 25th percentile; the value below which the lowest 25% of reduced_lunch observations fall. For 0-star schools, 25% of the observations are less than 79.5%. Sally sees the same trend in the first quartile as in the metrics above: as school_rating increases, the 25th percentile of reduced_lunch decreases.

50%: the median, or 50th percentile; the value below which the lowest 50% of observations fall. The same relationship between school_rating and reduced_lunch is present here.

75%: the third quartile, or 75th percentile; the value below which the lowest 75% of observations fall. The trend continues.

max: the maximum value for that variable. You guessed it: the trend continues!

The descriptive statistics consistently reveal that schools with more students on reduced lunch under-perform when compared to their peers. Sally is on to something.
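The groupby-then-describe step discussed above can be sketched in one line of pandas. The toy values below are invented for illustration and will not reproduce Sally's numbers.

```python
import pandas as pd

# Hypothetical schools: rating (0-5 stars) and % of students on reduced lunch.
df = pd.DataFrame({
    "school_rating": [0, 0, 1, 1, 5, 5],
    "reduced_lunch": [90.0, 80.0, 75.0, 65.0, 30.0, 10.0],
})

# Split reduced_lunch by rating, then summarize each group (one row per rating).
by_rating = df.groupby("school_rating")["reduced_lunch"].describe()
print(by_rating[["count", "mean", "std", "min", "max"]])
```

Selecting the single column before calling `describe()` keeps the result's columns flat (`count`, `mean`, ...) instead of a two-level column index.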

Sally decides to look at reduced_lunch from another angle using a correlation matrix with pandas' corr method. The values in the correlation matrix table range from -1 to 1 (see below). A value of -1 indicates a perfect negative correlation: as one variable increases, the other decreases. A value of 1 indicates a perfect positive correlation: both variables increase together. The result below, -0.815757, indicates a strong negative correlation between reduced_lunch and school_rating. There's clearly a relationship between the two variables.

Sally continues to explore this relationship graphically.
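A minimal sketch of the corr call, again on invented data. pandas computes Pearson's correlation coefficient by default; because school_rating is an ordinal star rating, the rank-based Spearman coefficient is included as an optional cross-check (an addition, not something the article uses).

```python
import pandas as pd

# Hypothetical schools (values are invented).
df = pd.DataFrame({
    "school_rating": [0, 0, 1, 1, 5, 5],
    "reduced_lunch": [90.0, 80.0, 75.0, 65.0, 30.0, 10.0],
})

cols = ["reduced_lunch", "school_rating"]
corr = df[cols].corr()  # method="pearson" by default
r = corr.loc["reduced_lunch", "school_rating"]
print(corr)
print(f"Pearson r = {r:.3f}")

# Rank-based alternative, suited to ordinal variables such as star ratings.
rho = df[cols].corr(method="spearman").loc["reduced_lunch", "school_rating"]
print(f"Spearman rho = {rho:.3f}")
```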

Essential Graphs for Exploring Data

Box-and-Whisker Plot

In her stats book, Sally sees a box-and-whisker plot. A box-and-whisker plot is helpful for visualizing the distribution of the data: the box spans the middle 50% of values (from the first to the third quartile) around the median, and the whiskers show how far the remaining values extend. Understanding the distribution allows Sally to see how spread out her data are; the larger the spread, the less robust reduced_lunch is at explaining school_rating.

See below for an explanation of the box-and-whisker plot.

Now that Sally knows how to read the box-and-whisker plot, she graphs reduced_lunch to see the distributions. See below.

In her box-and-whisker plots, Sally sees that the minimum and maximum reduced_lunch values tend to get closer to the mean as school_rating decreases; that is, as school_rating decreases, so does the standard deviation in reduced_lunch.

What does this mean?

Starting with the top box-and-whisker plot, as school_rating decreases, reduced_lunch becomes a more powerful way to explain outcomes. This could be because lower-income parents have fewer resources than higher-income parents to devote to their children's education (such as after-school programs, tutors, time spent on homework, and computer camps). Above a 3-star rating, more predictors are needed to explain school_rating due to the increasing spread in reduced_lunch.
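The quantities Sally reads off her box-and-whisker plots can also be computed directly. A minimal sketch, assuming invented data and the common Tukey convention, in which the whiskers reach the most extreme points within 1.5 times the interquartile range (IQR) of the box; this is also matplotlib's default.

```python
import pandas as pd

# Hypothetical schools at two ratings (values are invented).
df = pd.DataFrame({
    "school_rating": [0, 0, 0, 0, 5, 5, 5, 5],
    "reduced_lunch": [80.0, 84.0, 86.0, 90.0, 2.0, 15.0, 30.0, 45.0],
})


def box_stats(values: pd.Series) -> dict:
    """Return the quantities a standard (Tukey) box-and-whisker plot draws."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1  # height of the box
    # Whiskers extend to the most extreme observations within 1.5 * IQR of the box.
    whisker_low = values[values >= q1 - 1.5 * iqr].min()
    whisker_high = values[values <= q3 + 1.5 * iqr].max()
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "whisker_low": whisker_low, "whisker_high": whisker_high}


stats = {rating: box_stats(group)
         for rating, group in df.groupby("school_rating")["reduced_lunch"]}
for rating, s in stats.items():
    print(rating, s)
```

To draw the plots themselves, `df.boxplot(column="reduced_lunch", by="school_rating")` produces one box per rating (it requires matplotlib). In this toy data the 5-star box is much taller than the 0-star box, mirroring the wider spread Sally sees at higher ratings.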

Having used box-and-whisker plots to reaffirm her idea that household income and school performance are related, Sally seeks further validation.

Scatter Plot

To further examine the relationship between school_rating and reduced_lunch, Sally graphs the two variables on a scatter plot. See below.

In the scatter plot above, each dot represents a school. The placement of the dot represents that school's rating (y-axis) and the percentage of its students on reduced lunch (x-axis).

The downward trend line shows the negative correlation between school_rating and reduced_lunch (as one increases, the other decreases). The slope of the trend line indicates how much school_rating decreases as reduced_lunch increases. A steeper slope would indicate that a small change in reduced_lunch has a big impact on school_rating, while a more horizontal slope would indicate that the same small change in reduced_lunch has a smaller impact on school_rating.
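The trend line's slope can be estimated with an ordinary least-squares fit. A minimal sketch using `numpy.polyfit` on invented points, chosen to lie exactly on a line so the result is easy to check:

```python
import numpy as np

# Hypothetical schools: x = % of students on reduced lunch, y = star rating.
reduced_lunch = np.array([10.0, 30.0, 50.0, 70.0, 90.0])
school_rating = np.array([5.0, 4.0, 3.0, 2.0, 1.0])

# A degree-1 fit returns [slope, intercept] of the least-squares line.
slope, intercept = np.polyfit(reduced_lunch, school_rating, deg=1)
print(f"rating = {intercept:.2f} + ({slope:.3f}) * reduced_lunch")

# Slope interpretation: predicted change in rating for a 10-point rise in reduced_lunch.
print("change per +10 points:", round(slope * 10, 2))
```

A slope of -0.05 here means each additional 10 percentage points on reduced lunch predicts half a star less. For a plot, seaborn's `regplot(x="reduced_lunch", y="school_rating", data=df)` draws the scatter and the fitted line together.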

Sally notices that the scatter plot further supports what she saw with the box-and-whisker plot: when reduced_lunch increases, school_rating decreases. The tighter spread of the data as school_rating declines indicates the increasing influence of reduced_lunch . Now she has a hypothesis.

Correlation Matrix

Sally is ready to test her hypothesis: a negative relationship exists between school_rating and reduced_lunch (to be covered in a follow-up article). If the test is successful, she'll need to build a more robust model using additional variables. If the test fails, she'll need to revisit her dataset to choose other variables that might explain school_rating. Either way, Sally could benefit from an efficient way of assessing relationships among her variables.

An efficient graph for assessing relationships is the correlation matrix, as seen below; its color-coded cells make it easier to interpret than the tabular correlation matrix above. Red cells indicate positive correlation; blue cells indicate negative correlation; white cells indicate no correlation. The darker the colors, the stronger the correlation (positive or negative) between those two variables.

With the correlation matrix in mind as a future starting point for finding additional variables, Sally moves on for now and prepares to test her hypothesis.
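A color-coded correlation matrix like the one described above can be sketched with matplotlib's `imshow`. Assumptions: invented data, and the diverging `RdBu_r` colormap, which maps negative values to blue, zero to white and positive values to red when the color limits are fixed at -1 and 1. seaborn's `heatmap` is a common alternative.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical schools (values are invented).
df = pd.DataFrame({
    "school_rating": [0, 1, 2, 3, 4, 5],
    "reduced_lunch": [88.0, 75.0, 60.0, 45.0, 30.0, 15.0],
    "stu_teach_ratio": [15.5, 14.0, 16.2, 14.8, 15.1, 13.9],
})
corr = df.corr()

fig, ax = plt.subplots(figsize=(5, 4))
# Fixing vmin/vmax at -1/1 keeps zero correlation at the white center of the colormap.
im = ax.imshow(corr.to_numpy(), cmap="RdBu_r", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="Pearson correlation")
fig.tight_layout()
# fig.savefig("correlation_matrix.png") would write the figure to disk.
```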

Sally was approached with a problem: why are some schools in middle Tennessee under-performing? To answer this question, she did the following:

Conducted a lit review to educate herself on the topic.
Gathered data from a reputable source to explore school ratings and characteristics of the student bodies and schools in middle Tennessee.
The data indicated a robust relationship between school_rating and reduced_lunch.
Explored the data visually.
Though satisfied with her preliminary findings, Sally is keeping her mind open to other explanations.
Developed a hypothesis: a negative relationship exists between school_rating and reduced_lunch.

In a follow-up article, Sally will test her hypothesis. Should she find a satisfactory explanation for her sample of schools, she will attempt to apply her explanation to the population of schools in Tennessee.


Humanities Data Analysis: Case Studies with Python


Humanities Data Analysis: Case Studies with Python is a practical guide to data-intensive humanities research using the Python programming language. The book, written by Folgert Karsdorp, Mike Kestemont and Allen Riddell, was originally published with Princeton University Press in 2021 (for a printed version of the book, see the publisher’s website), and is now available as an Open Access interactive Jupyter Book.

The book begins with an overview of the place of data science in the humanities, and proceeds to cover data carpentry: the essential techniques for gathering, cleaning, representing, and transforming textual and tabular data. Then, drawing from real-world, publicly available data sets that cover a variety of scholarly domains, the book delves into detailed case studies. Focusing on textual data analysis, the authors explore such diverse topics as network analysis, genre theory, onomastics, literacy, author attribution, mapping, stylometry, topic modeling, and time series analysis. Exercises and resources for further reading are provided at the end of each chapter.

What is the book about?

Learn how to effectively gather, read, store and parse different data formats, such as CSV, XML, HTML, PDF, and JSON data.

Construct Vector Space Models for texts and represent data in a tabular format. Learn how to use these and other representations (such as topics) to assess similarities and distances between texts.

Emphasizes visual storytelling via data visualizations of character networks, patterns of cultural change, statistical distributions, and (shifts in) geographical distributions.

Work on real-world case studies using publicly available data sets. Dive into the world of historical cookbooks, French drama, Danish folktale collections, the Tate art gallery, mysterious medieval manuscripts, and many more.
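The vector space idea mentioned in the list above can be illustrated with a minimal bag-of-words sketch: each text becomes a vector of word counts, and cosine similarity measures how closely two vectors point in the same direction. The sample sentences are invented, and this is not necessarily how the book implements it.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Represent a text as a vector of word counts (one dimension per word)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two count vectors: 1 = same direction, 0 = no shared words."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


d1 = "the cat sat on the mat"
d2 = "the cat lay on the rug"
d3 = "stock prices fell sharply"

v1, v2, v3 = (bag_of_words(d) for d in (d1, d2, d3))
print(cosine_similarity(v1, v2))  # high: many shared words
print(cosine_similarity(v1, v3))  # 0.0: no shared words
```

A distance between texts can be derived as 1 minus the similarity.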

Accompanying Data

The book features a large number of quality datasets. These datasets are published online and are associated with the DOI 10.5281/zenodo.891264. They can be downloaded from the address https://doi.org/10.5281/zenodo.891264.

Citing HDA

If you use Humanities Data Analysis in an academic publication, please cite the original publication:


An introduction to data analysis in Python for the practicing neuroscientist

Mark-Kramer/Case-Studies-Python


This repository is a companion to the textbook Case Studies in Neural Data Analysis, by Mark Kramer and Uri Eden. That textbook uses MATLAB to analyze examples of neuronal data. The material here is similar, except that we use Python.

The intended audience is the practicing neuroscientist - e.g., the students, researchers, and clinicians collecting neuronal data in the hospital or lab. The material can get pretty math-heavy, but we've tried to outline the main concepts as directly as possible, with hands-on implementations of all concepts. We focus on only two main types of data: spike trains and electric fields (such as the local field potential [LFP], or electroencephalogram [EEG]). If you're interested in other data (e.g., calcium imaging, or BOLD), you may still find the examples indirectly useful (for example, demonstrations of how to compute and interpret a power spectrum of a signal).
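As a minimal illustration of the power-spectrum example mentioned above (not code from the repository), the sketch below builds a synthetic signal with 10 Hz and 60 Hz components, assuming a 1000 Hz sampling rate, and computes a one-sided periodogram with NumPy's FFT. The peak of the spectrum lands at the dominant 10 Hz rhythm.

```python
import numpy as np

fs = 1000.0                   # sampling rate in Hz (assumed)
n = 1000                      # 1 s of data
t = np.arange(n) / fs         # sample times in seconds

# Synthetic trace: a strong 10 Hz rhythm plus weaker 60 Hz line noise.
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
x = x - x.mean()              # remove the DC offset before the FFT

X = np.fft.rfft(x)                     # FFT of a real signal (non-negative frequencies only)
freqs = np.fft.rfftfreq(n, d=1 / fs)   # matching frequency axis, 1 Hz resolution here
power = np.abs(X) ** 2 / (fs * n)      # periodogram scaling (power per Hz)
power[1:-1] *= 2                       # fold in negative frequencies; skip DC and Nyquist bins

peak = freqs[np.argmax(power)]
print(f"Dominant frequency: {peak:.1f} Hz")
```

Because the 60 Hz component has half the amplitude, its power is a quarter of the 10 Hz peak. Real recordings usually need tapering or averaging (e.g., Welch's method) to reduce variance and leakage.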

This repository was created by Emily Schlafly and Mark Kramer, with important contributions from Dr. Anthea Cheung.

Thank you to:

MIT Press for publishing the MATLAB version of this material.
NIH NIGMS R25GM114827 and NSF DMS #1451384 for support.

Quick start to learning Python for neural data analysis:

Visit the web-formatted version of the book.
Watch this 2-minute video.
Read and interact with the Python code in your web browser.

Slow start to learning Python for neural data analysis:

There are multiple ways to interact with these notebooks.

Simple: Visit the web-formatted version of the notebooks.

Intermediate: Open a notebook in Binder and interact with the notebooks through a JupyterHub server. Binder provides an easy interface to interact with this material; read about it in eLife.

Advanced: Download the notebooks and run them locally (i.e. on your own computer) in Jupyter. You'll then be able to read, edit and execute the Python code directly in your browser and you can save any changes you make or notes that you want to record. You will need to install Python and we recommend that you configure a Python environment as well.

Install Python

We assume you have installed Python and can get it running on your computer. Some useful references to do so include,

If this is your first time working with Python, using Anaconda is probably a good choice. It provides a simple, graphical interface to start Jupyter.

Configure Python


Once you have installed Anaconda or Miniconda, we recommend setting up an environment to run the notebooks. If you downloaded the repository from GitHub, then you can run the commands below in your terminal to configure your local environment to match the Binder environment. If you have never used the terminal before, consider using Anaconda Navigator, Anaconda's desktop graphical user interface (GUI). The environment file we use on Binder is located in the binder folder.

This will ensure that you have all the packages needed to run the notebooks. Note that you can use make clean to remove the changes made during make config.

Finally, whenever you are ready to work with the notebooks, activate your environment and start Jupyter:

If you prefer, you can use jupyter lab instead of jupyter notebook.

Contributions

We very much appreciate your contributions to this material. Contributions may include:

Error corrections
Suggestions
New material to include (please start from this template ).

There are two ways to suggest a contribution:

Simple: Visit Case Studies Python, locate the file to edit, and follow these instructions.

Advanced: Fork Case Studies Python and submit a pull request.


Python Success Stories

Python is part of the winning formula for productivity, software quality, and maintainability at many companies and institutions around the world. Here are real-life Python success stories, classified by application domain.

Accessibility

Assistive technologies, code generation, computer graphics, configuration, cross-platform development, data mining, documentation development, embedded systems, functional testing, game development, high availability, java and python, legacy system integration, network development, product development, python on windows, rapid application development, rss aggregator, scalability, systems administration, unit testing, unix/linux developers, user interface, visual effects, web development, apparel industry, business information, customer relationship management (crm), collaboration support, content management, document management, energy efficiency, enterprise resource planning (erp), financial services, fortune 500, gis and mapping, human resources, knowledge management, manufacturing, pharmaceuticals, project management, quality control, six sigma, lean manufacturing, relational online analytical processing (rolap), risk management, roi case study, post secondary, administration, homeland security, public safety, traffic control, urban infrastructure, bioinformatics, computational chemistry, data visualization, drug discovery, scientific programming, software development.

Python in The Blind Audio Tactile Mapping System
Python and Zope in the EZRO Content Management System
A Custom Image Viewing Game for an Autistic Child
Cog: A Code Generation Tool Written in Python
Industrial Light & Magic Runs on Python
D-Link Australia Uses Python to Control Firmware Updates
Python as Technology Enabler for TTTech's Development Software
Integration of Legacy Monitoring Systems into a Central Management Console
Python On Guard
Python Powers Journyx Timesheet
The Devil Framework: A Python-based Distributed System For Technology Integration
ForecastWatch.com Uses Python To Help Meteorologists
Gusto! Chooses Python for Travel Social Network Transition
Python Enterprise-Wide at the University of St Andrews in Scotland
Honeywell Avoids Documentation Costs with Python and other Open Standards
Enovad Used Python to Deliver its Armadillo Commercial Anti-Spam Software
Why Python?
Carmanah Lights the Way with Python
Botonomy Uses Python to Create ProjectPipe.com for Web-based Project Management
Rapid Application Development with Python
GravityZoo: Bringing Your Desktop Applications To The Internet As A Service
Suzanne: Python Handles Critical Data in a Domain Name Landrush
Frequentis TAPtools® - Python in Air Traffic Control
AstraZeneca Uses Python for Collaborative Drug Discovery
Acqutek Uses Python to Control CD/DVD Packaging Hardware
IronPython at Resolver Systems: Python Learns New Tricks
Maritime Industry Increases Efficiency with Python
Wing IDE Takes Flight with Python
At Philips, The Semiconductor Line in Fishkill Runs on Python
MayaVi Uses Python for Scientific Data Visualization
Python Streamlines Space Shuttle Mission Design
Python is Rackspace's CORE Technology
DevNet: A Web-based RSS Aggregator Developed in Python
Test&Go Uses Python for Data Validation
Putting Web Services to Work with Python
Verity Ultraseek: Building Successful Enterprise Solutions with Python
Wordstream Uses Python as Their Platform of Choice
LoveIntros Uses Python to Help Northwest Singles Click
ERP5: Mission-critical ERP/CRM with Python and Zope
XIST: An XML Transformation Engine Written in Python
Simulating Biomolecules with Python


Python Machine Learning Case Studies

Five Case Studies for the Data Scientist

© 2017
Danish Haroon

Karachi, Pakistan


Applies a case study-based approach to machine learning
Gives you insights into the core concepts of machine learning and optimization techniques
Uses Python as an aid to implement machine learning



Table of contents (5 chapters)

Statistics and Probability

Time Series

Classification

Machine Learning
Time Series Modelling
Data Analysis

About this book

Gain insights into machine learning concepts
Work on real-world applications of machine learning
Learn concepts of model selection and optimization
Get a hands-on overview of Python from a machine learning point of view


Book Title : Python Machine Learning Case Studies

Book Subtitle : Five Case Studies for the Data Scientist

Authors : Danish Haroon

DOI : https://doi.org/10.1007/978-1-4842-2823-4

Publisher : Apress, Berkeley, CA

eBook Packages : Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)

Softcover ISBN : 978-1-4842-2822-7 Published: 29 October 2017

eBook ISBN : 978-1-4842-2823-4 Published: 27 October 2017

Edition Number : 1

Number of Pages : XVII, 204

Number of Illustrations : 21 b/w illustrations, 99 illustrations in colour

Topics : Artificial Intelligence; Python; Big Data; Programming Languages, Compilers, Interpreters; Programming Techniques; Database Management


Data Science Projects with Python: A case study approach to gaining valuable insights from real data with machine learning

Gain hands-on experience of Python programming with industry-standard machine learning techniques using pandas, scikit-learn, and XGBoost

Key Features

Think critically about data and use it to form and test a hypothesis
Choose an appropriate machine learning model and train it on your data
Communicate data-driven insights with confidence and clarity

Book Description

What you will learn

Load, explore, and process data using the pandas Python package
Use Matplotlib to create compelling data visualizations
Implement predictive machine learning models with scikit-learn
Use lasso and ridge regression to reduce model overfitting
Evaluate random forest and logistic regression model performance
Deliver business insights by presenting clear, convincing conclusions

Who this book is for

Data Science Projects with Python – Second Edition is for anyone who wants to get started with data science and machine learning. If you’re keen to advance your career by using data analysis and predictive modeling to generate business insights, then this book is the perfect place to begin. To quickly grasp the concepts covered, it is recommended that you have basic experience programming in Python or a similar language, and a general interest in statistics.


Dijkstra’s Algorithm Explained: Implementing with Python for Optimal Pathfinding

Connectivity between cities poses many problems for the supply chain. Many towns in Himachal Pradesh and Jammu and Kashmir have limited rail connectivity, so reducing transportation and other costs is essential.

Dijkstra’s algorithm is one such method for finding the minimum distance between two cities or nodes. In this article, we will discuss this algorithm and understand its Python implementation.

Dijkstra’s algorithm is an efficient technique for finding the shortest paths between nodes in a graph whose edge weights are non-negative. It works by iteratively determining the minimal distance from a starting node to all other nodes, using a priority queue so that the closest unvisited node is always processed first. This Python tutorial explains how to implement Dijkstra’s algorithm to compute shortest paths effectively.


Understanding Dijkstra’s Algorithm

Dijkstra’s algorithm is a staple of computer science courses. It measures the shortest distance from a starting node to every other node in a graph whose edges carry non-negative weights. The process is iterative: every node’s distance is initialized to infinity except the starting node’s, which is zero, and each step settles the unvisited node with the smallest known distance and then tries to shorten the distances to its neighbors.
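The update applied to each neighbor in that last step, usually called edge relaxation, can be written as follows. The notation is chosen here for illustration and is not taken from the original tutorial: $s$ is the starting node, $d(v)$ is the current tentative distance to node $v$, and $w(u, v) \ge 0$ is the weight of the edge from the settled node $u$ to $v$.

```latex
\begin{aligned}
d(s) &= 0, \qquad d(v) = \infty \quad \text{for every } v \neq s,\\
d(v) &\leftarrow \min\bigl(d(v),\; d(u) + w(u, v)\bigr) \quad \text{for each edge } (u, v) \text{ leaving } u.
\end{aligned}
```

Because no weight is negative, the unvisited node with the smallest tentative distance cannot be reached more cheaply through any other unvisited node, so its distance is final once it is selected.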

Let us now look at its implementation in Python.

Python Code Implementation

Let us walk through the implementation step by step.

The implementation relies on three libraries: networkx to build and plot the graph, heapq to maintain the priority queue, and matplotlib to render the visualization.
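Of these three libraries, only heapq is needed by the algorithm itself. The following minimal sketch shows how it serves as a priority queue: entries are (distance, node) tuples, and `heapq.heappop` always returns the entry with the smallest distance. The node labels and distances are placeholders chosen for illustration.

```python
import heapq

# Priority queue of (tentative_distance, node) pairs; tuples compare by distance first.
queue = []
heapq.heappush(queue, (7, "C"))
heapq.heappush(queue, (2, "B"))
heapq.heappush(queue, (0, "A"))
heapq.heappush(queue, (5, "D"))

# Popping repeatedly yields the nodes in order of increasing distance.
order = []
while queue:
    distance, node = heapq.heappop(queue)
    order.append((distance, node))

print(order)  # [(0, 'A'), (2, 'B'), (5, 'D'), (7, 'C')]
```

When two entries have the same distance, tuple comparison falls back to the node labels, so the labels should be mutually comparable (strings here).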

The main function computes the shortest distance from the starting node to every other node, where each edge between two nodes carries a weight. It also initializes a priority queue that determines the order in which nodes are processed.

Next, a loop repeatedly takes the node with the smallest tentative distance from the heapq priority queue. For each neighbor of that node, it computes the distance through the current node; if that is shorter than the best distance found so far, it records the new distance and pushes the neighbor onto the queue with that distance as its priority.
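The initialization and loop described above can be sketched as follows. This is a minimal standard-library version, not the tutorial's original code. It assumes the graph is a dictionary mapping each node to a dictionary of `{neighbor: weight}`, that every node appears as a key, and that all weights are non-negative. The function names `dijkstra` and `path_to` and the example weights are illustrative. The sketch also records each node's predecessor, so the route itself, not just its length, can be recovered.

```python
import heapq
import math


def dijkstra(graph, start):
    """Return (distances, previous) for the nodes of `graph`.

    `graph` maps every node to a dict of {neighbor: non-negative weight}.
    `previous` maps each reached node to the node before it on its shortest path.
    """
    distances = {node: math.inf for node in graph}
    distances[start] = 0
    previous = {start: None}
    queue = [(0, start)]  # (tentative distance, node)

    while queue:
        current_distance, current = heapq.heappop(queue)
        # Skip stale entries: a shorter distance to this node was already found.
        if current_distance > distances[current]:
            continue
        for neighbor, weight in graph[current].items():
            candidate = current_distance + weight
            # Relax the edge if going through `current` is shorter.
            if candidate < distances[neighbor]:
                distances[neighbor] = candidate
                previous[neighbor] = current
                heapq.heappush(queue, (candidate, neighbor))

    return distances, previous


def path_to(previous, target):
    """Follow the predecessor links back from `target` to the start node."""
    if target not in previous:
        return []  # target was never reached
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical example graph with starting node "A".
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}
distances, previous = dijkstra(graph, "A")
print(distances)               # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
print(path_to(previous, "D"))  # ['A', 'C', 'B', 'D']
```

The `current_distance > distances[current]` check is a common lazy-deletion technique: heapq has no decrease-key operation, so outdated queue entries are simply skipped when they are popped.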

In the example, the graph is a dictionary whose nodes and edge weights are predefined. The starting node is ‘A’, and the algorithm stores its results in a separate dictionary of distances. The graph is then visualized with networkx and matplotlib.

The implementation is iterative. Let us look at its output.

Case Study: Shortest Path in Indian Cities

Let us understand this concept with a simple case study. The graph contains several Indian cities, such as Mumbai and Nagpur, and the goal is to minimize the distance travelled by finding the best route from Nagpur to each of the other cities.

The code closely follows the implementation from the previous section, and its output highlights the shortest routes in red.
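The routes highlighted in red form the shortest-path tree rooted at Nagpur, which can be read off the predecessor links that Dijkstra’s algorithm records. A minimal sketch follows. The road network is assumed to be undirected, and the edge weights are hypothetical placeholder values, not real road distances. The cities other than Mumbai and Nagpur, and the helper names `add_road` and `shortest_path_tree_edges`, are likewise assumptions for illustration.

```python
import heapq
import math


def add_road(graph, a, b, weight):
    """Add an undirected road between two cities (hypothetical helper)."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


def shortest_path_tree_edges(graph, start):
    """Run Dijkstra from `start`; return (distances, edges of the shortest-path tree)."""
    distances = {city: math.inf for city in graph}
    distances[start] = 0
    previous = {}
    queue = [(0, start)]
    while queue:
        dist, city = heapq.heappop(queue)
        if dist > distances[city]:
            continue  # stale queue entry
        for neighbor, weight in graph[city].items():
            candidate = dist + weight
            if candidate < distances[neighbor]:
                distances[neighbor] = candidate
                previous[neighbor] = city
                heapq.heappush(queue, (candidate, neighbor))
    # Each (predecessor, city) pair is one edge on a shortest route from `start`.
    edges = [(prev, city) for city, prev in previous.items()]
    return distances, edges


# Hypothetical road network: weights are placeholders, not real distances.
roads = {}
add_road(roads, "Nagpur", "Mumbai", 8)
add_road(roads, "Nagpur", "Hyderabad", 5)
add_road(roads, "Nagpur", "Bhopal", 3)
add_road(roads, "Bhopal", "Mumbai", 4)
add_road(roads, "Hyderabad", "Pune", 6)
add_road(roads, "Mumbai", "Pune", 1)

distances, red_edges = shortest_path_tree_edges(roads, "Nagpur")
print(distances)  # {'Nagpur': 0, 'Mumbai': 7, 'Hyderabad': 5, 'Bhopal': 3, 'Pune': 8}
print(red_edges)
```

With networkx, an edge list like `red_edges` can be passed to `nx.draw_networkx_edges(G, pos, edgelist=red_edges, edge_color="red")` to draw the highlighting; that plotting step is not shown here.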

You’ve now mastered Dijkstra’s algorithm and its practical application in Python. Whether optimizing routes or solving complex network issues, this algorithm stands as a cornerstone in the field of computational theory and operations research. How might these principles be applied to optimize daily logistics in your community?



Python Case Studies
A case study in Python: creating an ML model to predict the appropriate price of a given diamond; predicting the right price for an old car using Python machine learning; creating an ML model to forecast the hourly demand for rental bikes; estimating the price of a computer based on its specs.
Data Science Case Studies: Solved and Explained
Feb 21, 2021. Solving a Data Science case study means analyzing and solving a problem statement intensively. Solving case studies will help you show unique and amazing data science use ...
16 Real World Case Studies of Machine Learning
6. Machine Learning Case Study on Tesla. Tesla is now a big name in the electric automobile industry and the chances that it will continue to be the trending topic for years to come are really high. It is popular and extensively known for its advanced and futuristic cars and their advanced models.
Introduction
Python Case Studies: This project is a collection of six captivating case studies that use Python and computational techniques to analyse data, build classification models and unravel insights on multifaceted datasets. The array of topics touches on different domains of knowledge. Central to these studies is the application of tools and concepts.
Data Science Case Studies: Solved using Python
February 19, 2021. Machine Learning. Solving a Data Science case study means analyzing and solving a problem statement intensively. Solving case studies will help you show unique and amazing data science use cases in your portfolio. In this article, I'm going to introduce you to 3 data science case studies solved and explained using Python.
Python Case Study Collection: Unveiling Real-World Solutions
Welcome to the "Python Case Studies for Data Analysts and Data Scientists" playlist! This carefully curated collection of case studies is designed to provide...
Essential Statistics for Data Science: A Case Study using Python, Part I
Authors: Tim Dobbins (Engineer & Statistician) and John Burke (Research Analyst). Get to know some of the essential statistics you should be very familiar with when learning data science.
Case-Studies-Python
This repository is a companion to the textbook Case Studies in Neural Data Analysis, by Mark Kramer and Uri Eden. That textbook uses MATLAB to analyze examples of neuronal data. The material here is similar, except that we use Python. The intended audience is the practicing neuroscientist - e.g., the students, researchers, and clinicians ...
Data Science Projects with Python: A case study approach to successful
Basic knowledge of Python and data analytics will help you get the most from this book. Familiarity with mathematical concepts such as algebra and basic statistics will also be useful. ... Data Science Projects with Python: A case study approach to successful data science projects using Python, pandas, and scikit-learn: Author: Stephen ...
Data Science Projects with Python: A case study approach to gaining
This creates a case-study approach that simulates the working conditions you'll experience in real-world data science projects. You'll learn how to use key Python packages, including pandas, Matplotlib, and scikit-learn, and master the process of data exploration and data processing, before moving on to fitting, evaluating, and tuning ...
A Data Science Case Study with Python: Mercari Price Prediction
In this case study, we will walk through the Analysis, Modelling and Communication part of the workflow. The general steps involved for solving a data science problem are as follows: ... Those of you who are not familiar with the field of Data Science and Python programming language can still follow through the article as it will give a high ...
Humanities Data Analysis: Case Studies with Python
Humanities Data Analysis: Case Studies with Python is a practical guide to data-intensive humanities research using the Python programming language. The book, written by Folgert Karsdorp, Mike Kestemont and Allen Riddell, was originally published with Princeton University Press in 2021 (for a printed version of the book, see the publisher's website), and is now available as an Open Access ...
Python Case Studies that Programmers should know
A case study is a detailed, in-depth examination of a specific situation, problem, or project. In the context of Python, a Python case study is a detailed examination of how Python has been used to solve a specific problem or achieve a particular goal. Python case studies can be useful for a number of purposes, including: 1.
Python Machine Learning Case Studies
The book uses a hands-on case study-based approach to crack real-world applications to which machine learning concepts can be applied. These smarter machines will enable your business processes to achieve efficiencies with minimal time and resources. Python Machine Learning Case Studies takes you through the steps to improve business processes ...
(PDF) Python Programming with Case Studies
If you don't know how, then this book titled "Analyzing and Visualizing Data using Free Open Source Software-Python Programming with case studies" (Jupyter Notebook 6.0.3 and Python 3.7.6) is an ...
11 Real World Applications for Python Skills
A data engineer could use their Python skills to build a pipeline that automates collection from the various sources, joins and cleans the data, and makes it easier for analysts to access and filter. 7. Robotics. Python is a popular language in the field of robotics, both among hobbyists and professionals.
Python for the practicing neuroscientist
Python for the practicing neuroscientist. To be frank: this notebook is rather boring. Throughout all of the case studies, we will use the software package Python. The best way to learn new software (and probably most things) is when motivated by a particular problem.
Data Science Real-World Case Studies
The purpose of this course is to provide you with knowledge of key aspects of data science applications in business in a practical, easy and fun way. The course provides students with practical hands-on experience using real-world datasets. Task 1: Predict the success of a Zomato restaurant. Develop a machine learning model to predict ...
Hypothesis Testing with Python: Step by step hands-on tutorial with
It tests the null hypothesis that the population variances are equal (called homogeneity of variance or homoscedasticity). If the resulting p-value of Levene's test is less than the significance level (typically 0.05), the obtained differences in sample variances are unlikely to have occurred based on random sampling from a population with equal variances.
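The hypothesis-testing entry above describes Levene's test in words; its statistic can be sketched with the standard library alone. This minimal sketch assumes the original mean-centred form (SciPy's `scipy.stats.levene` centres on the median by default, the Brown-Forsythe variant). Each observation is replaced by its absolute deviation from its group mean, and W = (N - k) / (k - 1) times the between-group sum of squares of those deviations divided by their within-group sum of squares, for k groups and N observations. Converting W into a p-value requires the F distribution with (k - 1, N - k) degrees of freedom and is not shown. The function name `levene_w` and the sample data are illustrative.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic (mean-centred variant) for two or more samples.

    Assumes at least two groups and some within-group variation in the
    absolute deviations; otherwise the denominator is zero.
    """
    k = len(groups)
    # Replace each observation by its absolute deviation from its group mean.
    z = []
    for g in groups:
        centre = mean(g)
        z.append([abs(x - centre) for x in g])

    n_total = sum(len(zg) for zg in z)
    z_means = [mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zg) * (m - z_grand) ** 2 for zg, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zg, m in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within


# Illustrative samples: equal spreads, then clearly unequal spreads.
same_spread = levene_w([1, 2, 3], [4, 5, 6])
different_spread = levene_w([9, 10, 11], [0, 10, 20])
print(same_spread, different_spread)
```

If SciPy is available, `scipy.stats.levene(a, b, center="mean")` should report the same statistic together with its p-value.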