Chapman University Digital Commons


Computational and Data Sciences (PhD) Dissertations

Below is a selection of dissertations from the Doctor of Philosophy in Computational and Data Sciences program in Schmid College that have been included in Chapman University Digital Commons. Additional dissertations from years prior to 2019 are available through the Leatherby Libraries' print collection or in ProQuest's Dissertations and Theses database.

Dissertations from 2024

A Novel Correction for the Multivariate Ljung-Box Test , Minhao Huang

Machine Learning and Geostatistical Approaches for Discovery of Weather and Climate Events Related to El Niño Phenomena , Sachi Perera

Global to Glocal: A Confluence of Data Science and Earth Observations in the Advancement of the SDGs , Rejoice Thomas

Dissertations from 2023

Computational Analysis of Antibody Binding Mechanisms to the Omicron RBD of SARS-CoV-2 Spike Protein: Identification of Epitopes and Hotspots for Developing Effective Therapeutic Strategies , Mohammed Alshahrani

Integration of Computer Algebra Systems and Machine Learning in the Authoring of the SANYMS Intelligent Tutoring System , Sam Ford

Voluntary Action and Conscious Intention , Jake Gavenas

Random Variable Spaces: Mathematical Properties and an Extension to Programming Computable Functions , Mohammed Kurd-Misto

Computational Modeling of Superconductivity from the Set of Time-Dependent Ginzburg-Landau Equations for Advancements in Theory and Applications , Iris Mowgood

Application of Machine Learning Algorithms for Elucidation of Biological Networks from Time Series Gene Expression Data , Krupa Nagori

Stochastic Processes and Multi-Resolution Analysis: A Trigonometric Moment Problem Approach and an Analysis of the Expenditure Trends for Diabetic Patients , Isaac Nwi-Mozu

Applications of Causal Inference Methods for the Estimation of Effects of Bone Marrow Transplant and Prescription Drugs on Survival of Aplastic Anemia Patients , Yesha M. Patel

Causal Inference and Machine Learning Methods in Parkinson's Disease Data Analysis , Albert Pierce

Causal Inference Methods for Estimation of Survival and General Health Status Measures of Alzheimer’s Disease Patients , Ehsan Yaghmaei

Dissertations from 2022

Computational Approaches to Facilitate Automated Interchange between Music and Art , Rao Hamza Ali

Causal Inference in Psychology and Neuroscience: From Association to Causation , Dehua Liang

Advances in NLP Algorithms on Unstructured Medical Notes Data and Approaches to Handling Class Imbalance Issues , Hanna Lu

Novel Techniques for Quantifying Secondhand Smoke Diffusion into Children's Bedroom , Sunil Ramchandani

Probing the Boundaries of Human Agency , Sook Mun Wong

Dissertations from 2021

Predicting Eye Movement and Fixation Patterns on Scenic Images Using Machine Learning for Children with Autism Spectrum Disorder , Raymond Anden

Forecasting the Prices of Cryptocurrencies using a Novel Parameter Optimization of VARIMA Models , Alexander Barrett

Applications of Machine Learning to Facilitate Software Engineering and Scientific Computing , Natalie Best

Exploring Behaviors of Software Developers and Their Code Through Computational and Statistical Methods , Elia Eiroa Lledo

Assessing the Re-Identification Risk in ECG Datasets and an Application of Privacy Preserving Techniques in ECG Analysis , Arin Ghazarian

Multi-Modal Data Fusion, Image Segmentation, and Object Identification using Unsupervised Machine Learning: Conception, Validation, Applications, and a Basis for Multi-Modal Object Detection and Tracking , Nicholas LaHaye

Machine-Learning-Based Approach to Decoding Physiological and Neural Signals , Elnaz Lashgari

Learning-Based Modeling of Weather and Climate Events Related To El Niño Phenomenon via Differentiable Programming and Empirical Decompositions , Justin Le

Quantum State Estimation and Tracking for Superconducting Processors Using Machine Learning , Shiva Lotfallahzadeh Barzili

Novel Applications of Statistical and Machine Learning Methods to Analyze Trial-Level Data from Cognitive Measures , Chelsea Parlett

Optimal Analytical Methods for High Accuracy Cardiac Disease Classification and Treatment Based on ECG Data , Jianwei Zheng

Dissertations from 2020

Development of Integrated Machine Learning and Data Science Approaches for the Prediction of Cancer Mutation and Autonomous Drug Discovery of Anti-Cancer Therapeutic Agents , Steven Agajanian

Allocation of Public Resources: Bringing Order to Chaos , Lance Clifner

A Novel Correction for the Adjusted Box-Pierce Test — New Risk Factors for Emergency Department Return Visits within 72 hours for Children with Respiratory Conditions — General Pediatric Model for Understanding and Predicting Prolonged Length of Stay , Sidy Danioko

A Computational and Experimental Examination of the FCC Incentive Auction , Logan Gantner

Exploring the Employment Landscape for Individuals with Autism Spectrum Disorders using Supervised and Unsupervised Machine Learning , Kayleigh Hyde

Integrated Machine Learning and Bioinformatics Approaches for Prediction of Cancer-Driving Gene Mutations , Oluyemi Odeyemi

On Quantum Effects of Vector Potentials and Generalizations of Functional Analysis , Ismael L. Paiva

Long Term Ground Based Precipitation Data Analysis: Spatial and Temporal Variability , Luciano Rodriguez

Gaining Computational Insight into Psychological Data: Applications of Machine Learning with Eating Disorders and Autism Spectrum Disorder , Natalia Rosenfield

Connecting the Dots for People with Autism: A Data-driven Approach to Designing and Evaluating a Global Filter , Viseth Sean

Novel Statistical and Machine Learning Methods for the Forecasting and Analysis of Major League Baseball Player Performance , Christopher Watkins

Dissertations from 2019

Contributions to Variable Selection in Complexly Sampled Case-control Models, Epidemiology of 72-hour Emergency Department Readmission, and Out-of-site Migration Rate Estimation Using Pseudo-tagged Longitudinal Data , Kyle Anderson

Bias Reduction in Machine Learning Classifiers for Spatiotemporal Analysis of Coral Reefs using Remote Sensing Images , Justin J. Gapper

Estimating Auction Equilibria using Individual Evolutionary Learning , Kevin James

Employing Earth Observations and Artificial Intelligence to Address Key Global Environmental Challenges in Service of the SDGs , Wenzhao Li

Image Restoration using Automatic Damaged Regions Detection and Machine Learning-Based Inpainting Technique , Chloe Martin-King

Theses from 2017

Optimized Forecasting of Dominant U.S. Stock Market Equities Using Univariate and Multivariate Time Series Analysis Methods , Michael Schwartz


ISSN 2572-1496


DigitalCommons@Kennesaw State University


Doctor of Data Science and Analytics Dissertations


The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus on application and research: students engage with real-world business problems, which inform and guide their research interests.

We launched the first formal PhD program in Data Science in 2015. Our program sits at the intersection of computer science, statistics, mathematics, and business. Our students engage in relevant research with faculty from across our eleven colleges. As one of the institutions on the forefront of the development of data science as an academic discipline, we are committed to developing the next generation of Data Science leaders, researchers, and educators. Culturally, we are committed to the discipline of Data Science, through ethical practices, attention to fairness, to a diverse student body, to academic excellence, and research which makes positive contributions to our local, regional, and global community. -Sherry Ni, Director, Ph.D. in Data Science and Analytics

This degree trains individuals to facilitate innovative research and to translate complex data, both structured and unstructured, into information that improves decision making. The curriculum places heavy emphasis on programming, data mining, statistical modeling, and the mathematical foundations that support these concepts. Importantly, the program also emphasizes communication skills, both oral and written, as well as application and tying results to business and research problems.


Dissertations from 2023

Quantification of Various Types of Biases in Large Language Models , Sudhashree Sayenju

Dissertations from 2022

Appley: Approximate Shapley Values for Model Explainability in Linear Time , Md Shafiul Alam

Ethical Analytics: A Framework for a Practically-Oriented Sub-Discipline of AI Ethics , Jonathan Boardman

Novel Instance-Level Weighted Loss Function for Imbalanced Learning , Trent Geisler

Debiasing Cyber Incidents – Correcting for Reporting Delays and Under-reporting , Seema Sangari

Dissertations from 2021

Integrated Machine Learning Approaches to Improve Classification performance and Feature Extraction Process for EEG Dataset , Mohammad Masum

A Distance-Based Clustering Framework for Categorical Time Series: A Case Study in Episodes of Care Healthcare Delivery System , Lauren Staples

Dissertations from 2020

A CREDIT ANALYSIS OF THE UNBANKED AND UNDERBANKED: AN ARGUMENT FOR ALTERNATIVE DATA , Edwin Baidoo

Quantitatively Motivated Model Development Framework: Downstream Analysis Effects of Normalization Strategies , Jessica M. Rudd

Data-driven Investment Decisions in P2P Lending: Strategies of Integrating Credit Scoring and Profit Scoring , Yan Wang

A Novel Penalized Log-likelihood Function for Class Imbalance Problem , Lili Zhang

ATTACK AND DEFENSE IN SECURITY ANALYTICS , Yiyun Zhou

Dissertations from 2019

One and Two-Step Estimation of Time Variant Parameters and Nonparametric Quantiles , Bogdan Gadidov

Biologically Interpretable, Integrative Deep Learning for Cancer Survival Analysis , Jie Hao

Deep Embedding Kernel , Linh Le

Ordinal HyperPlane Loss , Bob Vanderheyden


DigitalCommons@Kennesaw State University ISSN: 2576-6805

Machine Learning - CMU

PhD Dissertations


[all are .pdf files].

Learning Models that Match Jacob Tyo, 2024

Improving Human Integration across the Machine Learning Pipeline Charvi Rastogi, 2024

Reliable and Practical Machine Learning for Dynamic Healthcare Settings Helen Zhou, 2023

Automatic customization of large-scale spiking network models to neuronal population activity (unavailable) Shenghao Wu, 2023

Estimation of BVk functions from scattered data (unavailable) Addison J. Hu, 2023

Rethinking object categorization in computer vision (unavailable) Jayanth Koushik, 2023

Advances in Statistical Gene Networks Jinjin Tian, 2023

Post-hoc calibration without distributional assumptions Chirag Gupta, 2023

The Role of Noise, Proxies, and Dynamics in Algorithmic Fairness Nil-Jana Akpinar, 2023

Collaborative learning by leveraging siloed data Sebastian Caldas, 2023

Modeling Epidemiological Time Series Aaron Rumack, 2023

Human-Centered Machine Learning: A Statistical and Algorithmic Perspective Leqi Liu, 2023

Uncertainty Quantification under Distribution Shifts Aleksandr Podkopaev, 2023

Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There Benjamin Eysenbach, 2023

Comparing Forecasters and Abstaining Classifiers Yo Joong Choe, 2023

Using Task Driven Methods to Uncover Representations of Human Vision and Semantics Aria Yuan Wang, 2023

Data-driven Decisions - An Anomaly Detection Perspective Shubhranshu Shekhar, 2023

Applied Mathematics of the Future Kin G. Olivares, 2023

METHODS AND APPLICATIONS OF EXPLAINABLE MACHINE LEARNING Joon Sik Kim, 2023

NEURAL REASONING FOR QUESTION ANSWERING Haitian Sun, 2023

Principled Machine Learning for Societally Consequential Decision Making Amanda Coston, 2023

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Maxwell B. Wang, 2023

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Darby M. Losey, 2023

Calibrated Conditional Density Models and Predictive Inference via Local Diagnostics David Zhao, 2023

Towards an Application-based Pipeline for Explainability Gregory Plumb, 2022

Objective Criteria for Explainable Machine Learning Chih-Kuan Yeh, 2022

Making Scientific Peer Review Scientific Ivan Stelmakh, 2022

Facets of regularization in high-dimensional learning: Cross-validation, risk monotonization, and model complexity Pratik Patil, 2022

Active Robot Perception using Programmable Light Curtains Siddharth Ancha, 2022

Strategies for Black-Box and Multi-Objective Optimization Biswajit Paria, 2022

Unifying State and Policy-Level Explanations for Reinforcement Learning Nicholay Topin, 2022

Sensor Fusion Frameworks for Nowcasting Maria Jahja, 2022

Equilibrium Approaches to Modern Deep Learning Shaojie Bai, 2022

Towards General Natural Language Understanding with Probabilistic Worldbuilding Abulhair Saparov, 2022

Applications of Point Process Modeling to Spiking Neurons (Unavailable) Yu Chen, 2021

Neural variability: structure, sources, control, and data augmentation Akash Umakantha, 2021

Structure and time course of neural population activity during learning Jay Hennig, 2021

Cross-view Learning with Limited Supervision Yao-Hung Hubert Tsai, 2021

Meta Reinforcement Learning through Memory Emilio Parisotto, 2021

Learning Embodied Agents with Scalably-Supervised Reinforcement Learning Lisa Lee, 2021

Learning to Predict and Make Decisions under Distribution Shift Yifan Wu, 2021

Statistical Game Theory Arun Sai Suggala, 2021

Towards Knowledge-capable AI: Agents that See, Speak, Act and Know Kenneth Marino, 2021

Learning and Reasoning with Fast Semidefinite Programming and Mixing Methods Po-Wei Wang, 2021

Bridging Language in Machines with Language in the Brain Mariya Toneva, 2021

Curriculum Learning Otilia Stretcu, 2021

Principles of Learning in Multitask Settings: A Probabilistic Perspective Maruan Al-Shedivat, 2021

Towards Robust and Resilient Machine Learning Adarsh Prasad, 2021

Towards Training AI Agents with All Types of Experiences: A Unified ML Formalism Zhiting Hu, 2021

Building Intelligent Autonomous Navigation Agents Devendra Chaplot, 2021

Learning to See by Moving: Self-supervising 3D Scene Representations for Perception, Control, and Visual Reasoning Hsiao-Yu Fish Tung, 2021

Statistical Astrophysics: From Extrasolar Planets to the Large-scale Structure of the Universe Collin Politsch, 2020

Causal Inference with Complex Data Structures and Non-Standard Effects Kwhangho Kim, 2020

Networks, Point Processes, and Networks of Point Processes Neil Spencer, 2020

Dissecting neural variability using population recordings, network models, and neurofeedback (Unavailable) Ryan Williamson, 2020

Predicting Health and Safety: Essays in Machine Learning for Decision Support in the Public Sector Dylan Fitzpatrick, 2020

Towards a Unified Framework for Learning and Reasoning Han Zhao, 2020

Learning DAGs with Continuous Optimization Xun Zheng, 2020

Machine Learning and Multiagent Preferences Ritesh Noothigattu, 2020

Learning and Decision Making from Diverse Forms of Information Yichong Xu, 2020

Towards Data-Efficient Machine Learning Qizhe Xie, 2020

Change modeling for understanding our world and the counterfactual one(s) William Herlands, 2020

Machine Learning in High-Stakes Settings: Risks and Opportunities Maria De-Arteaga, 2020

Data Decomposition for Constrained Visual Learning Calvin Murdock, 2020

Structured Sparse Regression Methods for Learning from High-Dimensional Genomic Data Micol Marchetti-Bowick, 2020

Towards Efficient Automated Machine Learning Liam Li, 2020

LEARNING COLLECTIONS OF FUNCTIONS Emmanouil Antonios Platanios, 2020

Provable, structured, and efficient methods for robustness of deep networks to adversarial examples Eric Wong, 2020

Reconstructing and Mining Signals: Algorithms and Applications Hyun Ah Song, 2020

Probabilistic Single Cell Lineage Tracing Chieh Lin, 2020

Graphical network modeling of phase coupling in brain activity (unavailable) Josue Orellana, 2019

Strategic Exploration in Reinforcement Learning - New Algorithms and Learning Guarantees Christoph Dann, 2019

Learning Generative Models using Transformations Chun-Liang Li, 2019

Estimating Probability Distributions and their Properties Shashank Singh, 2019

Post-Inference Methods for Scalable Probabilistic Modeling and Sequential Decision Making Willie Neiswanger, 2019

Accelerating Text-as-Data Research in Computational Social Science Dallas Card, 2019

Multi-view Relationships for Analytics and Inference Eric Lei, 2019

Information flow in networks based on nonstationary multivariate neural recordings Natalie Klein, 2019

Competitive Analysis for Machine Learning & Data Science Michael Spece, 2019

The When, Where and Why of Human Memory Retrieval Qiong Zhang, 2019

Towards Effective and Efficient Learning at Scale Adams Wei Yu, 2019

Towards Literate Artificial Intelligence Mrinmaya Sachan, 2019

Learning Gene Networks Underlying Clinical Phenotypes Under SNP Perturbations From Genome-Wide Data Calvin McCarter, 2019

Unified Models for Dynamical Systems Carlton Downey, 2019

Anytime Prediction and Learning for the Balance between Computation and Accuracy Hanzhang Hu, 2019

Statistical and Computational Properties of Some "User-Friendly" Methods for High-Dimensional Estimation Alnur Ali, 2019

Nonparametric Methods with Total Variation Type Regularization Veeranjaneyulu Sadhanala, 2019

New Advances in Sparse Learning, Deep Networks, and Adversarial Learning: Theory and Applications Hongyang Zhang, 2019

Gradient Descent for Non-convex Problems in Modern Machine Learning Simon Shaolei Du, 2019

Selective Data Acquisition in Learning and Decision Making Problems Yining Wang, 2019

Anomaly Detection in Graphs and Time Series: Algorithms and Applications Bryan Hooi, 2019

Neural dynamics and interactions in the human ventral visual pathway Yuanning Li, 2018

Tuning Hyperparameters without Grad Students: Scaling up Bandit Optimisation Kirthevasan Kandasamy, 2018

Teaching Machines to Classify from Natural Language Interactions Shashank Srivastava, 2018

Statistical Inference for Geometric Data Jisu Kim, 2018

Representation Learning @ Scale Manzil Zaheer, 2018

Diversity-promoting and Large-scale Machine Learning for Healthcare Pengtao Xie, 2018

Distribution and Histogram (DIsH) Learning Junier Oliva, 2018

Stress Detection for Keystroke Dynamics Shing-Hon Lau, 2018

Sublinear-Time Learning and Inference for High-Dimensional Models Enxu Yan, 2018

Neural population activity in the visual cortex: Statistical methods and application Benjamin Cowley, 2018

Efficient Methods for Prediction and Control in Partially Observable Environments Ahmed Hefny, 2018

Learning with Staleness Wei Dai, 2018

Statistical Approach for Functionally Validating Transcription Factor Bindings Using Population SNP and Gene Expression Data Jing Xiang, 2017

New Paradigms and Optimality Guarantees in Statistical Learning and Estimation Yu-Xiang Wang, 2017

Dynamic Question Ordering: Obtaining Useful Information While Reducing User Burden Kirstin Early, 2017

New Optimization Methods for Modern Machine Learning Sashank J. Reddi, 2017

Active Search with Complex Actions and Rewards Yifei Ma, 2017

Why Machine Learning Works George D. Montañez, 2017

Source-Space Analyses in MEG/EEG and Applications to Explore Spatio-temporal Neural Dynamics in Human Vision Ying Yang, 2017

Computational Tools for Identification and Analysis of Neuronal Population Activity Pengcheng Zhou, 2016

Expressive Collaborative Music Performance via Machine Learning Gus (Guangyu) Xia, 2016

Supervision Beyond Manual Annotations for Learning Visual Representations Carl Doersch, 2016

Exploring Weakly Labeled Data Across the Noise-Bias Spectrum Robert W. H. Fisher, 2016

Optimizing Optimization: Scalable Convex Programming with Proximal Operators Matt Wytock, 2016

Combining Neural Population Recordings: Theory and Application William Bishop, 2015

Discovering Compact and Informative Structures through Data Partitioning Madalina Fiterau-Brostean, 2015

Machine Learning in Space and Time Seth R. Flaxman, 2015

The Time and Location of Natural Reading Processes in the Brain Leila Wehbe, 2015

Shape-Constrained Estimation in High Dimensions Min Xu, 2015

Spectral Probabilistic Modeling and Applications to Natural Language Processing Ankur Parikh, 2015

Computational and Statistical Advances in Testing and Learning Aaditya Kumar Ramdas, 2015

Corpora and Cognition: The Semantic Composition of Adjectives and Nouns in the Human Brain Alona Fyshe, 2015

Learning Statistical Features of Scene Images Wooyoung Lee, 2014

Towards Scalable Analysis of Images and Videos Bin Zhao, 2014

Statistical Text Analysis for Social Science Brendan T. O'Connor, 2014

Modeling Large Social Networks in Context Qirong Ho, 2014

Semi-Cooperative Learning in Smart Grid Agents Prashant P. Reddy, 2013

On Learning from Collective Data Liang Xiong, 2013

Exploiting Non-sequence Data in Dynamic Model Learning Tzu-Kuo Huang, 2013

Mathematical Theories of Interaction with Oracles Liu Yang, 2013

Short-Sighted Probabilistic Planning Felipe W. Trevizan, 2013

Statistical Models and Algorithms for Studying Hand and Finger Kinematics and their Neural Mechanisms Lucia Castellanos, 2013

Approximation Algorithms and New Models for Clustering and Learning Pranjal Awasthi, 2013

Uncovering Structure in High-Dimensions: Networks and Multi-task Learning Problems Mladen Kolar, 2013

Learning with Sparsity: Structures, Optimization and Applications Xi Chen, 2013

GraphLab: A Distributed Abstraction for Large Scale Machine Learning Yucheng Low, 2013

Graph Structured Normal Means Inference James Sharpnack, 2013 (Joint Statistics & ML PhD)

Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data Hai-Son Phuoc Le, 2013

Learning Large-Scale Conditional Random Fields Joseph K. Bradley, 2013

New Statistical Applications for Differential Privacy Rob Hall, 2013 (Joint Statistics & ML PhD)

Parallel and Distributed Systems for Probabilistic Reasoning Joseph Gonzalez, 2012

Spectral Approaches to Learning Predictive Representations Byron Boots, 2012

Attribute Learning using Joint Human and Machine Computation Edith L. M. Law, 2012

Statistical Methods for Studying Genetic Variation in Populations Suyash Shringarpure, 2012

Data Mining Meets HCI: Making Sense of Large Graphs Duen Horng (Polo) Chau, 2012

Learning with Limited Supervision by Input and Output Coding Yi Zhang, 2012

Target Sequence Clustering Benjamin Shih, 2011

Nonparametric Learning in High Dimensions Han Liu, 2010 (Joint Statistics & ML PhD)

Structural Analysis of Large Networks: Observations and Applications Mary McGlohon, 2010

Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy Brian D. Ziebart, 2010

Tractable Algorithms for Proximity Search on Large Graphs Purnamrita Sarkar, 2010

Rare Category Analysis Jingrui He, 2010

Coupled Semi-Supervised Learning Andrew Carlson, 2010

Fast Algorithms for Querying and Mining Large Graphs Hanghang Tong, 2009

Efficient Matrix Models for Relational Learning Ajit Paul Singh, 2009

Exploiting Domain and Task Regularities for Robust Named Entity Recognition Andrew O. Arnold, 2009

Theoretical Foundations of Active Learning Steve Hanneke, 2009

Generalized Learning Factors Analysis: Improving Cognitive Models with Machine Learning Hao Cen, 2009

Detecting Patterns of Anomalies Kaustav Das, 2009

Dynamics of Large Networks Jurij Leskovec, 2008

Computational Methods for Analyzing and Modeling Gene Regulation Dynamics Jason Ernst, 2008

Stacked Graphical Learning Zhenzhen Kou, 2007

Actively Learning Specific Function Properties with Applications to Statistical Inference Brent Bryan, 2007

Approximate Inference, Structure Learning and Feature Estimation in Markov Random Fields Pradeep Ravikumar, 2007

Scalable Graphical Models for Social Networks Anna Goldenberg, 2007

Measure Concentration of Strongly Mixing Processes with Applications Leonid Kontorovich, 2007

Tools for Graph Mining Deepayan Chakrabarti, 2005

Automatic Discovery of Latent Variable Models Ricardo Silva, 2005


DiscoverDataScience.org

PhD in Data Science – Your Guide to Choosing a Doctorate Degree Program


Created by aasif.faizal

Professional opportunities in data science are growing incredibly fast. That’s great news for students looking to pursue a career as a data scientist. But it also means that there are a lot more options out there to investigate and understand before developing the best educational path for you.

A PhD is the most advanced data science degree you can get, reflecting a depth of knowledge and technical expertise that will put you at the top of your field.


Because they typically require a dissertation built on rigorous research, PhD programs are the most time-intensive degree option out there, and they are not for everyone. Indeed, many who work in the world of big data hold master’s degrees rather than PhDs; master’s programs tend to involve the same coursework as PhD programs without a dissertation component. However, for the right candidate, a PhD program is the perfect way to become a true expert in your area of focus.

If you’ve concluded that a data science PhD is the right path for you, this guide is intended to help you choose the best program to suit your needs. It will walk through some of the key considerations while picking graduate data science programs and some of the nuts and bolts (like course load and tuition costs) that are part of the data science PhD decision-making process.

Data Science PhD vs. Masters: Choosing the right option for you

If you’re considering pursuing a data science PhD, it’s worth knowing that such an advanced degree isn’t strictly necessary in order to get good work opportunities. Many who work in the field of big data hold only a master’s degree, which is the level of education expected of a competitive candidate for data science positions.

So why pursue a data science PhD?

Simply put, a PhD in data science will leave you qualified to enter the big data industry at a high level from the outset.

You’ll be eligible for advanced positions within companies, holding greater responsibilities, keeping more direct communication with leadership, and having more influence on important data-driven decisions. You’re also likely to receive greater compensation to match your rank.

However, PhDs are not for everyone. Dissertations require a great deal of time and an interest in intensive research. If you are eager to jumpstart a career quickly, a master’s program will give you the preparation you need to hit the ground running. PhDs are appropriate for those who want to commit their time and effort to schooling as a long-term investment in their professional trajectory.

For more information on the difference between data science PhD and master’s programs, take a look at our guide.

Topics include:

  • Can I Get an Online Ph.D. in Data Science?
  • Overview of Ph.D. Coursework
  • Preparing for a Doctorate Program
  • Building a Solid Track Record of Professional Experience
  • Things to Consider When Choosing a School
  • What Does it Cost to Get a Ph.D. in Data Science?
  • School Listings


Data Science PhD Programs, Historically

Historically, data science PhD programs were one of the main avenues to a good data-related position in academia or industry. But PhD programs are heavily research oriented and require a long-term investment of time, money, and energy. The issue that some data science PhD holders report, especially in industry settings, is that the state of the art is moving so quickly, and the data science industry is evolving so rapidly, that an abundance of research-oriented expertise is not always what is most sought after.

Instead, many companies are looking for candidates who are up to date with the latest data science techniques and technologies, and are willing to pivot to match emerging trends and practices.

One recent development that is making data science graduate school decisions more complex is the introduction of specialty master’s degrees that focus on rigorous but compact professional training. Both students and companies are realizing the value of an intensive, more industry-focused degree that provides enough training to manage complex projects and that is client oriented rather than research oriented.

However, not all prospective data science PhD students are looking for jobs in industry. There are some pretty amazing research opportunities opening up across a variety of academic fields that are making use of new data collection and analysis tools. Experts that understand how to leverage data systems including statistics and computer science to analyze trends and build models will be in high demand.

Can You Get a PhD in Data Science Online?

While it is not common to get a data science Ph.D. online, there are currently two options for those looking to take advantage of the flexibility of an online program.

Indiana University Bloomington and Northcentral University both offer online Ph.D. programs with either a minor or specialization in data science.

Given the trend for schools to continue increasing online offerings, expect to see additional schools adding this option in the near future.


Overview of PhD Coursework

A PhD requires a lot of academic work and generally takes between four and five years (sometimes longer) to complete.

Here are some of the high-level factors to consider and evaluate when comparing data science graduate programs.

How many credits are required for a PhD in data science?

On average, it takes 71 credits to graduate with a PhD in data science, far more (almost double) than traditional master’s degree programs require. In addition to coursework, most PhD students also have research and teaching responsibilities that can be simultaneously demanding and really great career preparation.

What’s the core curriculum like?

In a data science doctoral program, you’ll be expected to learn many skills and also how to apply them across domains and disciplines. Core curriculums will vary from program to program, but almost all will have a core foundation of statistics.

All PhD candidates will have to take a qualifying exam. The format varies from university to university; at Yale, for example, it is broken up into three phases: a practical exam, a theory exam, and an oral exam. The goal is to make sure doctoral students are developing the appropriate level of expertise.

Dissertation

One of the final steps of a PhD program involves presenting original research findings in a formal document called a dissertation. A dissertation provides background and context, as well as findings and analysis, and can contribute to the understanding and evolution of data science. A dissertation idea most often provides the framework for how a PhD candidate’s graduate school experience will unfold, so it’s important to be thoughtful and deliberate while considering research opportunities.

Since data science is such a rapidly evolving field and because choosing the right PhD program is such an important factor in developing a successful career path, there are some steps that prospective doctoral students can take in advance to find the best-fitting opportunity.

Join professional associations

Even before being fully credentialed, joining professional associations and organizations such as the Data Science Association and the American Association of Big Data Professionals is a good way to get exposure to the field. Many professional societies are welcoming to new members and even encourage student participation with things like discounted membership fees and awards and contest categories for student researchers. One of the biggest advantages to joining is that these professional associations bring together other data scientists for conference events, research-sharing opportunities, networking and continuing education opportunities.

Leverage your social network

Be on the lookout to make professional connections with professors, peers, and members of industry. There are a number of LinkedIn groups dedicated to data science. A well-maintained professional network is always useful to have when looking for advice or letters of recommendation while applying to graduate school and then later while applying for jobs and other career-related opportunities.

Kaggle competitions

Kaggle competitions provide the opportunity to solve real-world data science problems and win prizes. A list of data science problems can be found at Kaggle.com. Winning one of these competitions is a good way to demonstrate professional interest and experience.

Internships

Internships are a great way to get real-world experience in data science while also getting to work for top names in the world of business. For example, IBM offers a data science internship, an experience that can help you stand out when applying to PhD programs and when seeking employment later.

Demonstrating professional experience is not only important when looking for jobs, but it can also help while applying for graduate school. There are a number of ways for prospective students to gain exposure to the field and explore different facets of data science careers.

Get certified

There are a number of data-related certificate programs that are open to people with a variety of academic and professional experience. DeZyre has an excellent guide to different certifications, some of which might help provide good background for graduate school applications.

Conferences

Conferences are a great place to meet people presenting new and exciting research in the data science field and bounce ideas off of newfound connections. As with professional societies and organizations, discounted student rates are often available to encourage student participation. In addition, some conferences will waive fees if you are presenting a poster or research at the conference, which is an extra incentive to present.

It can be hard to quantify what makes a good fit when it comes to data science graduate programs. There are easy-to-evaluate factors, such as cost and location, and then there are harder-to-evaluate criteria such as networking opportunities, accessibility of professors, and how up to date the program’s curriculum is.

Nevertheless, there are some key relevant considerations when applying to almost any data science graduate program.

What most schools will require when applying:

  • All undergraduate and graduate transcripts
  • A statement of intent for the program (reason for applying and future plans)
  • Letters of reference
  • Application fee
  • Online application
  • A curriculum vitae (outlining all of your academic and professional accomplishments)

What Does it Cost to Get a PhD in Data Science?

The great news is that many PhD data science programs are supported by fellowships and stipends. Some are completely funded, meaning the school will pay tuition and basic living expenses. Here are several examples of fully funded programs:

  • University of Southern California
  • University of Nevada, Reno
  • Kennesaw State University
  • Worcester Polytechnic Institute
  • University of Maryland

For all other programs, tuition varies by school and typically ranges from $1,300 to $2,000 per credit hour. Remember, typical PhD programs in data science are between 60 and 75 credit hours, meaning you could spend up to $150,000 over several years.
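As a rough check on the figures above, the short Python sketch below multiplies the credit-hour range by the per-credit tuition range. The function name `tuition_range` is hypothetical and used only for illustration, and the estimate assumes a flat per-credit rate with no fees, stipends, or tuition waivers.

```python
def tuition_range(credits_low, credits_high, rate_low, rate_high):
    """Return the (minimum, maximum) total tuition in dollars.

    Assumes a flat per-credit rate; fees, stipends, and waivers are ignored.
    """
    return credits_low * rate_low, credits_high * rate_high


# Figures quoted in the text: 60 to 75 credit hours at $1,300 to $2,000 per credit hour.
low, high = tuition_range(60, 75, 1_300, 2_000)
print(f"Estimated total tuition: ${low:,} to ${high:,}")
# prints "Estimated total tuition: $78,000 to $150,000"
```

The upper bound matches the $150,000 figure quoted above; the lower bound shows that even the least expensive unfunded combination comes to roughly $78,000.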

That’s why the financial aspects are so important to evaluate when assessing PhD programs, because some schools offer full stipends so that you are able to attend without having to find supplemental scholarships or tuition assistance.

Can I become a professor of data science with a PhD? Yes! If you are interested in teaching at the college or graduate level, a PhD is the degree needed to establish the full expertise expected to be a professor. Some data scientists who hold PhDs start by entering the field of big data and pivot over to teaching after gaining a significant amount of work experience. If you’re driven to teach others or to pursue advanced research in data science, a PhD is the right degree for you.

Do I need a master’s in order to pursue a PhD? No. Many who pursue PhDs in Data Science do not already hold advanced degrees, and many PhD programs include all the coursework of a master’s program in the first two years of school. For many students, this is the most time-effective option, allowing you to complete your education in a single pass rather than interrupting your studies after your master’s program.

Can I choose to pursue a PhD after already receiving my master’s? Yes. A master’s program can be an opportunity to get the lay of the land and determine the specific career path you’d like to forge in the world of big data. Some schools may allow you to simply extend your academic timeline after receiving your master’s degree, and it is also possible to return to school to receive a PhD if you have been working in the field for some time.

If a PhD isn’t necessary, is it a waste of time? While not all students are candidates for PhDs, for the right students – who are keen on doing in-depth research, have the time to devote to many years of school, and potentially have an interest in continuing to work in academia – a PhD is a great choice. For more information on this question, take a look at our article Is a Data Science PhD. Worth It?

Complete List of Data Science PhD Programs

Below you will find the most comprehensive list of schools offering a doctorate in data science. Each school listing contains a link to the program specific page, GRE or a master’s degree requirements, and a link to a page with detailed course information.

Note that the listing only contains true data science programs. Other similar programs are often lumped together on other sites, but we have chosen to list programs such as data analytics and business intelligence on a separate section of the website.

Boise State University  – Boise, Idaho PhD in Computing – Data Science Concentration

The Data Science emphasis focuses on the development of mathematical and statistical algorithms, software, and computing systems to extract knowledge or insights from data.  

In 60 credits, students complete an Introduction to Graduate Studies, 12 credits of core courses, 6 credits of data science elective courses, 10 credits of other elective courses, a Doctoral Comprehensive Examination worth 1 credit, and a 30-credit dissertation.

Electives can be taken in focus areas such as Anthropology, Biometry, Ecology/Evolution and Behavior, Econometrics, Electrical Engineering, Earth Dynamics and Informatics, Geoscience, Geostatistics, Hydrology and Hydrogeology, Materials Science, and Transportation Science.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $7,236 total (Resident), $24,573 total (Non-resident)

View Course Offerings

Bowling Green State University  – Bowling Green, Ohio Ph.D. in Data Science

Data Science students at Bowling Green intertwine knowledge of computer science with statistics.

Students learn techniques in analyzing structured, unstructured, and dynamic datasets.

Courses train students to understand the principles of analytic methods and to articulate the strengths and limitations of those methods.

The program requires 60 credit hours in the studies of Computer Science (6 credit hours), Statistics (6 credit hours), Data Science Exploration and Communication, Ethical Issues, Advanced Data Mining, and Applied Data Science Experience.

Students must also complete 21 credit hours of elective courses, a qualifying exam, a preliminary exam, and a dissertation.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $8,418 (Resident), $14,410 (Non-resident)

Brown University  – Providence, Rhode Island PhD in Computer Science – Concentration in Data Science

Brown University’s database group is a world leader in systems-oriented database research; they seek PhD candidates with strong system-building skills who are interested in researching TupleWare, MLbase, MDCC, Crowd DB, or PIQL.

In order to gain entrance, applicants should consider first doing a research internship at Brown with this group. Other ways to boost an application are to take and do well at massive open online courses, do an internship at a large company, and get involved in a large open-source software project.

Coding well in C++ is preferred.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $62,680 total

Chapman University  – Irvine, California Doctorate in Computational and Data Sciences

Candidates for the doctorate in computational and data science at Chapman University begin by completing 13 core credits in basic methodologies and techniques of computational science.

Students complete 45 credits of electives, which are personalized to match the specific interests and research topics of the student.

Finally, students complete up to 12 credits in dissertation research.

Applicants must have completed courses in differential equations, data structures, and probability and statistics, or take specific foundation courses, before beginning coursework toward the PhD.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $37,538 per year

Clemson University / Medical University of South Carolina (MUSC) – Joint Program – Clemson, South Carolina & Charleston, South Carolina Doctor of Philosophy in Biomedical Data Science and Informatics – Clemson

The PhD in biomedical data science and informatics is a joint program co-authored by Clemson University and the Medical University of South Carolina (MUSC).

Students choose one of three tracks to pursue: precision medicine, population health, and clinical and translational informatics. Students complete 65-68 credit hours, and take courses in each of 5 areas: biomedical informatics foundations and applications; computing/math/statistics/engineering; population health, health systems, and policy; biomedical/medical domain; and lab rotations, seminars, and doctoral research.

Applicants must have a bachelor’s in health science, computing, mathematics, statistics, engineering, or a related field, and it is recommended to also have competency in a second of these areas.

Program requirements include a year of calculus and college biology, as well as experience in computer programming.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $10,858 total (South Carolina Resident), $22,566 total (Non-resident)

View Course Offerings – Clemson

George Mason University  – Fairfax, Virginia Doctor of Philosophy in Computational Sciences and Informatics – Emphasis in Data Science

George Mason’s PhD in computational sciences and informatics requires a minimum of 72 credit hours, though this can be reduced if a student has already completed a master’s. 48 credits are toward graduate coursework, and an additional 24 are for dissertation research.

Students choose an area of emphasis (either computer modeling and simulation or data science) and complete 18 credits of coursework in this area. Students are expected to complete the coursework in 4-5 years.

Applicants to this program must have a bachelor’s degree in a natural science, mathematics, engineering, or computer science, and must have knowledge and experience with differential equations and computer programming.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $13,426 total (Virginia Resident), $35,377 total (Non-resident)

Harrisburg University of Science and Technology  – Harrisburg, Pennsylvania Doctor of Philosophy in Data Sciences

Harrisburg University’s PhD in data science is a 4-5 year program, the first 2 of which make up the Harrisburg master’s in analytics.

Beyond this, PhD candidates complete six milestones to obtain the degree, including 18 semester hours in doctoral-level courses, such as multivariate data analysis, graph theory, and machine learning.

Following the completion of ANLY 760 Doctoral Research Seminar, students in the program complete their 12 hours of dissertation research bringing the total program hours to 36.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $14,940 total

Icahn School of Medicine at Mount Sinai  – New York, New York Genetics and Data Science, PhD

As part of the Biomedical Science PhD program, the Genetics and Data Science multidisciplinary training offers research opportunities that expand on genetic research and modern genomics. The training also integrates several disciplines of biomedical sciences with machine learning, network modeling, and big data analysis.

Students in the Genetics and Data Science program complete a predetermined course schedule with a total of 64 credits and 3 years of study.

Additional course requirements and electives include laboratory rotations, a thesis proposal exam and thesis defense, Computer Systems, Intro to Algorithms, Machine Learning for Biomedical Data Science, Translational Genomics, and Practical Analysis of a Personal Genome.

Delivery Method: Campus GRE: Not Required 2022-2023 Tuition: $31,303 total

Indiana University-Purdue University Indianapolis  – Indianapolis, Indiana PhD in Data Science PhD Minor in Applied Data Science

Doctoral candidates pursuing the PhD in data science at Indiana University-Purdue must display competency in research, data analytics, and management and infrastructure to earn the degree.

The PhD is comprised of 24 credits of a data science core, 18 credits of methods courses, 18 credits of a specialization, written and oral qualifying exams, and 30 credits of dissertation research. All requirements must be completed within 7 years.

Applicants are generally expected to have a master’s in social science, health, data science, or computer science. 

Currently a majority of the PhD students at IUPUI are funded by faculty grants and two are funded by the federal government. None of the students are self funded.

IUPUI also offers a PhD Minor in Applied Data Science that is 12-18 credits. The minor is open to students enrolled at IUPUI or IU Bloomington in a doctoral program other than Data Science.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $9,228 per year (Indiana Resident), $25,368 per year (Non-resident)

Jackson State University – Jackson, Mississippi PhD Computational and Data-Enabled Science and Engineering

Jackson State University offers a PhD in computational and data-enabled science and engineering with 5 concentration areas: computational biology and bioinformatics, computational science and engineering, computational physical science, computational public health, and computational mathematics and social science.

Students complete 12 credits of common core courses, 12 credits in the specialization, 24 credits of electives, and 24 credits in dissertation research.

Students may complete the doctoral program in as little as 5 years and no more than 8 years.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $8,270 total

Kennesaw State University  – Kennesaw, Georgia PhD in Analytics and Data Science

Students pursuing a PhD in analytics and data science at Kennesaw State University must complete 78 credit hours: 48 course hours and 6 electives (spread over 4 years of study), a minimum 12 credit hours for dissertation research, and a minimum 12 credit-hour internship.

Prior to dissertation research, the comprehensive examination will cover material from the three areas of study: computer science, mathematics, and statistics.

Successful applicants will have a master’s degree in a computational field, calculus I and II, programming experience, modeling experience, and are encouraged to have a base SAS certification.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $5,328 total (Georgia Resident), $19,188 total (Non-resident)

New Jersey Institute of Technology  – Newark, New Jersey PhD in Business Data Science

Students may enter the PhD program in business data science at the New Jersey Institute of Technology with either a relevant bachelor’s or master’s degree. Students with bachelor’s degrees begin with 36 credits of advanced courses, and those with master’s take 18 credits before moving on to credits in dissertation research.

Core courses include business research methods, data mining and analysis, data management system design, statistical computing with SAS and R, and regression analysis.

Students take qualifying examinations at the end of years 1 and 2, and must defend their dissertations successfully by the end of year 6.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $21,932 total (New Jersey Resident), $32,426 total (Non-resident)

New York University  – New York, New York PhD in Data Science

Doctoral candidates in data science at New York University must complete 72 credit hours, pass a comprehensive and qualifying exam, and defend a dissertation within 10 years of entering the program.

Required courses include an introduction to data science, probability and statistics for data science, machine learning and computational statistics, big data, and inference and representation.

Applicants must have an undergraduate or master’s degree in fields such as mathematics, statistics, computer science, engineering, or other scientific disciplines. Experience with calculus, probability, statistics, and computer programming is also required.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $37,332 per year

View Course Offering

Northcentral University  – San Diego, California PhD in Data Science-TIM

Northcentral University offers a PhD in technology and innovation management with a specialization in data science.

The program requires 60 credit hours, including 6-7 core courses, 3 in research, a PhD portfolio, and 4 dissertation courses.

The data science specialization requires 6 courses: data mining, knowledge management, quantitative methods for data analytics and business intelligence, data visualization, predicting the future, and big data integration.

Applicants must have a master’s already.

Delivery Method: Online GRE: Required 2022-2023 Tuition: $16,794 total

Stevens Institute of Technology – Hoboken, New Jersey Ph.D. in Data Science

Stevens Institute of Technology has developed a data science Ph.D. program geared to help graduates become innovators in the space.

The rigorous curriculum emphasizes mathematical and statistical modeling, machine learning, computational systems and data management.

The program is directed by Dr. Ted Stohr, a recognized thought leader in the information systems, operations and business process management arenas.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $39,408 per year

University at Buffalo – Buffalo, New York PhD Computational and Data-Enabled Science and Engineering

The curriculum for the University at Buffalo’s PhD in computational and data-enabled science and engineering centers on three areas: data science, applied mathematics and numerical methods, and high performance and data intensive computing. Nine credits of courses must be completed in each of these three areas. Altogether, the program consists of 72 credit hours, and should be completed in 4-5 years. A master’s degree is required for admission; courses taken during the master’s may be able to count toward some of the core coursework requirements.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $11,310 per year (New York Resident), $23,100 per year (Non-resident)

University of Colorado Denver – Denver, Colorado PhD in Big Data Science and Engineering

The University of Colorado – Denver offers a unique program for those students who have already received admission to the computer science and information systems PhD program.

The Big Data Science and Engineering (BDSE) program is a PhD fellowship program that allows selected students to pursue research in the area of big data science and engineering. This new fellowship program was created to train more computer scientists in data science application fields such as health informatics, geosciences, precision and personalized medicine, business analytics, and smart cities and cybersecurity.

Students in the doctoral program must complete 30 credit hours of computer science classes beyond a master’s level, and 30 credit hours of dissertation research.

The BDSE fellowship requires students to have an advisor both in the core disciplines (either computer science or mathematics and statistics) as well as an advisor in the application discipline (medicine and public health, business, or geosciences).

In addition, the fellowship covers full stipend, tuition, and fees up to ~50k for BDSE fellows annually. Important eligibility requirements can be found here.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $55,260 total

University of Maryland  – College Park, Maryland PhD in Information Studies

Data science is a potential research area for doctoral candidates in information studies at the University of Maryland – College Park. This includes big data, data analytics, and data mining.

Applicants for the PhD must have taken the following courses in undergraduate studies: programming languages, data structures, design and analysis of computer algorithms, calculus I and II, and linear algebra.

Students must complete 6 qualifying courses, 2 elective graduate courses, and at least 12 credit hours of dissertation research.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $16,238 total (Maryland Resident), $35,388 total (Non-resident)

University of Massachusetts Boston  – Boston, Massachusetts PhD in Business Administration – Information Systems for Data Science Track

The University of Massachusetts – Boston offers a PhD in information systems for data science. As this is a business degree, students must complete coursework in their first two years with a focus on data for business; for example, taking courses such as business in context: markets, technologies, and societies.

Students must take and pass qualifying exams at the end of year 1, comprehensive exams at the end of year 2, and defend their theses at the end of year 4.

Those with a degree in statistics, economics, math, computer science, management sciences, information systems, and other related fields are especially encouraged, though a quantitative degree is not necessary.

Students accepted by the program are ordinarily offered full tuition credits and a stipend ($25,000 per year) to cover educational expenses and help defray living costs for up to three years of study.

During the first two years of coursework, students are assigned to a faculty member as a research assistant; in the third year, students are engaged in instructional activities. Funding for the fourth year is merit-based and comes from a limited pool of program funds.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $18,894 total (in-state), $36,879 (out-of-state)

University of Nevada Reno – Reno, Nevada PhD in Statistics and Data Science

The University of Nevada – Reno’s doctoral program in statistics and data science comprises 72 credit hours to be completed over the course of 4-5 years. Coursework is all within the scope of statistics, with titles such as statistical theory, probability theory, linear models, multivariate analysis, statistical learning, statistical computing, and time series analysis.

The completion of a Master’s degree in mathematics or statistics prior to enrollment in the doctoral program is strongly recommended, but not required.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $5,814 total (in-state), $22,356 (out-of-state)

University of Southern California – Los Angeles, California PhD in Data Sciences & Operations

USC Marshall School of Business offers a PhD in data sciences and operations to be completed in 5 years.

Students can choose either a track in operations management or in statistics. Both tracks require 4 courses in fall and spring of the first 2 years, as well as a research paper and courses during the summers. Year 3 is devoted to dissertation preparation and year 4 and/or 5 to dissertation defense.

A bachelor’s degree is necessary for application, but no field or further experience is required.

Students should complete 60 units of coursework. If the students are admitted with Advanced Standing (e.g., Master’s Degree in appropriate field), this requirement may be reduced to 40 credits.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $63,468 total

University of Tennessee-Knoxville  – Knoxville, Tennessee The Data Science and Engineering PhD

The data science and engineering PhD at the University of Tennessee – Knoxville requires 36 hours of coursework and 36 hours of dissertation research. For those entering with an MS degree, only 24 hours of course work is required.

The core curriculum includes work in statistics, machine learning, and scripting languages and is enhanced by 6 hours in courses that focus either on policy issues related to data, or technology entrepreneurship.

Students must also choose a knowledge specialization in one of these fields: health and biological sciences, advanced manufacturing, materials science, environmental and climate science, transportation science, national security, urban systems science, and advanced data science.

Applicants must have a bachelor’s or master’s degree in engineering or a scientific field. 

All students that are admitted will be supported by a research fellowship and tuition will be included.

Many students will perform research with scientists from Oak Ridge National Laboratory, which is located about a 30-minute drive from campus.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $11,468 total (Tennessee Resident), $29,656 total (Non-resident)

University of Vermont – Burlington, Vermont Complex Systems and Data Science (CSDS), PhD

Through the College of Engineering and Mathematical Sciences, the Complex Systems and Data Science (CSDS) PhD program is pan-disciplinary and provides computational and theoretical training. Students may customize the program depending on their chosen area of focus.

Students in this program work in research groups across campus.

Core courses include Data Science, Principles of Complex Systems and Modeling Complex Systems. Elective courses include Machine Learning, Complex Networks, Evolutionary Computation, Human/Computer Interaction, and Data Mining.

The program requires at least 75 credits to graduate with approval by the student graduate studies committee.

Delivery Method: Campus GRE: Not Required 2022-2023 Tuition: $12,204 total (Vermont Resident), $30,960 total (Non-resident)

University of Washington Seattle Campus – Seattle, Washington PhD in Big Data and Data Science

The University of Washington’s PhD program in data science has 2 key goals: training of new data scientists and cyberinfrastructure development, i.e., development of open-source tools and services that scientists around the world can use for big data analysis.

Students must take core courses in data management, machine learning, data visualization, and statistics.

Students are also required to complete at least one internship that covers practical work in big data.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $17,004 per year (Washington resident), $30,477 (non-resident)

University of Wisconsin-Madison – Madison, Wisconsin PhD in Biomedical Data Science

The PhD program in Biomedical Data Science offered by the Department of Biostatistics and Medical Informatics at UW-Madison is unique in blending the best of statistics and computer science, biostatistics and biomedical informatics.

Students complete three year-long course sequences in biostatistics theory and methods, computer science/informatics, and a specialized sequence to fit their interests.

Students also complete three research rotations within their first two years in the program, to both expand their breadth of knowledge and assist in identifying a research advisor.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $10,728 total (in-state), $24,054 total (out-of-state)

Vanderbilt University – Nashville, Tennessee Data Science Track of the BMI PhD Program

The PhD in biomedical informatics at Vanderbilt has the option of a data science track.

Students complete courses in the areas of biomedical informatics (3 courses), computer science (4 courses), statistical methods (4 courses), and biomedical science (2 courses). Students are expected to complete core courses and defend their dissertations within 5 years of beginning the program.

Applicants must have a bachelor’s degree in computer science, engineering, biology, biochemistry, nursing, mathematics, statistics, physics, information management, or some other health-related field.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $53,160 per year

Washington University in St. Louis – St. Louis, Missouri Doctorate in Computational & Data Sciences

Washington University now offers an interdisciplinary Ph.D. in Computational & Data Sciences where students can choose from one of four tracks (Computational Methodologies, Political Science, Psychological & Brain Sciences, or Social Work & Public Health).

Students are fully funded and will receive a stipend for at least five years contingent on making sufficient progress in the program.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $59,420 total

Worcester Polytechnic Institute – Worcester, Massachusetts PhD in Data Science

The PhD in data science at Worcester Polytechnic Institute focuses on 5 areas: integrative data science, business intelligence and case studies, data access and management, data analytics and mining, and mathematical analysis.

Students first complete a master’s in data science, and then complete 60 credit hours beyond the master’s, including 30 credit hours of research.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $28,980 per year

Yale University – New Haven, Connecticut PhD Program – Department of Stats and Data Science

The PhD in statistics and data science at Yale University offers broad training in the areas of statistical theory, probability theory, stochastic processes, asymptotics, information theory, machine learning, data analysis, statistical computing, and graphical methods. Students complete 12 courses in the first year in these topics.

Students are required to teach one course each semester of their third and fourth years.

Most students complete and defend their dissertations in their fifth year.

Applicants should have an educational background in statistics, with an undergraduate major in statistics, mathematics, computer science, or similar field.

Delivery Method: Campus GRE: Required 2022-2023 Tuition: $46,900 total

  • Thesis Option

Data Science master’s students can choose to satisfy the research experience requirement by selecting the thesis option. Students will spend the majority of their second year working on a substantial data science project that culminates in the submission and oral defense of a master’s thesis. While all thesis projects must be related to data science, students are given leeway in finding a project in a domain of study that fits with their background and interest.

All students choosing the thesis option must find a research advisor and submit a thesis proposal by mid-April of their first year of study. Thesis proposals will be evaluated by the Data Science faculty committee and only those students whose proposals are accepted will be allowed to continue with the thesis option.  

To account for the time spent on thesis research, students choosing the thesis option are able to substitute three required courses (the Capstone and two "free" elective courses, as defined in the final bullet point on the degree requirement page) with AC 302.

Thesis/Capstone for Master's in Data Science | Northwestern SPS - Northwestern School of Professional Studies

Data Science

Capstone and thesis overview.

Capstone and thesis are similar in that they both represent a culminating, scholarly effort of high quality. Both should clearly state a problem or issue to be addressed. Both will allow students to complete a larger project and produce a product or publication that can be highlighted on their resumes. Students should consider the factors below when deciding whether a capstone or thesis may be more appropriate to pursue.

A capstone is a practical or real-world project that can emphasize preparation for professional practice. A capstone is more appropriate if:

  • you don't necessarily need or want the experience of the research process or writing a big publication
  • you want more input on your project, from fellow students and instructors
  • you want more structure to your project, including assignment deadlines and due dates
  • you want to complete the project or graduate in a timely manner

A student can enroll in MSDS 498 Capstone in any term. However, capstone specialization courses can provide a unique student experience and may be offered only twice a year. 

A thesis is an academic-focused research project with broader applicability. A thesis is more appropriate if:

  • you want to get a PhD or other advanced degree and want the experience of the research process and writing for publication
  • you want to work individually with a specific faculty member who serves as your thesis adviser
  • you are more self-directed, are good at managing your own projects with very little supervision, and have a clear direction for your work
  • you have a project that requires more time to pursue

Students can enroll in MSDS 590 Thesis as long as there is an approved thesis project proposal, identified thesis adviser, and all other required documentation at least two weeks before the start of any term.

From Faculty Director, Thomas W. Miller, PhD

Tom Miller

“Capstone projects and thesis research give students a chance to study topics of special interest to them. Students can highlight analytical skills developed in the program. Work on capstone and thesis research projects often leads to publications that students can highlight on their resumes.”

A thesis is an individual research project that usually takes two to four terms to complete. Capstone course sections, on the other hand, represent a one-term commitment.

Students need to evaluate their options prior to choosing a capstone course section because capstones vary widely from one instructor to the next. There are both general and specialization-focused capstone sections. Some capstone sections offer individual research projects, others offer team research projects, and a few give students a choice of individual or team projects.

Students should refer to the SPS Graduate Student Handbook for more information regarding registration for either MSDS 590 Thesis or MSDS 498 Capstone.

Capstone Experience

If students wish to engage with an outside organization to work on a project for capstone, they can refer to this checklist and lessons learned for some helpful tips.

Capstone Checklist

  • Start early — set aside a minimum of one to two months prior to the capstone quarter to determine the industry and modeling interests.
  • Networking — pitch your idea to potential organizations for projects and focus on the business benefits you can provide.
  • Permission request — make sure your final project can be shared with others in the course and the information can be made public.
  • Engagement — engage with the capstone professor prior to and immediately after getting the dataset to ensure appropriate scope for the 10 weeks.
  • Teambuilding — recruit team members who have similar interests for the type of project during the first week of the course.

Capstone Lessons Learned

  • Access to company data can take longer than expected; not having this access before or at the start of the term can severely delay progress
  • Project timeline should align with coursework timeline as closely as possible
  • Designate one point of contact (POC) for business-facing communication to ensure streamlined messages and more effective time management with the organization
  • Manage expectations on both sides: for the business, the work is pro bono; for students, it does not guarantee internship or job opportunities
  • Data security or masking that is not completed in time can put the opportunity at risk entirely

Publication of Work

Northwestern University Libraries offers an option for students to publish their master’s thesis or capstone in Arch, Northwestern’s open access research and data repository.

Benefits of publishing your thesis:

  • Your work will be indexed by search engines and discoverable by researchers around the world, extending your work’s impact beyond Northwestern
  • Your work will be assigned a Digital Object Identifier (DOI) to ensure perpetual online access and to facilitate scholarly citation
  • Your work will help accelerate discovery and increase knowledge in your subject domain by adding to the global corpus of public scholarly information

Get started:

  • Visit Arch online
  • Log in with your NetID
  • Describe your thesis: title, author, date, keywords, rights, license, subject, etc.
  • Upload your thesis or capstone PDF and any related supplemental files (data, code, images, presentations, documentation, etc.)
  • Select a visibility: Public, Northwestern-only, Embargo (i.e. delayed release)
  • Save your work to the repository

Your thesis manuscript or capstone report will then be published on the MSDS page. You can view other published work here.

For questions or support in publishing your thesis or capstone, please contact [email protected].

PhD in Data Science and Analytics

We launched the first formal PhD program in Data Science in 2015. Our program sits at the intersection of computer science, statistics, mathematics, and business. Our students engage in relevant research with faculty from across our eleven colleges. As one of the institutions at the forefront of the development of data science as an academic discipline, we are committed to developing the next generation of Data Science leaders, researchers, and educators. Culturally, we are committed to the discipline of Data Science through ethical practices, attention to fairness, a diverse student body, academic excellence, and research that makes positive contributions to our local, regional, and global community.

Herman Ray, Director, Ph.D. in Data Science and Analytics

Sherry Ni

About the Doctoral Degree in Data Science and Analytics

This degree trains individuals to facilitate innovative research and to translate complex data, both structured and unstructured, into information that improves decision making. The curriculum places heavy emphasis on programming, data mining, statistical modeling, and the mathematical foundations that support these concepts. Importantly, the program also emphasizes communication skills – both oral and written – as well as application and tying results to business and research problems.

Because this degree is a Ph.D., it creates flexibility. Graduates can pursue either a position in the private or public sector as a "practicing" Data Scientist, where continued demand is expected to greatly outpace supply, or a position within academia, where they would be uniquely qualified to teach these skills to the next generation.

Information Sessions for Fall 2025 Admission

To be announced

Data Science and Analytics PhD Curriculum

Stage One: Pre-Program Requirements

  • Successful applicants will have completed a master's degree in a computational field (e.g., engineering, computer science, statistics, economics, or finance).
  • Applicants are expected to have deep proficiency in at least one analytical programming language (e.g., SAS, R, Python). SQL and Java are helpful but not required.
  • Interested applicants who have earned an undergraduate degree are encouraged to apply to the Ph.D. Program with the embedded MS in Computer Science or with the MS in Applied Statistics.

Stage Two: Coursework

The Ph.D. in Data Science and Analytics requires 78 total credit hours spread over four years of study. Example Program of Study: 

  • CS 8265  - Big Data Analytics
  • CS 8267  - Machine Learning
  • MATH 8010  - Theory of Linear Models (optional)
  • MATH 8020  - Graph Theory
  • MATH 8030  - Applied Discrete and Combinatorial Mathematics 
  • STAT 8240  - Data Mining I
  • STAT 8250  - Data Mining II
  • Comprehensive Exam 
  • 21 credit hours of electives/concentration

Students take up to 9 credit hours of 6000- or 7000-level courses in DS, STAT, or CS with permission of the program director. Students take any 8000- or 9000-level course in DS, STAT, MATH, CS or IT, or the HHS courses in the mHealth concentration.

  • at least 15 credit hours in CS courses at 8000 or 9000 levels (except CS 9900)
  • at least 15 credit hours in STAT courses at 8000 or 9000 levels
  • HHS 8000 - Introduction to mHealth
  • HHS 8010 - Ethical Issues in mHealth, Healthcare and Human Subjects Research
  • STAT 8235 - Advanced Longitudinal Data Analysis
  • HHS 8050 - Advanced Research in mHealth
  • HHS 8020 - mHealth Applications or HHS 8030 - Advanced Special Topics in mHealth
  • Develop Dissertation Research Proposal 
  • DS 9700 Doctoral Internship/Research Lab
  • DS 9900 Dissertation
  • Dissertation Proposal Defense
  • DS 9900 Dissertation
  • Final Dissertation Defense

Stage Three: Project Engagement and Research/Dissertation

Relevant, interdisciplinary research forms the foundation of the Ph.D. in Data Science and Analytics. While students are encouraged to engage in research from their first semester, the last two years of the program are structured to help students transition into becoming independent, lead researchers. In this last stage of the program, students will work with research faculty, including their advisor, in one of our data science research labs.

Program Student Learning Outcomes

At the end of the program, students will be able to:

  • Demonstrate their understanding of the research process
  • Demonstrate mastery of core concepts relevant to three key areas: mathematics, statistics, and computer science
  • Develop themselves as professionals prepared for work as doctoral-educated individuals beyond graduation

Admission Requirements and Application

Frequently Asked Questions (FAQ)

What did Data Science Doctoral Students Study?

  • Applied Computer Science
  • Applied Economics and Statistics
  • Applied Statistics
  • Applied Mathematics
  • Bioinformatics
  • Business Analytics
  • Chemical Biology
  • Computer Science
  • Data Science
  • Forecasting & Strategic Management
  • Integrative Biology
  • Public Admin in Economic Policy Mgmt
  • Mathematics
  • Mechanical Engineering
  • Software Engineering

Where did Data Science Doctoral Students Study?

  • Ajou University, South Korea
  • Albert-Ludwigs University of Freiburg
  • Auburn University
  • Bowling Green State University
  • Clemson University
  • Columbia University
  • Columbus State University
  • Florida State University
  • Georgia Southern University
  • Georgia State
  • Georgia Tech
  • Iran University of Science and Technology
  • Kennesaw State University
  • Marshall University
  • Michigan State University
  • Murray State University
  • North Carolina State University
  • St. Petersburg State University, Russia
  • University of KwaZulu-Natal, South Africa
  • University of Michigan
  • University of North Carolina
  • University of Toledo

Ph.D. in Data Science and Analytics Student Cohorts

Royce Alfred

Bachelor's Degree:   Psychology, Kennesaw State University

Master's Degree:   Applied Statistics and Analytics, Kennesaw State University

Work History:   4 years as a Data Scientist at Equifax

Professional Objective:   Work as a research data scientist in the corporate environment

Venkata Abhiram Chitty

Bachelor's Degree:   Mathematics, Statistics and Computer Science, Osmania University, Telangana, India

Master's Degree:   Data Science, VIT-AP University, Amaravati, Andhra Pradesh, India

Professional Objective:   To apply my Data Science skills in the public health domain and help society

Caleb Greski

Bachelor's Degree: 

Master's Degree: 

Work History: 

Courses Taught: 

Publications: 

Professional Objective: 

Moukthika Kadaparthi

Bachelor's Degree:   Electrical and Electronics Engineering, SASTRA Deemed University

Master's Degree:   Computers and Information Science, Cleveland State University

Work History:  

  • Business Intelligence Analyst, Philips Healthcare, Georgia
  • Graduate Research Assistant, Cleveland State University, Ohio 

Professional Objective:   My objective is to enter academia with the aim of sharing the practical applications of data science in diverse domains and its potential positive impacts. With my unique blend of academic rigor and industry experience, I am driven to analyze complex data sets using cutting-edge data science techniques, to provide actionable insights and support data-driven decision-making.

Qiaomu Li

Bachelor's Degree:   Civil Engineering, Huazhong University of Science and Technology, China

Master's Degree:   Business Analytics, Syracuse University

Work History:

  • Credit Modeling Analyst, Agricultural Development Bank of China
  • Research Assistant, Changjiang Securities
  • Graduate Assistant, Syracuse University

Courses Taught:  Calculus I, Marketing Analytics, Data Mining

Awards:   Merit-Based Scholarship, Syracuse University

Professional Objective:   To secure a challenging position in a reputable organization to expand myself within the field of Artificial Intelligence.

Kausar Perveen

Bachelor's Degree:   Bachelor in Engineering Software Engineering, National University of Sciences and Technology, Pakistan

Master's Degree:   Masters in Data Science, Illinois Institute of Technology, Chicago

Work History:

  • Fullstack Developer at ItRunsInMyFamily, Charleston, South Carolina
  • Software Engineer II, Xgrid Pakistan
  • Senior Research Coordinator, Aga Khan University Pakistan
  • Machine Learning Engineer, Agoda Thailand

Publications:  National cervical cancer burden estimation through systematic review and analysis of publicly available data in Pakistan 

Service and Awards:

  • Fulbright Scholarship award for Master’s degree in Data Science
  • Aga Khan Education Service Pakistan, merit cumulative need based scholarship for Bachelors in Software Engineering 

Professional Objective:  My main motivation for pursuing a degree in Data Science is to gain qualified research experience in Data Science and public health

Promi Roy

Bachelor's Degree:   Statistics, University of Dhaka, Dhaka, Bangladesh

Master's Degree:   Mathematics (Statistics Concentration), University of Toledo, Ohio

Work History:

  • Analytics Engineer Intern, Cooper Smith, Toledo, Ohio
  • Business Analyst, Akij Food and Beverage Limited, Dhaka, Bangladesh

Courses Taught:   Introduction to Statistics

Professional Objective:   I am interested in working as a data scientist in industry

Ayomide Isaac Afolabi

Bachelor's Degree:  Chemical Engineering, Ladoke Akintola University of Technology 

Master's Degree:  Data Science, Auburn University 

Work History:   Graduate Research Assistant, Auburn University 

Courses Taught:   Python Programming 

Publications:   Larson EA, Afolabi A, Zheng J, Ojeda AS. Sterols and sterol ratios to trace fecal contamination: pitfalls and potential solutions. Environ Sci Pollut Res Int. 2022 Jul;29(35):53395-53402. doi: 10.1007/s11356-022-19611-2. Epub 2022 Mar 14. PMID: 35287190

Professional Objective:  To work as a research data scientist in the industry

Dinesh Chowdary Attota

Bachelor's Degree:   Computer Science, Jawaharlal Nehru Technological University Kakinada (JNTUK), India

Master's Degree:   Computer Science, Kennesaw State University

Work History:   Associate Consultant, SL Techknow Solutions India Pvt Ltd, India, 2018 - 2020

Publications:  

  • An Ensemble Multi-View Federated Learning Intrusion Detection for IoT
  • A Conversational Recommender System for Exploring Pedagogical Design Patterns
  • An Ensembled Method For Diabetic Retinopathy Classification using Transfer Learning  

Professional Objective:   I'd like to be a faculty member at a university so that I can continue to do research.

Nzubechukwu Ohalete

Bachelor's Degree:   Mathematics, University of Nigeria, Nsukka

Master's Degree:   Applied Statistics, Bowling Green State University

Work History:   Graduate Assistant/Data Analyst, Federal University of Technology, Owerri - Mathematics Department

Courses Taught:  Elementary Mathematics, Mathematical Methods

Awards:   James A. Sullivan Outstanding Graduate Student Award, Applied Statistics and Operations Research Department, April 2022

Professional Objective:   To use data science techniques to solve problems in ways that make our lives better and make our world a better place

Ryan Parker

Bachelor's Degree:  Microbiology, University of Tennessee - Knoxville

Master's Degree:   Integrative Biology, Kennesaw State University

Work History:  Instructor of Biology, Kennesaw State University

Courses Taught:   Nursing Microbiology Lectures and Labs, Introductory Biology Labs, Biotechnology Lectures and Labs

Publications:

  • Parker RA, Gabriel KT, Graham K, Cornelison CT. Validation of methylene blue viability staining with the emerging pathogen Candida auris. J Microbiol Methods. 2020 Feb;169:105829. doi: 10.1016/j.mimet.2019.105829. Epub 2019 Dec 27. PMID: 31884053.
  • Parker RA, Gabriel KT, Graham KD, Butts BK, Cornelison CT. Antifungal Activity of Select Essential Oils against Candida auris and Their Interactions with Antifungal Drugs. Pathogens. 2022 Jul 22;11(8):821. doi: 10.3390/pathogens11080821. PMID: 35894044; PMCID: PMC9331469.

Awards:   Best Graduate Poster: Symposium for Student Scholars hosted by Kennesaw State University (Fall 2018) for Poster: "Antifungal Activity of Select Essential Oils and Synergism with Antifungal Drugs against Candida auris"

Professional Objective:   To apply Data Science techniques to large scientific datasets, such as genomic and astronomical data, and to help bridge the gap between disparate fields by working in an interdisciplinary space to offer integrative and data-driven solutions to the increasingly complex problems presented to the traditional Sciences.

Askhat Yktybaev

Bachelor's Degree:   Forecasting and Strategic Management, Saint-Petersburg State University of Economics and Finance, Russia

Master's Degree:   Forecasting and Strategic Management, Saint-Petersburg State University of Economics and Finance, Russia; Public Administration in Economic Policy Management, School of International and Public Affairs, Columbia University

Work History:

  • from Data Analyst to Head of Research Unit, Central Bank of Kyrgyz Republic
  • Sr. Data Scientist in OJSC, Aiyl Bank, Kyrgyzstan
  • Consultant, The World Bank, Washington D.C.

Courses Taught:   Financial Programming in the Central Bank, Monetary Policy Transmission Mechanism

Service and Awards:   Winner of the Joint Japan/World Bank Graduate Scholarship Program, National Bank Silver Medal for Best Forecast

Professional Objective:   I want to found a successful Fintech startup one day.

Sanad Biswas

Bachelor's Degree:   Statistics, Biostatistics and Informatics, University of Dhaka, Bangladesh

Master's Degree:   Statistics, University of Toledo, OH

Work History:

  • Research Assistant: US Army Research Lab, Kennesaw State University
  • Consultant, Statistical Consulting Service, University of Toledo
  • Graduate Teaching Assistant, University of Toledo

Courses Taught:   Calculus and Business Calculus; facilitated students’ study of Statistics courses at the University of Toledo.

Professional Objective:   To work as a researcher in industry or as a faculty member. I am primarily interested in the application of machine learning in different fields.

Mallika Boyapati

Bachelor's Degree:  Electronics and Computer Engineering, K L University, India

Master's Degree:  Applied Computer Science, Columbus State University

Work History:

  • T-Mobile, Seattle, WA, USA: Sr. Data Analyst, 2018-2021
  • UITS, Columbus State University, Columbus, GA, USA: Data Analyst - Graduate Assistant, 2016-2018
  • Menlo Technologies, India: Jr. Data Analyst, Intern, 2014-2016

Courses Taught:   DATA 4310 - Statistical Data Mining

Publications:

  • Anti-Phishing Approaches in the Era of the Internet of Things. In: Pathan, AS.K. (eds) Towards a Wireless Connected World: Achievements and New Technologies. Springer, Cham -   https://doi.org/10.1007/978-3-031-04321-5_3
  • An empirical analysis of image augmentation against model inversion attack in federated learning -   https://doi.org/10.1007/s10586-022-03596-1
  • M. Boyapati and R. Aygun, "Phishing Web Page Detection using Web Scraping," SoutheastCon 2023, Orlando, FL, USA, 2023, pp. 167-174, doi: 10.1109/SoutheastCon51012.2023.10115148.
  • M. Boyapati and R. Aygun, "Default Prediction on Commercial Credit Big Data Using Graph-based Variable Clustering," 2023 IEEE 17th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, 2023, pp. 139-142, doi: 10.1109/ICSC56153.2023.00029.
  • Boyapati, M., Aygun, R. (2023) Explainable Machine Learning for Default Prediction on Commercial Credit Big Data Using Graph-based Variable Clustering. In Encyclopedia with Semantic Computing and Robotic Intelligence VOL. 0 https://doi.org/10.1142/S2529737623500119

Awards:

  • Winners of Dataiku March Madness Bracket-thon, 2021 in predicting the NBA bracket
  • Winners of 2021 Analytics Day Ph.D. level research poster presentation

Professional Objective:   To leverage strong analytical and technical abilities to research and develop effective data models, visualize data, and uncover insights that make an impact in the field of data science

Nina Grundlingh

Bachelor's Degree:   Applied Mathematics and Statistics, University of KwaZulu-Natal, South Africa

Master's Degree:   Statistics, University of KwaZulu-Natal, South Africa

Courses Taught:   Introduction to Statistics, University of KwaZulu-Natal

Publications/Presentations:

  • Grundlingh, N., Zewotir, T., Roberts, D. & Manda, S. Modelling diabetes in South Africa. The 61st conference of the South African Statistical Association, 27-29 November 2019, Nelson Mandela University, South Africa.
  • Grundlingh, N., Zewotir, T., Roberts, D. & Manda, S. Modelling diabetes in the South African population. College of Agriculture, Engineering and Science Postgraduate Research & Innovation Symposium 2019, 17 October 2019, University of KwaZulu-Natal, Westville, South Africa (the award for best MSc presentation was also received for this).
  • Grundlingh, N., Zewotir, T., Roberts, D. & Manda, S. Modelling risk factors of diabetes and pre-diabetes in South Africa. IBS SUSAN-SSACAB 2019 Conference, 8-11 September 2019, Cape Town, South Africa.

Awards:

  • University of KwaZulu-Natal Postgraduate Research & Innovation Symposium 2019 – Best Masters oral presentation
  • South African Statistical Association Honours Project Competition 2018/2019 – 2nd place and special prize for best use of SAS

Professional Objective:   To work in a teaching position – sharing how data science can be applied to different fields and the positive impact it could have. I would like to use my theological background and passion to bring insight, clarity, and wisdom to data science problems. 

Namazbai Ishmakhametov

Bachelor's Degree:   Specialist in Mathematical Methods in Economics, Kyrgyz-Russian Slavic University

Master's Degree:   Analytics, Institute for Advanced Analytics at North Carolina State University

Work History:

  • Expert at the Centre for Economic Research, National Bank of the Kyrgyz Republic
  • Consultant in World Bank project dedicated to strengthening the regulatory practices in Kyrgyz Republic
  • Consultant at Deloitte Consulting LLP, Science Based Services group, Analytics & Cognitive offering
  • Macroeconomic modeling expert in the Economic Department, National Bank of the Kyrgyz Republic

Courses Taught:   Introductory statistics and econometrics (cross-sections, time series and panels) lecturer at Ata-Turk Alatoo International University, Kyrgyzstan

Publications:

  • Ishmakhametov Namazbai, Abdygulov Tolkunbek, Jenish Nurbek. 2020. “Impact of 2014-2015 shocks on economic behavior of the households in the Kyrgyz Republic”. Working Paper of the National Bank of the Kyrgyz Republic
  • Sherrill W. Hayes, Jennifer L. Priestley, Namazbai Ishmakhametov, Herman E. Ray. 2020. “‘I’m not Working from Home, I’m Living at Work’: Perceived Stress and Work-Related Burnout before and during COVID-19”. PsyArxiv Preprints
  • Ishmakhametov Namazbai, Arykov Ruslan. 2016. “Credit Risk Model on the Example of the Commercial Banks of the Kyrgyz Republic”. Working Paper of the National Bank of the Kyrgyz Republic
  • Namazbai Ishmakhametov, Anvar Muratkhanov. 2015. “Modeling strategy of the Bank of the Kyrgyz Republic”. National Bank of Poland – Swiss National Bank joint seminar. Zurich, Switzerland

Professional Objective:   To apply my quantitative skills in the field of biotech in either the corporate or government sector

Symon Kimitei

Bachelor's Degrees:   Mathematics, Kennesaw State University, and Computer Science,  Kennesaw State University

Master's Degree:   Mathematics (Scientific Computing Concentration), Georgia State University 

Work History:   Senior Lecturer and Math Department Coordinator of Supplemental Instruction, Kennesaw State University

Courses Taught:   Calculus 1, Precalculus, Applied Calculus & College Algebra 

Publications:

  • Haskin, S., Kimitei, S., Chowdhury, M., Rahman, F., Longitudinal Predictive Curves of Health-Risk Factors for American Adolescent Girls. Journal of Adolescent Health. JAH-2021-00601R1
  • Symon K Kimitei, Algorithms for Toeplitz Matrices with Applications to Image Deblurring. 2008. Georgia State University, Masters thesis. ScholarWorks

Poster Presentations:

  • Kimitei, Symon & Sammie Haskin. "Nadaraya-Watson Kernel Regression Longitudinal Analysis of Healthcare Risk Factors of African American and Caucasian American Girls." Kennesaw State University R Day Presentation. 11 Nov. 2019. Poster presentation.
  • Kimitei, Symon. "Social Network Analysis in Supreme Court Case Rulings by Precedence Using SAS Optgraph/Python." 23rd Annual Symposium of Scholars. Kennesaw State University. 19 April 2018. Poster presentation.

Professional Objective:   As a Ph.D. student in Analytics & Data Science, I hope to gain skills in the program that will propel me into a Data Scientist / Machine Learning Engineer role with a specialization in the design and implementation of deep learning & machine learning algorithms.

Jitendra Sai Kota

Bachelor's Degree:   Computer Science & Engineering, Amrita Vishwa Vidyapeetham, India

Master's Degree:   Computer Science, Florida State University

Work History:   Teaching Assistant Professor in Computer Science at an Engineering College in India

Courses Taught:   Problem Solving & Program Design through C, Artificial Intelligence, Data Mining

Publications:  Kota, Jitendra Sai, Vayelapelli, Mamatha. 2020. "Predicting the Outcome of a T20 Cricket Game Based on the Players' Abilities to Perform Under Pressure". IEIE Transactions on Smart Processing and Computing 9(3):230-237.   DOI: 10.5573/IEIESPC.2020.9.3.230

Professional Objective:   To work in Data Science in a corporate environment

Catrice Taylor

Bachelor's Degree:   Economics, Clemson University 

Master's Degrees:  Applied Economics and Statistics, Clemson University, and Applied Statistics, Kennesaw State University 

Professional Objective:   To work as an industry data scientist in a corporate environment 

Sahar Yarmohammadtoosky

Bachelor's Degree:   Applied Mathematics, Sheikh Bahaei University, Isfahan, Iran 

Master's Degree:   Applied Mathematics, Iran University of Science & Technology, Tehran, Iran

Courses Taught:  Numerical Analysis and Linear Algebra, Iran University of Science & Technology

Publications:   Noah, G., Sahar, Y., Anthony P. & Hung, C.C. "ISODS: An ISODATA-Based Initial Centroid Algorithm". Accepted to: 10th International Conference on Information, March 6 - 8, 2021, Hosei University, Tokyo, Japan

Professional Objective:   My goal is to become a competent Data Science specialist capable of using my skills to bring meaning to data, and to obtain a faculty position at a university

Martin Brown

Graduation Date: Spring 2024

Dissertation: A Holistic and Collaborative Behavioral Health Detection Framework Using Sensitive Police Narratives

Dissertation Advisors: Dr. Dominic Thomas and Dr. Md Abdullah Al Hafiz Khan

Inchan Hwang

Bachelor’s Degree: Computer Science, Georgia Southwestern State University

Master’s Degree: Software Engineering, Ajou University, South Korea

Courses Tutored: Precalculus, College Algebra, Calculus I at Georgia Southwestern State University

  • Tutor for College Algebra and Calculus I and II, Academic Skills Center, Georgia Southwestern State University
  • Research Assistant, Intelligence of HyperConnected Systems Lab, Ajou University
  • Full-stack web developer and Windows system programmer in the cybersecurity industry

Professional Objective: To work in big data analytics, and in research and development of machine learning in engineering and security

Duleep Prasanna Rathgamage Don

Bachelor's degree:   Physics and Mathematics, The Open University of Sri Lanka

Master's degree:   Mathematics, Georgia Southern University

  • Graduate Teaching Assistant, Georgia Southern University, 2016 - 2018
  • Graduate Teaching Assistant, University of Wyoming, 2019 - 2020

Courses Taught:   Trigonometry, and Calculus I & II

Publications/Presentations:

  • Don, R. D. and Iacob, I. E., ‘DCSVM: Fast Multi-class Classification using Support Vector Machines’, International Journal of Machine Learning and Cybernetics.
  • Rathgamage Don, D., Iacob, E., ‘Divide and Conquer Support Vector Machine for Multiclass Classification’, Research Symposium (2018), Georgia Southern University.
  • Rathgamage Don, D., Iacob, E., ‘Multiclass Classification using Support Vector Machines’, MAA Southeastern Section Meeting (2018), Clemson University.

Professional Objective:   To work in big data analytics, and research and development of machine learning in engineering, and medicine

Linglin Zhang

Bachelor’s Degree:   Biological Sciences, Hubei University, China

Master’s Degrees:   Chemical Biology, University of Michigan; Bioinformatics, Georgia Institute of Technology

Selected Publications:   Rebecca Shen, Zhi Li, Linglin Zhang, Yingqi Hua, Min Mao, Zhicong Li, Zhengdong Cai, Yunping Qiu, Jonathan Gryak, Kayvan Najarian. (2018). Osteosarcoma Patients Classification Using Plain X-Rays and Metabolomic Data. 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 690-693, 2018.

Professional Objective:  To become a researcher in industry or academia. My background in Biology and Bioinformatics provides strong theoretical support for a research role in the health industry, and my internship at Equifax gave me practical knowledge of business cases.

Yihong Zhang

Bachelor’s Degree:   Psychology Mathematics Interdisciplinary, Chatham University

Master’s Degree:   Mathematics and Statistics Allied with Computer Science, Georgia State University

  • Research Assistant - Collaborated with the biomedical department to analyze and visualize microarray gene expression data; facilitated data pre-processing and machine learning modeling of clinical liver cirrhosis image data; assisted in feature engineering for image analysis in deep learning for pathology diagnosis in Mayo Clinic’s pilot project.
  • Graduate Lab Assistant - Tutored students with statistics and math subjects.

Professional Objective:   Make better use of data in healthcare and bioinformatic industry as a data scientist.

2019 - 2020

Trent Geisler

Graduation Date:   Summer 2022

Dissertation:   Novel Instance-Level Weighted Loss Function for Imbalanced Learning

Dissertation Advisor:   Dr. Herman Ray

Current Position:   Assistant Professor, Department of Systems Engineering, United States Military Academy West Point

Srivatsa Mallapragada

Dissertation: Multi-Modality Transformer for E-Commerce: Inferring User Purchase Intention to Bridge the Query-Product Gap

Dissertation Advisor: Dr. Ying Xie

Current Position: Data Scientist, Rue Gilt Groupe (RGG)

Sudhashree Sayenju

Graduation Date:   Spring 2023

Dissertation:   Quantification and Mitigation of Various Types of Biases in Deep NLP Models

Dissertation Advisor:   Dr. Ramazan Aygun

Current Position: Lecturer, Data Science and Analytics, Kennesaw State University

Christina Stradwick

Bachelor’s Degree:  Music Performance and Mathematics, Marshall University

Master’s Degree:  Mathematics with Emphasis in Statistics, Marshall University

Courses Taught:  Prep for College Algebra at Marshall University

Selected Presentations:

  • Stradwick, C. Exploring the Variance of the Sample Variance. Spring Meeting of the Mathematical Association of America Ohio Section, University of Akron, 2019.
  • Stradwick, C., Vaughn, L., Hanan Khan, A. Data Modeling on Insurance Beneficiary Dataset. College of Science Research Expo 2018, Marshall University, 2018. Poster Presentation.
  • Stradwick, C. Disease modeling on networks. The 13th Annual UNCG Regional Mathematics and Statistics Conference, University of North Carolina at Greensboro, 2017. Poster Presentation.

Professional Objectives:  To work as a researcher in industry or in a laboratory setting. I would like to use my background in mathematics and statistics to develop novel solutions that address limitations in current data science techniques and to apply known data science methods to solve real-world problems.

2018 - 2019

Md Shafiul Alam

Graduation Date:   Fall 2022

Dissertation:   Appley: Approximate Shapley Values for Model Explainability in Linear Time

Dissertation Advisor:   Dr. Ying Xie

Current Position:   AI Framework Engineer, Intel Corporation

Jonathan Boardman

Dissertation:   Ethical Analytics: A Framework for a Practically-Oriented Sub-Discipline of AI Ethics

Current Position:   Data Scientist, Equifax

Tejaswini Mallavarapu

Bachelor’s Degree:   Pharmacy, Acharya Nagarjuna University, India

Master’s Degree:   Computer Science, Kennesaw State University

  • Graduate Research Assistant, Kennesaw State University, 2017-present
  • Research Analyst, Divis Laboratories, 2013-2014

Selected Publications:

  • T. Mallavarapu, Y. Kim, J.H. Oh, and M. Kang, "R-PathCluster: Identifying Cancer Subtype of Glioblastoma Multiforme Using Pathway-Based Restricted Boltzmann Machine," Proceedings of IEEE International Conference on Bioinformatics & Biomedicine (IEEE BIBM 2017), International Workshop on Deep Learning in Bioinformatics, Biomedicine, and Healthcare Informatics, Accepted, 2017.
  • M.R. Shivalingam, K.S.G. Arul Kumaran, D. Jeslin, Ch. MadhusudhanaRao, M. Tejaswini, "Design and Evaluation of Binding Properties of Cassia roxburghii Seed Galactomannan and Moringa oleifera Gum in the Formulation of Paracetamol Tablets," Research Journal of Pharmacy and Technology (RJPT). 3(1): Jan.-Mar. 2010; Page 254-256.
  • M.R. Shivalingam, K.S.G. Arul Kumaran, D. Jeslin, Y.V. Kishore Reddy, M. Tejaswini, Ch. MadhusudhanaRao, V. Tejopavan, "Cassia roxburghii Seed Galactomannan - a potential binding agent in the tablet formulation," Journal of Biomedical Science and Research (JBSR), Vol 2 (1), 2010, 18-22

Professional Objective:   To be a data scientist in the field of health care or bioinformatics where I can leverage my analytical skills and knowledge towards the advancement of the research field.

Seema Sangari

Dissertation:   Debiasing Cyber Incidents - Correcting for Reporting Delays and Under-reporting

Dissertation Advisor:   Dr. Michael Whitman

Current Position:   Principal Modeler, HSB 

Srivarna Settisara Janney

Bachelor’s Degree:   Mechanical Engineering, Visveswaraiah Technological University, India

  • Graduate Research Assistant, Kennesaw State University, 2016-2018
  • Senior Software Engineer, Torry Harris Business Solutions (THBS), United Kingdom, 2010-2012 and India, 2012-2014
  • Software Engineer, Torry Harris Business Solutions (THBS), India, 2007-2010

Selected Publications/Presentations:

  • S.S. Janney, S. Chakravarty, “New Algorithms for CS – MRI: WTWTS, DWTS, WDWTS”, One-page research paper, 40th International Conference of IEEE Engineering in Medicine and Biology Society (IEEE EMBC), Jul 2018
  • Master thesis presented at Southeast Symposium on Contemporary Engineering Topics (SSCET), UAH Engineering Forum, Alabama, Aug 2018
  • Master thesis poster is accepted to be presented at Biomedical Engineering Society (BMES) 2018 Annual Meeting, Oct 2018
  • Submitted draft copy for book chapter contribution on “Bioelectronics and Medical Devices”, Elsevier Publisher, May 2018
  • Showcased 3MT, Georgia Council of Graduate Schools (GCGS), Apr 2018
  • Master thesis presented in workshop for “Medical Signal and Image Processing” at Department of Biotechnology & Medical Engineering, NIT Rourkella, Feb 2018
  • S.S. Janney, I. Karim, J. Yang, C.C Hung, Y. Wang, “Monitoring and Assessing Traffic Safety Using Live Video Images”, GDOT project showcase, 4th Annual Transportation Research Expo, Sept 2016
  • 1st Place Winner, Graduate Research Project, C-day Poster Presentation, Kennesaw State University, Spring 2018
  • People's Choice Award, 3 Minute Thesis (3MT), Apr 2018
  • CCSE Dean’s 4.0 Club, Jan 2018
  • 3rd Place Winner, Hackathon 2017 - HPCC Systems Big Data
  • Foundation of Computer Science, Certified by Kennesaw State University, Jun 2016
  • Fundamental of RESTful API Design, Certified by APIGEE, Nov 2014
  • Member of HandsOnAtlanta, since 2014
  • SOA Associate, Certified by IBM, Jun 2008

Professional Objective:   I would like to be a researcher in Data Science and Analytics in medical imaging technologies contributing to advancements that would help medical and healthcare professionals provide value-based and personalized health care. I would like to look at career opportunities in industry and academia that fuel my interest in research.

2017 - 2018

Liyuan Liu

Graduation Date: Summer 2021

Dissertation: Incentive-based Data Sharing and Exchanging Mechanism Design

Dissertation Advisor: Dr. Meng Han

Current Position: Assistant Professor, Saint Joseph's University - Erivan K. Haub School of Business

Mohammad Masum

Dissertation: Integrated Machine Learning Approaches to Improve Classification Performance and Feature Extraction Process for EEG Dataset

Dissertation Advisor: Dr. Hossain Shahriar

Current Position: Assistant Professor, San Jose State University

Lauren Staples

Graduation Date: Fall 2021

Dissertation: A Distance-Based Clustering Framework for Categorical Time Series: A Case Study in the Episodes of Care Healthcare Delivery System

Dissertation Advisor: Dr. Joseph DeMaio

Current Position: Senior Data Scientist, Microsoft

2016 - 2017

Shashank Hebbar

Dissertation: Tree-BERT - Advanced Representation Learning for Relation Extraction

Current Position: Data Scientist, Credigy

Jessica Rudd

Graduation Date: Summer 2020

Dissertation: Quantitatively Motivated Model Development Framework: Downstream Analysis Effects of Normalization Strategies

Dissertation Advisor: Dr. Herman Ray

Current Position: Senior Data Engineer, Intuit Mailchimp

Yan Wang

Graduation Date: Spring 2020

Dissertation: Data-driven Investment Decisions in P2P Lending: Strategies of Integrating Credit Scoring and Profit Scoring

Dissertation Advisor: Dr. Sherry Ni

Current Position: Applied Scientist II, Amazon

Lili Zhang

Dissertation: A Novel Penalized Log-likelihood Function for Class Imbalance Problem

Current Position: Data Scientist/Research Engineer, Hewlett Packard Enterprise

Yiyun Zhou

Dissertation: Attack and Defense in Security Analytics

Dissertation Advisor: Dr. Selena He

Current Position: NLP Data Scientist, NBME

2015 - 2016

Edwin Baidoo

Graduation Date:  Spring 2020

Dissertation: A Credit Analysis of the Unbanked and Underbanked: An Argument for Alternative Data

Dissertation Advisor:  Dr. Stefano Mazzotta

Current Position: Assistant Professor, Business Analytics, Tennessee Technological University

Bogdan Gadidov

Graduation Date:  Summer 2019

Dissertation: One- and Two-Step Estimation of Time Variant Parameters and Nonparametric Quantiles

Dissertation Advisor: Dr. Mohammed Chowdhury

Current Position: Data Scientist, Variant

Jie Hao

Dissertation:  Biologically Interpretable, Integrative Deep Learning for Cancer Survival Analysis

Dissertation Advisor:  Dr. Mingon Kang

Current Position:  Assistant Professor, Chinese Academy of Medical Sciences, Peking Union Medical College

Linh Le

Graduation Date:  Spring 2019

Dissertation:  Deep Embedding Kernel

Current Position: Assistant Professor, Information Technology, Kennesaw State University

Bob Vanderheyden

Graduation Date: Fall 2019

Dissertation:  Ordinal Hyperplane Loss

Dissertation Advisor:  Dr. Ying Xie

Current Position:  Principal Data Scientist, Microsoft

Contact Info

Kennesaw Campus 1000 Chastain Road Kennesaw, GA 30144

Marietta Campus 1100 South Marietta Pkwy Marietta, GA 30060

Phone 470-KSU-INFO (470-578-4636)

kennesaw.edu/info

© 2024 Kennesaw State University. All Rights Reserved.

data science Recently Published Documents

Assessing the effects of fuel energy consumption, foreign direct investment and GDP on CO2 emission: New data science evidence from Europe & Central Asia

Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks

Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code, and neglect creating or updating their documentation during quick iterations. Inspired by human documentation practices learned from 80 highly-voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system to explore how human-centered AI systems can support human data scientists in the machine learning code documentation scenario. Themisto facilitates the creation of documentation via three approaches: a deep-learning-based approach to generate documentation for source code, a query-based approach to retrieve online API documentation for source code, and a user prompt approach to nudge users to write documentation. We evaluated Themisto in a within-subjects experiment with 24 data science practitioners, and found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code they would have ignored, and improved participants’ satisfaction with their computational notebook.

Data science in the business environment: Insight management for an Executive MBA

Adventures in financial data science

GeCoAgent: a conversational agent for empowering genomic data extraction and analysis

With the availability of reliable and low-cost DNA sequencing, human genomics is relevant to a growing number of end-users, including biologists and clinicians. Typical interactions require applying comparative data analysis to huge repositories of genomic information for building new knowledge, taking advantage of the latest findings in applied genomics for healthcare. Powerful technology for data extraction and analysis is available, but broad use of the technology is hampered by the complexity of accessing such methods and tools. This work presents GeCoAgent, a big-data service for clinicians and biologists. GeCoAgent uses a dialogic interface, animated by a chatbot, for supporting the end-users’ interaction with computational tools accompanied by multi-modal support. While the dialogue progresses, the user is accompanied in extracting the relevant data from repositories and then performing data analysis, which often requires the use of statistical methods or machine learning. Results are returned using simple representations (spreadsheets and graphics), while at the end of a session the dialogue is summarized in textual format. The innovation presented in this article is concerned with not only the delivery of a new tool but also our novel approach to conversational technologies, potentially extensible to other healthcare domains or to general data science.

Differentially Private Medical Texts Generation Using Generative Neural Networks

Technological advancements in data science have offered us affordable storage and efficient algorithms to query a large volume of data. Our health records are a significant part of this data, which is pivotal for healthcare providers and can be utilized in our well-being. The clinical note in electronic health records is one such category that collects a patient’s complete medical information during different timesteps of patient care available in the form of free-texts. Thus, these unstructured textual notes contain events from a patient’s admission to discharge, which can prove to be significant for future medical decisions. However, since these texts also contain sensitive information about the patient and the attending medical professionals, such notes cannot be shared publicly. This privacy issue has thwarted timely discoveries on this plethora of untapped information. Therefore, in this work, we intend to generate synthetic medical texts from a private or sanitized (de-identified) clinical text corpus and analyze their utility rigorously in different metrics and levels. Experimental results promote the applicability of our generated data as it achieves more than 80% accuracy in different pragmatic classification problems and matches (or outperforms) the original text data.

Impact on Stock Market across Covid-19 Outbreak

Abstract: This paper analyses the impact of the pandemic on global stock exchanges. Stock listing values are determined by a variety of factors, including seasonal changes, catastrophic calamities, pandemics, fiscal year changes, and many more. The paper analyses the variation in listing prices over the worldwide outbreak of the novel coronavirus. The key reason for focusing on this outbreak was to shed light on the underlying regulation of stock exchanges. Daily closing prices of the stock indices from January 2017 to January 2022 were used for the analysis. The main aim of the research is to analyse whether a downturn in the global economy impacts the financial stock exchange. Keywords: Stock Exchange, Matplotlib, Streamlit, Data Science, Web scraping.

Information Resilience: the nexus of responsible and agile approaches to information use

Abstract: The appetite for effective use of information assets has been steadily rising in both public and private sector organisations. However, whether the information is used for social good or commercial gain, there is a growing recognition of the complex socio-technical challenges associated with balancing the diverse demands of regulatory compliance and data privacy, social expectations and ethical use, business process agility and value creation, and scarcity of data science talent. In this vision paper, we present a series of case studies that highlight these interconnected challenges, across a range of application areas. We use the insights from the case studies to introduce Information Resilience, as a scaffold within which the competing requirements of responsible and agile approaches to information use can be positioned. The aim of this paper is to develop and present a manifesto for Information Resilience that can serve as a reference for future research and development in relevant areas of responsible data management.

qEEG Analysis in the Diagnosis of Alzheimer's Disease: A Comparison of Functional Connectivity and Spectral Analysis

Alzheimer's disease (AD) is a brain disorder that is mainly characterized by a progressive degeneration of neurons in the brain, causing a decline in cognitive abilities and difficulties in engaging in day-to-day activities. This study compares an FFT-based spectral analysis against a functional connectivity analysis based on phase synchronization, for finding known differences between AD patients and Healthy Control (HC) subjects. Both of these quantitative analysis methods were applied to a dataset comprising bipolar EEG montage values from 20 diagnosed AD patients and 20 age-matched HC subjects. Additionally, an attempt was made to localize the identified AD-induced brain activity effects in AD patients. The obtained results showed the advantage of the functional connectivity analysis method compared to a simple spectral analysis. Specifically, while spectral analysis could not find any significant differences between the AD and HC groups, the functional connectivity analysis showed statistically higher synchronization levels in the AD group in the lower frequency bands (delta and theta), suggesting that the AD patients' brains are in a phase-locked state. Further comparison of functional connectivity between the homotopic regions confirmed that the traits of AD were localized in the centro-parietal and centro-temporal areas in the theta frequency band (4-8 Hz). The contribution of this study is that it applies a neural metric for Alzheimer's detection from a data science perspective rather than from a neuroscience one. The study shows that the combination of bipolar derivations with phase synchronization yields similar results to comparable studies employing alternative analysis methods.
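To make "phase synchronization" concrete, here is a minimal sketch of one common synchronization index, the phase-locking value (PLV). The abstract does not say which index the study uses, so PLV is an assumption, as is the premise that instantaneous phases have already been extracted from each EEG channel (for example with a Hilbert transform). The function name and the synthetic phase series are illustrative only.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Return |mean(exp(i * (a - b)))|: 1 for a constant phase lag, near 0 for none."""
    unit_vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(unit_vectors) / len(unit_vectors))


# Synthetic phase series (radians), standing in for two EEG channels.
n = 100
phases_a = [0.3 * k for k in range(n)]
# Channel that trails channel A by a constant 0.7 rad: fully phase-locked.
locked_b = [p - 0.7 for p in phases_a]
# Channel whose lag sweeps evenly around the circle: no stable phase relation.
drifting_b = [p - 2 * math.pi * k / n for k, p in enumerate(phases_a)]

plv_locked = phase_locking_value(phases_a, locked_b)
plv_drifting = phase_locking_value(phases_a, drifting_b)
```

A constant lag gives a value of essentially 1 and an evenly drifting lag gives a value near 0. Higher values in the delta and theta bands would correspond to the "phase-locked state" the abstract describes.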

Big Data Analytics for Long-Term Meteorological Observations at Hanford Site

A growing number of physical objects with embedded sensors with typically high volume and frequently updated data sets has accentuated the need to develop methodologies to extract useful information from big data for supporting decision making. This study applies a suite of data analytics and core principles of data science to characterize near real-time meteorological data with a focus on extreme weather events. To highlight the applicability of this work and make it more accessible from a risk management perspective, a foundation for a software platform with an intuitive Graphical User Interface (GUI) was developed to access and analyze data from a decommissioned nuclear production complex operated by the U.S. Department of Energy (DOE, Richland, USA). Exploratory data analysis (EDA), involving classical non-parametric statistics, and machine learning (ML) techniques, were used to develop statistical summaries and learn characteristic features of key weather patterns and signatures. The new approach and GUI provide key insights into using big data and ML to assist site operation related to safety management strategies for extreme weather events. Specifically, this work offers a practical guide to analyzing long-term meteorological data and highlights the integration of ML and classical statistics to applied risk and decision science.

10 Compelling Machine Learning Dissertations from Ph.D. Students

Posted by Daniel Gutierrez, ODSC, June 18, 2019

As a data scientist, an integral part of my work in the field revolves around keeping current with research coming out of academia. I frequently scour arXiv.org for late-breaking papers that show trends and fertile areas of research. Other sources of valuable research developments are in the form of Ph.D. dissertations, the culmination of a doctoral candidate’s work to confer his/her degree. Ph.D. candidates are highly motivated to choose research topics that establish new and creative paths toward discovery in their field of study. In this article, I present 10 compelling machine learning dissertations that I found interesting in terms of my own areas of pursuit. I hope you’ll find several of them that match your own interests. Each thesis may take a while to consume but will result in hours of satisfying summer reading. Enjoy!

1. Recognition of Everyday Activities through Wearable Sensors and Machine Learning

Over the past several years, the use of wearable devices has increased dramatically, primarily for fitness monitoring, largely due to their greater sensor reliability, increased functionality, smaller size, increased ease of use, and greater affordability. These devices have helped many people of all ages live healthier lives and achieve their personal fitness goals, as they are able to see quantifiable and graphical results of their efforts every step of the way (i.e. in real-time). Yet, while these device systems work well within the fitness domain, they have yet to achieve a convincing level of functionality in the larger domain of healthcare.

The goal of the research detailed in this dissertation is to explore and develop accurate and quantifiable sensing and machine learning techniques for eventual real-time health monitoring by wearable device systems. To that end, a two-tier recognition system is presented that is designed to identify health activities in a naturalistic setting based on accelerometer data of common activities. In Tier I a traditional activity recognition approach is employed to classify short windows of data, while in Tier II these classified windows are grouped to identify instances of a specific activity.
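The two tiers can be sketched as follows. This is only an illustration of the structure: the Tier I classifier here is a hypothetical variability threshold standing in for whatever trained model the dissertation uses, and the window size, threshold, labels, and minimum run length are all assumed values.

```python
from itertools import groupby
from statistics import pstdev


def classify_window(window, threshold=0.5):
    """Tier I (hypothetical rule): label a short window by its signal variability."""
    return "active" if pstdev(window) > threshold else "rest"


def tier_one(samples, window_size=4):
    """Split the signal into non-overlapping windows and label each one."""
    windows = [samples[i:i + window_size]
               for i in range(0, len(samples) - window_size + 1, window_size)]
    return [classify_window(w) for w in windows]


def tier_two(labels, target="active", min_windows=2):
    """Tier II: merge consecutive windows with the target label into instances.

    Returns inclusive (first_window, last_window) index pairs and drops runs
    shorter than min_windows as likely noise.
    """
    instances, index = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label == target and length >= min_windows:
            instances.append((index, index + length - 1))
        index += length
    return instances


# Synthetic single-axis accelerometer trace, four samples per window.
signal = [0.0, 0.1, 0.0, 0.1,     # rest
          1.0, -1.0, 1.2, -0.9,   # active
          0.8, -1.1, 1.0, -1.0,   # active
          0.0, 0.1, 0.1, 0.0,     # rest
          1.0, -1.0, 1.0, -1.0]   # active, but only one window long
labels = tier_one(signal)
instances = tier_two(labels)
```

With this trace, Tier I labels the five windows rest, active, active, rest, active, and Tier II reports a single activity instance spanning windows 1 to 2, discarding the isolated final window.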

2. Algorithms and analysis for non-convex optimization problems in machine learning

This dissertation proposes efficient algorithms and provides theoretical analysis through the angle of spectral methods for some important non-convex optimization problems in machine learning. Specifically, the focus is on two types of non-convex optimization problems: learning the parameters of latent variable models and learning in deep neural networks. Learning latent variable models is traditionally framed as a non-convex optimization problem through Maximum Likelihood Estimation (MLE). For some specific models, such as the multi-view model, it’s possible to bypass the non-convexity by leveraging the special model structure and converting the problem into spectral decomposition through a Method of Moments (MM) estimator. In this research, a novel algorithm is proposed that can flexibly learn a multi-view model in a non-parametric fashion. To scale the nonparametric spectral methods to large datasets, an algorithm called doubly stochastic gradient descent is proposed, which uses sampling to approximate two expectations in the problem and achieves a better balance of computation and statistics by adaptively growing the model as more data arrive. Learning with neural networks is a difficult non-convex problem, yet simple gradient-based methods achieve great success in practice. This part of the research tries to understand the optimization landscape of learning one-hidden-layer networks with Rectified Linear (ReLU) activation functions. By directly analyzing the structure of the gradient, it can be shown that neural networks with diverse weights have no spurious local optima.

3. Algorithms, Machine Learning, and Speech: The Future of the First Amendment in a Digital World

We increasingly depend on algorithms to mediate information and thanks to the advance of computation power and big data, they do so more autonomously than ever before. At the same time, courts have been deferential to First Amendment defenses made in light of new technology. Computer code, algorithmic outputs, and arguably, the dissemination of data have all been determined as constituting “speech” entitled to constitutional protection. However, continuing to use the First Amendment as a barrier to regulation may have extreme consequences as our information ecosystem evolves. This research focuses on developing a new approach to determining what should be considered “speech” if the First Amendment is to continue to protect the marketplace of ideas, individual autonomy, and democracy.

4. Deep in-memory computing

There is much interest in embedding data analytics into sensor-rich platforms such as wearables, biomedical devices, autonomous vehicles, robots, and Internet-of-Things to provide these with decision-making capabilities. Such platforms often need to implement machine learning (ML) algorithms under stringent energy constraints with battery-powered electronics. In particular, energy consumption in memory subsystems dominates such a system’s energy efficiency. In addition, the memory access latency is a major bottleneck for overall system throughput. To address these issues in memory-intensive inference applications, this dissertation proposes a deep in-memory accelerator (DIMA), which deeply embeds computation into the memory array, employing two key principles: (1) accessing and processing multiple rows of the memory array at a time, and (2) embedding pitch-matched low-swing analog processing at the periphery of the bitcell array.

5. Classification with Large Sparse Datasets: Convergence Analysis and Scalable Algorithms

Large and sparse datasets, such as user ratings over a large collection of items, are common in the big data era. Many applications need to classify the users or items based on the high-dimensional and sparse data vectors, e.g., to predict the profitability of a product or the age group of a user. Linear classifiers are popular choices for classifying such data sets because of their efficiency. In order to classify the large sparse data more effectively, the following important questions need to be answered: (a) Sparse data and convergence behavior. How do different properties of a data set, such as the sparsity rate and the mechanism of missing data, systematically affect the convergence behavior of classification? (b) Handling sparse data with non-linear models. How can non-linear data structures be learned efficiently when classifying large sparse data? This dissertation attempts to address these questions with empirical and theoretical analysis on large and sparse data sets.

6. Collaborative detection of cyberbullying behavior in Twitter data

As the size of Twitter data is increasing, so are undesirable behaviors of its users. One such undesirable behavior is cyberbullying, which could lead to catastrophic consequences. Hence, it is critical to efficiently detect cyberbullying behavior by analyzing tweets, in real-time if possible. Prevalent approaches to identifying cyberbullying are mainly stand-alone, and thus, are time-consuming. This dissertation proposes a new approach, called the distributed-collaborative approach, for cyberbullying detection. It contains a network of detection nodes, each of which is independent and capable of classifying tweets it receives. These detection nodes collaborate with each other in case they need help in classifying a given tweet. The study empirically evaluates various collaborative patterns, and it assesses the performance of each pattern in detail. Results indicate an improvement in recall and precision of the detection mechanism over the stand-alone paradigm.
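One possible collaboration pattern can be sketched as below. Everything in it is hypothetical: the keyword weights stand in for each node's trained classifier, and "ask every peer and average the scores when the local score is uncertain" is just one simple pattern, not necessarily one the dissertation evaluates.

```python
class DetectionNode:
    """One independent detector; keyword weights stand in for a trained model."""

    def __init__(self, name, keyword_weights, margin=0.2):
        self.name = name
        self.keyword_weights = keyword_weights
        self.margin = margin  # scores within this distance of 0.5 are "uncertain"
        self.peers = []

    def score(self, tweet):
        """Return a bullying score in [0, 1] from this node's own keywords."""
        words = tweet.lower().split()
        total = sum(self.keyword_weights.get(word, 0.0) for word in words)
        return min(1.0, total)

    def classify(self, tweet):
        """Classify locally; consult peers only when the local score is uncertain.

        Returns (is_bullying, names_of_nodes_consulted).
        """
        own = self.score(tweet)
        if abs(own - 0.5) >= self.margin or not self.peers:
            return own >= 0.5, [self.name]
        scores = [own] + [peer.score(tweet) for peer in self.peers]
        consulted = [self.name] + [peer.name for peer in self.peers]
        return sum(scores) / len(scores) >= 0.5, consulted


node_a = DetectionNode("a", {"loser": 0.5, "idiot": 0.9})
node_b = DetectionNode("b", {"loser": 0.8, "pathetic": 0.7})
node_a.peers = [node_b]

confident = node_a.classify("you are an idiot")   # decided by node a alone
uncertain = node_a.classify("what a loser")       # node a asks node b for help
benign = node_a.classify("nice game today")       # decided by node a alone
```

Node a settles the first and third tweets by itself and only consults node b for the borderline second tweet, which is the kind of selective collaboration the abstract describes.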

7. Bringing interpretability and visualization with artificial neural networks

Extreme Learning Machine (ELM) is a training algorithm for Single-Layer Feed-forward Neural Networks (SLFNs). ELM differs in theory from other training algorithms in that it has an explicit closed-form solution, which exists because the initialized weights are never updated. In practice, ELMs achieve performance similar to that of other state-of-the-art training techniques while taking much less time to train a model. Experiments show that training an ELM can be up to 5 orders of magnitude faster than the standard error back-propagation algorithm. ELM is a recently discovered technique that has proved its efficiency in classic regression and classification tasks, including multi-class cases. In this dissertation, extensions of ELMs to problems that are atypical for Artificial Neural Networks (ANNs) are presented.
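To make the idea of an explicitly given solution concrete, here is a minimal sketch of the textbook ELM formulation, not code from the dissertation. The hidden-layer weights are drawn at random and never updated, so the output weights can be obtained in one step as a least-squares solution via the pseudoinverse. The function names `train_elm` and `predict_elm`, the `tanh` activation, the hidden-layer size, and the toy sine data are all assumptions chosen for illustration.

```python
import numpy as np

def train_elm(X, y, n_hidden=50, seed=0):
    """Fit a single-hidden-layer network the ELM way (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Hidden weights and biases are random and stay fixed (assumed Gaussian here).
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer activations
    # Output weights: least-squares solution of H @ beta = y via the pseudoinverse.
    beta = np.linalg.pinv(H) @ y
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Apply the fixed hidden layer, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta

# Toy regression problem (hypothetical data): learn sin(x) on [-3, 3].
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])

W, b, beta = train_elm(X, y)
mse = float(np.mean((predict_elm(X, W, b, beta) - y) ** 2))
print(f"training MSE: {mse:.2e}")
```

Because no iterative gradient updates are involved, training reduces to a single linear-algebra solve, which is the usual explanation for the speed advantage the abstract describes.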

8. Scalable Manifold Learning and Related Topics

The subject of manifold learning is vast and still largely unexplored. As a subset of unsupervised learning, it faces a fundamental challenge in adequately defining the problem, yet its solution answers an increasingly important desire to understand data sets intrinsically. The overarching goal of this work is to present researchers with an understanding of the topic of manifold learning, a description of and proposed method for performing manifold learning, guidance for selecting parameters when applying manifold learning to large scientific data sets, and open source software powerful enough to meet the demands of big data.

9. The Intelligent Management of Crowd-Powered Machine Learning

Artificial intelligence and machine learning power many technologies today, from spam filters to self-driving cars to medical decision assistants. While this revolution has hugely benefited from algorithmic developments, it also could not have occurred without data, which nowadays is frequently procured at massive scale from crowds. Because data is so crucial, a key next step towards truly autonomous agents is the design of better methods for intelligently managing now-ubiquitous crowd-powered data-gathering processes. This dissertation takes this key next step by developing algorithms for the online and dynamic control of these processes. The research considers how to gather data for its two primary purposes: training and evaluation.


10. System-Aware Optimization for Machine Learning at Scale

New computing systems have emerged in response to the increasing size and complexity of modern datasets. For best performance, machine learning methods must be designed to closely align with the underlying properties of these systems. This dissertation illustrates the impact of system-aware machine learning through the lens of optimization, a crucial component in formulating and solving most machine learning problems. Classically, the performance of an optimization method is measured in terms of accuracy (i.e., does it realize the correct machine learning model?) and convergence rate (after how many iterations?). In modern computing regimes, however, it becomes critical to additionally consider a number of systems-related aspects for best overall performance. These aspects can range from low-level details, such as data structures or machine specifications, to higher-level concepts, such as the tradeoff between communication and computation. We propose a general optimization framework for machine learning, CoCoA, that gives careful consideration to systems parameters, often incorporating them directly into the method and theory.


Daniel Gutierrez, ODSC

Daniel D. Gutierrez is a practicing data scientist who has been working with data since long before the field came into vogue. As a technology journalist, he enjoys keeping a pulse on this fast-paced industry. Daniel is also an educator, having taught data science, machine learning and R classes at the university level. He has authored four computer industry books on database and data science technology, including his most recent title, “Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R.” Daniel holds a BS in Mathematics and Computer Science from UCLA.



This collection of MIT Theses in DSpace contains selected theses and dissertations from all MIT departments. Please note that this is NOT a complete collection of MIT theses. To search all MIT theses, use MIT Libraries' catalog.

MIT's DSpace contains more than 58,000 theses completed at MIT dating as far back as the mid 1800's. Theses in this collection have been scanned by the MIT Libraries or submitted in electronic format by thesis authors. Since 2004 all new Masters and Ph.D. theses are scanned and added to this collection after degrees are awarded.

MIT Theses are openly available to all readers. Please share how this access affects or benefits you. Your story matters.

If you have questions about MIT theses in DSpace, contact [email protected]. See also Access & Availability Questions or About MIT Theses in DSpace.

If you are a recent MIT graduate, your thesis will be added to DSpace within 3-6 months after your graduation date. Please email [email protected] with any questions.

Permissions

MIT Theses may be protected by copyright. Please refer to the MIT Libraries Permissions Policy for permission information. Note that the copyright holder for most MIT theses is identified on the title page of the thesis.

Theses by Department

  • Comparative Media Studies
  • Computation for Design and Optimization
  • Computational and Systems Biology
  • Department of Aeronautics and Astronautics
  • Department of Architecture
  • Department of Biological Engineering
  • Department of Biology
  • Department of Brain and Cognitive Sciences
  • Department of Chemical Engineering
  • Department of Chemistry
  • Department of Civil and Environmental Engineering
  • Department of Earth, Atmospheric, and Planetary Sciences
  • Department of Economics
  • Department of Electrical Engineering and Computer Sciences
  • Department of Humanities
  • Department of Linguistics and Philosophy
  • Department of Materials Science and Engineering
  • Department of Mathematics
  • Department of Mechanical Engineering
  • Department of Nuclear Science and Engineering
  • Department of Ocean Engineering
  • Department of Physics
  • Department of Political Science
  • Department of Urban Studies and Planning
  • Engineering Systems Division
  • Harvard-MIT Program of Health Sciences and Technology
  • Institute for Data, Systems, and Society
  • Media Arts & Sciences
  • Operations Research Center
  • Program in Real Estate Development
  • Program in Writing and Humanistic Studies
  • Science, Technology & Society
  • Science Writing
  • Sloan School of Management
  • Supply Chain Management
  • System Design & Management
  • Technology and Policy Program

Collections in this community

Doctoral theses, graduate theses, undergraduate theses, recent submissions.

The pulse amplifier in theory and experiment

Optical studies of the nature of metallic surfaces

A controlled community for Waterbury, Connecticut

Recent Dissertation Topics


Kerstin Emily Frailey - “Practical Data Quality for Modern Data & Modern Uses, with Applications to America’s COVID-19 Data”

Dissertation Advisor: Martin Wells

Initial job placement: Co-Founder & CEO

David Kent - “Smoothness-Penalized Deconvolution: Rates of Convergence, Choice of Tuning Parameter, and Inference"

Dissertation Advisor: David Ruppert

Initial job placement: Visiting Assistant Professor - Cornell University

Yuchen Xu - “Dynamic Atomic Column Detection in Transmission Electron Microscopy Videos via Ridge Estimation”

Dissertation Advisor: David Matteson

Initial job placement: Postdoctoral Fellow - UCLA

Siyi Deng - “Optimal and Safe Semi-supervised Estimation and Inference for High-dimensional Linear Regression"

Dissertation Advisor: Yang Ning

Initial job placement: Data Scientist - TikTok

Peter (Haoxuan) Wu - “Advances in adaptive and deep Bayesian state-space models”

Initial job placement: Quantitative Researcher - DRW

Grace Deng - “Generative models and Bayesian spillover graphs for dynamic networks”

Initial job placement: Data Scientist - Research at Google

Samriddha Lahiry - “Some problems of asymptotic quantum statistical inference”

Dissertation Advisor: Michael Nussbaum

Initial job placement: Postdoctoral Fellow - Harvard University

Yaosheng Xu - “WWTA load-balancing for parallel-server systems with heterogeneous servers and multi-scale heavy traffic limits for generalized Jackson networks”

Dissertation Advisor: Jim Dai

Initial job placement: Applied Scientist - Amazon

Seth Strimas-Mackey - “Latent structure in linear prediction and corpora comparison”

Dissertation Advisor: Marten Wegkamp and Florentina Bunea

Initial job placement: Data Scientist at Google

Tao Zhang - “Topics in modern regression modeling”

Dissertation Advisor: David Ruppert and Kengo Kato

Initial job placement: Quantitative Researcher - Point72

Wentian Huang - “Nonparametric and semiparametric approaches to functional data modeling”

Initial job placement: Ernst & Young

Binh Tang - “Deep probabilistic models for sequential prediction”

Initial job placement: Amazon

Yi Su - “Off-policy evaluation and learning for interactive systems"

Dissertation Advisor: Thorsten Joachims

Initial job placement: Berkeley (postdoc)

Ruqi Zhang - “Scalable and reliable inference for probabilistic modeling”

Dissertation Advisor: Christopher De Sa

Jason Sun - “Recent developments on Matrix Completion"

Initial job placement: LinkedIn

Indrayudh Ghosal - “Model combinations and the Infinitesimal Jackknife : how to refine models with boosting and quantify uncertainty”

Dissertation Advisor: Giles Hooker

Benjamin Ryan Baer - “Contributions to fairness and transparency”

Initial job placement: Rochester (postdoc)

Megan Lynne Gelsinger - “Spatial and temporal approaches to analyzing big data”

Dissertation Advisor: David Matteson and Joe Guinness

Initial job placement: Institute for Defense Analysis

Zhengze Zhou - “Statistical inference for machine learning : feature importance, uncertainty quantification and interpretation stability”

Initial job placement: Facebook

Huijie Feng - “Estimation and inference of high-dimensional individualized threshold with binary responses”

Initial job placement: Microsoft

Xiaojie Mao - “Machine learning methods for data-driven decision making : contextual optimization, causal inference, and algorithmic fairness”

Dissertation Advisor: Nathan Kallus and Madeleine Udell

Initial job placement: Tsinghua University, China

Xin Bing - “Structured latent factor models : Identifiability, estimation, inference and prediction”

Initial job placement: Cambridge (postdoc), University of Toronto

Yang Liu - “Nonparametric regression and density estimation on a network"

Dissertation Advisor: David Ruppert and Peter Frazier

Initial job placement: Research Analyst - Cubist Systematic Strategies

Skyler Seto - “Learning from less : improving and understanding model selection in penalized machine learning problems”

Initial job placement: Machine Learning Researcher - Apple

Jiekun Feng - “Markov chain, Markov decision process, and deep reinforcement learning with applications to hospital management and real-time ride-hailing”

Initial job placement:

Wenyu Zhang - “Methods for change point detection in sequential data”

Initial job placement: Research Scientist - Institute for Infocomm Research

Liao Zhu - “The adaptive multi-factor model and the financial market"

Initial job placement: Quantitative Researcher - Two Sigma

Xiaoyun Quan - “Latent Gaussian copula model for high dimensional mixed data, and its applications”

Dissertation Advisor: James Booth and Martin Wells

Praphruetpong (Ben) Athiwaratkun - "Density representations for words and hierarchical data"

Dissertation Advisor: Andrew Wilson

Initial job placement: AI Scientist - AWS AI Labs

Yiming Sun - “High dimensional data analysis with dependency and under limited memory”

Dissertation Advisor: Sumanta Basu and Madeleine Udell

Zi Ye - “Functional single index model and Jensen effect”

Dissertation Advisor: Giles Hooker 

Initial job placement: Data & Applied Scientist - Microsoft

Hui Fen (Sarah) Tan - “Interpretable approaches to opening up black-box models”

Dissertation Advisor: Giles Hooker and Martin Wells

Daniel E. Gilbert - “Luck, fairness and Bayesian tensor completion”

Yichen Zhou - “Asymptotics and interpretability of decision trees and decision tree ensembles”

Initial job placement: Data Scientist - Google

Ze Jin - “Measuring statistical dependence and its applications in machine learning”  

Initial job placement: Research Scientist, Facebook Integrity Ranking & ML - Facebook

Xiaohan Yan - “Statistical learning for structural patterns with trees”

Dissertation Advisor: Jacob Bien

Initial job placement: Senior Data Scientist - Microsoft

Guo Yu - “High-dimensional structured regression using convex optimization”

Dan Kowal - “Bayesian methods for functional and time series data”

Dissertation Advisor: David Matteson and David Ruppert

Initial job placement: Assistant Professor, Department of Statistics, Rice University

Keegan Kang - "Data Dependent Random Projections"

David Sinclair - "Model selection results for high dimensional graphical models on binary and count data with applications to fMRI and genomics"

Liu, Yanning – "Statistical issues in the design and analysis of clinical trials"

Dissertation Advisor: Bruce Turnbull

Nicholson, William Bertil – "Tools for Modeling Sparse Vector Autoregressions"

Tupper, Laura Lindley – "Topics in classification and clustering of high-dimensional data"

Chetelat, Didier – "High-dimensional inference by unbiased risk estimation"

Initial Job Placement: Assistant Professor Universite de Montreal, Montreal, Canada

Gaynanova, Irina – "Estimation Of Sparse Low-Dimensional Linear Projections"

Dissertation Advisor: James Booth

Initial Job Placement: Assistant Professor, Texas A&M, College Station, TX

Mentch, Lucas – "Ensemble Trees and CLTs: Statistical Inference in Machine Learning"

Initial Job Placement: Assistant Professor, University of Pittsburgh, Pittsburgh, PA

Risk, Ben – "Topics in Independent Component Analysis, Likelihood Component Analysis, and Spatiotemporal Mixed Modeling"

Dissertation Advisors: David Matteson and David Ruppert

Initial Job Placement: Postdoctoral Fellow, University of North Carolina, Chapel Hill, NC

Zhao, Yue – "Contributions to the Statistical Inference for the Semiparametric Elliptical Copula Model"

Dissertation Advisor: Marten Wegkamp

Initial Job Placement: Postdoctoral Fellow, McGill University, Montreal, Canada

Chen, Maximillian Gene – "Dimension Reduction and Inferential Procedures for Images"

Dissertation Advisor: Martin Wells 

Earls, Cecelia – "Bayesian hierarchical Gaussian process models for functional data analysis"

Dissertation Advisor: Giles Hooker

Initial Job Placement: Lecturer, Cornell University, Ithaca, NY

Li, James Yi-Wei – "Tensor (Multidimensional Array) Decomposition, Regression, and Software for Statistics and Machine Learning"

Initial Job Placement: Research Scientist, Yahoo Labs

Schneider, Matthew John – "Three Papers on Time Series Forecasting and Data Privacy"

Dissertation Advisor: John Abowd

Initial Job Placement: Assistant Professor, Northwestern University, Evanston, IL

Thorbergsson, Leifur – "Experimental design for partially observed Markov decision processes"

Initial Job Placement: Data Scientist, Memorial Sloan Kettering Cancer Center, New York, NY

Wan, Muting – "Model-Based Classification with Applications to High-Dimensional Data in Bioinformatics"

Initial Job Placement: Senior Associate, 1010 Data, New York, NY

Johnson, Lynn Marie – "Topics in Linear Models: Methods for Clustered, Censored Data and Two-Stage Sampling Designs"

Dissertation Advisor: Robert Strawderman

Initial Job Placement: Statistical Consultant, Cornell, Statistical Consulting Unit, Ithaca, NY

Tecuapetla Gomez, Inder Rafael –  "Asymptotic Inference for Locally Stationary Processes"

Initial Job Placement: Postdoctoral Fellow, Georg-August-Universität Göttingen, Göttingen, Germany

Bar, Haim – "Parallel Testing, and Variable Selection -- a Mixture-Model Approach with Applications in Biostatistics" 

Dissertation Advisor: James Booth

Initial Job Placement: Postdoc, Department of Medicine, Weill Medical Center, New York, NY

Cunningham, Caitlin –  "Markov Methods for Identifying ChIP-seq Peaks" 

Initial Job Placement: Assistant Professor, Le Moyne College, Syracuse, NY

Ji, Pengsheng – "Selected Topics in Nonparametric Testing and Variable Selection for High Dimensional Data" 

Dissertation Advisor: Michael Nussbaum 

Initial Job Placement: Assistant Professor, University of Georgia, Athens, GA

Morris, Darcy Steeg – "Methods for Multivariate Longitudinal Count and Duration Models with Applications in Economics" 

Dissertation Advisor: Francesca Molinari 

Initial Job Placement: Research Mathematical Statistician, Center for Statistical Research and Methodology, U.S. Census Bureau, Washington DC

Narayanan, Rajendran – "Shrinkage Estimation for Penalised Regression, Loss Estimation and Topics on Largest Eigenvalue Distributions" 

Initial Job Placement: Visiting Scientist, Indian Statistical Institute, Kolkata, India

Xiao, Luo – "Topics in Bivariate Spline Smoothing" 

Dissertation Advisor: David Ruppert 

Initial Job Placement: Postdoc, Johns Hopkins University, Baltimore, MD

Zeber, David – "Extremal Properties of Markov Chains and the Conditional Extreme Value Model" 

Dissertation Advisor: Sidney Resnick 

Initial Job Placement: Data Analyst, Mozilla, San Francisco, CA

Clement, David – "Estimating equation methods for longitudinal and survival data" 

Dissertation Advisor: Robert Strawderman 

Initial Job Placement: Quantitative Analyst, Smartodds, London UK

Eilertson, Kirsten – "Estimation and inference of random effect models with applications to population genetics and proteomics" 

Dissertation Advisor: Carlos Bustamante 

Initial Job Placement: Biostatistician, The J. David Gladstone Institutes, San Francisco CA

Grabchak, Michael – "Tempered stable distributions: properties and extensions" 

Dissertation Advisor: Gennady Samorodnitsky 

Initial Job Placement: Assistant Professor, UNC Charlotte, Charlotte NC

Li, Yingxing – "Aspects of penalized splines" 

Initial Job Placement: Assistant Professor, The Wang Yanan Institute for Studies in Economics, Xiamen University

Lopez Oliveros, Luis – "Modeling end-user behavior in data networks" 

Dissertation Advisor: Sidney Resnick  

Initial Job Placement: Consultant, Murex North America, New York NY

Ma, Xin – "Statistical Methods for Genome Variant Calling and Population Genetic Inference from Next-Generation Sequencing Data" 

Initial Job Placement: Postdoc, Stanford University, Stanford CA

Kormaksson, Matthias – "Dynamic path analysis and model based clustering of microarray data" 

Dissertation Advisor: James Booth 

Initial Job Placement: Postdoc, Department of Public Health, Weill Cornell Medical College, New York NY

Schifano, Elizabeth – "Topics in penalized estimation" 

Initial Job Placement: Postdoc, Department of Biostatistics, Harvard University, Boston MA

Hanlon, Bret – "High-dimensional data analysis" 

Dissertation Advisor: Anand Vidyashankar 

Shaby, Benjamin – "Tools for hard Bayesian computations"

Initial Job Placement: Postdoc, SAMSI, Durham NC

Zipunnikov, Vadim – "Topics on generalized linear mixed models" 

Initial Job Placement: Postdoc, Department of Biostatistics, Johns Hopkins University, Baltimore MD

Barger, Kathryn Jo-Anne – "Objective Bayesian estimation for the number of classes in a population using Jeffreys and reference priors"

Dissertation Advisor: John Bunge 

Initial Job Placement: Pfizer Incorporated

Chan, Serena Suewei – "Robust and efficient inference for linear mixed models using skew-normal distributions" 

Initial Job Placement: Statistician, Takeda Pharmaceuticals, Deerfield IL

Lin, Haizhi – "Distressed debt prices and recovery rate estimation" 

Dissertation Advisor: Martin Wells  

Initial Job Placement: Associate, Fixed Income Department, Credit Suisse Securities (USA), New York, NY

Open Access Theses and Dissertations


Advanced research and scholarship. Theses and dissertations, free to find, free to use.


About OATD.org

OATD.org aims to be the best possible resource for finding open access graduate theses and dissertations published around the world. Metadata (information about the theses) comes from over 1100 colleges, universities, and research institutions. OATD currently indexes 7,241,108 theses and dissertations.

About OATD (our FAQ) .

Visual OATD.org

We’re happy to present several data visualizations to give an overall sense of the OATD.org collection by country of publication, language, and field of study.

You may also want to consult these sites to search for other theses:

  • Google Scholar
  • NDLTD , the Networked Digital Library of Theses and Dissertations. NDLTD provides information and a search engine for electronic theses and dissertations (ETDs), whether they are open access or not.
  • Proquest Theses and Dissertations (PQDT), a database of dissertations and theses, whether they were published electronically or in print, and mostly available for purchase. Access to PQDT may be limited; consult your local library for access information.

MS in Data Science

MSDS students choose among the many introductory graduate courses offered to students in the PhD program. These courses cover areas of computer science, optimization, linear algebra and statistics for students that have not had prior exposure to this required course work. Master’s students are fully integrated in the academic activities of the department alongside the PhD students.

Students must complete the required 5 core courses, 4 electives, and a final project to complete the program. There are also three foundational courses that students can test out. For the students who test out of foundational courses, the minimum number of courses taken in the program is 9. For the students who take all foundational courses, it is 12. These foundational courses can be taken in the summer before the program starts. Finally, students will be able to engage in a variety of opportunities across the Data Science Institute research programs and partnerships during their residency in the program.

The Curriculum

Foundational courses:

Interested students will have the opportunity to test out of each of the 3 foundation courses below. Each of the courses will be offered in the late summer and offered online before the start of the fall quarter.

  • Computational Foundations for Data Science
  • Mathematical Foundations for Data Science
  • Statistical Foundations for Data Science

Core Courses:

  • Introduction to Data Science
  • Systems for Data and Computers/Data Design
  • Data Interaction
  • Introduction to ML and AI  or Foundations of Machine Learning and AI Part I
  • Responsible Use of Data and Algorithms

Four graduate-level electives can be selected from a wide variety of courses in Data Science, Computer Science, Statistics and across the University.

The online application portal will begin accepting applications for Fall 2024 admission in early Fall 2023. To ensure full consideration, applicants should apply by the deadline. The program may accept applications after the deadline if the cohort is not filled.


Dissertation examples

Listed below are some of the best examples of research projects and dissertations from undergraduate and taught postgraduate students at the University of Leeds. We have not been able to gather examples from all schools. The module requirements for research projects may have changed since these examples were written. Refer to your module guidelines to make sure that you address all of the current assessment criteria. Some of the examples below are only available to access on campus.

  • Undergraduate examples
  • Taught Masters examples

Data Science Institute

CCMB and CNTR graduate eight Ph.D. students this year

Congratulations to CCMB and CNTR’s doctoral candidates who successfully defended their dissertations this year!

2023-2024 CCMB and CNTR PhD graduates: (from left to right) Qing Wu, Isaac Kim, Pegah Nokhiz, Katie Brown, Dilum Aluthge, Lizzie Kumar, Leah Namisa Rosenbloom (not pictured), Lucy Qin (not pictured).

During academic year 2023-2024, eight doctoral students from DSI’s two research centers (the Center for Computational Molecular Biology and the Center for Technological Responsibility, Reimagination, and Redesign) successfully defended their dissertations and are graduating with their Ph.D.s. Their dissertations cover a wide range of topics in data science, from modeling health systems to computational biology to cryptology. We are incredibly proud of their work and contributions towards improving lives in a data-driven world. Congratulations!


Dilum Aluthge, Ph.D. (CCMB)

Advisor: Indra Neil Sarkar, Ph.D., MLIS, FACMI

“Physiologic Reserve and Modular Learning Health Systems” 

Dilum is returning to the Warren Alpert Medical School to complete his M.D. degree.   https://aluthge.com/   


Katie Brown, Ph.D., MSN, RN (CCMB)  

Advisor: Elizabeth Chen, Ph.D., FACMI

“Piecing the Puzzle Together: Building a Bridge to Discovery Using Health Informatics Approaches for Autism Spectrum Disorder” 

Katie is starting as a Senior Research Analyst at Truveta, a company that uses real-world data from electronic health records, counts more than 30 health systems across the US as members, and pursues the mission "Save Lives with Data." https://www.linkedin.com/in/katie-brown-rn/


Isaac Kim, Ph.D. (CCMB)

Advisor: Dr. Jeffrey Bailey, M.D., Ph.D.

“Computational and Molecular Characterization of Genomic Variation in Burkitt Lymphoma and Associated Infectious Agents”

Isaac is entering his fourth year of medical school at Brown and applying for residency in urologic surgery this year. https://twitter.com/isaackimjr


Lizzie Kumar, Ph.D. (CNTR)

Advisor: Suresh Venkatasubramanian, Ph.D.

“Explainability, fairness, and human evaluation in machine learning: From theory to policy and back”

Lizzie will start as a Postdoctoral researcher in the Health Policy Department at Stanford University this fall. 


Pegah Nokhiz, Ph.D. (CNTR)   

“Modeling and Simulation of Artificial Societies to Study Precarity and Inequity”

Pegah will start as a Postdoctoral Fellow at Cornell this fall. https://sites.google.com/view/pnokhiz/home

Leah Namisa Rosenbloom, Ph.D. (CNTR)

Advisors: Anna Lysyanskaya M.S., Ph.D. and Seny Kamara M.S., Ph.D.

“Cryptography for Grassroots Organizing” 

Leah will start as a Postdoctoral Research Associate at Northeastern University this fall. https://namisa.art/

Lucy Qin, Ph.D. (CNTR) 

Advisor: Seny Kamara, M.S., Ph.D. 

“Cryptography in Support of Public Policy Use Cases”

Lucy was the first person to defend her dissertation from CNTR! She is currently a Postdoctoral Fellow at Georgetown University's Initiative for Technology and Society. https://lucyq.in/


Qing Wu, Ph.D. (CCMB)

Advisors: Eric M. Morrow M.D., Ph.D., and Ece (Gamsiz) Uzun M.S., Ph.D., FAMIA.

“Machine Learning Method to Identify and Subgroup Individuals with Autism using Genetic and Phenotypic Data” 

Qing is currently a senior scientist at AstraZeneca. 


School : California School of Management and Leadership

Modality(ies) : On-ground, hybrid, online

Calendar(s) : 8-week term

CIP Code : 30.7001 (approval pending)

CIP Code : 52.1301

Program Description/Overview

The Master of Science of Data Analytics (MSDA) degree enables students to learn the techniques and skills needed to work with diverse data sets, a range of analytics platforms and reporting tools, to ultimately tell an actionable data driven story, tell that story right, and tell it right now. 

Students are given the opportunity to roll up their sleeves in structured classroom environments to work directly with top enterprise solutions such as Google Analytics 360 Suite, Adobe Analytics Suite, Python, R, SQL, Hadoop, Moz, Hitwise, IBM CoreMetrics, Gephi, Power BI, Power Pivot, and so much more. Coupled to a dynamic range of statistical data modeling methods and functions, students learn the critical skills required to work with stakeholders and descriptive, predictive, prescriptive, diagnostic and logistical performance outcomes. 

In the emerging fields of Big Data, Data Science, Analytics, and Reporting, analysts are in demand across all vertical industries. The MSDA program puts these roles within the grasp of graduates, including Analytics Associates, Enterprise Analysts, CRM and Customer Journey Analysts, market analysts, data scientists, Optimization Analysts, Supply Chain Analysts, and more.

Practical Training Throughout the Program: Practical training is an integral part of the California School of Management and Leadership (CSML) programs and aligns with the academic goals and learning outcomes for each program. CSML graduate programs at Alliant International University require practical training from the first term until graduation.

Students in the ground program are required to participate in curricular practical training as part of their experiential learning throughout the program. Practical training is intended to develop professional and applied practice related skills and expertise in the student’s program through a variety of work and learning experiences which could involve supervised practical training and/or applied client projects. This is intended for students to gain in-depth, supervised practical learning experiences. This is required throughout the academic program from start on Day 1 to program completion.

Domestic students can contact the CSML Professional Development (CPD) Coordinator directly for guidance. International students must apply for authorization for Curricular Practical Training to the Designated School Official (DSO) and schedule an appointment at least two weeks prior to the beginning of the Curricular Practical Training. Please email [email protected] to schedule an appointment. Note that international students may begin curricular practical training ONLY after receiving their Form I-20 with the DSO endorsement. To be considered Curricular Practical Training, the work must be related to your major field of study. Please view CSML CPT Application Process for International Students for application information.

Emphasis/Concentration/Tracks

Concentrations

Healthcare Analytics

This concentration targets the expertise required in current healthcare analytics environments and provides a clear understanding of practical healthcare analytics decision-making. Students will be enabled to learn techniques and skills needed to work with diverse data sets, a range of analytics platforms and reporting tools to improve health care through the use of innovative and essential techniques that enable the delivery of efficient and quality healthcare analytics. Students will learn to select, prepare, analyze, interpret, evaluate, and present health data related to health system performance and effectiveness.

Informatics

Within the informatics concentration, students will focus on enterprise level information management tactics, techniques, and modeling methods for extracting, transforming, and loading data into essential reports and visualizations utilized for evaluation, synthesis and interpretation of business operations results. Students will learn to establish optimal data-driven recommendations and prescriptions from historic, current, and future data that align with stakeholder departmental end-state objectives, conversions, and goals.

Fast Track Program

Students who are in good academic standing (3.0 GPA) are eligible to participate in the Fast Track program for the DBA program. Students whose GPA is lower than 3.0 and who are interested in the Fast Track option will be interviewed by the Program Director (PD).

MSDA students who are interested in the Fast Track opportunity to doctoral programs will write an application essay which will be evaluated by a faculty committee. Those students who are approved by the committee will be allowed to take Fast Track courses.

In the Fast Track program, MSDA students can take up to 9 units of doctoral-level bridge courses from Alliant’s Doctorate of Business Administration (DBA) program. Students who complete the bridge courses with a grade of B or above can transfer these courses into the doctoral program if they enroll in it at Alliant International University upon completing their MSDA program.

The following Fast Track is available for this program:

Program Learning Outcomes/Goals

  • Demonstrate an understanding of techniques for maximizing the value of data in organizations
  • Apply critical thinking skills in the context of problem solving in the business workplace
  • Project a positive, proactive and non-judgmental attitude towards diverse cultural and international identities in interpersonal and professional interactions
  • Demonstrate competence in communicating data solutions to organizational audiences
  • Apply knowledge and skills in data science in the context of the organization
  • Be able to make ethical and socially responsible decisions for data applications in business
  • Leverage teams in the applications of data analytics and information technology

Internship, Practicum, and/or Dissertation Information

Practical Training  

Alliant is approved to offer practical training throughout the CSML program curriculum to domestic as well as international students. Practical training is defined as an approved work experience which is an integral part of an established curriculum and is directly related to the student’s major area of study. This schedule is repeated throughout the entire program. Practical training can be part-time (less than 20 hours a week) or full-time (more than 20 hours a week), paid or unpaid. International students should see guidelines from the International Office regarding details of FT and PT practical training (see Curricular Practical Training section).  

Approval of practical training sites: Program Director or Faculty Internship/Project Coordinator will have final approval, which is required each term. Detailed procedures for approval of a practical training site and the training details will be provided by the program. International students will meet the International Office and the PDSO for guidance and approval. 

Class schedule: For each course, students attend ground classes on weekdays per the published schedule for the courses. Each course duration is 8 weeks. 

Students in the ground program are required to participate in curricular practical training as part of their experiential learning throughout the program starting on Day 1.  

A student will have eight weeks (one term) to secure a practical training/internship site once they start their selected program. Students who do not have practical training/internship by the end of their first term will be placed in a project by the school’s practical training coordinator. If a student loses their practical training at any point during their program, they must notify the practical training coordinator immediately. They will have a maximum of 4 weeks to get a new practical training site or to be assigned to a CSML professional development client project. Students are required to be involved in an internship or a project throughout the program from start on Day 1 to program completion. 

In practical training courses students will be engaged in developing professional and applied practice related skills and expertise per the student’s program learning objectives. This is achieved through a variety of work and learning experiences which could involve supervised practical training and/or applied projects and/or client projects.  

CSML’s Career and Professional Development (CPD) Plan and Infrastructure: Goals

Practical training courses are part of CSML’s Career and Professional Development (CPD) plan and infrastructure. Its goals are the following: 

  • To enable students to gain their real-world experiences before graduation, CPD aims to assist them in developing and polishing their resumes and cover letters while also equipping them with application of a discipline-specific body of knowledge and encouraging continuous learning in their chosen field of study. 
  • To provide students with opportunities to participate in real-world application, consulting and research projects, including business development, professional development, applications in minority-owned businesses and humanitarian causes, while also emphasizing the importance of multicultural/international competence and promoting team-based and multidisciplinary approaches. This approach ensures that students develop a well-rounded set of skills, including cultural sensitivity and the ability to work collaboratively across diverse teams, which are essential in today’s global and interconnected workplace. 
  • To provide students with support, resources, and opportunities to develop their professional literacy and communication skills, including written and verbal communication, presentation skills, and interpersonal skills. This will help students effectively communicate their knowledge, skills, and experiences to potential employers, colleagues, and clients. 
  • To encourage students to engage in application, research, and learning / scholarship opportunities independently to enhance their knowledge and skills. CPD achieves this goal by providing students with resources, guidance, and support. 
  • To take a holistic approach that directs, equips, and empowers students to take control of their future selves by providing resources, accountability, and motivation to remove ambiguity and replace it with confidence and inspiration, while also fostering conduct, judgment, dispositions, and ethics. This approach ensures that students develop not only the skills and knowledge necessary for success but also the ethical values and personal qualities needed to make sound decisions and contribute positively to society. 
  • To offer students real opportunities, inspirational instructors, and authentic practical experiences that can make a real difference in the world, while also helping students build their resumes and set them apart in the job market. 

It is the student’s responsibility to find practical training opportunities; however, the CSML Practical Training Coordinator will support students by providing practical training opportunities.

International students should meet the International Student Office (ISSO) and the Designated School Official (DSO) for guidance and approval to start their practical training. International students must apply for authorization for Curricular Practical Training (CPT) to the DSO and schedule an appointment at least two weeks prior to the beginning of the Curricular Practical Training (CPT) or Day 1 CPT. Please email  [email protected]  to schedule an appointment. Note that international students may begin curricular practical training ONLY after receiving their Form I-20 with the DSO endorsement. To be considered for Curricular Practical Training, the work must be related to your current field of study in your program at CSML. Please view   CSML CPT Application Process for International Students    for application information. 

Credit Units

Total Credit Units: 33

Total Core Credit Units: 24

Total Elective Credit Units: N/A

Total Concentration Credit Units: 9

Degree Requirements

In addition to classroom instruction, all students in the on-ground program are required to get practical training experience in an approved setting for a minimum of 45 hours in every academic term throughout the program.

Prerequisite Courses

Applicants can submit a request for waiver to the program academic advisor. For consideration to waive the pre-requisite courses, students must satisfy one of the following requirements:

  • A 3-unit equivalent course completed at the bachelor’s level within the last 3 years in math or in statistics with a grade of B+ or better will waive the DAT50050    pre-requisite course. A 3-unit equivalent course completed at the bachelor’s level in programming (e.g., C++, .NET/C#, JAVA, R, or Python) within the last 3 years with a grade of B+ or better will waive the DAT50000    pre-requisite course.
  • In cases where the course(s) described above were completed more than 3 years ago, students can still apply for a waiver with the course syllabus from the year when they completed the course, and the program will assess the course contents to make a waiver decision.
  • Master’s degrees: Students with a master’s degree that included a course in math or statistics, completed within the previous 3 years with a grade of B or better, can waive the pre-requisite course DAT50050; those whose degree included a course in programming, completed within the previous 3 years with a grade of B or better, can waive the pre-requisite course DAT50000. In cases where the above course(s) were completed more than 3 years ago, students can still apply for a waiver with the course syllabus from the year when they completed the course, and the program will assess the course contents to make a waiver decision.

The prerequisite courses for this program are to be completed during Session 1 and 2 of Year One:

  • DAT50000 - Essentials of Informatics Using Python     (3 units)
  • DAT50050 - Basic Applied Statistics     (3 units)

Emphasis/Concentration/Track Requirements

  • HCM60100 - Healthcare Systems, Services, and Infrastructure - A Global Perspective (3 units)
  • HCA60000 - Quantitative & Qualitative Analysis Methods for Healthcare Data Analytics (3 units)
  • HCA60300 - Informatics for Patient Care, Public Health, and Epidemiology (3 units)
  • MGT60150 - Management & Marketing Models for Managerial Decision Making (3 units)
  • DAT60300 - Architectures and Methods for Data Mining (3 units)
  • IST64880 - Data Analytics and Decision Making (3 units)

Fast Track Options

Students can transfer up to 9 units to the DBA program from the following list:

  • CPD70000 - Career and Professional Development     (6 units total)
  • DAT70500 - Big Data Tools     (3 units)   

Curriculum Plan

The following curriculum plan is a sample and serves only as a general guide. Curriculum plans and course sequence are subject to variation depending on a student’s start term. Students must complete all coursework required for their program as set forth in their individual master plan of study.

8-Week Calendar

Term 1 (4 units)

  • DAT60200 - Database Design Principles and Technologies (3 units)
  • CPD70000 - Career and Professional Development (1 unit)

Term 2 (7 units)

  • Concentration Course 1 (Informatics concentration: take DAT60300 - Architectures and Methods for Data Mining; Healthcare Analytics concentration: take HCM60100 - Healthcare Systems, Services, and Infrastructure - A Global Perspective) (3 units)
  • MGT60200 - Strategy and Financial Planning in Global Contexts (3 units)

Term 3 (7 units)

  • DAT70500 - Big Data Tools (3 units)
  • Concentration Course 2 (3 units)

Term 4 (4 units)

  • IST65050 - Advanced Programming with Python (3 units)

Term 5 (4 units)

  • DAT60400 - Data Visualization (3 units)

Term 6 (4 units)

  • Concentration Course 3 (3 units)

Term 7 (3 units)

  • DAT60100 - Foundations of Data and Decision Algorithms (3 units)
M.S. Physics Thesis Talk: Genessa Benton - "Data-driven estimates for light-quark connected and strange plus light-quark disconnected hadronic g-2 window quantities"

College of Science

A love of marine biology and data analysis

Thursday, May 09, 2024 • Katherine Egan Bennett

Kelsey Beavers’ love of the ocean started at a young age. Coming from a family of avid scuba divers, she became a certified junior diver at age 11.

“It was a different world,” Beavers said. “I loved everything about the ocean.”

After graduating from high school, the Austin native moved to Fort Worth to study environmental science at Texas Christian University. One of her professors at TCU knew University of Texas at Arlington biology Professor Laura Mydlarz and encouraged Beavers to continue her studies in Arlington.

“Kelsey came to UTA to pursue a Ph.D. and study coral disease, and she quickly got involved in a large project studying stony coral tissue loss disease (SCTLD), a rapidly spreading disease that has been killing coral all along Florida’s coast and in 22 Caribbean countries,” Mydlarz said. “She has been a real asset to our team, including being the lead author on a paper we published in Nature Communications last year on the disease.”

UT Arlington biology researchers Laura Mydlarz and Kelsey Beavers

As part of her doctoral program, Beavers completed original research studying the gene expression of coral reefs affected by SCTLD. Her research involved scuba diving off the coast of the U.S. Virgin Islands to collect coral tissue samples before returning to the lab for data analysis.

“What we found was that the symbiotic algae living within coral are also affected by SCTLD,” Beavers said. “Our current hypothesis is that when algae move from reef to reef, they may be spreading the disease that has been devastating coral reefs since it first appeared in 2014.”

A large part of Beavers’ dissertation project involved crunching large sets of gene expression data extracted from the coral samples and analyzing it in the context of disease susceptibility and severity.

“The analysis part of the project was so much larger than just using a regular Mac, so I worked with the Texas Advanced Computing Center (TACC) in Austin, which is part of the UT System, using their supercomputers,” Beavers said.

Beavers enjoyed the data analysis part of her project so much that when she saw an opening at TACC for a full-time position, she jumped at the chance. She’s now working there part-time until graduation, when she plans to move to Austin for her new role.

“I’m really looking forward to my new position, as I’ll be able to work on research projects other than my own,” she said. “It will be interesting to be a specialist in data analysis and help other scientists use the TACC supercomputers to solve complex questions.”

As part of the job, she’ll travel to other UT System campuses to educate researchers on how they can use the tools available at TACC.

The UTA College of Science, a Carnegie R1 research institution, is preparing the next generation of leaders in science through innovative education and hands-on research and offers programs in Biology, Chemistry & Biochemistry, Data Science, Earth & Environmental Sciences, Health Professions, Mathematics, Physics and Psychology. To support educational and research efforts, visit the giving page, or if you're a prospective student interested in beginning your #MaverickScience journey, visit our future students page.

Gender disparities in research fields in Russia: dissertation authors and their mentors

  • Published: 10 May 2024

  • Elena Chechik (ORCID: orcid.org/0000-0002-2277-0490)

This study examines gender disparities in research fields as measured by scientific output in dissertations at two levels within the Russian academic system: PhD and the more advanced Doctor of Science (DS). The data for this study were extracted from over 250,000 dissertations spanning from 2005 to 2016. The chosen data source offers several advantages over bibliometric data for the purpose of this study: (a) it provides representative data, including the Social Sciences and Humanities and STEM fields; (b) gender disambiguation is straightforward due to the gendered nature of Russian patronyms; (c) it allows for easier attribution of text, as there is no need to attribute it to the first author in multi-authored publications; (d) it provides insights into the career stage by differentiating between PhD and DS authors, as well as between PhD and DS mentors. The results of this study reveal a gender imbalance across research fields and academic career levels. Furthermore, our observations indicate that male mentors more frequently collaborate with male authors, and female mentors with female authors, exceeding what would be expected by random chance. This gender homophily is evident in most research fields. While the results largely confirm findings from studies conducted in other countries, the four advantages mentioned above make this study an essential extension of studies based on bibliometric data. This research sheds light on the gender structure within research fields in Russia and invites nuanced discussions about achieving gender equality in the context of identified gender homophily.
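The comparison with "random chance" in the abstract can be illustrated with a small sketch. It assumes a simple baseline in which mentor and author genders are paired independently, which is not necessarily the procedure used in the study. The function name `same_gender_shares` and all counts are hypothetical and chosen only for illustration.

```python
def same_gender_shares(counts):
    """Return (observed, expected) shares of same-gender mentor-author pairs.

    counts maps (mentor_gender, author_gender) to a number of dissertations,
    with genders coded as "F" or "M". The expected share assumes mentor and
    author genders are independent (an illustrative baseline only).
    """
    total = sum(counts.values())
    observed = (counts[("F", "F")] + counts[("M", "M")]) / total

    # Marginal shares of female mentors and female authors.
    p_mentor_f = (counts[("F", "F")] + counts[("F", "M")]) / total
    p_author_f = (counts[("F", "F")] + counts[("M", "F")]) / total

    # Probability that a randomly formed pair is F-F or M-M.
    expected = p_mentor_f * p_author_f + (1 - p_mentor_f) * (1 - p_author_f)
    return observed, expected


# Hypothetical counts, NOT taken from the study.
example_counts = {
    ("F", "F"): 300,
    ("F", "M"): 100,
    ("M", "F"): 200,
    ("M", "M"): 400,
}

observed, expected = same_gender_shares(example_counts)
print(f"observed same-gender share: {observed:.2f}")
print(f"expected under random pairing: {expected:.2f}")
```

An observed share above the expected share is the pattern the abstract describes as gender homophily. A real analysis would presumably compute this within each research field and test whether the gap is statistically significant.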

Abalkina, A., & Libman, A. (2020). The real costs of plagiarism: Russian governors, plagiarized PhD theses, and infrastructure in Russian regions. Scientometrics, 125 (3), 2793–2820. https://doi.org/10.1007/s11192-020-03716-x

Article   Google Scholar  

AlShebli, B., Makovi, K., & Rahwan, T. (2020). Retracted article: The association between early career informal mentorship in academic collaborations and junior author performance. Nature Communications, 11 (1), 5855. https://doi.org/10.1038/s41467-020-19723-8

Bu, Y., Li, H., Wei, C., Liu, M., & Li, J. (2022). On the relationship between supervisor-supervisee gender difference and scientific impact of doctoral dissertations: Evidence from humanities and social sciences in China. Journal of Information Science, 48 (4), 492–502.

Campbell, L. G., Mehtani, S., Dozier, M. E., & Rinehart, J. (2013). Gender-heterogeneous working groups produce higher quality science. PLoS ONE, 8 (10), e79147. https://doi.org/10.1371/journal.pone.0079147

Canaan, S., & Mouganie, P. (2023). The impact of advisor gender on female students’ STEM Enrollment and persistence. Journal of Human Resources, 58 (2), 593–632. https://doi.org/10.3368/jhr.58.4.0320-10796R2

Cardoso, S., Carvalho, T., Rosa, M. J., & Soares, D. (2022). Gender (im)balance in the pool of graduate talent: The portuguese case. Tertiary Education and Management, 28 (2), 155–170. https://doi.org/10.1007/s11233-022-09093-9

Carrell, S. E., Page, M. E., & West, J. E. (2010). Sex and science: How professor gender perpetuates the gender gap*. The Quarterly Journal of Economics, 125 (3), 1101–1144. https://doi.org/10.1162/qjec.2010.125.3.1101

Cech, E. A., & Blair-Loy, M. (2019). The changing career trajectories of new parents in STEM. Proceedings of the National Academy of Sciences, 116 (10), 4182–4187. https://doi.org/10.1073/pnas.1810862116

Clauset, A., Arbesman, S., & Larremore, D. B. (2015). Systematic inequality and hierarchy in faculty hiring networks. Science Advances, 1 (1), e1400005. https://doi.org/10.1126/sciadv.1400005

Duarte-Martínez, V., Cobo, M. J., & López-Herrera, A. G. (2022). Uncovering patterns in the supervision of Spanish theses: A comprehensive analysis. Journal of Informetrics, 16 (3), 101319. https://doi.org/10.1016/j.joi.2022.101319

Gallen, Y., & Wasserman, M. (2023). Does information affect homophily? Journal of Public Economics, 222 , 104876.

Gaule, P., & Piacentini, M. (2018). An advisor like me? Advisor gender and post-graduate careers in science. Research Policy, 47 (4), 805–813. https://doi.org/10.1016/j.respol.2018.02.011

Ghiasi, G., Larivière, V., & Sugimoto, C. R. (2015). On the compliance of women engineers with a gendered scientific system. PLoS ONE, 10 (12), e0145931. https://doi.org/10.1371/journal.pone.0145931

Guba, K., Sokolov, M., & Sokolova, N. (2020). The Dynamics of Dissertation Industry in Russia, 2005–2015. Did new institutional templates change academic behavior. Ekonomicheskaya Sotsiologiya = Journal of Economic Sociology, 21 (3), 13–46.

Google Scholar  

Haake, U. (2011). Contradictory values in doctoral education: A study of gender composition in disciplines in Swedish academia. Higher Education, 62 (1), 113–127. https://doi.org/10.1007/s10734-010-9369-8

Hanson, S. L., Sykes, M., & Pena, L. B. (2017). Gender equity in science: The global context. International Journal of Social Science Studies, 6 (1), 33. https://doi.org/10.11114/ijsss.v6i1.2704

Hilmer, C., & Hilmer, M. (2007). Women helping women, men helping women? Same-gender mentoring, initial job placements, and early career publishing success for economics PhDs. American Economic Review, 97 (2), 422–426. https://doi.org/10.1257/aer.97.2.422

Holman, L., & Morandin, C. (2019). Researchers collaborate with same-gendered colleagues more often than expected across the life sciences. PLoS ONE, 14 (4), e0216128. https://doi.org/10.1371/journal.pone.0216128

Holman, L., Stuart-Fox, D., & Hauser, C. E. (2018). The gender gap in science: How long until women are equally represented? PLOS Biology, 16 (4), e2004956. https://doi.org/10.1371/journal.pbio.2004956

Huang, J., Gates, A. J., Sinatra, R., & Barabási, A.-L. (2020). Historical comparison of gender inequality in scientific careers across countries and disciplines. Proceedings of the National Academy of Sciences, 117 (9), 4609–4616. https://doi.org/10.1073/pnas.1914221117

Huisman, J., Smolentseva, A., & Froumin, I. (2018). 25 years of transformations of higher education systems in post-soviet countries: Reform and continuity . Springer International Publishing.

Book   Google Scholar  

Krasnyak, O. (2017). Gender representation in russian academic journals. The Journal of Social Policy Studies, 15 (4), 617–628. https://doi.org/10.17323/727-0634-2017-15-4-617-628

Larivière, V., Ni, C., Gingras, Y., Cronin, B., & Sugimoto, C. R. (2013). Bibliometrics: Global gender disparities in science. Nature, 504 (7479), 211–213. https://doi.org/10.1038/504211a

Lewison, G., & Markusova, V. (2011). Female researchers in Russia: Have they become more visible? Scientometrics, 89 (1), 139–152. https://doi.org/10.1007/s11192-011-0435-5

Makarova, E., Aeschlimann, B., & Herzog, W. (2019). The gender gap in STEM fields: The impact of the gender stereotype of math and science on secondary students’ career aspirations. Frontiers in Education, 4 , 60. https://doi.org/10.3389/feduc.2019.00060

Martín-Martín, A., Orduna-Malea, E., Thelwall, M., & Delgado López-Cózar, E. (2018). Google scholar, web of science, and scopus: A systematic comparison of citations in 252 subject categories. Journal of Informetrics, 12 (4), 1160–1177. https://doi.org/10.1016/j.joi.2018.09.002

Miller, D. I., Eagly, A. H., & Linn, M. C. (2015). Women’s representation in science predicts national gender-science stereotypes: Evidence from 66 nations. Journal of Educational Psychology, 107 (3), 631–644. https://doi.org/10.1037/edu0000005

Misra, J., Lundquist, J. H., & Templer, A. (2012). Gender, work time, and care responsibilities among faculty1: gender, work time, and care responsibilities among faculty. Sociological Forum, 27 (2), 300–323. https://doi.org/10.1111/j.1573-7861.2012.01319.x

Mongeon, P., & Paul-Hus, A. (2016). The journal coverage of web of science and scopus: A comparative analysis. Scientometrics, 106 (1), 213–228. https://doi.org/10.1007/s11192-015-1765-5

Morgan, A. C., Way, S. F., Hoefer, M. J. D., Larremore, D. B., Galesic, M., & Clauset, A. (2021). The unequal impact of parenthood in academia. Science Advances, 7 (9), eabd1996. https://doi.org/10.1126/sciadv.abd1996

Nakajima, K., Liu, R., Shudo, K., & Masuda, N. (2023). Quantifying gender imbalance in East Asian academia: Research career and citation practice. Journal of Informetrics, 17 (4), 101460.

Paul-Hus, A., Bouvier, R. L., Ni, C., Sugimoto, C. R., Pislyakov, V., & Larivière, V. (2015). Forty years of gender disparities in Russian science: A historical bibliometric analysis. Scientometrics, 102 (2), 1541–1553. https://doi.org/10.1007/s11192-014-1386-4

Pilkina, M., & Lovakov, A. (2022). Gender disparities in Russian academia: A bibliometric analysis. Scientometrics, 127 (6), 3577–3591. https://doi.org/10.1007/s11192-022-04383-w

Porter, C., & Serra, D. (2020). Gender differences in the choice of major: The importance of female role models. American Economic Journal: Applied Economics, 12 (3), 226–254. https://doi.org/10.1257/app.20180426

Régner, I., Thinus-Blanc, C., Netter, A., Schmader, T., & Huguet, P. (2019). Committees with implicit biases promote fewer women when they do not believe gender bias exists. Nature Human Behaviour, 3 (11), 1171–1179. https://doi.org/10.1038/s41562-019-0686-3

Sánchez-Jiménez, R., Botezan, I., Barrasa-Rodríguez, J., Suárez-Figueroa, M. C., & Blázquez-Ochando, M. (2023). Gender imbalance in doctoral education: An analysis of the Spanish university system (1977–2021). Scientometrics, 128 (4), 2577–2599. https://doi.org/10.1007/s11192-023-04648-y

Schwartz, L. P., Liénard, J. F., & David, S. V. (2022). Impact of gender on the formation and outcome of formal mentoring relationships in the life sciences. PLOS Biology, 20 (9), e3001771. https://doi.org/10.1371/journal.pbio.3001771

Seeber, M., & Horta, H. (2021). No road is long with good company. What factors affect Ph.D. student’s satisfaction with their supervisor? Higher Education Evaluation and Development, 15 (1), 2–18. https://doi.org/10.1108/HEED-10-2020-0044

Shapiro, J. R., & Williams, A. M. (2012). The role of stereotype threats in undermining girls’ and women’s performance and interest in STEM fields. Sex Roles, 66 (3), 175–183. https://doi.org/10.1007/s11199-011-0051-0

Shaw, A. K., & Stanton, D. E. (2012). Leaks in the pipeline: Separating demographic inertia from ongoing gender differences in academia. Proceedings of the Royal Society B: Biological Sciences, 279 (1743), 3736–3741. https://doi.org/10.1098/rspb.2012.0822

Sheltzer, J. M., & Smith, J. C. (2014). Elite male faculty in the life sciences employ fewer women. Proceedings of the National Academy of Sciences, 111 (28), 10107–10112. https://doi.org/10.1073/pnas.1403334111

Sterligov, I. (2017). Gender and income disparities among russian academic CEOs. HERB Issue Women in Academia, 4 (14), 12–14.

Stoet, G., & Geary, D. C. (2018). The gender-equality paradox in science, technology, engineering, and mathematics education. Psychological Science, 29 (4), 581–593. https://doi.org/10.1177/0956797617741719

Sugimoto, C. R., & Larivière, V. (2023). Equity for women in science: Dismantling systemic barriers to advancement . Harvard University Press.

Thelwall, M., & Mas-Bleda, A. (2020). A gender equality paradox in academic publishing: Countries with a higher proportion of female first-authored journal articles have larger first-author gender disparities between fields. Quantitative Science Studies, 1 (3), 1260–1282.

UNESCO Institute for Statistics. (n.d.). Percentage of teachers in tertiary education who are female (%). Data retrieved April 25, 2024, from http://data.uis.unesco.org/index.aspx?queryid=3801#

Van Den Besselaar, P., & Sandström, U. (2016). Gender differences in research performance and its impact on careers: A longitudinal case study. Scientometrics, 106 (1), 143–162. https://doi.org/10.1007/s11192-015-1775-3

Villarroya, A., Barrios, M., Borrego, A., & Frías, A. (2008). PhD theses in Spain: A gender study covering the years 1990–2004. Scientometrics, 77 (3), 469–483. https://doi.org/10.1007/s11192-007-1965-8

Witteman, H. O., Hendricks, M., Straus, S., & Tannenbaum, C. (2019). Are gender gaps due to evaluations of the applicant or the science? A natural experiment at a national funding agency. The Lancet, 393 (10171), 531–540. https://doi.org/10.1016/S0140-6736(18)32611-4

Zheng, X., Yuan, H., & Ni, C. (2022). How parenthood contributes to gender gaps in academia. eLife, 11 , e78909. https://doi.org/10.7554/eLife.78909

This study was funded by the Russian Science Foundation, grant #21-78-10102.

Author information

Authors and affiliations.

Center for Institutional Analysis of Science and Education, European University at St. Petersburg, 191187 Shpalernaya 1, St. Petersburg, Russia

Elena Chechik

Corresponding author

Correspondence to Elena Chechik.

Ethics declarations

Conflict of interest.

The author has no relevant financial or non-financial interests to disclose.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 390 KB)

Rights and permissions.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Chechik, E. Gender disparities in research fields in Russia: dissertation authors and their mentors. Scientometrics (2024). https://doi.org/10.1007/s11192-024-05018-y

Received: 02 September 2023

Accepted: 05 April 2024

Published: 10 May 2024

DOI: https://doi.org/10.1007/s11192-024-05018-y

  • Dissertation
  • Research fields

COMMENTS

  1. Computational and Data Sciences (PhD) Dissertations

    Computational and Data Sciences (PhD) Dissertations. Below is a selection of dissertations from the Doctor of Philosophy in Computational and Data Sciences program in Schmid College that have been included in Chapman University Digital Commons. Additional dissertations from years prior to 2019 are available through the Leatherby Libraries ...

  2. How to write a great data science thesis

    They will stress the importance of structure, substance and style. They will urge you to write down your methodology and results first, then progress to the literature review, introduction and conclusions and to write the summary or abstract last. To write clearly and directly with the reader's expectations always in mind.

  3. 10 Compelling Machine Learning Ph.D. Dissertations for 2020

    This dissertation explores three topics related to random forests: tree aggregation, variable importance, and robustness. 10. Climate Data Computing: Optimal Interpolation, Averaging, Visualization and Delivery. This dissertation solves two important problems in the modern analysis of big climate data.

  4. Doctor of Data Science and Analytics Dissertations

    The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus of application and research - where students will engage in real world business problems, which will inform and guide their research interests. We launched the first formal PhD program in Data Science in 2015.

  5. PhD Dissertations

    PhD Dissertations [All are .pdf files]. Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There, Benjamin Eysenbach, 2023. Data-driven Decisions - An Anomaly Detection Perspective, Shubhranshu Shekhar, 2023. Methods and Applications of Explainable Machine Learning, Joon Sik Kim, 2023. Applied Mathematics of the Future, Kin G. Olivares, 2023.

  6. PhD in Data Science

    PhD in Analytics and Data Science. Students pursuing a PhD in analytics and data science at Kennesaw State University must complete 78 credit hours: 48 course hours and 6 electives (spread over 4 years of study), a minimum 12 credit hours for dissertation research, and a minimum 12 credit-hour internship.

  7. 17 Compelling Machine Learning Ph.D. Dissertations

    This dissertation revisits and makes progress on some old but challenging problems concerning least squares estimation, the work-horse of supervised machine learning. Two major problems are addressed: (i) least squares estimation with heavy-tailed errors, and (ii) least squares estimation in non-Donsker classes.

  8. Getting a PhD in Data Science: What You Need to Know

    Typically, the entry-level degree to get a data science position is a bachelor's degree, meaning that even just an undergraduate degree could help you land a job that earns a higher than average salary. Nonetheless, a PhD will likely prepare you for more advanced positions that could offer higher pay than less specialized roles.

  9. Thesis Option

    Data Science master's students can choose to satisfy the research experience requirement by selecting the thesis option. Students will spend the majority of their second year working on a substantial data science project that culminates in the submission and oral defense of a master's thesis. While all thesis projects must be related to data science, students are given leeway in finding a ...

  10. Five Tips For Writing A Great Data Science Thesis

    Although educational programs, conventions and thesis requirements vary wildly, I hope to offer some common guidelines for any student currently working on a Data Science thesis. The article offers five guidance points, but may effectively be summarized in a single line: "Write for your reader, not for yourself."

  11. Thesis/Capstone for Master's in Data Science

    Data Science; Capstone and Thesis Overview; Capstone and Thesis Overview. Capstone and thesis are similar in that they both represent a culminating, scholarly effort of high quality. Both should clearly state a problem or issue to be addressed. Both will allow students to complete a larger project and produce a product or publication that can ...

  12. Top 10 Essential Data Science Topics to Real-World Application From the

    1. Introduction. Statistics and data science are more popular than ever in this era of data explosion and technological advances. Decades ago, John Tukey (Brillinger, 2014) said, "The best thing about being a statistician is that you get to play in everyone's backyard." More recently, Xiao-Li Meng (2009) said, "We no longer simply enjoy the privilege of playing in or cleaning up everyone ...

  13. Ten Research Challenge Areas in Data Science

    Abstract. To drive progress in the field of data science, we propose 10 challenge areas for the research community to pursue. Since data science is broad, with methods drawing from computer science, statistics, and other disciplines, and with applications appearing in all sectors, these challenge areas speak to the breadth of issues spanning science, technology, and society.

  14. 2021

    Wentian Huang - "Nonparametric and semiparametric approaches to functional data modeling"; Dissertation Advisor: David Ruppert; Initial job placement: Ernst & Young. Binh Tang - "Deep probabilistic models for sequential prediction"; Dissertation Advisor: David Matteson; Initial job placement: Amazon. Yi Su - "Off-policy evaluation and learning for interactive systems"; Dissertation Advisor ...

  15. 2019

    Praphruetpong (Ben) Athiwaratkun - "Density representations for words and hierarchical data"; Dissertation Advisor: Andrew Wilson; Initial job placement: AI Scientist - AWS AI Labs. Yiming Sun - "High dimensional data analysis with dependency and under limited memory"; Dissertation Advisor: Sumanta Basu and Madeleine Udell; Initial job placement: Applied Scientist - Amazon. Zi Ye - "Functional ...

  16. PhD in Data Science and Analytics

    The Ph.D. in Data Science and Analytics requires 78 total credit hours spread over four years of study. Example Program of Study: Year 2. Students take up to 9 credit hours of 6000- or 7000-level courses in DS, STAT, or CS with permission of the program director.

  17. data science Latest Research Papers

    Assessing the effects of fuel energy consumption, foreign direct investment and GDP on CO2 emission: New data science evidence from Europe & Central Asia. Fuel, 2022, Vol. 314, p. 123098. DOI: 10.1016/j.fuel.2021.123098. Author(s): Muhammad Mohsin, Sobia Naseem.

  18. 10 Compelling Machine Learning Dissertations from Ph.D. Students

    Ph.D. candidates are highly motivated to choose research topics that establish new and creative paths toward discovery in their field of study. In this article, I present 10 compelling machine learning dissertations that I found interesting in terms of my own areas of pursuit. I hope you'll find several of them that match your own interests.

  19. MIT Theses

    MIT's DSpace contains more than 58,000 theses completed at MIT dating as far back as the mid 1800's. Theses in this collection have been scanned by the MIT Libraries or submitted in electronic format by thesis authors. Since 2004 all new Masters and Ph.D. theses are scanned and added to this collection after degrees are awarded.

  20. Dissertations

    Indexes two-million dissertations from over one-thousand institutions, with citations from 1861-1980 and abstracts from 1980 to present. Includes the full text of most post-1996 CUNY dissertations and many post-1996 dissertations from other institutions, as well as thousands of earlier ones.

  21. Recent Dissertation Topics

    This list of recent dissertation topics shows the range of research areas that our students are working on.

  22. OATD

    You may also want to consult these sites to search for other theses: Google Scholar; NDLTD, the Networked Digital Library of Theses and Dissertations. NDLTD provides information and a search engine for electronic theses and dissertations (ETDs), whether they are open access or not. Proquest Theses and Dissertations (PQDT), a database of dissertations and theses, whether they were published ...

  23. MS in Data Science

    The Master's in Data Science (MSDS) has been developed for students interested in pursuing a research career in data science with courses taught by faculty from the departments of statistics, computer science, and other departments across the university. MSDS students choose among the many introductory graduate courses offered to students in ...

  24. Dissertation examples

    Dissertation examples. Listed below are some of the best examples of research projects and dissertations from undergraduate and taught postgraduate students at the University of Leeds. We have not been able to gather examples from all schools. The module requirements for research projects may have changed since these examples were written.

  25. CCMB and CNTR graduate eight Ph.D. students this year

    Their dissertations cover a wide range of topics in data science, from modeling health systems to computational biology to cryptology. We are incredibly proud of their work and contributions towards improving lives in a data-driven world. Congratulations! Dilum Aluthge, Ph.D. Advisor: Indra Neil Sarkar, Ph.D., MLIS, FACMI

  26. Program: Master of Science in Data Analytics

    The Master of Science of Data Analytics (MSDA) degree enables students to learn the techniques and skills needed to work with diverse data sets, a range of analytics platforms and reporting tools, to ultimately tell an actionable data driven story, tell that story right, and tell it right now. ... Internship, Practicum, and/or Dissertation ...

  27. OMICS Technologies and Data Science in Biomedicine*

    Unique to Austria. The English-language master degree programme in OMICS Technologies and Data Science in Biomedicine spans 4 semesters. The programme offers a comprehensive approach to biotechnological research. It aims to prepare you for versatile careers in interdisciplinary and multiprofessional contexts in industry, research and healthcare.

  28. M.S. Physics Thesis Talk: Genessa Benton

    M.S. Physics Thesis Talk: Genessa Benton - "Data-driven estimates for light-quark connected and strange plus light-quark disconnected hadronic g-2 window quantities" Thursday, May 16, 2024 Event Time 12:30 p.m. - 01:30 p.m. PT

  29. A love of marine biology and data analysis

    The UTA College of Science, a Carnegie R1 research institution, is preparing the next generation of leaders in science through innovative education and hands-on research and offers programs in Biology, Chemistry & Biochemistry, Data Science, Earth & Environmental Sciences, Health Professions, Mathematics, Physics and Psychology.

  30. Gender disparities in research fields in Russia: dissertation authors

    This study examines gender disparities in research fields as measured by scientific output in dissertations at two levels within the Russian academic system: PhD and the more advanced Doctor of Science (DS). The data for this study were extracted from over 250,000 dissertations spanning from 2005 to 2016. The chosen data source offers several advantages over bibliometric data for the purpose ...