data science Recently Published Documents

Assessing the effects of fuel energy consumption, foreign direct investment and GDP on CO2 emission: New data science evidence from Europe & Central Asia

Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks

Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code, and neglect creating or updating their documentation during quick iterations. Inspired by human documentation practices learned from 80 highly-voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system to explore how human-centered AI systems can support human data scientists in the machine learning code documentation scenario. Themisto facilitates the creation of documentation via three approaches: a deep-learning-based approach to generate documentation for source code, a query-based approach to retrieve online API documentation for source code, and a user prompt approach to nudge users to write documentation. We evaluated Themisto in a within-subjects experiment with 24 data science practitioners, and found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code they would have ignored, and improved participants’ satisfaction with their computational notebook.
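The third approach in the abstract above, a prompt that nudges users to document code, can be illustrated with a minimal sketch. This is not Themisto's implementation: the rule used here (flag a code cell that has no comment and is not directly preceded by a markdown cell) and the function name `undocumented_code_cells` are assumptions made for illustration. The only thing taken from the notebook format is that a notebook is JSON with a `cells` list whose entries carry `cell_type` and `source`.

```python
import json


def undocumented_code_cells(notebook):
    """Return indices of non-empty code cells that have no comment line
    and are not directly preceded by a markdown cell (hypothetical rule)."""
    cells = notebook.get("cells", [])
    flagged = []
    for i, cell in enumerate(cells):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be a single string or a list of strings.
        source = "".join(cell.get("source", []))
        if not source.strip():
            continue
        prev_is_markdown = i > 0 and cells[i - 1].get("cell_type") == "markdown"
        has_comment = any(
            line.lstrip().startswith("#") for line in source.splitlines()
        )
        if not prev_is_markdown and not has_comment:
            flagged.append(i)
    return flagged


def load_notebook(path):
    """Read a .ipynb file from disk (it is plain JSON)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# A tiny in-memory notebook, made up for this example.
example_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Load the data"]},
        {"cell_type": "code", "source": ["rows = [[1, 2], [3, 4]]"]},
        {"cell_type": "code", "source": ["total = sum(len(r) for r in rows)"]},
        {"cell_type": "code", "source": ["# Count rows\n", "n = len(rows)"]},
    ]
}

for index in undocumented_code_cells(example_notebook):
    print(f"Cell {index} has no documentation; consider adding a note.")
```

A real tool would need a richer notion of "documented" than this rule; the sketch only shows where such a nudge could hook into the notebook structure.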

Data science in the business environment: Insight management for an Executive MBA

Adventures in Financial Data Science

GeCoAgent: A Conversational Agent for Empowering Genomic Data Extraction and Analysis

With the availability of reliable and low-cost DNA sequencing, human genomics is relevant to a growing number of end-users, including biologists and clinicians. Typical interactions require applying comparative data analysis to huge repositories of genomic information for building new knowledge, taking advantage of the latest findings in applied genomics for healthcare. Powerful technology for data extraction and analysis is available, but broad use of the technology is hampered by the complexity of accessing such methods and tools. This work presents GeCoAgent, a big-data service for clinicians and biologists. GeCoAgent uses a dialogic interface, animated by a chatbot, for supporting the end-users’ interaction with computational tools accompanied by multi-modal support. While the dialogue progresses, the user is accompanied in extracting the relevant data from repositories and then performing data analysis, which often requires the use of statistical methods or machine learning. Results are returned using simple representations (spreadsheets and graphics), while at the end of a session the dialogue is summarized in textual format. The innovation presented in this article is concerned with not only the delivery of a new tool but also our novel approach to conversational technologies, potentially extensible to other healthcare domains or to general data science.

Differentially Private Medical Texts Generation Using Generative Neural Networks

Technological advancements in data science have offered us affordable storage and efficient algorithms to query a large volume of data. Our health records are a significant part of this data, which is pivotal for healthcare providers and can be utilized in our well-being. The clinical note in electronic health records is one such category that collects a patient’s complete medical information during different timesteps of patient care available in the form of free text. Thus, these unstructured textual notes contain events from a patient’s admission to discharge, which can prove to be significant for future medical decisions. However, since these texts also contain sensitive information about the patient and the attending medical professionals, such notes cannot be shared publicly. This privacy issue has thwarted timely discoveries on this plethora of untapped information. Therefore, in this work, we intend to generate synthetic medical texts from a private or sanitized (de-identified) clinical text corpus and analyze their utility rigorously in different metrics and levels. Experimental results promote the applicability of our generated data as it achieves more than 80% accuracy in different pragmatic classification problems and matches (or outperforms) the original text data.

Impact on Stock Market across Covid-19 Outbreak

Abstract: This paper analyzes the impact of the pandemic on the global stock exchange. Stock listing values are determined by a variety of factors, including seasonal changes, catastrophic calamities, pandemics, fiscal year changes and many more. The paper analyzes the variation of listing prices over the worldwide outbreak of the novel coronavirus. The key reason for focusing on this outbreak was to shed light on the underlying regulation of stock exchanges. Daily closing prices of the stock indices from January 2017 to January 2022 have been used for the analysis. The main aim of the research is to analyze whether a downturn in the global economy affects the stock exchange. Keywords: Stock Exchange, Matplotlib, Streamlit, Data Science, Web scraping.

Information Resilience: the nexus of responsible and agile approaches to information use

Abstract: The appetite for effective use of information assets has been steadily rising in both public and private sector organisations. However, whether the information is used for social good or commercial gain, there is a growing recognition of the complex socio-technical challenges associated with balancing the diverse demands of regulatory compliance and data privacy, social expectations and ethical use, business process agility and value creation, and scarcity of data science talent. In this vision paper, we present a series of case studies that highlight these interconnected challenges, across a range of application areas. We use the insights from the case studies to introduce Information Resilience, as a scaffold within which the competing requirements of responsible and agile approaches to information use can be positioned. The aim of this paper is to develop and present a manifesto for Information Resilience that can serve as a reference for future research and development in relevant areas of responsible data management.

qEEG Analysis in the Diagnosis of Alzheimer's Disease: A Comparison of Functional Connectivity and Spectral Analysis

Alzheimer's disease (AD) is a brain disorder that is mainly characterized by a progressive degeneration of neurons in the brain, causing a decline in cognitive abilities and difficulties in engaging in day-to-day activities. This study compares an FFT-based spectral analysis against a functional connectivity analysis based on phase synchronization, for finding known differences between AD patients and Healthy Control (HC) subjects. Both of these quantitative analysis methods were applied on a dataset comprising bipolar EEG montage values from 20 diagnosed AD patients and 20 age-matched HC subjects. Additionally, an attempt was made to localize the identified AD-induced brain activity effects in AD patients. The obtained results showed the advantage of the functional connectivity analysis method compared to a simple spectral analysis. Specifically, while spectral analysis could not find any significant differences between the AD and HC groups, the functional connectivity analysis showed statistically higher synchronization levels in the AD group in the lower frequency bands (delta and theta), suggesting that the AD patients' brains are in a phase-locked state. Further comparison of functional connectivity between the homotopic regions confirmed that the traits of AD were localized in the centro-parietal and centro-temporal areas in the theta frequency band (4-8 Hz). The contribution of this study is that it applies a neural metric for Alzheimer's detection from a data science perspective rather than from a neuroscience one. The study shows that the combination of bipolar derivations with phase synchronization yields similar results to comparable studies employing alternative analysis methods.
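To make the idea of phase synchronization concrete, here is a minimal sketch of the phase locking value (PLV), one common synchronization index. The abstract does not say which index the study uses, so PLV is an assumption, as are the synthetic 6 Hz and 7 Hz test signals and the 128 Hz sampling rate. The sketch uses only the Python standard library, with a naive DFT that is adequate for short examples, and it skips the band-pass filtering a real EEG pipeline would apply first.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a naive O(n^2) DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    return [
        sum(
            weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)
        ) / n
        for t in range(n)
    ]


def phase_locking_value(x, y):
    """PLV in [0, 1]: 1 means a constant phase difference, 0 means none."""
    zx, zy = analytic_signal(x), analytic_signal(y)
    unit_vectors = [
        cmath.exp(1j * (cmath.phase(a) - cmath.phase(b))) for a, b in zip(zx, zy)
    ]
    return abs(sum(unit_vectors)) / len(unit_vectors)


# Synthetic signals (hypothetical): one second sampled at 128 Hz.
fs, n = 128, 128
times = [i / fs for i in range(n)]
theta_a = [math.sin(2 * math.pi * 6 * s) for s in times]
theta_b = [math.sin(2 * math.pi * 6 * s + 0.8) for s in times]  # same frequency, fixed lag
other = [math.sin(2 * math.pi * 7 * s) for s in times]          # different frequency

plv_locked = phase_locking_value(theta_a, theta_b)
plv_drifting = phase_locking_value(theta_a, other)
print(f"locked pair: {plv_locked:.3f}, drifting pair: {plv_drifting:.3f}")
```

The two 6 Hz signals keep a fixed phase lag, so their PLV is close to 1, while the 6 Hz and 7 Hz pair drifts through a full cycle and scores close to 0.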

Big Data Analytics for Long-Term Meteorological Observations at Hanford Site

A growing number of physical objects with embedded sensors with typically high volume and frequently updated data sets has accentuated the need to develop methodologies to extract useful information from big data for supporting decision making. This study applies a suite of data analytics and core principles of data science to characterize near real-time meteorological data with a focus on extreme weather events. To highlight the applicability of this work and make it more accessible from a risk management perspective, a foundation for a software platform with an intuitive Graphical User Interface (GUI) was developed to access and analyze data from a decommissioned nuclear production complex operated by the U.S. Department of Energy (DOE, Richland, USA). Exploratory data analysis (EDA), involving classical non-parametric statistics, and machine learning (ML) techniques, were used to develop statistical summaries and learn characteristic features of key weather patterns and signatures. The new approach and GUI provide key insights into using big data and ML to assist site operation related to safety management strategies for extreme weather events. Specifically, this work offers a practical guide to analyzing long-term meteorological data and highlights the integration of ML and classical statistics to applied risk and decision science.
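As an illustration of the classical non-parametric statistics mentioned above, the following sketch summarizes a series with its median and interquartile range and flags values above an empirical percentile as candidate extremes. The 95th-percentile threshold, the function name `summarize_and_flag`, and the wind-gust readings are all assumptions for illustration; the abstract does not describe the study's actual thresholds or variables.

```python
import statistics


def summarize_and_flag(values, upper_pct=95):
    """Median, IQR, and values above the empirical `upper_pct` percentile."""
    # quantiles(..., n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    q1, median, q3 = cuts[24], cuts[49], cuts[74]
    threshold = cuts[upper_pct - 1]
    return {
        "median": median,
        "iqr": q3 - q1,
        "threshold": threshold,
        "extremes": [v for v in values if v > threshold],
    }


# Hypothetical daily peak wind gusts in m/s, made up for this example.
gusts = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9, 21.3, 5.1, 4.4, 6.0, 5.7, 4.6]
summary = summarize_and_flag(gusts)
print(summary["median"], summary["iqr"], summary["extremes"])
```

Rank-based summaries like these make no distributional assumption, which is why they are a common first step before fitting any model to long observational records.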

research papers on data science pdf

Data Science Journal

International Journal of Data Science and Analytics

  • Focuses on fundamental and applied research outcomes in data and analytics theories, technologies and applications.
  • Promotes new scientific and technological approaches for strategic value creation in data-rich applications.
  • Encourages transdisciplinary and cross-domain collaborations.
  • Strives to bring together researchers, industry practitioners, and potential users of data science and analytics.
  • Addresses challenges ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization.

Latest issue

Volume 17, Issue 3

Latest articles

K-trickle: performance evaluation and impact on quality of service in resource-constrained networks

  • P. Arivubrakan
  • G. R. Kanagachidambaresan

Stopping fake news: Who should be banned?

  • Pablo Ignacio Fierens
  • Leandro Chaves Rêgo

An efficient machine learning approach for extracting eSports players’ distinguishing features and classifying their skill levels using symbolic transfer entropy and consensus nested cross-validation

  • Amin Noroozi
  • Mohammad S. Hasan
  • Ying-Ying Law

Alternative feature selection with user control

  • Klemens Böhm

Forecasting implied volatilities of currency options with machine learning techniques and econometrics models

  • Asbjørn Olsen
  • Gard Djupskås
  • Morten Risstad

Journal updates

CfP: Theoretical and Practical Data Science and Analytics

Submission Deadline: 15 April 2024

Guest Editor: Fragkiskos Malliaros

CfP: Innovative Hardware and Architectures for Ubiquitous Data Science

Submission Deadline: 10 September 2023

Guest Editors: Dr. Faheem Khan, Dr. Umme Laila, Dr. Muhammad Adnan Khan.

CfP: CCF BigData conference Journal Track on ‘Data Science in China’

CfP: Learning from Temporal Data

Submission Deadline: 17 November 2023

Guest Editors: João Mendes-Moreira, Joydeep Chandra, Albert Bifet

Journal information

  • EI Compendex
  • Emerging Sources Citation Index
  • Google Scholar
  • Japanese Science and Technology Agency (JST)
  • OCLC WorldCat Discovery Service
  • TD Net Discovery Service
  • UGC-CARE List (India)

© Springer Nature Switzerland AG

Data Science and Artificial Intelligence

Computer Science > Computation and Language

Title: ReALM: Reference Resolution As Language Modeling

Abstract: Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, their use in reference resolution, particularly for non-conversational entities, remains underutilized. This paper demonstrates how LLMs can be used to create an extremely effective system to resolve references of various types, by showing how reference resolution can be converted into a language modeling problem, despite involving forms of entities like those on screen that are not traditionally conducive to being reduced to a text-only modality. We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for on-screen references. We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it.
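One way to picture "converting reference resolution into a language modeling problem" is to serialize the candidate entities as text and place them in a prompt next to the user's request. The sketch below is purely illustrative: the abstract does not describe the paper's encoding, so the `ScreenEntity` fields, the top-to-bottom ordering, and the prompt wording are all assumptions, and no model is called.

```python
from dataclasses import dataclass


@dataclass
class ScreenEntity:
    """Hypothetical on-screen entity: its text, a type label, and a position."""
    text: str
    kind: str
    top: float
    left: float


def build_prompt(entities, user_request):
    """Serialize entities in reading order and append the user's request."""
    ordered = sorted(entities, key=lambda e: (e.top, e.left))
    lines = [f"{i}. [{e.kind}] {e.text}" for i, e in enumerate(ordered, start=1)]
    return (
        "Entities on screen:\n"
        + "\n".join(lines)
        + f"\nUser request: {user_request}\n"
        + "Which entity numbers does the request refer to?"
    )


# Made-up screen contents for illustration.
entities = [
    ScreenEntity("Call 555-0100", "phone", top=200, left=10),
    ScreenEntity("Pharmacy", "business", top=100, left=10),
]
prompt = build_prompt(entities, "call the bottom one")
print(prompt)
```

With the entities flattened into numbered text lines, the resolution step reduces to asking a language model to output entity numbers, which is the general shape of the reformulation the abstract describes.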

6 facts about Americans and TikTok

Rising numbers of Americans say Jews and Muslims face a lot of discrimination

What’s it like to be a teacher in America today?

77% of public K-12 teachers say their job is frequently stressful, and 52% would not advise a young person starting out today to become a teacher.

  • In their own words: What Public K-12 Teachers Want Americans To Know About Teaching

About half of Americans say public K-12 education is going in the wrong direction

The hardships and dreams of Asian Americans living in poverty

Most Americans favor legalizing marijuana for medical, recreational use

Latest Publications

How common is religious fasting in the United States?

In the United States, 21% of adults overall say they fast for certain periods during holy times.

A majority of those who say it’s headed in the wrong direction say a major reason is that schools are not spending enough time on core academic subjects.

What Public K-12 Teachers Want Americans To Know About Teaching

Many public K-12 teachers say people should know that teaching is a hard job, and that teachers care about students and deserve respect.

Public K-12 teachers express low job satisfaction and few are optimistic about the future of U.S. education.

62% of U.S. adults under 30 say they use TikTok, compared with 39% of those ages 30 to 49, 24% of those 50 to 64, and 10% of those 65 and older.

Most Popular

Politics & Policy

Americans’ top policy priority for 2024: Strengthening the economy

Growing shares of Republicans rate immigration and terrorism as top priorities for the president and Congress this year.

Majorities of adults see decline of union membership as bad for the U.S. and working people

How Republicans view climate change and energy issues

How Americans view the situation at the U.S.-Mexico border, its causes and consequences

From businesses and banks to colleges and churches: Americans’ views of U.S. institutions

How people in 24 countries think democracy can improve

An audio tour through America’s top-ranked podcasts

Tuning out: Americans on the edge of politics

Do you tip more or less often than the average American?

International Affairs

Many in East Asia say men and women make equally good leaders, despite few female heads of government

When Taiwanese President Tsai Ing-wen’s term ends in May, only one woman will serve as head of government anywhere in Asia, excluding the Pacific Islands.

What Can Improve Democracy?

Amid growing discontent with the state of democracy globally, we asked over 30,000 people what changes would make their democracy work better.

How Americans view the conflicts between Russia and Ukraine, Israel and Hamas, and China and Taiwan

74% of Americans view the war between Russia and Ukraine as important to U.S. national interests – with 43% describing it as very important.

8 charts on technology use around the world

In most countries surveyed, around nine-in-ten or more adults are online. In South Korea, 99% of adults use the internet.

Internet & Technology

Majorities in most countries surveyed say social media is good for democracy.

Across 27 countries surveyed, people generally see social media as more of a good thing than a bad thing for democracy.

Americans’ Social Media Use

YouTube and Facebook are by far the most used online platforms among U.S. adults. But TikTok’s user base has grown significantly in recent years: 33% of U.S. adults now say they use it, up from 21% in 2021.

How U.S. Adults Use TikTok

About half of all U.S. adults who use TikTok have never posted a video themselves. And the top 25% of U.S. adults on the site by posting volume produce 98% of all publicly accessible videos from this group. Users who have posted videos are generally more active on the platform than non-posters.

Race & Ethnicity

Latinos’ views on the migrant situation at the U.S.-Mexico border

U.S. Hispanics are less likely than other Americans to say increasing deportations or a larger wall along the border will help the situation.

U.S. Christians more likely than ‘nones’ to say situation at the border is a crisis

Majorities of White Christian groups say the large number of migrants seeking to enter at the border with Mexico is a “crisis” for the United States.

Black Americans’ Views on Success in the U.S.

While Black adults define personal and financial success in different ways, most see these measures of success as major sources of pressure in their lives.

Among Black adults, those with higher incomes are most likely to say they are happy

Black adults in upper-income families are about twice as likely as those in lower-income families to say they are extremely or very happy.

Our Methods

Our Experts

“A record 23 million Asian Americans trace their roots to more than 20 countries … and the U.S. Asian population is projected to reach 46 million by 2060.”

Methods 101 Videos

Methods 101: Random Sampling

The first video in Pew Research Center’s Methods 101 series helps explain random sampling – a concept that lies at the heart of all probability-based survey research – and why it’s important.
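A minimal sketch of simple random sampling, the concept named above: every member of the population has the same chance of selection, so the sample share is an unbiased estimate of the population share. The population of 10,000 people with a 30% "yes" rate, the sample size of 1,000, and the function name `estimate_share` are made up for illustration and do not come from any Pew survey.

```python
import random


def estimate_share(population, sample_size, seed=None):
    """Estimate the share of 1s in `population` from a simple random sample
    drawn without replacement."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


# Hypothetical population: 1 means "yes", 0 means "no"; the true share is 0.30.
population = [1] * 3000 + [0] * 7000
estimate = estimate_share(population, sample_size=1000, seed=0)
print(f"estimated share: {estimate:.3f} (true share: 0.300)")
```

Rerunning with different seeds shows the estimates scattering around the true share, which is the sampling error that probability-based surveys quantify with a margin of error.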

Methods 101: Survey Question Wording

Methods 101: Mode Effects

Methods 101: What are nonprobability surveys?

Signature Reports

  • Race and LGBTQ issues in K-12 schools
  • Representative democracy remains a popular ideal, but people around the world are critical of how it’s working
  • Americans’ dismal views of the nation’s politics
  • Measuring religion in China
  • Diverse cultures and shared experiences shape Asian American identities
  • Parenting in America today

Editor’s Picks

  • Religious ‘nones’ in America: Who they are and what they believe
  • Among young adults without children, men are more likely than women to say they want to be parents someday
  • Fewer young men are in college, especially at 4-year schools
  • About 1 in 5 U.S. teens who’ve heard of ChatGPT have used it for schoolwork
  • Women and political leadership ahead of the 2024 election
  • #BlackLivesMatter turns 10

Immigration & Migration

  • Migrant encounters at the U.S.-Mexico border hit a record high at the end of 2023
  • What we know about unauthorized immigrants living in the U.S.
  • Latinos’ views of and experiences with the Spanish language

Social Media

  • How teens and parents approach screen time
  • 5 facts about how Americans use Facebook, two decades after its launch
  • A declining share of adults, and few teens, support a U.S. TikTok ban
  • 81% of U.S. adults – versus 46% of teens – favor parental consent for minors to use social media
  • How Americans view data privacy

About Pew Research Center

Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. Pew Research Center does not take policy positions. It is a subsidiary of The Pew Charitable Trusts.

COMMENTS

  1. data science Latest Research Papers

    Keywords: Data Science, Information Use, Regulatory Compliance, Future Research, Public And Private, Social Good, Public And Private Sector, Effective Use. Abstract: The appetite for effective use of information assets has been steadily rising in both public and private sector organisations.

  2. Data science: a game changer for science and innovation

    This paper shows data science's potential for disruptive innovation in science, industry, policy, and people's lives. We present how data science impacts science and society at large in the coming years, including ethical problems in managing human behavior data and considering the quantitative expectations of data science economic impact. We introduce concepts such as open science and e ...

  3. (PDF) Data Science: the impact of statistics

    In this paper, we substantiate our premise that statistics is one of the most important disciplines to provide tools and methods to find structure in and to give deeper insight into data, and ...

  4. Ten Research Challenge Areas in Data Science

    Ten Research Challenge Areas in Data Science. To drive progress in the field of data science, we propose 10 challenge areas for the research community to pursue. Since data science is broad, with methods drawing from computer science, statistics, and other disciplines, and with applications appearing in all sectors, these challenge areas speak ...

  5. PDF Data Science Methodologies: Current Challenges and Future Approaches

    data science research activities, along the implications of different methods for executing industry and business projects. At present, data science is a young field and conveys the impres- ... (Preprint submitted to Big Data Research - Elsevier, January 6, 2020; arXiv:2106.07287v2 [cs.LG], 14 Jan 2022)

  6. (PDF) What Is Data Science?

    science. Data Science is a body of principles and techniques for applying data analytic methods to data at scale, including volume, velocity, and variety, to accelerate the investigation of ...

  7. Data Science: the impact of statistics

    In this paper, we substantiate our premise that statistics is one of the most important disciplines to provide tools and methods to find structure in and to give deeper insight into data, and the most important discipline to analyze and quantify uncertainty. We give an overview over different proposed structures of Data Science and address the impact of statistics on such steps as data ...

  8. Data Science and Analytics: An Overview from Data-Driven Smart

    The digital world has a wealth of data, such as internet of things (IoT) data, business data, health data, mobile data, urban data, security data, and many more, in the current age of the Fourth Industrial Revolution (Industry 4.0 or 4IR). Extracting knowledge or useful insights from these data can be used for smart decision-making in various applications domains. In the area of data science ...

  9. Data Science: A Comprehensive Overview

    Refers to the theories, technologies, tools, and processes that enable an in-depth understanding and discovery of actionable insight into data. Data analytics consists of descriptive analytics, predictive analytics, and prescriptive analytics. Data science. Is the science of data. Data scientist.

  10. Data science approaches to confronting the COVID-19 pandemic: a

    1. Introduction. The use of data science methodologies in medicine and public health has been enabled by the wide availability of big data of human mobility, contact tracing, medical imaging, virology, drug screening, bioinformatics, electronic health records and scientific literature along with the ever-growing computing power [1-4]. With these advances, the huge passion of researchers and ...

  11. [2007.03606] Data Science: A Comprehensive Overview

    View PDF Abstract: The twenty-first century has ushered in the age of big data and data economy, in which data DNA, which carries important knowledge, insights and potential, has become an intrinsic constituent of all data-based organisms. An appropriate understanding of data DNA and its organisms relies on the new field of data science and its keystone, analytics.

  12. Harvard Data Science Review

    As an open access platform of the Harvard Data Science Initiative, Harvard Data Science Review (HDSR) features foundational thinking, research milestones, educational innovations, and major applications, with a primary emphasis on reproducibility, replicability, and readability. We aim to publish content that help define and shape data science as a scientifically rigorous and globally impactful ...

  13. (PDF) A Review of Artificial Intelligence Methods for Data Science and

    PDF | On Aug 1, 2018, C V Krishna and others published A Review of Artificial Intelligence Methods for Data Science and Data Analytics: Applications and Research Challenges | Find, read and cite ...

  14. Articles

    The CODATA Data Science Journal is a peer-reviewed, open access, electronic journal, publishing papers on the management, dissemination, use and reuse of research data and databases across all research domains, including science, technology, the humanities and the arts. The scope of the journal includes descriptions of data systems, their implementations and their publication, applications ...

  15. A Deep Dissertion of Data Science: Related Issues and its Applications

    Section II of this paper consists of the different review regarding data science. Section III of this paper illustrates about the complete process of data science. Section IV describes all the related research issues for data science. At the end the paper is concluded with some suggested future work regarding data science.

  16. Home

    Overview. The International Journal of Data Science and Analytics is a pioneering journal in data science and analytics, publishing original and applied research outcomes. Focuses on fundamental and applied research outcomes in data and analytics theories, technologies and applications. Promotes new scientific and technological approaches for ...

  17. PDF A Hands-On Introduction to Data Science

    His research focuses on issues of search and recommendations using data mining and machine learning. Dr. Shah received his M.S. in Computer Science from the University ... 1.3.5 Data Science, Social Science, and Computational Social Science 14 1.4 The Relationship between Data Science and Information Science 15 1.4.1 Information vs. Data 16

  18. (PDF) Data Science and Applications

    This paper investigates the significance of data science as an indispensable instrument for decision-making across multiple domains. The study examines the history, concepts, methods, and ...

  19. Data Science and Artificial Intelligence

    The articles in this special section are dedicated to the application of artificial intelligence (AI), machine learning (ML), and data analytics to address different problems of communication systems, presenting new trends, approaches, methods, frameworks, systems for efficiently managing and optimizing networks related operations. Even though AI/ML is considered a key technology for next ...

  20. Research on Data Science, Data Analytics and Big Data

    Abstract. Big Data refers to a huge volume of data of various types, i.e., structured, semi structured, and unstructured. This data is generated through various digital channels such as mobile, Internet, social media, e-commerce websites, etc. Big Data has proven to be of great use since its inception, as companies started realizing its importance for various business purposes.

  21. (PDF) Top 20 Data Science Research Topics and Areas For the 2020-2030

    The following are the hottest data science topics and areas that any aspiring data scientist should know whether they are data analysts or just business intelligence specialists who aim to ...

  22. [2403.20329] ReALM: Reference Resolution As Language Modeling

    ReALM: Reference Resolution As Language Modeling. Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the ...

  23. Tableau Research

    About. Tableau Research is an industrial research team focused on Tableau's mission of helping people see and understand data. We actively work to be a source of new and inspiring product and technology directions, generating ideas that influence, drive, or significantly change what Tableau delivers to customers.

  24. 69901 PDFs

    Data science combines the power of computer science and applications, modeling, statistics, engineering, economy and analytics. Whereas a... | Explore the latest full-text research PDFs, articles ...

  25. Pew Research Center

    About Pew Research Center Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. Pew Research Center does not take policy positions.

  26. (PDF) Data Science in Healthcare

    Abstract. Data science is an interdisciplinary field that applies numerous techniques, such as machine learning (ML), neural networks (NN) and artificial intelligence (AI), to create value, based ...