Social activity around open data and the linked publications
Categories (all cases = 752) | Min | Q1 | Median | Q3 | Max | Mean | STDV
---|---|---|---|---|---|---|---
Publication reads | 0 | 0 | 0 | 25 | 10,423 | 60.33 | 465.83
Publication citations | 0 | 0 | 0 | 0 | 106 | 0.96 | 6.18
Open data reads | 1 | 2 | 4 | 10 | 3,438 | 15.46 | 127.26
Open data citations | 0 | 0 | 0 | 0 | 6 | 0.02 | 0.27
Social activity around open data and the linked publications by researchers' identity (gender, scientific domain, region, position, reputation)
Categories (all cases = 752) | Total | Publication reads | Publication citations | Open data reads | Open data citations
---|---|---|---|---|---
Total all categories | 752 | 45,370 | 725 | 11,629 | 13
*Gender* | | | | |
Female | 143 | 5,192 | 310 | 1,439 | 1
Male | 605 | 40,178 | 415 | 10,150 | 12
NA | 4 | 0 | 0 | 40 | 0
*Scientific domain* | | | | |
Applied Sciences | 250 | 27,984 | 268 | 5,638 | 1
Formal Sciences | 77 | 3,345 | 39 | 1,019 | 12
Humanities | 33 | 1,013 | 5 | 410 | 0
Natural Sciences | 290 | 10,411 | 329 | 3,593 | 0
Social Sciences | 102 | 2,617 | 84 | 969 | 0
*Region* | | | | |
Africa | 30 | 1,672 | 0 | 204 | 0
Asiatic Region | 136 | 4,260 | 224 | 4,819 | 0
Middle-East | 39 | 799 | 52 | 503 | 0
Eastern Europe | 49 | 11,442 | 20 | 644 | 0
Western Europe | 293 | 21,646 | 259 | 2,615 | 7
Russia | 16 | 359 | 17 | 417 | 0
Northern America | 107 | 3,666 | 126 | 1,666 | 6
Latin America | 55 | 1,273 | 14 | 514 | 0
Pacific Region | 23 | 196 | 4 | 228 | 0
NA | 4 | 57 | 9 | 19 | 0
*Position* | | | | |
Student (undergraduate/PhD) | 77 | 1,431 | 0 | 675 | 0
Assistant (technical, teaching, research) | 92 | 7,290 | 88 | 4,461 | 0
Mid-position, technical (Journalist, Librarian, Technologist, Researcher practitioner) | 65 | 1,065 | 22 | 319 | 0
Mid-position, academic (Lecturer, Researcher, Professor) | 357 | 24,907 | 218 | 4,254 | 13
Leader (Coordinator, Manager, Director) | 73 | 1,686 | 75 | 937 | 0
Retired Scholar | 3 | 0 | 48 | 3 | 0
*Reputation* | | | | |
1–10 | 239 | 11,538 | 273 | 1,966 | 0
11–20 | 204 | 6,137 | 143 | 5,829 | 5
21–30 | 144 | 9,150 | 188 | 1,293 | 0
31–… | 134 | 16,861 | 117 | 2,272 | 8
NA | 31 | 1,684 | 4 | 269 | 0
Social activity around open data and the linked publications by the quality of open data
Quality of the open data shared on RG (compliance with the 4 FAIR criteria scale) | Total | Publication reads | Publication citations | Open data reads | Open data citations
---|---|---|---|---|---
Total (all cases) | 752 | 45,370 | 725 | 11,629 | 13
0 = No compliance | 562 | 25,686 | 642 | 10,201 | 9
1 = 1 FAIR criterion covered | 126 | 2,958 | 63 | 975 | 0
2 = 2 FAIR criteria covered | 28 | 412 | 6 | 190 | 0
3 = 3 FAIR criteria covered | 12 | 10,750 | 5 | 152 | 4
4 = All FAIR criteria covered | 1 | 43 | 0 | 15 | 0
NA | 23 | 5,521 | 9 | 96 | 0
Categories distribution per cluster
Categories | Cluster 1 | % of total counts (%) | Cluster 2 | % of total counts (%) | Cluster 3 | % of total counts (%) | Total | Aggregated % of counts within category (%)
---|---|---|---|---|---|---|---|---
Female | 15 | 2.16 | 101 | 14.53 | 18 | 2.59 | 134 | 19.28 |
Male | 77 | 11.08 | 432 | 62.16 | 52 | 7.48 | 561 | 80.72 |
Cluster weight over total | 92 | 13.24 | 533 | 76.69 | 70 | 10.07 | 695* | 100 |
Applied Sciences | 22 | 3.15 | 186 | 26.61 | 25 | 3.58 | 233 | 33.33 |
Formal Sciences | 8 | 1.14 | 56 | 8.01 | 8 | 1.14 | 72 | 10.30 |
Humanities | 2 | 0.29 | 24 | 3.43 | 4 | 0.57 | 30 | 4.29 |
Natural Sciences | 46 | 6.58 | 195 | 27.90 | 26 | 3.72 | 267 | 38.20 |
Social Sciences | 14 | 2.00 | 76 | 10.87 | 7 | 1.00 | 97 | 13.88 |
Cluster weight over total | 92 | 13.16 | 537 | 76.82 | 70 | 10.01 | 699 | 100 |
Africa | 5 | 0.72 | 21 | 3.02 | 3 | 0.43 | 29 | 4.17 |
Asiatic Region | 23 | 3.31 | 90 | 12.95 | 15 | 2.16 | 128 | 18.42 |
Middle-East | 3 | 0.43 | 26 | 3.74 | 7 | 1.01 | 36 | 5.18 |
Eastern EU | 6 | 0.86 | 35 | 5.04 | 4 | 0.58 | 45 | 6.47 |
Western EU | 38 | 5.47 | 213 | 30.65 | 22 | 3.17 | 273 | 39.28 |
Russia | 1 | 0.14 | 11 | 1.58 | 2 | 0.29 | 14 | 2.01 |
Northern America | 8 | 1.15 | 77 | 11.08 | 11 | 1.58 | 96 | 13.81 |
Latin America | 6 | 0.86 | 40 | 5.76 | 6 | 0.86 | 52 | 7.48 |
Pacific Region | 1 | 0.14 | 21 | 3.02 | 0 | 0.00 | 22 | 3.17 |
Cluster weight over total | 91 | 13.09 | 534 | 76.83 | 70 | 10.07 | 695 | 100 |
Student (undergraduate/PhD) | 10 | 1.75 | 49 | 8.60 | 4 | 0.70 | 63 | 11.05 |
Assistant (Technical, teaching, research) | 10 | 1.75 | 64 | 11.23 | 10 | 1.75 | 84 | 14.74 |
Mid-position, technical (Journalist, Librarian, Technologist, Researcher practitioner) | 4 | 0.70 | 23 | 4.04 | 5 | 0.88 | 32 | 5.61 |
Mid-position, academic (Lecturer, Researcher, Professor) | 48 | 8.42 | 245 | 42.98 | 35 | 6.14 | 328 | 57.54 |
Leader (Coordinator, Manager, director) | 6 | 1.05 | 48 | 8.42 | 6 | 1.05 | 60 | 10.53 |
Retired Scholar | 6 | 1.05 | 31 | 5.44 | 2 | 0.35 | 39 | 6.84 |
Cluster weight over total | 84 | 14.72 | 460 | 80.71 | 62 | 10.87 | 606 | 100 |
1–10 | 30 | 4.43 | 179 | 26.44 | 19 | 2.81 | 228 | 33.68 |
11–20 | 25 | 3.69 | 145 | 21.42 | 18 | 2.66 | 188 | 27.77 |
21–30 | 20 | 2.95 | 107 | 15.81 | 9 | 1.33 | 136 | 20.09 |
31- … | 21 | 3.10 | 83 | 12.26 | 21 | 3.10 | 125 | 18.46 |
Cluster weight over total | 96 | 14.18 | 514 | 75.92 | 67 | 9.90 | 677 | 100 |
0 | 73 | 10.78 | 393 | 58.05 | 53 | 7.83 | 519 | 76.66 |
1 | 10 | 1.48 | 97 | 14.33 | 13 | 1.92 | 120 | 17.73 |
2 | 3 | 0.44 | 23 | 3.40 | 1 | 0.15 | 27 | 3.99 |
3 | 2 | 0.30 | 7 | 1.03 | 1 | 0.15 | 10 | 1.48 |
4 | 0 | 0.00 | 1 | 0.15 | 0 | 0.00 | 1 | 0.15 |
Cluster weight over total | 88 | 13.00 | 521 | 76.96 | 68 | 10.04 | 677 | 100 |
Note(s): *The total number of clustered cases may vary per category owing to missing values
This process was undertaken by an external researcher from the company Winged Mercury (http://www.wingedmercury.net/)
The supplementary material is available online for this article.
This research has been funded by the Project “Professional learning ecologies for Digital Scholarship: Steps for the Modernisation of Higher Education”, Spanish Ministry of Economy and Competitiveness, Programme “Ramón y Cajal” RYC-2016-19589.
About the authors.
Juliana Elisa Raffaghelli is a Researcher at the Universitat Oberta de Catalunya (Spain), Faculty of Psychology and Educational Sciences. Her research interests focus on professional development for the use of technologies in teaching and diversified work contexts, with a strong presence of international/global collaboration; Open Education and Science; and critical literacy for the use of technologies, with particular reference to Big and Open Data issues. She has held roles in research, coordination of international and European projects, learning design, and teaching in several universities and research institutions. She holds a PhD in Education and Cognitive Sciences from the University of Venice.
Stefania Manca is a Research Director at the Institute of Educational Technology of the National Research Council of Italy. Her research interests include social media and social network sites in formal and informal learning, teacher education, professional development, digital scholarship, and student voice-supported participatory practices at school. She is co-editor of the Italian Journal of Educational Technology and serves on the editorial board of The Internet and Higher Education.
International Journal of Academic Research in Management, 9(1):1-9, 2022 http://elvedit.com/journals/IJARM/wp-content/uploads/Different-Types-of-Data-Analysis-Data-Analysis-Methods-and-Tec
Hamta Group
Date Written: August 1, 2022
This article defines data analysis and the concept of data preparation, then discusses data analysis methods. First, the six main categories are described briefly. Then, the statistical tools of the most commonly used methods, including descriptive, exploratory, and inferential analyses, are investigated in detail. Finally, we focus on qualitative data analysis, covering data preparation and strategies in that context.
Keywords: Data Analysis, Data Preparation, Data Analysis Methods, Data Analysis Types, Descriptive Analysis, Exploratory Analysis, Inferential Analysis, Predictive Analysis, Causal Analysis, Mechanistic Analysis, Statistical Analysis
Andrew V. Metcalfe, Bayesian Ideas and Data Analysis—An Introduction for Scientists and Statisticians, Journal of the Royal Statistical Society Series A: Statistics in Society , Volume 174, Issue 4, October 2011, Page 1181, https://doi.org/10.1111/j.1467-985X.2011.00725_2.x
If you think that a Bayesian approach to statistical analysis is nice in principle but too complicated in practice, this book may change your mind. The authors’ enthusiasm for the subject is apparent and they have taken care that the text is generally easy to read, with some occasional wry comments that make it more amusing than a typical statistics book. The emphasis is on medical and biological cases, but a range of other applications are covered.
The first quarter of the book covers the fundamental ideas of Bayesian analysis in two chapters separated by a clear introduction to Monte Carlo integration and WinBUGS14, the open source software that is used throughout the book, and preceded by a short prologue. In the prologue, the authors emphasize their conviction that data analysis should be a partnership between subject experts and statisticians, and they introduce examples from manufacturing industry, anthropology, farming and medicine. The elicitation of useful prior information is emphasized throughout the book. Chapter 3 provides practical experience by using WinBUGS for analysing binomial variables with a beta prior and discusses calculating predictive distributions, and the theoretical posterior distribution of the binomial parameter, using R. Chapter 4 has more advanced material on fundamental ideas than the general level of the book, but it can be omitted in a first reading. In contrast, Chapter 5, ‘Comparing populations’, is seen as an essential part of any course. It includes a careful discussion of inference for relative risks and odds ratios, and considers several sampling strategies. Inference for normal populations and a brief coverage of the Poisson process and sample size calculations end the chapter.
Chapter 6 is an introduction to strategies for generating pseudorandom samples from probability distributions, particularly Markov chain Monte Carlo methods. I found some of the developments here quite intricate, but, again, it can be skimmed over at a first reading.
Chapter 7 is a general overview of the regression topics that are covered in the later chapters, which include models for binomial and count data, and regression models for lifetime distributions as well as multiple regression. Chapter 10 deals with linear mixed models including repeated measures models, and Chapter 15, ‘Nonparametric models’, includes distribution-free regression methods and smoothing methods, and the proportional hazards model.
There are three useful appendices on matrices and vectors, probability, and getting started in R, which is well chosen, and includes a note on the interface between R and WinBUGS.
The exercises are an integral part of the book and are placed throughout the text, rather than at the end of chapters. They vary in difficulty; some offer practice in using WinBUGS, whereas others are more challenging and provide detail to support the development.
The book does not cover time series or spatial models. There is some overlap of topics with the excellent book by Gelman et al. (2004). However, the book by Christensen and his colleagues is more of an introduction and should appeal to scientists taking courses in statistics.
I think that the book is innovative for two reasons. Firstly, it provides an intermediate level course in statistics, using the Bayesian paradigm, that could be given to engineers and scientists requiring substantial statistical analysis, as well as material for a course in Bayesian statistics that is typically offered to statistics students. Secondly it shows how to perform the analyses by using WinBUGS, throughout the text. I would use this book as a basis for a course on Bayesian statistics. It is an excellent text for individual study, and students will find it a valuable reference later in their careers.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2004), Bayesian Data Analysis, Chapman and Hall/CRC, Boca Raton, FL.
Table of Contents
1) What Is Data Analysis?
2) Why Is Data Analysis Important?
3) What Is The Data Analysis Process?
4) Types Of Data Analysis Methods
5) Top Data Analysis Techniques To Apply
6) Quality Criteria For Data Analysis
7) Data Analysis Limitations & Barriers
8) Data Analysis Skills
9) Data Analysis In The Big Data Environment
In our data-rich age, understanding how to analyze and extract true meaning from our business’s digital insights is one of the primary drivers of success.
Despite the colossal volume of data we create every day, a mere 0.5% is actually analyzed and used for data discovery, improvement, and intelligence. While that may not seem like much, considering the amount of digital information we have at our fingertips, half a percent still accounts for a vast amount of data.
With so much data and so little time, knowing how to collect, curate, organize, and make sense of all of this potentially business-boosting information can be a minefield – but online data analysis is the solution.
In science, data analysis uses a more complex approach with advanced techniques to explore and experiment with data. On the other hand, in a business context, data is used to make data-driven decisions that will enable the company to improve its overall performance. In this post, we will cover the analysis of data from an organizational point of view while still going through the scientific and statistical foundations that are fundamental to understanding the basics of data analysis.
To put all of that into perspective, we will answer a host of important analytical questions, explore analytical methods and techniques, while demonstrating how to perform analysis in the real world with a 17-step blueprint for success.
Data analysis is the process of collecting, modeling, and analyzing data using various statistical and logical methods and techniques. Businesses rely on analytics processes and tools to extract insights that support strategic and operational decision-making.
All these various methods are largely based on two core areas: quantitative and qualitative research.
To explain the key differences between qualitative and quantitative research, here’s a video for your viewing pleasure:
Gaining a better understanding of different techniques and methods in quantitative research as well as qualitative insights will give your analyzing efforts a more clearly defined direction, so it’s worth taking the time to allow this particular knowledge to sink in. Additionally, you will be able to create a comprehensive analytical report that will skyrocket your analysis.
Apart from the qualitative and quantitative categories, there are other types of data that you should be aware of before diving into complex data analysis processes.
Before we go into detail about the categories of analysis along with its methods and techniques, you must understand the potential that analyzing data can bring to your organization.
When we talk about analyzing data, there is an order to follow to extract the needed conclusions. The analysis process consists of 5 key stages. We will cover each of them in more detail later in the post, but to provide the context needed to understand what is coming next, here is a rundown of the 5 essential steps of data analysis.
Now that you have a basic understanding of the key data analysis steps, let’s look at the top 17 essential methods.
Before diving into the 17 essential types of methods, it is important to quickly go over the main analysis categories. From descriptive up to prescriptive analysis, the complexity and effort of data evaluation increase, but so does the added value for the company.
a) Descriptive analysis - What happened.
The descriptive analysis method is the starting point for any analytic reflection, and it aims to answer the question of what happened? It does this by ordering, manipulating, and interpreting raw data from various sources to turn it into valuable insights for your organization.
Performing descriptive analysis is essential, as it enables us to present our insights in a meaningful way. Although it is relevant to mention that this analysis on its own will not allow you to predict future outcomes or tell you the answer to questions like why something happened, it will leave your data organized and ready to conduct further investigations.
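To make this concrete in code, here is a minimal sketch of descriptive analysis using Python's pandas library; the column names and sales figures below are invented purely for illustration:

```python
import pandas as pd

# Hypothetical sales records, purely for illustration.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "revenue": [12000, 9500, 14200, 8700],
})

# describe() orders and summarizes the raw numbers (count, mean,
# spread, quartiles): the "what happened" view of the data.
print(sales["revenue"].describe())

# Grouped summaries answer the same question per segment.
print(sales.groupby("region")["revenue"].agg(["mean", "sum"]))
```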
b) Exploratory analysis - How to explore data relationships.
As its name suggests, the main aim of exploratory analysis is to explore. Before it is performed, there is still no settled notion of the relationships between the data and the variables. Once the data is investigated, exploratory analysis helps you find connections and generate hypotheses and solutions for specific problems. A typical area of application for it is data mining.
c) Diagnostic analysis - Why it happened.
Diagnostic data analytics empowers analysts and executives by helping them gain a firm contextual understanding of why something happened. If you know why something happened as well as how it happened, you will be able to pinpoint the exact ways of tackling the issue or challenge.
Designed to provide direct and actionable answers to specific questions, this is one of the world's most important methods in research, alongside other key organizational applications such as retail analytics.
d) Predictive analysis - What will happen.
The predictive method allows you to look into the future to answer the question: what will happen? In order to do this, it uses the results of the previously mentioned descriptive, exploratory, and diagnostic analyses, in addition to machine learning (ML) and artificial intelligence (AI). Through this, you can uncover future trends, potential problems or inefficiencies, connections, and causalities in your data.
With predictive analysis, you can unfold and develop initiatives that will not only enhance your various operational processes but also help you gain an all-important edge over the competition. If you understand why a trend, pattern, or event happened through data, you will be able to develop an informed projection of how things may unfold in particular areas of the business.
e) Prescriptive analysis - How will it happen.
Another of the most effective analysis methods in research, prescriptive data techniques cross over from predictive analysis in that they revolve around using patterns or trends to develop responsive, practical business strategies.
By drilling down into prescriptive analysis, you will play an active role in the data consumption process by taking well-arranged sets of visual data and using them as a powerful fix for emerging issues in a number of key areas, including marketing, sales, customer experience, HR, fulfillment, finance, and logistics analytics.
As mentioned at the beginning of the post, data analysis methods can be divided into two big categories: quantitative and qualitative. Each of these categories holds a powerful analytical value that changes depending on the scenario and type of data you are working with. Below, we will discuss 17 methods that are divided into qualitative and quantitative approaches.
Without further ado, here are the 17 essential types of data analysis methods with some use cases in the business world:
To put it simply, quantitative analysis refers to all methods that use numerical data or data that can be turned into numbers (e.g. category variables like gender, age, etc.) to extract valuable insights. It is used to extract valuable conclusions about relationships, differences, and test hypotheses. Below we discuss some of the key quantitative methods.
The action of grouping a set of data elements in a way that said elements are more similar (in a particular sense) to each other than to those in other groups – hence the term ‘cluster.’ Since there is no target variable when clustering, the method is often used to find hidden patterns in the data. The approach is also used to provide additional context to a trend or dataset.
Let's look at it from an organizational perspective. In a perfect world, marketers would be able to analyze each customer separately and give them the best-personalized service, but let's face it: with a large customer base, it is practically impossible to do that. That's where clustering comes in. By grouping customers into clusters based on demographics, purchasing behaviors, monetary value, or any other factor that might be relevant for your company, you will be able to immediately optimize your efforts and give your customers the best experience based on their needs.
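As a minimal sketch of the idea (scikit-learn's KMeans is one common implementation; the customer features below are invented):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: [age, yearly_spend, orders_per_year].
customers = np.array([
    [23, 300, 4], [45, 1500, 12], [31, 450, 6],
    [52, 2100, 15], [28, 380, 5], [60, 2500, 18],
])

# Scale the features so that spend (large values) does not dominate
# the distance calculation.
X = StandardScaler().fit_transform(customers)

# Group the customers into two segments; each label is a cluster
# you could target with tailored campaigns.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)
```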
This type of data analysis approach uses historical data to examine and compare a determined segment of users' behavior, which can then be grouped with others with similar characteristics. By using this methodology, it's possible to gain a wealth of insight into consumer needs or a firm understanding of a broader target group.
Cohort analysis can be really useful for performing analysis in marketing as it will allow you to understand the impact of your campaigns on specific groups of customers. To exemplify, imagine you send an email campaign encouraging customers to sign up for your site. For this, you create two versions of the campaign with different designs, CTAs, and ad content. Later on, you can use cohort analysis to track the performance of the campaign for a longer period of time and understand which type of content is driving your customers to sign up, repurchase, or engage in other ways.
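Outside a dedicated tool, a basic cohort table can be built with pandas; this is a sketch on invented sign-up events, not the campaign data described above:

```python
import pandas as pd

# Invented sign-up events: one row per user action.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 3],
    "date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-01-20",
        "2024-03-02", "2024-02-01", "2024-02-15", "2024-03-20",
    ]),
})

# A user's cohort is the month of their first event.
events["month"] = events["date"].dt.to_period("M")
events["cohort"] = events.groupby("user_id")["month"].transform("min")

# Count distinct active users per cohort per month: each row of the
# result tracks one cohort's activity over time.
cohorts = (events.groupby(["cohort", "month"])["user_id"]
                 .nunique().unstack(fill_value=0))
print(cohorts)
```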
A useful tool for getting started with the cohort analysis method is Google Analytics. You can learn more about the benefits and limitations of using cohorts in GA in this useful guide. There, segments (device traffic) can be divided into date cohorts (usage of devices) and then analyzed week by week to extract insights into performance.
Regression uses historical data to understand how a dependent variable's value is affected when one (linear regression) or more independent variables (multiple regression) change or stay the same. By understanding each variable's relationship and how it developed in the past, you can anticipate possible outcomes and make better decisions in the future.
Let's break it down with an example. Imagine you did a regression analysis of your sales in 2019 and discovered that variables like product quality, store design, customer service, marketing campaigns, and sales channels affected the overall result. Now you want to use regression to analyze which of these variables changed or whether any new ones appeared during 2020. For example, you couldn't sell as much in your physical store due to COVID lockdowns, so your sales could have either dropped in general or increased in your online channels. Through this, you can understand which independent variables affected the overall performance of your dependent variable, annual sales.
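Here is what a small multiple regression looks like in code, using scikit-learn on invented store data (the variable names stand in for the kinds of factors listed above):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented store data: two independent variables and annual sales.
df = pd.DataFrame({
    "marketing_spend": [10, 15, 12, 20, 18, 25],
    "service_score":   [3.1, 3.8, 3.5, 4.2, 4.0, 4.6],
    "annual_sales":    [110, 150, 128, 195, 176, 240],
})

# Multiple regression: fit annual sales on both variables at once.
model = LinearRegression().fit(
    df[["marketing_spend", "service_score"]], df["annual_sales"]
)

# Each coefficient shows how sales move when that variable changes,
# holding the other constant.
print(model.coef_, model.intercept_)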
If you want to go deeper into this type of analysis, check out this article and learn more about how you can benefit from regression.
The neural network forms the basis for the intelligent algorithms of machine learning. It is a form of analytics that attempts, with minimal intervention, to understand how the human brain would generate insights and predict values. Neural networks learn from each and every data transaction, meaning that they evolve and advance over time.
A typical area of application for neural networks is predictive analytics. There are BI reporting tools that have this feature implemented within them, such as the Predictive Analytics Tool from datapine. This tool enables users to quickly and easily generate all kinds of predictions. All you have to do is select the data to be processed based on your KPIs, and the software automatically calculates forecasts based on historical and current data. Thanks to its user-friendly interface, anyone in your organization can manage it; there’s no need to be an advanced scientist.
(Screenshot in the original post: an example prediction generated with the predictive analysis tool from datapine.)
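For readers without such a tool, a tiny neural-network regression can be sketched with scikit-learn's MLPRegressor; the monthly history below is invented, and this is a generic illustration, not datapine's method:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Invented history: [month_index, ad_spend] -> revenue.
X = np.array([[1, 10], [2, 12], [3, 11], [4, 15], [5, 14], [6, 18]])
y = np.array([100, 120, 115, 150, 140, 180])

# A tiny multi-layer perceptron that learns the mapping from the data
# itself rather than from a hand-specified formula.
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                   random_state=0).fit(X, y)

# Forecast revenue for month 7 with a planned spend of 20.
print(net.predict([[7, 20]]))
```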
Factor analysis, also called "dimension reduction", is a type of data analysis used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. The aim here is to uncover independent latent variables, an ideal method for streamlining specific segments.

A good way to understand this data analysis method is a customer evaluation of a product. The initial assessment is based on different variables like color, shape, wearability, current trends, materials, comfort, the place where they bought the product, and frequency of usage. The list can be endless, depending on what you want to track. In this case, factor analysis comes into the picture by summarizing all of these variables into homogeneous groups, for example, by grouping color, materials, quality, and trends into a broader latent variable of design.
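A minimal sketch with scikit-learn's FactorAnalysis, assuming invented customer ratings on six product attributes:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Invented ratings (rows = customers) on six observed variables:
# color, materials, quality, trends, comfort, price.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(50, 6)).astype(float)

# Reduce the six correlated variables to two latent factors
# (e.g. "design" and "value").
fa = FactorAnalysis(n_components=2, random_state=0).fit(ratings)

# Loadings show how strongly each observed variable maps to each factor.
print(fa.components_)
```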
If you want to start analyzing data using factor analysis, we recommend taking a look at this practical guide from UCLA.
Data mining is the umbrella term for engineering metrics and insights for additional value, direction, and context. By using exploratory statistical evaluation, data mining aims to identify dependencies, relations, patterns, and trends to generate advanced knowledge. When considering how to analyze data, adopting a data mining mindset is essential to success, and as such it's an area worth exploring in greater detail.
An excellent use case of data mining is datapine's intelligent data alerts. With the help of artificial intelligence and machine learning, they provide automated signals based on particular commands or occurrences within a dataset. For example, if you're monitoring supply chain KPIs, you could set an intelligent alarm to trigger when invalid or low-quality data appears. By doing so, you will be able to drill down deep into the issue and fix it swiftly and effectively.
As an example of how the intelligent alarms from datapine work: by setting up ranges on daily orders, sessions, and revenues, the alarms will notify you if a goal was not met or if it exceeded expectations.
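Stripped of the tooling, the core of such an alert is just a range check. Here is a hedged, tool-agnostic sketch; the metric names and thresholds are invented and this is not datapine's API:

```python
# Invented metric names and thresholds; real BI tools implement far
# richer versions of this rule, but the core is a range check.
def check_alerts(metrics, expected_ranges):
    """Return a message for every metric outside its expected range."""
    alerts = []
    for name, value in metrics.items():
        low, high = expected_ranges[name]
        if not low <= value <= high:
            alerts.append(f"ALERT: {name}={value} outside [{low}, {high}]")
    return alerts

today = {"daily_orders": 42, "sessions": 180, "revenue": 950.0}
ranges = {"daily_orders": (50, 200), "sessions": (100, 500), "revenue": (800, 5000)}
print(check_alerts(today, ranges))
```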
As its name suggests, time series analysis is used to analyze a set of data points collected over a specified period of time. Although analysts use this method to monitor data points over a continuous interval rather than just intermittently, time series analysis is not used solely for the purpose of collecting data over time. Rather, it allows researchers to understand whether variables changed over the duration of the study, how the different variables depend on one another, and how the end result was reached.
In a business context, this method is used to understand the causes of different trends and patterns to extract valuable insights. Another way of using this method is with the help of time series forecasting. Powered by predictive technologies, businesses can analyze various data sets over a period of time and forecast different future events.
A great use case to put time series analysis into perspective is seasonality effects on sales. By using time series forecasting to analyze sales data of a specific product over time, you can understand if sales rise over a specific period of time (e.g. swimwear during summertime, or candy during Halloween). These insights allow you to predict demand and prepare production accordingly.
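As a small illustration of time series forecasting, here is a Holt-Winters model from statsmodels fitted to invented monthly sales with a summer peak (one of many possible forecasting approaches):

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Invented monthly swimwear sales over three years, peaking each summer.
sales = pd.Series(
    [20, 22, 30, 45, 70, 95, 100, 90, 55, 35, 25, 21] * 3,
    index=pd.date_range("2021-01-01", periods=36, freq="MS"),
)

# Holt-Winters captures both the trend and the 12-month seasonal pattern.
model = ExponentialSmoothing(
    sales, trend="add", seasonal="add", seasonal_periods=12
).fit()

# Forecast the next six months to plan production for the coming season.
print(model.forecast(6))
```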
Decision tree analysis aims to act as a support tool for making smart and strategic decisions. By visually displaying potential outcomes, consequences, and costs in a tree-like model, researchers and company users can easily evaluate all factors involved and choose the best course of action. Decision trees are helpful for analyzing quantitative data, and they allow for an improved decision-making process by helping you spot improvement opportunities, reduce costs, and enhance operational efficiency and production.

But how does a decision tree actually work? This method works like a flowchart that starts with the main decision you need to make and branches out based on the different outcomes and consequences of each choice. Each outcome outlines its own consequences, costs, and gains, and, at the end of the analysis, you can compare each of them and make the smartest decision.
Businesses can use them to understand which project is more cost-effective and will bring more earnings in the long run. For example, imagine you need to decide if you want to update your software app or build a new app entirely. Here you would compare the total costs, the time needed to be invested, potential revenue, and any other factor that might affect your decision. In the end, you would be able to see which of these two options is more realistic and attainable for your company or research.
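The update-vs-rebuild decision above can be sketched with scikit-learn's decision tree; the project features and labels below are invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented project features: [cost_k, months, expected_revenue_k].
X = [[50, 6, 120], [200, 18, 260], [80, 9, 150],
     [300, 24, 310], [60, 5, 140], [250, 20, 240]]
y = ["update", "rebuild", "update", "rebuild", "update", "rebuild"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text prints the branching rules, mirroring the flowchart
# logic described above.
print(export_text(tree, feature_names=["cost_k", "months", "revenue_k"]))
```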
Last but not least, we have conjoint analysis. This approach is usually used in surveys to understand how individuals value different attributes of a product or service, and it is one of the most effective methods for extracting consumer preferences. When it comes to purchasing, some clients might be more price-focused, others more features-focused, and others might have a sustainability focus. Whatever your customers' preferences are, you can find them with conjoint analysis. Through this, companies can define pricing strategies, packaging options, subscription packages, and more.
A great example of conjoint analysis is in marketing and sales. For instance, a cupcake brand might use conjoint analysis and find that its clients prefer gluten-free options and cupcakes with healthier toppings over super sugary ones. Thus, the cupcake brand can turn these insights into advertisements and promotions to increase sales of this particular type of product. And not just that, conjoint analysis can also help businesses segment their customers based on their interests. This allows them to send different messaging that will bring value to each of the segments.
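Full conjoint studies use dedicated experimental designs, but a common approximation is a dummy-coded regression on profile ratings, whose coefficients act as part-worths. A sketch under that assumption, with invented cupcake survey rows:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented survey rows: each row is a product profile and its rating.
profiles = pd.DataFrame({
    "topping": ["sugary", "healthy", "healthy", "sugary", "healthy"],
    "flour":   ["regular", "gluten_free", "regular", "gluten_free", "gluten_free"],
    "rating":  [4, 8, 6, 5, 9],
})

# Dummy-code the attribute levels and regress ratings on them; the
# coefficients ("part-worths") estimate how much each level adds.
X = pd.get_dummies(profiles[["topping", "flour"]], drop_first=True)
model = LinearRegression().fit(X, profiles["rating"])
print(dict(zip(X.columns, model.coef_)))
```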
Also known as reciprocal averaging, correspondence analysis is a method used to analyze the relationship between categorical variables presented within a contingency table. A contingency table is a table that displays two (simple correspondence analysis) or more (multiple correspondence analysis) categorical variables across rows and columns that show the distribution of the data, which is usually answers to a survey or questionnaire on a specific topic.
This method starts by calculating an "expected value" for each cell, obtained by multiplying its row total by its column total and dividing by the grand total of the table. The expected value is then subtracted from the observed value, yielding a "residual" that lets you draw conclusions about relationships and distribution. The results of this analysis are later displayed using a map that represents the relationships between the different values: the closer two values are on the map, the stronger the relationship. Let's put it into perspective with an example.
Imagine you are carrying out a market research analysis about outdoor clothing brands and how they are perceived by the public. For this analysis, you ask a group of people to match each brand with a certain attribute which can be durability, innovation, quality materials, etc. When calculating the residual numbers, you can see that brand A has a positive residual for innovation but a negative one for durability. This means that brand A is not positioned as a durable brand in the market, something that competitors could take advantage of.
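The expected-value and residual computation described above is easy to reproduce with NumPy; the contingency table below is invented:

```python
import numpy as np

# Invented contingency table: brands (rows) x attributes (columns:
# innovation, durability, quality).
observed = np.array([
    [30, 10, 20],   # brand A
    [12, 28, 25],   # brand B
])

# Expected cell value = row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Positive residual: the brand-attribute pairing occurs more often
# than independence would predict; negative: less often.
print(observed - expected)
```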
MDS is a method used to observe the similarities or disparities between objects, which can be colors, brands, people, geographical coordinates, and more. The objects are plotted on an "MDS map" that positions similar objects together and disparate ones far apart. The (dis)similarities between objects are represented using one or more dimensions that can be observed on a numerical scale. For example, if you want to know how people feel about the COVID-19 vaccine, you can use 1 for "don't believe in the vaccine at all" and 10 for "firmly believe in the vaccine", with 2 to 9 for responses in between. When analyzing an MDS map, the only thing that matters is the distance between the objects; the orientation of the dimensions is arbitrary and has no meaning at all.
Multidimensional scaling is a valuable technique for market research, especially when it comes to evaluating product or brand positioning. For instance, if a cupcake brand wants to know how they are positioned compared to competitors, it can define 2-3 dimensions such as taste, ingredients, shopping experience, or more, and do a multidimensional scaling analysis to find improvement opportunities as well as areas in which competitors are currently leading.
Another business example is in procurement, when deciding between suppliers. Decision makers can generate an MDS map to see how suppliers differ in price, delivery time, technical service, and more, and pick the one that best suits their needs.
A final example comes from a research paper, "An Improved Study of Multilevel Semantic Network Visualization for Analyzing Sentiment Word of Movie Review Data". The researchers used a two-dimensional MDS map to display the distances and relationships between different sentiments in movie reviews, distributing 36 sentiment words based on their emotional distance; in their map, the words "outraged" and "sweet" sit on opposite sides, marking the distance between the two emotions very clearly.
Aside from being a valuable technique to analyze dissimilarities, MDS also serves as a dimension-reduction technique for large dimensional data.
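The supplier example can be sketched with scikit-learn's MDS implementation; the scores below are invented:

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.preprocessing import StandardScaler

# Invented supplier scores: [price, delivery_days, service_rating].
suppliers = np.array([
    [100.0, 5, 4.2],
    [120.0, 3, 4.8],
    [90.0, 10, 3.5],
    [110.0, 4, 4.5],
])

# Scale first so no single attribute dominates the distances, then embed
# in 2-D; similar suppliers end up close together on the map.
X = StandardScaler().fit_transform(suppliers)
coords = MDS(n_components=2, random_state=0).fit_transform(X)
print(coords)  # only relative distances are meaningful, not orientation
```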
Qualitative data analysis methods deal with non-numerical data gathered and produced through methods of observation such as interviews, focus groups, questionnaires, and more. As opposed to quantitative methods, qualitative data is more subjective and is highly valuable for analyzing customer retention and product development.
Text analysis, also known in the industry as text mining, works by taking large sets of textual data and arranging them in a way that makes it easier to manage. By working through this cleansing process in stringent detail, you will be able to extract the data that is truly relevant to your organization and use it to develop actionable insights that will propel you forward.
Modern software accelerates the application of text analytics. Thanks to the combination of machine learning and intelligent algorithms, you can perform advanced analytical processes such as sentiment analysis. This technique allows you to understand the intentions and emotions behind a text (for example, whether it's positive, negative, or neutral) and then give it a score based on factors and categories that are relevant to your brand. Sentiment analysis is often used to monitor brand and product reputation and to understand how successful your customer experience is. To learn more about the topic, check out this insightful article.
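One concrete, minimal option is NLTK's VADER analyzer, a lexicon-based sentiment scorer; this sketch uses an invented review:

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

sia = SentimentIntensityAnalyzer()

# 'compound' runs from -1 (most negative) to +1 (most positive).
review = "The checkout was fast and the support team was great!"
print(sia.polarity_scores(review))
```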
By analyzing data from various word-based sources, including product reviews, articles, social media communications, and survey responses, you will gain invaluable insights into your audience, as well as their needs, preferences, and pain points. This will allow you to create campaigns, services, and communications that meet your prospects’ needs on a personal level, growing your audience while boosting customer retention. There are various other “sub-methods” that are an extension of text analysis. Each of them serves a more specific purpose and we will look at them in detail next.
This is a straightforward and very popular method that examines the presence and frequency of certain words, concepts, and subjects in different content formats such as text, image, audio, or video. For example, the number of times the name of a celebrity is mentioned on social media or online tabloids. It does this by coding text data that is later categorized and tabulated in a way that can provide valuable insights, making it the perfect mix of quantitative and qualitative analysis.
There are two types of content analysis. The first one is the conceptual analysis which focuses on explicit data, for instance, the number of times a concept or word is mentioned in a piece of content. The second one is relational analysis, which focuses on the relationship between different concepts or words and how they are connected within a specific context.
Content analysis is often used by marketers to measure brand reputation and customer behavior, for example, by analyzing customer reviews. It can also be used to analyze customer interviews and find directions for new product development. It is also important to note that, in order to extract the maximum potential from this analysis method, it is necessary to have a clearly defined research question.
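At its simplest, conceptual content analysis is a word count. A sketch over invented reviews:

```python
import re
from collections import Counter

# Invented reviews standing in for any word-based source.
reviews = [
    "Great battery life, the battery lasts all day",
    "Battery died fast, poor battery",
    "Love the screen and the battery life",
]

# Count how often each word (concept) appears across the corpus:
# conceptual content analysis at its simplest.
words = re.findall(r"[a-z']+", " ".join(reviews).lower())
print(Counter(words).most_common(5))
```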
Very similar to content analysis, thematic analysis also helps in identifying and interpreting patterns in qualitative data, with the main difference being that the former can also be applied to quantitative analysis. The thematic method analyzes large pieces of text data, such as focus group transcripts or interviews, and groups them into themes or categories that come up frequently within the text. It is a great method when trying to figure out people's views and opinions about a certain topic. For example, if you are a brand that cares about sustainability, you can survey your customers to analyze their views and opinions about sustainability and how they apply it to their lives. You can also analyze customer service call transcripts to find common issues and improve your service.
Thematic analysis is a very subjective technique that relies on the researcher’s judgment. Therefore, to avoid biases, it has 6 steps that include familiarization, coding, generating themes, reviewing themes, defining and naming themes, and writing up. It is also important to note that, because it is a flexible approach, the data can be interpreted in multiple ways and it can be hard to select what data is more important to emphasize.
A bit more complex in nature than the two previous ones, narrative analysis is used to explore the meaning behind the stories that people tell and most importantly, how they tell them. By looking into the words that people use to describe a situation you can extract valuable conclusions about their perspective on a specific topic. Common sources for narrative data include autobiographies, family stories, opinion pieces, and testimonials, among others.
From a business perspective, narrative analysis can be useful to analyze customer behaviors and feelings towards a specific product, service, feature, or others. It provides unique and deep insights that can be extremely valuable. However, it has some drawbacks.
The biggest weakness of this method is that the sample sizes are usually very small due to the complexity and time-consuming nature of the collection of narrative data. Plus, the way a subject tells a story will be significantly influenced by his or her specific experiences, making it very hard to replicate in a subsequent study.
Discourse analysis is used to understand the meaning behind any type of written, verbal, or symbolic discourse based on its political, social, or cultural context. It mixes the analysis of languages and situations together. This means that the way the content is constructed and the meaning behind it is significantly influenced by the culture and society it takes place in. For example, if you are analyzing political speeches you need to consider different context elements such as the politician's background, the current political context of the country, the audience to which the speech is directed, and so on.
From a business point of view, discourse analysis is a great market research tool. It allows marketers to understand how the norms and ideas of the specific market work and how their customers relate to those ideas. It can be very useful to build a brand mission or develop a unique tone of voice.
Traditionally, researchers decide on a method and hypothesis and start to collect data to prove that hypothesis. Grounded theory is the only method here that doesn't require an initial research question or hypothesis, as its value lies in the generation of new theories. With the grounded theory method, you can go into the analysis process with an open mind and explore the data to generate new theories through tests and revisions. In fact, it is not necessary to finish collecting the data before starting to analyze it; researchers usually begin to find valuable insights as they are gathering the data.
All of these elements make grounded theory a very valuable method as theories are fully backed by data instead of initial assumptions. It is a great technique to analyze poorly researched topics or find the causes behind specific company outcomes. For example, product managers and marketers might use the grounded theory to find the causes of high levels of customer churn and look into customer surveys and reviews to develop new theories about the causes.
Now that we've answered the question "what is data analysis?", explained why it is important, and covered the different data analysis types, it's time to dig deeper into how to perform your analysis by working through these 17 essential techniques.
Before you begin analyzing or drilling down into any techniques, it’s crucial to sit down collaboratively with all key stakeholders within your organization, decide on your primary campaign or strategic goals, and gain a fundamental understanding of the types of insights that will best benefit your progress or provide you with the level of vision you need to evolve your organization.
Once you’ve outlined your core objectives, you should consider which questions will need answering to help you achieve your mission. This is one of the most important techniques as it will shape the very foundations of your success.
To help you ask the right things and ensure your data works for you, you have to ask the right data analysis questions.
After giving your data analytics methodology some real direction, and knowing which questions need answering to extract optimum value from the information available to your organization, you should continue with democratization.
Data democratization is an action that aims to connect data from various sources efficiently and quickly so that anyone in your organization can access it at any given moment. You can extract data in text, images, videos, numbers, or any other format, and then perform cross-database analysis to achieve more advanced insights to share with the rest of the company interactively.
Once you have decided on your most valuable sources, you need to take all of this into a structured format to start collecting your insights. For this purpose, datapine offers an easy all-in-one data connectors feature to integrate all your internal and external sources and manage them at your will. Additionally, datapine’s end-to-end solution automatically updates your data, allowing you to save time and focus on performing the right analysis to grow your company.
When collecting data in a business or research context you always need to think about security and privacy. With data breaches becoming a topic of concern for businesses, the need to protect your client's or subject’s sensitive information becomes critical.
To ensure that all this is taken care of, you need to think of a data governance strategy. According to Gartner, this concept refers to "the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics." In simpler words, data governance is a collection of processes, roles, and policies that ensure the efficient use of data while still achieving the main company goals. It ensures that clear roles are in place for who can access the information and how they can access it. In time, this not only ensures that sensitive information is protected but also allows for efficient analysis as a whole.
After harvesting from so many sources you will be left with a vast amount of information that can be overwhelming to deal with. At the same time, you can be faced with incorrect data that can be misleading to your analysis. The smartest thing you can do to avoid dealing with this in the future is to clean the data. This is fundamental before visualizing it, as it will ensure that the insights you extract from it are correct.
There are many things that you need to look for in the cleaning process. The most important one is to eliminate any duplicate observations; this usually appears when using multiple internal and external sources of information. You can also add any missing codes, fix empty fields, and eliminate incorrectly formatted data.
Another usual form of cleaning is done with text data. As we mentioned earlier, most companies today analyze customer reviews, social media comments, questionnaires, and several other text inputs. In order for algorithms to detect patterns, text data needs to be revised to avoid invalid characters or any syntax or spelling errors.
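In code, the common cleaning moves look like this pandas sketch (the messy export below is invented):

```python
import pandas as pd

# An invented, messy export with the defects described above.
df = pd.DataFrame({
    "customer": ["Ann", "Ann", "bob ", None, "Eve"],
    "revenue":  ["100", "100", "250", "75", "n/a"],
})

df = df.drop_duplicates()                                  # duplicate observations
df["customer"] = df["customer"].str.strip().str.title()    # fix formatting
df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")  # bad values -> NaN
df = df.dropna(subset=["customer"])                        # drop rows missing a key field
print(df)
```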
Most importantly, the aim of cleaning is to prevent you from arriving at false conclusions that can damage your company in the long run. By using clean data, you will also help BI solutions to interact better with your information and create better reports for your organization.
Once you’ve set your sources, cleaned your data, and established clear-cut questions you want your insights to answer, you need to set a host of key performance indicators (KPIs) that will help you track, measure, and shape your progress in a number of key areas.
KPIs are critical to both qualitative and quantitative research. This is one of the primary methods of data analysis you certainly shouldn’t overlook.
To help you set the best possible KPIs for your initiatives and activities, here is an example of a relevant logistics KPI: transportation-related costs. If you want to see more, explore our collection of key performance indicator examples.
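As a rough illustration, here is how you might track such a KPI with pandas in Python; the figures and column names are invented for the example:

```python
import pandas as pd

# Hypothetical shipment records; column names are illustrative.
shipments = pd.DataFrame({
    "month": ["2024-01", "2024-01", "2024-02"],
    "transport_cost": [1200.0, 950.0, 1100.0],
    "orders_delivered": [300, 250, 280],
})

# KPI: average transportation cost per delivered order, per month.
monthly = shipments.groupby("month").agg(
    total_cost=("transport_cost", "sum"),
    total_orders=("orders_delivered", "sum"),
)
monthly["cost_per_order"] = monthly["total_cost"] / monthly["total_orders"]
print(monthly["cost_per_order"])
```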
Having given your data analysis tools and techniques a true purpose and defined your mission, you should explore the raw data you’ve collected from all sources and use your KPIs as a reference for cutting out any information you deem useless.
Trimming the informational fat is one of the most crucial methods of analysis as it will allow you to focus your analytical efforts and squeeze every drop of value from the remaining ‘lean’ information.
Any stats, facts, figures, or metrics that don’t align with your business goals or fit with your KPI management strategies should be eliminated from the equation.
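For instance, a minimal sketch of this trimming step in pandas might look like the following, where kpi_columns is a hypothetical list of the metrics agreed with your stakeholders:

```python
import pandas as pd

# Hypothetical dataset mixing KPI-relevant and irrelevant fields.
df = pd.DataFrame({
    "month": ["2024-01", "2024-02"],
    "revenue": [50000, 52000],
    "office_temperature": [21.5, 22.0],   # not tied to any KPI
    "intranet_logins": [410, 395],        # not tied to any KPI
})

kpi_columns = ["month", "revenue"]  # metrics agreed with stakeholders

# Keep only the fields that feed a KPI; drop the informational fat.
lean = df[[c for c in kpi_columns if c in df.columns]]
print(lean)
```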
While this particular step is optional at this point (you will already have gained a wealth of insight and formed a fairly sound strategy by now), creating a data governance roadmap will help your data analysis methods and techniques succeed on a more sustainable basis. These roadmaps, if developed properly, can also be tweaked and scaled over time.
Invest ample time in developing a roadmap that will help you store, manage, and handle your data internally, and you will make your analysis techniques all the more fluid and functional – a foundation that supports every one of the data analysis methods covered here.
There are many ways to analyze data, but one of the most vital aspects of analytical success in a business context is integrating the right decision support software and technology.
Robust analysis platforms will not only allow you to pull critical data from your most valuable sources while working with dynamic KPIs that offer actionable insights; they will also present that data in a digestible, visual, interactive format from one central, live dashboard – a data methodology you can count on.
By integrating the right technology into your data analysis methodology, you’ll avoid fragmenting your insights, saving time and effort while extracting maximum value from your business’s most important information.
For a look at the power of software to support your analysis and enhance your methods, glance over our selection of dashboard examples.
By considering each of the above efforts, working with the right technology, and fostering a cohesive internal culture where everyone buys into the different ways to analyze data as well as the power of digital intelligence, you will swiftly start to answer your most burning business questions. Arguably, the best way to make your data concepts accessible across the organization is through data visualization.
Online data visualization is a powerful tool as it lets you tell a story with your metrics, allowing users across the organization to extract meaningful insights that aid business evolution – and it covers all the different ways to analyze data.
The purpose of analysis is to make your entire organization more informed and intelligent, and with the right platform or dashboard, this is simpler than you think, as demonstrated by our marketing dashboard.
This visual, dynamic, and interactive online dashboard is a data analysis example designed to give Chief Marketing Officers (CMOs) an overview of relevant metrics and help them understand whether they are achieving their monthly goals.
In detail, this example, generated with a modern dashboard creator, displays interactive charts for monthly revenue, costs, net income, and net income per customer; each is compared with the previous month so that you can see how the figures fluctuated. In addition, it shows a detailed summary of the number of users, customers, SQLs, and MQLs per month to give you the full picture and surface relevant insights or trends for your marketing reports.
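As a small illustration of the month-over-month comparison such a dashboard performs, here is a pandas sketch with invented figures:

```python
import pandas as pd

# Hypothetical monthly figures like those shown on the dashboard.
monthly = pd.DataFrame(
    {"revenue": [120000, 132000], "costs": [80000, 78000]},
    index=pd.PeriodIndex(["2024-01", "2024-02"], freq="M"),
)
monthly["net_income"] = monthly["revenue"] - monthly["costs"]

# Compare each metric with the previous month, as the dashboard does.
print(monthly.pct_change().iloc[-1] * 100)  # % change vs. prior month
```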
The CMO dashboard is perfect for C-level management, as it helps them monitor the strategic outcome of their marketing efforts and make data-driven decisions that can significantly benefit the company.
We have already dedicated an entire post to data interpretation, as it is a fundamental part of the data analysis process. It gives meaning to the analytical information and aims to draw concise conclusions from the analysis results. Since companies usually deal with data from many different sources, the interpretation stage needs to be done carefully and properly in order to avoid misinterpretations.
To help you through the process, here we list three common practices that you need to avoid at all costs when looking at your data:
Now, we’re going to look at how you can bring all of these elements together in a way that will benefit your business - starting with a little something called data storytelling.
The human brain responds incredibly well to strong stories and narratives. Once you’ve cleansed, shaped, and visualized your most valuable data using various BI dashboard tools, you should strive to tell a story - one with a clear-cut beginning, middle, and end.
By doing so, you will make your analytical efforts more accessible, digestible, and universal, empowering more people within your organization to use your discoveries to their actionable advantage.
Autonomous technologies, such as artificial intelligence (AI) and machine learning (ML), play a significant role in the advancement of understanding how to analyze data more effectively.
Gartner predicts that by the end of this year, 80% of emerging technologies will be developed with AI foundations. This is a testament to the ever-growing power and value of autonomous technologies.
At the moment, these technologies are revolutionizing the analysis industry. Some examples that we mentioned earlier are neural networks, intelligent alarms, and sentiment analysis.
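To give a flavor of the sentiment analysis mentioned above, here is a minimal sketch using NLTK’s VADER analyzer, one widely used open-source option (not necessarily the technology behind any specific tool):

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-off lexicon download

analyzer = SentimentIntensityAnalyzer()
for review in ["The product is fantastic!", "Support was slow and unhelpful."]:
    scores = analyzer.polarity_scores(review)
    print(review, "->", scores["compound"])  # -1 (negative) to +1 (positive)
```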
If you work with the right tools and dashboards, you will be able to present your metrics in a digestible, value-driven format, allowing almost everyone in the organization to connect with and use relevant data to their advantage.
Modern dashboards consolidate data from various sources, providing access to a wealth of insights in one centralized location, whether you need to monitor recruitment metrics or generate reports to send across numerous departments. Moreover, these cutting-edge tools offer access to dashboards from a multitude of devices, meaning that everyone within the business can connect with practical insights remotely - and share the load.
Once everyone is able to work with a data-driven mindset, you will catalyze the success of your business in ways you never thought possible. And when it comes to knowing how to analyze data, this kind of collaborative approach is essential.
In order to perform a high-quality analysis of data, it is fundamental to use tools and software that will ensure the best results. Here is a brief summary of four fundamental categories of data analysis tools for your organization.
The last step might seem obvious to some, but it is easily skipped once you think you are done. After you have extracted the needed results, you should always take a retrospective look at your project and consider what you could improve. As you have seen throughout this long list of techniques, data analysis is a complex process that requires constant refinement. For this reason, you should always go one step further and keep improving.
So far we’ve covered a list of methods and techniques that should help you perform efficient data analysis. But how do you measure the quality and validity of your results? This is done with the help of scientific quality criteria. Here we go into a more theoretical area that is critical to understanding the fundamentals of statistical analysis in science. However, you should also be aware of these criteria in a business context, as they will allow you to assess the quality of your results correctly. Let’s dig in.
The quality criteria discussed cover mostly potential influences in a quantitative context. Qualitative research, by its nature, involves additional subjective influences that must be controlled in a different way. It therefore has its own quality criteria, such as credibility, transferability, dependability, and confirmability. You can see each of them in more detail in this resource.
Analyzing data is not an easy task. As you’ve seen throughout this post, there are many steps and techniques that you need to apply in order to extract useful information from your research. While a well-performed analysis can bring various benefits to your organization, it doesn’t come without limitations. In this section, we will discuss some of the main barriers you might encounter when conducting an analysis. Let’s look at them in more detail.
As you’ve learned throughout this lengthy guide, analyzing data is a complex task that requires a lot of knowledge and skills. That said, thanks to the rise of self-service tools, the process is way more accessible and agile than it once was. Regardless, there are still some key skills that are valuable to have when working with data; we list the most important ones below.
Big data is invaluable to today’s businesses, and by using different methods for data analysis, it’s possible to view your data in a way that can help you turn insight into positive action.
To inspire your efforts and put the importance of big data into context, here are some insights that you should know:
Data analysis concepts may come in many forms, but fundamentally, any solid methodology will help to make your business more streamlined, cohesive, insightful, and successful than ever before.
As we reach the end of our data analysis journey, we leave a small summary of the main methods and techniques to perform excellent analysis and grow your business.
17 Essential Types of Data Analysis Methods:
Top 17 Data Analysis Techniques:
We’ve pondered the data analysis definition and drilled down into the practical applications of data-centric analytics, and one thing is clear: by taking measures to arrange your data and making your metrics work for you, it’s possible to transform raw information into action - the kind that will push your business to the next level.
Yes, good data analytics techniques result in enhanced business intelligence (BI). To help you understand this notion in more detail, read our exploration of business intelligence reporting .
And, if you’re ready to perform your own analysis, drill down into your facts and figures while interacting with your data on astonishing visuals, you can try our software for a free, 14-day trial .