U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • Ind Psychiatry J
  • v.19(1); Jan-Jun 2010

Statistics without tears: Populations and samples

Amitav banerjee.

Department of Community Medicine, D Y Patil Medical College, Pune, India

Suprakash Chaudhury

1 Department of Psychiatry, RINPAS, Kanke, Ranchi, India

Research studies are usually carried out on sample of subjects rather than whole populations. The most challenging aspect of fieldwork is drawing a random sample from the target population to which the results of the study would be generalized. In actual practice, the task is so difficult that some sampling bias occurs in almost all studies to a lesser or greater degree. In order to assess the degree of this bias, the informed reader of medical literature should have some understanding of the population from which the sample was drawn. The ultimate decision on whether the results of a particular study can be generalized to a larger population depends on this understanding. The subsequent deliberations dwell on sampling strategies for different types of research and also a brief description of different sampling methods.

Research workers in the early 19th century endeavored to survey entire populations. This feat was tedious, and the research work suffered accordingly. Current researchers work only with a small portion of the whole population (a sample) from which they draw inferences about the population from which the sample was drawn.

This inferential leap or generalization from samples to population, a feature of inductive or empirical research, can be full of pitfalls. In clinical medicine, it is not sufficient merely to describe a patient without assessing the underlying condition by a detailed history and clinical examination. The signs and symptoms are then interpreted against the total background of the patient's history and clinical examination including mental state examination. Similarly, in inferential statistics, it is not enough to just describe the results in the sample. One has to critically appraise the real worth or representativeness of that particular sample. The following discussion endeavors to explain the inputs required for making a correct inference from a sample to the target population.

TARGET POPULATION

Any inferences from a sample refer only to the defined population from which the sample has been properly selected. We may call this the target population. For example, if in a sample of lawyers from Delhi High Court it is found that 5% are having alcohol dependence syndrome, can we say that 5% of all lawyers all over the world are alcoholics? Obviously not, as the lawyers of Delhi High Court may be an institution by themselves and may not represent the global lawyers′ community. The findings of this study, therefore, apply only to Delhi High Court lawyers from which a representative sample was taken. Of course, this finding may nevertheless be interesting, but only as a pointer to further research. The data on lawyers in a particular city tell us nothing about lawyers in other cities or countries.

POPULATIONS IN INFERENTIAL STATISTICS

In statistics, a population is an entire group about which some information is required to be ascertained. A statistical population need not consist only of people. We can have population of heights, weights, BMIs, hemoglobin levels, events, outcomes, so long as the population is well defined with explicit inclusion and exclusion criteria. In selecting a population for study, the research question or purpose of the study will suggest a suitable definition of the population to be studied, in terms of location and restriction to a particular age group, sex or occupation. The population must be fully defined so that those to be included and excluded are clearly spelt out (inclusion and exclusion criteria). For example, if we say that our study populations are all lawyers in Delhi, we should state whether those lawyers are included who have retired, are working part-time, or non-practicing, or those who have left the city but still registered at Delhi.

Use of the word population in epidemiological research does not correspond always with its demographic meaning of an entire group of people living within certain geographic or political boundaries. A population for a research study may comprise groups of people defined in many different ways, for example, coal mine workers in Dhanbad, children exposed to German measles during intrauterine life, or pilgrims traveling to Kumbh Mela at Allahabad.

GENERALIZATION (INFERENCES) FROM A POPULATION

When generalizing from observations made on a sample to a larger population, certain issues will dictate judgment. For example, generalizing from observations made on the mental health status of a sample of lawyers in Delhi to the mental health status of all lawyers in Delhi is a formalized procedure, in so far as the errors (sampling or random) which this may hazard can, to some extent, be calculated in advance. However, if we attempt to generalize further, for instance, about the mental statuses of all lawyers in the country as a whole, we hazard further pitfalls which cannot be specified in advance. We do not know to what extent the study sample and population of Delhi is typical of the larger population – that of the whole country – to which it belongs.

The dilemmas in defining populations differ for descriptive and analytic studies.

POPULATION IN DESCRIPTIVE STUDIES

In descriptive studies, it is customary to define a study population and then make observations on a sample taken from it. Study populations may be defined by geographic location, age, sex, with additional definitions of attributes and variables such as occupation, religion and ethnic group.[ 1 ]

Geographic location

In field studies, it may be desirable to use a population defined by an administrative boundary such as a district or a state. This may facilitate the co-operation of the local administrative authorities and the study participants. Moreover, basic demographic data on the population such as population size, age, gender distribution (needed for calculating age- and sex-specific rates) available from census data or voters’ list are easier to obtain from administrative headquarters. However, administrative boundaries do not always consist of homogenous group of people. Since it is desirable that a modest descriptive study does not cover a number of different groups of people, with widely differing ways of life or customs, it may be necessary to restrict the study to a particular ethnic group, and thus ensure better genetic or cultural homogeneity. Alternatively, a population may be defined in relation to a prominent geographic feature, such as a river, or mountain, which imposes a certain uniformity of ways of life, attitudes, and behavior upon the people who live in the vicinity.

If cases of a disease are being ascertained through their attendance at a hospital outpatient department (OPD), rather than by field surveys in the community, it will be necessary to define the population according to the so-called catchment area of the hospital OPD. For administrative purposes, a dispensary, health center or hospital is usually considered to serve a population within a defined geographic area. But these catchment areas may only represent in a crude manner with the actual use of medical facilities by the local people. For example, in OPD study of psychiatric illnesses in a particular hospital with a defined catchment area, many people with psychiatric illnesses may not visit the particular OPD and may seek treatment from traditional healers or religious leaders.

Catchment areas depend on the demography of the area and the accessibility of the health center or hospital. Accessibility has three dimensions – physical, economic and social.[ 2 ] Physical accessibility is the time required to travel to the health center or medical facility. It depends on the topography of the area (e.g. hill and tribal areas with poor roads have problems of physical accessibility). Economic accessibility is the paying capacity of the people for services. Poverty may limit health seeking behavior if the person cannot afford the bus fare to the health center even if the health services may be free of charge. It may also involve absence from work which, for daily wage earners, is a major economic disincentive. Social factors such as caste, culture, language, etc. may adversely affect accessibility to health facility if the treating physician is not conversant with the local language and customs. In such situations, the patient may feel more comfortable with traditional healers.

Ascertainment of a particular disease within a particular area may be incomplete either because some patient may seek treatment elsewhere or some patients do not seek treatment at all. Focus group discussions (qualitative study) with local people, especially those residing away from the health center, may give an indication whether serious underreporting is occurring.

When it is impossible to relate cases of a disease to a population, perhaps because the cases were ascertained through a hospital with an undefined catchment area, proportional morbidity rates may be used. These rates have been widely used in cancer epidemiology where the number of cases of one form of cancer is expressed as a proportion of the number of cases of all forms of cancer among patients attending the same hospital during the same period.

POPULATIONS IN ANALYTIC STUDIES

Case control studies.

As opposed to descriptive studies where a study population is defined and then observations are made on a representative sample from it, in case control studies observations are made on a group of patients. This is known as the study group , which usually is not selected by sampling of a defined larger group. For instance, a study on patients of bipolar disorder may include every patient with this disorder attending the psychiatry OPD during the study period. One should not forget, however, that in this situation also, there is a hypothetical population consisting of all patients with bipolar disorder in the universe (which may be a certain region, a country or globally depending on the extent of the generalization intended from the findings of the study). Case control studies are often carried out in hospital settings because this is more convenient and accessible group than cases in the community at large. However, the two groups of cases may differ in many respects. At the outset of the study, it should be deliberated whether these differences would affect the external validity (generalization) of the study. Usually, analytic studies are not carried out in groups containing atypical cases of the disorder, unless there is a special indication to do so.

Populations in cohort studies

Basically, cohort studies compare two groups of people (cohorts) and demonstrate whether or not there are more cases of the disease among the cohort exposed to the suspected cause than among the cohort not exposed. To determine whether an association exists between positive family history of schizophrenia and subsequent schizophrenia in persons having such a history, two cohorts would be required: first, the exposed group, that is, people with a family history of mental disorders (the suspected cause) and second, the unexposed group, that is, people without a family history of mental disorders. These two cohorts would need to be followed up for a number of years and cases of schizophrenia in either group would be recorded. If a positive family history is associated with development of schizophrenia, then more cases would occur in the first group than in the second group.

The crucial challenges in a cohort study are that it should include participants exposed to a particular cause being investigated and that it should consist of persons who can be followed up for the period of time between exposure (cause) and development of the disorder. It is vital that the follow-up of a cohort should be complete as far as possible. If more than a small proportion of persons in the cohort cannot be traced (loss to follow-up or attrition), the findings will be biased , in case these persons differ significantly from those remaining in the study.

Depending on the type of exposure being studied, there may or may not be a range of choice of cohort populations exposed to it who may form a larger population from which one has to select a study sample. For instance, if one is exploring association between occupational hazard such as job stress in health care workers in intensive care units (ICUs) and subsequent development of drug addiction, one has to, by the very nature of the research question, select health care workers working in ICUs. On the other hand, cause effect study for association between head injury and epilepsy offers a much wider range of possible cohorts.

Difficulties in making repeated observations on cohorts depend on the length of time of the study. In correlating maternal factors (pregnancy cohort) with birth weight, the period of observation is limited to 9 months. However, if in a study it is tried to find the association between maternal nutrition during pregnancy and subsequent school performance of the child, the study will extend to years. For such long duration investigations, it is wise to select study cohorts that are firstly, not likely to migrate, cooperative and likely to be so throughout the duration of the study, and most importantly, easily accessible to the investigator so that the expense and efforts are kept within reasonable limits. Occupational groups such as the armed forces, railways, police, and industrial workers are ideal for cohort studies. Future developments facilitating record linkage such as the Unique Identification Number Scheme may give a boost to cohort studies in the wider community.

A sample is any part of the fully defined population. A syringe full of blood drawn from the vein of a patient is a sample of all the blood in the patient's circulation at the moment. Similarly, 100 patients of schizophrenia in a clinical study is a sample of the population of schizophrenics, provided the sample is properly chosen and the inclusion and exclusion criteria are well defined.

To make accurate inferences, the sample has to be representative. A representative sample is one in which each and every member of the population has an equal and mutually exclusive chance of being selected.

Sample size

Inputs required for sample size calculation have been dealt from a clinical researcher's perspective avoiding the use of intimidating formulae and statistical jargon in an earlier issue of the journal.[ 1 ]

Target population, study population and study sample

A population is a complete set of people with a specialized set of characteristics, and a sample is a subset of the population. The usual criteria we use in defining population are geographic, for example, “the population of Uttar Pradesh”. In medical research, the criteria for population may be clinical, demographic and time related.

  • Clinical and demographic characteristics define the target population, the large set of people in the world to which the results of the study will be generalized (e.g. all schizophrenics).
  • The study population is the subset of the target population available for study (e.g. schizophrenics in the researcher's town).
  • The study sample is the sample chosen from the study population.

METHODS OF SAMPLING

Purposive (non-random samples).

  • Volunteers who agree to participate
  • Snowball sample, where one case identifies others of his kind (e.g. intravenous drug users)
  • Convenient sample such as captive medical students or other readily available groups
  • Quota sampling, at will selection of a fixed number from each group
  • Referred cases who may be under pressure to participate
  • Haphazard with combination of the above methods

Non-random samples have certain limitations. The larger group (target population) is difficult to identify. This may not be a limitation when generalization of results is not intended. The results would be valid for the sample itself (internal validity). They can, nevertheless, provide important clues for further studies based on random samples. Another limitation of non-random samples is that statistical inferences such as confidence intervals and tests of significance cannot be estimated from non-random samples. However, in some situations, the investigator has to make crucial judgments. One should remember that random samples are the means but representativeness is the goal. When non-random samples are representative (compare the socio-demographic characteristics of the sample subjects with the target population), generalization may be possible.

Random sampling methods

Simple random sampling.

A sample may be defined as random if every individual in the population being sampled has an equal likelihood of being included. Random sampling is the basis of all good sampling techniques and disallows any method of selection based on volunteering or the choice of groups of people known to be cooperative.[ 3 ]

In order to select a simple random sample from a population, it is first necessary to identify all individuals from whom the selection will be made. This is the sampling frame. In developing countries, listings of all persons living in an area are not usually available. Census may not catch nomadic population groups. Voters’ and taxpayers’ lists may be incomplete. Whether or not such deficiencies are major barriers in random sampling depends on the particular research question being investigated. To undertake a separate exercise of listing the population for the study may be time consuming and tedious. Two-stage sampling may make the task feasible.

The usual method of selecting a simple random sample from a listing of individuals is to assign a number to each individual and then select certain numbers by reference to random number tables which are published in standard statistical textbooks. Random number can also be generated by statistical software such as EPI INFO developed by WHO and CDC Atlanta.

Systematic sampling

A simple method of random sampling is to select a systematic sample in which every n th person is selected from a list or from other ordering. A systematic sample can be drawn from a queue of people or from patients ordered according to the time of their attendance at a clinic. Thus, a sample can be drawn without an initial listing of all the subjects. Because of this feasibility, a systematic sample may have some advantage over a simple random sample.

To fulfill the statistical criteria for a random sample, a systematic sample should be drawn from subjects who are randomly ordered. The starting point for selection should be randomly chosen. If every fifth person from a register is being chosen, then a random procedure must be used to determine whether the first, second, third, fourth, or fifth person should be chosen as the first member of the sample.

Multistage sampling

Sometimes, a strictly random sample may be difficult to obtain and it may be more feasible to draw the required number of subjects in a series of stages. For example, suppose we wish to estimate the number of CATSCAN examinations made of all patients entering a hospital in a given month in the state of Maharashtra. It would be quite tedious to devise a scheme which would allow the total population of patients to be directly sampled. However, it would be easier to list the districts of the state of Maharashtra and randomly draw a sample of these districts. Within this sample of districts, all the hospitals would then be listed by name, and a random sample of these can be drawn. Within each of these hospitals, a sample of the patients entering in the given month could be chosen randomly for observation and recording. Thus, by stages, we draw the required sample. If indicated, we can introduce some element of stratification at some stage (urban/rural, gender, age).

It should be cautioned that multistage sampling should only be resorted to when difficulties in simple random sampling are insurmountable. Those who take a simple random sample of 12 hospitals, and within each of these hospitals select a random sample of 10 patients, may believe they have selected 120 patients randomly from all the 12 hospitals. In statistical sense, they have in fact selected a sample of 12 rather than 120.[ 4 ]

Stratified sampling

If a condition is unevenly distributed in a population with respect to age, gender, or some other variable, it may be prudent to choose a stratified random sampling method. For example, to obtain a stratified random sample according to age, the study population can be divided into age groups such as 0–5, 6–10, 11–14, 15–20, 21–25, and so on, depending on the requirement. A different proportion of each group can then be selected as a subsample either by simple random sampling or systematic sampling. If the condition decreases with advancing age, then to include adequate number in the older age groups, one may select more numbers in older subsamples.

Cluster sampling

In many surveys, studies may be carried out on large populations which may be geographically quite dispersed. To obtain the required number of subjects for the study by a simple random sample method will require large costs and will be cumbersome. In such cases, clusters may be identified (e.g. households) and random samples of clusters will be included in the study; then, every member of the cluster will also be part of the study. This introduces two types of variations in the data – between clusters and within clusters – and this will have to be taken into account when analyzing data.

Cluster sampling may produce misleading results when the disease under study itself is distributed in a clustered fashion in an area. For example, suppose we are studying malaria in a population. Malaria incidence may be clustered in villages having stagnant water collections which may serve as a source of mosquito breeding. In villages without such water stagnation, there will be lesser malaria cases. The choice of few villages in cluster sampling may give erroneous results. The selection of villages as a cluster may be quite unrepresentative of the whole population by chance.[ 5 ]

Lot quality assurance sampling

Lot quality assurance sampling (LQAS), which originated in the manufacturing industry for quality control purposes, was used in the nineties to assess immunization coverage, estimate disease prevalence, and evaluate control measures and service coverage in different health programs.[ 6 ] Using only a small sample size, LQAS can effectively differentiate between areas that have or have not met the performance targets. Thus, this method is used not only to estimate the coverage of quality care but also to identify the exact subdivisions where it is deficient so that appropriate remedial measures can be implemented.

The choice of sampling methods is usually dictated by feasibility in terms of time and resources. Field research is quite messy and difficult like actual battle. It may be sometimes difficult to get a sample which is truly random. Most samples therefore tend to get biased. To estimate the magnitude of this bias, the researcher should have some idea about the population from which the sample is drawn. In conclusion, the following quote cited by Bradford Hill[ 4 ] elegantly sums up the benefit of random sampling:

…The actual practice of medicine is virtually confined to those members of the population who either are ill, or think they are ill, or are thought by somebody to be ill, and these so amply fill up the working day that in the course of time one comes unconsciously to believe that they are typical of the whole. This is not the case. The use of a random sample brings to light the individuals who are ill and know they are ill but have no intention of doing anything about it, as well as those who have never been ill, and probably never will be until their final illness. These would have been inaccessible to any other method of approach but that of the random sample… . J. H. Sheldon

Source of Support: Nil.

Conflict of Interest: None declared.

Guide to population vs. sample in research

Last updated

29 May 2023

Reviewed by

Miroslav Damyanov

Population data consists of information collected from every individual in a particular population. Meanwhile, sample data consists of information taken from a subset—or sample —of the population.

In this guide, we’ll discuss the differences between population and sample data, the advantages and disadvantages of each, how to collect data from a sample and a population, and common sampling techniques . By the end, you'll have a better understanding of the differences between population and sample data and when to use them.

Make research less tedious

Dovetail streamlines research to help you uncover and share actionable insights

  • What is "population" in research?

Population data is the total number of measurements taken from every individual within a group. For example, if you were measuring the heights of all humans on Earth, you’d include all 7 billion people in your population data set. 

When analyzing population data, researchers use statistics such as the population mean, median, and standard deviation. 

Types of populations

Finite population.

A finite population is a population in which all the members are known and can be counted. Examples of this type of population include all the employees of a company, all the students in a school, or the entire population of a city. When working with a finite population, you can calculate the exact population mean, median, and standard deviation.

Infinite population

An infinite population is a population that is too large to be measured or counted. This could be the entire human population on Earth or the number of stars in the sky. Because it’s impossible to measure or count these populations, it isn’t possible to calculate their exact mean, median, and standard deviation.

Closed population

A closed population is one in which you allow no new members to join. An example of a closed population would be a country's citizens over the age of 18 who have been living there for more than 10 years. As no new members can join, the population remains constant and can easily be measured and analyzed.

Open population

An open population is one in which new members can join. For example, all people living in a certain city are considered an open population because new members can move into the city and become part of the population. This type of population is constantly changing, so it isn’t possible to measure and analyze its exact characteristics.

Advantages of population data

Representative.

It offers a complete representation of all elements in the population, which can increase the generalizability of findings.

High quality

Population data is usually very accurate and detailed because standardized data collection methods and quality control measures are in place to provide data from every element in the population.

Large sample size

The sample size is large, which can increase the statistical power of a study and help detect small but meaningful differences. 

Can address rare events

You can use population data to study rare events or diseases that wouldn’t be feasible to study through other methods.

Allows for subgroup analysis

You can use population data to examine subgroups of the population, which can help identify disparities and inform interventions. 

Disadvantages of population data

Time and cost constraints.

Collecting data from a large population is expensive and time-consuming, especially when it comes to data cleaning and preparation before using it for analysis.

Limited access

Depending on the source of population data, it can be difficult to get access to the population or convince people to participate, especially when there are privacy concerns or restrictions on the use of data.

Limited variables

Population data may have limited variables or lack information on important factors, which may not allow one to answer a particular research question if the data wasn’t originally collected for that purpose.

Difficult to analyze

Population data can be large, complex, and contain a wide variety of data or even missing data which demands advanced analytical skills and high computational requirements. 

Outdated information

Population data may become outdated, especially if it was collected some time ago, which can limit its relevance to current research questions. 

  • What is a sample in research?

Sampling is the process of selecting individuals from a larger population and is used to generate representative information about the population of interest. There are two forms of sampling: non-probability. 

Probability sampling is from a randomly selected small subset and provides statistical inferences about the whole population without bias. Non-probability sampling collects data from a selected subset chosen for its convenience or, sometimes, to control and manipulate the data collected.

Types of probability sampling

Random sampling.

This type of sampling is completely by chance. Each member of the population has an equal chance of being selected for the sample, and the results of a random sample will be statistically representative of the whole population. 

For example, if you wanted to know how people felt about a new product, you could use a random number generator to select members from a population for the study.

Stratified sampling

Stratified sampling is when the population is split into different subgroups, or strata, based on one or more characteristics. The researcher then randomly selects members from each stratum to represent the population. This allows the researcher to accurately compare data between different groups because it ensures that all subgroups are represented in the sample. 

For example, if you wanted to measure the opinion of people in different age groups, you could divide your population into groups based on age and then take random samples from each stratum.

Cluster sampling

This type of sampling divides the population into clusters or groups and then further takes a sample from each cluster. This method is often used when it isn’t possible to access the entire population. 

For example, if you wanted to measure public opinion on an issue in a large city, it wouldn’t be feasible to survey every single person. Instead, you could divide the city into neighborhoods and take random samples from each one.

Systematic sample

Systematic sampling involves selecting items from a population based on a set pattern or system. This type of sampling is useful when it’s impossible or impractical to create a list of all items in a population. It’s similar to random sampling in that it helps eliminate any bias from the selection process, but it’s more efficient because it requires fewer samples to be taken. 

If a researcher can only select 10 members from a population of 200 people, they could use systematic sampling by selecting every 20th person in the list to eliminate bias.

Types of non-probability sampling

Convenience sampling.

This form of sampling involves selecting participants based on availability and willingness to take part. This can lead to volunteer bias, meaning that individuals who are more motivated or have more time may be more likely to participate.

Quota sampling

A method of selecting participants from a larger population to match certain criteria is referred to as quota sampling. For example, market researchers might use quota sampling to select a certain number of individuals within specific age groups.

Judgemental sampling

This technique is also referred to as purposive sampling or authoritative sampling. You can use it to target specific individuals who possess a certain set of qualities like age, ethnicity, or religious beliefs. It can help researchers access important information from people with specific knowledge or experience. 

However, this kind of sampling can also lead to selection bias, which is the distortion of results due to the non-random selection of participants.

Snowball sampling

Snowball sampling is often used to reach individuals who may be difficult to access through traditional means. This type of sampling involves asking participants to refer others who fit the same criteria. It’s often used in social sciences research to identify people within a certain community or social group. For example, researchers may conduct a survey offering a reward to participants who refer their close friends or family and get them to participate.  

While this technique can be useful in reaching underserved or underrepresented populations, it also carries the risk of selection bias.

Advantages of sample data

Cost-effective.

Collecting data from a sample is typically less expensive and time-consuming than collecting data from an entire population.  

Higher quality

Collecting data from a smaller subset of a population can often result in higher-quality data when more resources are dedicated to ensuring the accuracy and completeness of the data. 

Feasibility

In some cases, it may be impossible or impractical to collect data from an entire population, making sample data a more feasible option. 

Sample data is usually smaller and more manageable than population data, which makes it easier to analyze. 

Reduced sampling bias

With appropriate sampling methods, sample data can be representative of the large population and provide valuable insights for research. 

Disadvantages of using sample data

Generalizability.

The quality of the data depends on the quality of the sample selection process. If the sample isn’t representative of the population, it leads to skewed results.

Sampling bias

A sample may not provide a complete picture of an entire population when certain groups are overrepresented or underrepresented in the sample.  

Sampling error

Because sample data is drawn from a subset of a larger population, there is always a risk of sampling error . It occurs when the sample doesn’t accurately represent the larger population, which can lead to inaccurate results.

Statistical power

A small sample size can limit the statistical power of the data analysis, making it more difficult to detect meaningful differences or relationships between studied variables. 

Limited score

Sample data may be limited in scope and may not capture the full range of variables present in an entire population. This can limit the depth and breadth of the findings.

  • Differences between population and sample

When discussing research and data analysis, it’s important to understand the differences between population and sample data. Here are some key points to consider when distinguishing between the two: 

Population vs. sample

A population is a set of all individuals or objects that share a common characteristic, while a sample is a subset of that population used to draw conclusions about the entire population. 

For example, if you wanted to research the opinions of all people living in the United States, the population would be all citizens in the US, while the sample would be a smaller subset of people surveyed to represent the opinion of the entire population.

Sample vs. population mean

The sample mean is an average of a sample's values, while the population mean is an average of all values in a population. For example, if you’re researching the average income of households in America, the sample mean would be an average of incomes from a smaller group of households selected from the population of all households in the US.

Sample vs. population standard deviation

Standard deviation measures the variation of a set of values from their mean. The sample standard deviation is based on the variation within a sample, while the population standard deviation is based on the variation within a population. 

For example, if you were researching the variation in test scores for students at a particular school, the sample standard deviation would be based on the scores of a smaller subset of students from the school, while the population standard deviation would be based on all scores from every student at the school.

  • How to collect and use data from a sample

1. Choose the right sampling technique

The most common sampling techniques include random, stratified, convenience, and cluster sampling . Selecting the right technique for your research will depend on your specific needs, resources, goals, and objectives.

2. Decide the sample size

Determining the sample size will vary depending on the goal of your research. Generally speaking, the larger the sample size, the more reliable your results will be. However, there are tradeoffs, such as the cost and resources required to collect data from larger samples.

3. Design an instrument for collecting data

Once you've chosen your sampling technique and decided on the sample size, you'll need to design an instrument for collecting data. This could include surveys , interviews, or experiments. Make sure that the instrument is valid and reliable so that it provides accurate results.

4. Determine a sample frame 

Decide who you’ll include in the sample by selecting the population or subpopulation you want to study. Consider factors like location, age, gender, behavior, and so on when choosing your sample frame.

5. Execute the sample selection process

In this step, you'll select individuals to form your sample. To ensure accuracy, it’s best to use random sampling techniques to guarantee a representative sample.

6. Collect data from a sample

Once you’ve selected the sample, you can begin collecting data. Depending on the method you chose (e.g., survey, interview, experiment), you may need to do some additional steps before you can begin collecting data:

For example, if you’re collecting data through a survey, you may need to obtain permission to conduct the survey from relevant authorities, such as a workplace or community group.

If you plan to conduct interviews as your data collection method, ensure your questions are well-formed and that your interviewees are comfortable answering them. Before the interview, you may also want to send a pre-interview questionnaire to participants to collect basic information to make the interview process more efficient.

Most experiments require a significant amount of planning and preparation to ensure that data is collected in a controlled and systematic manner. Additionally, you may need to consider the ethical implications of conducting the experiment, such as obtaining informed consent from participants and ensuring their safety throughout the experiment.

7. Analyze the data

After you've collected data from the sample, analyze it to find meaningful patterns and trends that you can use to draw conclusions about the population. Remember, since you're working with a sample, your conclusions may not apply to the entire population. 

By following these steps, you can easily collect data from a sample to gain insights about a population without having to analyze all of the data from the population itself. When used correctly, sample data can provide valuable insights that can help shape your research conclusions.

  • How to collect and use data from a population

1. Define the population

Before collecting data from a population, it’s important to first clearly define what population you’re looking to collect data from. This definition should be as specific as possible and include any relevant behavioral characteristics (e.g., shopping frequency, product use, or commute options) or demographic characteristics (e.g., age, gender, and geography).

2. Create a comprehensive list

After identifying the population in terms of traits, past experiences, outlooks, or other components, create a comprehensive list of the population you’ll be studying. Depending on the purpose of the study, this could include both people and organizations.

3. Contact population and collect data

Once you’ve defined the population and chosen your sampling method, it’s time to collect data. You can obtain this data by conducting experiments, surveys, or interviews. Make sure to collect feedback from every person or entity on the population list to generate an exhaustive population sample.

4. Analyze the data

After collecting the data, it’s important to analyze it to draw meaningful conclusions about the population. This analysis should include calculating the sample mean and sample standard deviation for the data set, as well as comparing these values to the population mean and population standard deviation.

5. Draw conclusions

Once you’ve analyzed the data, use the results to draw conclusions about the population. Make sure to be as accurate and objective as possible when making claims about the population.

  • Choosing high-quality samples

High-quality samples are essential when it comes to research. A high-quality sample will produce accurate and reliable study results. A poor-quality sample can result in incorrect or inexact data. These results can be costly and time-consuming to fix. 

A good-quality sample is representative of the population. That means the sample has similar characteristics as the population in terms of age, gender, race, and other factors. The sample should also be randomly selected so as not to bias the results. In addition, the sample should be of a large enough size to be statistically significant .

How to select a high-quality sample

Choose a probability sampling method.

Random selection is the most important part of choosing a high-quality sample. You want to ensure that the sample truly represents the population and that no bias has been introduced. You can do this through methods such as random sampling, stratified sampling, cluster sampling, and systematic sampling. 

Monitor selection process

You should monitor the selection process to ensure that no bias has been introduced during the selection process. You should also make sure that the sample size is large enough to be statistically significant. 

Test for accuracy

You should test the accuracy of your sample by comparing it to the population data. Compare the sample mean vs. population mean, sample vs. population standard deviation, and other factors. If there are any discrepancies between the two, then the sample may not be representative of the population and should be re-evaluated.

By following these steps, you can ensure that your sample is quality and that it correctly reflects the population and produces precise and accurate results.

Using sample and population data can be beneficial in many ways. For example, using sample data allows researchers to make more efficient use of resources while still being able to conclude the population. Additionally, sample data is useful in making statistical inferences about a population, such as the mean or standard deviation. 

On the other hand, population data provides an accurate representation of the whole population, which can be beneficial when researchers need detailed information. 

To ensure accurate and representative data, researchers must understand the differences between populations and weigh the advantages and risks of each sampling technique. By understanding the difference between population and sample data, researchers can gain valuable insights about their target group and use these insights to make informed decisions.

Get started today

Go from raw data to valuable insights with a flexible research platform

Editor’s picks

Last updated: 21 December 2023

Last updated: 16 December 2023

Last updated: 6 October 2023

Last updated: 5 March 2024

Last updated: 25 November 2023

Last updated: 15 February 2024

Last updated: 11 March 2024

Last updated: 12 December 2023

Last updated: 6 March 2024

Last updated: 10 April 2023

Last updated: 20 December 2023

Latest articles

Related topics, log in or sign up.

Get started for free

Introduction to Research Methods

7 samples and populations.

So you’ve developed your research question, figured out how you’re going to measure whatever you want to study, and have your survey or interviews ready to go. Now all your need is other people to become your data.

You might say ‘easy!’, there’s people all around you. You have a big family tree and surely them and their friends would have happy to take your survey. And then there’s your friends and people you’re in class with. Finding people is way easier than writing the interview questions or developing the survey. That reaction might be a strawman, maybe you’ve come to the conclusion none of this is easy. For your data to be valuable, you not only have to ask the right questions, you have to ask the right people. The “right people” aren’t the best or the smartest people, the right people are driven by what your study is trying to answer and the method you’re using to answer it.

Remember way back in chapter 2 when we looked at this chart and discussed the differences between qualitative and quantitative data.

One of the biggest differences between quantitative and qualitative data was whether we wanted to be able to explain something for a lot of people (what percentage of residents in Oklahoma support legalizing marijuana?) versus explaining the reasons for those opinions (why do some people support legalizing marijuana and others not?). The underlying differences there is whether our goal is explain something about everyone, or whether we’re content to explain it about just our respondents.

‘Everyone’ is called the population . The population in research is whatever group the research is trying to answer questions about. The population could be everyone on planet Earth, everyone in the United States, everyone in rural counties of Iowa, everyone at your university, and on and on. It is simply everyone within the unit you are intending to study.

In order to study the population, we typically take a sample or a subset. A sample is simply a smaller number of people from the population that are studied, which we can use to then understand the characteristics of the population based on that subset. That’s why a poll of 1300 likely voters can be used to guess at who will win your states Governor race. It isn’t perfect, and we’ll talk about the math behind all of it in a later chapter, but for now we’ll just focus on the different types of samples you might use to study a population with a survey.

If correctly sampled, we can use the sample to generalize information we get to the population. Generalizability , which we defined earlier, means we can assume the responses of people to our study match the responses everyone would have given us. We can only do that if the sample is representative of the population, meaning that they are alike on important characteristics such as race, gender, age, education. If something makes a large difference in people’s views on a topic in your research and your sample is not balanced, you’ll get inaccurate results.

Generalizability is more of a concern with surveys than with interviews. The goal of a survey is to explain something about people beyond the sample you get responses from. You’ll never see a news headline saying that “53% of 1250 Americans that responded to a poll approve of the President”. It’s only worth asking those 1250 people if we can assume the rest of the United States feels the same way overall. With interviews though we’re looking for depth from their responses, and so we are less hopefully that the 15 people we talk to will exactly match the American population. That doesn’t mean the data we collect from interviews doesn’t have value, it just has different uses.

There are two broad types of samples, with several different techniques clustered below those. Probability sampling is associated with surveys, and non-probability sampling is often used when conducting interviews. We’ll first describe probability samples, before discussing the non-probability options.

The type of sampling you’ll use will be based on the type of research you’re intending to do. There’s no sample that’s right or wrong, they can just be more or less appropriate for the question you’re trying to answer. And if you use a less appropriate sampling strategy, the answer you get through your research is less likely to be accurate.

7.1 Types of Probability Samples

So we just hinted at the idea that depending on the sample you use, you can generalize the data you collect from the sample to the population. That will depend though on whether your sample represents the population. To ensure that your sample is representative of the population, you will want to use a probability sample. A representative sample refers to whether the characteristics (race, age, income, education, etc) of the sample are the same as the population. Probability sampling is a sampling technique in which every individual in the population has an equal chance of being selected as a subject for the research.

There are several different types of probability samples you can use, depending on the resources you have available.

Let’s start with a simple random sample . In order to use a simple random sample all you have to do is take everyone in your population, throw them in a hat (not literally, you can just throw their names in a hat), and choose the number of names you want to use for your sample. By drawing blindly, you can eliminate human bias in constructing the sample and your sample should represent the population from which it is being taken.

However, a simple random sample isn’t quite that easy to build. The biggest issue is that you have to know who everyone is in order to randomly select them. What that requires is a sampling frame , a list of all residents in the population. But we don’t always have that. There is no list of residents of New York City (or any other city). Organizations that do have such a list wont just give it away. Try to ask your university for a list and contact information of everyone at your school so you can do a survey? They wont give it to you, for privacy reasons. It’s actually harder to think of popultions you could easily develop a sample frame for than those you can’t. If you can get or build a sampling frame, the work of a simple random sample is fairly simple, but that’s the biggest challenge.

Most of the time a true sampling frame is impossible to acquire, so researcher have to settle for something approximating a complete list. Earlier generations of researchers could use the random dial method to contact a random sample of Americans, because every household had a single phone. To use it you just pick up the phone and dial random numbers. Assuming the numbers are actually random, anyone might be called. That method actually worked somewhat well, until people stopped having home phone numbers and eventually stopped answering the phone. It’s a fun mental exercise to think about how you would go about creating a sampling frame for different groups though; think through where you would look to find a list of everyone in these groups:

Plumbers Recent first-time fathers Members of gyms

The best way to get an actual sampling frame is likely to purchase one from a private company that buys data on people from all the different websites we use.

Let’s say you do have a sampling frame though. For instance, you might be hired to do a survey of members of the Republican Party in the state of Utah to understand their political priorities this year, and the organization could give you a list of their members because they’ve hired you to do the reserach. One method of constructing a simple random sample would be to assign each name on the list a number, and then produce a list of random numbers. Once you’ve matched the random numbers to the list, you’ve got your sample. See the example using the list of 20 names below

population in a research example

and the list of 5 random numbers.

population in a research example

Systematic sampling is similar to simple random sampling in that it begins with a list of the population, but instead of choosing random numbers one would select every kth name on the list. What the heck is a kth? K just refers to how far apart the names are on the list you’re selecting. So if you want to sample one-tenth of the population, you’d select every tenth name. In order to know the k for your study you need to know your sample size (say 1000) and the size of the population (75000). You can divide the size of the population by the sample (75000/1000), which will produce your k (750). As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method, but its only advantage over the random sampling technique is simplicity. If we used the same list as above and wanted to survey 1/5th of the population, we’d include 4 of the names on the list. It’s important with systematic samples to randomize the starting point in the list, otherwise people with A names will be oversampled. If we started with the 3rd name, we’d select Annabelle Frye, Cristobal Padilla, Jennie Vang, and Virginia Guzman, as shown below. So in order to use a systematic sample, we need three things, the population size (denoted as N ), the sample size we want ( n ) and k , which we calculate by dividing the population by the sample).

N= 20 (Population Size) n= 4 (Sample Size) k= 5 {20/4 (kth element) selection interval}

population in a research example

We can also use a stratified sample , but that requires knowing more about the population than just their names. A stratified sample divides the study population into relevant subgroups, and then draws a sample from each subgroup. Stratified sampling can be used if you’re very concerned about ensuring balance in the sample or there may be a problem of underrepresentation among certain groups when responses are received. Not everyone in your sample is equally likely to answer a survey. Say for instance we’re trying to predict who will win an election in a county with three cities. In city A there are 1 million college students, in city B there are 2 million families, and in City C there are 3 million retirees. You know that retirees are more likely than busy college students or parents to respond to a poll. So you break the sample into three parts, ensuring that you get 100 responses from City A, 200 from City B, and 300 from City C, so the three cities would match the population. A stratified sample provides the researcher control over the subgroups that are included in the sample, whereas simple random sampling does not guarantee that any one type of person will be included in the final sample. A disadvantage is that it is more complex to organize and analyze the results compared to simple random sampling.

Cluster sampling is an approach that begins by sampling groups (or clusters) of population elements and then selects elements from within those groups. A researcher would use cluster sampling if getting access to elements in an entrie population is too challenging. For instance, a study on students in schools would probably benefit from randomly selecting from all students at the 36 elementary schools in a fictional city. But getting contact information for all students would be very difficult. So the researcher might work with principals at several schools and survey those students. The researcher would need to ensure that the students surveyed at the schools are similar to students throughout the entire city, and greater access and participation within each cluster may make that possible.

The image below shows how this can work, although the example is oversimplified. Say we have 12 students that are in 6 classrooms. The school is in total 1/4th green (3/12), 1/4th yellow (3/12), and half blue (6/12). By selecting the right clusters from within the school our sample can be representative of the entire school, assuming these colors are the only significant difference between the students. In the real world, you’d want to match the clusters and population based on race, gender, age, income, etc. And I should point out that this is an overly simplified example. What if 5/12s of the school was yellow and 1/12th was green, how would I get the right proportions? I couldn’t, but you’d do the best you could. You still wouldn’t want 4 yellows in the sample, you’d just try to approximiate the population characteristics as best you can.

population in a research example

7.2 Actually Doing a Survey

All of that probably sounds pretty complicated. Identifying your population shouldn’t be too difficult, but how would you ever get a sampling frame? And then actually identifying who to include… It’s probably a bit overwhelming and makes doing a good survey sound impossible.

Researchers using surveys aren’t superhuman though. Often times, they use a little help. Because surveys are really valuable, and because researchers rely on them pretty often, there has been substantial growth in companies that can help to get one’s survey to its intended audience.

One popular resource is Amazon’s Mechanical Turk (more commonly known as MTurk). MTurk is at its most basic a website where workers look for jobs (called hits) to be listed by employers, and choose whether to do the task or not for a set reward. MTurk has grown over the last decade to be a common source of survey participants in the social sciences, in part because hiring workers costs very little (you can get some surveys completed for penny’s). That means you can get your survey completed with a small grant ($1-2k at the low end) and get the data back in a few hours. Really, it’s a quick and easy way to run a survey.

However, the workers aren’t perfectly representative of the average American. For instance, researchers have found that MTurk respondents are younger, better educated, and earn less than the average American.

One way to get around that issue, which can be used with MTurk or any survey, is to weight the responses. Because with MTurk you’ll get fewer responses from older, less educated, and richer Americans, those responses you do give you want to count for more to make your sample more representative of the population. Oversimplified example incoming!

Imagine you’re setting up a pizza party for your class. There are 9 people in your class, 4 men and 5 women. You only got 4 responses from the men, and 3 from the women. All 4 men wanted peperoni pizza, while the 3 women want a combination. Pepperoni wins right, 4 to 3? Not if you assume that the people that didn’t respond are the same as the ones that did. If you weight the responses to match the population (the full class of 9), a combination pizza is the winner.

population in a research example

Because you know the population of women is 5, you can weight the 3 responses from women by 5/3 = 1.6667. If we weight (or multiply) each vote we did receive from a woman by 1.6667, each vote for a combination now equals 1.6667, meaning that the 3 votes for combination total 5. Because we received a vote from every man in the class, we just weight their votes by 1. The big assumption we have to make is that the people we didn’t hear from (the 2 women that didn’t vote) are similar to the ones we did hear from. And if we don’t get any responses from a group we don’t have anything to infer their preferences or views from.

Let’s go through a slightly more complex example, still just considering one quality about people in the class. Let’s say your class actually has 100 students, but you only received votes from 50. And, what type of pizza people voted for is mixed, but men still prefer peperoni overall, and women still prefer combination. The class is 60% female and 40% male.

We received 21 votes from women out of the 60, so we can weight their responses by 60/21 to represent the population. We got 29 votes out of the 40 for men, so their responses can be weighted by 40/29. See the math below.

population in a research example

53.8 votes for combination? That might seem a little odd, but weighting isn’t a perfect science. We can’t identify what a non-respondent would have said exactly, all we can do is use the responses of other similar people to make a good guess. That issue often comes up in polling, where pollsters have to guess who is going to vote in a given election in order to project who will win. And we can weight on any characteristic of a person we think will be important, alone or in combination. Modern polls weight on age, gender, voting habits, education, and more to make the results as generalizable as possible.

There’s an appendix later in this book where I walk through the actual steps of creating weights for a sample in R, if anyone actually does a survey. I intended this section to show that doing a good survey might be simpler than it seemed, but now it might sound even more difficult. A good lesson to take though is that there’s always another door to go through, another hurdle to improve your methods. Being good at research just means being constantly prepared to be given a new challenge, and being able to find another solution.

7.3 Non-Probability Sampling

Qualitative researchers’ main objective is to gain an in-depth understanding on the subject matter they are studying, rather than attempting to generalize results to the population. As such, non-probability sampling is more common because of the researchers desire to gain information not from random elements of the population, but rather from specific individuals.

Random selection is not used in nonprobability sampling. Instead, the personal judgment of the researcher determines who will be included in the sample. Typically, researchers may base their selection on availability, quotas, or other criteria. However, not all members of the population are given an equal chance to be included in the sample. This nonrandom approach results in not knowing whether the sample represents the entire population. Consequently, researchers are not able to make valid generalizations about the population.

As with probability sampling, there are several types of non-probability samples. Convenience sampling , also known as accidental or opportunity sampling, is a process of choosing a sample that is easily accessible and readily available to the researcher. Researchers tend to collect samples from convenient locations such as their place of employment, a location, school, or other close affiliation. Although this technique allows for quick and easy access to available participants, a large part of the population is excluded from the sample.

For example, researchers (particularly in psychology) often rely on research subjects that are at their universities. That is highly convenient, students are cheap to hire and readily available on campuses. However, it means the results of the study may have limited ability to predict motivations or behaviors of people that aren’t included in the sample, i.e., people outside the age of 18-22 that are going to college.

If I ask you to get find out whether people approve of the mayor or not, and tell you I want 500 people’s opinions, should you go stand in front of the local grocery store? That would be convinient, and the people coming will be random, right? Not really. If you stand outside a rural Piggly Wiggly or an urban Whole Foods, do you think you’ll see the same people? Probably not, people’s chracteristics make the more or less likely to be in those locations. This technique runs the high risk of over- or under-representation, biased results, as well as an inability to make generalizations about the larger population. As the name implies though, it is convenient.

Purposive sampling , also known as judgmental or selective sampling, refers to a method in which the researcher decides who will be selected for the sample based on who or what is relevant to the study’s purpose. The researcher must first identify a specific characteristic of the population that can best help answer the research question. Then, they can deliberately select a sample that meets that particular criterion. Typically, the sample is small with very specific experiences and perspectives. For instance, if I wanted to understand the experiences of prominent foreign-born politicians in the United States, I would purposefully build a sample of… prominent foreign-born politicians in the United States. That would exclude anyone that was born in the United States or and that wasn’t a politician, and I’d have to define what I meant by prominent. Purposive sampling is susceptible to errors in judgment by the researcher and selection bias due to a lack of random sampling, but when attempting to research small communities it can be effective.

When dealing with small and difficult to reach communities researchers sometimes use snowball samples , also known as chain referral sampling. Snowball sampling is a process in which the researcher selects an initial participant for the sample, then asks that participant to recruit or refer additional participants who have similar traits as them. The cycle continues until the needed sample size is obtained.

This technique is used when the study calls for participants who are hard to find because of a unique or rare quality or when a participant does not want to be found because they are part of a stigmatized group or behavior. Examples may include people with rare diseases, sex workers, or a child sex offenders. It would be impossible to find an accurate list of sex workers anywhere, and surveying the general population about whether that is their job will produce false responses as people will be unwilling to identify themselves. As such, a common method is to gain the trust of one individual within the community, who can then introduce you to others. It is important that the researcher builds rapport and gains trust so that participants can be comfortable contributing to the study, but that must also be balanced by mainting objectivity in the research.

Snowball sampling is a useful method for locating hard to reach populations but cannot guarantee a representative sample because each contact will be based upon your last. For instance, let’s say you’re studying illegal fight clubs in your state. Some fight clubs allow weapons in the fights, while others completely ban them; those two types of clubs never interreact because of their disagreement about whether weapons should be allowed, and there’s no overlap between them (no members in both type of club). If your initial contact is with a club that uses weapons, all of your subsequent contacts will be within that community and so you’ll never understand the differences. If you didn’t know there were two types of clubs when you started, you’ll never even know you’re only researching half of the community. As such, snowball sampling can be a necessary technique when there are no other options, but it does have limitations.

Quota Sampling is a process in which the researcher must first divide a population into mutually exclusive subgroups, similar to stratified sampling. Depending on what is relevant to the study, subgroups can be based on a known characteristic such as age, race, gender, etc. Secondly, the researcher must select a sample from each subgroup to fit their predefined quotas. Quota sampling is used for the same reason as stratified sampling, to ensure that your sample has representation of certain groups. For instance, let’s say that you’re studying sexual harassment in the workplace, and men are much more willing to discuss their experiences than women. You might choose to decide that half of your final sample will be women, and stop requesting interviews with men once you fill your quota. The core difference is that while stratified sampling chooses randomly from within the different groups, quota sampling does not. A quota sample can either be proportional or non-proportional . Proportional quota sampling refers to ensuring that the quotas in the sample match the population (if 35% of the company is female, 35% of the sample should be female). Non-proportional sampling allows you to select your own quota sizes. If you think the experiences of females with sexual harassment are more important to your research, you can include whatever percentage of females you desire.

7.4 Dangers in sampling

Now that we’ve described all the different ways that one could create a sample, we can talk more about the pitfalls of sampling. Ensuring a quality sample means asking yourself some basic questions:

  • Who is in the sample?
  • How were they sampled?
  • Why were they sampled?

A meal is often only as good as the ingredients you use, and your data will only be as good as the sample. If you collect data from the wrong people, you’ll get the wrong answer. You’ll still get an answer, it’ll just be inaccurate. And I want to reemphasize here wrong people just refers to inappropriate for your study. If I want to study bullying in middle schools, but I only talk to people that live in a retirement home, how accurate or relevant will the information I gather be? Sure, they might have grandchildren in middle school, and they may remember their experiences. But wouldn’t my information be more relevant if I talked to students in middle school, or perhaps a mix of teachers, parents, and students? I’ll get an answer from retirees, but it wont be the one I need. The sample has to be appropriate to the research question.

Is a bigger sample always better? Not necessarily. A larger sample can be useful, but a more representative one of the population is better. That was made painfully clear when the magazine Literary Digest ran a poll to predict who would win the 1936 presidential election between Alf Landon and incumbent Franklin Roosevelt. Literary Digest had run the poll since 1916, and had been correct in predicting the outcome every time. It was the largest poll ever, and they received responses for 2.27 million people. They essentially received responses from 1 percent of the American population, while many modern polls use only 1000 responses for a much more populous country. What did they predict? They showed that Alf Landon would be the overwhelming winner, yet when the election was held Roosevelt won every state except Maine and Vermont. It was one of the most decisive victories in Presidential history.

So what went wrong for the Literary Digest? Their poll was large (gigantic!), but it wasn’t representative of likely voters. They polled their own readership, which tended to be more educated and wealthy on average, along with people on a list of those with registered automobiles and telephone users (both of which tended to be owned by the wealthy at that time). Thus, the poll largely ignored the majority of Americans, who ended up voting for Roosevelt. The Literary Digest poll is famous for being wrong, but led to significant improvements in the science of polling to avoid similar mistakes in the future. Researchers have learned a lot in the century since that mistake, even if polling and surveys still aren’t (and can’t be) perfect.

What kind of sampling strategy did Literary Digest use? Convenience, they relied on lists they had available, rather than try to ensure every American was included on their list. A representative poll of 2 million people will give you more accurate results than a representative poll of 2 thousand, but I’ll take the smaller more representative poll than a larger one that uses convenience sampling any day.

7.5 Summary

Picking the right type of sample is critical to getting an accurate answer to your reserach question. There are a lot of differnet options in how you can select the people to participate in your research, but typically only one that is both correct and possible depending on the research you’re doing. In the next chapter we’ll talk about a few other methods for conducting reseach, some that don’t include any sampling by you.

3. Populations and samples

Populations, unbiasedness and precision, randomisation, variation between samples, standard error of the mean.

population in a research example

A population is defined as a group of individuals of the same species living and interbreeding within a given area. Members of a population often rely on the same resources, are subject to similar environmental constraints, and depend on the availability of other members to persist over time. Scientists study a population by examining how individuals in that population interact with each other and how the population as a whole interacts with its environment. As a tool for objectively studying populations, population ecologists rely on a series of statistical measures, known as demographic parameters , to describe that population (Lebreton et al . 1992). The field of science interested in collecting and analyzing these numbers is termed population demographics, also known as demography.

Broadly defined, demography is the study of the characteristics of populations. It provides a mathematical description of how those characteristics change over time. Demographics can include any statistical factors that influence population growth or decline, but several parameters are particularly important: population size, density, age structure, fecundity (birth rates), mortality (death rates), and sex ratio (Dodge 2006). We introduce each of these in turn.

Population Size

Populations display distinctive behaviors based on their size. Small populations face a greater risk of extinction (Caughley 1994). Individuals in these populations can have a hard time finding quality mates so, fewer individuals mate and those that do risk inbreeding (Hamilton 1967). Additionally, individuals in small population are more susceptible to random deaths. Events like fire, floods, and disease have a greater chance of killing all individuals in the population.

Large populations experience their own problems. As they approach the maximum sustainable population size, known as carrying capacity, large populations show characteristic behavior. Populations nearing their carrying capacity experience greater competition for resources, shifts in predator-prey relationships , and lowered fecundity. If the population grows too large, it may begin to exceed the carrying capacity of the environment and degrade available habitat (Figure 1).

Figure 1: Swarms of locusts exceed carrying capacity with huge population sizes. These short-lived spikes in population size produce swarms capable of destroying farms as they move across the agricultural landscapes, eating everything in their path. Photo courtesy of Compton Tucker/NASA GSFC.

Population Density

A more complete description of a population's size includes the population density — the size of a population in relation to the amount of space that it occupies. Density is usually expressed as the number of individuals per unit area or volume (Lebreton et al . 1992). For example: the number of crows per square kilometer or the number of plankton per liter (Andren 1992, Sterner 1986). Like all population properties, density is a dynamic characteristic that changes over time as individuals are added to or removed from the population. Closely related species of Gannet birds will maintain very different densities (Figure 2 ). Birth and immigration — the influx of new individuals from other areas — can increase a population's density, while death and emigration — the movement of individuals out of a population to other areas — can decrease its density (Lebreton et al . 1992).

Figure 2: Gannets can persist at very high densities. They have developed exaggerated territorial behavior as an adaptation to sustain these densely packed colonies. Photo courtesy of Follash via Wikimedia Commons.

Similar to population size, population density displays distinctive characteristics at both high and low values. Density-dependent factors , including competition, predation , migration and disease, intensify within populations as density increases. In contrast, density-independent factors , such as weather, fire regimes, and flooding, impact populations regardless of their specific densities (Lebreton et al . 1992).

Age Structure

Age structure can be represented graphically with a population pyramid (Figure 3). Although a population's age structure is not always pyramidal in shape, most populations have younger cohorts that are larger than older cohorts. For example, Sherman and Morton's studies of the Tioga Pass Belding's ground squirrels revealed birth cohorts larger than 300 individuals and less than 10 individuals in cohorts over the age of six (Sherman & Morton 1984).

View Terms of Use

While maximum fecundity is a constant for populations, realized fecundity varies over time based on the size, density, and age structure of the population. External conditions, such as food and habitat availability, can also influence fecundity. Density-dependent regulation provides a negative feedback if the population grows too large, by reducing birth rates and halting population growth through a host of mechanisms (Lebreton et al . 1992). In white-footed mice, for example, populations regulate their reproductive rate via a stress hormone . As population densities increase, so do aggressive interactions between individuals (even when food and shelter are unlimited). High population densities lead to frequent aggressive encounters, triggering a stress syndrome in which hormonal changes delay sexual maturation , cause reproductive organs to shrink, and depress the immune system (Krohne 1984).

To visualize mortality and fecundity within a population, ecologists create life tables to display age-specific statistical summaries of a population's survival patterns. First developed by Roman actuaries, life tables were used to estimate how long individuals of a particular age category were expected to live in order to value life insurance products (Trenerry 1926). Raymond Pearl (1928) first introduced the life table to biology when he applied it to laboratory studies of the fruit fly, Drosophila . Life tables are particularly useful for species with discrete developmental stages and mortality rates that vary widely from one stage to the next (Figure 5).

Nezara viridula ) assumes different body forms through metamorphosis between different discrete life stages." /> Figure 5: The Green Stink Bug ( Nezara viridula ) assumes different body forms through metamorphosis between different discrete life stages. Photo courtesy of Jovo26 via Wikimedia Commons

Interestingly, sex ratio is not always random but can be manipulated at birth by environmental or physiological mechanisms. All crocodiles and many reptiles utilize a strategy called environmental sex determination, wherein incubation temperature determines the sex of each individual (Delmas et al . 2008). For example, low temperatures will produce males and high temperatures will produce females. In times of limited resources or high population densities, females can manipulate the sex ratios of their clutch by spending more or less time incubating their eggs (Girondot et al . 2004).

age-specific : The age of the individual is important for statistical purposes.

clutch size : The number of offspring one female produces in one reproductive cycle.

cohort : Group of all individuals sharing a statistical factor (such as age or developmental stage)

density-dependent factors : Depending on the local density of the population

density-independent factors : Not linked to the local density of the population

discrete developmental stages : Non-overlapping and structurally distinct growth stages. E.g. tadpoles are one discrete developmental stage and adult frogs are another.

ecosystem : A natural system including the interaction of all living and non-living elements.

extinction : No longer existing.

extrapolating : Estimating an unknown value by assuming that a known value can translate (without distortion) to the scale of the unknown value.

growth rate : The rate of change of population size over time.

inbreeding : Breeding of closely related individuals, often with negative genetic consequences.

incubated : Provided with a heat source during embryonic development.

life tables : Specific format of statistical summary of demographic parameters.

migration : Populations moving from one geographic location to another.

objectively : To study without bias and by measurable and repeatable metrics.

offspring : The individual produced from the reproduction of its parents.

parameter : A value in an equation that does not vary. These values can change between different equations of similar form.

predator-prey relationships : How populations of predators are interacting with populations of prey.

predation : The act of killing another living organism for food.

physiological : The parts and functions of living organisms.

reproductive organs : Specialized collection of cells used to exchange gametes between sexually reproducing organisms.

rates : A mathematical term for the number of things or events happening in a given amount of time.

rearing : To invest energy in the growth and development of offspring after they are born.

subjectively designated geographic range : A parcel of land, the size of which is chosen without using standardized criteria. Picked at the discretion of the researcher.

sustainable : System able to be maintained itself indefinitely without supplement.

sexual maturation : An individual reaching a stage of development where it is able to sexually reproduce.

stress hormone : Chemical compounds synthesized in the body to chemically communicate a stress reaction to various systems within that organism.

statistic : A number acting as a description for more numbers.

References and Recommended Reading

Andren, H. Corvid density and nest predation in relation to forest fragmentation: A landscape perspective. Ecology 73, 794-804 (1992).

Bull, J. Evolution of environmental sex determination from genotypic sex determination. Heredity 47, 173-184 (1981).

Caughley, G. Directions in conservation biology. Journal of Animal Ecology 63, 215-244 (1994).

Delmas, V., Pieau, C. & Girondot, M. A mechanistic model of temperature-dependent sex determination in a chelonian, the European pond turtle. Functional Ecology 22, 84-93 (2008).

Dodge, Y. The Oxford Dictionary of Statistical Terms. Oxford, UK: Oxford University Press, 2006.

Benrey, B & Denno, R. F. The slow-growth-high-mortality hypothesis: A test using the cabbage butterfly. Ecology 78, 987-999 (1997).

Girondot, M. et al . "Implications of temperature-dependent sex determination for population dynamics," Temperature-Dependent Sex Determination in Vertebrates , 148-155, eds. N. Valenzuela & V. Lance. Smithsonian Books, 2004.

Hamilton, W. D. Extraordinary sex ratios. Science 156, 477-488 (1967).

Harcombe, P. A. Tree life tables. BioScience 37, 557-568 (1987).

Hutchinson, G. E. Population studies: Animal ecology and demography. Bulletin of Mathematical Biology 53, 193-213 (1991).

Jiguet, F. et al . Bird population trends are linearly affected by climate change along species thermal ranges. Proceedings of the Royal Society B: Biological Sciences 277, 3601-3608 (2010).

Krohne, D. T., Dubbs, B. A. & Baccus, R. An analysis of dispersal in an unmanipulated population of Peromyscus leucopus. American Midland Naturalist 112, 146-156 (1984).

Lebreton, J-D. et al . Modeling survival and testing biological hypotheses using marked animals: A unified approach with case studies. Ecological Monographs 62, 67-118 (1992).

Martin, T. E. Avian life history evolution in relation to nest site, nest predation, and food. Ecological Monographs 65, 101-127 (1995).

Pearl, R. The Rate of Living, Being an Account of Some Experimental Studies on the Biology of Life Duration . New York, NY: Alfred A. Knopf, 1928.

Stearns, S. C. Life history tactics: A review of the ideas. The Quarterly Review of Biology 51, 3-47 (1976).

Sterner, R. Herbivores' direct and indirect effects on algal populations. Science 231, 605-607 (1986).

Trenerry, C. F. The Origin and Early History of Insurance, Including the Contract of Bottomry. London, UK: P. S. King & Son, 1926.

Flag Inappropriate

Google Plus+

StumbleUpon

Email your Friend

population in a research example

  •  |  Lead Editor: 

Topic Rooms

Within this Subject (22)

  • Basic (12)
  • Intermediate (5)
  • Advanced (5)

Other Topic Rooms

  • Ecosystem Ecology
  • Physiological Ecology
  • Population Ecology
  • Community Ecology
  • Global and Regional Ecology
  • Conservation and Restoration
  • Animal Behavior
  • Teach Ecology
  • Earth's Climate: Past, Present, and Future
  • Terrestrial Geosystems
  • Marine Geosystems
  • Scientific Underpinnings
  • Paleontology and Primate Evolution
  • Human Fossil Record
  • The Living Primates

ScholarCast

© 2014 Nature Education

  • Press Room |
  • Terms of Use |
  • Privacy Notice |

Send

Visual Browse

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, automatically generate references for free.

  • Knowledge Base
  • Methodology
  • Population vs Sample | Definitions, Differences & Examples

Population vs Sample | Definitions, Differences & Examples

Published on 3 May 2022 by Pritha Bhandari . Revised on 5 December 2022.

Population vs sample

A population is the entire group that you want to draw conclusions about.

A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population.

In research, a population doesn’t always refer to people. It can mean a group containing elements of anything you want to study, such as objects, events, organisations, countries, species, or organisms.

Table of contents

Collecting data from a population, collecting data from a sample, population parameter vs sample statistic, practice questions: populations vs samples, frequently asked questions about samples and populations.

Populations are used when your research question requires, or when you have access to, data from every member of the population.

Usually, it is only straightforward to collect data from a whole population when it is small, accessible and cooperative.

For larger and more dispersed populations, it is often difficult or impossible to collect data from every individual. For example, every 10 years, the federal US government aims to count every person living in the country using the US Census. This data is used to distribute funding across the nation.

However, historically, marginalised and low-income groups have been difficult to contact, locate, and encourage participation from. Because of non-responses, the population count is incomplete and biased towards some groups, which results in disproportionate funding across the country.

In cases like this, sampling can be used to make more precise inferences about the population.

Prevent plagiarism, run a free check.

When your population is large in size, geographically dispersed, or difficult to contact, it’s necessary to use a sample. With statistical analysis , you can use sample data to make estimates or test hypotheses about population data.

Ideally, a sample should be randomly selected and representative of the population. Using probability sampling methods (such as simple random sampling or stratified sampling ) reduces the risk of sampling bias and enhances both internal and external validity .

For practical reasons, researchers often use non-probability sampling methods . Non-probability samples are chosen for specific criteria; they may be more convenient or cheaper to access. Because of non-random selection methods, any statistical inferences about the broader population will be weaker than with a probability sample.

Reasons for sampling

  • Necessity : Sometimes it’s simply not possible to study the whole population due to its size or inaccessibility.
  • Practicality : It’s easier and more efficient to collect data from a sample.
  • Cost-effectiveness : There are fewer participant, laboratory, equipment, and researcher costs involved.
  • Manageability : Storing and running statistical analyses on smaller datasets is easier and reliable.

When you collect data from a population or a sample, there are various measurements and numbers you can calculate from the data. A parameter is a measure that describes the whole population. A statistic is a measure that describes the sample.

You can use estimation or hypothesis testing to estimate how likely it is that a sample statistic differs from the population parameter.

Sampling error

A sampling error is the difference between a population parameter and a sample statistic. In your study, the sampling error is the difference between the mean political attitude rating of your sample and the true mean political attitude rating of all undergraduate students in the Netherlands.

Sampling errors happen even when you use a randomly selected sample. This is because random samples are not identical to the population in terms of numerical measures like means and standard deviations .

Because the aim of scientific research is to generalise findings from the sample to the population, you want the sampling error to be low. You can reduce sampling error by increasing the sample size.

Samples are used to make inferences about populations . Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.

Populations are used when a research question requires data from every member of the population. This is usually only feasible when the population is small and easily accessible.

A statistic refers to measures about the sample , while a parameter refers to measures about the population .

A sampling error is the difference between a population parameter and a sample statistic .

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the ‘Cite this Scribbr article’ button to automatically add the citation to our free Reference Generator.

Bhandari, P. (2022, December 05). Population vs Sample | Definitions, Differences & Examples. Scribbr. Retrieved 15 April 2024, from https://www.scribbr.co.uk/research-methods/population-versus-sample/

Is this article helpful?

Pritha Bhandari

Pritha Bhandari

Other students also liked, sampling methods | types, techniques, & examples, a quick guide to experimental design | 5 steps & examples, what is quantitative research | definition & methods.

Root out friction in every digital experience, super-charge conversion rates, and optimize digital self-service

Uncover insights from any interaction, deliver AI-powered agent coaching, and reduce cost to serve

Increase revenue and loyalty with real-time insights and recommendations delivered to teams on the ground

Know how your people feel and empower managers to improve employee engagement, productivity, and retention

Take action in the moments that matter most along the employee journey and drive bottom line growth

Whatever they’re are saying, wherever they’re saying it, know exactly what’s going on with your people

Get faster, richer insights with qual and quant tools that make powerful market research available to everyone

Run concept tests, pricing studies, prototyping + more with fast, powerful studies designed by UX research experts

Track your brand performance 24/7 and act quickly to respond to opportunities and challenges in your market

Explore the platform powering Experience Management

  • Free Account
  • For Digital
  • For Customer Care
  • For Human Resources
  • For Researchers
  • Financial Services
  • All Industries

Popular Use Cases

  • Customer Experience
  • Employee Experience
  • Employee Exit Interviews
  • Net Promoter Score
  • Voice of Customer
  • Customer Success Hub
  • Product Documentation
  • Training & Certification
  • XM Institute
  • Popular Resources
  • Customer Stories
  • Market Research
  • Artificial Intelligence
  • Partnerships
  • Marketplace

The annual gathering of the experience leaders at the world’s iconic brands building breakthrough business results, live in Salt Lake City.

  • English/AU & NZ
  • Español/Europa
  • Español/América Latina
  • Português Brasileiro
  • REQUEST DEMO
  • Experience Management
  • What Is A Research Panel?
  • Population and Samples

Try Qualtrics for free

Population and samples: the complete guide.

9 min read What are the differences between populations and samples? In this guide, we’ll discuss the two, as well as how to use them effectively in your research.

When we hear the term population, the first thing that comes to mind is a large group of people.

In market research, however, a population is an entire group that you want to draw conclusions about and possesses a standard parameter that is consistent throughout the group.

It’s important to note that a population doesn’t always refer to people, it can mean anything you want to study: objects, organizations, animals, chemicals and so on.

For example, all the countries in the world are an example of a population — or even the number of males in the UK. The size of the population can vary according to the target entities in question and the scope of the research.

When do you need to collect data from a population?

You use populations when your research calls for or requires you to collect data from every member of the population. Note: it’s normally far easier to collect data from whole populations when they’re small and accessible.

For larger and more diverse populations, on the other hand — e.g. a regional study on people living in Europe — while you would get findings representative of the entire population (as they’re all included in the study), it would take a considerable amount of time.

It’s in these instances that you use sampling. It allows you to make more precise inferences about the population as a whole, and streamline your research project. They’re typically used when population sizes are too large to include all possible members or inferences.

Let’s talk about samples.

What is a sample?

In statistical methods, a sample consists of a smaller group of entities, which are taken from the entire population. This creates a subset group that is easier to manage and has the characteristics of the larger population.

This smaller subset is then surveyed to gain information and data. The sample should reflect the population as a whole, without any bias towards a specific attribute or characteristic. In this way, researchers can ensure their results are representative and statistically significant.

To remove unconscious selection bias, a researcher may choose to randomize the selection of the sample.

population in a research example

Types of samples

There are two categories of sampling generally used – probability sampling and non-probability sampling :

  • Probability sampling , also known as random sampling, is a kind of sample selection where randomization is used instead of deliberate choice.
  • Non-probability sampling techniques involve the researcher deliberately picking items or individuals for the sample based on their research goals or knowledge

These two sampling techniques have several methods:

Probability sampling types include:

  • Simple random sampling Every element in the population has an equal chance of being selected as part of the sample. Find out more about simple random sampling.
  • Systematic sampling Also known as systematic clustering, in this method, random selection only applies to the first item chosen. A rule then applies so that every nth item or person after that is picked. Find out more about systematic sampling .
  • Stratified random sampling Sampling uses random selection within predefined groups. Find out more about stratified random sampling .
  • Cluster sampling Groups rather than individual units of the target population are selected at random.

Non-probability sampling types include:

  • Convenience sampling People or elements in a sample are selected based on their availability.
  • Quota sampling The sample is formed according to certain groups or criteria.
  • Purposive sampling Also known as judgmental sampling. The sample is formed by the researcher consciously choosing entities, based on the survey goals.
  • Snowball sampling Also known as referral sampling. The sample is formed by sample participants recruiting connections.

Find out more about sampling methods with our ultimate guide to sampling methods and best practices

Calculating sample size

Worried about sample sizes? You can also use our sample size calculator to determine how many responses you need to be confident in your data.

population in a research example

Go to sample size calculator

When to use sampling

As mentioned, sampling is useful for dealing with population data that is too large to process as a whole or is inaccessible. Sampling also helps to keep costs down and reduce time to insight.

Advantages of using sampling to collect data

  • Provide researchers with a representative view of the population through the sample subset.
  • The researcher has flexibility and control over what kind of sample they want to make, depending on their needs and the goals of the research.
  • Reduces the volume of data, helping to save time.
  • With proper methods, researchers can achieve a higher level of accuracy
  • Researchers can get detailed information on a population with a smaller amount of resource
  • Significantly cheaper than other methods
  • Allows for deeper study of some aspects of data — rather than asking 15 questions to every individual, it’s better to use 50 questions on a representative sample

Disadvantages of using sampling to collect data

  • Researcher bias can affect the quality and accuracy of results
  • Sampling studies require well-trained experts
  • Even with good survey design, there’s no way to eliminate sampling errors entirely
  • People in the sample may refuse to respond
  • Probability sampling methods can be less representative in favor of random allocation.
  • Improper selection of sampling techniques can affect the entire process negatively

How can you use sampling in business?

Depending on the nature of your study and the conclusions you wish to draw, you’ll have to select an appropriate sampling method as mentioned above. That said, here are a few examples of how you can use sampling techniques in business.

Creating a new product

If you’re looking to create a new product line, you may want to do panel interviews or surveys with a representative sample for the new market. By showing your product or concept to a sample that represents your target audience (population), you ensure that the feedback you receive is more reflective of how that customer segment will feel.

Average employee performance

If you wanted to understand the average employee performance for a specific group, you could use a random sample from a team or department (population). As every person in the department has a chance of being selected, you’ll have a truly random — yet representative sample. From the data collected, you can make inferences about the team/department’s average performance.

Store feedback

Let’s say you want to collect feedback from customers who are shopping or have just finished shopping at your store. To do this, you could use convenience sampling. It’s fast, affordable and done at a point of convenience. You can use this to get a quick gauge of how people feel about your store’s shopping experience — but it won’t represent the true views of all your customers.

Manage your population and sample data easily

Whatever the sample size of your target audience, there are several things to consider:

  • How can you save time in conducting the research?
  • How do you analyze and compare all the responses?
  • How can you track and chase non-respondents easily?
  • How can you translate the data into a usable presentation format?
  • How can you share this easily?

These questions can make the task of supporting internal teams and management difficult.

This is where the Qualtrics CoreXM technology solution can help you progress through research with ease.

It includes:

  • Advanced AI and machine learning tools to easily analyze data from open-text responses and data, giving you actionable insights at scale.
  • Intuitive drag-and-drop survey building with powerful logic, 100+ question types, and pre-built survey templates . For more information on how to get started on your survey creation, visit our complete guide on creating a survey.
  • Stylish, accessible and easy-to-understand reporting that automatically updates in real time, so everyone in your organization has the latest insights at their fingertips.
  • Powerful automation to get up and running quickly with out-of-the-box workflows, including guided setup and proactive recommendations to help you connect with other teams and react fast to changes.

Also, the Qualtrics online research panels and samples help you to:

  • Choose a target audience and get access to a representative sample
  • Boost the accuracy of your research with a sample methodology that’s 47% more consistent than standard sampling methods
  • Get dedicated support at every stage, from launching your survey to reporting on the results.

Want to learn more?

Related resources

Panels & Samples

Representative Samples 13 min read

Reward survey participants 15 min read, panel management 14 min read, what is a research panel 10 min read.

Analysis & Reporting

Data Saturation In Qualitative Research 8 min read

How to determine sample size 12 min read.

Market Segmentation

User Personas 14 min read

Request demo.

Ready to learn more about Qualtrics?

Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons
  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

Statistics LibreTexts

1.5: Populations and Samples

  • Last updated
  • Save as PDF
  • Page ID 22002

  • Michelle Oja
  • Taft College

Because social scientists want to make people’s lives better, they see what works on a small group of people, and then apply it to everyone. The small group of people is the sample, and “everyone” is the population.

Definition: Sample

People who participate in a study; the smaller group that the data is gathered from.

A sample is the small group of people that scientists test stuff on. We want at least 30 people in each group, so a study that has two groups will need about 60 people in the sample.

Definition: Population

The biggest group that your sample can represent.

A population is the “everyone” that we want to apply the results to. Sometimes, “everyone” can be a pretty small group; if I measured the GPA of one of my Behavioral Statistics classes, then the sample would be the class and the population could be students in all Behavioral Statistics classes at the college. (GPA would be the DV.)

A sample is a concrete thing. You can open up a data file, and there’s the data from your sample. A population, on the other hand, is a more abstract idea. It refers to the set of all possible people, or all possible observations, that you want to draw conclusions about, and is generally much bigger than the sample. In an ideal world, the researcher would begin the study with a clear idea of what the population of interest is, since the process of designing a study and testing hypotheses about the data that it produces does depend on the population about which you want to make statements. However, that doesn’t always happen in practice: usually the researcher has a fairly vague idea of what the population is and designs the study as best he/she can on that basis.

In our Scientific Method example, the sample would be the class from which we got the data from, and the population would be the biggest group that they could represent. There’s often more than one possible population, but I might say all college students could be a good population for this sample.

You might have heard the phrase “random sample.” This means that everyone in the population has an equal chance of being chosen to be in the sample; this almost never happens.

Exercise \(\PageIndex{1}\)

Let’s say I want to know if there’s a relationship between intelligence and reading science fiction books. If I survey 100 of my Introduction to Psychology students on their intelligence and reading of science fiction:

  • Who is the Sample?
  • Who could be the population? In other words, what is the biggest group that this sample could represent?

Add texts here. Do not delete this text first.

  • Who is the Sample? 100 of my Introduction to Psychology students
  • Who could be the population? In other words, what is the biggest group that this sample could represent? There are many possible populations, but all Introduction to Psychology students could work, or all Introduction to Psychology students at my college might make sense, too.

Sometimes it’s easy to state the population of interest. In most situations the situation is much less simple. In a typical a psychological experiment, determining the population of interest is a bit more complicated. Suppose Dr. Navarro ran an experiment using 100 undergraduate students as participants. Her goal, as a cognitive scientist, is to try to learn something about how the mind works. So, which of the following would count as “the population”:

  • All of the undergraduate psychology students her university in Australia?
  • Undergraduate psychology students in general, anywhere in the world?
  • Australians currently living?
  • Australians of similar ages to my sample?
  • Anyone currently alive?
  • Any human being, past, present or future?
  • Any biological organism with a sufficient degree of intelligence operating in a terrestrial environment?
  • Any intelligent being?

Each of these defines a real group of mind-possessing entities, all of which might be of interest to me as a cognitive scientist, and it’s not at all clear which one ought to be the true population of interest. Maybe surprisingly for you, there's no "right" answer! Although some the suggestions get a little vague, they all could potentially be a population that her sample represents. Irrespective of how the population is defined, the critical point is that the sample is a subset of the population. The goal of researchers is to use our knowledge of the sample to draw inferences about the properties of the population. More on that in later chapters!

Actual drug use is much higher than drug arrests suggest, so you might want to measure how many people use marijuana. If you send out a survey asking about their drug use to everyone with a driver’s license in California, but only 30% fill it out:

  • Who is the Sample? 30% of Californians with driver's licenses
  • Who could be the population? In other words, what is the biggest group that this sample could represent? There are many possible populations, but Californians who have driver's licenses might make the most sense here.

This last example shows that sometimes our sample limits who we can generalize our results about, who could be our population.

In almost every situation of interest, what we have available to us as researchers is a sample of data. We might have run experiment with some number of participants; a polling company might have phoned some number of people to ask questions about voting intentions; etc. Regardless: the data set available to us is finite, and incomplete. We can’t possibly get every person in the world to do our experiment; a polling company doesn’t have the time or the money to ring up every voter in the country etc.

Study Population

  • Reference work entry
  • pp 6412–6414
  • Cite this reference work entry

Book cover

3169 Accesses

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
  • Available as EPUB and PDF
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Babbie, E. R. (2010). The practice of social research . Belmont, CA: Wadsworth Publishing Company.

Google Scholar  

Bickman, L., & Rog, D. J. (1998). Handbook of applied social research methods . Thousand Oaks, CA: Sage Publications.

Friedman, L. M., Furberg, C. D., & DeMets, D. L. (2010). Fundamentals of clinical trials . New York: Springer.

Gerrish, K., & Lacey, A. (2010). The research process in nursing . West Sussex: Wiley-Blackwell.

Henry, G. T. (1990). Practical sampling . Newbury Park, CA: Sage Publications.

Kumar, R. (2011). Research methodology: A step-by-step guide for beginners . London: Sage Publications Limited.

Riegelman, R. K. (2005). Studying a study and testing a test: How to read the medical evidence . Philadelphia: Lippincott Williams & Wilkins.

Download references

Author information

Authors and affiliations.

Sociology Department, National University of Singapore, 11 Arts Link, 117570, Singapore, Singapore

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Shu Hu .

Editor information

Editors and affiliations.

University of Northern British Columbia, Prince George, BC, Canada

Alex C. Michalos

(residence), Brandon, MB, Canada

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer Science+Business Media Dordrecht

About this entry

Cite this entry.

Hu, S. (2014). Study Population. In: Michalos, A.C. (eds) Encyclopedia of Quality of Life and Well-Being Research. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0753-5_2893

Download citation

DOI : https://doi.org/10.1007/978-94-007-0753-5_2893

Publisher Name : Springer, Dordrecht

Print ISBN : 978-94-007-0752-8

Online ISBN : 978-94-007-0753-5

eBook Packages : Humanities, Social Sciences and Law

Share this entry

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Publish with us

Policies and ethics

  • Find a journal
  • Track your research

  • Foundations
  • Write Paper

Search form

  • Experiments
  • Anthropology
  • Self-Esteem
  • Social Anxiety

population in a research example

Research Population

All research questions address issues that are of great relevance to important groups of individuals known as a research population.

This article is a part of the guide:

  • Non-Probability Sampling
  • Convenience Sampling
  • Random Sampling
  • Stratified Sampling
  • Systematic Sampling

Browse Full Outline

  • 1 What is Sampling?
  • 2.1 Sample Group
  • 2.2 Research Population
  • 2.3 Sample Size
  • 2.4 Randomization
  • 3.1 Statistical Sampling
  • 3.2 Sampling Distribution
  • 3.3.1 Random Sampling Error
  • 4.1 Random Sampling
  • 4.2 Stratified Sampling
  • 4.3 Systematic Sampling
  • 4.4 Cluster Sampling
  • 4.5 Disproportional Sampling
  • 5.1 Convenience Sampling
  • 5.2 Sequential Sampling
  • 5.3 Quota Sampling
  • 5.4 Judgmental Sampling
  • 5.5 Snowball Sampling

A research population is generally a large collection of individuals or objects that is the main focus of a scientific query. It is for the benefit of the population that researches are done. However, due to the large sizes of populations, researchers often cannot test every individual in the population because it is too expensive and time-consuming. This is the reason why researchers rely on sampling techniques .

A research population is also known as a well-defined collection of individuals or objects known to have similar characteristics. All individuals or objects within a certain population usually have a common, binding characteristic or trait.

Usually, the description of the population and the common binding characteristic of its members are the same. "Government officials" is a well-defined group of individuals which can be considered as a population and all the members of this population are indeed officials of the government.

population in a research example

Relationship of Sample and Population in Research

A sample is simply a subset of the population. The concept of sample arises from the inability of the researchers to test all the individuals in a given population. The sample must be representative of the population from which it was drawn and it must have good size to warrant statistical analysis.

The main function of the sample is to allow the researchers to conduct the study to individuals from the population so that the results of their study can be used to derive conclusions that will apply to the entire population. It is much like a give-and-take process. The population “gives” the sample, and then it “takes” conclusions from the results obtained from the sample.

population in a research example

Two Types of Population in Research

Target population.

Target population refers to the ENTIRE group of individuals or objects to which researchers are interested in generalizing the conclusions. The target population usually has varying characteristics and it is also known as the theoretical population.

Accessible Population

The accessible population is the population in research to which the researchers can apply their conclusions. This population is a subset of the target population and is also known as the study population. It is from the accessible population that researchers draw their samples.

  • Psychology 101
  • Flags and Countries
  • Capitals and Countries

Explorable.com (Nov 15, 2009). Research Population. Retrieved Apr 19, 2024 from Explorable.com: https://explorable.com/research-population

You Are Allowed To Copy The Text

The text in this article is licensed under the Creative Commons-License Attribution 4.0 International (CC BY 4.0) .

This means you're free to copy, share and adapt any parts (or all) of the text in the article, as long as you give appropriate credit and provide a link/reference to this page.

That is it. You don't need our permission to copy the article; just include a link/reference back to this page. You can use it freely (with some kind of link), and we're also okay with people reprinting in publications like books, blogs, newsletters, course-material, papers, wikipedia and presentations (with clear attribution).

Want to stay up to date? Follow us!

Save this course for later.

Don't have time for it all now? No problem, save it as a course and come back to it later.

Footer bottom

  • Privacy Policy

population in a research example

  • Subscribe to our RSS Feed
  • Like us on Facebook
  • Follow us on Twitter

Numbers, Facts and Trends Shaping Your World

Read our research on:

Full Topic List

Regions & Countries

  • Publications
  • Our Methods
  • Short Reads
  • Tools & Resources

Read Our Research On:

8 facts about Americans with disabilities

A bus passenger in Miami Beach, Florida, uses a handicapped access ramp. (Jeffrey Greenberg/Education Images/Universal Images Group via Getty Images)

July is Disability Pride Month in the United States, commemorating the passage of the Americans with Disabilities Act more than 30 years ago. Overall, there are about 42.5 million Americans with disabilities, making up 13% of the civilian noninstitutionalized population, according to U.S. Census Bureau data from 2021. This group includes people with hearing, vision, cognitive, walking, self-care or independent living difficulties.

Here are eight facts about Americans with disabilities, based on government data and recent Pew Research Center surveys.

Pew Research Center conducted this analysis to share key facts about Americans with disabilities for Disability Pride Month.

The analysis includes data from the U.S. Census Bureau’s American Community Survey, which  defines disability status  by asking about six types of disabilities: serious difficulty with hearing, vision, cognition, walking or climbing stairs, and difficulty with self-care and independent living. Other surveys with different definitions have estimated that a considerably larger share of Americans have disabilities .

Occupational data by disability status comes from the Bureau of Labor Statistics .

Federal education data comes from the National Center for Education Statistics. For the purposes of this analysis, disabled students include those ages 3 to 21 who are served under the federal  Individuals with Disabilities Education Act (IDEA). Through IDEA, children with disabilities are granted a free appropriate public school education and are ensured special education and related services.

Hispanic Americans in this analysis are of any race. All other racial categories include those who are not Hispanic and identify as only one race.

The public opinion findings in this analysis are based on Pew Research Center surveys. Details about each survey’s questions and methodology are available through the links in this analysis.

Due to the nature of the live telephone surveys, some Americans with disabilities are likely underrepresented in this analysis. The figures reported on technology adoption and internet use are from a phone survey that was conducted via landlines and cellphones and likely under-counted adults who are deaf or have difficulty speaking. Our surveys also do not cover those living in institutionalized group quarters, which may include some individuals who are severely disabled.

Older Americans are significantly more likely than younger adults to have a disability. Some 46% of Americans ages 75 and older and 24% of those ages 65 to 74 report having a disability, according to estimates from the Census Bureau’s 2021 American Community Survey (ACS). This compares with 12% of adults ages 35 to 64 and 8% of adults under 35.

A bar chart showing that 46% of Americans ages 75 and older report having a disability.

Americans in certain racial and ethnic groups are more likely to have a disability. American Indians and Alaska Natives (18%) are more likely than Americans of other racial and ethnic backgrounds to report having a disability, according to the 2021 ACS estimates. Asian and Hispanic Americans are least likely to say they have a disability (8% and 10%, respectively). The shares of White and Black Americans who report living with a disability fall in the middle (14% each).

The most common types of disability in the U.S. involve difficulties with walking, independent living or cognition. Some 7% of Americans report having serious ambulatory difficulties – struggling with walking or climbing stairs – according to the ACS estimates. Adults ages 75 and older and those ages 65 to 74 are the most likely to report having this kind of disability (30% and 15%, respectively). Much smaller shares of those ages 35 to 64 (6%) and those ages 18 to 34 (1%) say they have an ambulatory disability.

About 6% of Americans report difficulties with independent living – struggling to do errands alone because of physical, mental or emotional problems. And a similar share (5%) report cognitive difficulties – that is, having trouble remembering, concentrating or making decisions. Each of these disabilities is more common among older Americans than among younger age groups.

Americans with disabilities tend to earn less than those who do not have a disability. Those with a disability earned a median of $28,438 in 2021, compared with $40,948 among those without a disability, according to the Census Bureau . (These figures represent employed civilian noninstitutionalized Americans ages 16 and older. They reflect earnings in the previous 12 months in 2021 inflation-adjusted dollars.)

On average, people with disabilities accounted for 4% of employed Americans in 2022, according to the Bureau of Labor Statistics (BLS). They were most likely to be employed in management occupations (12%) and office and administrative support occupations (11%), according to annual averages compiled by the BLS, which tracks 22 occupational categories. Meanwhile, an average of about 10% of workers in transportation and material moving jobs had a disability in 2022.

Disabled Americans have lower rates of technology adoption for some devices. U.S. adults with a disability are less likely than those without a disability to say they own a desktop or laptop computer (62% vs. 81%) or a smartphone (72% vs. 88%), according to a Center survey from winter 2021 . The survey asked respondents if any “disability, handicap, or chronic disease keeps you from participating fully in work, school, housework, or other activities, or not.” (It’s important to note that there are a range of ways to measure disability in public opinion surveys.)

A dot plot showing that Americans with a disability are less likely than those without one to have traditional computer, smartphone.

Similar shares of Americans with and without disabilities say they have high-speed home internet. Even so, disabled Americans are less likely than those without a disability to report using the internet daily (75% vs. 87%). And Americans with disabilities are three times as likely as those without a disability to say they never go online (15% vs. 5%).

The percentage of U.S. public school students who receive special education or related services has increased over the last decade, according to data from the National Center for Education Statistics. During the 2021-22 school year, there were 7.3 million students receiving special education or related services in U.S. public schools , making up 15% of total enrollment. This figure rose since 2010-11, when 6.4 million disabled students made up 13% of public school enrollment.

In 2021-22, the share of disabled students in public schools varied by state, from about 20% in New York, Pennsylvania and Maine to about 12% in Idaho and Texas. These disparities are likely the result of inconsistencies in how states determine which students are eligible for special education services and some of the challenges involved with diagnosing disabilities in children.

A map that shows New York, Pennsylvania and Maine public schools serve the highest shares of students with disabilities.

Disabled Americans are much more likely than other Americans to have faced psychological distress during the COVID-19 pandemic, according to a winter 2022 Center analysis that examined survey responses from the same Americans over time.

A bar chart showing that Americans with disabilities experienced high levels of distress during the COVID-19 pandemic.

About two-thirds (66%) of adults who have a disability or health condition that keeps them from participating fully in work, school, housework or other activities reported a high level of distress at least once across four surveys conducted between March 2020 and September 2022. That compares with 34% of those who do not have a disability.

Employed Americans generally think their workplace is accessible for people with physical disabilities. Among those who don’t work fully remotely, 76% say their workplace is at least somewhat accessible for people with physical disabilities, according to a Center survey from February . This includes 51% who say it is extremely or very accessible. Another 17% say their workplace is not too or not at all accessible, while 8% are not sure.

Whether or not they consider their own workplace accessible, half of workers say they highly value physical accessibility in the workplace. Workers with disabilities are about as likely as those without disabilities to say this. (Workers are defined as those who are not self-employed and work at a company or organization with more than 10 people.)

Note: This is an update of a post originally published July 27, 2017.

  • Health Care
  • Medicine & Health

Portrait photo of staff

9 facts about Americans and marijuana

5 facts about black americans and health care , 5 facts about hispanic americans and health care, inflation, health costs, partisan cooperation among the nation’s top problems, by more than two-to-one, americans say medication abortion should be legal in their state, most popular.

1615 L St. NW, Suite 800 Washington, DC 20036 USA (+1) 202-419-4300 | Main (+1) 202-857-8562 | Fax (+1) 202-419-4372 |  Media Inquiries

Research Topics

  • Age & Generations
  • Coronavirus (COVID-19)
  • Economy & Work
  • Family & Relationships
  • Gender & LGBTQ
  • Immigration & Migration
  • International Affairs
  • Internet & Technology
  • Methodological Research
  • News Habits & Media
  • Non-U.S. Governments
  • Other Topics
  • Politics & Policy
  • Race & Ethnicity
  • Email Newsletters

ABOUT PEW RESEARCH CENTER  Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. Pew Research Center does not take policy positions. It is a subsidiary of  The Pew Charitable Trusts .

Copyright 2024 Pew Research Center

Terms & Conditions

Privacy Policy

Cookie Settings

Reprints, Permissions & Use Policy

IMAGES

  1. Population vs. Sample

    population in a research example

  2. (PDF) CONCEPT OF POPULATION AND SAMPLE

    population in a research example

  3. (DOC) Research Population

    population in a research example

  4. (PDF) A Synopsis on Population and Sample: Quantitative Research

    population in a research example

  5. Population vs Sample in Research

    population in a research example

  6. Examining Populations and Samples in Research

    population in a research example

VIDEO

  1. Aspects of Social Change

  2. Research Population

  3. STATISTICS FOR DATA SCIENCE EP:03 Sample And Population

  4. Population, Target and Accessible population, Sample, sampling, Census study

  5. Writing About Population Research for Non-Scientists

  6. Population vs Sample

COMMENTS

  1. Population vs. Sample

    A population is the entire group that you want to draw conclusions about.. A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population. In research, a population doesn't always refer to people. It can mean a group containing elements of anything you want to study, such as objects, events, organizations, countries ...

  2. Population vs. Sample

    Total: 2) Research population and sample serve as the cornerstones of any scientific inquiry. They hold the power to unlock the mysteries hidden within data. Understanding the dynamics between the research population and sample is crucial for researchers. It ensures the validity, reliability, and generalizability of their findings.

  3. Population vs Sample: Uses and Examples

    Population and Sample Examples. For an example of population vs sample, researchers might be studying U.S. college students. This population contains about 19 million students and is too large and geographically dispersed to study fully. However, researchers can draw a subset of a manageable size to learn about its characteristics.

  4. Statistics without tears: Populations and samples

    A population is a complete set of people with a specialized set of characteristics, and a sample is a subset of the population. The usual criteria we use in defining population are geographic, for example, "the population of Uttar Pradesh". In medical research, the criteria for population may be clinical, demographic and time related.

  5. Population vs. Sample

    Population data consists of information collected from every individual in a particular population. Meanwhile, sample data consists of information taken from a subset—or sample —of the population. In this guide, we'll discuss the differences between population and sample data, the advantages and disadvantages of each, how to collect data ...

  6. 7 Samples and Populations

    So if you want to sample one-tenth of the population, you'd select every tenth name. In order to know the k for your study you need to know your sample size (say 1000) and the size of the population (75000). You can divide the size of the population by the sample (75000/1000), which will produce your k (750).

  7. What Is the Big Deal About Populations in Research?

    A population is a complete set of people with specified characteristics, while a sample is a subset of the population. 1 In general, most people think of the defining characteristic of a population in terms of geographic location. However, in research, other characteristics will define a population.

  8. 8.1: Samples, Populations and Sampling

    Defining a population. A sample is a concrete thing. You can open up a data file, and there's the data from your sample. A population, on the other hand, is a more abstract idea.It refers to the set of all possible people, or all possible observations, that you want to draw conclusions about, and is generally much bigger than the sample. In an ideal world, the researcher would begin the ...

  9. 3. Populations and samples

    Answers Chapter 3 Q3.pdf. Populations In statistics the term "population" has a slightly different meaning from the one given to it in ordinary speech. It need not refer only to people or to animate creatures - the population of Britain, for instance or the dog population of London. Statisticians also speak of a population.

  10. Sampling Methods

    Population vs. sample. First, you need to understand the difference between a population and a sample, and identify the target population of your research. The population is the entire group that you want to draw conclusions about. The sample is the specific group of individuals that you will collect data from.

  11. 1.3: Populations and Samples

    You must choose 400 names for the sample. Number the population 1-20,000 and then use a simple random sample to pick a number that represents the first name in the sample. Then choose every fiftieth name thereafter until you have a total of 400 names (you might have to go back to the beginning of your phone list).

  12. PDF Describing Populations and Samples in Doctoral Student Research

    The sampling frame intersects the target population. The sam-ple and sampling frame described extends outside of the target population and population of interest as occa-sionally the sampling frame may include individuals not qualified for the study. Figure 1. The relationship between populations within research.

  13. Introduction to Population Demographics

    Much like population size, sex ratio is a simple concept with major implications for population dynamics. For example, stable populations may maintain a 1:1 sex ratio and therefore keep their ...

  14. Populations, Parameters, and Samples in Inferential Statistics

    In both cases, your sample or population is defined by the scope of your research question or area of interest. The distinction between a sample and a population isn't a fixed, objective attribute of a set of data, but rather a perspective that depends on the particular context and research goals. I hope this provides some clarity on your ...

  15. Samples & Populations in Research

    Population and sample in research are often confused with one another, so it is important to understand the differences between the terms population and sample. A population is an entire group of ...

  16. Population vs Sample

    A population is the entire group that you want to draw conclusions about. A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population. In research, a population doesn't always refer to people. It can mean a group containing elements of anything you want to study ...

  17. Population and Samples: the Complete Guide

    For example, all the countries in the world are an example of a population — or even the number of males in the UK. The size of the population can vary according to the target entities in question and the scope of the research. ... Boost the accuracy of your research with a sample methodology that's 47% more consistent than standard ...

  18. 1.5: Populations and Samples

    Definition: Population. The biggest group that your sample can represent. A population is the "everyone" that we want to apply the results to. Sometimes, "everyone" can be a pretty small group; if I measured the GPA of one of my Behavioral Statistics classes, then the sample would be the class and the population could be students in all ...

  19. Study Population

    Study population is a subset of the target population from which the sample is actually selected. It is broader than the concept sample frame.It may be appropriate to say that sample frame is an operationalized form of study population. For example, suppose that a study is going to conduct a survey of high school students on their social well-being. ...

  20. Research Population

    A research population is generally a large collection of individuals or objects that is the main focus of a scientific query. It is for the benefit of the population that researches are done. However, due to the large sizes of populations, researchers often cannot test every individual in the population because it is too expensive and time ...

  21. Identifying a sample and population (video)

    I feel some of these problems on population/sample are ambiguously worded. ... Yes, a research question and a hypothesis are distinct concepts. A research question is a broad inquiry into a topic, seeking to understand or explore a phenomenon. On the other hand, a hypothesis is a specific, testable statement that predicts the relationship ...

  22. Unraveling Research Population and Sample: Understanding their ...

    The research population, also known as the target population, refers to the entire group or set of individuals, objects, or events that possess specific characteristics and are of interest to the researcher. It represents the larger population from which a sample is drawn. The research population is defined based on the research objectives and the

  23. PDF Understanding Population and Sample in Research: Key Concepts for Valid

    significance of population and sample in research lies in their role in making valid and reliable inferences about a larger group of interest. By studying the sample, researchers can draw meaningful conclusions that can be generalized to the larger population, making research more feasible, cost -effective, and time- efficient. The accuracy

  24. Sustainability

    The research found the following: First, between 1986 and 2022, the development of urban-rural integration in the Greater Bay Area steadily progressed, with gradually emerging effects, and industrial integration and population integration made significant contributions to the development of urban-rural integration.

  25. 8 facts about Americans with disabilities

    Some 46% of Americans ages 75 and older and 24% of those ages 65 to 74 report having a disability, according to estimates from the Census Bureau's 2021 American Community Survey (ACS). This compares with 12% of adults ages 35 to 64 and 8% of adults under 35. Americans in certain racial and ethnic groups are more likely to have a disability.

  26. The population genetics of Strongyloides papillosus in ...

    Strongyloides nematodes are parasites of livestock, and S. papillosus infects ruminant livestock that can cause disease. Recent genomic analysis of several Strongyloides species is now facilitating the population genomic analyses of natural Strongyloides infections, for example finding that S. ratti in wild UK rats exists as an assemblage of long-lived, asexual lineages.

  27. Long-term Care Market To Reach USD 2.1 Trillion By 2032

    The long-term care market size was valued at USD 1.2 Trillion in 2023 and is expected to reach a market size of USD 2.1 Trillion by 2032 at a CAGR of 6.6%. Fort Collins, Colorado, April 20, 2024 ...