Adam Hayes, Ph.D., CFA, is a financial writer with 15+ years Wall Street experience as a derivatives trader. Besides his extensive derivative trading expertise, Adam is an expert in economics and behavioral finance. Adam received his master's in economics from The New School for Social Research and his Ph.D. from the University of Wisconsin-Madison in sociology. He is a CFA charterholder as well as holding FINRA Series 7, 55 & 63 licenses. He currently researches and teaches economic sociology and the social studies of finance at the Hebrew University in Jerusalem.
Updated August 19, 2024. Fact checked by Katharine Beer. Katharine Beer is a writer, editor, and archivist based in New York. She has a broad range of experience in research and writing, having covered subjects as diverse as the history of New York City's community gardens and Beyoncé's 2018 Coachella performance.
Stratified random sampling is a method of sampling that involves the division of a population into smaller subgroups known as strata. In stratified random sampling, or stratification, the strata are formed based on members’ shared attributes or characteristics, such as income or educational attainment.
Stratified random sampling has numerous applications, such as studying population demographics and life expectancy. It is also called proportional random sampling or quota random sampling, although quota sampling in the strict sense is a non-probability method because its units are not selected at random.
When studying a group of entities with similar characteristics, a researcher may find that the population is too large to examine in full. To save time and money, and to make the research more feasible, an analyst may select a small group from the overall population. This is known as sampling.
The small group is referred to as a sample, which is a subset of the population used to represent the entire population. A sample may be selected from a population in many ways, one of which is the stratified random sampling method.
Stratified random sampling involves dividing the entire population into homogeneous groups called strata (the plural of stratum). Random samples are then selected from each stratum to analyze the different experiences or outcomes associated with the demographic group that each stratum represents.
Stratified sampling is used to highlight differences among groups in a population. This is different from simple random sampling, which treats all members of a population as equal, with an equal likelihood of being sampled.
For example, consider an academic researcher who would like to know the number of MBA students in a specific graduating year who received a job offer within three months of graduation. The researcher will soon find that there were almost 200,000 MBA graduates for the year. They could take a simple random sample of 50,000 graduates and run a survey. To learn more, however, they could divide the population into strata and take a random sample from each stratum.
To do this, they would create population groups based on gender, age range, race, country of nationality, and career background. A random sample from each stratum is taken in a number proportional to the stratum’s size compared with the population. These subsets of the strata are then pooled to form a random sample, which the researcher would analyze for differences in which groups receive job offers after graduation.
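The procedure just described can be sketched in Python. This is a minimal illustration rather than the researcher's actual method: the function name `proportionate_stratified_sample`, the toy population of 1,000 graduates, and the `background` field are all assumptions made up for the example, and only the standard library is used.

```python
import random
from collections import defaultdict


def proportionate_stratified_sample(population, stratum_of, sample_size, seed=None):
    """Draw a proportionate stratified random sample (hypothetical helper).

    population  -- list of records
    stratum_of  -- function mapping a record to its single stratum label
    sample_size -- total number of records to draw
    """
    rng = random.Random(seed)

    # Step 1: divide the population into non-overlapping strata.
    strata = defaultdict(list)
    for record in population:
        strata[stratum_of(record)].append(record)

    # Step 2: take a simple random sample inside each stratum, sized in
    # proportion to the stratum's share of the population.
    sample = []
    for members in strata.values():
        k = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, k))

    # Step 3: the pooled per-stratum samples form the stratified sample.
    return sample


# Hypothetical toy population: 600 graduates with a finance background,
# 300 with an engineering background, and 100 with other backgrounds.
graduates = (
    [{"id": i, "background": "finance"} for i in range(600)]
    + [{"id": i, "background": "engineering"} for i in range(600, 900)]
    + [{"id": i, "background": "other"} for i in range(900, 1000)]
)

sample = proportionate_stratified_sample(
    graduates, lambda g: g["background"], sample_size=100, seed=42
)

counts = defaultdict(int)
for g in sample:
    counts[g["background"]] += 1
print(dict(counts))
```

Whatever the seed, the sample holds 60 finance, 30 engineering, and 10 other graduates, mirroring the 60/30/10 split of the toy population; only the individuals chosen change. When the proportions do not divide evenly, per-stratum rounding can make the total differ slightly from `sample_size`, so a real design would need a rule for the leftover units.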
Simple random samples and stratified random samples are both statistical measurement tools. A simple random sample is used to represent the entire data population. A stratified random sample divides the population into smaller groups, or strata, based on shared characteristics, and then samples from each group. Stratified sampling is more complicated, more time-consuming, and potentially more expensive to carry out than simple random sampling.
The simple random sample is often used when:
For instance, a candy company may want to study the buying habits of its customers to determine the future of its product line. If there are 10,000 customers, the company may choose 100 of those customers as a random sample. It can then apply what it finds from those 100 customers to the rest of its base.
With stratified random sampling, the customers would be divided into strata based on age, income, or other characteristics, and the 100 sampled customers would be spread across those strata. But with only 100 people in the sample, there won't be many in each stratum, or there might not be many differences between the strata. In this case, it would make more sense to use simple random sampling and sample 100 members purely at random without defining their individual characteristics.
Stratified random sampling ensures that each subgroup of a given population is adequately represented within the whole sample population of a research study. Stratification can be proportionate or disproportionate.
With proportionate stratification, the sample size of each stratum is proportionate to the population size of the stratum. This type of stratified random sampling is often more precise because the sample is a better representation of the overall population.
For example, suppose a researcher is looking at a population of 180,000 people and wants to use a sample of 50,000, stratified by using age range. The researcher would use the formula:
Proportionate stratified random sample = (Sample size / Population size) × Stratum size
Age group | 24–28 | 29–33 | 34–37 | Total |
---|---|---|---|---|
Number of people in stratum | 90,000 | 60,000 | 30,000 | 180,000 |
Strata sample size | 25,000 | 16,667 | 8,333 | 50,000 |
The strata sample size in the age range of 24 to 28 years old is calculated as:
(50,000/180,000) × 90,000 = 25,000
The same method is used for the other age-range groups. Now that the strata sample size is known, the researcher can perform simple random sampling in each stratum to select their survey participants.
In other words, 25,000 people ages 24 to 28 will be selected randomly from the 90,000 people in that stratum, 16,667 people ages 29 to 33 from the 60,000 in that stratum, and so on.
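The allocation in the table can be reproduced with a few lines of Python. This is a sketch under the figures given above; the function name `proportionate_allocation` is an assumption introduced for the example.

```python
def proportionate_allocation(stratum_sizes, sample_size):
    """Apply (sample size / population size) x stratum size to every stratum."""
    population_size = sum(stratum_sizes.values())
    return {
        label: round(sample_size * size / population_size)
        for label, size in stratum_sizes.items()
    }


# Stratum sizes from the age-group table.
stratum_sizes = {"24-28": 90_000, "29-33": 60_000, "34-37": 30_000}

allocation = proportionate_allocation(stratum_sizes, sample_size=50_000)
print(allocation)                # {'24-28': 25000, '29-33': 16667, '34-37': 8333}
print(sum(allocation.values()))  # 50000
```

Here the rounded values happen to add up to exactly 50,000. With other figures, rounding each stratum separately can leave the total one or two units off, and the researcher would have to decide which stratum absorbs the difference.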
In a disproportionate stratified sample, the sample size of each stratum is not proportional to the stratum's size in the population. The researcher may decide to sample half of the people within the 34 to 37 age group and one-third of the people within the 29 to 33 age group.
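To make the contrast concrete, the sketch below applies those two sampling fractions to the stratum sizes from the age-group table. Reusing the table's sizes here is an assumption made for illustration, and the 24 to 28 group is left out because no fraction is given for it.

```python
from fractions import Fraction

# Stratum sizes borrowed from the age-group table (assumed for this example).
stratum_sizes = {"29-33": 60_000, "34-37": 30_000}

# Sampling fractions chosen by the researcher, not by stratum size.
sampling_fractions = {"29-33": Fraction(1, 3), "34-37": Fraction(1, 2)}

sample_sizes = {
    label: int(stratum_sizes[label] * fraction)
    for label, fraction in sampling_fractions.items()
}

population_total = sum(stratum_sizes.values())
sample_total = sum(sample_sizes.values())

for label in stratum_sizes:
    population_share = Fraction(stratum_sizes[label], population_total)
    sample_share = Fraction(sample_sizes[label], sample_total)
    print(
        f"{label}: n = {sample_sizes[label]}, "
        f"population share = {population_share}, sample share = {sample_share}"
    )
```

The 34 to 37 group makes up one-third of these two strata combined but three-sevenths of the resulting sample, so it is over-represented relative to its size in the population.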
It is important to note that one person cannot fit into multiple strata. Each entity must only fit in one stratum. Having overlapping subgroups means that some individuals will have higher chances of being selected for the survey, which negates the concept of stratified sampling as a type of probability sampling.
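One way to guard against overlapping strata is to check, before sampling, that every member matches exactly one stratum definition. The sketch below is a hypothetical helper (`assign_strata` and the rule dictionaries are invented for the example), using the age ranges from the table above.

```python
def assign_strata(members, strata_rules):
    """Return the stratum label for each member, in order.

    Raises ValueError if a member fits no stratum or more than one.
    """
    labels = []
    for member in members:
        matches = [label for label, rule in strata_rules.items() if rule(member)]
        if len(matches) != 1:
            raise ValueError(f"{member!r} matches {len(matches)} strata: {matches}")
        labels.append(matches[0])
    return labels


# Non-overlapping age ranges: every age from 24 to 37 fits exactly one.
age_rules = {
    "24-28": lambda age: 24 <= age <= 28,
    "29-33": lambda age: 29 <= age <= 33,
    "34-37": lambda age: 34 <= age <= 37,
}
print(assign_strata([24, 28, 30, 37], age_rules))

# A faulty definition in which age 28 falls into two strata.
overlapping_rules = {**age_rules, "29-33": lambda age: 28 <= age <= 33}
try:
    assign_strata([28], overlapping_rules)
except ValueError as error:
    print("Rejected:", error)
```

Failing loudly at this stage is a design choice: it surfaces an ambiguous stratum definition before any member gets a higher-than-intended chance of selection.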
Portfolio managers can use stratified random sampling to create portfolios by replicating an index such as a bond index.
The main advantage of stratified random sampling is that it captures key population characteristics in the sample. Similar to a weighted average, this method of sampling produces characteristics in the sample that are proportional to the overall population. Stratified random sampling works well for populations with a variety of attributes in which subgroups can be formed.
Stratification gives a smaller error in estimation and greater precision than the simple random sampling method. The greater the differences among the strata, the greater the gain in precision.
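The link between strata differences and precision can be illustrated with the standard variance decomposition for a sample mean. The sketch below assumes proportionate allocation, ignores the finite population correction, and uses made-up stratum weights, means, and variances; the function name `variance_of_sample_mean` is likewise an assumption.

```python
def variance_of_sample_mean(strata, n):
    """Compare the variance of the sample mean under two designs.

    strata -- list of (weight, mean, variance) tuples, weights summing to 1
    n      -- total sample size

    Returns (simple_random_variance, stratified_variance).
    """
    overall_mean = sum(w * m for w, m, _ in strata)
    within = sum(w * v for w, _, v in strata)                      # spread inside strata
    between = sum(w * (m - overall_mean) ** 2 for w, m, _ in strata)  # spread across strata

    simple_random = (within + between) / n  # total population variance / n
    stratified = within / n                 # proportionate allocation removes "between"
    return simple_random, stratified


# Hypothetical strata: same weights and within-stratum variance,
# but stratum means that are either close together or far apart.
similar = [(0.5, 10.0, 4.0), (0.3, 10.5, 4.0), (0.2, 11.0, 4.0)]
different = [(0.5, 10.0, 4.0), (0.3, 20.0, 4.0), (0.2, 30.0, 4.0)]

for name, strata in [("similar strata", similar), ("different strata", different)]:
    srs, strat = variance_of_sample_mean(strata, n=100)
    print(f"{name}: simple random = {srs:.4f}, stratified = {strat:.4f}")
```

In both scenarios the stratified variance depends only on the spread inside each stratum, so the gain over simple random sampling is exactly the between-strata component. When the stratum means are far apart, that component dominates and the gain is large; when they are nearly equal, stratification helps very little.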
Unfortunately, this method of research cannot be used in every study. To use it, researchers must be able to identify every member of a population being studied and classify each of them into one, and only one, subpopulation. If researchers can’t confidently classify every member of the population into a subgroup, stratified random sampling can't be used. This can be especially difficult if a definitive list of an entire population isn't available.
Overlapping can be an issue if there are subjects that fall into multiple subgroups. When simple random sampling is performed within each subgroup, those who appear in multiple subgroups are more likely to be chosen. The result could be a misrepresentation or inaccurate reflection of the population. If the sorting process is too difficult, stratified random sampling becomes ineffective.
Suppose a research team wants to determine the grade point average (GPA) of college students across the United States. The research team has difficulty collecting data from all 21 million college students and decides to take a random sample of the population by using 4,000 students.
Now assume that the team looks at the different attributes of the sample participants and wonders if there are any differences in GPAs relative to students’ majors. Suppose it finds that 560 students are English majors, 1,135 are science majors, 800 are computer science majors, 1,090 are engineering majors, and 415 are math majors. The team wants to use a proportionate stratified random sample, in which each stratum's share of the sample matches its share of the population.
Assume the team researches the demographics of college students in the U.S. and finds that: 12% major in English, 28% major in science, 24% major in computer science, 21% major in engineering, and 15% major in mathematics. Thus, five strata are created from the stratified random sampling process.
The team then needs to confirm that each stratum's share of the sample matches its share of the population; however, they find the proportions are not equal. For example, English majors make up 14% of the sample (560 of 4,000) but only 12% of the population. The team would need to resample 4,000 students from the population and randomly select 480 English, 1,120 science, 960 computer science, 840 engineering, and 600 mathematics students.
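The comparison and the corrected stratum sizes can be checked with a short script. It uses only the figures given in this example; the variable names are assumptions, and percentages are kept as integers so the arithmetic is exact.

```python
# Majors observed in the first simple random sample of 4,000 students.
first_sample = {
    "English": 560,
    "science": 1135,
    "computer science": 800,
    "engineering": 1090,
    "mathematics": 415,
}

# Share of each major among U.S. college students, in percent.
population_pct = {
    "English": 12,
    "science": 28,
    "computer science": 24,
    "engineering": 21,
    "mathematics": 15,
}

sample_size = sum(first_sample.values())  # 4,000

# Target stratum sizes for a proportionate stratified sample.
targets = {
    major: pct * sample_size // 100 for major, pct in population_pct.items()
}

for major, pct in population_pct.items():
    observed_pct = 100 * first_sample[major] / sample_size
    print(
        f"{major}: observed {observed_pct:.1f}% vs population {pct}%, "
        f"target n = {targets[major]}"
    )
```

The output shows, for instance, that engineering majors are over-represented in the first sample and mathematics majors under-represented, which is why the team resamples to the target counts.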
With those groups, the research team has a proportionate stratified random sample of college students, which provides a better representation of students’ college majors in the U.S. The researchers can then highlight specific strata and investigate GPA with added information about the students' majors.
Stratified random sampling is often used when researchers want to know about different subgroups or strata based on the entire population being studied. For instance, researchers might want to explore outcomes for groups based on differences in race, gender, or education.
The best method of sampling to use will depend on the nature of the analysis and the data being used. In general, simple random sampling is often the easiest and cheapest to use. However, stratified sampling can produce a more accurate sample relative to the population under study.
There are two main types of stratified random sampling: proportionate and disproportionate sampling. Proportionate sampling takes each stratum in the sample as proportionate to the population size of the stratum. In disproportionate sampling, the analyst will over- or under-sample certain strata based on the research question or study design being employed.
The strata will depend on the subgroups in which you are interested that appear in your population. These subgroups are based on shared characteristics among participants such as gender, race, educational attainment, geographic location, or age group.
Stratified random sampling is the process of creating subgroups in a dataset according to various factors, such as age, gender, income level, or education. Subsequently, a random sample is taken from each of the strata, which allows researchers to obtain samples from various subgroups, including those that may be under-represented.
In this way, a stratified random sample may provide a more comprehensive picture of a broader dataset. However, using this method may not be possible across all studies depending on the population or sample size, level of information available about the population, and the time and resources available. Overall, the benefit of stratified random sampling is that it allows for a more accurate and nuanced representation of a population, compared with simple random sampling.