Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is determined by the cost of data collection and the need for sufficient statistical power. In complicated studies there may be several different sample sizes: for example, in a stratified survey there is a different sample size for each stratum. In a census, data are collected on the entire population, so the sample size equals the population size. In experimental design, where a study may be divided into different treatment groups, there may be a different sample size for each group.
Sample sizes may be chosen in several different ways:
- using experience: small samples, though sometimes unavoidable, can result in wide confidence intervals and a risk of errors in statistical hypothesis testing.
- using a target variance for an estimate to be derived from the sample eventually obtained, i.e. if high precision is required (a narrow confidence interval), this translates to a low target variance of the estimator.
- using a target for the power of a statistical test to be applied once the sample is collected.
- using a confidence level, i.e. the higher the required confidence level, the larger the sample size (given a constant precision requirement).
Introduction
Larger sample sizes generally lead to increased precision when estimating unknown parameters. For example, if we wish to know the proportion of a certain species of fish that is infected with a pathogen, we would generally have a more precise estimate of this proportion if we sampled and examined 200 rather than 100 fish. Several fundamental facts of mathematical statistics describe this phenomenon, including the law of large numbers and the central limit theorem.
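As a rough illustration of the fish example, the sketch below compares the standard error of a sample proportion at n = 100 and n = 200 under independent sampling. The 30% infection rate and the function name are assumptions made only for this example; they do not come from the article.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion, assuming independent observations."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical infection rate, chosen only for illustration.
p_assumed = 0.30

se_100 = proportion_standard_error(p_assumed, 100)
se_200 = proportion_standard_error(p_assumed, 200)

print(f"n = 100: standard error = {se_100:.4f}")
print(f"n = 200: standard error = {se_200:.4f}")
# Doubling n shrinks the standard error by a factor of sqrt(2), not 2.
print(f"ratio = {se_100 / se_200:.3f}")
```

Because the standard error scales with the inverse square root of n, doubling the sample narrows the interval by only about 29%, whatever value of p is assumed.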
In some situations, the increase in precision from a larger sample size is minimal, or even non-existent. This can result from the presence of systematic errors or strong dependence in the data, or from data that follow a heavy-tailed distribution.
Sample sizes may be evaluated by the quality of the resulting estimates. For example, if a proportion is being estimated, one may wish the 95% confidence interval to be less than 0.06 units wide. Alternatively, sample size may be assessed based on the power of a hypothesis test. For example, if we are comparing the support for a certain political candidate among women with the support for that candidate among men, we may wish to have 80% power to detect a difference of 0.04 units in the support levels.
Estimation
Estimation of a proportion
A relatively simple situation is the estimation of a proportion. For example, we may wish to estimate the proportion of residents in a community who are at least 65 years old.
The estimator of a proportion is $\hat{p} = X/n$, where $X$ is the number of 'positive' observations (e.g. the number of people, out of the $n$ people sampled, who are at least 65 years old). When the observations are independent, this estimator has a (scaled) binomial distribution (and is also the sample mean of data from a Bernoulli distribution). The maximum variance of this distribution is $0.25/n$, which occurs when the true parameter is $p = 0.5$. In practice, since $p$ is unknown, the maximum variance is often used for sample size assessments.
For sufficiently large $n$, the distribution of $\hat{p}$ is closely approximated by a normal distribution. Using this approximation, it can be shown that around 95% of this distribution's probability lies within 2 standard deviations of the mean. Using the Wald method for the binomial distribution, an interval of the form
$$\left(\hat{p} - 2\sqrt{\frac{0.25}{n}},\ \hat{p} + 2\sqrt{\frac{0.25}{n}}\right)$$
forms a 95% confidence interval for the true proportion. If this interval needs to be no more than $W$ units wide, the equation
$$4\sqrt{\frac{0.25}{n}} = W$$
can be solved for $n$, yielding $n = 4/W^2 = 1/B^2$, where $B = W/2$ is the error bound on the estimate, i.e. the estimate is usually given as within $\pm B$. So for $B$ = 10% one requires $n$ = 100, for $B$ = 5% one needs $n$ = 400, for $B$ = 3% the requirement is approximately $n$ = 1000, while for $B$ = 1% a sample size of $n$ = 10,000 is required. These numbers are often quoted in news reports of opinion polls and other sample surveys.
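The relationship n = 1/B² can be checked with a few lines of Python. This is a minimal sketch under the worst-case variance 0.25/n and the two-standard-deviation rule described above; the function name is hypothetical, and rounding up to the next whole observation is a choice made here rather than something the text prescribes.

```python
import math


def sample_size_for_proportion(error_bound: float) -> int:
    """Conservative sample size so that an approximate 95% interval is
    p_hat +/- error_bound, using the worst-case variance 0.25/n (p = 0.5)."""
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    # The small tolerance guards against floating-point noise before rounding up.
    return math.ceil(1.0 / error_bound ** 2 - 1e-9)


for bound in (0.10, 0.05, 0.03, 0.01):
    n = sample_size_for_proportion(bound)
    print(f"B = {bound:.0%}: n = {n}")
```

For B = 3% the exact value is about 1111.1, which the function rounds up to 1112; the text's figure of 1000 is a rounded approximation.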
Estimation of a mean
A proportion is a special case of a mean. When estimating the population mean using an independent and identically distributed (iid) sample of size $n$, where each data value has variance $\sigma^2$, the standard error of the sample mean is:
$$\frac{\sigma}{\sqrt{n}}$$
This expression describes quantitatively how the estimate becomes more precise as the sample size increases. Using the central limit theorem to justify approximating the sample mean with a normal distribution yields an approximate 95% confidence interval of the form
$$\left(\bar{x} - \frac{2\sigma}{\sqrt{n}},\ \bar{x} + \frac{2\sigma}{\sqrt{n}}\right)$$
If we wish to have a confidence interval that is $W$ units wide, we would solve
$$\frac{4\sigma}{\sqrt{n}} = W$$
for $n$, yielding the sample size $n = 16\sigma^2/W^2$.
For example, if we are interested in estimating the amount by which a drug lowers a subject's blood pressure with a confidence interval that is six units wide, and we know that the standard deviation of blood pressure in the population is 15, then the required sample size is 16 × 15² / 6² = 100.
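The blood-pressure example can be reproduced with a short sketch of the rule n = 16σ²/W², which follows from setting the width of the two-standard-error interval, 4σ/√n, equal to W. The function name is hypothetical, and the population standard deviation is assumed to be known, as in the example.

```python
import math


def sample_size_for_mean(sigma: float, width: float) -> int:
    """Sample size so that an approximate 95% interval for the mean
    (sample mean +/- 2 standard errors) is `width` units wide."""
    if sigma <= 0 or width <= 0:
        raise ValueError("sigma and width must be positive")
    # Solve 4 * sigma / sqrt(n) = width for n, then round up.
    return math.ceil(16.0 * sigma ** 2 / width ** 2 - 1e-9)


# Values from the example: standard deviation 15, interval six units wide.
print(sample_size_for_mean(sigma=15, width=6))
```

Halving the desired width quadruples the required sample size, since n grows with 1/W².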
Sample size required for hypothesis testing
A common problem faced by statisticians is calculating the sample size required to yield a certain power for a test, given a predetermined Type I error rate. As described below, this can be estimated from pre-determined tables for certain values, by Mead's resource equation, or, more generally, by the cumulative distribution function:
Table
The table shown on the right can be used in a two-sample t-test to estimate the sample sizes of an experimental group and a control group of equal size; that is, the total number of individuals in the trial is twice the number given, and the desired significance level is 0.05. The parameters used are:
- The desired statistical power of the trial, shown in the left-hand column.
- Cohen's d (the effect size), which is the expected difference between the means of the target values in the experimental group and the control group, divided by the expected standard deviation.
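When no table is at hand, a widely used normal approximation gives a per-group size of n ≈ 2(z₁₋α/₂ + z_power)² / d². The sketch below implements that approximation, not the method behind the table; the function name and the two-sided default are assumptions. Exact t-based calculations usually give slightly larger values.

```python
import math
from statistics import NormalDist


def per_group_size_two_sample(d: float, power: float = 0.80,
                              alpha: float = 0.05) -> int:
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means with equal group sizes (normal approximation)."""
    if d <= 0:
        raise ValueError("effect size d must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


for effect_size in (0.2, 0.5, 0.8):
    n_per_group = per_group_size_two_sample(effect_size)
    print(f"d = {effect_size}: about {n_per_group} per group, "
          f"{2 * n_per_group} in total")
```

As in the table, the total number of individuals in the trial is twice the per-group figure.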
Mead's resource equation
Mead's resource equation is often used to estimate sample sizes of laboratory animals, as well as in many other laboratory experiments. It may not be as accurate as other methods of estimating sample size, but it gives an indication of the appropriate sample size when parameters such as expected standard deviations or expected differences in values between groups are unknown or very hard to estimate.
All the parameters in the equation are in fact degrees of freedom of their respective concepts, and hence each count is reduced by 1 before insertion into the equation.
The equation is:
$$E = N - B - T$$
Where:
- N is the total number of individuals or units in the study (minus 1)
- B is the blocking component, representing environmental effects allowed for in the design (minus 1)
- T is the treatment component, corresponding to the number of treatment groups (including the control group) being used, or the number of questions being asked (minus 1)
- E is the degrees of freedom of the error component, and should be somewhere between 10 and 20.
For example, if a study using laboratory animals is planned with four treatment groups (T = 3), with eight animals per group, making 32 animals in total (N = 31), without any further stratification (B = 0), then E equals 28, which is above the cutoff of 20, indicating that the sample size may be a bit too large, and six animals per group (E = 20) might be more appropriate.
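The laboratory-animal example can be checked with a small sketch of the resource equation E = N − B − T, where each component is a raw count reduced by 1. The function and parameter names are hypothetical; a design without blocking is represented here as a single block, so that B = 0.

```python
def mead_error_df(total_units: int, treatment_groups: int, blocks: int = 1) -> int:
    """Error degrees of freedom E = N - B - T, with each count reduced by 1."""
    n_df = total_units - 1       # N
    b_df = blocks - 1            # B (a single block means no blocking, B = 0)
    t_df = treatment_groups - 1  # T
    return n_df - b_df - t_df


def within_guideline(error_df: int, low: int = 10, high: int = 20) -> bool:
    """True when E lies in the range of 10 to 20 suggested in the text."""
    return low <= error_df <= high


# Four treatment groups with eight, then six, animals per group.
for per_group in (8, 6):
    e = mead_error_df(total_units=4 * per_group, treatment_groups=4)
    print(f"{per_group} per group: E = {e}, within 10-20: {within_guideline(e)}")
```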
Cumulative distribution function
Let $X_i$, $i = 1, 2, \ldots, n$ be independent observations taken from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. Let us consider two hypotheses, a null hypothesis:
$$H_0: \mu = 0$$
and an alternative hypothesis:
$$H_a: \mu = \mu^*$$
for some 'smallest significant difference' $\mu^* > 0$. This is the smallest value for which we care about observing a difference. Now, if we wish to (1) reject $H_0$ with a probability of at least $1 - \beta$ when $H_a$ is true (i.e. a power of $1 - \beta$), and (2) reject $H_0$ with probability $\alpha$ when $H_0$ is true, then we need the following:
If $z_\alpha$ is the upper $\alpha$ percentage point of the standard normal distribution, then
$$\Pr\left(\bar{x} > z_\alpha \sigma / \sqrt{n} \mid H_0 \text{ true}\right) = \alpha$$
and so
- 'Reject $H_0$ if our sample average ($\bar{x}$) is more than $z_\alpha \sigma / \sqrt{n}$'
is a decision rule that satisfies (2). (Note that this is a one-tailed test.)
Now we wish for this to happen with a probability of at least $1 - \beta$ when $H_a$ is true. In this case, our sample average will come from a normal distribution with mean $\mu^*$. Therefore, we require
$$\Pr\left(\bar{x} > z_\alpha \sigma / \sqrt{n} \mid H_a \text{ true}\right) \geq 1 - \beta$$
Through careful manipulation, this can be shown (see Statistical power#Example) to happen when
$$n \geq \left(\frac{z_\alpha + \Phi^{-1}(1 - \beta)}{\mu^* / \sigma}\right)^2$$
where $\Phi$ is the normal cumulative distribution function.
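For the one-tailed test with known variance described in this section, the standard result is that the power requirement holds when n ≥ ((z_α + Φ⁻¹(1 − β)) / (μ*/σ))². The sketch below evaluates that bound with Python's standard library and then checks the power actually achieved. The function names and the example values (μ* = 5, σ = 15) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_size_one_tailed_z(mu_star: float, sigma: float,
                             alpha: float = 0.05, power: float = 0.80) -> int:
    """Smallest n with n >= ((z_alpha + Phi^-1(power)) / (mu_star / sigma))^2."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)  # upper alpha percentage point
    z_power = STD_NORMAL.inv_cdf(power)      # Phi^-1(1 - beta)
    return math.ceil(((z_alpha + z_power) / (mu_star / sigma)) ** 2)


def achieved_power(n: int, mu_star: float, sigma: float,
                   alpha: float = 0.05) -> float:
    """Probability of rejecting H0 when the true mean is mu_star, using the
    rule 'reject if the sample mean exceeds z_alpha * sigma / sqrt(n)'."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - mu_star * math.sqrt(n) / sigma)


# Hypothetical inputs: smallest difference of interest 5, standard deviation 15.
n_required = sample_size_one_tailed_z(mu_star=5, sigma=15)
print(n_required, round(achieved_power(n_required, 5, 15), 4))
```

Checking `achieved_power` at `n_required` and at `n_required - 1` is a simple way to confirm that the bound is the smallest sample size meeting the power target.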
Stratified sample size
With more complicated sampling techniques, such as stratified sampling, the sample can often be split up into sub-samples. Typically, if there are $H$ such sub-samples (from $H$ different strata), then each of them has a sample size $n_h$, $h = 1, 2, \ldots, H$. These $n_h$ must conform to the rule that $n_1 + n_2 + \cdots + n_H = n$ (i.e. the total sample size is given by the sum of the sub-sample sizes). Selecting these $n_h$ optimally can be done in various ways, using (for example) Neyman's optimal allocation.
There are many reasons to use stratified sampling: to decrease the variances of sample estimates, to use partly non-random methods, or to study strata individually. A useful, partly non-random method is to sample individuals where they are easily accessible but, where they are not, to sample clusters to save on travel costs.
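In general, for $H$ strata, a weighted sample mean is
$$\bar{x}_w = \sum_{h=1}^{H} W_h \bar{x}_h$$
with variance
$$\operatorname{Var}(\bar{x}_w) = \sum_{h=1}^{H} W_h^2 S_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right)$$
where $\bar{x}_h$ is the sample mean of stratum $h$, $N_h$ is its population size, $S_h$ is the standard deviation within it, and the weight $W_h$ is usually the stratum's share of the population, $W_h = N_h/N$. For a fixed total sample size, this variance can be made a minimum if the sampling rate within each stratum is made proportional to the standard deviation within each stratum: $n_h/N_h = k S_h$, where $k$ is a constant such that $\sum_h n_h = n$.
An "optimum allocation" is reached when the sampling rates within the strata are made directly proportional to the standard deviations within the strata and inversely proportional to the square root of the sampling cost per element within the strata, $C_h$:
$$\frac{n_h}{N_h} = \frac{K S_h}{\sqrt{C_h}}$$
where $K$ is a constant such that $\sum_h n_h = n$, or, more generally, when
$$n_h = \frac{K' W_h S_h}{\sqrt{C_h}}$$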
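The allocation rule can be sketched numerically: each stratum receives a share of the total sample proportional to N_h S_h / √C_h, where N_h is the stratum's population size, S_h its standard deviation and C_h its sampling cost per element. With equal costs this reduces to Neyman allocation. The function name and example figures are hypothetical, and the fractional results are left unrounded because the rounding scheme is a separate design decision.

```python
import math
from typing import List, Optional, Sequence


def allocate_sample(n_total: float,
                    stratum_sizes: Sequence[float],
                    stratum_sds: Sequence[float],
                    unit_costs: Optional[Sequence[float]] = None) -> List[float]:
    """Split n_total across strata in proportion to N_h * S_h / sqrt(C_h)."""
    if unit_costs is None:
        unit_costs = [1.0] * len(stratum_sizes)  # equal costs: Neyman allocation
    if not (len(stratum_sizes) == len(stratum_sds) == len(unit_costs)):
        raise ValueError("all stratum inputs must have the same length")
    scores = [size * sd / math.sqrt(cost)
              for size, sd, cost in zip(stratum_sizes, stratum_sds, unit_costs)]
    total_score = sum(scores)
    return [n_total * score / total_score for score in scores]


# Two hypothetical strata: a small, variable one and a large, homogeneous one.
equal_cost = allocate_sample(100, stratum_sizes=[1000, 2000], stratum_sds=[10, 5])
with_costs = allocate_sample(100, stratum_sizes=[1000, 2000], stratum_sds=[10, 5],
                             unit_costs=[1, 4])
print(equal_cost)
print(with_costs)
```

In this example the smaller stratum receives as many observations as the larger one under equal costs, because its standard deviation is twice as large; raising the cost of sampling the second stratum shifts the allocation further toward the first.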
Qualitative research
Sample size determination in qualitative studies takes a different approach. It is generally a subjective judgment, made as the research proceeds. One approach is to continue including further participants or material until saturation is reached. The number needed to reach saturation has been investigated empirically.
There is a lack of reliable guidance on estimating sample sizes before starting the research, and a range of suggestions has been given. A tool akin to a quantitative power calculation, based on the negative binomial distribution, has been suggested for thematic analysis.
See also
- Design of experiments
- Engineering response surface example under Stepwise regression
- Cohen's h
Further reading
- NIST: Selecting Sample Size
- ASTM E122-07: Standard Practice for Calculating Sample Size to Estimate, With Specified Precision, the Average for a Characteristic of a Lot or Process
Source of the article : Wikipedia