Medical Statistics - Electronic Textbook: Basics

Source: Southern Medical University Quality Course Website


Contents

Basics

Workbook

Excel links

Calculator

Analysis

Statistics

P values

Confidence Intervals

Degrees of freedom

Epidemiology

Causality

Bias

Confounding

Prospective vs. retrospective studies

Basics

StatsDirect combines numerical tools with help to assist you in the design, analysis, inference and presentation of quantitative research. It is not a substitute for a statistician. The best statistical practice is achieved by co-operation of investigator and statistician, starting at the planning stage of a study. StatsDirect can help an investigator to understand the basics of statistical practice and to carry out the most commonly used analyses. In this way, StatsDirect improves communication between investigator and statistician. We appreciate that many investigators do not have the resources to consult with a statistician; therefore, we have used this help text to address statistical misconceptions that investigators commonly present to us. StatsDirect data input and result screens use as little statistical jargon as possible in order to improve understanding by the non-statistician.

The StatsDirect help system is not meant to replace statistical textbooks or emulate their style of presentation. This help system is designed for on-screen use within StatsDirect. For further reading we recommend that you seek out the key references listed in the reference list.

Sections of StatsDirect help relating to common situations are:

Statistics

Epidemiology

Understanding P values and degrees of freedom

Understanding confidence intervals

Causality, bias and confounding

Retrospective vs. prospective studies

Reference list

Statistical method selection

Contacts

Workbook

The StatsDirect workbook operates like spreadsheet software such as Microsoft Excel. If you know how to use Excel then you will find your way around the StatsDirect worksheet easily.

This page gives the most basic information that you need to get started using the StatsDirect workbook. Please also read the following sections in order to get the most out of managing your data in StatsDirect. You may wish to print out and digest the first three of the following sections.

WORKBOOK AND WORKSHEET BASICS

WORKING WITH DATA IN WORKSHEETS

FORMATTING WORKSHEETS

WORKSHEET FUNCTIONS

INTRODUCTION

A workbook consists of worksheets that are divided into rows and columns forming a matrix of cells. You may enter data or formulae into cells. The active cell is highlighted with a rectangular border; you can move the active cell by clicking on another cell with the mouse or by using the cursor (arrow) keys.

ENTERING DATA

Numbers entered can range from -1E+307 to 1E+307. Text is any sequence of letters and numbers which the workbook does not recognise as another form of data. If you want to enter numbers as labels then you must put quotes around them, e.g. "2" is treated as a string and 2 is treated as a number. See dates and times for more information on these data forms. Logical data are true or false; true is represented by 1 and false by 0 in any analysis you perform. If you want to enter coded data such as M, F, Male, female etc. then please use the search and replace function to convert them to numerical codes before analysis.
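To illustrate the recoding step, the following is a minimal Python sketch of what a search-and-replace conversion achieves; it is not StatsDirect code. The coding scheme (male = 1, female = 0, matched case-insensitively) and the helper name `recode` are hypothetical choices made for this example.

```python
# Hypothetical coding scheme: text labels -> numeric codes (case-insensitive).
SEX_CODES = {"m": 1, "male": 1, "f": 0, "female": 0}

def recode(values, codes):
    """Return numeric codes for text values; raise on an unrecognised label."""
    result = []
    for value in values:
        key = value.strip().lower()
        if key not in codes:
            raise ValueError(f"no numeric code defined for {value!r}")
        result.append(codes[key])
    return result

print(recode(["M", "F", "Male", "female"], SEX_CODES))  # [1, 0, 1, 0]

# Logical data are analysed as 1 (true) and 0 (false); Python's int() agrees.
print(int(True), int(False))  # 1 0
```

Raising an error for an unrecognised label, rather than silently coding it, catches typing errors such as "Mael" before they reach an analysis.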

COPYING AND MOVING DATA

Most spreadsheet software enables you to select cells and copy/paste/delete/move them. Selected cells are displayed with a black background instead of white. To select an entire sheet, click on the top left hand cell where the row headings intersect with the column headings. To select a single range of cells, hold down the left mouse button and drag the mouse over them, or hold down the Shift key and use the arrow keys to highlight the range. To select more than one range at a time, hold down the Ctrl key and select ranges as described. Once you have selected the range(s) you want, you can copy, delete, paste or move the data. To copy the selected range(s) to memory, hold down the Ctrl key and press the C key. You can then hold down the Ctrl key and press the V key to paste the copied range into the same workbook at a different location, into another StatsDirect workbook or into an external application such as Microsoft Excel. Data can be copied from external spreadsheets into StatsDirect in this way also. To move a selected range, position the mouse cursor over a border of the range and drag the range using the mouse. You can also use cut and paste operations to move ranges. The Delete key clears a selected range. If you accidentally delete or move data then press Ctrl+Z or select "Undo" from the Edit menu to undo the change.

LABELLING COLUMNS

StatsDirect analyses most data in columns. The label for a column can be put in the first row or in any other area of that column which might be selected as a sub-column to analyse, i.e. one column of the workbook can contain more than one column for analysis. If a selected range of cells does not contain a string at the top then StatsDirect uses the workbook column label as its heading; otherwise the first string in a selected range is used as the label for that selection. The test data file supplied with StatsDirect uses text strings in row 1 as labels for each column. You can set the header label of a column by double clicking on it with the left mouse button. Row labels can be set in the same way.

ENTERING FORMULAE

If you are familiar with using formulae in spreadsheets such as Microsoft Excel then you will be able to use formulae in StatsDirect workbooks without further instruction. An entry is treated as a formula if you start it with the equals sign "=", e.g. =A1-B1 subtracts the value in column 2 row 1 (B1) from the value in column 1 row 1 (A1). A range of cells is represented by a colon, e.g. A1:B10 is the range from column 1 row 1 to column 2 row 10. If you want to repeat a formula but make it relevant to the columns below or to the right, enter the first formula and move the mouse cursor to the bottom right hand corner of the cell. Now hold down the left mouse button and drag down; you will see that the formulae below are not just copies of the parent formula but that all cell references have been translated to those relevant to the particular row/column you have dragged to. Here the cell references change because they are relative. If you want to make them absolute then put a dollar sign "$" before either part of the reference, e.g. $A1 is absolute column 1 relative row 1 and $A$1 is absolute column 1 row 1. See Worksheet Functions for information on the functions you can use in formulae.
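The fill-down behaviour described above (relative references shift with the copied cell, references marked with "$" stay fixed) can be sketched in Python. This is an illustrative re-implementation of the general spreadsheet convention, not StatsDirect's own code; the helper names are hypothetical, only upper-case A1-style references are recognised, and references shifted off the sheet (row or column below 1) are not guarded against.

```python
import re

# An A1-style cell reference; a "$" before the column letters or the row number
# marks that part as absolute. The look-arounds stop matches inside longer words.
CELL_REF = re.compile(r"(?<![A-Za-z0-9_])(\$?)([A-Z]{1,3})(\$?)([0-9]+)(?![A-Za-z0-9_(])")

def column_to_number(letters):
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def number_to_column(number):
    """Convert a 1-based column number back to letters."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def fill_formula(formula, row_offset, col_offset):
    """Translate relative references as if the formula were dragged by the given offsets."""
    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        new_col = col if col_abs else number_to_column(column_to_number(col) + col_offset)
        new_row = row if row_abs else str(int(row) + row_offset)
        return f"{col_abs}{new_col}{row_abs}{new_row}"
    return CELL_REF.sub(shift, formula)

print(fill_formula("=A1-B1", 1, 0))       # =A2-B2
print(fill_formula("=$A1-$A$1", 2, 3))    # =$A3-$A$1
print(fill_formula("=SUM(A1:A3)", 1, 0))  # =SUM(A2:A4)
```

Each part of a reference is handled independently, which is why $A1 keeps its column but still moves down the rows.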

HINTS

1. Confidence intervals (CI) are very useful in statistical inference. StatsDirect places strong emphasis on confidence interval analysis. Wherever possible, the most exact method for the CI has been used. Before calculation of a CI, a dialogue box asks you to select a coefficient of confidence. The default 95% confidence level is selected routinely when you press the Enter key. You can turn off this dialogue box using the options selection of the analysis menu; in this situation a 95% CI is selected automatically.

2. Some of the StatsDirect functions are time consuming. When a process is taking an appreciable amount of time, the mouse pointer changes from an arrow to an hourglass and a progress meter is often displayed.

Excel links

StatsDirect can share data with Microsoft Excel in two ways:

A. Reads Excel compatible files directly

B. Links via the StatsDirect-Excel link add-in (for Excel 5, 95, 97 or 2000).

Reading Excel files into StatsDirect

StatsDirect can read Microsoft Excel compatible files directly: use the Open item in the File menu.

Using the StatsDirect-Excel link add-in with Microsoft Excel 5, 95, 97 or 2000

The first time you run StatsDirect it will look to see if you have a compatible version of Microsoft Excel. If you do, StatsDirect will install an add-in into Excel that gives you a new menu in Excel called "StatsDirect". This add-in provides a data link between Excel and StatsDirect. If you are working in Excel and wish to analyse data in StatsDirect then all you need do is select "StatsDirect" from the "StatsDirect" menu in Excel. When a workbook is transferred from Excel to StatsDirect it will be labelled "~Excel" in StatsDirect.

If the automatic installation of the Excel-StatsDirect data link add-in fails then you can add it manually: start Excel, go to the "Tools" section of the menu, then to "Add-Ins", then to "Browse" and search for StatsDirectExcelLink.xla in the directory where you installed StatsDirect (usually C:\Program Files\StatsDirect).

The StatsDirectExcelLink.xla file is copied to the current user's startup directory for Excel, which means that it loads when Excel starts up. You can switch this on or off manually via the Tools_Setup Tools menu.

Calculator

Menu location: Tools_StatsDirect Calculator.

The StatsDirect calculator can be used both within StatsDirect and independently as a replacement for the Windows calculator. The calculator evaluates expressions in the form of simple arithmetic or more complex algebra.

All calculations are performed in IEEE double precision.

The "save" button copies the expression currently evaluated and its result to a list from which you can select saved expressions to paste into new ones.

When you close the calculator it will paste saved expressions and results into a report in StatsDirect if the "save results to report on exit" box at the bottom left of the calculator is checked.

Constants
PI	3.14159265358979323846 (π)
EE	2.71828182845904523536 (e)

Arithmetic Functions
ABS	absolute value
CLOG	common (base 10) logarithm
CEXP	anti log (base 10)
EXP	anti log (base e)
LOG	natural (base e, Naperian) logarithm
LOGIT	logit: log(p/(1-p)), p = proportion
ALOGIT	antilogit: exp(l)/(1+exp(l)), l = logit
SQR or SQRT	square root
!	factorial (maximum 170.569)
LOG!	log factorial
IZ	normal deviate for a p value
UZ	upper tail p for a normal deviate
LZ	lower tail p for a normal deviate
TRUNC or FIX	integer part of a real number
CINT	real number rounded to nearest integer
INT	real number truncated to integer closest to zero

Please note that the largest factorial allowed is 170.569398315538748, but you can work with log factorials via the LOG! function, e.g. LOG!(272).
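For checking calculator results outside StatsDirect, several of the functions listed above have straightforward Python analogues using the standard `math` and `statistics` modules. This is a sketch, not StatsDirect's implementation: the function names are hypothetical, and the tail convention for IZ (returning the deviate whose upper tail probability is p) is an assumption, since the table above does not state which tail is used.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def logit(p):
    """LOGIT analogue: log(p / (1 - p)) for a proportion 0 < p < 1."""
    return math.log(p / (1 - p))

def alogit(l):
    """ALOGIT analogue: exp(l) / (1 + exp(l)), the inverse of logit."""
    return math.exp(l) / (1 + math.exp(l))

def uz(z):
    """UZ analogue: upper tail probability of the standard normal beyond z."""
    return 1 - STANDARD_NORMAL.cdf(z)

def lz(z):
    """LZ analogue: lower tail probability of the standard normal below z."""
    return STANDARD_NORMAL.cdf(z)

def iz(p):
    """IZ analogue, ASSUMING an upper-tail convention: the z with uz(z) == p."""
    return STANDARD_NORMAL.inv_cdf(1 - p)

def log_factorial(x):
    """LOG! analogue: log(x!) computed as lgamma(x + 1), which avoids overflow."""
    return math.lgamma(x + 1)

print(round(uz(1.96), 4))   # 0.025
print(round(iz(0.025), 2))  # 1.96
print(log_factorial(272))   # finite, although 272! itself overflows a double
```

Working with log factorials (here via `math.lgamma`) is what makes quantities such as LOG!(272) usable even though 272! is far beyond the range of a double.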

Arithmetic Operators
^	exponentiation (to the power of)
+	addition
-	subtraction
*	multiplication
/	division
\	integer division

Calculations give an order of priority to arithmetic operators; this must be considered when entering expressions. For example, the result of the expression "6 - 3/2" is 4.5 and not 1.5 because division takes priority over subtraction.

Priority of arithmetic operators in descending order

1. Exponentiation (^)

2. Negation (-X) (exception: in x^-y the negation applies to the exponent, e.g. 4^-2 is 0.0625 and not -16)

3. Multiplication and Division (*, /)

4. Integer Division (\)

5. Addition and Subtraction (+, -)
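The priority rules above can be checked against Python, whose operators follow the same relative order for exponentiation, negation, multiplication, division, addition and subtraction. The one difference worth noting is integer division: Python's `//` has the same priority as `*` and `/`, whereas the list above ranks StatsDirect's `\` below them, so the StatsDirect order has to be written with explicit parentheses. The example uses positive operands only, because the text does not describe how `\` treats negative or fractional values.

```python
# Division before subtraction, as in "6 - 3/2"
print(6 - 3 / 2)   # 4.5

# A negated exponent, as in "4^-2"
print(4 ** -2)     # 0.0625

# Exponentiation before negation, so "-4^2" means -(4^2)
print(-4 ** 2)     # -16

# In the list above, * is done before \, so "7 \ 2 * 2" means 7 \ (2 * 2).
# Python's // shares a level with *, so the parentheses must be written out.
statsdirect_order = 7 // (2 * 2)
python_default = 7 // 2 * 2
print(statsdirect_order, python_default)  # 1 6
```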

Trigonometric Functions
ARCCOS	arc cosine
ARCCOSH	arc hyperbolic cosine
ARCCOT	arc cotangent
ARCCOTH	arc hyperbolic cotangent
ARCCSC	arc cosecant
ARCCSCH	arc hyperbolic cosecant
ARCTANH	arc hyperbolic tangent
ARCSEC	arc secant
ARCSECH	arc hyperbolic secant
ARCSIN	arc sine
ARCSINH	arc hyperbolic sine
ATN	arc tangent
COS	cosine
COT	cotangent
COTH	hyperbolic cotangent
CSC	cosecant
CSCH	hyperbolic cosecant
SIN	sine
SINH	hyperbolic sine
SECH	hyperbolic secant
SEC	secant
TAN	tangent
TANH	hyperbolic tangent

To convert degrees to radians, multiply degrees by pi/180. To convert radians to degrees, multiply radians by 180/pi.

Logical Functions
AND	logical AND
NOT	logical NOT
OR	logical OR
<	less than
=	equal to
>	greater than

Analysis

Available under the Analysis menu section at all times:

EXACT TESTS ON COUNTS

CHI-SQUARE TESTS

PROPORTIONS

RATES

SAMPLE SIZE

DISTRIBUTIONS

RANDOMIZATION

MISCELLANEOUS

Also available under the Analysis menu section when a workbook is active:

DESCRIPTIVE STATISTICS

PARAMETRIC METHODS

NON-PARAMETRIC METHODS

ANALYSIS OF VARIANCE

REGRESSION AND CORRELATION

AGREEMENT ANALYSIS

SURVIVAL ANALYSIS

META-ANALYSIS

CROSSTABS

FREQUENCIES

GRAPHICS

Statistics

Statistics with an upper case letter S refers to the science and discipline of Statistics, which can be defined as the measurement of uncertainty.

Statistics with a lower case letter s refers to numbers that summarise other numbers in some way. For example, the arithmetic mean or average value of a sample of numbers is a statistic commonly used to describe the central location of the distribution of numbers in the population from which the sample was drawn.

The terms sample and population are very important in the language used by Statisticians. Many statistical methods are based upon drawing a sample at random from a population because it would be impractical to study the whole population. Samples drawn at random have mathematical properties that have enabled Statisticians to create numerical methods that measure how uncertain an investigator should be that their sample represents the population they are studying.

You should be familiar with the basic concepts of Statistics before you use this software. Please digest some introductory learning materials, such as Bland (2000) or selected web sites.

The following are basic elements of Statistics that you should understand:

Understanding P values and degrees of freedom

Understanding confidence intervals


P values

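The P value, or calculated probability, is the estimated probability of obtaining a result at least as extreme as the one observed if the null hypothesis (H0) of a study question is true. Rejecting H0 whenever P falls below a chosen level limits the probability of rejecting H0 when it is in fact true to that level.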

The null hypothesis is usually a hypothesis of "no difference", e.g. no difference between blood pressures in group A and group B. Define a null hypothesis for each study question clearly before the start of your study.

The only situation in which you should use a one sided P value is when a large change in an unexpected direction would have absolutely no relevance to your study. This situation is unusual; if you are in any doubt then use a two sided P value.
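A minimal Python sketch of the difference, assuming a test statistic z that follows a standard normal distribution under the null hypothesis and a one-sided alternative of "effect greater than zero" fixed before the study. It shows why a one-sided test is only appropriate when a change in the unexpected direction is irrelevant: a large effect in the opposite direction gives a one-sided P value close to 1, while the two-sided P value is unchanged.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def one_sided_p(z):
    """P value for H1 'effect > 0' (direction fixed before the study)."""
    return 1 - STANDARD_NORMAL.cdf(z)

def two_sided_p(z):
    """P value for H1 'effect != 0': departures in either direction count."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))

for z in (1.96, -1.96):
    print(f"z = {z:+.2f}: one-sided P = {one_sided_p(z):.3f}, two-sided P = {two_sided_p(z):.3f}")
# z = +1.96: one-sided P = 0.025, two-sided P = 0.050
# z = -1.96: one-sided P = 0.975, two-sided P = 0.050
```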

The term significance level (alpha) is used to refer to a pre-chosen probability and the term "P value" is used to indicate a probability that you calculate after a given study.

The alternative hypothesis (H1) is the opposite of the null hypothesis; in plain language terms this is usually the hypothesis you set out to investigate. For example, if the question is "is there a significant (not due to chance) difference in blood pressures between groups A and B if we give group A the test drug and group B a sugar pill?", then the alternative hypothesis is "there is a difference in blood pressures between groups A and B if we give group A the test drug and group B a sugar pill".

If your P value is less than the chosen significance level then you reject the null hypothesis, i.e. accept that your sample gives reasonable evidence to support the alternative hypothesis. It does NOT imply a "meaningful" or "important" difference; that is for you to decide when considering the real-world relevance of your result.

The choice of significance level at which you reject H0 is arbitrary. Conventionally the 5% (less than a 1 in 20 chance of rejecting a true null hypothesis), 1% and 0.1% (P < 0.05, 0.01 and 0.001) levels have been used. These numbers can give a false sense of security.

In the ideal world, we would be able to define a "perfectly" random sample, the most appropriate test and one definitive conclusion. We simply cannot. What we can do is try to optimise all stages of our research to minimise sources of uncertainty. When presenting P values some groups find it helpful to use the asterisk rating system as well as quoting the P value:

P < 0.05 *

P < 0.01 **

P < 0.001 ***

Most authors refer to P < 0.05 as statistically significant and P < 0.001 as statistically highly significant (less than a one in a thousand chance of rejecting a true null hypothesis).

The asterisk system avoids the woolly term "significant". Please note, however, that many statisticians do not like the asterisk rating system when it is used without showing P values. As a rule of thumb, if you can quote an exact P value then do. You might also want to refer to a quoted exact P value as an asterisk in text narrative or tables of contrasts elsewhere in a report.
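A small Python sketch of quoting an exact P value together with the asterisk rating above. The four-decimal format and the choice to report values below 0.0001 as "P < 0.0001" are illustrative conventions, not StatsDirect output rules; the function name `format_p` is hypothetical.

```python
def format_p(p):
    """Quote an exact P value followed by its asterisk rating."""
    if p < 0.001:
        stars = "***"
    elif p < 0.01:
        stars = "**"
    elif p < 0.05:
        stars = "*"
    else:
        stars = ""
    # Very small values are reported with a bound rather than as 0.0000
    text = "P < 0.0001" if p < 0.0001 else f"P = {p:.4f}"
    return f"{text} {stars}".rstrip()

print(format_p(0.0312))  # P = 0.0312 *
print(format_p(0.0004))  # P = 0.0004 ***
print(format_p(0.2))     # P = 0.2000
```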

At this point, a word about error. Type I error is the false rejection of the null hypothesis and type II error is the false acceptance of the null hypothesis. As an aide-memoire: think that our cynical society rejects before it accepts.

The significance level (alpha) is the probability of type I error. The power of a test is one minus the probability of type II error (beta). Power should be maximised when selecting statistical methods. If you want to estimate sample sizes then you must understand all of the terms mentioned here.

The following table shows the relationship between power and error in hypothesis testing:

TRUTH \ DECISION	Accept H₀	Reject H₀
H₀ is true	correct decision, P = 1 - alpha	type I error, P = alpha (significance)
H₀ is false	type II error, P = beta	correct decision, P = 1 - beta (power)

H₀ = null hypothesis
P = probability

If you are interested in further details of probability and sampling theory at this point then please refer to one of the general texts listed in the reference section.

You must understand confidence intervals if you intend to quote P values in reports and papers. Statistical referees of scientific journals expect authors to quote confidence intervals with greater prominence than P values.

Notes about Type I error:

is the incorrect rejection of the null hypothesis

maximum probability is set in advance as alpha

is not affected by sample size as it is set in advance

increases with the number of tests or end points (e.g. do 20 tests of true null hypotheses at alpha = 0.05 and, on average, 1 will be wrongly significant)
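The effect of multiple testing can be put in numbers. The sketch below assumes k independent tests, each of a null hypothesis that is actually true, at level alpha; under these assumptions the expected number of false positives is k * alpha and the chance of at least one is 1 - (1 - alpha)^k. The helper name is hypothetical.

```python
def multiple_testing_risk(alpha, k):
    """For k independent tests of true null hypotheses at level alpha, return
    (expected number of false positives, probability of at least one)."""
    expected = k * alpha
    at_least_one = 1 - (1 - alpha) ** k
    return expected, at_least_one

expected, at_least_one = multiple_testing_risk(0.05, 20)
print(f"expected false positives: {expected:.2f}")            # 1.00
print(f"P(at least one false positive): {at_least_one:.3f}")  # 0.642
```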

Notes about Type II error:

is the incorrect acceptance of the null hypothesis

probability is beta

beta depends upon sample size and alpha

can't be estimated except as a function of the true population effect

beta gets smaller as the sample size gets larger

beta gets smaller as the number of tests or end points increases
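To see how beta depends on sample size and alpha, here is a Python sketch for one simple case: a two-sided two-sample z test with a known common standard deviation and equal group sizes. This textbook normal approximation is swapped in for illustration and is not necessarily the method used by StatsDirect's sample size routines; the effect size, standard deviation and helper name are hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def power_two_sample_z(delta, sigma, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample z test (known common sigma, equal groups)
    to detect a true difference in means of delta. beta = 1 - power."""
    se = sigma * math.sqrt(2 / n_per_group)
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    # Probability that the test statistic falls in either rejection region
    return STANDARD_NORMAL.cdf(shift - z_crit) + STANDARD_NORMAL.cdf(-shift - z_crit)

for n in (20, 64, 100):
    power = power_two_sample_z(delta=0.5, sigma=1.0, n_per_group=n)
    print(f"n = {n:3d} per group: power = {power:.3f}, beta = {1 - power:.3f}")
```

Raising n_per_group or alpha increases power (reduces beta). The true difference delta has to be assumed, which is why beta can only be stated as a function of the true population effect.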

Confidence Intervals

Statisticians stress the importance of using confidence intervals (CIs). There is, however, debate over which type of CIs to use and how best to define and interpret them. In spite of this confusion, you should use CIs to express the results of statistical tests because they convey more information than P values alone.

StatsDirect documentation uses the common (see below) interpretation of CIs. The CI included with each StatsDirect function is discussed in the help text for that function. In order to understand how CIs relate to specific statistical methods, read the interpretation of CI in the worked examples of StatsDirect help text.

The confidence level sets the boundaries of a confidence interval; this is conventionally set at 95% to coincide with the 5% convention of statistical significance in hypothesis testing. In some studies narrower (e.g. 90%) or wider (e.g. 99%) confidence intervals will be required. This rather depends upon the nature of your study. You should consult a statistician before using CIs other than 95%.
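The relationship between confidence level and width can be shown with a normal-approximation CI for a mean, mean +/- z * sd / sqrt(n). This is a sketch with hypothetical summary data and a hypothetical helper name; for small samples a t-based interval would normally be used instead.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation CI for a mean: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical summary data: mean 120, sd 15, n 50
for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(120, 15, 50, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f} (width {high - low:.1f})")
```

Only the multiplier z changes with the level (about 1.64, 1.96 and 2.58 here), so a 90% interval is always narrower, and a 99% interval wider, than the 95% interval from the same data.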

You will hear the terms confidence interval and confidence limit used. The confidence interval is the range Q-X to Q+Y, where Q is the value that is central to the study question, Q-X is the lower confidence limit and Q+Y is the upper confidence limit.

Familiarise yourself with alternative CI interpretations:

Common

A 95% CI is the interval that you are 95% certain contains the true population value as it might be estimated from a much larger study.

The value in question can be a mean, differencebetween two means, a proportion etc. The CI is usually, but not necessarily,symmetrical about this value.

Pure Bayesian

The Bayesian concept of a credible interval is sometimes put forward as a more practical concept than the confidence interval. For a 95% credible interval, the value of interest (e.g. size of treatment effect) lies with a 95% probability in the interval. This interval is then open to subjective moulding of interpretation. Furthermore, the credible interval can only correspond exactly to the confidence interval if the prior probability is so-called "uninformative".

Pure frequentist

Most pure frequentists say that it is not possible to make probability statements, such as CI interpretation, about the study values of interest in hypothesis tests.

Neymanian

A 95% CI is the interval which will contain the true value on 95% of occasions if a study were repeated many times using samples from the same population.
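The Neymanian interpretation can be checked by simulation: repeat a study many times on samples from the same population and count how often the interval contains the true value. The sketch assumes normally distributed data with a known standard deviation, so that the simple interval mean +/- z * sigma / sqrt(n) applies; the population values, sample size, seed and helper name are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def simulate_coverage(true_mean=10.0, sigma=2.0, n=25, repeats=2000, level=0.95, seed=1):
    """Repeat a study many times and return the fraction of CIs containing the true mean.
    Uses the known-sigma interval mean +/- z * sigma / sqrt(n) for simplicity."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        sample_mean = fmean([rng.gauss(true_mean, sigma) for _ in range(n)])
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repeats

print(f"coverage: {simulate_coverage():.3f}")  # close to 0.95
```

Any single interval either contains the true mean or it does not; the 95% describes the long-run behaviour of the procedure.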

Neyman originated the concept of CI as follows: if we test a large number of different null hypotheses at one critical level, say 5%, then we can collect all of the null hypotheses that are not rejected into one set. This set usually forms a continuous interval that can be derived mathematically, and Neyman described the limits of this set as confidence limits that bound a confidence interval. If the critical level (probability of incorrectly rejecting the null hypothesis) is 5% then the interval is 95%. Any values of the treatment effect that lie outside the confidence interval are regarded as "unreasonable" in terms of hypothesis testing at the critical level.
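Neyman's construction can be mimicked numerically: test a fine grid of null values mu0 and keep those the test does not reject. The sketch below assumes a two-sided z test with known sigma, hypothetical data and hypothetical helper names; the limits of the retained set agree with the usual formula to within the grid spacing.

```python
import math
from statistics import NormalDist

def rejects(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z test of H0: mean == mu0 at level alpha (sigma known)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha

def ci_by_inversion(sample_mean, sigma, n, alpha=0.05, step=0.001, span=5.0):
    """Keep every grid value mu0 that is NOT rejected; its limits approximate the CI."""
    kept = []
    for i in range(int(2 * span / step) + 1):
        mu0 = sample_mean - span + i * step
        if not rejects(sample_mean, mu0, sigma, n, alpha):
            kept.append(mu0)
    return min(kept), max(kept)

# Hypothetical data: sample mean 50, known sigma 10, n 40
low, high = ci_by_inversion(50.0, 10.0, 40)
half_width = NormalDist().inv_cdf(0.975) * 10.0 / math.sqrt(40)
print(f"by inversion: {low:.2f} to {high:.2f}")
print(f"by formula:   {50 - half_width:.2f} to {50 + half_width:.2f}")
```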

Degrees of freedom

The concept of degrees of freedom is central to the principle of estimating statistics of populations from samples of them. "Degrees of freedom" is commonly abbreviated to df.

In short, think of df as a mathematical restriction that we need to put in place when we calculate an estimate of one statistic from an estimate of another.

Let us take an example of data that have been drawn at random from a normal distribution. Normal distributions need only two parameters (mean and standard deviation) for their definition; e.g. the standard normal distribution has a mean of 0 and standard deviation (sd) of 1. The population values of mean and sd are referred to as mu and sigma respectively, and the sample estimates are x-bar and s.

In order to estimate sigma, we must first have estimated mu. Thus, mu is replaced by x-bar in the formula for sigma. In other words, we work with the deviations from mu estimated by the deviations from x-bar. At this point, we need to apply the restriction that the deviations must sum to zero. Thus, degrees of freedom are n-1 in the equation for s below:

Standard deviation in a population is:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$

[x is a value from the population, μ is the mean of all x, n is the number of x in the population, Σ is the summation]

The estimate of population standard deviation calculated from a random sample is:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

[x is an observation from the sample, x-bar is the sample mean, n is the sample size, Σ is the summation]
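The two formulae above can be compared on a small hypothetical sample in Python; the standard library's `statistics.pstdev` (divisor n) and `statistics.stdev` (divisor n - 1) serve as a cross-check. The first calculation uses x-bar in place of the unknown mu, which is exactly the situation in which the divisor n - 1 is needed.

```python
import math
import statistics

data = [4.0, 7.0, 6.0, 9.0, 4.0]  # hypothetical sample
n = len(data)
x_bar = sum(data) / n

deviations = [x - x_bar for x in data]
print(sum(deviations))  # 0.0: the restriction that costs one degree of freedom

# Divisor n (population formula, with x-bar standing in for mu) versus divisor n - 1 (s)
sd_population_formula = math.sqrt(sum(d * d for d in deviations) / n)
s = math.sqrt(sum(d * d for d in deviations) / (n - 1))

print(round(sd_population_formula, 4), round(statistics.pstdev(data), 4))  # 1.8974 1.8974
print(round(s, 4), round(statistics.stdev(data), 4))                        # 2.1213 2.1213
```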

When this principle of restriction is applied to regression and analysis of variance, the general result is that you lose one degree of freedom for each parameter estimated prior to estimating the (residual) standard deviation.
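For simple linear regression two parameters (intercept and slope) are estimated before the residual standard deviation, so the residual degrees of freedom are n - 2. A minimal least-squares sketch in Python, with hypothetical data and a hypothetical helper name:

```python
import math

def residual_sd_simple_regression(x, y):
    """Fit y = a + b*x by least squares and return (residual SD, residual df).
    Two parameters (a and b) are estimated first, so the residual df is n - 2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    df = n - 2
    return math.sqrt(sum(r * r for r in residuals) / df), df

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
sd_resid, df = residual_sd_simple_regression(x, y)
print(f"residual SD = {sd_resid:.3f} on {df} degrees of freedom")
```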

Another way of thinking about the restriction principle behind degrees of freedom is to imagine contingencies. For example, imagine you have four numbers (a, b, c and d) that must add up to a total of m; you are free to choose the first three numbers at random, but the fourth must be chosen so that it makes the total equal to m, so you have three degrees of freedom.

Epidemiology

Epidemiology is the study of the distribution and determinants of health-related states and events in specified populations.

Last's Dictionary of Epidemiology (2000)

Epidemiologists use a rich language to describe how they apply statistical methods to the study of populations in order to work out, for example, the causes of diseases.

If a population is exposed to some factor, called the exposure, Epidemiologists usually study the relationship between the exposure and relevant health outcomes, for example cigarette smoking and lung cancer.

A very important question that Epidemiologists must ask themselves when thinking about a numerical association between some exposure(s) and outcome(s) is "how might I be wrong?". The answer is by:

Chance

Bias

Confounding

You should understand the basic concepts of causality, chance, bias and confounding in order to start to work with epidemiological problems. You should also understand the basic principles of study design, for example prospective vs. retrospective studies.

There are several introductory textbooks, either under a cover title of Epidemiology or Statistics (e.g. Bland 2000), and web sites.


Causality

Lots of things can be associated with outcomes that we wish to study but few of them are meaningful causes.

In Epidemiology, the following criteria due to Bradford Hill are used as evidence to support a causal association:

1. Plausibility (known path)

2. Consistency (same results if repeated in a different time, place or person)

3. Temporal relationship

4. Strength (with or without a dose response relationship)

5. Specificity (causal factor relates only to the outcome in question - not often)

6. Change in risk factor (i.e. incidence drops if risk factor removed)

Elwood's criteria are a modern extension of this concept:

1. Descriptive evidence
    exposure or intervention
    design
    population
    main result

2. Non-causal explanation
    chance
    bias
    confounding

3. Positive features
    time
    strength
    dose-response
    consistency
    specificity

4. Generalisability
    to eligible population
    to source population
    to other populations

5. Comparison with other evidence
    consistency
    specificity
    plausibility and coherence


Bias

Bias is a systematic error that leads to an incorrect estimate of effect or association. Many factors can bias the results of a study such that they cancel out, reduce or amplify a real effect you are trying to describe.

Epidemiology categorises types of bias; examples are:

Selection bias - e.g. a study of car ownership in central London is not representative of the UK

Observation bias (recall and information) - e.g. on questioning, healthy people are more likely to under-report their alcohol intake than people with a disease.

Observation bias (interviewer) - e.g. different interviewer styles might provoke different responses to the same question.

Observation bias (misclassification) - tends to dilute an effect

Losses to follow up - e.g. ill people may not feel able to continue with a study whereas healthy people tend to complete it.

Some strategies to combat bias:

multiple control groups

standardised observations (e.g. blinding (don't know if placebo or active intervention) of subject, observer, both subject and observer (double blind) or subject, observer and analyst (triple blind))

corroboration of multiple information sources

use of dummy variables with known associations

Confounding

In Epidemiology a confounder is:

not part of the real association between exposure and disease

predicts disease

unequally distributed between exposure groups

A researcher can only control a study or analysis for confounders that are:

known

measurable

Example: Grey hair predicts heart disease if it is put into a multiple regression model because it is unequally distributed between people who do have heart disease (the elderly) and those who don't (the young). Grey hair confounds thinking about heart disease because it is not a cause of heart disease.

Strategies to reduce confounding are:

randomisation (aim is random distribution of confounders between study groups)

restriction (restrict entry to study of individuals with confounding factors - risks bias in itself)

matching (of individuals or groups, aim for equal distribution of confounders)

stratification (confounders are distributed evenly within each stratum)

adjustment (usually distorted by choice of standard)

multivariate analysis (only works if you can identify and measure the confounders)
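The grey hair example can be made concrete with hypothetical counts in which age drives both grey hair and heart disease. Within each age stratum grey hair has no association with disease (odds ratio 1), yet pooling the strata produces a crude odds ratio well above 1. Stratifying and combining the strata with the Mantel-Haenszel odds ratio, a standard adjustment method, removes the spurious association. The counts and helper names are illustrative only.

```python
# Hypothetical counts per age stratum:
# (grey & disease, grey & no disease, not grey & disease, not grey & no disease)
strata = {
    "young": (5, 95, 45, 855),   # disease risk 5% whether grey or not
    "old": (160, 640, 40, 160),  # disease risk 20% whether grey or not
}

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Odds ratio pooled across strata (Mantel-Haenszel), i.e. adjusted for the stratifier."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

crude = tuple(sum(t[i] for t in strata.values()) for i in range(4))
print(f"crude OR (ignoring age): {odds_ratio(*crude):.2f}")  # 2.68
for name, table in strata.items():
    print(f"OR within {name} stratum: {odds_ratio(*table):.2f}")  # 1.00
print(f"Mantel-Haenszel OR: {mantel_haenszel_or(strata.values()):.2f}")  # 1.00
```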


Prospective vs. retrospective studies

Prospective

A prospective study watches for outcomes, such as the development of a disease, during the study period and relates this to other factors such as suspected risk or protection factor(s). The study usually involves taking a cohort of subjects and watching them over a long period. The outcome of interest should be common; otherwise, the number of outcomes observed will be too small to be statistically meaningful (indistinguishable from those that may have arisen by chance). All efforts should be made to avoid sources of bias such as the loss of individuals to follow up during the study. Prospective studies usually have fewer potential sources of bias and confounding than retrospective studies.

Retrospective

A retrospective study looks backwards and examines exposures to suspected risk or protection factors in relation to an outcome that is established at the start of the study. Many valuable case-control studies, such as Lane-Claypon's 1926 investigation of risk factors for breast cancer, were retrospective investigations. Most sources of error due to confounding and bias are more common in retrospective studies than in prospective studies. For this reason, retrospective investigations are often criticised. If the outcome of interest is uncommon, however, the size of prospective investigation required to estimate relative risk is often too large to be feasible. In retrospective studies the odds ratio provides an estimate of relative risk; the approximation is good when the outcome is rare. You should take special care to avoid sources of bias and confounding in retrospective studies.
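Why the odds ratio can stand in for relative risk is easiest to see with hypothetical counts. The sketch computes both measures for a whole source population with a rare outcome, then the odds ratio from a case-control sample that takes all cases and a fixed 1% of non-cases as controls (expected counts are used, so there is no sampling noise).

```python
# Hypothetical source population (cohort view) with a rare outcome
exposed_cases, exposed_total = 30, 10_000
unexposed_cases, unexposed_total = 20, 20_000

risk_ratio = (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

exposed_noncases = exposed_total - exposed_cases
unexposed_noncases = unexposed_total - unexposed_cases
odds_ratio_population = (exposed_cases * unexposed_noncases) / (unexposed_cases * exposed_noncases)

# Case-control view: all cases, plus controls drawn from non-cases at a fixed fraction.
# Sampling controls this way leaves the odds ratio unchanged.
sampling_fraction = 0.01
exposed_controls = exposed_noncases * sampling_fraction
unexposed_controls = unexposed_noncases * sampling_fraction
odds_ratio_case_control = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

print(f"relative risk: {risk_ratio:.3f}")                                # 3.000
print(f"odds ratio, whole population: {odds_ratio_population:.3f}")     # 3.006
print(f"odds ratio, case-control sample: {odds_ratio_case_control:.3f}")  # 3.006
```

The relative risk itself cannot be computed from the case-control sample, because the controls are only a fraction of the non-cases; the odds ratio survives that sampling, and with a rare outcome it is close to the relative risk.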

Prospective investigation is required to make precise estimates of either the incidence of an outcome or the relative risk of an outcome based on exposure.

Case-Control studies

Case-control studies are usually but not exclusively retrospective; the opposite is true for cohort studies. The following notes relate case-control to cohort studies:

outcome is measured before exposure

controls are selected on the basis of not having the outcome

good for rare outcomes

relatively inexpensive

smaller numbers required

quicker to complete

prone to selection bias

prone to recall/retrospective bias

related methods are risk (retrospective), chi-square 2 by 2 test, Fisher's exact test, exact confidence interval for odds ratio, odds ratio meta-analysis and conditional logistic regression.
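As a rough check on an odds ratio from a case-control table, Woolf's logit method gives an approximate confidence interval from the standard error of the log odds ratio. This is a swapped-in approximation for illustration, not the exact confidence interval for the odds ratio listed above; it should not be relied on when counts are small, and it fails if any cell is zero. The counts and helper name are hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_woolf_ci(a, b, c, d, level=0.95):
    """Odds ratio with Woolf's (logit) approximate CI.
    a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    low = math.exp(math.log(or_hat) - z * se_log)
    high = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, low, high

# Hypothetical case-control counts
or_hat, low, high = odds_ratio_woolf_ci(a=40, b=20, c=60, d=80)
print(f"OR = {or_hat:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

The interval is symmetric on the log scale, which is why it is not symmetric about the odds ratio itself.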

Cohort studies

Cohort studies are usually but not exclusively prospective; the opposite is true for case-control studies. The following notes relate cohort to case-control studies:

outcome is measured after exposure

yields true incidence rates and relative risks

may uncover unanticipated associations with outcome

best for common outcomes

expensive

requires large numbers

takes a long time to complete

prone to attrition bias (compensate by using person-time methods)

prone to the bias of change in methods over time

related methods are risk (prospective), relative risk meta-analysis, risk difference meta-analysis and proportions
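The basic cohort measures can be computed directly from a 2 x 2 table of counts. The sketch below uses hypothetical counts and helper names; the incidence rate line shows the person-time idea, in which each subject contributes only the time actually followed.

```python
def cohort_summary(exposed_events, exposed_n, unexposed_events, unexposed_n):
    """Risks, relative risk and risk difference from a cohort 2 x 2 table."""
    risk_exposed = exposed_events / exposed_n
    risk_unexposed = unexposed_events / unexposed_n
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
    }

def incidence_rate(events, person_time):
    """Events per unit of person-time; subjects who leave early still contribute."""
    return events / person_time

summary = cohort_summary(30, 1000, 10, 1000)  # hypothetical counts
print(summary)
print(f"{incidence_rate(30, 4500) * 1000:.2f} events per 1000 person-years")  # 6.67
```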

...

Southern Medical University Medical Examination Website