Essentials of Biostatistics. 12. Multiple measurements and their simultaneous consideration

Indrayan, A.; Satyanarayana, L. (2001) Essentials of Biostatistics. 12. Multiple measurements and their simultaneous consideration. Indian Pediatrics, 38. pp. 741-756. ISSN 0019-6061

Full text not available from this repository.

Official URL: http://www.indianpediatrics.net/july2001/july-741-...

Abstract

The previous Article (1) of this series discussed an example of multiple measurements, such as body fat, weight, height and triceps skinfold thickness in children, in the context of a multiple regression setup. That relationship was between one dependent variable (y) and a set of independent variables (xs). How should one proceed if there is more than one dependent variable? For example, thyroid function is evaluated by simultaneous consideration of T3, T4 and TSH. These might be related to age, diet, exercise, stress, etc. Here we have a set of three dependent variables, namely the three thyroid parameters. The classical multiple regression discussed earlier has only one dependent variable. Simultaneous consideration of many dependent variables requires multivariate methods.
Let us further clarify the distinction between a multivariate setup and a univariate setup. In the univariate multiple regression situation, too, the number of variables or measurements is more than one. However, only one is dependent and the others are independent variables or regressors. If a multiple regression tells us how pulse rate depends on body temperature and diastolic blood pressure (DBP) level in children with pyrexia of unknown origin, then the question it is supposed to answer is: What pulse rate is expected in a child with temperature 101°F and DBP 60 mmHg? Thus the regressors are considered fixed and known in this situation. Only the response y, which is pulse rate in our example, is considered to be subject to sampling fluctuation. Although regression can be used in a cross-sectional study where both y and xs are simultaneously observed (and thus both are subject to fluctuation), it is interpreted as if the xs were fixed. Since only one variable y is considered stochastic (2), the regression of the previous Article is essentially a univariate technique though the data are multivariate. For a genuine multivariate setup, it is essential that there are several stochastic variables.
The second essential requirement for a valid multivariate setup is that these stochastic variables are interrelated. Physical growth of infants is assessed by measurements that include weight (y1), length (y2), and head circumference (y3). These are interrelated. Maternal weight (x1), maternal height (x2), breastfeeding (x3) and infections (x4) could be among the determinants of infant growth. If the ys are not interrelated, weight, length, and head circumference can be analysed separately using univariate techniques. But the conclusions so reached would be valid separately for weight, length, and head circumference, not jointly for growth. The reasons for this are: (a) every univariate conclusion based on a statistical test of hypothesis is subject to a defined chance of Type I error, such as 0.05. Individual conclusions combined together would have a much higher chance of error than the specified threshold of 0.05. This situation is the same as that of multiple comparisons discussed earlier (3); (b) individual univariate analyses ignore the correlations among the measurements when they are correlated. Special methods are required that keep the total probability of Type I error within the limit and give due consideration to the correlation structure. Multivariate methods take care of both these problems.
Multivariate analysis is an intricate process. We try to explain it in simple terms. Our objective in this Article is only to apprise you of the situations where these methods could and should be used. The kind of conclusions that can be reached by such methods is discussed. This could help you to consult a statistician when needed. The statistical tests of significance in a multivariate setup are based on a criterion such as Wilks' Λ or Pillai's trace. These are analogous to the F-test in a univariate setup. The details of the multivariate test criteria are not given in this Article. Computer packages are available for analysis.
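The inflation of the Type I error described in reason (a) above can be illustrated with a short calculation. This is only a sketch under a simplifying assumption that is not made in the Article: the k separate tests are treated as statistically independent, each carried out at level alpha. The function name familywise_error is hypothetical and chosen for illustration. With correlated measurements, as in reason (b), the exact figure would differ, which is one reason multivariate criteria are used instead.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one Type I error across k tests.

    Assumption (for illustration only): the k tests are independent
    and each is carried out at significance level alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    # P(no false positive in any test) = (1 - alpha) ** k
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    # Three separate univariate tests (e.g. weight, length, head
    # circumference), each at the 0.05 level.
    for k in (1, 3, 5, 10):
        print(f"{k:2d} tests: overall chance of error = "
              f"{familywise_error(0.05, k):.4f}")
```

For three separate tests at the 0.05 level, the overall chance of at least one false positive under this assumption is about 0.14, nearly three times the specified threshold.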
It is important that the correct method is used for the problem in hand. Computer packages do not yet have that kind of intelligence. Thus, proper discretion is required while using computer packages.
Consider a multivariate situation such as thyroid functions (T3, T4 and TSH) being investigated for their dependence on age, protein intake and body mass index. The objective is to find the form of dependence. A type of analysis called multivariate multiple regression is needed for this setup, in which there is a set of dependent quantitative variables and a set of independent quantitative variables. In another situation, if the dependents are quantitative and the independents are qualitative, then the method used is multivariate analysis of variance (MANOVA). Investigation of dependence of thyroid functions on gender and degree of malnutrition is an example of a MANOVA setup. As mentioned later, this really amounts to finding whether the average values of the different thyroid functions are the same in the two genders and in different grades of malnutrition. These two techniques are discussed in Section 12.1. When the set of dependents is qualitative, the technique called multivariate logistic regression is used. We do not discuss this technique in this Article.
Consider another situation. Suppose different measurements of maternal nutrition are available for different clinical groups of nutrition status in newborn children. A rule is required for discriminating among the groups so that any newborn could be assigned to the most appropriate malnutrition or healthy group on the basis of maternal nutrition measurements. This type of setup is called discriminant analysis. This is discussed in Section 12.2. Another multivariate situation arises when there is no distinction such as dependent or independent among measurements. For example, this can arise when the subjects are to be divided into clinical entities on the basis of signs, symptoms and measurements.
The interest is to search for a correlation structure among subjects or among variables that can explain the observations. Such a problem can be addressed by the techniques of cluster analysis and factor analysis. These are discussed in Section 12.3.
Some investigations involve long series of measurements made at successive points of time. For example, monthly pediatric admissions for various diseases such as poliomyelitis, diarrhea, respiratory infection, etc., constitute a long series for a period of, say, the last 72 months. These longitudinal measurements also correlate with one another. Analysis involving such measurements is called time series analysis. This is discussed in Section 12.4.
In Section 12.5 we give some concluding remarks for this series, which ends with this Article. You might note after reading this Article that multivariate methods are easy neither to adopt nor to interpret. That explains their limited use. Another limitation of these methods is that they require intricate calculations. It is sometimes a challenge to choose a correct computer programme. We advise you to use these methods only in consultation with an expert biostatistician.

Item Type:	Article
Source:	Copyright of this article belongs to Indian Academy of Pediatrics.
ID Code:	73514
Deposited On:	06 Dec 2011 05:20
Last Modified:	06 Dec 2011 05:20
