Structural equation modeling to identify the risk factors of diabetes in the adult population of North India

Background A non-communicable disease risk factor survey (based on the World Health Organization STEPwise approach to Surveillance, i.e., WHO-STEPS) was done in the state of Punjab, India in a multistage stratified sample of 5127 individuals. The study subjects were administered the WHO STEPS questionnaire and also underwent anthropometric and biochemical measurements. This study aimed at exploring the risk factors of diabetes using a Structural Equation Modeling (SEM) approach in the North Indian state of Punjab.
Results Overall prevalence of diabetes mellitus among the study participants was found to be 8.3% (95% CI 7.3–9.4%). The final SEM had an excellent fit based on the model fit indices. The following risk factors were found to have a direct statistically significant effect on blood sugar status: family history of diabetes (4.5), urban residence (3.1), triglycerides (0.46), increasing waist circumference (0.18), systolic blood pressure (0.11), and increasing age (0.05). There were specific indirect effects of alcohol use (1.43, p = 0.001), family history of diabetes (0.844, p = 0.001), age (0.156, p < 0.001), waist circumference (0.028, p < 0.001) and weekly fruit intake (−0.009, p = 0.034) on fasting blood glucose. Indirect effects of waist circumference, alcohol intake and age on blood sugar levels are mediated by raised blood pressure. Waist circumference mediates the indirect effects of age, family history of diabetes, alcohol intake and weekly fruit intake on blood sugar levels. Triglycerides also mediated the indirect effect of age on diabetes.
Conclusions Family history of diabetes, urban residence, alcohol use, increasing age, and waist circumference are the key variables affecting diabetes status in the Indian population. The results of this study further strengthen the evidence that lifestyle changes in the form of physical activity and healthy diet are required to prevent and control diabetes. Those with a family history of diabetes constitute a high risk group and should be targeted with a regular screening and lifestyle intervention package.


Background
Globally, in 2017 around 425 million adults aged 20-79 years had diabetes mellitus, and this number is expected to rise to 629 million by 2045 [1]. More than three-fourths of people with diabetes live in low- and middle-income countries (LMICs). In the South-East Asian region, current estimates indicate that 8.5% of the adult population has diabetes, which is likely to increase to 11.1% by 2045. The region has the second highest number of deaths attributable to diabetes among the seven regions, with 1.1 million deaths estimated in 2017 [1].
Diabetes is growing at an alarming pace in India due to the rise in various risk factors in both urban and rural areas [2]. India is home to more than 74 million people with diabetes which is second only to China [1]. The burden of diabetes mellitus continues to increase and is primarily driven by nutrition, lifestyle and demographic transitions, unhealthy dietary habits, and physical inactivity, in the context of a stronger genetic predisposition to diabetes [3]. Against this background, a better understanding of the changing epidemiology of diabetes in India is required.
Diabetes mellitus is a condition caused by the complex interplay of various factors acting simultaneously, some having direct and some having indirect (i.e., mediated) effects. Several risk factors have been associated with type 2 diabetes: age, obesity, high blood pressure, lipid abnormalities, family history of diabetes, unhealthy diet, and physical inactivity [1,4,5]. Most of the available literature uses traditional regression models, which treat each covariate in the model as having an independent direct effect on diabetes. Very few studies have examined all these factors simultaneously as a network of multiple pathways leading to diabetes [6].
Structural equation modeling (SEM) or path analysis is a powerful multivariate technique that enables measurement of both direct and indirect effects of variables and incorporates models with multiple dependent variables by using several regression equations simultaneously [7]. Thus, the present study employed SEM to test a hypothesized model of variables affecting diabetes status in the North Indian adult population using data from a large representative household survey.

Study design
This study analyses data from a large household survey conducted in the state of Punjab, India using the WHO STEPwise approach to surveillance (STEPS) [8].

Study setting
The survey was carried out in Punjab, a prosperous state in the northern part of India bordering Pakistan, with a population of 27 million according to the 2011 national census. It ranks higher than most other states in terms of Human Development Index, with a per capita income twice the national average [9, 10].

Study sampling strategy
A state-wide non-communicable disease (NCD) risk factor survey based on the WHO-STEPS approach was undertaken in Punjab in 2014-2015. The survey adopted a multistage stratified sampling approach using the 2011 census sampling frame. A three-stage design was employed in urban areas, whereas in rural areas a two-stage sampling design was followed. A total of 100 primary sampling units (PSUs) were selected (60 villages from rural areas and 40 Census Enumeration Blocks from urban areas) by the probability proportional to size (PPS) method. From each selected PSU, 54 households were selected using systematic random sampling. The ultimate sampling units were the households, and one individual in the age group of 18-69 years residing in the selected household was selected using the Kish method. The details of the methodology can be found in another paper [11].
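The two rural sampling stages described above can be illustrated with a minimal sketch. It assumes the systematic variant of PPS selection, which the text does not specify, and uses invented village sizes; the function names `pps_systematic` and `systematic_sample` are hypothetical and are not taken from the survey protocol.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n_select, rng):
    """Pick n_select units with probability proportional to size.

    ASSUMPTION: systematic PPS on the cumulative size list; the paper only
    says "PPS method". A unit larger than the sampling interval could be
    picked more than once.
    """
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n_select
    start = rng.uniform(0, interval)  # random start within the first interval
    chosen, idx = [], 0
    for k in range(n_select):
        point = start + k * interval
        # advance to the unit whose cumulative size range contains the point
        while idx < len(cumulative) - 1 and cumulative[idx] < point:
            idx += 1
        chosen.append(idx)
    return chosen


def systematic_sample(n_units, n_select, rng):
    """Systematic random sample of n_select positions out of n_units."""
    interval = n_units / n_select
    start = rng.uniform(0, interval)
    return [min(int(start + k * interval), n_units - 1) for k in range(n_select)]


rng = random.Random(2014)
# Hypothetical sampling frame: household counts for 400 villages.
village_households = [rng.randint(150, 1200) for _ in range(400)]

selected_villages = pps_systematic(village_households, 60, rng)  # 60 rural PSUs
first = selected_villages[0]
selected_households = systematic_sample(village_households[first], 54, rng)
print(len(selected_villages), len(selected_households))
```

The final step, choosing one adult per household with the Kish method, is not shown because it depends on the household roster and the Kish selection table.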

Data collection instrument
A culturally adapted, Punjabi (local language) translated and pre-tested version of the WHO STEPS questionnaire (version 3.1) was used with minor adaptations [12]. As part of the household survey, sociodemographic and behavioral information on tobacco and alcohol use, diet, physical activity, history of chronic diseases, family history of chronic conditions, health screening, and health care expenditure was collected in step 1. Physical measurements such as height, weight, blood pressure, and waist circumference were taken in step 2. Biochemical tests were conducted to measure fasting blood glucose, total cholesterol, and triglycerides in step 3.

Data collection and operational definitions used
A team of trained investigators collected the survey data. SECA adult portable stadiometer was used to measure height after removing shoes, socks, slippers, and any head gear. It was measured in centimeters up to 0.1 cm. SECA digital weighing scale was used to measure weight of the individuals. The scale was regularly calibrated against a standard weight. The participants were asked to remove footwear and socks, and weight was recorded in kilograms up to 0.1 kg. Waist circumference was measured using a SECA constant tension tape to the nearest 0.1 cm at the level of the midpoint between the inferior margin of the last rib and the iliac crest in the mid-axillary plane. The measurement was taken at the end of a normal expiration with the arms relaxed at the sides.
One serving of vegetable was considered to be one cup of raw green leafy vegetables or 1/2 cup of other vegetables (cooked or chopped raw). One serving of fruit was considered to be one medium size piece of apple, banana, or orange; 1/2 cup of chopped, canned fruit; or 1/2 cup of fruit juice.
Physical activity was assessed using the Global Physical Activity Questionnaire (GPAQ), which was developed by the World Health Organization. This questionnaire assesses physical activity behavior in three domains: work, transport, and leisure time. Activities are classified into three intensity levels (vigorous, moderate, and light) based on the physical effort they require. Participants were classified as sufficiently active if they met the minimum weekly duration of physical activity recommended by WHO, i.e., 150 min of moderate-intensity physical activity, 75 min of vigorous-intensity physical activity, or an equivalent combination of moderate- and vigorous-intensity physical activity achieving at least 600 MET-minutes per week, with each activity performed in bouts of at least 10 min [13]. Body mass index (BMI) was calculated as weight in kilograms divided by height in meters squared. Show cards (pictorial, adapted to the local context) were used to explain to the participants the type of physical activity, servings of fruits and vegetables, and salty food intake. Obesity was defined as a BMI ≥ 27.5 kg/m² for both genders (based on the World Health Organization Expert Consultation for Asian populations) [14]. Abdominal obesity was defined as a waist circumference ≥ 90 cm for men and ≥ 80 cm for women [15].
For blood pressure measurement, electronic equipment (OMRON HEM 7120, Omron Corporation, Kyoto, Japan) was used. After the participant had rested for 5 min, blood pressure was recorded in the sitting position in the right arm supported at the level of the heart. Three blood pressure measurements were taken at 3-min intervals. The final reading was recorded as the average of the last two readings.
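The operational definitions in this paragraph can be written as a few small functions. This is only an illustrative sketch: the function names are hypothetical, and the MET weights of 4 for moderate and 8 for vigorous activity are an assumption taken from the standard GPAQ scoring convention, since the text states only the 600 MET-minute threshold.

```python
MODERATE_MET = 4  # ASSUMPTION: GPAQ convention, not stated in the text
VIGOROUS_MET = 8  # ASSUMPTION: GPAQ convention, not stated in the text


def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def is_obese(bmi_value):
    """Obesity cut-off used in the study (BMI >= 27.5 kg/m^2, both genders)."""
    return bmi_value >= 27.5


def has_abdominal_obesity(waist_cm, sex):
    """Waist circumference >= 90 cm for men, >= 80 cm for women."""
    threshold = 90 if sex == "male" else 80
    return waist_cm >= threshold


def met_minutes_per_week(moderate_min, vigorous_min):
    """Weekly MET-minutes from minutes of moderate and vigorous activity."""
    return MODERATE_MET * moderate_min + VIGOROUS_MET * vigorous_min


def is_sufficiently_active(moderate_min, vigorous_min):
    """At least 600 MET-minutes per week counts as sufficiently active."""
    return met_minutes_per_week(moderate_min, vigorous_min) >= 600


# Hypothetical participant: 80 kg, 170 cm, waist 92 cm, male,
# 60 min moderate and 30 min vigorous activity per week.
participant_bmi = bmi(80, 170)
print(round(participant_bmi, 1), is_obese(participant_bmi),
      has_abdominal_obesity(92, "male"), is_sufficiently_active(60, 30))
```

With these weights, 150 min of moderate activity or 75 min of vigorous activity each give exactly 600 MET-minutes, which matches the threshold quoted in the text. The 10-min bout requirement is not modeled here.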
Biochemical measurements (step 3): every alternate individual (50%) of the initial sample was subjected to biochemical assessment. Blood glucose was measured by the dry chemistry method using a blood glucose measurement device (Optium H, Freestyle). For the lipid profile, i.e., cholesterol and triglyceride measurements, blood samples were drawn from individuals after 10-12 h of fasting. A 5 ml sample of venous blood was taken in the sitting position, centrifuged immediately to separate serum, and transferred under cold chain conditions to the Central Reference Laboratory of the Department of Biochemistry, Post Graduate Institute of Medical Education and Research, Chandigarh, India, which is a tertiary medical care institute.

Sample size
Taking the estimated prevalence of physical activity as 50%, 5% margin of error and 95% confidence interval, a design effect of 1.5, a sample size of 4609 was derived which was adequate to present results by two age groups (18-44, 45-69), both sexes (male, female) and residence (urban, rural). Assuming a response rate of 85%, sample size was raised to 5400 for this study. Every second individual was subjected to step 3, i.e., biochemical assessment. Out of 2700 respondents eligible for step 3, 2499 (93%) gave consent to blood sampling for biochemical assessment.
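The figures in this paragraph can be approximately reproduced with the usual formula for estimating a proportion. The sketch below assumes z = 1.96 and a multiplier of 8 reporting groups (2 age groups × 2 sexes × 2 residence categories); that multiplier is inferred from the stated total of 4609 and is not given as a formula in the text.

```python
import math


def srs_sample_size(p, margin, z=1.96):
    """Sample size for estimating a proportion under simple random sampling."""
    return z ** 2 * p * (1 - p) / margin ** 2


base = srs_sample_size(p=0.5, margin=0.05)  # per reporting group, about 384
with_deff = base * 1.5                      # design effect of 1.5
strata = 2 * 2 * 2                          # ASSUMPTION: age group x sex x residence
required = with_deff * strata               # about 4609.9
inflated = required / 0.85                  # allowing for an 85% response rate
fielded = 100 * 54                          # PSUs x households per PSU

print(math.floor(required), round(inflated), fielded)
```

Under these assumptions the required size comes to about 4609.9, in line with the reported 4609. Dividing by 0.85 gives roughly 5423, and the reported 5400 equals the 100 PSUs times 54 households of the sampling design.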

Statistical analysis
The conceptual a priori model that specifies the relations among the variables operationalized in this study is based on the model proposed by Bardenheier et al. [17]. We used structural equation modeling with path analysis, which includes the direct and indirect effects of factors previously reported to be associated with diabetes (Fig. 1). A direct effect is depicted as an arrow originating from an independent variable (exposure) and pointing to a dependent variable (outcome); for example, see the arrow between waist circumference and systolic blood pressure. An indirect effect is depicted through a mediating variable, which has an arrow pointing to it from an independent variable and an arrow pointing from it to another dependent variable; for example, waist circumference mediates the effect of alcohol intake on blood sugar levels. A confounder, in this arrow notation, is depicted as a variable with direct effects on both the exposure and the dependent variable.
In this study, we report standardized path coefficients, their standard errors and p values. As indices of the models' statistical fit to the data, we used standard criteria, including comparative fit index (CFI) > 0.90, root mean square error of approximation (RMSEA) < 0.08, and the standardized root mean square residual (SRMSR) < 0.06. Model building and estimation were done using Stata/IC version 12 (StataCorp LP, USA).
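The distinction between direct and indirect effects can be made concrete with a single-mediator path model fitted by least squares. The sketch below runs on simulated data, not the survey data; the variable roles (exposure `x`, mediator `m`, outcome `y`) and the function name `mediation_effects` are hypothetical, and the coefficients are unstandardized, whereas the study reports standardized coefficients from a larger model estimated in Stata.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


def mediation_effects(x, m, y):
    """Direct, indirect and total effect of x on y through one mediator m."""
    sxx, smm = cov(x, x), cov(m, m)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    a = sxm / sxx                           # path x -> m
    det = sxx * smm - sxm ** 2
    direct = (sxy * smm - sxm * smy) / det  # path x -> y, adjusted for m
    b = (smy * sxx - sxm * sxy) / det       # path m -> y, adjusted for x
    indirect = a * b                        # product of the two paths via m
    return {
        "direct": direct,
        "indirect": indirect,
        "total": direct + indirect,
        "unadjusted": sxy / sxx,            # slope of y on x alone
    }


# Simulated data: x affects y directly (0.2) and through m (0.5 * 0.4).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.2 * xi + 0.4 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

effects = mediation_effects(x, m, y)
print({name: round(value, 3) for name, value in effects.items()})
```

In this linear setting the direct and indirect effects add up exactly to the unadjusted slope of the outcome on the exposure, which is the decomposition that path analysis extends to several mediators and outcomes at once.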
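The fit criteria listed above amount to a simple set of threshold checks. The following sketch encodes them; the function name `assess_fit` and the index values in the example are hypothetical and are not the values reported in Table 3.

```python
import operator

# Cut-offs quoted in the text: CFI > 0.90, RMSEA < 0.08, SRMSR < 0.06.
FIT_CRITERIA = {
    "CFI": (operator.gt, 0.90),
    "RMSEA": (operator.lt, 0.08),
    "SRMSR": (operator.lt, 0.06),
}


def assess_fit(indices):
    """Return {index name: True/False} for each criterion present in indices."""
    return {
        name: compare(indices[name], cutoff)
        for name, (compare, cutoff) in FIT_CRITERIA.items()
        if name in indices
    }


example = {"CFI": 0.95, "RMSEA": 0.04, "SRMSR": 0.03}  # hypothetical values
result = assess_fit(example)
print(result, all(result.values()))
```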

Variables assessed in SEM
We selected the sociodemographic, behavioral, anthropometric, and metabolic variables to be included in our SEM based on a literature review of previous theoretical models of diabetes. We assessed 13 variables including age (in years), sex, residence (rural and urban), highest level of education (no formal schooling, up to primary schooling, up to secondary schooling, up to higher secondary, graduate, and postgraduate degree), marital status (never married, currently married, divorced/separated/widowed), current smoking status (positive if the subject smoked in the last 30 days), current alcohol status (positive if the subject consumed alcohol in the last 365 days), BMI (in kg/m²), family history of diabetes (yes or no), waist circumference (in centimeters), systolic blood pressure (in mm Hg), fasting blood glucose (in mg/dl), triglycerides (in mg/dl), total cholesterol (in mg/dl), physical activity (self-reported hours of vigorous-intensity work and sports activities/recreation and hours of walking/cycling), and weekly fruit/vegetable intake (number of servings of fruits/vegetables). We have reported pathways only for standardized path coefficients that were statistically significant at the p < 0.05 level.

Ethics approval
The Institute Ethics Committee of Post Graduate Institute of Medical Education and Research, Chandigarh approved the study (reference number P-727, dated 21 July 2014). Informed written consent was taken from all participants.

SEM model
The final model had excellent fit, and the fully standardized path coefficients are presented in Fig. 1. Non-significant paths were removed and a few additional paths were added to improve model fit. Specifically, education, marital status, BMI, physical activity, smoking status, and serum cholesterol were dropped from the model for better fit. Table 2 shows the coefficients and standard errors of the direct and indirect effects of variables on diabetes status. The goodness-of-fit statistics of the model are shown in Table 3.
The effects of waist circumference, alcohol intake, and increasing age on blood sugar levels are mediated by raised blood pressure. Waist circumference mediates the indirect effects of age, family history of diabetes, alcohol intake, and weekly fruit intake on blood sugar levels. Triglycerides also mediate the indirect effect of age on diabetes status (Fig. 2).

Discussion
This is the first study in India to use SEM to analyze non-modifiable and modifiable sociodemographic, behavioral, and metabolic determinants of diabetes. The following risk factors were found to have a direct statistically significant effect on blood sugar status: family history of diabetes, urban residence, increasing age and waist circumference, blood pressure, and triglycerides. This model also found that there are specific indirect effects of waist circumference, alcohol use, weekly fruit intake, age, and family history of diabetes on fasting blood glucose.
SEM is a second generation multivariate method that allows causal modeling of diseases with a complex interplay of several factors, resulting in measurement of direct and indirect effects.
The prevalence of diabetes in the study was found to be 8.3%, which is slightly lower than the figure from another recent large nationally representative survey (INDIAB study), which reported a prevalence of 10% in the same state of India [5]. Place of residence has a direct association with diabetes status, supported by other studies which show that the prevalence of diabetes continues to be higher in urban areas than in rural areas [5,18,19], although some studies show no urban-rural gap [4].
Overall, family history of diabetes had the strongest effect on the risk of the disease, both direct and indirect. This finding is well supported by the results of the INDIAB study in both rural and urban areas [5]. Family history of diabetes also indirectly affected diabetes status through raised waist circumference. Thus, those with a family history of diabetes constitute a high risk group requiring a package of services which includes regular screening for blood glucose and a lifestyle behavioral intervention package in terms of healthy diet and physical activity.
Raised triglyceride level is a common dyslipidemic feature accompanying type 2 diabetes. It is one of five accepted criteria for defining individuals at high risk for cardio-metabolic diseases including type 2 diabetes, termed the "metabolic syndrome" [16,20,21]. Besides an independent effect of triglycerides, the present study showed that triglyceride levels mediate the effect of age on diabetes, as triglyceride levels increase with age [22].
Body fat distribution (excess abdominal fat) is associated with an increased risk of cardio-metabolic diseases such as diabetes, hypertension, dyslipidemia, and coronary heart disease. Waist circumference is often used as an indicator of abdominal fat [23,24]. The present study showed that waist circumference not only had a direct effect on diabetes but also mediated the indirect effects of age, family history of diabetes, alcohol intake, and weekly fruit intake on blood sugar levels, thereby strengthening the evidence. The study also adds to the evidence base which states that waist circumference is a stronger predictor of diabetes than BMI, apart from other cardio-metabolic diseases [24]. It provides a unique indicator of body fat distribution, which can identify patients who are at increased risk for diabetes and other obesity-related cardio-metabolic disease, above and beyond the measurement of BMI. Waist circumference will help clinicians determine which patients should be evaluated for the presence of cardio-metabolic risk factors, such as dyslipidemia, raised blood pressure and hyperglycemia, and also gauge their response to lifestyle behavioral practices. Thus, we recommend that waist circumference measurement be part of routine clinical care guidelines.
Although previous studies in India did not find alcohol to be an independent risk factor for diabetes [4,5], the present study showed that the effect of alcohol intake on diabetes was mediated through waist circumference and blood pressure. According to the existing evidence in the medical literature, alcohol intake may be a risk factor for obesity or a rise in waist circumference [25-27], though contradictory findings do exist [28-30]. Also, several possible mechanisms have been proposed to explain the association between alcohol consumption and hypertension, which are described elsewhere [31].

Strengths and limitations
The strengths of the study include a large multistage stratified community-based sample, a high participant response rate, use of the standardized STEPS questionnaire, and adherence to the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines for reporting the findings of the study [32]. The study also employed a robust statistical technique, i.e., SEM, which allows measurement of both direct and indirect effects of variables and incorporates models with multiple dependent variables by using several regression equations simultaneously.
There were a few limitations in this study. First, the main outcome measure, i.e., fasting blood glucose, which defined diabetes status, was assessed at a single point in time, when in reality it can naturally fluctuate. Second, capillary blood glucose estimation was done in place of the ideal venous plasma glucose estimation due to logistic constraints such as non-availability of quality-controlled laboratories, storage and transport of blood specimens, varied methods of estimation, and poor compliance with venous blood collection due to its invasive nature. However, it has been reported that capillary blood collection is a feasible and valid alternative to venous blood collection for screening in large epidemiological studies [33,34]. Third, the cross-sectional nature of the survey prevents us from making causal inferences about the outcome.

Conclusions
Family history of diabetes, urban residence, alcohol use, increasing age, and waist circumference are the key variables affecting diabetes status in the Indian population. The results of this study further strengthen the evidence that lifestyle changes in the form of physical activity and healthy diet are required to prevent and control diabetes. Those with a family history of diabetes constitute a high risk group and should be targeted with a regular screening and lifestyle intervention package.