Analysis of Qualitative Data 

Qualitative data are data in the form of words. Examples of qualitative data are interview notes, transcripts of focus groups, answers to open-ended questions, transcriptions of video recordings, accounts of experiences with a product on the Internet, news articles, and the like. Qualitative data can come from a wide variety of primary and/or secondary sources, such as individuals, focus groups, company records, government publications, and the Internet. The analysis of qualitative data is aimed at making valid inferences from the often overwhelming amount of collected data.

Earlier in this book we explained that you can search the Internet for books, journal articles, conference proceedings, company publications, and the like. However, the Internet is more than a mere source of documents; it is also a rich source of textual information for qualitative research. For instance, there are many social networks on the Internet structured around products and services such as computer games, mobile telephones, movies, books, and music. Through an analysis of these social networks researchers may learn a lot about the needs of consumers, about the amount of time consumers spend in group communication, or about the social network that underlies the virtual community. In this way, social networks on the Internet may provide researchers and marketing and business strategists with valuable, strategic information.

The possibilities for qualitative research on the Internet are unlimited, as the following example illustrates. In an effort to find out what motivates consumers to construct protest websites, Ward and Ostrom (2006) examined and analyzed protest websites. A content analysis revealed that consumers construct complaint websites to demonstrate their power, to influence others, and to gain revenge on the organization that betrayed them. This example illustrates how the Internet can be a valuable source of rich, authentic qualitative information. With increasing usage of the Internet, it will undoubtedly become even more important as a source of qualitative and quantitative information.

Qualitative research may involve repeated sampling, collection of data, and analysis of data. As a result, qualitative data analysis may start after only some of the data have been collected. The analysis of qualitative data is not easy. The problem is that, in comparison with quantitative data analysis, there are relatively few well-established and commonly accepted rules and guidelines for analyzing qualitative data. Over the years, however, some general approaches for the analysis of qualitative data have been developed. The approach discussed in this chapter is largely based on the work of Miles and Huberman (1994). According to Miles and Huberman, there are generally three steps in qualitative data analysis: data reduction, data display, and the drawing of conclusions.

The first step in qualitative data analysis is concerned with data reduction. Data reduction refers to the process of selecting, coding, and categorizing the data. Data display refers to ways of presenting the data. A selection of quotes, a matrix, a graph, or a chart illustrating patterns in the data may help the researcher (and eventually the reader) to understand the data. In this way, data displays may help you to draw conclusions based on patterns in the reduced set of data.

Note that qualitative data analysis is not a step-by-step, linear process. Instead, data coding may help you simultaneously to develop ideas on how the data may be displayed, as well as to draw some preliminary conclusions. In turn, preliminary conclusions may feed back into the way the raw data are coded, categorized, and displayed.

This chapter will discuss data reduction, data display, and drawing and verifying conclusions in some detail. To illustrate these steps in qualitative data analysis, we will introduce a case. We will use the case, by means of boxes throughout the chapter, to illustrate key parts of the qualitative research process.

Data Reduction

Qualitative data collection produces large amounts of data. The first step in data analysis is therefore the reduction of data through coding and categorization. Coding is the analytic process through which the qualitative data that you have gathered are reduced, rearranged, and integrated to form theory. The purpose of coding is to help you to draw meaningful conclusions about the data. Codes are labels given to units of text which are later grouped and turned into categories. Coding is often an iterative process: you may have to return to your data repeatedly to increase your understanding of the data (that is, to be able to recognize patterns in the data, to discover connections between the data, and to organize the data into coherent categories).

Coding begins with selecting the coding unit. Indeed, qualitative data can be analyzed at many levels. Examples of coding units include words, sentences, paragraphs, and themes. The smallest unit that is generally used is the word. A larger, and often more useful, unit of content analysis is the theme: “a single assertion about a subject” (Kassarjian, 1977, p. 12). Thus, you might assign a code to a text unit of any size, as long as that unit of text represents a single theme or issue.

Data analysis

Unit of analysis. Since the term “critical incident” can refer to either the overall story of a participant or to discrete behaviors contained within this story, the first step in data analysis is to determine the appropriate unit of analysis (Kassarjian, 1977). In this study, the critical behavior was chosen as the unit of analysis. Accordingly, 600 critical incidents were coded into 886 critical behaviors. For instance, a critical incident in which a service provider does not provide prompt service and treats a customer in a rude manner was coded as containing two critical behaviors (“unresponsiveness” and “insulting behavior”).

Multivariate Tests and Analyses

We will now briefly describe three multivariate techniques: multivariate analysis of variance (MANOVA), discriminant analysis, and canonical correlation. We will also describe, in brief, some of the other multivariate techniques, such as factor analysis, cluster analysis, and multidimensional scaling. Multivariate analyses examine several variables and their relationships simultaneously, in contrast to bivariate analyses, which examine relationships between two variables, and univariate analyses, where one variable at a time is examined for generalization from the sample to the population. These multivariate techniques are presented here only briefly, to give you some idea of their use.

MANOVA is similar to ANOVA, with the difference that ANOVA tests the mean differences of more than two groups on one dependent variable, whereas MANOVA tests mean differences among groups across several dependent variables simultaneously, by using sums of squares and cross-product matrices. Just as multiple t-tests would bias the results (as explained earlier), multiple ANOVA tests, using one dependent variable at a time, would also bias the results, since the dependent variables are likely to be interrelated. MANOVA circumvents this bias by simultaneously testing all the dependent variables, cancelling out the effects of any inter-correlations among them.

In MANOVA tests, the independent variable is measured on a nominal scale and the dependent variables on an interval or ratio scale.

The null hypothesis tested by MANOVA is that the vectors of group means are equal in the population:

H0: μ1 = μ2 = … = μk

The alternate hypothesis is that at least one of the group mean vectors differs from the others:

H1: μi ≠ μj for at least one pair of groups i and j
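To make this concrete, here is a minimal sketch of how such a test could be run in Python with the statsmodels library (one of many packages that could be used; the data frame, the grouping variable group, and the dependent variables dv1 and dv2 are all hypothetical, invented for illustration):

```python
# A minimal MANOVA sketch using statsmodels (hypothetical data).
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 30),  # nominal independent variable
    "dv1": rng.normal(5, 1, 90),              # interval-scaled dependent variable 1
    "dv2": rng.normal(3, 1, 90),              # interval-scaled dependent variable 2
})

# Test both dependent variables simultaneously across the three groups.
fit = MANOVA.from_formula("dv1 + dv2 ~ group", data=df)
print(fit.mv_test())   # Wilks' lambda, Pillai's trace, etc.
```

A Wilks' lambda close to 1 would indicate little difference among the groups across the set of dependent variables taken together.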

Discriminant analysis helps to identify the independent variables that discriminate a nominally scaled dependent variable of interest, say, those who are high on a variable from those who are low on it. The linear combination of independent variables indicates the discriminating function, showing the large difference that exists between the two group means. In other words, the independent variables measured on an interval or ratio scale discriminate between the groups of interest in the study.
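As an illustration only, here is a minimal sketch of a discriminant analysis in Python with scikit-learn; the two predictors and the low/high group labels are hypothetical:

```python
# A minimal discriminant analysis sketch with scikit-learn (hypothetical data).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[3.1, 40], [2.9, 35], [3.8, 60],
              [4.0, 65], [2.5, 30], [3.9, 58]])  # interval-scaled predictors
y = np.array([0, 0, 1, 1, 0, 1])                 # nominal outcome: low/high group

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(lda.coef_)                  # weights of the linear discriminant function
print(lda.predict([[3.5, 50]]))   # classify a new observation into a group
```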

Canonical correlation examines the relationship between two or more dependent variables and several independent variables: for example, the correlation between a set of job behaviors (such as engrossment in work, timely completion of work, and number of absences) and their influence on a set of performance factors (such as quality of work, the output, and rate of rejects). The focus here is on delineating the job behavior profiles associated with performance that result in high-quality production.
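A minimal sketch with scikit-learn's CCA class is shown below; the behavior and performance matrices are randomly generated placeholders, not real data:

```python
# A minimal canonical correlation sketch with scikit-learn (hypothetical data).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
behaviors = rng.normal(size=(50, 3))    # e.g., engrossment, timeliness, absences
performance = rng.normal(size=(50, 3))  # e.g., quality, output, rate of rejects

cca = CCA(n_components=1)
X_c, Y_c = cca.fit_transform(behaviors, performance)

# Correlation between the first pair of canonical variates:
print(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1])
```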

Other types of statistical analyses, such as factor analysis, cluster analysis, and multidimensional scaling, help us to understand how the variables under study form a pattern or structure, in contrast to focusing on predicting the dependent variable or tracing relationships.

Factor analysis helps to reduce a vast number of variables (for example, all the questions tapping several variables of interest in a questionnaire) to a meaningful, interpretable, and manageable set of factors. A principal-component analysis transforms all the variables into a set of composite variables that are not correlated to one another. Suppose we have measured in a questionnaire the four concepts of mental health, job satisfaction, life satisfaction, and job involvement, with 7 questions tapping each. When we factor analyze these 28 items, we should find four factors with the right variables loading on each factor, confirming that we have measured the concepts correctly.
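As a hedged illustration, the following sketch shows how such a 28-item factor analysis might be run in Python with scikit-learn; the response matrix here is random placeholder data, so no clean structure will appear, but with real questionnaire data you would expect each set of seven items to load on its own factor:

```python
# A minimal factor analysis sketch with scikit-learn (hypothetical questionnaire data).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
responses = rng.normal(size=(200, 28))   # 200 respondents x 28 questionnaire items

fa = FactorAnalysis(n_components=4, rotation="varimax")  # we expect four concepts
fa.fit(responses)
loadings = fa.components_.T              # 28 items x 4 factors
print(loadings.round(2))                 # inspect which items load on which factor
```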

Cluster analysis is used to classify objects or individuals into mutually exclusive and collectively exhaustive groups with high homogeneity within clusters and low homogeneity between clusters. In other words, cluster analysis helps to identify objects that are similar to one another, based on some specified criterion. For instance, if our sample consists of a mix of respondents with different brand preferences for a product, cluster analysis will cluster individuals by their preferences for each of the different brands.
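A minimal clustering sketch with scikit-learn's KMeans follows; the brand-preference ratings are invented for illustration:

```python
# A minimal cluster analysis sketch with scikit-learn (hypothetical preference ratings).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
preferences = rng.random((100, 5))       # 100 respondents rating 5 brands

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(preferences)     # cluster membership of each respondent
print(labels[:10])
print(km.cluster_centers_.round(2))      # the "average respondent" in each cluster
```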

Multidimensional scaling groups objects in multidimensional space. Objects that are perceived by respondents to be different are distanced, and the greater the perceptual differences, the greater the distance between the objects in the multidimensional space. In other words, multidimensional scaling provides a spatial portrayal of respondents’ perception of products, services, or other items of interest, and highlights the perceived similarities and differences.
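The sketch below illustrates the idea with scikit-learn's MDS class, using a small hypothetical matrix of perceived dissimilarities among four products:

```python
# A minimal multidimensional scaling sketch (hypothetical dissimilarity judgments).
import numpy as np
from sklearn.manifold import MDS

# Symmetric matrix of perceived dissimilarities among four products (0 = identical).
dissim = np.array([[0, 2, 6, 7],
                   [2, 0, 5, 6],
                   [6, 5, 0, 1],
                   [7, 6, 1, 0]], dtype=float)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)   # 2-D spatial portrayal of the four products
print(coords.round(2))               # similar products end up close together
```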

In sum, multivariate techniques such as MANOVA, discriminant analysis, and canonical correlation help us to analyze the influence of independent variables on the dependent variable in different ways. Other multivariate techniques such as factor analysis, cluster analysis, and multidimensional scaling offer meaningful insights into the data set by forming patterns of the data in one form or the other.

It is advantageous that several univariate, bivariate, and multivariate techniques are available to analyze sample data, so that we can generalize the results obtained from the sample to the population at large. It is, however, important to pay attention to what each hypothesis is, and to use the correct statistical technique to test it, rather than apply advanced but inappropriate techniques.

 

Analysis of Quantitative Data

Two-way ANOVA

Two-way ANOVA can be used to examine the effect of two nonmetric independent variables on a single metric dependent variable. Note that, in this context, an independent variable is often referred to as a factor and this is why a design that aims to examine the effect of two nonmetric independent variables on a single metric dependent variable is often called a factorial design. The factorial design is very popular in the social sciences. Two-way ANOVA enables us to examine main effects (the effects of the independent variables on the dependent variable) but also interaction effects that exist between the independent variables (or factors). An interaction effect exists when the effect of one independent variable (or one factor) on the dependent variable depends on the level of the other independent variable (factor).
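To make the factorial design concrete, here is a minimal two-way ANOVA sketch using the statsmodels library; the factors (training, gender), the performance scores, and the balanced layout are all hypothetical:

```python
# A minimal two-way ANOVA sketch with statsmodels (hypothetical factorial data).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "training": np.tile(["none", "basic", "intensive"], 40),  # factor 1 (nonmetric)
    "gender": np.repeat(["male", "female"], 60),              # factor 2 (nonmetric)
    "performance": rng.normal(50, 10, 120),                   # metric dependent variable
})

# C(...) marks categorical factors; '*' requests both main effects and the interaction.
model = ols("performance ~ C(training) * C(gender)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```

The interaction row of the resulting table tests whether the effect of one factor depends on the level of the other.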


 

Descriptive Statistics

Descriptive statistics involve transformation of raw data into a form that would provide information to describe a set of factors in a situation. This is done through ordering and manipulation of the raw data collected. Descriptive statistics are provided by frequencies, measures of central tendency, and dispersion. These are now described.

Frequencies

Frequencies simply refer to the number of times various subcategories of a certain phenomenon occur, from which the percentage and the cumulative percentage of their occurrence can be easily calculated. An example will make this clear. Let us say the president of a company wants to know how many African Americans, Hispanics, Asians, Whites, and “others” (subcategories of the phenomenon “employees”) are on its payroll. A frequency count of these distinct subcategories of employees would provide the answer and might look something like the figures in Table M 1.

The president now knows that there are 8 African Americans, 2 Hispanics, 6 Asians, 182 Whites, and 2 Native Americans (“others”) in the company. He also has the percentages and cumulative percentages for each category. This information can also be presented in the form of a histogram or a bar chart. If the president desires to have at least 10% African Americans without increasing the total number of employees, then, at a minimum, 12 more African Americans have to be recruited, and a decision has to be made as to which 12 of the other employees should have their services terminated.
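Since Table M 1 itself is not reproduced here, the following sketch rebuilds the same frequency table (counts, percentages, and cumulative percentages) from the figures quoted above, using pandas:

```python
# Rebuilding the frequency table from the figures quoted above, using pandas.
import pandas as pd

counts = pd.Series(
    {"African American": 8, "Hispanic": 2, "Asian": 6,
     "White": 182, "Others (Native American)": 2},
    name="frequency",
)
table = counts.to_frame()
table["percent"] = (counts / counts.sum() * 100).round(1)   # e.g., 8/200 = 4.0%
table["cumulative percent"] = table["percent"].cumsum()
print(table)   # the same information could also be drawn as a bar chart
```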

Other instances where frequency distributions would be useful are when (1) a marketing manager wants to know how many units (and what proportions or percentages) of each brand of coffee are sold in a particular region during a given period, (2) a tax consultant desires to keep count of the number of times different sizes of firms (small, medium, large) are audited by the IRS, and (3) a financial analyst wants to keep track of the number of times the shares of manufacturing, industrial, and utility companies lose or gain more than 10 points on the New York Stock Exchange over a 6-month period.

In all the foregoing cases, it may be noted that we desire to obtain the frequencies on a nominally scaled variable. That is, these variables will be grouped into various nonoverlapping subcategories, such as the different brands of coffee, sizes of firms, and types of companies. The number of occurrences under each category and their respective percentages will then be determined. In management research, frequencies are generally obtained for nominal variables such as gender and educational level.

 

Measures of Central Tendencies and Dispersion

It is often useful to describe a series of observations in a data set parsimoniously, and in a meaningful way, which would enable individuals to get an idea of, or “a feel” for, the basic characteristics of the data. Measures of central tendency and dispersion enable us to achieve this goal. There are three measures of central tendency: the mean, the median, and the mode. Measures of dispersion include the range, the standard deviation, and the variance (where the measure of central tendency is the mean), and the interquartile range (where the measure of central tendency is the median).

Measures of Central Tendency

The Mean. The mean or the average is a measure of central tendency that offers a general picture of the data without unnecessarily inundating one with each of the observations in a data set. For example, the production department might keep detailed records on how many units of a product are being produced each day. However, to estimate the raw materials inventory, all that the manager might want to know is how many units per month, on an average, the department has been producing over the past 6 months. This measure of central tendency, that is, the mean, might offer the manager a good idea of the quantity of materials that need to be stocked.

Likewise, a marketing manager might want to know how many cans of soup are being sold, on an average, each week, or a banker might be interested in the number of new accounts that are opened each month, on an average. The mean or average of a set of say, 10 observations, is the sum of the 10 individual observations divided by 10 (the total number of observations).

The Median. The median is the central item in a group of observations when they are arrayed in either an ascending or a descending order. Let us take an example to examine how the median is determined as a measure of central tendency. Let us say the annual salaries of nine employees in a department are $65,000, $30,000, $25,000, $64,000, $35,000, $63,000, $32,000, $60,000, and $61,000. The mean salary here works out to be about $48,333, but the median is $60,000. That is, when arrayed in ascending order, the figures are as follows: $25,000, $30,000, $32,000, $35,000, $60,000, $61,000, $63,000, $64,000, $65,000, and the figure in the middle is $60,000. If there is an even number of employees, then the median will be the average of the middle two salaries.
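These figures are easy to verify with Python's built-in statistics module:

```python
# Verifying the salary example with Python's statistics module.
import statistics

salaries = [65000, 30000, 25000, 64000, 35000, 63000, 32000, 60000, 61000]
print(statistics.mean(salaries))     # 48333.33... -> about $48,333
print(statistics.median(salaries))   # 60000 -> the central item of the sorted array
```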

The Mode. In some cases, a set of observations will not lend itself to a meaningful representation through either the mean or the median, but can be signified by the most frequently occurring phenomenon. For instance, in a department where there are 10 White women, 24 White men, 3 African American women, and 2 Asian women, the most frequently occurring group, the mode, is the White men. Neither a mean nor a median is calculable or applicable in this case. There is also no way of indicating any measure of dispersion.

As is evident from the above, nominal data lend themselves to description only by the mode as a measure of central tendency. It is possible that a data set could contain bimodal observations. For example, using the foregoing scenario, there could also be 24 Asian men who are specially recruited for a project. Then we have two modes, the White men and the Asian men.
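The same departmental example can be checked in Python, whose statistics module handles both single and bimodal cases:

```python
# The departmental example: the mode(s) of a nominal variable.
import statistics

staff = (["White woman"] * 10 + ["White man"] * 24 +
         ["African American woman"] * 3 + ["Asian woman"] * 2)
print(statistics.mode(staff))        # 'White man' -- the single mode

staff += ["Asian man"] * 24          # the specially recruited project group
print(statistics.multimode(staff))   # ['White man', 'Asian man'] -- bimodal
```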

We have illustrated how the mean, the median, and the mode can be useful measures of central tendency, based on the type of data we have. We will now examine dispersions.

 

 

Inferential Statistics

Thus far, we have discussed descriptive statistics. Many times, however, we would be interested in inferential statistics. That is, we might be interested to know or infer from the data, through analysis, (1) the relationship between two variables (e.g., between advertisement and sales), (2) differences in a variable among different subgroups (e.g., whether women or men buy more of a product), and (3) how several independent variables might explain the variance in a dependent variable (e.g., how investments in the stock market are influenced by the level of unemployment, perceptions of the economy, disposable incomes, and dividend expectations). We will now discuss some of these inferential statistics.

Correlations

In a research project that includes several variables, beyond knowing the means and standard deviations of the dependent and independent variables, we would often like to know how one variable is related to another. That is, we would like to see the nature, direction, and significance of the bivariate relationships of the variables used in the study (that is, the relationship between any two variables among the variables tapped in the study). A Pearson correlation matrix will provide this information; that is, it will indicate the direction, strength, and significance of the bivariate relationships of all the variables in the study.

The correlation is derived by assessing the variations in one variable as another variable also varies. For the sake of simplicity, let us say we have collected data on two variables, price and sales, for two different products. The volume of sales at every price level can be plotted for each product, as shown in the scatter diagrams in Figures M 2a and M 2b.
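Since Figures M 2a and M 2b are not reproduced here, a minimal sketch of computing such a correlation with SciPy is given instead; the price and sales figures are invented for illustration:

```python
# A minimal Pearson correlation sketch (hypothetical price/sales data for one product).
from scipy.stats import pearsonr

price = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
sales = [980, 940, 870, 810, 750, 700]    # volume sold at each price level

r, p = pearsonr(price, sales)
print(round(r, 3), round(p, 4))  # r gives direction and strength, p gives significance
```

For these made-up figures, r is strongly negative: sales fall as price rises.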

Explanatory Statistics

Explanatory statistics is also called inferential statistics or statistical induction and deals with inferences about the population from the characteristics of a random sample, i.e., with making (probability) statements about usually unknown parameters of a population. Before we go further, we should give an example of explanatory variables. An explanatory variable is a type of independent variable. The two terms are often used interchangeably, but there is a subtle difference between them. When a variable is independent, it is not affected at all by any other variables. When a variable is not independent for certain, it is an explanatory variable.

The line between independent variables and explanatory variables is usually so unimportant that no one ever bothers to draw it, unless you are doing advanced research involving many variables that can interact with each other; the distinction can be very important in clinical research. For most cases, especially in statistics, the two terms are basically the same. Put simply, predictive modelling asks “what is likely to happen?”, whereas explanatory modelling asks “what can we do about it?”

 

Rating Scales: Scaling, Reliability, Validity

Now that we have learned how to operationally define (or operationalize) dimensions and elements of a variable, we need to measure them in some manner. We will examine in this chapter the types of scales that can be applied to measure different variables and subsequently see how we actually apply them. There are two main categories of attitudinal scales (not to be confused with the four different types of scales, discussed first in this chapter) – the rating scale and the ranking scale. Rating scales have several response categories and are used to elicit responses with regard to the object, event, or person studied. Ranking scales, on the other hand, make comparisons between or among objects, events, or persons and elicit the preferred choices and ranking among them. Both scales are discussed below.


A scale is a tool or mechanism by which individuals are distinguished as to how they differ from one another on the variables of interest to our study. The scale or tool may be a gross one in the sense that it only broadly categorizes individuals on certain variables, or it may be a fine-tuned tool that differentiates individuals on the variables with varying degrees of sophistication. There are four basic types of scales: nominal, ordinal, interval, and ratio.

Nominal scale

 

A nominal scale is one that allows the researcher to assign subjects to certain categories or groups. For example, with respect to the variable of gender, respondents can be grouped into two categories: male and female. These two groups can be assigned the code numbers 1 and 2. These numbers serve as simple and convenient category labels with no intrinsic value, other than to assign respondents to one of two nonoverlapping, or mutually exclusive, categories. Note that the categories are also collectively exhaustive. In other words, there is no third category into which respondents would normally fall. Thus, nominal scales categorize individuals or objects into mutually exclusive and collectively exhaustive groups. Other than this marginal information, such scaling tells us nothing more about the two groups. Thus, the nominal scale gives some basic, categorical, gross information.

Ordinal scale

 

An ordinal scale not only categorizes the variables in such a way as to denote differences among the various categories, it also rank-orders the categories in some meaningful way. With any variable for which the categories are to be ordered according to some preference, the ordinal scale would be used. The preferences would be ranked (e.g., from best to worst; first to last) and numbered 1, 2, and so on. For example, respondents might be asked to indicate their preferences by ranking the importance they attach to five distinct characteristics in a job that the researcher might be interested in studying.

We can now see that the ordinal scale provides more information than the nominal scale. The ordinal scale goes beyond differentiating the categories to providing information on how respondents distinguish them by rank-ordering them.

Interval scale

 

An interval scale allows us to perform certain arithmetical operations on the data collected from the respondents. Whereas the nominal scale allows us only to qualitatively distinguish groups by categorizing them into mutually exclusive and collectively exhaustive sets, and the ordinal scale to rank-order the preferences, the interval scale lets us measure the distance between any two points on the scale. This helps us to compute the means and the standard deviations of the responses on the variables. In other words, the interval scale not only groups individuals according to certain categories and taps the order of these groups, it also measures the magnitude of the differences in the preferences among the individuals.

Ratio scale

 

The ratio scale overcomes the disadvantage of the arbitrary origin point of the interval scale, in that it has an absolute (in contrast to an arbitrary) zero point, which is a meaningful measurement point. Thus, the ratio scale not only measures the magnitude of the differences between points on the scale but also taps the proportions in the differences. It is the most powerful of the four scales because it has a unique zero origin (not an arbitrary origin) and subsumes all the properties of the other three scales. The weighing balance is a good example of a ratio scale. It has an absolute (and not arbitrary) zero origin calibrated on it, which allows us to calculate the ratio of the weights of two individuals.

Figure 14: Properties of the four scales

Validity

We will examine the terms internal validity and external validity in the context of experimental designs. That is, we will be concerned about the issue of the authenticity of the cause-and-effect relationships (internal validity), and their generalizability to the external environment (external validity). For now, we are going to examine the validity of the measuring instrument itself. That is, when we ask a set of questions (i.e., develop a measuring instrument) with the hope that we are tapping the concept, how can we be reasonably certain that we are indeed measuring the concept we set out to measure and not something else? This can be determined by applying certain validity tests.

Several types of validity tests are used to test the goodness of measures, and writers use different terms to denote them. For the sake of clarity, we may group validity tests under three broad headings: content validity, criterion-related validity, and construct validity.

Reliability

The reliability of a measure indicates the extent to which it is without bias (error free) and hence ensures consistent measurement across time and across the various items in the instrument. In other words, the reliability of a measure is an indication of the stability and consistency with which the instrument measures the concept and helps to assess the “goodness” of a measure.

Stability of measures

The ability of a measure to remain the same over time, despite uncontrollable testing conditions or the state of the respondents themselves, is indicative of its stability and low vulnerability to changes in the situation. This attests to its “goodness” because the concept is stably measured, no matter when it is done. Two tests of stability are test-retest reliability and parallel-form reliability.

Test-retest reliability

The reliability coefficient obtained by repetition of the same measure on a second occasion is called the test-retest reliability. That is, when a questionnaire containing some items that are supposed to measure a concept is administered to a set of respondents now, and again to the same respondents, say several weeks to six months later, then the correlation between the scores obtained at the two different times from one and the same set of respondents is called the test-retest coefficient. The higher it is, the better the test-retest reliability and, consequently, the stability of the measure across time.
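In practice the test-retest coefficient is simply the correlation between the two sets of scores, as the following sketch with SciPy shows; the scores for the two administrations are hypothetical:

```python
# A test-retest sketch: correlate scores from two administrations (hypothetical data).
from scipy.stats import pearsonr

scores_time1 = [14, 18, 22, 9, 16, 20, 11, 25]   # same respondents, first administration
scores_time2 = [15, 17, 21, 10, 18, 19, 12, 24]  # several weeks later

r, _ = pearsonr(scores_time1, scores_time2)
print(round(r, 2))   # the test-retest coefficient; higher means a more stable measure
```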

Parallel-form reliability

When responses on two comparable sets of measures tapping the same construct are highly correlated, we have parallel-form reliability. Both forms have similar items and the same response format, the only changes being the wording and the order or sequence of the questions. What we try to establish here is the error variability resulting from wording and ordering of the questions. If two such comparable forms are highly correlated (say .8 and above), we may be fairly certain that the measures are reasonably reliable, with minimal error variance caused by wording, ordering, or other factors.

Statistical Terms and Tests

In research, we seek scientific data, which on analysis, provide answers to the research questions. Data refer to the available raw information gathered through interviews, questionnaires, observations, or secondary databases. By organizing the data in some fashion, analyzing them, and making sense of the results, we find the answers we seek.

In most organizational research, at the very minimum, it is of interest to know how frequently certain phenomena occur (frequencies), and the mean or average score of a set of data collected, as well as the extent of variability in the set (i.e., the central tendencies and dispersions of the dependent and independent variables). These are known as descriptive statistics (statistics that describe the phenomena of interest). Beyond this, we might want to know how variables relate to one another, whether there are any differences between two or more groups, and the like. These are called inferential statistics (i.e., statistical results that let us draw inferences from a sample to the population, as discussed in Chapter 11). Inferential statistics can be categorized as parametric or nonparametric. The use of parametric statistics is based on the assumption that the population from which the sample is drawn is normally distributed and that data are collected on an interval or ratio scale. Nonparametric statistics, on the other hand, make no explicit assumption regarding the normality of the distribution in the population and are used when the data are collected on a nominal or ordinal scale.
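As a small illustration of that choice, the sketch below runs a parametric and a nonparametric test of a two-group difference on the same hypothetical data with SciPy:

```python
# A sketch of the parametric/nonparametric choice (hypothetical group scores).
from scipy.stats import ttest_ind, mannwhitneyu

group_a = [5.1, 6.3, 5.8, 7.0, 6.1, 5.5]
group_b = [4.2, 4.9, 5.0, 4.4, 5.3, 4.6]

# Parametric: interval/ratio data from (approximately) normal populations.
print(ttest_ind(group_a, group_b))

# Nonparametric: ordinal data, or no normality assumption.
print(mannwhitneyu(group_a, group_b))
```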

Both descriptive and inferential statistics can be obtained by using PC software programs designed to enter data, edit and analyze them, and produce results for various types of data analyses. Programs such as SPSS, SAS, MINITAB, Excel, and others, are used in social science research. Before discussing data analysis, it would be useful to quickly refresh your memory regarding some of the statistical concepts and their applications.

We will very briefly explain some of the terms and tests such as frequencies, measures of central tendencies and dispersions, correlation, t-test, regression analysis, and the like. The idea is to give an overview of these and their relevance, rather than offer a tutorial in statistical formulas and interpretations, which you might have studied earlier in a course on statistics.

The Research Results and the Research Report

When you start your research proposal, you must take into consideration three phases.

Phase 1: Initial Assessment of a Problem:

This will require you to identify a product or service and a related problem or interest. You should address this through a process of gathering and identifying the required data; stakeholder evaluation is advised. You may also wish to pull together a focus group and undertake secondary research of the literature to better inform your approach.

Phase 2: Objectives, Plan, and Design of the Research Proposal:

Having identified your product/service, you should plan your approach to the problem or issue with timescales (some will be given). This will of course involve an initial focus group, secondary and primary sources of data, acquisition of new data, and fieldwork.

Phase 3: Design Appropriate Collection/Analysis of New Data

In a proposal this can be achieved by explaining the processes surrounding the design, distribution, and coding (into a computer package) of questionnaires (quantitative), and of interview transcripts and/or documents (qualitative).

However, in a thesis or dissertation, you will need to analyse the coded data. IT IS NOT ENOUGH TO EXPLAIN THE CODED DATA; anybody can read it. The analysis is of central importance to your research. The research and the analyses you have undertaken should then enable you to prepare a comprehensive report, which may be for a client or an organization.

Data Analysis and Interpretation 

Type of investigation: causal versus correlational

A manager should determine whether a causal or a correlational study is needed to find an answer to the issue at hand. The former is done when it is necessary to establish a definitive cause-and-effect relationship. However, if all that the manager wants is a mere identification of the important factors “associated with” the problem, then a correlational study is called for. In the former case, the researcher is keen on delineating one or more factors that are undoubtedly causing the problem. In other words, the intention of the researcher conducting a causal study is to be able to state that variable X causes variable Y. So, when variable X is removed or altered in some way, problem Y is solved. Quite often, however, it is not just one or more variables that cause a problem in organizations. Given the fact that most of the time there are multiple factors influencing one another and the problem in a chainlike fashion, the researcher might be asked to identify the crucial factors associated with the problem, rather than establish a cause-and-effect relationship.

A study in which the researcher wants to delineate the cause of one or more problems is called a causal study. When the researcher is interested in delineating the important variables associated with the problem, the study is called a correlational study. It may be of interest to know that attempts are sometimes made to establish cause-and-effect relationships through certain types of correlational or regression analyses, such as cross-lagged correlations and path analysis.

The unit of analysis: individuals, dyads, groups, organizations, cultures

The unit of analysis refers to the level of aggregation of the data collected during the subsequent data analysis stage. If, for instance, the problem statement focuses on how to raise the motivational levels of employees in general, then we are interested in individual employees in the organization and have to find out what we can do to raise their motivation. Here the unit of analysis is the individual. We will be looking at the data gathered from each individual and treating each employee’s response as an individual data source. If the researcher is interested in studying two-person interactions, then several two-person groups, also known as dyads, will become the unit of analysis. Analysis of husband-wife interactions in families and supervisor-subordinate relationships in the workplace are good examples of dyads as the unit of analysis. However, if the problem statement is related to group effectiveness, then the unit of analysis will be at the group level. In other words, even though we may gather relevant data from all individuals comprising, say, six groups, we aggregate the individual data into group data so as to see the differences among the six groups. If we are comparing different departments in the organization, then the data analysis will be done at the departmental level; that is, the individuals in the department will be treated as one unit, and comparisons will be made by treating the department as the unit of analysis.
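The following sketch illustrates this shift in the unit of analysis with pandas: hypothetical individual motivation scores are first treated as individual data points and then aggregated to the departmental level:

```python
# Shifting the unit of analysis by aggregating individual data (hypothetical scores).
import pandas as pd

responses = pd.DataFrame({
    "employee": ["e1", "e2", "e3", "e4", "e5", "e6"],
    "department": ["sales", "sales", "ops", "ops", "hr", "hr"],
    "motivation": [3.2, 4.1, 2.8, 3.0, 4.5, 4.2],
})

# Individual as the unit of analysis: each row is one data point.
print(responses)

# Department as the unit of analysis: aggregate individuals into one unit each.
print(responses.groupby("department")["motivation"].mean())
```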

Our research question determines the unit of analysis. For example, if we wish to study group decision-making patterns, we will probably be examining such aspects as group size, group structure, cohesiveness, and the like, in trying to explain the variance in group decision making, and we will be studying the dynamics that operate in several different groups and the factors that influence group decision making. In such a case, the unit of analysis will be groups.

As our research question addresses issues that move away from the individual to dyads, and to groups, organizations, and even nations, so also does the unit of analysis shift from individuals to dyads, groups, organizations, and nations. The characteristic of these “levels of analysis” is that the lower levels are subsumed within the higher levels. Thus, if we study buying behavior, we have to collect data from, say, 60 individuals, and analyze the data. If we want to study group dynamics, we may need to study, say, six or more groups, and then analyze the data gathered by examining the patterns in each of the groups. If we want to study cultural differences among nations, we will have to collect data from different countries and study the underlying patterns of culture in each country.

Individuals do not have the same characteristics as groups (e.g., structure, cohesiveness), and groups do not have the same characteristics as individuals (e.g., IQ, stamina). There are variations in the perceptions, attitudes, and behaviors of people in different cultures. Hence, the nature of the information gathered, as well as the level at which data are aggregated for analysis, are integral to the decisions made on the choice of the unit of analysis.

It is necessary to decide on the unit of analysis even as we formulate the research question, since the data collection methods, sample size, and even the variables included in the framework may sometimes be determined or guided by the level at which data are aggregated for analysis.

Example                                                                                  

Individuals as the unit of analysis

The personnel manager of a government organization wants to know how many of the staff would be interested in attending a three-day seminar on making appropriate investment decisions. For this purpose, data will have to be collected from each individual staff member, and the unit of analysis is the individual.

Dyads as the unit of analysis

Having read about the benefits of mentoring, a human resources manager wants to first identify the number of employees in three departments of the organization who are in mentoring relationships, and then find out what the jointly perceived benefits (i.e., by both the mentor and the one mentored) of such a relationship are. Here, once the mentor and mentored pairs are identified, their joint perceptions can be obtained by treating each pair as one unit. Hence, if the manager wants data from a sample of 10 pairs, he will have to deal with 20 individuals, a pair at a time. The information obtained from each pair will be a data point for subsequent analysis. Thus, the unit of analysis here is the dyad.

Groups as the unit of analysis

A manager wants to see the patterns of usage of the newly installed Information System (IS) by the production, sales, and operations personnel. Here, three groups of personnel are involved and information on the number of times the IS is used by each member in each of the three groups, as well as other relevant issues, will be collected and analyzed. The final results will indicate the mean usage of the system per day or month for each group. Here, the unit of analysis is the group.

Divisions as the unit of analysis

Procter & Gamble wants to see which of its various divisions (soap, paper, oil, etc.) have made profits of over 12% during the current year. Here, the profits of each of the divisions will be examined and the information aggregated across the various geographical units of the division. Hence, the unit of analysis will be the division, at which level the data will be aggregated.

Industry as the unit of analysis

An employment survey specialist wants to see the proportion of the workforce employed by the health care, utilities, transportation, and manufacturing industries. In this case, the researcher has to aggregate the data relating to each of the subunits that make up each of the industries and report the proportions of the workforce employed at the industry level. The health care industry, for instance, includes hospitals, nursing homes, mobile units, small and large clinics, and other health care providing facilities. The data from these subunits will have to be aggregated to see how many employees are employed by the health care industry. This will need to be done for each of the other industries.

Countries as the unit of analysis

The Chief Financial Officer (CFO) of a multinational corporation wants to know the profits made during the past five years by each of the subsidiaries in England, Germany, France, and Spain. It is possible that there are many regional offices of these subsidiaries in each of these countries. The profits of the various regional centers for each country have to be aggregated and the profits for each country for the past five years provided to the CFO. In other words, the data will now have to be aggregated at the country level. As can be easily seen, the data collection and sampling processes become more cumbersome at higher levels of units of analysis (industry, country) than at the lower levels (individuals and dyads). It is obvious that the unit of analysis has to be clearly identified as dictated by the research question. Sampling plan decisions will also be governed by the unit of analysis. For example, if I compare two cultures, for instance those of India and the United States, where my unit of analysis is the country, my sample size will be only two, despite the fact that I shall have to gather data from several hundred individuals from a variety of organizations in the different regions of each country, incurring huge costs. However, if my unit of analysis is individuals (as when studying the buying patterns of customers in the southern part of the United States), I may perhaps limit the collection of data to a representative sample of a hundred individuals in that region and conduct my study at a low cost!

Time horizon: cross-sectional versus longitudinal studies

Cross-sectional studies

A study can be undertaken in which data are gathered just once, perhaps over a period of days or weeks or months, in order to answer a research question. Such studies are called one-shot or cross-sectional studies.

Longitudinal studies

In some cases, however, the researcher might want to study people or phenomena at more than one point in time in order to answer the research question. For instance, the researcher might want to study employees’ behavior before and after a change in the top management, so as to know what effects the change accomplished. Here, because data are gathered at two different points in time, the study is not cross-sectional or of the one-shot kind, but is carried out longitudinally across a period of time. Such studies, as when data on the dependent variable are gathered at two or more points in time to answer the research question, are called longitudinal studies.
