Q2. A college counselor was assigned to work with a group of 20 entering freshmen who have not yet declared a major. The counselor believed that it would be helpful to form small support groups of these students and would like to group them on the basis of their interests. At orientation, these students filled in an Interest Inventory. Two summary scores, namely, Academic Comfort (AC) and Introversion-Extraversion (IE), were then computed. Academic Comfort measures one’s comfort and likelihood of persistence in an academic environment, and Introversion-Extraversion measures one’s vocational preferences for ideas/things (Introversion) or for work with people (Extraversion).
The dataset hw_q2.sav contains the following variables:
Variable    Definition
AC          Academic Comfort score
IE          Introversion-Extraversion score
id          Student ID
(a) Generate a scatter plot of the AC (y-axis) and IE (x-axis) scores of the 20 students.
Based on the plot, how many clusters do you think there are? Indicate the groupings on
the scatter plot by drawing a boundary around each group. [8 marks]
(b) Run a hierarchical cluster analysis to classify the students using the AC and IE
scores. Use Euclidean distance as the proximity measure (without standardization).
Compare the results (in terms of the dendrograms) for the single-linkage and
complete-linkage methods. How many clusters do you suggest? [12 marks]
(c) Rerun the cluster analysis using the K-means method with 4 clusters. How do the four groups
differ in terms of the AC and IE scores? [Set "iteration" to 300 and do not use "running
means".]
Remarks: As K-means may suffer from the local optima problem, sort the data in four
different ways: 1) AC in ascending order, 2) AC in descending order, 3) IE in
ascending order, and 4) IE in descending order.
For each sorting order, run the K-means analysis.
Compare the classification results: are they all the same? [10 marks]
(d) Compare the classification results from the complete-linkage method with 4 clusters in (b)
with those from K-means in (c) [use the solution that occurs most frequently across the 4 sorting orders].
Are the results comparable? [10 marks]
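For readers who want to see why case order can matter in the K-means part, the following is a minimal Python sketch and not the SPSS procedure itself. It assumes that the first k cases in the file seed the initial centers, which is a simplification and not necessarily the rule SPSS uses, and it updates the centers only after all cases have been assigned, which roughly corresponds to not using running means. The `students` rows are made-up illustrative values, not the contents of hw_q2.sav, and the names `kmeans` and `partition` are hypothetical helpers.

```python
import math

def kmeans(points, k, max_iter=300):
    """Batch k-means: centers are recomputed only after every case is assigned."""
    # Assumption: the first k cases seed the centers, so the sort order matters.
    centers = [points[i] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign each case to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # no case changed cluster: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

def partition(ids, labels):
    """Describe a solution as a set of id groups, ignoring cluster numbering."""
    groups = {}
    for i, lab in zip(ids, labels):
        groups.setdefault(lab, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())

# Hypothetical (id, AC, IE) rows for illustration only.
students = [
    (1, 20, 30), (2, 22, 34), (3, 25, 31),
    (4, 55, 32), (5, 58, 36), (6, 60, 30),
    (7, 24, 62), (8, 21, 66), (9, 26, 60),
    (10, 57, 64), (11, 61, 61), (12, 59, 67),
]

orders = {
    "AC ascending": sorted(students, key=lambda s: s[1]),
    "AC descending": sorted(students, key=lambda s: s[1], reverse=True),
    "IE ascending": sorted(students, key=lambda s: s[2]),
    "IE descending": sorted(students, key=lambda s: s[2], reverse=True),
}

results = {}
for name, rows in orders.items():
    ids = [r[0] for r in rows]
    pts = [(r[1], r[2]) for r in rows]  # each case is an (AC, IE) pair
    labels, centers = kmeans(pts, 4)
    results[name] = partition(ids, labels)

# Number of distinct 4-cluster solutions found across the four sort orders.
print(len(set(results.values())), "distinct solution(s)")
```

Comparing partitions as sets of id groups avoids being misled by cluster numbers, which are arbitrary and can differ between runs even when the groupings are identical.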
Deliverable: Word Document
