A07: Student loans
This assignment helps you practice with aggregation. Note, not every query will use aggregation; use your judgment.
We will be using the
fafsa_students tables in the cinf201 database. The story of how I acquired these data was told in the Big Data course.
- Report the number of distinct schools.
- Report the number of states represented (should be 62, due to inclusion of territories, etc.).
- Report the number of public schools (school_type = 0) in the 32XXX zip code region.
- Report the application cycle and number of students total (independent + dependent) that have applied to stetson over the years (appcycles); order by appcycle (earliest first). Use only one query, do not directly use Stetson’s ope_id (use a join). Label the column with the sum as “students”.
- Report the school name and average number of dependent students (over all appcycles) for all private schools (school_type = 1) in florida; order by number of students, highest first. Label the column with the student value as “avg_dep_students”.
- Report the single school with the largest number of student applications (dependent + indepdenent) in 2015-2016. Only report the school name. Btw, the answer is UCLA; see this article.
- For each public school in Florida (school_type = 0), show the school name and dependent student count in which the most dependent students applied. Label the columns “school” and “most_dependent_students”. Order by school name (ascending). Limit the output to schools in which the most dependent students is greater than or equal to 1000.
- Report the appcycle and average ratio of dependent students to independent students (i.e., dependent_students / independent_students) for all appcycles except 2016-2017. Sort by appcycle, earlier first. Label the columns appcycle and avg_ratio.