>

No Student Left Unjoined: Basics of SQL Joins

 By Meriel O’Conor, Director of Data Consulting and Training

& Jaylin Dyson, Data Solutions Associate Analyst 

If you’re new to SQL, one of the first concepts you’ll encounter is joining data. 

At Vanderbilt we have thousands of tables of data stored separately. One table might hold core student information, another demographic attributes, another course data, and another faculty assignments. The question is how those pieces connect. 

If you’re fortunate, a data model already exists and clearly defines the relationships. In that case, your job is to understand and apply those relationships. If you’re building your own queries without that guidance, you need to establish those connections carefully and correctly. 

These same concepts apply in tools like Tableau Prep, where you may be joining via visuals not code, but it is still SQL under the surface.

Joins 

A join requires at least one column that links two tables. The most reliable option is a unique identifier. In higher education systems, this might be a Student_ID, Faculty_ID, or Course_ID. If a student’s Student_ID is 687424 in the enrollment table, it should also be 687424 in the financial aid table. That shared key allows you to connect those records and view enrollment and aid data together. 

If a true unique identifier is unavailable, you may be tempted to join on fields such as email address or name. These fields introduce risk. Email addresses change. Names are not unique. Inconsistent formatting can cause mismatches. Whenever possible, use system-generated IDs as join keys. 

Keys: Primary vs. Foreign 

A primary key (PK) uniquely identifies a row in a table (for example, StudentID in the Students table). It effectively defines the table’s grain: one row per student, per course section, or per application. In some cases, identifying a unique row in a table requires more than one field (e.g., course_id + term_id). This is known as a composite key. When keys are composite, you will need to ensure all components are included in the join condition. 

A foreign key (FK) is a column in one table that references the primary key in another (for example, Enrollments.StudentID referencing Students.StudentID). In that case, Students.StudentID is the primary key, and Enrollments.StudentID is the foreign key. FKs establish the relationships that joins use. 

Join Types

Once you have a matching field, the next question is: what data do you want to keep? 

Do you want only matched records? All records from one table? Or everything from both sides? 

Here is a classic Venn Diagram image to help visualize the different join types:


Inner Join 

An inner join returns only rows where a match exists in both tables. 

If you are analyzing students enrolled in courses, you are not interested in students who are not enrolled or courses that have no students. You only care about matched enrollment records. An inner join is appropriate here because it excludes unmatched students and unmatched courses. 

Left Join 

A left join returns all records from the left table and matched records from the right table. If no match exists, the right-side columns return NULL. 

Suppose you want to see all faculty in advising roles and the students assigned to them. You also want to include faculty who do not yet have assigned advisees, but you are not concerned with students who lack advisors. In that case, you would start with the advising faculty table and use a left join to bring in student assignments. This preserves all advising faculty while adding student data where it exists. 

Right Join 

A right join returns all records from the right table and matched records from the left. In practice, this is less common because you can usually achieve the same result by reversing the table order and using a left join. Most SQL practitioners prefer left joins for consistency and readability. 

Full Outer Join 

A full outer join returns all records from both tables, matching where possible and preserving unmatched rows on both sides. 

If you want to identify faculty without advisees, and students without advisors then a full outer join exposes both gaps in a single result set. Some SQL dialects refer to this simply as a FULL JOIN. 

Unions 

Joins combine data horizontally, adding columns from related tables. 

Unions combine data vertically, stacking rows from separate queries. 

If you administer the same student survey each year and store each year’s responses in a separate table, you would use a UNION (or UNION ALL) to combine those datasets into a single result set. For a union to work, each query must return the same number of columns, in the same order, with compatible data types. 

UNION removes duplicate rows. UNION ALL retains them. 

For further structured practice, DataCamp offers a four-hour course on joining data in SQL:
https://www.datacamp.com/courses/joining-data-in-sql 

Structural Considerations 

Primary keys, foreign keys, granularity, and cardinality function together, determining how joins behave and what results mean. Understanding the level of detail in each table prevents duplication errors and incorrect aggregations. 

For a related discussion of granularity and cardinality, see Degrees of Detail

Explore Story Topics