Database Design
Librarian Survey
Database
BRANDON, MCKAY, AND MATSANGAISE
A Report on the creation and implementation of a PostgreSQL database for accessing, querying, and analyzing data collected from Academic librarians working in Association of Research Library Member Institution libraries about their demographics, professional history, and scholarly communication practices.
Table of Contents
Contents
The Database Project _________________________________________________________ 1 Database Environment ________________________________________________________ 2 Information Flow Diagrams _____________________________________________________ 3 User Profiles ________________________________________________________________ 4 Goals and Objectives _________________________________________________________ 7 Database Profile _____________________________________________________________ 8 Sample SQL Queries ________________________________________________________ 10
Pg. 01
The Database Project
The Database Project
Project Description
The data in our database is supplied by a research project conducted in
the spring of 2012 by a group of researchers in the Scholarly Communications
class at SLIS who originally analyzed the data using unlinked spreadsheets.
The data was collected using Google Forms from librarians at Academic ARL
member institutions in the United States. Our goal was to make the data more
usable by supplying new information about the conferences individual librarians
attended, as well as providing a database for querying different types of
relationships between types of librarians and the conferences they attend. In
combination with the information retrieval interface created by the Web
Programming class, this database will allow researchers to find new
connections in the existing data.
Project Members
Melissa Brandon, Clinton McKay, Epaphras Matsangaise, each member
of this group is responsible for additional data collection, as well as
collaborating on the nature and function of the database, within our agreed
upon guidelines.
Pg. 02
Database Environment
Database Environment
Client Profile
Our client is the research team that compiled the data by administering
the survey of Academic ARL member institution librarians. The client collected
that data, met difficulties when trying to analyze it without a database, and
asked that a database be created to facilitate searching for connections
between different entities in the data set. The client also asked that the
database and its data be made available online to users elsewhere so that
researchers could query the data for information on their own research topics.
The client will utilize the database in combination with the information retrieval
interface created by the Web Programming class to further their research
objectives.
Pg. 03
Information Flow Diagrams
Information Flow Diagrams
Explanation:
This flow chart describes the process of gathering data from librarians by creating the survey and distributing it to them via email to member librarians at Academic
Association of Research Libraries Member Libraries.
Pg. 04
User Profiles
Improvements:
-With a database solution as opposed to a spreadsheet, the client will be able to query multiple fields at once whereas before the client was only capable of sorting a single column to find correlations (more powerful searching) -With a database solution, irrelevant answers can be excluded from query results allowing for faster processing of information
-Since information processing is faster, the question and answer cycle depicted in the analysis flow diagram will be much faster.
Pg. 05
User Profiles
User Profiles
Project Leader
This user is a member of the original research team that collected the
data. The Project Leader participates in data analysis but also makes decisions
about changes to the content and structure of the database as more
information is collected and is in charge of granting permissions for external
researchers to access the database online. The Project Leader will work
closely with the Database Administrator. Decisions about corrections to the
data and dealing with inconsistencies in the data will be made by the Project
Leader. This user will also be the primary contributor to the publication for
which this survey and database were designed; they will also be primarily
responsible for the accuracy of the data. The Project Leader has strong to
excellent proficiency in database use and online searching.
Internal Researchers
These researchers are members of the original research team that
collected the data. They have more access than external researchers because
they are in direct contact with the Project Leader. Working under the direction
of the Project Leader, internal researchers will continue to design and
administer surveys to expand the database. The internal researchers will also
be contributors to the publication for which this survey and database were
designed. Internal researchers have moderate to strong proficiency in
database use and online searching.
Pg. 06
User Profiles
External Researchers
These researchers are not members of the original research team that
collected the data. They have been granted access to the database by the
Project Leader. Their use of the database is contingent upon their adherence
to the policies put in place by the Project Leader. These users will have less
access to the data than internal researchers as they are not in direct contact
with the Project Leader on a daily basis. External researchers are expected to
have at least moderate proficiency in database use and online searching.
Pg. 07
Goals and Objectives
Goals and Objectives
Members of the original research team (the Project Leader and internal
researchers) are the primary clients and users of the database. They will use
the database to further their research goals and synthesize new information
from the large dataset. They will also work with the Database Administrator to
enter new information into the database in the future.
Client/User Goals
●
Be able to store and access the data collected from survey respondents
and other sources
○
Answers to questions about demographics, research practices,
publication practices, institutional repositories, emerging
technology, and professional conference attendance
○
Supplemental data collected from online sources
●
Query the data based on multiple parameters
●
Be able to make the data and the database accessible to external
researchers
Database Goals
●
Provide more powerful searching capabilities than are possible with
spreadsheet data
Make it possible for the researchers to share their data with an unlimited
number of external researchers and distribute updates seamlessly
Pg. 08
Database Profile
Database Profile
Business Rules
Each respondent has between zero and many attendances; each attendance is
had by one and only one respondent.
Each conference has between one and many attendances; each attendance
occurs at one and only one conference.
Each conference has one primary topic; each primary topic is had by between
one and many conferences.
Each conference has one primary sponsor; each primary sponsor sponsors
between one and many conferences.
Pg. 09
Database Profile
Pg. 10
Sample SQL Queries
Sample SQL Queries
1) View only the survey responses from male librarians who reported attending the ALA conference in the past 5 years.
SELECT * FROM surveyanswers
JOIN attendance ON attendance.respondent_id = surveyanswers.id JOIN conference ON conference.id = attendance.conf_id
JOIN confsponsor ON confsponsor.id = conference.sponsor_id JOIN conftopic ON conftopic.id = conference.topic_id
WHERE surveyanswers.q3 = 'Male' AND attendance.conf_id = 5
2) Show the number of attendances reported to each of the conferences and the names of the conferences, order by number of attendances (highest to lowest).
SELECT COUNT(attendance.conf_id), conference.name FROM attendance JOIN conference ON conference.id = attendance.conf_id
GROUP BY conference.name, attendance.conf_id ORDER BY COUNT(attendance.conf_id) DESC
3) Show the survey responses of librarians who attended any of the top 5 most attended conferences.
SELECT * FROM surveyanswers
JOIN attendance ON attendance.respondent_id = surveyanswers.id JOIN conference ON conference.id = attendance.conf_id
JOIN confsponsor ON confsponsor.id = conference.sponsor_id JOIN conftopic ON conftopic.id = conference.topic_id
WHERE conference.name IN (SELECT conference.name FROM attendance JOIN conference ON conference.id = attendance.conf_id
GROUP BY conference.name, attendance.conf_id ORDER BY COUNT(attendance.conf_id) DESC LIMIT 5)
Pg. 11
Sample SQL Queries
4) Show only the survey responses from librarians who reported the year they began working in academic libraries, order by the year they reported beginning their career in academic libraries.SELECT * FROM surveyanswers WHERE surveyanswers.q6 > 0 ORDER BY surveyanswers.q6 ASC
5) Show the names of conferences that were attended by librarians who reported using Blogs. SELECT DISTINCT conference.name FROM surveyanswers
JOIN attendance ON attendance.respondent_id = surveyanswers.id JOIN conference ON conference.id = attendance.conf_id
JOIN confsponsor ON confsponsor.id = conference.sponsor_id JOIN conftopic ON conftopic.id = conference.topic_id
WHERE surveyanswers.q7 LIKE '%Blogs%'
6) Show a list of the conferences attended by librarians who reported having Doctoral degrees. SELECT DISTINCT conference.name FROM surveyanswers
JOIN attendance ON attendance.respondent_id = surveyanswers.id JOIN conference ON conference.id = attendance.conf_id
JOIN confsponsor ON confsponsor.id = conference.sponsor_id JOIN conftopic ON conftopic.id = conference.topic_id
Pg. 12
Sample SQL Queries
7) Create a view of all conference attendances reported by librarians who also reported having an MLS/MIS/MLIS degree, then show how many librarians who reported having an MIS degree went to each conference:CREATE VIEW Master_Attendances AS SELECT * FROM attendance
WHERE attendance.respondent_id IN (
SELECT surveyanswers.id FROM surveyanswers
JOIN attendance ON attendance.respondent_id = surveyanswers.id JOIN conference ON conference.id = attendance.conf_id
JOIN confsponsor ON confsponsor.id = conference.sponsor_id JOIN conftopic ON conftopic.id = conference.topic_id
WHERE surveyanswers.q2 LIKE '%M.L.S./M.I.S./M.L.I.S.%' )
SELECT COUNT(DISTINCT Master_Attendances.conf_id), conference.name FROM Master_Attendances
JOIN attendance ON attendance.respondent_id = Master_Attendances.respondent_id JOIN conference ON conference.id = attendance.conf_id
GROUP BY conference.name, attendance.conf_id ORDER BY COUNT(attendance.conf_id) DESC
Pg. 13
Sample SQL Queries
8) Show the number of librarians reporting to have Doctoral degrees.SELECT COUNT(DISTINCT surveyanswers.id) AS "Number of respondents reporting to have Doctoral degrees"
FROM surveyanswers
WHERE surveyanswers.q2 LIKE '%Doctoral%'
9) Show a list of the conferences attended by respondents who report reading professional publications on a daily basis.
SELECT DISTINCT conference.name FROM surveyanswers JOIN attendance ON attendance.respondent_id = surveyanswers.id JOIN conference ON conference.id = attendance.conf_id
JOIN confsponsor ON confsponsor.id = conference.sponsor_id JOIN conftopic ON conftopic.id = conference.topic_id
WHERE surveyanswers.q10 LIKE '%Daily%' ORDER BY conference.name
10) Show a list of the conferences attended by respondents who report that their institutions do not offer support for attending professional conferences.
SELECT DISTINCT conference.name FROM surveyanswers JOIN attendance ON attendance.respondent_id = surveyanswers.id JOIN conference ON conference.id = attendance.conf_id
JOIN confsponsor ON confsponsor.id = conference.sponsor_id JOIN conftopic ON conftopic.id = conference.topic_id
WHERE surveyanswers.q21 LIKE '%No%' ORDER BY conference.name