
Data Package Contents

In document EdX Research Guide. Release (Page 33-38)

Each of the files you download contains one or more files of research data.

4.4.1 Extracted Contents of {org}-{site}-events-{date}.log.gz.gpg

The {org}-{site}-events-{date}.log.gz.gpg file contains all event data for courses on a single edX site for one 24-hour period. After you download a {org}-{site}-events-{date}.log.gz.gpg file for your institution, you:

1. Use your private key to decrypt the file. See Decrypt an Encrypted File.

2. Extract the log file from the compressed .gz file. The result is a single file named {org}-{site}-events-{date}.log. (Alternatively, you can decompress the data as a stream with a tool such as gzip.)

For more information about the events in this file, see Events in the Tracking Logs.
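Step 2 can also be done without writing the uncompressed log to disk. The Python sketch below assumes that the file has already been decrypted and that each line of the log holds one JSON-encoded event; the function name iter_events is hypothetical.

```python
import gzip
import json
from typing import Iterator


def iter_events(path: str) -> Iterator[dict]:
    """Yield one parsed event per line from a decrypted .log.gz file.

    Assumption: each non-blank line holds a single JSON-encoded event.
    """
    # gzip.open in text mode decompresses on the fly, so the full log is
    # never written to disk or held in memory at once.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            yield json.loads(line)
```

For example, iterating over iter_events("edx-prod-events-2015-01-01.log.gz") would process one day of events; that file name is hypothetical.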

4.4.2 Extracted Contents of {org}-{date}.zip

After you download the {org}-{date}.zip file for your institution, you:

1. Extract the contents of the file. When you extract (or unzip) this file, all of the files that it contains are placed in the same directory. All of the extracted files end in .gpg, which indicates that they are encrypted.

2. Use your private key to decrypt the extracted files. See Decrypt an Encrypted File.
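The two steps above can be scripted. In this Python sketch, extract_package unzips the package and collects every .gpg file, and gpg_decrypt_command builds a standard gpg --output/--decrypt command line for one of them. Both function names are hypothetical, and the sketch assumes that GnuPG is installed and that your private key is already in your keyring.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_package(zip_path: str, dest_dir: str) -> List[Path]:
    """Unzip a data package and return the paths of the encrypted .gpg files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # rglob also finds encrypted files inside subdirectories such as ora/.
    return sorted(dest.rglob("*.gpg"))


def gpg_decrypt_command(encrypted: Path) -> List[str]:
    """Build a gpg command that writes the decrypted file next to the input."""
    output = encrypted.with_suffix("")  # strip only the trailing .gpg
    return ["gpg", "--output", str(output), "--decrypt", str(encrypted)]
```

To run the decryption, pass each command to subprocess.run(command, check=True); gpg may prompt for your key's passphrase.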

The result of extracting and decrypting the {org}-{date}.zip file is the following set of .sql, .json, .csv, and .mongo files, together with an ora subdirectory. Note that the .sql files are tab separated.


• {org}-{course}-{run}-auth_user-{site}-analytics.sql

• {org}-{course}-{run}-auth_userprofile-{site}-analytics.sql

• {org}-{course}-{run}-certificates_generatedcertificate-{site}-analytics.sql

• {org}-{course}-{run}-course_structure-{site}-analytics.json

• {org}-{course}-{run}-courseware_studentmodule-{site}-analytics.sql

• {org}-email_opt_in-{site}-analytics.csv

• {org}-{course}-{run}-student_courseenrollment-{site}-analytics.sql

• {org}-{course}-{run}-user_api_usercoursetag-{site}-analytics.sql

• {org}-{course}-{run}-user_id_map-{site}-analytics.sql

• {org}-{course}-{run}-{site}.mongo

• ora subdirectory

• {org}-{course}-{run}-student_anonymoususerid-prod-analytics.sql.gpg

• {org}-{course}-{run}-wiki_article-{site}-analytics.sql

• {org}-{course}-{run}-wiki_articlerevision-{site}-analytics.sql
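Because the per-course file names above share the pattern {org}-{course}-{run}-{table}-{site}-analytics, a script can recover the course and table from the name alone. This Python sketch assumes that org, course, run, and site contain no hyphens of their own; the regular expression and the function name are illustrative, not part of the data package. Files that follow a different pattern, such as the institution-wide .csv file and the .mongo file, return None.

```python
import re
from typing import Dict, Optional

# Assumption: hyphens only separate components; table names use underscores.
PACKAGE_FILE = re.compile(
    r"^(?P<org>[^-]+)-(?P<course>[^-]+)-(?P<run>[^-]+)-"
    r"(?P<table>[a-z_]+)-(?P<site>[^-]+)-analytics"
    r"\.(?P<ext>sql|json)(?:\.gpg)?$"
)


def parse_package_filename(name: str) -> Optional[Dict[str, str]]:
    """Split a per-course file name into its parts, or return None."""
    match = PACKAGE_FILE.match(name)
    return match.groupdict() if match else None
```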

{org}-{course}-{run}-auth_user-{site}-analytics.sql

Information about the users who are authorized to access the course. See Columns in the auth_user Table.

{org}-{course}-{run}-auth_userprofile-{site}-analytics.sql

Demographic data provided by users during site registration. See Columns in the auth_userprofile Table.

{org}-{course}-{run}-certificates_generatedcertificate-{site}-analytics.sql

The final grade and certificate status for students (populated after course completion). See Columns in the certificates_generatedcertificate Table.

{org}-{course}-{run}-course_structure-{site}-analytics.json

This file documents the structure of a course at a point in time. It includes course-level data, such as important dates, pages, and course-wide discussion topics, and it identifies each item of course content defined in the course outline. A separate file is included for each course on the site. For more information, see Course Content Data.
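As an illustration of how this file might be explored, the Python sketch below tallies course content by type. It assumes, without confirmation from this section, that the top-level JSON value is an object keyed by block ID and that each block carries a "category" field; check Course Content Data for the actual layout before relying on it. The function name is hypothetical.

```python
import json
from collections import Counter


def count_block_categories(path: str) -> Counter:
    """Count content items in a course_structure JSON file by category.

    Assumption: {block_id: {"category": ..., ...}, ...} at the top level.
    Blocks without a "category" field are counted as "unknown".
    """
    with open(path, encoding="utf-8") as handle:
        structure = json.load(handle)
    return Counter(
        block.get("category", "unknown") for block in structure.values()
    )
```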

{org}-{course}-{run}-courseware_studentmodule-{site}-analytics.sql

The courseware state for each student, with a separate row for each item in the course content that the student accesses.

No file is produced for courses that do not have any records in this table (for example, recently created courses). See Columns in the courseware_studentmodule Table.

{org}-email_opt_in-{site}-analytics.csv

This file reports the email preference selected by students who are enrolled in any of your institution’s courses. See Institution-wide Data.

{org}-{course}-{run}-student_courseenrollment-{site}-analytics.sql

The enrollment status and type of enrollment selected by each student in the course. See Columns in the student_courseenrollment Table.

{org}-{course}-{run}-user_api_usercoursetag-{site}-analytics.sql

Metadata that describes different types of student participation in the course. See Columns in the user_api_usercoursetag Table.

{org}-{course}-{run}-user_id_map-{site}-analytics.sql

A mapping of user IDs to site-wide obfuscated IDs. See Columns in the user_id_map Table.
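One use of this mapping is to replace user IDs in other exports with obfuscated IDs before sharing data. The Python sketch below builds that lookup from the tab-separated file. It assumes the file begins with a header row, and the column names id and hash_id are hypothetical defaults that you should confirm against Columns in the user_id_map Table. It also ignores the escaping conventions described in the Conventions section, on the assumption that ID and hash values contain no tabs, newlines, or backslashes.

```python
import csv
from typing import Dict


def load_id_map(path: str, key_col: str = "id", value_col: str = "hash_id") -> Dict[str, str]:
    """Build a lookup from user ID to obfuscated ID.

    key_col and value_col are hypothetical defaults; check them against the
    header row of your own file.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: the export does not use CSV-style quoting.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {row[key_col]: row[value_col] for row in reader}
```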

{org}-{course}-{run}-{site}.mongo

The content and characteristics of course discussion interactions. See Discussion Forums Data.

ora subdirectory

The ora subdirectory contains SQL tables for data relating to any open response assessment (ORA) problems in your organization’s courses. For more information, see Open Response Assessment Data.

• {org}-{course}-{run}-assessment_assessment-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_assessmentfeedback-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_assessmentfeedback_assessments-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_assessmentfeedback_options-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_assessmentfeedbackoption-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_assessmentpart-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_criterion-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_criterionoption-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_peerworkflow-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_peerworkflowitem-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_rubric-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_studenttrainingworkflow-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_studenttrainingworkflowitem-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_trainingexample-prod-analytics.sql.gpg

• {org}-{course}-{run}-assessment_trainingexample_options_selected-prod-analytics.sql.gpg

• {org}-{course}-{run}-submissions_score-prod-analytics.sql.gpg

• {org}-{course}-{run}-submissions_scoresummary-prod-analytics.sql.gpg

• {org}-{course}-{run}-submissions_studentitem-prod-analytics.sql.gpg

• {org}-{course}-{run}-submissions_submission-prod-analytics.sql.gpg


• {org}-{course}-{run}-workflow_assessmentworkflow-prod-analytics.sql.gpg

• {org}-{course}-{run}-workflow_assessmentworkflowstep-prod-analytics.sql.gpg

{org}-{course}-{run}-student_anonymoususerid-prod-analytics.sql.gpg

A mapping of user IDs to the course-specific anonymous IDs used by the open response assessment tables. See Columns in the student_anonymoususerid Table.

{org}-{course}-{run}-wiki_article-{site}-analytics.sql

Information about the articles added to the course wiki. See Fields in the wiki_article File.

{org}-{course}-{run}-wiki_articlerevision-{site}-analytics.sql

Changes and deletions affecting course wiki articles. See Fields in the wiki_articlerevision File.

Student Info and Progress Data

The following sections detail how edX stores stateful data for students internally. This information can be useful for developers and researchers who are examining database exports.

• Conventions

• MySQL Terminology

• User Data

• Courseware Progress Data

• Certificate Data

EdX also uses the Django Python Web framework. Tables that are built into the Django Web framework are documented here only if they are used in unconventional ways.

5.1 Conventions

EdX uses the MySQL 5.1 relational database system with the InnoDB storage engine.

The following conventions apply to most of the .sql output files. The exception is the courseware_studentmodule table, which is created by a different process than the other edX SQL tables.

• Output files are stored as UTF-8.

• Datetimes are stored as UTC (Coordinated Universal Time), and appear without trailing zeros.

• The .sql files are tab separated. Embedded tabs are replaced by the two-character sequence \t.

• Records are delimited by newlines. Embedded newlines are replaced by the two-character sequence \n.

• Embedded carriage returns are replaced by the two-character sequence \r.

• Backslash characters (\) are escaped as \\.
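The datetime convention can be handled when loading values into Python. One reading of "without trailing zeros" is that fractional seconds, when present, are trimmed (for example .5 rather than .500000), so this sketch accepts values with or without a fraction and attaches the UTC time zone explicitly. The YYYY-MM-DD HH:MM:SS layout and the function name parse_utc are assumptions.

```python
from datetime import datetime, timezone


def parse_utc(value: str) -> datetime:
    """Parse an exported datetime string and mark it as UTC.

    %f accepts one to six digits, so a fraction whose trailing zeros were
    dropped (".5") still parses as 500000 microseconds.
    """
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next accepted layout
    raise ValueError(f"unrecognized datetime: {value!r}")
```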

Note: The submission table for open response assessments stores raw text that is JSON encoded.

When the last four of these conventions are applied to the submission.raw_answer column, the stored values end up doubly encoded: first as JSON, then by the export escaping.
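The escaping rules above, and the double encoding of submission.raw_answer, can be reversed with a short Python sketch. The function names are hypothetical, and the sketch assumes that a backslash followed by anything other than t, n, r, or another backslash is left unchanged.

```python
import json
from typing import Any, List

# Two-character escape sequences used by the export, keyed by the second char.
_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}


def unescape_field(value: str) -> str:
    r"""Reverse the \t, \n, \r, and \\ escapes applied to exported values."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] in _ESCAPES:
            out.append(_ESCAPES[value[i + 1]])
            i += 2  # consume the backslash and the escape character together
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_row(line: str) -> List[str]:
    """Split one exported record on real tabs, then unescape each field."""
    return [unescape_field(field) for field in line.rstrip("\n").split("\t")]


def decode_raw_answer(field: str) -> Any:
    """Undo both layers: the export escaping first, then the JSON encoding."""
    return json.loads(unescape_field(field))
```

Scanning left to right matters: the stored sequence \\n (an escaped backslash followed by n) must decode to a literal backslash and n, which a chain of str.replace calls applied in the wrong order would turn into a backslash and a newline.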

Descriptions of the tables and columns that store student data follow, first in summary form with field types and constraints, and then with a detailed explanation of each column.
