
4.7 Case Studies for Semi-Automatic Authoring Importers

4.7.2 Import Case Study 2: Presentation Importer

Presentations are commonly used in education to display lecture slides. Here, we define a presentation to be any series of slides that can be read by Apache OpenOffice Impress. This includes the Microsoft PowerPoint (.ppt), Office Open XML Presentation (.pptx), OpenDocument Presentation (.odp) and older OpenOffice.org (.sxi) formats. Moodle also supports importing presentations from Microsoft PowerPoint in a similar way [114].

Whereas the MediaWiki format has a predefined structure of headings and subheadings, presentations can vary much more widely in style (and therefore in structure). Indeed, unlike the WikiText format, these presentation file formats do not clearly separate the ‘domain’ content from its ‘presentation’. This makes extracting the raw ‘domain’ content from presentations much more difficult, and therefore makes reuse difficult (tangentially, this reinforces the importance of the ‘separation of concerns’ principle within (adaptive) hypermedia, as described in Section 2.10.1).

4.7.2.1 Reuse of existing content (post-import editing)

It is likely that many lecturers will already have created lecture slides using presentation software compatible with the above formats. After importing a presentation, they are free to edit the imported Domain and Goal maps (post-import editing), possibly enhancing their original ‘linear’, ‘real-world’ presentations with extra content, such as hyperlinks to other resources or videos. Similarly, an author may wish to reuse somebody else’s presentation slides. As with the MediaWiki importer, it is left to the author to check the imported material for accuracy and copyright issues.

4.7.2.2 Creation of new content (pre-import editing)

It is also possible that a content author might wish to create original content for the AH course using PowerPoint. This is particularly likely if the author needs to draw a diagram, or to create a slide based on a pre-existing one.

4.7.2.3 Interpreting a Structure from Presentations

When the user uploads a file to the server (via the MOT3.0 import interface), the server automates OpenOffice Impress to convert the file into a series of HTML and JPEG files. Exporting a presentation as a webpage is a standard feature of Impress, and relying on an established external piece of software in this way helps ensure that the content is accurately converted to domain content (RI1.1). PowerPoint also offers this facility; however, Impress was chosen because it could interact with PHP scripts on a Linux-based server.
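As a rough sketch of this conversion step, assuming an OpenOffice/LibreOffice `soffice` binary on the server's PATH, the export could be launched from Python as below. MOT3.0 itself drove Impress from PHP and its exact invocation is not given here; `--headless`, `--convert-to` and `--outdir` are standard `soffice` options, while the helper names are hypothetical. The exact set of HTML and JPEG files produced depends on the Impress version and export filter.

```python
import subprocess
from pathlib import Path

# Formats accepted by the importer, per the list at the start of this section.
SUPPORTED_SUFFIXES = {".ppt", ".pptx", ".odp", ".sxi"}


def build_convert_command(presentation: Path, outdir: Path) -> list[str]:
    """Hypothetical helper: build a headless soffice command that exports
    a presentation to HTML (the 'save as webpage' feature of Impress)."""
    if presentation.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported presentation format: {presentation.suffix}")
    return [
        "soffice",
        "--headless",             # no GUI, suitable for a server process
        "--convert-to", "html",   # request the HTML export
        "--outdir", str(outdir),  # where the generated files are written
        str(presentation),
    ]


def convert_presentation(presentation: Path, outdir: Path, timeout: float = 600.0) -> None:
    """Run the export; raises CalledProcessError if soffice exits non-zero."""
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(presentation, outdir), check=True, timeout=timeout)
```

Keeping the command construction separate from its execution makes the part that depends on the server environment (whether `soffice` is installed, and how long it takes) easy to isolate.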

The design decision was taken that MOT3.0 generates one concept for each slide in the presentation file. Since the presentation author will already have divided the content into separate slides, the content should already be split into standalone pieces (RI1.2). Each concept contains the following attributes:

• Title: On most slides, Impress can automatically identify the title and place it in an <h1> tag (and in the <title> tag) at the top of the generated HTML file. Although most presentation applications provide a template layout for each slide, the user is likely to change the layout of some slides drastically (for instance, to show a full-screen diagram), in which case no title can be extracted. MOT3.0 then records the title as ‘Untitled’.

• Text: The HTML file also contains an HTML representation of any text elements on the slide. Background styles and images are ignored; however, paragraph styles, bullet points and most HTML-compatible font styles (such as colour, italics and bold) are preserved.

• Image: The exporter automatically generates a JPEG image of the slide. This not only maintains the style of the slide but also preserves any diagrams or images that could not be exported to the text attribute. As part of the import process, the MOT3.0 PHP script copies the generated images into a publicly readable folder on the web server, and the resulting attribute consists of an <img> tag whose src attribute points to the image on the web server.

• Notes: The presenter’s notes are also written to the HTML file, under a ‘<h3>Notes:</h3>’ heading. MOT3.0 identifies this part and stores it as a separate attribute.
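The four attributes above could be pulled out of each exported slide page roughly as follows. This is a minimal Python sketch, not MOT3.0's PHP code: it assumes the page holds the title in its first <h1> and the notes after a <h3>Notes:</h3> heading, as described above, and the `SlideConcept` and `extract_concept` names are hypothetical. Regular expressions keep the sketch short; a production importer would be more robust with a proper HTML parser.

```python
import html
import re
from dataclasses import dataclass


@dataclass
class SlideConcept:
    """One concept per slide, holding the four attributes described above."""
    title: str
    text: str
    image: str
    notes: str


BODY = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
H1 = re.compile(r"<h1[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
NOTES_HEADING = re.compile(r"<h3>\s*Notes:\s*</h3>", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def extract_concept(slide_html: str, image_url: str) -> SlideConcept:
    """Split one exported slide page into title, text, image and notes."""
    body_match = BODY.search(slide_html)
    body = body_match.group(1) if body_match else slide_html

    # Everything after the 'Notes:' heading is the presenter's notes.
    parts = NOTES_HEADING.split(body, maxsplit=1)
    main = parts[0]
    notes = parts[1].strip() if len(parts) == 2 else ""

    # Title: text of the first <h1>; fall back to 'Untitled' if absent or empty.
    title_match = H1.search(main)
    title = TAG.sub("", title_match.group(1)).strip() if title_match else ""
    title = title or "Untitled"

    # Text: the remaining HTML without the title, so formatting is preserved.
    text = H1.sub("", main, count=1).strip()

    # Image: an <img> tag pointing at the JPEG copied to the public folder.
    image = f'<img src="{html.escape(image_url, quote=True)}">'
    return SlideConcept(title=title, text=text, image=image, notes=notes)
```

For a slide whose layout has no title (such as a full-screen diagram), the same function simply returns ‘Untitled’ and an empty notes attribute when no notes heading is present.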

We considered it important to preserve each of these elements of the slide so that no information is lost. The aim was to extract as many different types of attribute as possible from the original linear content. This provides alternative ways (attribute types) of describing the same concept (slide), thereby adhering to requirement RI1.4. Content imported in this way could therefore be used immediately, for instance, in a multimedia strategy (as defined in Section 2.6.2.4).

A revision strategy could also easily be created from such imported material. For instance, if a presenter’s notes contained a more detailed script for the slide, the learner could initially be shown the concept’s image and notes attributes. When the learner returns to the slide after reading the rest of the course, however, the concept might show only the text attribute, reminding the learner of the slide’s bullet points without making them read the full notes again. Although such strategies use only content attribute types (rather than pedagogical labels), these example strategies fulfil RI3.1.
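The revision behaviour described above amounts to a simple selection rule. The Python below is a hypothetical sketch, not MOT3.0's strategy language: the `LearnerState` model (a set of visited concept IDs plus a flag recording that the learner has read the rest of the course) and all names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str
    text: str    # the slide's bullet points (HTML)
    image: str   # <img> tag for the slide's JPEG
    notes: str   # the presenter's more detailed script


@dataclass
class LearnerState:
    """Hypothetical learner model for this sketch."""
    visited: set[str] = field(default_factory=set)
    finished_course: bool = False


def revision_strategy(concept: Concept, learner: LearnerState) -> list[str]:
    """Return the attribute values to display on this visit."""
    revisiting = concept.concept_id in learner.visited and learner.finished_course
    learner.visited.add(concept.concept_id)
    if revisiting:
        # Returning after the rest of the course: just recall the bullet points.
        return [concept.text]
    # First encounter (or course not yet finished): image plus full notes.
    return [concept.image, concept.notes]
```

Because the rule uses only attribute types, it can be applied to any imported presentation without the author adding pedagogical labels.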

Currently, the hierarchies generated by the importer are linear: they simply contain a root concept for the first slide, with every other slide as a child concept of the root.

Since there are no direct relationships between slides (other than Next and Previous), this basic format fulfils RI1.3. As with the MediaWiki importer, the structure of the Goal map simply mimics the structure of the Domain map, interpreting (quite straightforwardly) the Next and Previous relationships as prerequisite relations and thereby fulfilling RI2.1.
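The current linear structure, and a Goal map that mirrors it, can be sketched as follows. The Python is illustrative only: the class and function names are hypothetical, and prerequisites are expressed as pairs of slide indices (earlier slide first) rather than titles, because several slides may share a title such as ‘Untitled’.

```python
from dataclasses import dataclass, field


@dataclass
class DomainConcept:
    title: str
    children: list["DomainConcept"] = field(default_factory=list)


def build_domain_map(slide_titles: list[str]) -> DomainConcept:
    """Root = first slide; every other slide becomes a child of the root."""
    if not slide_titles:
        raise ValueError("presentation has no slides")
    root = DomainConcept(slide_titles[0])
    root.children = [DomainConcept(t) for t in slide_titles[1:]]
    return root


def goal_map_prerequisites(slide_titles: list[str]) -> list[tuple[int, int]]:
    """Read Next/Previous as prerequisites: slide i must precede slide i + 1."""
    return [(i, i + 1) for i in range(len(slide_titles) - 1)]
```

For a three-slide presentation, this yields a root with two children and the prerequisite chain (0, 1), (1, 2).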

4.7.2.4 Evaluation

The presentation importer was developed after the Romanian evaluation described in Section 4.7.1.4; however, it formed a major part of the evaluation with the six course designers at the University of Warwick (introduced in Section 4.7.1.6). In particular, they were asked the following questions (the frequency of each answer is given in brackets).

1) What do you think about importing Presentation content?

a) Being able to import Presentation content as domain maps is:

Very Useful (6), Quite Useful (0), Slightly Useful (0), Not Useful (0)

b) Being able to import Presentation content as goal maps is:

Very Useful (5), Quite Useful (1), Slightly Useful (0), Not Useful (0)

2) What do you think about the functionality of importing content into MOT3.0?

a) The content of an imported Presentation is:

Good (2), Sufficient (4), Bad (0), Insufficient (0)

b) The number of attributes extracted from a Presentation is:

Good (3), Sufficient (3), Bad (0), Insufficient (0)

c) The type of attributes extracted from a Presentation is:

d) Speed of importing a Presentation is:

Good (0), Sufficient (2), Bad (4), Insufficient (0)

As with the Wikipedia importer, all six users thought that importing presentation content as domain maps was Very Useful. Five out of six thought that importing it as goal maps was Very Useful, with the remaining user answering Quite Useful.

There was also an interesting comment concerning the type of attributes extracted:

“For PPT, different ways of extracting finer granularity need revisited - e.g., extracting individual images from a page, extracting bullet points as separate concepts or attributes, etc.”

This comment is interesting because it points towards a way of creating a hierarchy out of an otherwise flat structure, which could be investigated in future research.

The major issue with the importer was its speed. Processing time is proportional to the number of slides in the presentation, with each slide taking around 5 seconds to process. As with the Wikipedia importer, this is not scalable if several users try to use the feature at the same time. Respondents commented that it would be better to have a progress bar (or some other feedback) showing how many slides need to be imported, together with an estimated completion time.
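Because processing time grows linearly with the number of slides (around 5 seconds each), the estimate the respondents asked for is simple arithmetic: a 40-slide lecture would take roughly 40 × 5 = 200 seconds, i.e. over three minutes. A hypothetical progress message built on that per-slide figure might look like this; the 5-second constant is taken from the observation above and would in practice vary with server load.

```python
SECONDS_PER_SLIDE = 5.0  # approximate per-slide processing time observed above


def progress_message(done: int, total: int,
                     seconds_per_slide: float = SECONDS_PER_SLIDE) -> str:
    """Progress line with an estimated time remaining, assuming the
    per-slide processing time stays roughly constant."""
    if total <= 0:
        return "No slides to import"
    done = min(max(done, 0), total)  # clamp to a sensible range
    remaining_seconds = (total - done) * seconds_per_slide
    percent = 100 * done // total
    return (f"Imported {done}/{total} slides ({percent}%), "
            f"about {remaining_seconds:.0f} s remaining")
```

Calling it after each slide is processed, for example `progress_message(30, 40)`, gives "Imported 30/40 slides (75%), about 50 s remaining".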

Table 4.2 shows the results of one-sample t-tests for the presentation questions against an expected result of 0.

Question                                                Mean     StdDev   T       P
The content of the imported Presentation                1.333    0.516    6.32    0.001
The number of attributes extracted from a Presentation  1.5      0.548    6.71    0.001
The type of attributes extracted from a Presentation    1.333    0.516    6.32    0.001
Speed of importing a Presentation                       -0.333   1.033    -0.79   0.465

Table 4.2: T-Test results compared with an expected result of 0
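The figures in Table 4.2 can be recomputed from the answer frequencies once the answers are coded numerically and a one-sample t statistic, t = (x̄ − 0) / (s / √n) with n − 1 degrees of freedom, is applied. The coding used below (Good = 2, Sufficient = 1, Bad = −1, Insufficient = −2) is an assumption, not stated in the text, chosen because it reproduces the table's means and standard deviations; no respondent answered Insufficient, so its value does not affect the results. The sketch uses only Python's standard library, computing the two-tailed p-value from the standard closed-form series for Student's t with integer degrees of freedom.

```python
import math
from statistics import mean, stdev

# ASSUMED coding of the answer scale (consistent with Table 4.2, not stated).
SCORES = {"Good": 2, "Sufficient": 1, "Bad": -1, "Insufficient": -2}


def expand(counts: dict[str, int]) -> list[int]:
    """Turn answer frequencies into a list of numeric scores."""
    return [SCORES[answer] for answer, n in counts.items() for _ in range(n)]


def t_two_tailed_p(t: float, df: int) -> float:
    """Two-tailed p-value of Student's t for integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        # Odd df: A = (2/pi) * [theta + sin(theta) * (c + (2/3)c^3 + ...)]
        term, total = c, 0.0
        for k in range((df - 1) // 2):
            total += term
            term *= c * c * (2 * k + 2) / (2 * k + 3)
        a = (2 / math.pi) * (theta + s * total)
    else:
        # Even df: A = sin(theta) * [1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ...]
        term, total = 1.0, 0.0
        for k in range(df // 2):
            total += term
            term *= c * c * (2 * k + 1) / (2 * k + 2)
        a = s * total
    return 1.0 - a  # A is P(|T| < |t|)


def one_sample_t(scores: list[int], expected: float = 0.0) -> tuple[float, float, float, float]:
    """Return (mean, sample standard deviation, t, two-tailed p)."""
    n = len(scores)
    m, sd = mean(scores), stdev(scores)
    t = (m - expected) / (sd / math.sqrt(n))
    return m, sd, t, t_two_tailed_p(t, n - 1)
```

Applied to the content, number-of-attributes and speed frequencies listed above, this gives t ≈ 6.32, 6.71 and −0.79 respectively, with p below 0.002 for the first two and about 0.465 for speed.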

4.7.2.5 Future Improvements to the Presentation Importer

As suggested by the evaluation comments, progress information needs to be relayed to the user. The comment described above about extracting a finer granularity from the presentation slides (e.g. bullet points and individual images) is also very interesting. For instance, if a slide contains animated bullet points, this animation could be simulated by separating the bullet points into separate attributes. The corresponding goal model sublessons could then have a label named ‘bullet’, with a weight reflecting each bullet’s position in the animation sequence. The system could then recommend a strategy similar to a roll-out strategy (defined in Section 2.6.2.2).
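The proposed bullet-level split could look roughly like this. It is a hypothetical Python sketch: it assumes the text attribute holds a flat (non-nested) HTML list whose order matches the animation order, which the exported HTML does not necessarily record, and the names `Sublesson`, `split_bullets` and `roll_out` are invented for illustration. The `roll_out` function only approximates a roll-out strategy by revealing one more bullet at each step.

```python
import re
from dataclasses import dataclass

LI = re.compile(r"<li[^>]*>(.*?)</li>", re.IGNORECASE | re.DOTALL)


@dataclass
class Sublesson:
    content: str
    label: str
    weight: int  # position of the bullet in the (assumed) animation order


def split_bullets(text_attribute: str) -> list[Sublesson]:
    """One sublesson per bullet point, labelled 'bullet' and weighted by order."""
    return [Sublesson(content=item.strip(), label="bullet", weight=i)
            for i, item in enumerate(LI.findall(text_attribute), start=1)]


def roll_out(sublessons: list[Sublesson], step: int) -> list[str]:
    """Show every bullet whose weight is at most the current step."""
    ordered = sorted(sublessons, key=lambda s: s.weight)
    return [s.content for s in ordered if s.weight <= step]
```

Advancing `step` from 1 to the number of bullets then mimics the original slide animation one bullet at a time.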

Future research will investigate how to analyse slide titles and structure the generated hierarchy accordingly. For instance, if a presentation contains four consecutive slides with the same title, the importer could infer a relationship between these slides and restructure the domain hierarchy accordingly (i.e., all four concepts would become children of a grouping concept). However, since there is no de facto standard for designing slides, it might be difficult to interpret such a structure reliably.
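A simple version of this title-based heuristic, grouping runs of consecutive slides that share a title, is sketched below. The Python is hypothetical: the names are invented, and both the minimum run length of two and the decision never to group ‘Untitled’ slides are assumptions. As noted above, such a heuristic could misfire on presentations that reuse titles in other ways.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class Node:
    title: str
    children: list["Node"] = field(default_factory=list)


def group_consecutive_titles(titles: list[str], min_run: int = 2) -> list[Node]:
    """Wrap each run of consecutive slides sharing a title in a grouping concept.
    'Untitled' slides are left ungrouped, since their shared title means nothing."""
    nodes: list[Node] = []
    for title, run in groupby(titles):  # groupby yields runs of equal adjacent items
        slides = list(run)
        if len(slides) >= min_run and title != "Untitled":
            nodes.append(Node(title, [Node(t) for t in slides]))
        else:
            nodes.extend(Node(t) for t in slides)
    return nodes
```

The resulting list would replace the flat list of children under the root concept, giving the domain hierarchy one extra level wherever a run is detected.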

In document Manual and automatic authoring for adaptive hypermedia (Page 106-113)