
Google Analytics and UT Urchin - An Evaluation


Last Edited August 28, 2014

Research Summary

Executive Summary

One of the university’s core Web analytics tools, Urchin, is no longer supported by Google, and the UT Urchin server is out of warranty. The Web Analytics Tool Assessment project (WATA) was assigned to review Web analytics tools, beginning with Google Analytics and Splunk, which are already in active use at the University, and determine which would best meet the needs of the University.

During the WATA project, Splunk was removed from the assessment list because the Web Intelligence plug-in is no longer being supported and it could not be considered a feasible enterprise solution. A review of Web analytics tools used by peer institutions revealed that Google Analytics was used by all of the universities surveyed, and no other viable alternative was discovered. Therefore, the focus of the project shifted to reviewing Google Analytics.

The WATA project team and customer steering committee identified fourteen areas in which Google Analytics could not fully meet the University’s Web analytics needs, and possible solutions were suggested for each of the areas. Also, a risk analysis was used to rate the probability and level of impact for each area if no solution could successfully be applied.

Research Summary and Results

Historical Data

ITS Applications recognized a need for centralized aggregated logging of analytics in 2002. A beta test of Urchin Analytics began in 2004. The success of that proof of concept led to an enterprise-wide implementation of Urchin in late 2004.

Urchin was the sole enterprise-wide source of Web analytics data until the Office of the President created a social media strategy for the University in 2009. As part of the social media strategy, Google Analytics, already an industry standard, was recommended to be implemented widely across campus. The Office of Public Affairs and the ITS Web Team were tasked with coordinating the implementation of Google Analytics on key pages with webmasters around campus.

At the time of the initiative to implement Google Analytics across the University’s Web presence, the Information Security Office expressed concern about Cat-1 or sensitive data being sent to Google’s servers in the cloud. Splunk was recommended as a log-based tool that could be hosted locally to provide robust Web analytics. However, the Office of the President and the Office of Public Affairs chose to move forward with Google Analytics while providing the guideline that webmasters should only implement the Google Analytics tracker on public pages in order to prevent the potential transmission of sensitive data. Many departments across campus participated in the initiative, but the project did not get buy-in from every targeted department.

The Office of the Registrar installed its own instance of Splunk to handle user-based logging, which provided a level of granularity that Google Analytics does not provide and could be used on pages that collect Cat-1 data.

In June 2013, the Web and Contracts Services team became aware that the Web Intelligence plug-in for Splunk is no longer being actively supported. While Splunk continues to be a valuable tool for Web server analysis, it can no longer be considered a viable option for an enterprise-level Web analytics tool.

Peer Institution Benchmarking

Before reviewing the available Web analytics tools that might fit the university’s needs, the project team conducted a brief email survey of 15 peer institutions to find out which tools they use and how they handle secure pages.

Out of the 15 peer institutions that were asked to participate, only one - the Ohio State University - provided a response, without specifying a tool. The response stated that the central IT department uses a homegrown tool and that other departments may use 3rd party tools.

As no usable feedback was obtained through the preliminary peer institution survey, the WATA team benchmarked 28 universities, drawn from the UT Web publishers list of peer institutions, in order to gain a better perspective on which Web analytics tools are used in academia. The team searched each university’s website to determine which tools are used to analyze Web data. Out of the 28 universities, 23 provided online information about the tool(s) they use. As shown in the table below, all 23 use Google Analytics and only three use additional tools besides Google Analytics.

No. | University | Web Analytics Tool
1 | Indiana University Bloomington | Google Analytics
2 | Michigan State University | Google Analytics
3 | Ohio State University, Main Campus | Google Analytics
4 | University of California, Berkeley | Google Analytics
5 | University of California, Los Angeles | Google Analytics
6 | University of Illinois at Urbana-Champaign | Google Analytics
7 | University of Michigan, Ann Arbor | Google Analytics
8 | University of Minnesota, Twin Cities | Google Analytics
9 | University of North Carolina at Chapel Hill | Google Analytics
10 | University of Washington, Seattle | Google Analytics & Urchin
11 | University of Wisconsin at Madison | Google Analytics
12 | Carnegie Mellon University | Google Analytics & Urchin
13 | Columbia University | Google Analytics
14 | Duke University | Google Analytics
15 | Harvard University | Google Analytics
16 | Massachusetts Institute of Technology | Google Analytics
17 | Northwestern University | Google Analytics
18 | Stanford University | Google Analytics
19 | The University of Southern California | Google Analytics
20 | University of Florida | Google Analytics
21 | University of Pennsylvania | Google Analytics & WebLog Expert
22 | Texas A&M | Google Analytics
23 | Yale University | Google Analytics

The results from the peer institution benchmarking confirm that Google Analytics is the standard Web analytics tool in higher education, and it is being widely used despite the constraints that have been identified, such as not allowing Google Analytics to collect sensitive data.

Surveys

The WATA team conducted a survey of campus Web publishers via the UT survey tool Qualtrics to determine which Web analytics tools are currently in use along with the strengths and weaknesses of each tool (see Appendix A). The survey was publicized on the University's WebPub mailing list, made up primarily of Web publishers, and on the TXEdge mailing list, made up primarily of developers.

In total, 29 campus developers from various departments responded; 23 use Google Analytics, 12 use Urchin and 7 use other Web analytics tools. The results showed that Google Analytics is used by almost 80% of respondents. Many of these users also use other tools such as Urchin. Fewer than 10% of respondents use Urchin exclusively; these users come from the Office of Research Support, LAITS in the College of Liberal Arts and the Office of Student Financial Services.
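The usage shares quoted above follow directly from the response counts. The following is a minimal illustrative sketch of that arithmetic; the variable and function names are hypothetical and are not part of the survey tooling. Because respondents could report more than one tool, the shares add up to more than 100%.

```python
# Response counts as reported in the survey summary.
TOTAL_RESPONDENTS = 29
TOOL_USERS = {"Google Analytics": 23, "Urchin": 12, "Other": 7}


def usage_share(count: int, total: int) -> float:
    """Return the percentage of respondents using a tool, rounded to one decimal."""
    return round(100 * count / total, 1)


# Share of respondents per tool (a respondent may appear under several tools).
shares = {tool: usage_share(n, TOTAL_RESPONDENTS) for tool, n in TOOL_USERS.items()}

for tool, share in shares.items():
    print(f"{tool}: {share}%")
```

The Google Analytics share computed this way is consistent with the "almost 80%" figure in the text.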

Overall, the campus developers who use Google Analytics consider it a user-friendly tool that covers their needs and is simple to implement. The developers who use Urchin, exclusively or together with Google Analytics, find it harder to use, with a less intuitive interface, but think it captures more information because it is a log-based tool. Since the majority of developers already use Google Analytics, we will focus on their responses about the aspects they would like to change or improve in Google Analytics.

From the survey responses the following points were listed as the main areas for improvement for Google Analytics at UT:

• Google Analytics fails to automatically capture information about binary downloads such as PDFs and image files.

• Google Analytics cannot track user visit data and provide statistics if JavaScript is turned off in a browser.

• UT policy [1] currently prohibits UT Web developers from using Google Analytics to track interactions and provide statistics for secure pages because it is a third-party tool and it is not hosted locally.

All of the points mentioned above were included in the requirements for an enterprise Web analytics tool for UT. These and other items are examined below for potential mitigation strategies.
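The first point reflects a difference in how data is collected. A page-tag tool such as Google Analytics records only what its JavaScript tracker reports, while a log-based tool reads the Web server's own request log, where file requests appear alongside page requests. The sketch below is an illustration only and is not Urchin's implementation. It assumes access logs in the Common Log Format; the sample lines, the list of file extensions, and the function name are hypothetical.

```python
import re
from collections import Counter

# Matches the request, status, and size fields of a Common Log Format line,
# e.g. "GET /docs/report.pdf HTTP/1.1" 200 52344
LOG_PATTERN = re.compile(
    r'"GET (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical set of extensions treated as binary downloads.
BINARY_EXTENSIONS = (".pdf", ".png", ".jpg", ".gif")


def count_binary_downloads(log_lines):
    """Count successful (HTTP 200) GET requests for binary files, per path."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None or match.group("status") != "200":
            continue
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path.lower().endswith(BINARY_EXTENSIONS):
            counts[path] += 1
    return counts


# Hypothetical sample log lines for demonstration.
SAMPLE_LOG = [
    '203.0.113.5 - - [28/Aug/2014:10:00:00 -0500] "GET /docs/report.pdf HTTP/1.1" 200 52344',
    '203.0.113.6 - - [28/Aug/2014:10:01:00 -0500] "GET /index.html HTTP/1.1" 200 1024',
    '203.0.113.7 - - [28/Aug/2014:10:02:00 -0500] "GET /docs/report.pdf HTTP/1.1" 200 52344',
    '203.0.113.8 - - [28/Aug/2014:10:03:00 -0500] "GET /img/logo.png HTTP/1.1" 304 -',
]

downloads = count_binary_downloads(SAMPLE_LOG)
print(downloads.most_common(5))
```

Direct requests for a file are counted here even when no tracked page was loaded first, which is the case a client-side tracker cannot see without additional configuration.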

Recommendation

Based on our research, Google Analytics is the only viable option for an enterprise-level Web analytics tool. Therefore the WATA project has shifted to focus only on Google Analytics and specifically on those areas in which the tool cannot fully meet the university’s needs. The WATA team has identified deficiencies in Google Analytics from the project’s Requirements document based on input from Web analytics experts across campus, the campus analytics user survey, and assistance from the customer steering committee.

Areas to Mitigate

Focusing solely on Google Analytics, the project team identified shortcomings in Google Analytics as a campus-wide tool and conducted preliminary research to determine whether these shortcomings could feasibly be mitigated. Through consultations with campus experts on Google Analytics, the team identified possible solutions for them. In total, fourteen areas were identified in which the tool does not fully meet the project requirements. The areas to be mitigated, as well as a suggested solution for each area, were separated into five main categories and are summarized below:

Category | Area | Suggested Solution Strategy
Authentication/Authorization | 1. Google Analytics does not use EID authentication. | Research a) a SAML-based solution [2], and/or b) a Google single sign-on server solution.
Authentication/Authorization | 2. Google Analytics does not support authorized user groups. | Research an LDAP-based solution [3].
Authentication/Authorization | 3. Google Analytics does not authorize user groups to view and create reports. | Research an LDAP-based solution.
Authentication/Authorization | 4. Google Analytics does not provide an API to allow for automated deprovisioning when a user leaves the University. | Research an LDAP-based solution.
Privacy | 5. Google Analytics does not report on interactions that don't generate page views, such as carousel clicks and dropdown clicks. | Research a Google Tag Manager solution [4].
Tracking | 6. Google Analytics does not track all user traffic on authentication protected pages. | Consult with the ISO to establish guidelines for the use of Google Analytics on authentication protected pages.
Tracking | 7. Google Analytics does not track all user traffic on browsers with deactivated JavaScript. | Research php-ga as a possible server-side solution.
Tracking | 8. Google Analytics does not track all user traffic across public UT hosted sites, e.g. sites utilizing different tracking accounts. | Research a Google Tag Manager solution.
Reporting | 9. Google Analytics does not report on binary files downloaded, such as most frequently downloaded PDF files. | Research a Google Tag Manager solution.
Reporting | 10. Google Analytics does not report on file size, such as most frequently downloaded files over a certain size. | Research a Google Tag Manager solution.
Reporting | 11. Google Analytics does not report on click paths across UT hosted sites. | Research a Google Tag Manager solution.
Constraints | 12. Google Analytics must comply with the university's Web Accessibility Policy, www.utexas.edu/web-accessibility-policy | Initial accessibility testing found the Google Analytics reporting dashboard fairly inaccessible, although the tables of raw data are accessible. Perform complete accessibility testing.
Constraints | 13. Google Analytics must comply with University Campus IT Policies and Standards, http://www.utexas.edu/cio/policies/ | Consult with the ISO to establish guidelines for the use of Google Analytics.
Constraints | 14. Google Analytics must comply with the University's Web Privacy Policy, http://www.utexas.edu/web-privacy-policy | Consult with the ISO and Legal Services to establish guidelines for the use of Google Analytics and review the University’s Web Privacy Policy.

[1] Section 5.11 of the Information Resources Use and Security Policy deals with the handling of sensitive data, and section 5.11.2.4 specifically states that: "If the university intends to provide Category I Digital Data to a third party acting as an agent of or otherwise on its behalf (such as an application service provider) and if it determines that its provision of Category I Digital Data to a third party will result in a significant risk to the confidentiality and/or integrity of such Data, a written agreement with the third party is required. The agreement must specify terms and conditions that protect the confidentiality and/or integrity of the Category I Digital Data as required by this Policy. The written agreement must require the third party to use appropriate administrative, physical, and technical safeguards to protect the confidentiality and/or integrity of all Category I Digital Data obtained and that the university, as applicable, shall monitor compliance with the provisions of the written agreement."

[2] Security Assertion Markup Language (SAML) is an XML standard that allows secure web domains to exchange user authentication and authorization data.

[3] Lightweight Directory Access Protocol (LDAP) is an open application protocol for accessing and maintaining distributed directory information services and is supported by the UT EID system.

[4] Google Tag Manager (GTM) is a tool that extends the functionality of Google Analytics by managing different kinds of tags that are on a website from one interface.
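For item 7, the general idea behind a server-side approach is that the Web server, rather than the visitor's browser, reports the hit to Google Analytics, so the hit does not depend on JavaScript. The sketch below does not reproduce php-ga. It assumes the parameter names of the Universal Analytics Measurement Protocol (v, tid, cid, t, dh, dp), a service Google has since retired, and it only builds the request body without sending it. The tracking ID, hostname, and function name are placeholders.

```python
import uuid
from urllib.parse import parse_qs, urlencode

# Historical Universal Analytics collection endpoint, shown for context only;
# this sketch never contacts it.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id: str, client_id: str,
                       hostname: str, page_path: str) -> str:
    """Return a form-encoded pageview payload built on the server side."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID (placeholder value in the example)
        "cid": client_id,    # anonymous client identifier
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname
        "dp": page_path,     # document path
    }
    return urlencode(params)


# Example: a random UUID stands in for an anonymous visitor identifier.
hit = build_pageview_hit("UA-XXXXX-Y", str(uuid.uuid4()),
                         "www.example.edu", "/admissions/")
print(parse_qs(hit))
```

Whether hits of this kind may be sent for authentication protected pages remains subject to the ISO guidelines proposed for items 6 and 13.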

Risk analysis

In order to assess whether implementing Google Analytics as the official Web analytics tool of the university will fulfill the developers’ needs and address the areas of concern, the project team performed a risk analysis (see Appendix B). The risk analysis describes the probability of occurrence and the potential impact of not being able to mitigate the identified areas with the suggested solution strategies.

According to the analysis, the biggest concerns with implementing Google Analytics as the enterprise-wide analytics tool are that the University may not be able to fully mitigate the issue of tracking information from authenticated pages or to comply with the University’s IT and Privacy policies.

Conclusion and Next Steps

During the Web Analytics Tool Assessment project it became evident that the replacement for Urchin will be Google Analytics, a tool that is an industry standard and is widely used by the University’s peer institutions. Google Analytics may not cover the University’s needs completely, but options are available to address or mitigate the areas that the tool itself cannot cover.

The Web Analytics Tool Assessment project will be completed once the current document is approved by the project’s CSC and the WTI committee. A follow-up project will implement Google Analytics as the enterprise Web analytics tool with support from ITS and explore the suggested solutions provided for each deficiency area.

Revision/Approval History

Version | Date | Updater Name | Description
V 0.1 | 7/9/2014 | Christina Konstantinidou | Initial draft completed.
V 0.2 | 7/27/2014 | Christina Konstantinidou | Updates based on team’s feedback. Appendix section was added.
V 0.3 | 8/4/2014 | Christina Konstantinidou and Rachel Strain | Updates based on the Director’s feedback.
V 0.4 | 8/28/2014 | Christina Konstantinidou | Approved by the CSC.
V 1.0 | 9/4/2014 | Christina Konstantinidou | Approved by WTI.


Appendix A

No. | Google Analytics | Urchin | Other | What do you like about your current Web analytics tool(s)? | What would you change about your current Web analytics tool(s)? | What Web analytics functions are most important to you? | Department

1 1 It lets us know which pages are popular, how visitors are coming to our pages, it lets us track how effective our email campaigns have been.

I can't think of anything specific that I would change. It provides good detail for our sites.

Seeing visitor count, visitor paths, and landing pages.

Moody College of Communication

2 1 I like the support for multiple date ranges. /

The interface is difficult to understand. It is hard to figure out page visits for specific pages and files.

Counting page visits. Office of Research Support

3 1 1 Google Analytics is pretty much the industry standard, and it works well for our needs, allowing us to track all of the things that are important to us, with the flexibility to track way, way more than we're currently tracking, if we wanted to extend our analytics.

The main downside to relying on Google Analytics, a client-side tool, is that it fails to automatically capture information about non-web binary downloads like PDFs and image files (these have to be handled through event tracking, which is somewhat cumbersome, but at least the option exists). This is why having two tools (GA and Urchin) made sense for us, as the server-side tool could scrape much more info about those file requests.

Visualizing traffic patterns, measuring goals, learning about the technologies that our visitors are using (browser versions, device info, presumed bandwidth, etc.)

School of Law

4 1 Every other week, the ICES department at UT analyzes their pageviews, referrals from social media sites, number of mobile views, and the search terms that led to people reaching our site. We can also see which files were downloaded, and we can compare all of our numbers to the same period in the previous year.

The dashboard in Google Analytics can be a bit overwhelming at times with all of its numerous options.

ICES

5 1 Being able to use it. Simple page views, internal referrers

6 1 1

7 1 Difficult to use Urchin for providing site traffic estimates to clients. Clients don't understand how to use the interface and what the various reports mean.

Estimates of unique visitors, patterns of user visitation to site (which content, times/days of access, etc), use of specific site resources

LAITS in College of Liberal Arts

8 1 Keeps it in house rather than feeding Google.

In-house stats. Also indicates unwanted hot links.

9 1 Everything - it gives me all of the data I need to do my job

I would have every site at UT using the "common" tracker or only the home and core pages. Right now it is a mix and it messes up the data.

The standard functions are all I really need - the ability to find visits, visitors (unique and total), exit rates, bounce rates, previous click, next click, all trended over time for any page on the site.

University Communications

10 1 1 1 Google is very user-friendly and provides statistics frequently requested by our marketing department. GoSquared is even more user-friendly, beating out Google Analytics when it comes to real-time statistics.

Google and GoSquared are easy to use and allow for easy analysis of site traffic. Urchin never produced very user-friendly reporting information (graphs, real-time statistics).

Pageviews, geographic locations of visitors, and visits that originate from our marketing efforts.


11 1 1 I like Google Analytics' default overview reports, custom report building, and automatic email distribution options (emails are helpful for content managers to get regular feedback). In both Google Analytics and Urchin, I like being able to see broad trends and drill down into specifics as needed. / /

Google Analytics' interface is always changing, and I never feel like I have a solid grasp on all the functions and features, so a more stable interface is something I wish for there. Urchin has been more static/predictable over time, but the interface is not terribly user-friendly, and feels really clunky overall. I'd like to see an improved interface there.

easily generated reports that are understandable by staff not well-versed in web analytics

Tarlton Law Library

12 1 Being able to track visits and interactions with our site over time.

I wish that Google would stop changing the interface. I wish that I could feel more confident that internal traffic is being excluded.

Referral and source data University Extension

13 1 Industry standard, web-based interface, exportable data, simple implementation

More and better documentation Frank Erwin Center

14 1 Hard to beat GA reporting capability.

Lack of stats if JS is turned off in the browser - not as reliable as server log-driven stats. Would ideally like to have both GA and server log-driven stats on all my websites.

Office of Development

15 1 Very visual, and easy to navigate. It's also easy to manage because we use Google+ and other Google services under the same account.

I wish I knew more about the social integration/tracking. Still means I have to go multiple places to get a comprehensive analytical profile of our comms.

Seeing what content is - and isn't - getting traction.

Center for Transportation Research

16 1 1 I like Google's crawl error analysis. / / Analog is crude and ugly, but works OK for ad-hoc reporting

I don't like the additional latency that stems from basing analytics on communication with a remote server.

I'm more interested in error analysis and performance metrics than traffic analysis and search engine optimization

Harry Ransom Center

17 1 I like that I can exclude a few specific staff IPs so I can get an accurate visual of how outside users are using our website and not have bloated stats when staff are working on specific projects (i.e., for our purposes we don't care about internal traffic so that we can show the value of the public website versus an intranet). Love the graphics options, being able to see traffic flow and site search terms, etc.

(1) I wish that it could pull from the server logs to give a stat of PDFs that are downloaded directly (I use a hack to find out when a PDF link is clicked on our webpage, but if somebody has the direct link to a PDF, I don't have a stat for that). (2) I wish I could have separate analytics for known robots versus true traffic. (3) I'm irritated that the new dashboards only allow 10 or so saved stats on a page; I have to PDF 5 different dashboards and then combine them to get my monthly report. (4) I don't like the graphics for the Geographic map overlay. I wish the "visits by country" had the number included on the overlay for printing purposes, and I wish the "visits by city" was slightly different (the dot over Austin covers up the entire state).

See above & daily visits, site search terms, new vs. returning and visitor frequency, unique visitors, PDF downloads


19 1 1 Data Driven Decisions - Inform clients of how to make changes and then be able to measure effects of changes

Google could update our version so I could use Tag Manager. Urchin is slow and could be a better tool if we could make it perform faster. Also, transparency to the giving/donation page. Development is important, but how do we know if we actually accomplish anything? This needs to be a shared and transparent effort.

CSV downloads. Automatic report compilation with email PDFs. Pivot tables. Regular expressions, tag management. Filters, ISP and geographic filters

LAITS

20 1 Moody College of Communication
21 1 1 1 Web and Contract Services
22 1 1 McCombs School of Business
23 1 1 University Unions
24 1 College of Fine Arts
25 1 1 Housing and Food Services
26 1 Office of Student Financial Services
27 1 Texas Advanced Computing Center
28 1 Texas Advanced Computing Center
29 1 1 Provost's Office

Appendix B

The risk analysis is based on the project team members’ familiarity with Web analytics and the University’s Web standards. The values in the table below vary from 1 to 5, where 1 is the lowest value and 5 is the highest. The score of each risk results from the formula: (Probability x Impact), and the items with the highest values are the ones that have the highest probability of occurring and will also have the highest impact if they occur.
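The scoring rule described above can be reproduced with a short calculation. The following is a minimal sketch that uses the probability and impact values listed in this appendix. Risks 4 and 14 are left out because no values are shown for them; the variable names are hypothetical.

```python
# (probability, impact) per risk number, each on the 1-5 scale used in this appendix.
# Risks 4 and 14 are omitted: no values are available for them.
RISKS = {
    1: (3, 1), 2: (4, 2), 3: (4, 2), 5: (1, 3), 6: (3, 4), 7: (3, 2),
    8: (4, 2), 9: (1, 3), 10: (1, 2), 11: (4, 2), 12: (3, 3), 13: (3, 4),
}


def risk_score(probability: int, impact: int) -> int:
    """Risk score as defined in the text: Probability x Impact."""
    return probability * impact


scores = {risk_id: risk_score(p, i) for risk_id, (p, i) in RISKS.items()}

# Highest score first; ties are ordered by risk number.
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))

for risk_id, score in ranked:
    print(f"Risk {risk_id}: score {score}")
```

Among the risks with values, the highest scores fall on risk 6 (authentication protected pages) and risk 13 (Campus IT Policies and Standards), in line with the concerns highlighted in the risk analysis section.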

Risk | Probability | Impact | Risk score
1. Google Analytics would not be able to provide EID authentication using a SAML-based solution, and/or a Google single sign-on server solution. | 3 | 1 | 3
2. Google Analytics would not be able to support authorized user groups using an LDAP-based solution. | 4 | 2 | 8
3. Google Analytics would not be able to authorize user groups to view and create reports using an LDAP-based solution. | 4 | 2 | 8
4. Google Analytics would not be able to provide an API to allow for automated deprovisioning when a user leaves the University using an LDAP-based solution. | | |
5. Google Analytics would not be able to report on interactions that don't generate page views, such as carousel clicks and dropdown clicks using a Google Tag Manager solution. | 1 | 3 | 3
6. Google Analytics would not be able to get ISO clearance to track all user traffic on authentication protected pages. | 3 | 4 | 12
7. Google Analytics would not be able to track all user traffic on browsers with deactivated JavaScript using a php-ga server-side solution. | 3 | 2 | 6
8. Google Analytics would not be able to track all user traffic across public UT hosted sites, e.g. sites utilizing different tracking accounts, using a Google Tag Manager solution. | 4 | 2 | 8
9. Google Analytics would not be able to report on binary files downloaded, such as most frequently downloaded pdf files, using a Google Tag Manager solution. | 1 | 3 | 3
10. Google Analytics would not be able to report on file size, such as most frequently downloaded files over a certain size, using a Google Tag Manager solution. | 1 | 2 | 2
11. Google Analytics would not be able to report on click paths across UT hosted sites using a Google Tag Manager solution. | 4 | 2 | 8
12. Google Analytics would not comply with the university's Web Accessibility Policy, www.utexas.edu/web-accessibility-policy | 3 | 3 | 9
13. Google Analytics would not comply with University Campus IT Policies and Standards, http://www.utexas.edu/cio/policies | 3 | 4 | 12
14. Google Analytics would not comply with the University's Web | | |
