University of Huddersfield - Business Intelligence

Visualisation of Research Strengths (VORS)

 

Case study written October 2012. 

 


Background

 

Prior to the inception of the VORS project at Huddersfield, data visualisations of research were only produced on an as-needed basis. These were produced to meet the needs of senior management, either ad hoc or for periodic meetings to facilitate planning and accountability. The visualisations were produced manually and tended to consume a considerable amount of staff time, both in gathering data from diverse sources and in manipulating that data to produce the visualisations. A prior project at Huddersfield, funded by the JISC Research Information Management programme, had started to address the problem of automatically bringing together source research data into one coherent system, but that system did not address data visualisation. The primary purpose of the VORS project was to demonstrate to senior management that a visualisation system could provide more timely and more trustworthy summary visualisations of data at lower cost in terms of administrative staff time.

 

Aims and Objectives

 

At the outset, the ambition of the project was to leverage both internal data, as exemplified by the metadata about publications stored in the University’s repository, and data available from other universities and public bodies. We found, however, that there were legal restrictions on the use of ‘robots’ to automatically gather some of the external data that we had planned to use. We also felt that our initial ambitions placed us in competition with commercially available software.

 

Our initial aims and objectives were expressed as follows:

 

Many HEIs now maintain repositories containing their researchers' publications. They have the potential to provide much information about the research strength of an HEI, as publications are the main output of research. The project aims to merge internal information extracted from an institution's publications repository with external information (academic subject definitions, quality of outlets and publications), for input to a visualisation tool. The tool will assist research managers in making decisions, which need to be based on an understanding of research strengths across subject areas, such as where to aim internal investment. In the event that the tool becomes a part of a BI resource, it could lead to institution vs institution comparisons and visual benchmarking for research.

 

The project aimed to deliver a demonstrator of BI (the visualisation tool). The objective was to satisfy the information and decision support needs of senior research managers, in that the tool can convey an accurate visual understanding of the coverage and depth of an HEI's research when displayed on a background of the full breadth and scope of all academic research areas.

 

The demonstrator will include:

 

  1. Input from the current set of references within a publications repository, and external data
  2. A set of functions that include:
     a) the ability to drill down onto subject areas
     b) the ability to view the research map historically, so that growing or dwindling research areas can be identified
     c) the ability to select which combination of quality factors are used to highlight publications
  3. A set of characteristics that include:
     a) the utilisation of state of the art visualisation techniques
     b) an innovative blend of HEI internal publication reference data with external subject ontology and outlet quality data
     c) a set of interface data definitions in XML, so that the demonstrator tool may be accessed in a service oriented manner, to maximise its potential for re-use.

 

To deal with the identified problems, we felt that we needed to restrict our concerns to the manipulation of purely internal data. We aimed to maintain our focus on publication data (but to consider the requirements of the planning and accountability functions) and on the requirement to monitor progress in assembling research data for eventual submission for the REF. With this in mind, a reformulation of our objectives in producing a demonstrator system would primarily remove the ambition expressed in 3b above. However, we now consider that the requirement for re-use expressed in 3c above needs to be more explicitly stated:

 

We regard it as of primary importance that the visualization software produced should be capable of redeployment to deal with new data sets, as management requirements change and develop, and that the redeployment should be capable of being done solely on the basis of the level of IT expertise required to produce the new data sets.

 

Project Approach

 

The project team, faced with changing requirements, decided to take a two-strand approach to the production of graphics/visualisation software. In parallel, we would develop software conservatively based on standard business-style graphics and more experimental 3D-style graphics intended to capture data in an innovative manner. Both strands would rely on the same data gathering processes and software infrastructure. The two-strand approach was designed to give us a fall-back system should the more ambitious approach turn out to be over-ambitious.

 

Governance

 

We had a project group headed by the project manager. We recently recruited a Research Assistant to assist with visualisation. We met fortnightly to review and evaluate progress. The PVC for Research and Enterprise championed the project and directed change management as and when necessary.

 

Technologies and Standards used

 

Consideration was given to the use of commercial dashboard software, but this was rejected both on the grounds of cost and because it would be too limiting in its support for the production of innovative graphics, though we recognised that there are always ways of innovatively deploying even standard business graphics.

 

The base technologies used were:

 

  • SQL Server databases (2005, 2008)
  • Microsoft Silverlight for web based visualisations 
  • .NET technologies deployed in an MVC/MVVM (Model-View-Controller / Model-View-ViewModel) pattern to provide web services
  • XML to enable both the specification of data sets to be presented and to store the data sets themselves 

 

The primary data source used was the database back-end to our Research Information System, known locally as RIMS (SQL Server 2008). The RIMS system is responsible for integrating data drawn from the University's repository system along with data from HR and Active Directory. The RIMS database was enhanced to store the assembled data-sets to be presented to users.

 

Web services were developed using .NET technologies, primarily following an MVVM (Model-View-ViewModel) design pattern. The web service feeds JSON (JavaScript Object Notation, www.json.org) data to the web clients, conforming to a model developed for VORS. This model provides a level of data independence between the client and the service provider, in that the web client does not need any built-in knowledge of the topics inherent in the data presented; it only knows about the structure of the data. This helps us achieve our goal of a high level of re-use. The MVVM pattern also enabled us to implement a ‘Repository’ pattern, which allows the web service to be plugged into either a database storing the data-sets to be presented or XML files stored locally containing the data to be presented.
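The ‘Repository’ arrangement can be pictured with a minimal sketch. It is written in Python for brevity rather than in the project's .NET code, and every name in it (DataSetRepository, XmlRepository, TableRepository, serve_json, and the XML element names) is a hypothetical stand-in, not the actual VORS implementation. It only illustrates the idea that the service serialises whatever rows a repository hands back, without knowing where they are stored.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class DataSetRepository(ABC):
    """Hypothetical interface: the service only asks for the objects of a named data set."""

    @abstractmethod
    def load(self, name: str) -> list[dict]:
        ...


class XmlRepository(DataSetRepository):
    """Reads data sets from XML text (standing in for locally stored XML files)."""

    def __init__(self, documents: dict[str, str]):
        self._documents = documents

    def load(self, name: str) -> list[dict]:
        root = ET.fromstring(self._documents[name])
        # Each <object> element becomes one dict of property name -> text value.
        return [{prop.tag: prop.text for prop in obj} for obj in root.findall("object")]


class TableRepository(DataSetRepository):
    """Stands in for a database back end; rows are simply held in memory here."""

    def __init__(self, tables: dict[str, list[dict]]):
        self._tables = tables

    def load(self, name: str) -> list[dict]:
        return list(self._tables[name])


def serve_json(repo: DataSetRepository, name: str) -> str:
    """The 'web service': serialises whatever the chosen repository returns."""
    return json.dumps({"dataSet": name, "objects": repo.load(name)})
```

Because both repositories satisfy the same interface, swapping one for the other leaves the JSON seen by the client unchanged, provided the underlying data is the same.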

 

Establishing and Maintaining Senior Management buy-in

 

We have full support from our senior management on this project. We hold regular meetings to keep them updated on progress and informed of any changes. We also continue to ask for their requirements and feedback, keeping them fully involved.

 

Outcomes

 

VoRS Architecture

 

The Web server feeds data sets to the visualization software conforming to corresponding specifications as defined in linked XML specification files. The web server software and the visualization software require only that data to be displayed conforms to the matching XML specification file which in turn must conform to the VORS data visualisation specification language.

 

The VORS architecture provides us with great flexibility. The use of XML specification and data sets gives us a degree of abstraction from actual storage technologies. The base data may be stored in any technology as long as it is possible to generate XML from this source.

 

The VORS specification language allows for the development of standard web services to feed the data to client based graphical visualization software. The service needs to understand the VORS specification but not the particulars of either the data to be presented or how it is stored in the base systems.

 

The VORS specification language also allows the client graphical software to be developed independently of the details of the web service. Additionally it allows multiple styles of graphical presentations to be developed and prototyped as long as the data to be presented conforms to the specification language.

 

In this way we have produced a base front end plugin that presents data on a client’s browser using standard business graphics. We have also developed experimental 3D visualisations of the same data.

 

VORS Data Specification Language

 

To allow data-independent visualisation software to be developed, we define a structure to which any data to be displayed must conform. The principal abstraction we have used in constructing our specification language is as follows:

 

  • Data to be presented in a single instance of our visualisation must be about a single type of uniquely identifiable business object
  • Business objects may have any number of properties, but the properties must conform to a small range of distinct types
    • They may be dates
    • They may be simple labels
    • They may be integers
    • They may be real numbers
    • They may be business objects of the kind described in the first point above, with the limitation that such nested objects may only occur one level deep
  • Properties may be single valued or multiple valued
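One way to picture this abstraction is as a small set of type definitions. The sketch below is in Python, and the names (PropertyType, PropertySpec, ObjectSpec, validate) are illustrative assumptions. It does not reproduce the XML syntax of the VORS specification language; it only encodes the rules listed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class PropertyType(Enum):
    DATE = "date"
    LABEL = "label"
    INTEGER = "integer"
    REAL = "real"
    OBJECT = "object"  # a nested business object, allowed one level deep only


@dataclass
class PropertySpec:
    name: str
    kind: PropertyType
    multi_valued: bool = False
    nested: ObjectSpec | None = None  # only meaningful when kind is OBJECT


@dataclass
class ObjectSpec:
    name: str
    properties: list[PropertySpec] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the one-level nesting rule is broken."""
        for prop in self.properties:
            if prop.kind is not PropertyType.OBJECT:
                continue
            if prop.nested is None:
                raise ValueError(f"{prop.name}: nested object specification missing")
            if any(p.kind is PropertyType.OBJECT for p in prop.nested.properties):
                raise ValueError(f"{prop.name}: nesting deeper than one level")
```

A specification for academic staff, for example, could declare FTE as a single-valued real number and Research Group as a multi-valued label.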

 

 

The visualization software can perform two primary tasks in relation to such objects.

 

  • The objects may be counted as filtered by values of their properties and such counts may be grouped by value of their properties
  • Numeric properties of objects may be summed and averaged after being filtered and grouped by the values of their other properties
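The two tasks can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's code: the function names (matches, summarise), the property names and the sample records are all invented mock data, and the grouping property is assumed to be single valued.

```python
from collections import defaultdict
from statistics import mean


def matches(obj, filters):
    """True if, for every filtered property, the object's value is in the allowed set.

    Multi-valued properties (lists) match if any one of their values is allowed.
    """
    for prop, allowed in filters.items():
        value = obj.get(prop)
        values = value if isinstance(value, list) else [value]
        if not any(v in allowed for v in values):
            return False
    return True


def summarise(objects, group_by, filters=None, measure=None, how="count"):
    """Filter objects, group them by one property, then count, sum or average."""
    groups = defaultdict(list)
    for obj in objects:
        if matches(obj, filters or {}):
            groups[obj[group_by]].append(obj)
    if how == "count":
        return {key: len(members) for key, members in groups.items()}
    values = {key: [m[measure] for m in members] for key, members in groups.items()}
    if how == "sum":
        return {key: sum(v) for key, v in values.items()}
    if how == "average":
        return {key: mean(v) for key, v in values.items()}
    raise ValueError(f"unknown operation: {how}")


# Mock data, invented for this sketch.
candidates = [
    {"Name": "A", "UOA": "Computing", "Status": "green", "ProposedPubs": 4},
    {"Name": "B", "UOA": "Computing", "Status": "red", "ProposedPubs": 1},
    {"Name": "C", "UOA": "Music", "Status": "amber", "ProposedPubs": 3},
]

counts = summarise(candidates, "UOA")
sums = summarise(candidates, "UOA", filters={"Status": {"green", "amber"}},
                 measure="ProposedPubs", how="sum")
```

Here `counts` holds the number of candidates per UOA, and `sums` holds the total of the hypothetical ProposedPubs measure per UOA for candidates whose status is green or amber.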

 

Examples of business objects to be visualized.

 

  • Academic members of staff having properties such as:
    • Name (single valued)
    • Date of Birth (single valued)
    • Contract start date (single valued)
    • School (single valued)
    • FTE (single valued)
    • Research Group (multi valued)
  •  Publication having properties such as:
    • Title (single valued)
    • Publisher (single valued)
    • Publication (single valued)
    • Subject (single valued)
    • Date of publication (single valued)
    • Author (multi valued)
  •  Funded Project having properties such as:
    • Title
    • Funding Body
    • Application date
    • Start date
    • End date
    • Principal Investigator
    • Value

(all single valued)

 

Instances of VORS specifications are queryable by the client front-end software. This allows the front end to present visualisations that are interactive. Front ends may be developed to allow the user to determine how a dataset is to be filtered and grouped, and whether the role of the visualisation is to display counts of the relevant objects or to display the results of summing or averaging the numeric properties of the objects.

 

User-driven control of individual visualisations allows us to develop dashboards and individual visualisations in the absence of a complete determination of end user requirements. As a generalisation, the IT department responsible for creating a dashboard needs to be told the topic of concern, i.e. the objects to be explored, but not the details of how the data is to be cut and sliced before display.

 

Example and Screen Shots

 

The first set of graphs is taken from the base system that implements the VORS architecture, but where the visualisation client plug-in is limited to displaying only standard business graphics.

 

Example Background

  

The example presented here has a data set defined for academics being considered for entry in the forthcoming REF exercise. As properties, they have counts of the number of publications they have ready for inclusion, graded by quality. They also have properties such as the UOA they are attached to, their school, etc.

 

The toolkit generated for this dataset allows the user, in the View tab, to select whether the display should count the number of academics (objects) or should display the results of the numeric properties, regarded as “measures”. In this example we see that the user has selected to view the measures and has requested to see the sum of the target number of publications for the academics concerned. The “Filter” and “Grouping” tabs allow for filtering and grouping.

 

 

Screen 1. Showing a single instance of the VoRS graphic embedded in a mock-up of our RIMS system. The graphic includes a toolbox on the left, inside the website navigation menu, and a bar graph on the right as controlled by the selections made in the toolbox. In this example the user has simply pressed go, which provides a simple count of the objects (candidates entered into the REF) belonging to the current data-set.

 

 

Screen 2. In this case the user has moved to the Grouping tab of the toolbox, and selected to group the candidates entered into the REF by UOA (Unit Of Assessment)

 


 

Screen 3. In this case the user has additionally moved to the Filter tab of the toolbox and selected to show only candidates that are classified as ‘green’ or ‘amber’.

 


Screen 4. Moving on, the user has now selected the View tab, selected to view measures of the objects (i.e. some sort of performance measure of the REF candidates), and chosen to see the sum of the number of publications the candidates are in a position to enter into the REF. Remember the candidates are still filtered by status as either ‘green’ or ‘amber’ and grouped by UOA.

 


Screen 5. More meaningfully now, the user has additionally chosen to show the target number of publications our ‘green’ and ‘amber’ candidates should be entering into the REF. This is shown as the point or scatter data items, alongside the columns showing the proposed publications currently available.

 


Screen 6. Finally the user has unselected the ‘Proposed Pubs’ measure and selected both Three and Four star (Publications) and elected to show them as a stacked bar chart.

 

The above storyboard example of a user viewing academics entered into the REF illustrates how easy it is for a user to manipulate the VoRS client front end. Note: the above example uses mock data.

 

One of our key goals was to make the graphics easy to use as we recognised that senior management do not have the time to learn to manipulate complex software.

 

A second key goal was to make it easy for IT professionals to implement new instances of the VoRS client displaying new data. To illustrate how simple this is, only two items in the XML files had to be changed to include the target number of publications shown in the above examples, which were developed from an earlier example without the target data.

 

First the specification file has to have an entry defining the data field, Target Pubs:

 


 

Second every entry of a candidate’s data in the data-set file has to have an entry for the target as below:

 

Here we show the entry for one person with the target field included.
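The two-step change described here (declare the field in the specification, then supply it for every candidate) can be illustrated with a small consistency check. The XML element and attribute names below, and the function name missing_fields, are invented for illustration and are not the VORS specification syntax; the sketch is in Python rather than the project's .NET code, and the records are mock data.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification and data documents; NOT the actual VORS syntax.
SPEC = """<specification object="Candidate">
  <property name="UOA" type="label"/>
  <property name="ProposedPubs" type="integer"/>
  <property name="TargetPubs" type="integer"/>
</specification>"""

DATA = """<dataset>
  <Candidate id="1">
    <UOA>Computing</UOA>
    <ProposedPubs>3</ProposedPubs>
    <TargetPubs>4</TargetPubs>
  </Candidate>
</dataset>"""


def missing_fields(spec_xml, data_xml):
    """Return (object id, property name) pairs declared in the spec but absent from the data."""
    spec = ET.fromstring(spec_xml)
    declared = [p.get("name") for p in spec.findall("property")]
    data = ET.fromstring(data_xml)
    gaps = []
    for obj in data.findall(spec.get("object")):
        for name in declared:
            if obj.find(name) is None:
                gaps.append((obj.get("id"), name))
    return gaps
```

A check of this kind returns an empty list when every candidate carries every declared field, and lists the gaps when a newly declared field has not yet been added to the data-set file.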

 

3D Visualisation

 

The second, more ambitious, strand of the VoRS project involved leveraging our expertise in producing 3D visualisations for computer games. The initial idea we came into the project with was that we could apply the technologies for rendering landscapes to the presentation of large data-sets. We appeared to have access to large data sets derived from the metadata stored about academic publications in our own and other universities’ repository systems. We mentioned above that we had problems with access to external data, but that did not present any fundamental difficulties for this development strand, as we still had access to our own repository data.

 

The problems we had arose primarily from a lack of fit between the nature of the data we had and the presentation medium we had chosen.

 


 

In this early example, the basic problem we struggled with throughout the project is already apparent. The graphic represents the volume of publications of different departments of the university for a specific period of time. Users can interact with the graphic by rotating it to better view a specific portion. They can also adjust the weights given to different types of publication (‘articles’, ‘conference proceedings’, ‘other’), which results in the sizes of the peaks growing or shrinking. This early graphic is not properly labelled to assist in interpretation of the landscape, but that is not the real problem with this visualisation.

 

The problem we faced was that the dimensions other than height did not really convey any additional meaning. The fact that point B is closer to point A than it is to point C does not mean anything: Mathematics does not have a closer relation to computing than it has to engineering. The ordering of the positions of the different departments across the map is purely arbitrary, so the proximity of one peak to another does not carry any information.

 

What we needed was some notion of progression to give meaning to the relative positions within a map, so that moving to the right or left, or up or down, conveys some information. One such possible progression was to introduce the notion of time, such that moving in one direction represents moving earlier or later in time. In this way we have the possibility of demonstrating progressions through time. But to make a map-style visualisation work we need more than one meaningful dimension. If we assign time to the left-right dimension, we then need to provide another dimension for top to bottom. With our data we struggled to find other worthwhile dimensions. We therefore turned to other styles of visualisation that did not require such a high degree of connectivity between the data points.

 

 

In the above graphic we have represented the different departments as concentric rings, with no specific meaning to the ordering. Mouse ‘hover tooltips’ can disambiguate which ring represents which department. Progress round the ring within a colour segment represents time, and the different colour segments represent the percentage contributions of the different elements being counted, for example, within the data-set of publications: articles, conference proceedings, other. Again, hover tooltips can disambiguate the values at a specific point in the ring. The 3D interactive visualisation allowed you to rotate the disc to get a better visual impression of how things were changing within the chosen category from year to year. This visualisation clearly works better than the map visualisation, but we still have some problems with it. The interactive nature of the graphic (the ability to rotate it to view different portions, and to rotate on the top-to-bottom plane to better view the relative heights of segments of the rings) is very appealing, but it makes it technically difficult to adequately label the rings and their segments, either in a static manner or by using hover tooltips. Without adequate labelling the graphic is in danger of becoming unintelligible.

 

We tried producing a number of variations of the above graphics, one of which is shown below. In this visualisation each disc represents a different department/school, the rings within each disc represent different categories of publication, and the overlaid spider’s web represents the degree of connectivity between the different departments/schools. The degree of connectivity was measured by counting publications jointly authored by academics drawn from the different departments/schools.
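The connectivity measure lends itself to a short sketch. The Python below is an illustration only: the function name department_connectivity, the record layout and the sample names are assumptions, and the data is mock data. It counts, for each pair of departments, the publications that have at least one author from each.

```python
from collections import Counter
from itertools import combinations


def department_connectivity(publications, department_of):
    """Count, for each pair of departments, the publications with authors from both."""
    links = Counter()
    for pub in publications:
        # Distinct departments represented among this publication's authors.
        depts = sorted({department_of[a] for a in pub["authors"] if a in department_of})
        for pair in combinations(depts, 2):
            links[pair] += 1
    return links


# Mock data, invented for this sketch.
department_of = {"Ann": "Computing", "Bob": "Engineering", "Cat": "Music", "Dan": "Computing"}
publications = [
    {"authors": ["Ann", "Bob"]},         # Computing + Engineering
    {"authors": ["Ann", "Dan"]},         # both Computing: no cross-department link
    {"authors": ["Ann", "Bob", "Cat"]},  # links all three departments pairwise
]

links = department_connectivity(publications, department_of)
```

Each key in `links` is an alphabetically ordered pair of departments, so a given pair is counted under one key regardless of author order.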

 


 

Key lessons learned

 

The key lesson learnt in terms of our approach was how valuable it was to have a fall-back strategy, to ensure that the project delivered something worthwhile to the university even when our more ambitious goals turned out to be either unachievable or even partially misconceived.

 

We have also learnt that although the 3D modelling of data can be achieved in some ways, it is a step too far in the current climate. We need to be able to embed strand one software at the University and ensure buy-in from all stakeholders before pursuing further, more effective visualisations of data.

 

Looking ahead

 

We intend to use the strand one software developed for the VoRS project within the University's RIMS system. We believe that it can be deployed using the IT skills readily available among current staff. Developing the software further would require a commitment to staffing, and no decisions have been made about either the need for or the viability of doing that.

 

We already have the buy-in of senior management at the University, including the Vice-Chancellor's office, Deans of Schools and the Director of Research and Enterprise. This is to ensure that further development and implementation of VoRS will continue beyond the scope of this project, and that VoRS will be embedded within the working practices of the University and become an integral tool for Research Information Management.

 

Summary and reflection

 

At the start, the aim of the project was to deliver a demonstrator BI tool which embodies elements of the JISC BI maturity model up to level 6 and which satisfies the information and decision support needs of senior research managers, in that it can convey an accurate visual understanding of the coverage and depth of an HEI’s research when displayed on a background of the full breadth and scope of all academic research areas.

 

We have implemented a tool that can now be embedded within the RIMS system at the University and will become an integral tool for senior management decision making within the REF process and beyond.

 

The life of this project will continue on for years to come as more and more developments are made and implemented for key stakeholders so that we will be at a BI maturity level of 6 for a sustained period of time and will be able to measure the benefits from this initial project.