The Open University - Business Intelligence

RETAIN: Retaining Students Through Intelligent Interventions

 

Case study written October 2012.

 

Background

 

The business case

 

The main predicted benefit of the RETAIN project is improved student retention at the Open University (OU). This is a key area where many universities can make financial savings. Approximately 30% of students fail to complete a module they are registered for. Some of these students transfer to other courses, whilst others have reasons for leaving that cannot be addressed through better support. However, approximately 30% of these non-completing students could potentially be retained through better support and intervention. Based on the average cost to the OU of a non-completing student, the project cost could be recouped by retaining 50 students a year for three years.

 

Other key aims are improved efficiency for the Associate Lecturers (ALs) who are responsible for groups of students, through more strategic use of their time and resources, and greater student satisfaction, achieved by providing better student support and by using tools to improve module design and to understand where difficulties in a module generally occur. This in turn should lead to reputational gain for the OU, which has previously been known for high attrition rates.

 

Aims & objectives

 

The RETAIN project aims to support the OU in its strategic aim to improve student retention by providing tools that help predict which students are having difficulties with their study.

 

The project aims to utilise as many of the available sources of student data as possible to predict which students are at risk of failing or not completing a module. The main data sources are the Virtual Learning Environment (VLE) data, which records every click each student makes on course materials, forums and quizzes, and the static data, which includes students’ achievements on continuous assessments and their overall results.

 

The results will be presented through a dashboard, through which ALs will be able to view data for their individual sets of students. This will show who is doing well and who is doing badly, according to the predictors of success or potential failure. ALs will be able to choose how and when to intervene with students who are identified as potentially at risk, and to mark these interventions and what form they took (e.g. email, telephone call, online chat). In the future, this will make it possible to try to identify the extent to which interventions are effective.
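As an illustration of the kind of record the dashboard would need to keep for this, the sketch below models a single logged intervention; the type and field names are hypothetical and are not taken from the project.

import java.time.LocalDateTime;

// Hypothetical sketch of an intervention record logged by an AL through the dashboard.
public record Intervention(
        String studentId,          // anonymised student identifier
        String moduleCode,         // e.g. "MU123"
        LocalDateTime contactedAt, // when the intervention was made
        Channel channel,           // how the student was contacted
        String note                // brief summary of the contact and its outcome
) {
    public enum Channel { EMAIL, TELEPHONE_CALL, ONLINE_CHAT }
}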

 

A secondary aim is to demonstrate how aggregated data from the same sources can be presented through the same dashboard for module managers, to help them plan for future module deliveries. This will include highlighting points in the course where failing students disengage. This could indicate Tutor Marked Assessments (TMAs) that are too difficult, or course materials that are not easily accessed or that do not support a task.

 

Context

 

The OU, like many higher education institutions, is interested in using Business Intelligence both to make processes more efficient and to improve the student experience. One key issue for the OU is that, as a provider of distance education, contact between students and their lecturers is minimal. This can make it difficult for a lecturer to identify developing issues with a student and take steps to intervene. On the other hand, OU modules increasingly deliver course materials through the VLE and include interactive activities such as forums and quizzes. This provides a good opportunity to learn about a student’s behaviour by analysing their interaction with the VLE. However, the VLE dataset is vast and it is not integrated with the other sources of student data. This has been a significant challenge for the project.

 

Key drivers

 

The key drivers for the RETAIN project are the need to improve student retention, to save money by allowing tutors to target their interventions more effectively to the students that need them, and to provide a better service to paying students.

 

Project approach

 

The main stages of the project are as follows:

 

1. Gather requirements from the target user group, to ensure that their needs are realised through the trial demonstrator.

2. Develop predictive methods from three historical trial data sets, covering both VLE and static data sources.

3. Validate the methods using further data sets.

4. Develop a dashboard for viewing the data, integrating in the first instance with the trial data sources.

5. Trial the dashboard with the target users.

6. Develop a strategy for the real-time use of data from both VLE and static data sources.

 

Stages 1 and 6, in particular, are ongoing throughout the project.

 

Scope

 

The project aims to use historic student data sets to develop and test methods for predicting which students are at risk of failing a module. The findings should allow predictions to be made on ‘live’ unknown data, based on patterns in users’ online activity on the VLE and on other data such as their performance on assessments. The findings from the historical data will, in themselves, be valuable to module design teams, especially for the modules involved. However, it is not until the system can access live data that the real potential can be tapped. While it is not within the scope of the project to deliver the full live version, the aim is to make sufficient improvements and recommendations towards making this possible in the near future.

 

Technologies & standards used

 

In the initial stages of the project, rapid prototyping was conducted using LISP, interfaced with SQL, to build decision trees that predict from the available data. These tools were exported to Java. Later development was conducted using WEKA (Waikato Environment for Knowledge Analysis). The web interface was developed using Java and JavaScript, also interfaced to SQL for the data. The historical data extracts were provided as .csv files.
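To make the toolchain concrete, here is a minimal sketch of how a decision-tree model might be trained and evaluated over one of the historical .csv extracts using the WEKA Java API. The file name and the assumption that the final column holds the class label are illustrative; they are not details taken from the project.

import java.io.File;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class RetainPrototype {
    public static void main(String[] args) throws Exception {
        // Load one of the historical .csv extracts (file name is hypothetical).
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("mu123_extract.csv"));
        Instances data = loader.getDataSet();

        // Assume the last column holds the class label (e.g. "drop" / "no-drop").
        data.setClassIndex(data.numAttributes() - 1);

        // Train a decision tree (WEKA's J48) and evaluate it with
        // 10-fold cross-validation, as used to evaluate the project's models.
        J48 tree = new J48();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));

        System.out.println(eval.toSummaryString());
        // Per-class precision, recall and F1, as reported in the Appendix tables.
        System.out.println(eval.toClassDetailsString());
    }
}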

 

Establishing & maintaining senior management buy-in  

 

The project has had the support of senior management right from the start. It has been easy to approach senior management and get support and input into the proposed ideas at all stages of the project. Business Intelligence, retention and learning analytics form part of key strategies of the Open University. During the course of the project there have been efforts to consolidate the work happening around the OU and to bring everyone together as one group, so as to coordinate effort and take a more focused approach to resolving some of the key issues related to getting BI working in the OU. Many of these issues relate to data integration and data consistency.

 

Of particular note is a data warehouse initiative which commenced during the RETAIN project. Its aim is to provide a single point of access to different data sources, which will benefit not only the deployment of the RETAIN results but also future development and integration with other BI functions and initiatives at the OU. One drawback has been that, since the project started, it has become more difficult to obtain the necessary data for development and testing, as the preference is to allocate the limited resources towards completing the data warehouse, which will then be the usual route for obtaining the data. However, the data warehouse timescale does not match the project timeframe.

 

Outcomes

 

Assessment of the JISC infoNet BI maturity levels, with reference to the 6 BI implementation issues where relevant

 

To understand the factors affecting the OU’s data maturity, we will first examine some relevant BI implementation issues.

 

Data Definition

 

The biggest challenge has been, and still is, data consistency. During the course of the project it has become apparent that there is no centralised, formal documentation of the different sources of data. The knowledge about the data is dispersed across different people in different departments. Apparent discrepancies between data sources may have (and usually turn out to have) logical explanations. For example, a student might be registered on a course, but if they do not click in the VLE then they do not appear in that data set. However, in order to have confidence in any methods developed on the data, it is essential to understand exactly where data originates and the circumstances under which it is stored.

 

It is necessary to start developing a uniform representation of the meaning of the data, which might eventually form a data usage policy. For each source of data, this must describe the following (a minimal sketch of one such entry is given after the list):

 

  • Who is responsible for it.
  • What data is stored in that database: e.g. registered students, or students who have clicked at least once on the VLE.
  • How data gets into the database: e.g. automatically, via a student’s clicks on the VLE, or entered by a telephone operator who chooses a code category.
  • The precise meaning of all stored data: the possible categories, the possible range of values, and what each means.
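A minimal sketch of how one such entry might be captured, simply mirroring the four points above (the type and field names are illustrative, not an existing OU schema):

// Illustrative data dictionary entry mirroring the points above; not an existing OU schema.
public record DataSourceDefinition(
        String sourceName,      // e.g. "VLE clickstream"
        String owner,           // who is responsible for the source
        String contents,        // what is stored, e.g. "students with at least one VLE click"
        String entryMechanism,  // how data gets in, e.g. "logged automatically on each click"
        String valueSemantics   // categories, ranges of values and what each means
) {}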

 

Data usage

 

Data usage depends not only on a good data definition but also on good access to data. Like many institutions, the OU maintains many disparate data sources, with responsibility for maintaining and protecting them divided among several departments. Current tools that use student data to present statistics and assist module managers in making strategic decisions rely on uploads of the relevant data. Currently, this does not regularly include VLE data, which obviously affects the ability to fully integrate the results of the RETAIN project into the standard reporting processes. It also impacts any future development of learning analytics tools that might make use of this, or other non-integrated, data sources.

 

The OU is now seeing a move towards a more centralised data warehouse and improved data definition, with the explicit aim of improving the infrastructure for learning analytics and business intelligence. The aim is to integrate this with a single interface for accessing data and running reports, tailored to user needs. Part of this move is being informed by the RETAIN project outputs. This will significantly advance the OU’s maturity level.

 

Process improvement

 

Data definition ensures that there is a unified understanding of the data and what it means. Process improvement can assist the OU’s maturity by ensuring firstly that data is entered and stored correctly in accordance with the data definition and secondly that there is a clear process for utilising the outputs of projects like RETAIN and allowing future development to take place. To this end, a clear process has been defined whereby the RETAIN project has supplied the data requirements to the data warehousing team, based on the project results. This will allow the data warehousing team to limit the amount of VLE data that needs to be available in the data warehouse. This data will be available for the predictive modelling tools to analyse. The outputs, e.g. predicted ‘at risk’ students, will be visualised through the common interface that is being developed. The data warehousing team will also ensure that larger data sets are provided for future development and refinement of the predictive models. This analysis might suggest additional or reduced data to be made available in the warehouse in a future iteration of upgrading the predictive modelling functions.

 

Change Management

 

Handling the transition from one way of working to another is key to ensuring proper uptake of the developed tools. This includes not only the processes required to ensure that the correct data is available when needed, but also ensuring that end users actually make use of the developed tools. In this case, it is ensuring that ALs and module managers can benefit from the RETAIN project outputs.

 

At the start of the project we received strong verbal support from senior management for the project aims. We also brought on board the head of the Student Support Review, as a key person for ensuring uptake of the developed technologies by the target users. The Student Support Review, which is responsible for trialling ways to better support students, has been extremely helpful in informing the direction the project should take to best assist users and in identifying appropriate users for requirements gathering, and will help with evaluating the developed prototype.

 

If the RETAIN outputs are successful through this route, this will help the Business Intelligence system to evolve and grow, by engaging users to provide more input about their needs and to suggest new functions that could be provided in the future. The RETAIN project has already seen plenty of evidence of this happening while demonstrating the developing technologies around the OU.

 

Maturity Model

 

During the bidding process for the project, a benchmarking exercise was carried out to assess the OU’s level of maturity against the JISC maturity model. This involved speaking with many of the people responsible for the data sources that would be used in the project, as well as the team who worked with the data day to day and understood the possible issues with it very well. Based on this assessment, we placed the OU’s maturity at level 2.

 

However, taking into consideration the issues outlined above under data definition and data usage, it is apparent, when looking at the data sources as a whole, that the situation is somewhat less clear-cut than initially believed. The issue is one of integrating the sources. While there is a good level of maturity for each individual system, there is little understanding of how the data sources combine. Essentially, since documentation of each individual system is lacking (the knowledge tends to be held locally by people in the departments), it becomes difficult to know how to integrate the sources. This impacts the ability to evolve the system according to user requirements.

 

The project as a whole has accomplished a great deal in raising awareness of the critical issues for achieving greater maturity in key areas. There is good reason to believe that by the end of the project there will be some integration of VLE data and static data for the purpose of reporting, along with plans for wider-scale deployment. This is helped by the newly formed OU learning analytics team, with whom we have been working closely, as part of a wider Management Intelligence strategy at the OU to improve data for better reporting, learning analytics and BI. The learning analytics team will look specifically at warehousing VLE data alongside other data sources, including issues of data modelling and developing a data dictionary. A VLE-tracking tool developed by the data warehouse team is due to go live; this will allow the collection of more module-specific data that can be used to improve the predictive models. Combined with a uniform, centralised representation of data, this should present a clearer and therefore more usable picture of the data. This will bring the OU to level 3 of the maturity model. However, the trial application will target level 6, with concrete plans for how to achieve this through further development.

 

Details, with screenshots, of data visualisations/dashboards linked to common staff roles (eg 'dashboards for PVCs or Heads of Departments'), operational areas (eg 'planning') or specific business requirement (eg. HESA returns or the REF)

 

The achievements of the project are in three areas. Firstly, the development of accurate predictive models for identifying students who are at risk of failing. Secondly, the visualisation of this data through a dashboard. Thirdly, key recommendations for improving infrastructure to advance the OU’s maturity for Business Intelligence and for designing modules to take best advantage of the opportunities to apply predictive models.

 

Executive Summary of predictive modelling results

 

Experiments with developing predictive models in the context of the RETAIN project pursued two main goals:

 

  • testing the suitability of VLE data for predicting students’ future performance, in particular their tendency to drop out of the course;
  • discovering patterns in the data related to a drop in a student’s performance, which tutors can use to contact such students proactively and provide assistance if necessary.

 

The experiments are described in detail in the Appendix. The following presents a summary of the findings.

 

The experiments have confirmed that VLE data can serve as a valuable indicator of expected student performance. Moreover, specific patterns in the VLE data are useful for predicting the moment when a student either decides to leave the course or loses the motivation to continue: in particular, a student who started the course actively engaging with the VLE and then stopped interacting with it is likely to be “at risk”. Detecting such moments early can potentially help tutors to contact the students and provide the assistance that can ultimately improve student retention. Interestingly, it is also easier to recognise the students who decide to drop out after the first TMA than those who make this decision later in the course. A possible reason for this is that in later stages students are more likely to drop out for unexpected personal reasons: nothing can be found in their data to indicate these types of problem starting to occur, and in fact it is less likely these students would be retained even with intervention. This finding suggests a strategy of allocating more tutor resources to student support at the beginning of the course.

 

These experiments can be seen as a starting point to developing predictive models for student performance using VLE data. For the future work, two directions are particularly promising to explore:

 

  • Including available information about a specific student: e.g., performance in the past courses, registration data, etc.
  • Including course-specific information: e.g., interdependencies between TMA assignments, specific VLE content that the tutors expect to be accessed in order to prepare for an assignment, etc.

 

Dashboard

 

The dashboard is designed to be extensible, allowing the integration of future work on improving or extending the predictive modelling tools. The aim was therefore to provide an interface that could not only visually flag students who are at risk, but also make it easy to understand what factors have been taken into consideration in reaching that conclusion. This allows ALs to bring some of their own knowledge to bear when deciding what action, if any, to take for students. Through the dashboard, ALs can view their students in a table and choose how to sort them, for example from high risk to low risk (fig 1). The risk is indicated by a set of coloured dots, each representing a different type of risk. They follow a traffic-light convention, where green is no risk, amber is medium and red is high.

 

Figure 1 – Viewing a list of students in the dashboard; the different risk factors are shown in the right-hand column.
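The traffic-light convention could be expressed as a simple mapping from a model’s risk estimate to a display colour. The sketch below is illustrative only; the 0.3 and 0.6 thresholds are invented for the example rather than values used by RETAIN.

// Illustrative mapping from a predicted risk score in [0, 1] to a traffic-light colour.
public enum RiskLight {
    GREEN, AMBER, RED;

    public static RiskLight fromScore(double risk) {
        if (risk < 0.3) return GREEN;  // no or low risk (threshold invented for the example)
        if (risk < 0.6) return AMBER;  // medium risk
        return RED;                    // high risk
    }
}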

 

The AL can explore the data visually through a pie chart showing the percentage of students in each risk category (fig 2). By clicking on a section of the pie chart, the AL is shown a bar graph of the different types of risk factor in that risk category and the number of students who fall into that category.

 

 

Figure 2 – An overview of a student group, indicating how many are deemed at medium or high risk

 

Individual student statistics are also available (fig 3). These show the TMA performance of the student and their clicking behaviour, in both cases comparing them to the average student.

 

Figure 3 – Individual student data

 

The dashboard also includes an admin page, where it is possible to select module data from which to train a predictive model, as well as the type of model to train it on (fig 4). Once the model has been trained for a module it can be applied, which means that the output of the model appears in the dashboard for the students in that module.

 

Figure 4 – Building new predictive models from data

 

Recommendations

 

The project has identified some general improvements that need to be made to facilitate future development and deployment of BI at the OU. Based on this, two key recommendations have been drawn up.

 

Recommendation 1

The first recommendation is to improve the consistency of data definitions, for the reasons discussed in the section on data definition. This recommendation has been discussed with the learning analytics team who are responsible for the data warehouse and who, as part of their remit, will be applying data modelling and developing a data dictionary.

 

Recommendation 2

The second recommendation is to understand the parameters of module design and to formally capture the differences between modules, in terms of how the VLE is utilised, how the module is marked, and the TMA profile. This can be used to improve the efficacy of the predictive modelling output by including domain knowledge from the modules, such as knowing if and when there are critical VLE activities to be completed by students, so that this non-engagement criterion can be used to supplement other risk factors. A VLE-tracking tool being developed in another project at the OU will soon begin to gather some of this data.

 

Key lessons learned

 

We understood that data harmonisation would be a key focus of the project, as it impacts heavily on any BI project. However, we had underestimated the extent of inconsistency in the data sources and the effort that would be required to understand the complex issues and to find solutions.

 

There have also been unanticipated problems with access to data, despite a general willingness and even an outright commitment from the OU to provide data for the project. This is due to limitations on people’s time and resources. Since the upcoming data warehouse will ultimately provide the necessary data, it has not been considered efficient to find interim measures solely for the purpose of testing the RETAIN outputs within the lifetime of the project. This has restricted the quantity of data available for testing the outputs. It could have been avoided by obtaining a larger data set right at the start of the project, rather than assuming the data would be as easily available later on.

 

We have also learned that, as a project like this progresses, it is very beneficial to keep disseminating results to relevant departments within the institution and, in return, to find out what everyone else is doing. As the project has progressed, the OU has become much more active in dealing with many of the issues raised by the RETAIN project, as awareness of the benefits of improving data for learning analytics and Business Intelligence has grown across the institution. By making the RETAIN project activities known, it has been possible to integrate with the work that is already being done. This is extremely beneficial for improving the prospects of future uptake of the project outputs.

 

In developing the predictive models, a certain amount of trial and error is required to determine which parameters are most informative for building the model. However, the data sets are very large and contain a lot of superfluous data. In order to reduce the guesswork, a key finding has been that it is beneficial to gather input from many diverse sources. This has entailed not just speaking to the people who regularly work with the data and produce reports based on historical data, but also drawing on the knowledge and intuition of module managers and lecturers.

 

Looking ahead

 

Working with the data-warehousing team, a clear process is being proposed. The data warehouse will in the first instance integrate all the data sources needed to implement the current predictive tools. The VLE data will be updated on a daily basis, which has been ascertained to be a reasonable interval for providing up-to-date risk assessments on students.

 

The project outputs were trialled using a purpose built dashboard. However, during the course of the project another OU initiative has begun which is looking to provide a single access point for data with a consistent look and feel. Therefore, a new direction has been taken which will involve integrating the project outputs into this dashboard in order to ensure uptake.

 

This integrated approach will represent a significant step for the OU in terms of moving to a new BI maturity level. RETAIN has had an impact in raising awareness in the OU both of the benefits of improving BI maturity and of the issues that need to be addressed in order to do so. The RETAIN project was a key component of the submitted plan for funding the data warehouse. This work has fed directly into the plans for the new dashboard approach.

 

Future work on improving the predictive tools will feed requirements to the data-warehousing team who can in a future iteration change the data that is available accordingly. This is consistent with level 5 of the JISC maturity model.

 

Future development will require some level of funding. Opportunities for this will be sought both internally and externally.

 

Summary and reflection

 

The RETAIN project aimed to develop tools for predicting students at risk of failing their OU module, combining data from the VLE and static data sources. The developed methods would be validated on a subset of OU data, with plans for mainstreaming the technology being investigated but not implemented within the project timescale. A trial dashboard would show how the developed predictive models could be applied to ‘live’ student data to provide up-to-date reports on students, identifying those who may benefit from an intervention. This dashboard would provide information at different levels of detail for different types of staff.

 

These aims have clearly been met. Furthermore, the evaluation of the predictive models has shown them to give good accuracy. The best prediction has been found when comparing a student’s behaviour in the VLE against their previous activity patterns. A student who starts out clicking generally continues to click; as their clicking activity declines, this is a good indication that they are becoming disengaged or have encountered a problem. On the data studied, this finding was applicable across different modules and so can be used as a good baseline starting point for investigating student behaviour with regard to retention. Still, the developed methods provide good scope for refinement, for example the inclusion of module-specific knowledge. Some of this initial work has yielded interesting results and hopefully there will be support to continue it in the future.

 

The OU is a large institution and, like other institutions, must ensure that all resources are used efficiently and cost-effectively. This is the key driver for projects like RETAIN, which endeavour to produce cost savings by using the time and resources of Associate Lecturers more efficiently and, ultimately, to improve student retention. However, in developing tools like RETAIN and trying to get them formally integrated with existing OU systems, there are many considerations, such as the time and resources required from staff to get them up and running: senior management, module teams, IT staff, student support teams, technical development staff and design teams. Despite all of these considerations, the most notable aspect of the RETAIN project has been the overall willingness and enthusiasm for the project and the commitment to find ways both to trial and to further integrate the results.

 

Click here for a summary video of the project.

 

APPENDIX

 

Predictive modelling Tools

 

Setup

In our initial studies we developed models taking into account two kinds of features: students’ performance scores for different TMA assignments and their engagement with the VLE, estimated by the number of pages visited on the website. The models we tested focused on predicting two kinds of student results:

  • Performance drop. This involves predicting whether a student who has performed well so far will see their performance fall at the next assignment.
  • Final outcome. This involves predicting the final outcome of the course for a student, given only the information about assignment scores and VLE clicks up to some point in the course timeline (the submission of a specific TMA).

 

For each of these types of prediction, two kinds of models were tested:

  • Classification, aimed at predicting one of two types of result: “pass” or “fail”. For the performance drop model, the class of interest is students who used to have scores above the pass level for earlier assignments but are likely to fail the next one, either by receiving a low score or by not submitting.
  • Regression, in which the model must try to predict the actual score for the next assignment or for the overall course.

 

The performance drop prediction model uses as its features the student’s assignment scores and VLE clicks within a time window. The features include the scores for TMAs t−k, …, t−1 and the number of VLE clicks in each period between TMAs t−k, …, t. The value to predict is either the nominal class label (“drop”/“no-drop”) or the actual score for the next TMA.

 

The final outcome prediction model uses the scores for TMAs 1, …, t−1, the average TMA score, and the VLE clicks in the periods between the submission dates of each two subsequent TMAs 1, …, t (t = 2, 3, 4 were used).
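To make the two feature layouts concrete, the sketch below builds both kinds of feature vector from per-student TMA scores and per-period VLE click counts. The class, method names and array layout are illustrative assumptions, not the project’s actual code.

import java.util.ArrayList;
import java.util.List;

// Illustrative construction of the two feature vectors described above.
// tmaScores[i-1] holds the score for TMA i; periodClicks[i-1] holds the number of
// VLE clicks between the submission dates of TMA i and TMA i+1 (1-indexed, as in the text).
public final class FeatureBuilder {

    // Performance drop model: scores for TMAs t-k .. t-1 plus clicks for the periods
    // between TMAs t-k .. t. The label (drop / no-drop, or the actual score for TMA t)
    // is attached separately.
    public static List<Double> performanceDropFeatures(double[] tmaScores, double[] periodClicks,
                                                       int t, int k) {
        List<Double> features = new ArrayList<>();
        for (int i = t - k; i <= t - 1; i++) features.add(tmaScores[i - 1]);
        for (int i = t - k; i <= t - 1; i++) features.add(periodClicks[i - 1]);
        return features;
    }

    // Final outcome model: scores for TMAs 1 .. t-1, their average, and clicks for
    // the periods between each two subsequent TMAs 1 .. t.
    public static List<Double> finalOutcomeFeatures(double[] tmaScores, double[] periodClicks, int t) {
        List<Double> features = new ArrayList<>();
        double sum = 0.0;
        for (int i = 1; i <= t - 1; i++) {
            features.add(tmaScores[i - 1]);
            sum += tmaScores[i - 1];
        }
        features.add(sum / (t - 1));  // average TMA score over TMAs 1 .. t-1
        for (int i = 1; i <= t - 1; i++) features.add(periodClicks[i - 1]);
        return features;
    }
}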

 

Experiments

 

Training and testing were performed with historical data from three courses from different domains: maths (MU123), business (B201) and arts (AA100). Models trained on this data were evaluated using 10-fold cross-validation. For both types of model, decision trees were found to outperform other kinds of learning algorithm.

 

Table 1: Predictive model results for the performance drop classification task for different values of the window size k.

| Course | Features | k = 1 p | k = 1 r | k = 1 F1 | k = 2 p | k = 2 r | k = 2 F1 | k = 3 p | k = 3 r | k = 3 F1 |
|--------|----------|---------|---------|----------|---------|---------|----------|---------|---------|----------|
| MU123  | tma+vle  | 0.90    | 0.86    | 0.88     | 0.91    | 0.87    | 0.89     | 0.90    | 0.88    | 0.89     |
| MU123  | vle      | 0.82    | 0.86    | 0.84     | 0.81    | 0.88    | 0.84     | 0.81    | 0.88    | 0.84     |
| MU123  | tma      | 0       | 0       | 0        | 0       | 0       | 0        | 0       | 0       | 0        |
| AA100  | tma+vle  | 0.77    | 0.43    | 0.55     | 0.75    | 0.50    | 0.60     | 0.77    | 0.50    | 0.61     |
| AA100  | vle      | 0.70    | 0.33    | 0.45     | 0.71    | 0.38    | 0.50     | 0.70    | 0.38    | 0.50     |
| AA100  | tma      | 0       | 0       | 0        | 0       | 0       | 0        | 0       | 0       | 0        |
| B201   | tma+vle  | 1.00    | 0.90    | 0.94     | 0.99    | 0.90    | 0.94     | 0.99    | 0.90    | 0.94     |
| B201   | vle      | 0.80    | 0.88    | 0.84     | 0.80    | 0.88    | 0.83     | 0.80    | 0.86    | 0.83     |
| B201   | tma      | 0       | 0       | 0        | 0.63    | 0.08    | 0        | 0.63    | 0.08    | 0.15     |

 

Table 1 shows the results obtained by the performance drop classification model for the three courses, for different values of the window size k. As our goal is primarily to recognise the “at risk” students, the results are measured using the standard precision, recall and F1 measures for the class “drop” [1]. With a window size of 3, the performance drop classifier was able to achieve high precision for all three courses (between 0.77 and 0.99) and good overall accuracy (F-measure between 0.61 and 0.94). Interestingly, the number of VLE clicks before the TMA being predicted was found to be the most informative attribute: a student who used to work with the VLE but then stopped is likely to fail the next TMA. Thus, even a time window of size k = 1 was sufficient to build the model, and increasing it led only to a marginal increase in performance. Information about actual TMA assignment scores could only be used as complementary to the VLE data: it could improve the F-measure, but was insufficient for building a reliable model by itself.
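For reference, with “drop” treated as the positive class, the three measures are defined in the standard way from the counts of true positives (TP), false positives (FP) and false negatives (FN):

\[
p = \frac{TP}{TP + FP}, \qquad
r = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2\,p\,r}{p + r}
\]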

 

Table 2: Predictive model results for the overall outcome classification task for different values of the threshold time point t.

| Course | Features | t = 2 p | t = 2 r | t = 2 F1 | t = 3 p | t = 3 r | t = 3 F1 | t = 4 p | t = 4 r | t = 4 F1 |
|--------|----------|---------|---------|----------|---------|---------|----------|---------|---------|----------|
| MU123  | tma+vle  | 0.90    | 0.44    | 0.59     | 0.69    | 0.52    | 0.59     | 0.38    | 0.16    | 0.22     |
| MU123  | vle      | 0.90    | 0.44    | 0.59     | 0.66    | 0.54    | 0.59     | 0.00    | 0.00    | 0.00     |
| MU123  | tma      | 0.59    | 0.09    | 0.15     | 0.44    | 0.18    | 0.25     | 1.00    | 0.08    | 0.14     |
| AA100  | tma+vle  | 0.73    | 0.21    | 0.32     | 0.65    | 0.37    | 0.47     | 0.74    | 0.33    | 0.46     |
| AA100  | vle      | 0.87    | 0.10    | 0.18     | 0.78    | 0.29    | 0.43     | 0.69    | 0.17    | 0.27     |
| AA100  | tma      | 0.62    | 0.14    | 0.23     | 0.59    | 0.09    | 0.15     | 0.50    | 0.15    | 0.23     |
| B201   | tma+vle  | 0.79    | 0.54    | 0.64     | 0.63    | 0.49    | 0.55     | 0.65    | 0.49    | 0.56     |
| B201   | vle      | 0.78    | 0.52    | 0.62     | 0.61    | 0.52    | 0.56     | 0.77    | 0.36    | 0.49     |
| B201   | tma      | 0.56    | 0.49    | 0.52     | 0.72    | 0.31    | 0.43     | 0.60    | 0.38    | 0.47     |

 

Similarly, the final outcome classification model (Table 2) usually achieved high precision (above 0.73 for t = 2) which, however, dropped as t increased: it was easier to recognise the students dropping out at the beginning than those who had already managed to progress further towards the end of the course. Again, an interesting result was that VLE clicks were found to be more informative than TMA scores: the F1 measure achieved by models built using only VLE clicks (“vle”) was usually higher than that achieved using only TMA scores (“tma”).

 

Table 3: Results for the regression model predicting the next TMA score, for different values of the window size k, measured using the mean absolute error (MA) and the root mean squared error (RMS).

| Course | k = 1 MA | k = 1 RMS | k = 2 MA | k = 2 RMS | k = 3 MA | k = 3 RMS |
|--------|----------|-----------|----------|-----------|----------|-----------|
| MU123  | 7.70     | 12.28     | 7.37     | 11.92     | 7.40     | 11.96     |
| AA100  | 10.10    | 14.66     | 9.61     | 14.15     | 9.46     | 13.98     |
| B201   | 8.21     | 12.34     | 7.93     | 11.88     | 7.82     | 11.79     |

 

Table 4: Results for the regression model predicting the overall score for different values of the threshold time point t.

| Course | t = 2 MA | t = 2 RMS | t = 3 MA | t = 3 RMS | t = 4 MA | t = 4 RMS |
|--------|----------|-----------|----------|-----------|----------|-----------|
| MU123  | 12.17    | 17.99     | 7.16     | 11.24     | 3.27     | 5.15      |
| AA100  | 15.77    | 21.75     | 11.67    | 17.52     | 7.97     | 12.94     |
| B201   | 13.76    | 19.45     | 10.62    | 15.55     | 8.37     | 12.56     |

 

Tables 3 and 4 show the results achieved by regression models predicting the score for the next TMA and the overall score for the course. Given that both TMA scores and the overall score are measured on a scale from 0 to 100, results with an average error of about 10 can be informative indicators of expected student performance. Because the overall score partially depends on the TMA average, the predictions became more precise over time. Moreover, we found that many erroneous overall score predictions related to cases where a student who performed well at the beginning decided to leave the course at some point: such situations cannot be predicted from the available data if the decision is made after point t.
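For reference, with \(\hat{y}_i\) the predicted and \(y_i\) the actual score for student \(i\) out of \(n\), the two error measures used in Tables 3 and 4 are:

\[
\mathrm{MA} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad
\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
\]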

 

Figures 5 and 6 show the results in graph form.

 

Figure 5 – Predicting performance drop for 3 modules

 

Figure 6 – Predicting the final outcome for 3 modules

 
