Expert Systems with Applications xxx (2013) xxx–xxx
Contents lists available at ScienceDirect
Expert Systems with Applications
journal homepage: www.elsevier.com/locate/eswa
Review
Educational data mining: A survey and a data mining-based analysis of recent works
Alejandro Peña-Ayala ⇑
WOLNM & ESIME Zacatenco, Instituto Politécnico Nacional, U. Profesional Adolfo López Mateos, Edificio Z-4, 2do piso, cubiculo 6, Miguel Othón de Mendizábal S/N, La Escalera, Gustavo A. Madero, D.F., C.P. 07320, Mexico
⇑ Address: WOLNM & ESIME Zacatenco, Instituto Politécnico Nacional, U. Profesional Adolfo López Mateos, Edificio Z-4, 2do piso, cubiculo 6, Miguel Othón de Mendizábal S/N, La Escalera, Gustavo A. Madero, D.F., C.P. 07320, Mexico. Tel.: +52 55 5694 0916/+52 55 5454 2611 (cellular); fax: +52 55 5694 0916.
E-mail address: apenaa@ipn.mx
URL: http://www.wolnm.org/apa
Article info
Keywords:
Data mining
Educational data mining
Data mining profile
Educational data mining approach pattern
Pattern for descriptive and predictive educational data mining approaches
Abstract
This review has a twofold goal. The first is to preserve and enhance the chronicles of recent advances in educational data mining (EDM). The second is to organize, analyze, and discuss the content of the review based on the outcomes produced by a data mining (DM) approach. To this end, 240 EDM works were selected and analyzed, and an EDM work profile was compiled to describe 222 EDM approaches and 18 tools. A profile of the EDM works was organized as a raw database, which was then transformed into an ad-hoc database suitable for mining. Statistical and clustering processes were executed on it, and they produced three results: a set of educational functionalities, a realistic pattern of EDM approaches, and two patterns of value-instances that depict EDM approaches based on descriptive and predictive models, respectively. One key finding is that most EDM approaches are grounded on a basic set made up of three kinds each of educational systems, disciplines, tasks, methods, and algorithms. The review concludes with a snapshot of the surveyed EDM works. It also provides an analysis of EDM strengths, weaknesses, opportunities, and threats, whose factors represent, in a sense, future work to be fulfilled.
© 2013 Published by Elsevier Ltd.
0957-4174/$ - see front matter © 2013 Published by Elsevier Ltd.
http://dx.doi.org/10.1016/j.eswa.2013.08.042
Please cite this article in press as: Peña-Ayala, A. Educational data mining: A survey and a data mining-based analysis of recent works. Expert Systems with Applications (2013), http://dx.doi.org/10.1016/j.eswa.2013.08.042
1. Introduction
Data mining (DM)¹ is a computer-based information system (CBIS) (Vlahos, Ferratt, & Knoepfle, 2004) devoted to scanning huge data repositories, generating information, and discovering knowledge. The meaning of the traditional term "mining" shapes the foundations of DM. Instead of searching for natural minerals, however, the target is knowledge. DM seeks to find data patterns, organize information about hidden relationships, and structure association rules. It also estimates the unknown values of items in order to classify objects, composes clusters of homogeneous objects, and unveils many kinds of findings that a classic CBIS does not easily produce. DM outcomes therefore represent a valuable support for decision-making.
Education is a novel application target of DM for knowledge discovery, decision-making, and recommendation (Vialardi-Sacin, Bravo-Agapito, Shafti, & Ortigosa, 2009). The use of DM in education is still incipient, and it has given birth to the educational data mining (EDM) research field (Anjewierden, Kollöffel, & Hulshof, 2007). As Section 2 shows, the first decade of the present century represents, in a sense, the kick-off of EDM.
EDM emerges as a paradigm oriented to designing models, tasks, methods, and algorithms for exploring data from educational settings. EDM seeks to find patterns and make predictions that characterize learners' behaviors and achievements, domain knowledge content, assessments, educational functionalities, and applications (Luan, 2002). The source information is stored in repositories managed by conventional, open, and distance educational modalities.
Some EDM trends can be anticipated here. One of them is the standard integration of an EDM module into the typical architecture of the wide diversity of computer-based educational systems (CBES). Another trend demands that EDM provide several functionalities during three stages of the teaching-learning cycle. The first stage is the provision of proactive EDM support for adapting the educational setting to the student's profile before a lecture is delivered. During the student-system interaction stage, it is desirable that EDM acquire log data and interpret their meaning in order to suggest recommendations, which the CBES can use to personalize services to users in real time. In the final stage, EDM should evaluate the education provided with respect to the services delivered, the outcomes achieved, the degree of user satisfaction, and the usefulness of the resources employed. In addition, several challenges (e.g., targets, environments, modalities, functionalities, kinds of data, . . .) are still waiting to be tackled or have only recently been considered by EDM, such as big data, cloud computing, social networks, web mining, text mining, virtual 3-D environments, spatial mining, semantic mining, collaborative learning, learning companions, . . .
The present work extends the period described by earlier surveys, which are summarized in Section 2.2 and cover 1995 up to 2009. The aim is to preserve and update the chronicles of recent EDM development. The scope of the work is limited, and it provides only a partial image of the EDM activity published across events and media. Even so, it offers a snapshot of the work that members of the EDM community have been achieving. Moreover, it applies the essential subject, DM, to organize, analyze, and discuss the content of the overview. This policy is a novelty: to preach by example.
Applying this policy yields four contributions for the EDM community: a DM profile, an EDM approach profile, a pattern for EDM approaches based on descriptive models, and a pattern for EDM approaches based on predictive models. The first facilitates the description of the DM baseline that supports an EDM approach. The second is useful to define the nature and baseline of an EDM approach. The third and fourth are patterns for designing EDM approaches, and they serve as a reference for developing similar descriptive and predictive models.
This paper presents a survey of EDM works carried out from 2010 up to the first quarter of 2013. Section 2 outlines the method followed to produce the overview and states the gathered material. Section 3 summarizes a sample of 240 EDM works, organized according to typical CBES functionalities that were identified in the material. Section 4 analyzes the sampled works to shape the recent status and evolution of the EDM field, and it highlights some EDM approach patterns. Finally, the conclusions section provides a snapshot of the sample and a critical analysis of the EDM arena that may help to inspire future work.
1 AIWBES: adaptive and intelligent web-based educational systems; BKT: Bayesian knowledge tracing; CBES: computer-based educational systems; CBIS: computer-based information system; DM: data mining; DP: dynamic programming; EDM: educational data mining; EM: expectation maximization; HMM: hidden Markov model; IBL: instance-based learning; IRT: item response theory; ITS: intelligent tutoring systems; KDD: knowledge discovery in databases; KT: knowledge tracing; LMS: learning management systems; SNA: social network analysis; SWOT: strengths, weaknesses, opportunities, and threats; WBC: web-based courses; WBES: web-based educational systems.
2. Method and materials
This section describes the method and the materials of the overview. The method is a framework devoted to gathering and mining EDM works. The materials define the survey domain through five subjects: a reference to prior EDM reviews, the scope of the collected EDM works, a profile of DM, a summary of CBES, and the data representation of EDM approaches used for mining.
Applying the method produced a sample of 240 EDM works published between 2010 and the first quarter of 2013. The sample is made up of two sub-samples: one of 222 EDM approaches and another of 18 EDM tools (i.e., the first represents EDM applications and the second represents software). The sample is a valuable source that is used to highlight the EDM arena in Section 3 and to provide a brief analysis in Section 4. The sample is also examined to produce statistics and discover some findings, which are illustrated in the following subsections as well as in Sections 3 to 5.
[Fig. 1: workflow diagram. Input: EDM works from proceedings, journals, and books. Tasks and outputs: 1. Selection; 2. Analysis of EDM works → EDM approach profile; 3. Analysis of profiles → raw EDM database; 4. Pre-processing → ad-hoc EDM database; 5. Statistical process → statistics and EDM functionalities; 6. Data mining process → patterns; 7. Edition of Sections 3.1 to 3.7; 8. Edition of Section 4; 9. Edition of Section 5 → EDM snapshot.]
Fig. 1. Workflow of the DM approach performed to analyse, classify, represent, and mine data of the EDM related works.
2.1. Framework applied for knowledge discovery of educational data mining works
The method used to carry out this survey is a framework designed to gather, analyze, and mine EDM works. It follows a workflow that guides the activities of knowledge discovery in databases (KDD). The workflow is split into three stages, and each stage is carried out through three tasks. Nine tasks therefore make up the whole KDD workflow pictured in Fig. 1. Their purposes and outcomes are explained as follows:
The ‘‘EDM work collection’’ stage performs three tasks. The first task searches for source references that publish EDM works, and it yields a collection of EDM works. The second evaluates the collected works and produces an EDM approach profile for each chosen EDM work. The third analyzes the EDM approach profiles and organizes them into a raw EDM database.
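The collection stage can be pictured with a minimal sketch. It assumes a hypothetical `EDMWork` record with a source type, a publication year and quarter, a tool/approach flag, and multi-valued traits. The field names, the ';' separator for multi-valued cells, and CSV as the storage format are all illustrative assumptions, not the author's implementation. Task 1 filters candidates by accepted source and by the survey period (2010 up to the first quarter of 2013). Task 2 flattens each work into a profile. Task 3 gathers the profiles into a raw table.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical set of accepted source types (proceedings, journals, books).
ACCEPTED_SOURCES = {"proceedings", "journal", "book"}


@dataclass
class EDMWork:
    """Hypothetical bibliographic record of a candidate EDM work."""
    title: str
    source: str            # kind of venue
    year: int
    quarter: int           # 1..4
    is_tool: bool = False  # True for software tools, False for approaches
    traits: dict[str, list[str]] = field(default_factory=dict)


def in_period(work):
    """Task 1 filter: published from 2010 up to the first quarter of 2013."""
    return 2010 <= work.year <= 2012 or (work.year == 2013 and work.quarter == 1)


def select_works(candidates):
    """Task 1: keep works from accepted sources within the survey period."""
    return [w for w in candidates if w.source in ACCEPTED_SOURCES and in_period(w)]


def build_profile(work):
    """Task 2: one flat profile per work; multi-valued traits joined by ';' (assumption)."""
    row = {"title": work.title,
           "kind": "tool" if work.is_tool else "approach",
           "year": work.year}
    for trait, values in work.traits.items():
        row[trait] = ";".join(values)
    return row


def raw_database(profiles):
    """Task 3: organize the profiles into a raw table, serialized here as CSV text."""
    fieldnames = []
    for p in profiles:
        for key in p:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(profiles)
    return out.getvalue()


# Illustrative candidates only; they are not works from the surveyed sample.
candidates = [
    EDMWork("A", "journal", 2011, 2, traits={"tasks": ["classification", "clustering"]}),
    EDMWork("B", "proceedings", 2013, 1, traits={"tasks": ["association rules"]}),
    EDMWork("C", "journal", 2013, 2),  # second quarter of 2013: outside the period
    EDMWork("D", "book", 2010, 4, is_tool=True),
]
selected = select_works(candidates)
csv_text = raw_database([build_profile(w) for w in selected])
print(csv_text)
```

The raw table deliberately keeps multi-valued cells as they are. Splitting them into analyzable columns is left to a later pre-processing step, which matches the separation between the collection and data processing stages.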
The ‘‘data processing’’ stage encompasses the tasks labeled fourth, fifth, and sixth in Fig. 1. The fourth task transforms the raw EDM database into an ad-hoc EDM database to facilitate statistical and mining processes. The fifth performs statistical processes that yield statistics and seven kinds of EDM functionalities, each of which gathers homogeneous related works. The sixth mines the ad-hoc EDM database to find patterns that characterize the gathered EDM works.
The stage oriented to ‘‘edit and interpret the results’’ contains the tasks labeled seventh to ninth in Fig. 1. The seventh task classifies the EDM works according to the educational functionalities they focus on most. As a result, seven topics are organized, each holding a balanced proportion of homogeneous EDM works, and they are presented as Sections 3.1 to 3.7. The eighth interprets the patterns produced by the DM approach to discover relationships between the value-instances of the traits that characterize the EDM approaches. The last task analyzes the knowledge discovered from the EDM works to provide a snapshot of the EDM arena, which is described in Sections 2 to 5.
2.2. Previous reviews of data mining and educational data mining
As the starting point of this work, prior reviews of DM and EDM were examined to build a conceptual frame of the domain under study. Five reviews are therefore introduced in this subsection. One is oriented to DM, and the other four cover EDM from 1995 up to 2009.²
Because EDM is based on DM, a review of the DM techniques and applications reported from 2000 to 2011 is summarized first. Shu-Hsien, Pei-Hui, and Pei-Yuan (2012) present a state of the art of DM that covers a series of works carried out throughout the past decade. The paper surveys and classifies 216 works into nine categories, listed here with their respective counts: (a) neural networks: 9; (b) algorithm architecture: 22; (c) dynamic prediction: 17; (d) analysis of system architecture: 23; (e) intelligent agent systems: 14; (f) modeling: 15; (g) knowledge-based systems: 19; (h) systems optimization: 14; (i) information systems: 28. The authors recognize the broad baseline that supports DM models, tasks, methods, techniques, and algorithms. Finally, they make three suggestions: (1) include social science methodologies; (2) integrate several methodologies into a holistic one; (3) change the policy that guides the future development of DM.
Regarding EDM, Romero and Ventura (2007) review 81 works published from 1995 up to 2005, of which only seven correspond to the 1990s. They adopt two kinds of DM techniques, statistics-visualization and web mining, to classify the application of DM to CBES. For statistics, several tools are identified and seven EDM works are cited. For visualization, four works are referenced and one tool is recognized. Web mining is split into three kinds of tasks: (1) clustering, classification, and outlier detection; (2) association rules and sequential patterns; (3) text mining. A sample of EDM works is given for each kind of task. Each sample is further partitioned into three variants of e-Learning systems: particular web-based courses (WBC), well-known learning management systems (LMS), and adaptive and intelligent web-based educational systems (AIWBES). The review thus provides nine collections of EDM works with the following statistics: (a) 15 works on clustering, classification, and outlier detection tasks, split into 3 WBC, 3 LMS, and 9 AIWBES; (b) 14 papers on association rules and sequential pattern tasks, divided into 6 WBC, 4 LMS, and 4 AIWBES; (c) 7 works on text mining, partitioned into 4 WBC, 2 LMS, and 1 AIWBES. As future trends, the authors call for friendly EDM tools for non-technical users, the standardization of DM methods and data, the integration of DM functionalities into CBES, and the design of techniques devoted to EDM.
Two reviews appeared in 2009 to outline the state of EDM. The first is the work of Baker and Yacef (2009). They celebrate the nascent EDM research community, define DM and EDM, and provide 45 EDM references, of which one dates from 1973, another from 1995, and one more from 1999. The review identifies several EDM targets, namely student models, models of domain knowledge, pedagogical support, and impacts on learning, and cites 8, 4, 3, and 4 related works for each, respectively. The second review published in 2009 was presented by Peña-Ayala, Domínguez, and Medel (2009). It offers 91 references on three topics: CBES, DM, and EDM. For the first topic, it considers approaches such as computer-assisted instruction, intelligent tutoring systems (ITS), LMS, and web-based educational systems (WBES). For DM, it identifies several models, tasks, and techniques and analyzes mathematical, rule-based, and soft computing techniques. The EDM works are organized into four functionalities: student modeling, tutoring, content, and assessment.
The fourth EDM review is that of Romero and Ventura (2010). They enhanced their prior EDM survey by adding 225 works, keeping the earlier seven references from the 1990s, and including three papers published in 2010. One novelty is a list of 235 works classified and counted as follows: 36 in traditional education, 54 WBES, 29 LMS, 31 ITS, 26 adaptive educational systems, 23 test-questionnaires, 14 text-contents, and 22 others. The EDM applications are gathered into eleven educational categories with the following counts: (a) analysis and visualization of data: 35; (b) providing feedback for supporting instruction: 40; (c) recommendations for students: 37; (d) predicting students' performance: 76; (e) student modeling: 28; (f) detecting undesirable student behaviors: 23; (g) grouping students: 26; (h) social network analysis: 15; (i) developing concept maps: 10; (j) constructing courseware: 9; (k) planning and scheduling: 11. At the end of the review, the authors assert: ‘‘EDM is now approaching its adolescence . . .’’
2 None of the works cited by the five reviews is included in the references of this paper, so readers are encouraged to consult those papers to analyze the EDM background.
2.3. Scope of the present survey of educational data mining works
An interpretation of the four EDM reviews published up to 2010 shows that EDM began in the current century, since nearly 98% of the cited works have appeared since 2000. EDM is therefore living its teenage period. During its growth, EDM has moved beyond isolated papers in conferences and journals. It now has dedicated workshops,³ an international conference on educational data mining,⁴ a specialized EDM journal,⁵ a handbook (Romero, Ventura, Pechenizkiy, & Ryan, 2011), and a society of experts and supporters,⁶ as well as one edited book (Romero & Ventura, 2006) and another in press (Peña-Ayala, 2013). This synergy reveals the increasing interest in EDM, and it is the main reason to update the review through the present survey.
The scope of the current overview is therefore constrained to a sample of representative EDM works. The sample includes papers published in journals⁷ and book chapters related to EDM, as well as papers presented at EDM conferences and workshops. The chosen references were published from 2010 to the first quarter of 2013. In this way, the EDM chronicles are extended and refreshed.
3 See https://pslcdatashop.web.cmu.edu/KDD2011/
4 See http://edm2013.iismemphis.org/
5 See http://www.educationaldatamining.org/JEDM/
6 See http://www.educationaldatamining.org/
7 Most of the journals are indexed by the Thomson Reuters Journal Citation Reports and are published by prestigious publishers such as Elsevier, Springer, IEEE, and Sage.
2.4. Data mining in a nutshell
According to Witten and Frank (2000), DM is the process of extracting useful, comprehensible, and previously unknown knowledge from huge and heterogeneous data repositories. Designing a DM work therefore requires instantiating several characteristics that shape the approach. These include the disciplines that provide the theoretical baseline, the sort of model to be built, the tasks to perform, and the methods and techniques that mechanize the proposal. They also include the algorithms, equations, and frames (e.g., data structures, frameworks) used to deploy the approach on computers and in internet settings. Accordingly, this subsection defines those attributes, gives some of their instances, and reports the statistics of their occurrence among the sub-sample of 222 EDM approaches.
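To make the profile attributes above concrete, here is a minimal sketch. It assumes a hypothetical `DMProfile` record with five list-valued attributes (disciplines, model, tasks, methods, algorithms); equations and frames are left out for brevity. The sketch mirrors the pre-processing and statistical tasks of the workflow in Section 2.1. Profiles are flattened into a binary ad-hoc table with one column per attribute=value pair. The table's columns are then summed to count how many approaches instantiate each value. Finally, the k most frequent value-instances per attribute are kept. That top-k step is a simple frequency-based stand-in and not the clustering process the survey applied. The class, the function names, and the sample values are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical subset of the DM characteristics described in this subsection.
ATTRIBUTES = ("disciplines", "model", "tasks", "methods", "algorithms")


@dataclass
class DMProfile:
    """Hypothetical DM profile of one EDM approach; every attribute is multi-valued."""
    work_id: str
    disciplines: list[str] = field(default_factory=list)
    model: list[str] = field(default_factory=list)
    tasks: list[str] = field(default_factory=list)
    methods: list[str] = field(default_factory=list)
    algorithms: list[str] = field(default_factory=list)


def pairs(profile):
    """All attribute=value labels instantiated by one profile."""
    return {f"{attr}={value}" for attr in ATTRIBUTES for value in getattr(profile, attr)}


def to_adhoc_table(profiles):
    """Pre-processing: binary table with one column per attribute=value pair."""
    columns = sorted(set().union(*(pairs(p) for p in profiles)))
    table = [[int(c in present) for c in columns] for present in map(pairs, profiles)]
    return columns, table


def occurrence_statistics(columns, table):
    """Statistics: number of approaches instantiating each value, grouped by attribute."""
    stats = {attr: Counter() for attr in ATTRIBUTES}
    for j, column in enumerate(columns):
        attr, value = column.split("=", 1)
        stats[attr][value] = sum(row[j] for row in table)
    return stats


def basic_set(stats, k=3):
    """The k most frequent value-instances per attribute (frequency stand-in for mining)."""
    return {attr: [value for value, _ in counts.most_common(k)]
            for attr, counts in stats.items()}


# Illustrative values only; they are not data from the surveyed works.
profiles = [
    DMProfile("w1", ["statistics"], ["predictive"], ["classification"],
              ["decision trees"], ["C4.5"]),
    DMProfile("w2", ["machine learning"], ["descriptive"], ["clustering"],
              ["k-means"], ["k-means"]),
    DMProfile("w3", ["statistics", "machine learning"], ["predictive"], ["classification"],
              ["Bayesian networks"], ["EM"]),
]
columns, table = to_adhoc_table(profiles)
stats = occurrence_statistics(columns, table)
print(basic_set(stats, k=1))
```

A binary table lets the same structure feed both the occurrence statistics and a subsequent mining step, such as clustering its rows. That is why, in this sketch, flattening comes before counting.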
(Bhattacharyya & Hazarika, 2006), and natural language (McCarthy
Table 1
Counting