
Educational data mining A survey and a data mining-based analysis - Copy


Expert Systems with Applications xxx (2013) xxx–xxx

Contents lists available at ScienceDirect
Expert Systems with Applications
journal homepage: www.elsevier.com/locate/eswa

Review

Educational data mining: A survey and a data mining-based analysis of recent works

Alejandro Peña-Ayala ⇑
WOLNM & ESIME Zacatenco, Instituto Politécnico Nacional, U. Profesional Adolfo López Mateos, Edificio Z-4, 2do piso, cubiculo 6, Miguel Othón de Mendizábal S/N, La Escalera, Gustavo A. Madero, D.F., C.P. 07320, Mexico

Article info

Keywords: Data mining; Educational data mining; Data mining profile; Educational data mining approach pattern; Pattern for descriptive and predictive approaches

Abstract

This review pursues a twofold goal: the first is to preserve and enhance the chronicles of recent educational data mining (EDM) advances; the second is to organize, analyze, and discuss the content of the review based on the outcomes produced by a data mining (DM) approach. Thus, as a result of the selection and analysis of 240 EDM works, an EDM work profile was compiled to describe 222 EDM approaches and 18 tools. A profile of the EDM works was organized as a raw database, which was transformed into an ad-hoc database suitable to be mined. As a result of the execution of statistical and clustering processes, a set of educational functionalities was found, a realistic pattern of EDM approaches was discovered, and two patterns of value-instances to depict EDM approaches based on descriptive and predictive models were identified. One key finding is that most of the EDM approaches are grounded on a basic set composed of three kinds of educational systems, disciplines, tasks, methods, and algorithms each. The review concludes with a snapshot of the surveyed EDM works and provides an analysis of the EDM strengths, weaknesses, opportunities, and threats, whose factors represent, in a sense, future work to be fulfilled.

© 2013 Published by Elsevier Ltd.

⇑ Address: WOLNM & ESIME Zacatenco, Instituto Politécnico Nacional, U. Profesional Adolfo López Mateos, Edificio Z-4, 2do piso, cubiculo 6, Miguel Othón de Mendizábal S/N, La Escalera, Gustavo A. Madero, D.F., C.P. 07320, Mexico. Tel.: +52 55 5694 0916/+52 55 5454 2611 (cellular); fax: +52 55 5694 0916. E-mail address: apenaa@ipn.mx. URL: http://www.wolnm.org/apa

¹ AIWBES: adaptive and intelligent web-based educational systems; BKT: Bayesian knowledge tracing; CBES: computer-based educational systems; CBIS: computer-based information system; DM: data mining; DP: dynamic programming; EDM: educational data mining; EM: expectation maximization; HMM: hidden Markov model; IBL: instance-based learning; IRT: item response theory; ITS: intelligent tutoring systems; KDD: knowledge discovery in databases; KT: knowledge tracing; LMS: learning management systems; SNA: social network analysis; SWOT: strengths, weaknesses, opportunities, and threats; WBC: web-based courses; WBES: web-based educational systems.

0957-4174/$ - see front matter © 2013 Published by Elsevier Ltd. http://dx.doi.org/10.1016/j.eswa.2013.08.042

Please cite this article in press as: Peña-Ayala, A. Educational data mining: A survey and a data mining-based analysis of recent works. Expert Systems with Applications (2013), http://dx.doi.org/10.1016/j.eswa.2013.08.042

1. Introduction

Data mining (DM¹) is a computer-based information system (CBIS) (Vlahos, Ferratt, & Knoepfle, 2004) devoted to scanning huge data repositories, generating information, and discovering knowledge. The name borrows from traditional mining, but the target is knowledge instead of natural minerals. DM seeks to find data patterns, organize information about hidden relationships, structure association rules, estimate unknown values of items in order to classify objects, compose clusters of homogeneous objects, and unveil many kinds of findings that are not easily produced by a classic CBIS. Thereby, DM outcomes represent a valuable support for decision-making.

Education is a novel DM application target for knowledge discovery, decision-making, and recommendation (Vialardi-Sacin, Bravo-Agapito, Shafti, & Ortigosa, 2009). Nowadays, the use of DM in the education arena is incipient and gives birth to the educational data mining (EDM) research field (Anjewierden, Kollöffel, & Hulshof, 2007). As we will see in Section 2, in a sense the first decade of the present century represents the kick-off of EDM.

EDM emerges as a paradigm oriented to designing models, tasks, methods, and algorithms for exploring data from educational settings. EDM seeks to find patterns and make predictions that characterize learners' behaviors and achievements, domain knowledge content, assessments, educational functionalities, and applications (Luan, 2002). Source information is stored in repositories managed by conventional, open, and distance educational modalities.

Some of the EDM trends are anticipated here. One of them corresponds to the standard integration of an EDM module into the typical architecture of the wide diversity of computer-based educational systems (CBES). Another trend demands that EDM provide several functionalities during three stages of the teaching-learning cycle. The first stage corresponds to the provision of proactive EDM support for adapting the educational setting to the student's profile prior to delivering a lecture. During the student-system interaction stage, it is desirable that EDM acquire log data and interpret their meaning in order to suggest recommendations, which can be used by the CBES for personalizing services to users in real time.
In the next stage, EDM should carry out the evaluation of the provided education concerning: delivered services, achieved outcomes, degree of user satisfaction, and usefulness of the resources employed. What is more, several challenges (i.e., targets, environments, modalities, functionalities, kinds of data, ...) wait to be tackled or have been recently considered by EDM, such as: big data, cloud computing, social networks, web mining, text mining, virtual 3-D environments, spatial mining, semantic mining, collaborative learning, learning companions, ...

The present work extends the period described by earlier surveys, summarized in Section 2.2, which cover from 1995 up to 2009. The aim is to preserve and update the chronicles of recent EDM development. The scope of the work is limited and provides only a partial image of the EDM activity published across all the events held and the media available. In spite of this, the work provides a snapshot of the EDM labor that several members of the community have been achieving. Moreover, it applies the essential subject, DM, to organize, analyze, and discuss the content of the overview. Such a policy is a novelty: to preach through example.

As a result of the application of such a policy, the following four contributions are offered to the EDM community: a DM profile, an EDM approach profile, a pattern for EDM approaches based on descriptive models, and a pattern for EDM approaches based on predictive models. The first facilitates the description of the DM baseline that supports an EDM approach. The second is useful to define the nature and baseline of an EDM approach. The third and fourth are patterns to design EDM approaches, which are useful as a reference to develop similar versions of descriptive and predictive models.

In this paper a survey of EDM works fulfilled from 2010 up to the first quarter of 2013 is presented. The method followed for producing the overview is outlined in Section 2, where the gathered material is also stated. A sample of 240 EDM works is summarized in Section 3. Such a collection is organized according to typical functionalities fulfilled by CBES that were found from the material. In Section 4, an analysis of the sampled works is provided to shape the recent status and evolution of the EDM field, and some EDM approach patterns are highlighted. Finally, the conclusions section tailors a snapshot of the sample and a critical analysis of the EDM arena that are useful to inspire future work.

2. Method and materials

In this section, the method and the materials of the overview are described. The method is a framework devoted to gathering and mining EDM works. The materials tailor the survey domain through five subjects: a reference to prior EDM reviews, the scope of the collected EDM works, a profile of DM, a summary of CBES, and the data representation of EDM approaches used for mining.

As a result of the method application, a sample of 240 EDM works published between 2010 and the first quarter of 2013 was gathered. It is made up of two sub-samples, one of 222 EDM approaches and another of 18 EDM tools (i.e., the first represents EDM applications and the second software). The sample is a valuable source that is used to provide a highlight of the EDM arena in Section 3 and a brief analysis in Section 4. Moreover, the sample is examined to produce statistics and discover some findings, which are illustrated in the following subsections as well as in Sections 3 to 5.

2.1. Framework applied for knowledge discovery of educational data mining works

The method used to carry out this survey is a framework designed to gather, analyze, and mine EDM works. It follows a workflow to lead the activities oriented to knowledge discovery in databases (KDD). The workflow is split into three stages, and each stage is achieved by three tasks. Thus, nine tasks compose the whole KDD workflow pictured in Fig. 1, whose purpose and outcomes are explained as follows.

The "EDM work collection" stage performs three tasks. The first task seeks source references that publish EDM works. As a result, a collection of EDM works is gathered. The second evaluates EDM works and produces an EDM approach profile for each chosen EDM work. The third analyzes the EDM approach profiles and organizes a raw EDM database.

The "data processing" stage encompasses the tasks labeled as fourth, fifth, and sixth in Fig. 1. The fourth task transforms the raw EDM database into an ad-hoc EDM database to facilitate statistical and mining processes. The fifth performs statistical processes to generate seven kinds of EDM functionalities that gather homogeneous related works and statistics. The sixth mines the ad-hoc EDM database to find patterns that characterize the gathered EDM works.

The stage oriented to "edit and interpret the results" contains the tasks labeled as seven to nine in Fig. 1. The seventh task classifies the EDM works according to the educational functionalities they most focus on. As a result, seven topics are organized in balanced proportions of homogeneous EDM works to outline seven sections, presented as 3.1 to 3.7. The eighth interprets the patterns produced by the DM approach to discover relationships between the trait value-instances that characterize the EDM approaches. The last task analyzes the discovered knowledge from the EDM works to tailor a snapshot of the EDM arena that is described in Sections 2 to 5.

[Figure: workflow diagram of nine tasks. Sources (proceedings, journals, books) feed 1. Selection, 2. Analysis of EDM works, and 3. Analysis of profiles, which yield the EDM works, the EDM approach profile, and the raw EDM database; 4. Pre-processing, 5. Statistical process, and 6. Data mining process yield the ad-hoc EDM database, statistics, and patterns; tasks 7 to 9 edit Sections 3.1 to 3.7, Section 4, and Section 5, yielding the EDM functionalities and the EDM snapshot.]
Fig. 1. Workflow of the DM approach performed to analyse, classify, represent, and mine data of the EDM related works.

2.2. Previous reviews of data mining and educational data mining

As the starting point of this work, prior reviews of DM and EDM were examined to tailor a conceptual frame of the domain under study. Therefore, five reviews are introduced in this subsection, where one is oriented to DM and the other four cover the period from 1995 up to 2009.²

As EDM is based on DM, a review of DM techniques and applications achieved during 2000 to 2011 is summarized as follows. Shu-Hsien, Pei-Hui, and Pei-Yuan (2012) present a state of the art of DM that concerns a series of works fulfilled throughout the past decade. The paper surveys and classifies 216 works using nine categories, presented here with their respective counts of works: (a) neural networks: 9; (b) algorithm architecture: 22; (c) dynamic prediction: 17; (d) analysis of system architecture: 23; (e) intelligent agent systems: 14; (f) modeling: 15; (g) knowledge-based systems: 19; (h) systems optimization: 14; (i) information systems: 28. The authors recognize the broad baseline that supports DM models, tasks, methods, techniques, and algorithms. Finally, three suggestions are made: (1) include social sciences methodologies; (2) integrate several methodologies into a holistic one; (3) change the policy to guide future development of DM.

Regarding EDM, Romero and Ventura (2007) present a review of 81 works published from 1995 up to 2005, where only seven correspond to the 1990s. They identify statistics-visualization and web mining as a couple of DM techniques to classify the application of DM to CBES. As for statistics, several tools are identified and seven EDM works cited. Concerning visualization, four works are referenced and one tool is recognized. Web mining is split into three kinds of tasks: (1) clustering, classification, and outlier detection; (2) association rules and sequential pattern; (3) text mining. A sample of EDM works is given for each kind of task. However, the sample is partitioned into three variants of e-Learning systems: particular web-based courses (WBC), well-known learning management systems (LMS), and adaptive and intelligent web-based educational systems (AIWBES). So, nine collections of EDM works are provided in the review with the following statistics: (a) 15 works of clustering, classification, and outlier detection tasks split into: 3 WBC, 3 LMS, 9 AIWBES; (b) 14 papers about association rules and sequential pattern tasks divided into: 6 WBC, 4 LMS, 4 AIWBES; (c) 7 works related to text mining partitioned into: 4 WBC, 2 LMS, 1 AIWBES. As future trends, they demand: friendly EDM tools for non-technical users; the standardization of DM methods and data; the integration of DM functionalities in CBES; and the design of techniques devoted to EDM.

A couple of reviews appeared in 2009 to shape a state of the EDM. The first is the work made by Baker and Yacef (2009). They celebrate the nascent EDM research community, define DM and EDM, and provide 45 EDM references, where one corresponds to 1973, another to 1995, and one more to 1999. The review identifies some EDM targets, such as: student models, models of domain knowledge, pedagogical support, and impacts on learning; where 8, 4, 3, and 4 related works are respectively cited. The second review published in 2009 was presented by Peña-Ayala, Domínguez, and Medel (2009). It offers 91 references about three topics: CBES, DM, and EDM. Concerning the former, approaches such as computer-assisted instruction, intelligent tutoring systems (ITS), LMS, and web-based educational systems (WBES) are considered. Regarding DM, several models, tasks, and techniques are identified; where mathematical, rules-based, and soft computing techniques are the target of analysis. As for EDM works, they are organized into four functionalities: student modeling, tutoring, content, and assessment.

The fourth EDM review corresponds to Romero and Ventura (2010), who enhanced their prior EDM survey by adding 225 works, keeping the former seven references of the 1990s, and including three papers published in 2010. One novelty concerns a list of 235 works classified and counted in the following way: 36 in traditional education, 54 WBES, 29 LMS, 31 ITS, 26 adaptive educational systems, 23 test-questionnaires, 14 text-contents, and 22 others. Concerning EDM applications, they are gathered into eleven educational categories with the following counts: (a) analysis and visualization of data: 35; (b) providing feedback for supporting instruction: 40; (c) recommendations for students: 37; (d) predicting students' performance: 76; (e) student modeling: 28; (f) detecting undesirable student behaviors: 23; (g) grouping students: 26; (h) social network analysis: 15; (i) developing concept maps: 10; (j) constructing courseware: 9; (k) planning and scheduling: 11. At the end of the review, the authors assert: "EDM is now approaching its adolescence ..."

² As none of the works that are cited by the five reviews is included in the references of this paper, readers are encouraged to seek such papers to analyze the EDM background.

2.3. Scope of the present survey of educational data mining works

An interpretation of the four EDM reviews published up to 2010 shows that the current century represents the start of EDM, because nearly 98% of the cited works have appeared since 2000. In consequence, EDM is living its teenage period. During its growth, EDM has shifted from isolated papers published in conferences and journals to dedicated workshops,³ an international conference on educational data mining,⁴ a specialized journal of EDM,⁵ a handbook (Romero, Ventura, Pechenizkiy, & Ryan, 2011), and a society of experts and partisans,⁶ as well as one edited book (Romero & Ventura, 2006) and another in press (Peña-Ayala, 2013). This synergy reveals the increasing interest in EDM and is the main reason to update the review by means of the present survey.

Therefore, the scope of the current overview is constrained to a sample of representative EDM works published in journals,⁷ book chapters related to EDM, as well as papers presented in EDM conferences and workshops. The chosen references were published during the period from 2010 to the first quarter of 2013. In this way, the EDM chronicles are extended and refreshed.

³ See https://pslcdatashop.web.cmu.edu/KDD2011/
⁴ See http://edm2013.iismemphis.org/
⁵ See http://www.educationaldatamining.org/JEDM/
⁶ See http://www.educationaldatamining.org/
⁷ Most of the journals are indexed by © Thomson Reuters Journal Citation Reports and are published by prestigious publishers such as: Elsevier, Springer, IEEE, and Sage.

2.4. Data mining in a nutshell

According to Witten and Frank (2000), DM is the process oriented to extracting useful and comprehensible knowledge, previously unknown, from huge and heterogeneous data repositories.
Thus, the design of a DM work demands the instantiation of several characteristics to shape the approach, such as: disciplines to tailor the theoretical baseline, the sort of model to be built, tasks to perform, methods and techniques to mechanize the proposal, as well as the algorithms, equations, and frames (e.g., data structures, frameworks) to deploy the approach on computers and internet settings. In consequence, this subsection is oriented to defining those attributes, providing some of their instances, and revealing the statistics of their occurrence among the sub-sample of 222 EDM approaches.

(Bhattacharyya & Hazarika, 2006), and natural language (McCarthy

Table 1 Counting
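The occurrence statistics that this subsection announces amount to counting how often each instance of a trait (discipline, model, task, method, and so on) appears across the approach profiles. A minimal Python sketch of that counting follows. The record layout, the trait names, and the function `trait_statistics` are hypothetical illustrations, and the four sample profiles are placeholders, not data taken from the survey.

```python
from collections import Counter

# Hypothetical EDM approach profiles. The trait names and values are
# illustrative placeholders, not figures reported by the survey.
profiles = [
    {"model": "predictive", "task": "classification"},
    {"model": "descriptive", "task": "clustering"},
    {"model": "predictive", "task": "regression"},
    {"model": "predictive", "task": "classification"},
]


def trait_statistics(records, trait):
    """Return (value, count, percentage) rows for one trait, most frequent first.

    Records that lack the trait are skipped, so percentages are relative
    to the records that actually declare it.
    """
    counts = Counter(r[trait] for r in records if trait in r)
    total = sum(counts.values())
    return [
        (value, count, round(100.0 * count / total, 2))
        for value, count in counts.most_common()
    ]


if __name__ == "__main__":
    for value, count, pct in trait_statistics(profiles, "model"):
        print(f"{value}: {count} ({pct}%)")
```

Applied to a real profile table, the same function would be called once per trait to obtain one counting table per characteristic; how the survey itself computed its tables is not stated in this excerpt.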