
MYETL: a Java software tool to extract, transform and load your business

by Michele Nuovo, on Mar 15, 2018 10:28:08 AM

ETL (Extract, Transform and Load) processes form the backbone of data warehouse architecture. However, ETL is not useful only for refreshing data warehouses. New applications have emerged with the advent of Web 2.0 that combine data obtained dynamically, via web-service invocations to more than one source, into an integrated environment. Google Maps, a web mapping service application and technology provided by Google, and Yahoo Pipes, an interactive feed aggregator and manipulator, are two examples. Under the hood, the philosophy behind their operation is 'pure' ETL. Furthermore, as the technology evolves, interest is moving towards types of data that do not necessarily follow the traditional relational format, such as XML, biomedical and multimedia data (Vassiliadis and Simitsis, 2007).

Although ETL processes are well known in the computer science field, various issues remain open. The most important is standardisation: several tools on the market provide ETL functionality, but each follows a different approach to modelling and representing the individual steps. Creating a globally accepted paradigm of thinking on this topic is a challenge for the academic community (Vassiliadis and Simitsis, 2007).

The aim of this project is to build a working prototype of Java software that allows the user to extract data from defined sources, apply defined transformations to those data and finally load them into a target Teradata data mart, which stores the data for Business Intelligence (BI) purposes. Examples of BI tools are MicroStrategy, IBM Cognos and Informatica, which are used to produce business reports on a data mart (bi-tools.org). The project involved several phases: research and analysis of the theory behind the ETL process; design of the system architecture and of the software's Graphical User Interface (GUI); implementation of that design in the Java programming language using the chosen methodology; and testing of the implemented code. Finally, user and maintenance documentation was created to give end users assistance and a practical overview of the developed system.
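
The extract, transform and load steps described above can be sketched in a few lines of Java. This is only an illustration of the general pattern, not the MYETL code itself: the names `EtlSketch`, `Extractor`, `Loader` and `run` are hypothetical, the source is a hard-coded list, and an in-memory list stands in for the Teradata data mart (a real loader would presumably write through a database connection instead).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of an ETL pipeline; not the actual MYETL design.
public class EtlSketch {

    /** Extract step: pulls raw records from some source (file, database, web service). */
    interface Extractor<S> {
        List<S> extract();
    }

    /** Load step: writes transformed records to a target store. */
    interface Loader<T> {
        void load(List<T> rows);
    }

    /** Runs extract, transform and load in order; returns the number of rows loaded. */
    static <S, T> int run(Extractor<S> extractor, Function<S, T> transform, Loader<T> loader) {
        List<T> rows = extractor.extract().stream()
                .map(transform)
                .collect(Collectors.toList());
        loader.load(rows);
        return rows.size();
    }

    public static void main(String[] args) {
        // Assumed source: two hard-coded CSV-like lines.
        Extractor<String> source = () -> List.of(" alice,10 ", "bob,20");

        // Assumed transformation: trim whitespace and upper-case each line.
        Function<String, String> normalise = line -> line.trim().toUpperCase(Locale.ROOT);

        // Assumed target: an in-memory list standing in for the data mart.
        List<String> target = new ArrayList<>();

        int loaded = run(source, normalise, target::addAll);
        System.out.println(loaded + " rows loaded: " + target);
    }
}
```

Keeping each step behind its own small interface means a different source, transformation or target can be swapped in without touching the other two, which is one common way to structure this kind of tool.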


Topics: School of Media & IT
