Apache Tajo brings data warehousing to Hadoop

A relatively obscure open source Apache package provides a way to put Hadoop data in a data warehouse for further analysis

Tajo provides the ability to set up a distributed data warehouse across multiple servers

Organizations that want to extract more intelligence from their Hadoop deployments might find help from the relatively little-known Tajo open source data warehouse software, which the Apache Software Foundation has pronounced ready for commercial use.

The new version of Tajo, Apache software for running a data warehouse over Hadoop data sets, has been updated to provide greater connectivity to Java programs and third-party databases such as Oracle and PostgreSQL.

While less well-known than other Apache big data projects such as Spark or Hive, Tajo could be a good fit for organizations outgrowing their commercial data warehouses. It could also be a good fit for companies wishing to analyze large sets of data stored on Hadoop data processing platforms using familiar commercial business intelligence tools instead of Hadoop's MapReduce framework.

Tajo performs the ETL (extract, transform, load) operations needed to summarize large data sets stored on HDFS (Hadoop Distributed File System). Users and external programs can then query the data through SQL.

The latest version of the software, issued Monday, comes with a newly improved JDBC (Java Database Connectivity) driver that its project managers say makes Tajo as easy to use as a standard relational database management system. The driver has been tested against a variety of commercial business intelligence software packages and other SQL-based tools.
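For readers curious what that connectivity looks like from a Java program, the following is a minimal sketch rather than anything taken from the Tajo project itself. It uses only the standard java.sql API. The URL form jdbc:tajo://host:port/database, the port 26002, the database name "default" and the table name passed on the command line are all assumptions made for illustration; check the Tajo JDBC documentation for the values that apply to a given cluster. The Tajo JDBC driver jar would also need to be on the classpath for the connection to succeed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TajoJdbcSketch {

    // Builds a JDBC URL in the assumed form jdbc:tajo://<host>:<port>/<database>.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:tajo://" + host + ":" + port + "/" + database;
    }

    // Runs a simple row count over one table using plain java.sql calls.
    // The table name is concatenated directly, so it must come from a trusted source.
    static long countRows(String url, String table) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT count(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        // Host, port and database name are placeholders, not values from the article.
        String url = jdbcUrl("localhost", 26002, "default");
        System.out.println("JDBC URL: " + url);

        // Only attempt a connection when a table name is supplied,
        // since connecting requires a running Tajo cluster and the driver jar.
        if (args.length > 0) {
            System.out.println("Rows: " + countRows(url, args[0]));
        }
    }
}
```

Because the sketch relies on nothing beyond java.sql, the same pattern is what a business intelligence tool would follow when pointed at Tajo through its JDBC driver.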

Other new features include catalogs of built-in SQL commands from both Oracle and PostgreSQL systems.

Like a growing number of database systems, Tajo now features full support for JSON (JavaScript Object Notation), making it easier for Web developers to work with Tajo. Tajo can also work directly with Amazon S3 (Simple Storage Service).

Gruter, a big data infrastructure startup in South Korea, is leading the charge to develop Tajo. Engineers from Intel, Etsy, NASA, Cloudera and Hortonworks also contribute to the project.

Perhaps because of its South Korean home base, the software is not very widely known elsewhere in the world, compared to other open-source SQL-based Hadoop packages such as Hive or Impala.

In at least one test of the software, conducted in 2013, Tajo appeared to possess a speed advantage, according to Gruter. Korean telecommunications firm SK Telecom ran Tajo against 1.7 terabytes of data and found that, in most instances, it completed queries faster than either Hive or Impala.

As with most benchmarks, results may vary according to the specific workload. New editions of Hive and Impala may also have closed the speed gap.

SK Telecom uses the software in production, as do Korea University and NASA's Jet Propulsion Laboratory. The Korean music streaming service Melon uses the software for analytical processing, and has found that Tajo executes ETL jobs 1.5 to 10 times faster than Hive.

The Apache Software Foundation provides support and oversight for more than 350 open source projects, including Hadoop, the Cassandra NoSQL database and the Apache HTTP server.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
