Is Hadoop The Next Generation Of The Database? - Science/Technology - Nairaland

Is Hadoop The Next Generation Of The Database? by sarahjohns388: 7:53am On Apr 20, 2018
At the time, a long line of startups were offering a new breed of database designed to store and analyze much larger amounts of data. Greenplum. Vertica. Netezza. Hammerbacher and Facebook tested them all. But they weren't suited to the task either.

In the end, Facebook turned to a little-known open source software platform that had only just gotten off the ground at Yahoo. It was called Hadoop, and it was built to harness the power of thousands of ordinary computer servers. Unlike the Greenplums and the Verticas, Hammerbacher says, Hadoop could store and process the ever-expanding sea of data generated by what was quickly becoming the world's most popular social network.

Over the next few years, Hadoop reinvented data analysis not only at Facebook and Yahoo but at many other web services. And then an army of commercial software vendors started selling the thing to the rest of the world. Soon, even the likes of Oracle and Greenplum were hawking Hadoop. These companies still treated Hadoop as an adjunct to the traditional database – as a tool suited only to certain types of data analysis. But now, that's changing too.

On Monday, Greenplum – now owned by tech giant EMC – revealed that it has spent the last two years building a new Hadoop platform that it believes will leave the traditional database behind. Known as Pivotal HD, this tool can store the massive amounts of information Hadoop was created to store, but it's designed to ask questions of this data significantly faster than you can with the existing open source platform.

"We think we're on the verge of a major shift where businesses are looking at a set of canonical applications that can't be easily run on existing data fabrics and relational databases," says Paul Maritz, the former Microsoft exec who now oversees Greenplum. Businesses need a new data fabric, Maritz says, and the starting point for that fabric is Hadoop.

That's a somewhat surprising statement from a company whose original business was built around a relational database – software that stores data in neat rows and columns. But Greenplum and EMC are just acknowledging what Jeff Hammerbacher and Facebook learned so many years ago: Hadoop – for all its early faults – is well suited to storing and processing the massive amounts of data facing the modern business.

What's more, Greenplum is revamping Hadoop to operate more like a relational database, letting you rapidly ask questions of data using the structured query language, or SQL, which has been a staple of the database world for decades. "When we were acquired [by EMC], we really believed that the two worlds were going to fuse together," says Greenplum co-founder Scott Yara. "What was going to be exciting is if you could take the massively parallel query processing technology in a database system [like Greenplum] and basically fuse it with the Hadoop platform."

The trouble with Hadoop has always been that it takes so much time to analyze data. It was a "batch system." Using a framework called Hadoop MapReduce, you had the freedom to build all sorts of complex programs that crunch enormous amounts of data, but when you gave it a task, you could wait hours – or even days – for a response.
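To make the "batch" model more concrete, here is a minimal single-process sketch of the MapReduce pattern in Python, using the classic word-count example. It is not Hadoop's actual API (Hadoop MapReduce jobs are normally written against its Java API), and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical. It only shows the three phases every job passes through: map, shuffle (group by key) and reduce.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: turn one input record (a line of text) into (key, value) pairs.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: gather every value emitted for the same key into one list.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: collapse each key's list of values into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # On a real cluster, map and reduce tasks run on many machines and the
    # shuffle moves data across the network; here it all runs in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data in batches"]
    print(word_count(docs))
    # → {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1, 'in': 1, 'batches': 1}
```

On a real cluster each phase is spread over many machines, and intermediate results are written to disk between phases, which is part of why such jobs could take hours rather than seconds.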

With its new system, Greenplum has worked to change that. A team led by former Microsoft database designer Florian Waas has designed a new "query engine" that can more quickly run SQL queries on data stored across a massive cluster of systems using the Hadoop Distributed File System, or HDFS. Open source tools such as Hive have long provided ways of running SQL queries on Hadoop data, but this too was a batch system that needed a fair amount of time to complete queries.
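The "massively parallel query processing" idea Yara mentions can be illustrated with a generic scatter-gather aggregation, roughly the work needed for a query like `SELECT region, SUM(amount) FROM sales GROUP BY region`. This is a toy sketch under stated assumptions, not Pivotal HD's engine (the article gives no details of its design): the table, the columns and the function names are hypothetical, and each inner list stands in for the rows stored on one node. Each partition is aggregated locally, and only the small partial results are merged by a coordinator.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row schema: (region, amount).
Row = tuple[str, int]


def partial_sum(partition: list[Row]) -> Counter:
    # Each worker aggregates only the rows stored on "its" node.
    totals: Counter = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals


def parallel_group_sum(partitions: list[list[Row]]) -> dict[str, int]:
    # Scatter: run the local aggregation on every partition concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum, partitions))
    # Gather: the coordinator adds up the per-partition partial sums.
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts rather than replacing
    return dict(merged)


if __name__ == "__main__":
    nodes = [
        [("east", 10), ("west", 5)],
        [("east", 7)],
        [("west", 1), ("north", 4)],
    ]
    print(parallel_group_sum(nodes))  # → {'east': 17, 'west': 6, 'north': 4}
```

Python threads do not give true CPU parallelism for this kind of work, so the thread pool only mirrors the shape of the computation. The design point is that raw rows never leave their node; only per-group partial sums travel to the coordinator.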

This query engine will make its debut later this year as part of Pivotal HD. Greenplum is now a key component of an EMC subsidiary called The Pivotal Initiative, which seeks to bring several new age web technologies and techniques to the average business.

This time, Greenplum is in lock-step with Jeff Hammerbacher. After leaving Facebook, Hammerbacher helped found a Hadoop startup known as Cloudera, and late last year, he unveiled a system called Impala, which also seeks to run real-time queries atop Hadoop. But according to Waas and Yara, Pivotal HD is significantly faster than Impala and the many other tools that run SQL queries atop Hadoop. Yara claims that it's at least 100 times faster than Impala.

The caveat, says Waas, is that if a server crashes while Pivotal HD is running a query, you're forced to restart the query. This is a little different from what people have come to expect when running jobs on Hadoop, which was specifically designed to keep running across a large cluster of servers even as individual machines started to fail – as they inevitably do.

"The query extensions of Pivotal HD behave slightly differently in that they require a restart of the query when a machine is lost," he says. "An individual query needs to be restarted but the integrity, accessibility and functionality of the system is guaranteed to continue. We consider this a small price to pay for several orders of magnitude performance enhancement as we do not materialize any results during processing."
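Waas's trade-off can be modelled with a toy three-stage query. In the hypothetical `run_query` below, `materialize=True` keeps each stage's output (loosely like MapReduce persisting intermediate data), so after a simulated machine failure only the failed stage is rerun. With `materialize=False` nothing is kept, so the whole query starts over. The function, the `NodeFailure` exception and the stages are illustrative assumptions, not Pivotal HD code. The sketch counts stage executions only and ignores the disk I/O that materializing costs, which is the overhead Waas says the new engine avoids.

```python
from typing import Callable, Optional

# A stage transforms one intermediate result into the next (hypothetical model).
Stage = Callable[[list[int]], list[int]]


class NodeFailure(Exception):
    """Simulated loss of a machine while a stage is running."""


def run_query(stages: list[Stage], data: list[int], materialize: bool,
              fail_at: Optional[int] = None) -> tuple[list[int], int]:
    """Run stages in order; return (result, number of stage executions).

    If fail_at is given, the stage with that index fails once.
    materialize=True: each completed stage's output is kept, so a retry
    resumes from the last saved result.
    materialize=False: nothing is kept, so a retry starts from scratch.
    """
    checkpoints: dict[int, list[int]] = {}
    executions = 0
    failed_once = False
    while True:
        try:
            current = data
            for i, stage in enumerate(stages):
                if materialize and i in checkpoints:
                    current = checkpoints[i]  # reuse saved output, skip rerun
                    continue
                executions += 1
                if i == fail_at and not failed_once:
                    failed_once = True
                    raise NodeFailure(f"machine lost during stage {i}")
                current = stage(current)
                if materialize:
                    checkpoints[i] = current
            return current, executions
        except NodeFailure:
            continue  # retry the query; only checkpointed stages are skipped


if __name__ == "__main__":
    stages: list[Stage] = [
        lambda xs: [x * 2 for x in xs],
        lambda xs: [x + 1 for x in xs],
        lambda xs: [sum(xs)],
    ]
    print(run_query(stages, [1, 2, 3], materialize=True, fail_at=2))   # → ([15], 4)
    print(run_query(stages, [1, 2, 3], materialize=False, fail_at=2))  # → ([15], 6)
```

Both modes reach the same answer; the difference is how much work a failure throws away, which is the "small price" Waas describes in exchange for not writing intermediate results.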

The traditional database will always have its place. Even Greenplum will continue to offer its original data warehouse tool, which was based on the open source PostgreSQL database. But the company's new query engine is yet another sign that Hadoop will continue to reinvent the way businesses crunch their data. Not just web giants. But any business.