IBM today announced it is launching IBM watsonx.data, a data store built on an open lakehouse architecture, to help enterprises easily unify and govern their structured and unstructured data, wherever it resides, for high-performance AI and analytics. The solution is currently in a closed beta phase and is expected to be generally available in July 2023.
What is watsonx.data?
Watsonx.data will be core to IBM’s new AI and Data platform, IBM watsonx, announced today at IBM Think. With watsonx, IBM will launch a centralized AI development studio that gives businesses access to proprietary IBM and open-source foundation models, watsonx.data to gather and clean their data, and a toolkit for governance of AI.
Watsonx.data will allow users to access their data through a single point of entry and run multiple fit-for-purpose query engines across IT environments. Through workload optimization, an organization can reduce data warehouse costs by up to 50 percent by augmenting its existing warehouse with this solution. It also offers built-in governance, automation and integrations with an organization’s existing databases and tools to simplify setup and user experience.
Supporting the data management life cycle
According to IDC’s Global StorageSphere, enterprise data stored in data centers will grow at a compound annual growth rate of 30% between 2021 and 2026. Increased data volumes bring more data silos, higher operational costs, and greater regulatory pressure, which can lead to greater scrutiny and demand for improved business outcomes from data, analytics and AI investments.
This proliferation of data spans every industry, and organizations have an opportunity to turn it into actionable insights that can inform revenue strategies and enhance operational efficiencies.
“The media and entertainment industry has undergone a significant digital transformation, with viewers consuming content across different devices and platforms,” said Vitaly Tsivin, EVP Business Intelligence at AMC Networks. “Watsonx.data could allow us to easily access and analyze our expansive, distributed data to help extract actionable insights and maximize our resource utilization to deliver superior user experiences for viewers of AMC Networks’ curated, high-quality content.”
Notably, watsonx.data runs both on-premises and across multicloud environments. The solution will help businesses harness their increasingly siloed data and apply advanced AI and analytics to derive actionable insights, all while supporting robust data governance and observability throughout the data management life cycle.
Strong partnerships for even stronger solutions
Watsonx.data is engineered to use Intel’s built-in accelerators on Intel’s new 4th Gen Xeon Scalable Processors, open-source query engines such as Presto and Spark, and the Velox acceleration library to deliver rapid and reliable data processing for high-performance SQL querying, reporting, business intelligence, and machine learning.
“We recognize the importance of watsonx.data and the development of the open-source components that it’s built upon,” said Das Kamhout, VP and Senior Principal Engineer of the Cloud and Enterprise Solutions Group at Intel. “We look forward to partnering with IBM to optimize the watsonx.data stack, achieving breakthrough performance through our joint technological contributions to the Presto open-source community.”
IBM and Intel have a long history of collaboration on data and AI products, including the optimization of IBM Db2 on Intel Xeon platforms, AI acceleration with IBM Watson NLP Library for Embed with OneAPI, and now watsonx.data.
Watsonx.data will allow users to modernize their data repositories with data warehouse-like capabilities, while benefiting from low-cost object storage and open data and table formats like Iceberg, to help them make data-driven decisions within minutes.
“Open data lakehouse architectures powered by the Apache Iceberg table format give organizations the flexibility to use fit-for-purpose analytical solutions to future-proof their data platforms for all workloads,” said Paul Codding, EVP of Product Management of Cloudera. “IBM and Cloudera customers will benefit from a truly open and interoperable hybrid data platform that fuels and accelerates the adoption of AI across an ever-increasing range of use cases and business processes.”
IBM and Cloudera have a long-standing strategic partnership that includes certified product integrations and joint sales and support models.
Watsonx.data will be available on premises and across multiple cloud providers, including IBM Cloud and Amazon Web Services (AWS). This builds on last year’s announcement that IBM is expanding its relationship with AWS to offer IBM software as a service on AWS. The solution will also be available in AWS Marketplace.
“Organizations are increasingly adopting data lakehouse solutions to support their growing data needs, especially as we see an industry-wide shift toward AI solutions,” said Soo Lee, Director Worldwide Strategic Alliances at AWS. “Making watsonx.data available as a service in AWS Marketplace further supports our customers’ increasing needs around hybrid cloud – giving them greater flexibility to run their business processes wherever they are, while providing choice of a wide range of AWS services and IBM cloud native software attuned to their unique requirements.”
Watsonx.data will extend IBM’s market leadership in data and AI, most recently demonstrated by its evaluation as a leader in The Forrester Wave: Data Management for Analytics, by integrating with existing IBM solutions like StepZen, Databand.ai, IBM Watson Knowledge Catalog, IBM zSystems, IBM Watson Studio, and IBM Cognos Analytics with Watson. These integrations enable watsonx.data users to implement various industry-leading data catalog, lineage, governance, and observability solutions across their data ecosystems.
Beyond launch, watsonx.data is expected to undergo continuous development, incorporating the latest performance enhancements to the Presto open-source query engine via Velox and through IBM’s recent acquisition of Ahana, the only SaaS for Presto and a strong contributor to the Presto open-source community. Further development of watsonx.data will also incorporate IBM’s Storage Fusion technology to enhance data caching across remote sources as well as semantic automation capabilities built on IBM Research’s foundation models to automate data discovery, exploration, and enrichment through conversational user experiences.
Statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice and represent goals and objectives only.
Based on a comparison of published 2023 list prices, normalized for VPC hours, of watsonx.data against several major cloud data warehouse vendors. Savings may vary depending on configurations, workloads and vendors.
 IDC, Worldwide Global StorageSphere Forecast, 2022–2026: An Installed Base of 7.9ZB of Storage Capacity in 2021 Came at a Cost of $370 Billion — Is It Enough? (IDC Doc #US49051122, May 2022)