Data Observability Series, Part 9: Defining Data Observability

After Google Trends, it is time to get to know Data Observability through its definitions!

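Introduction

Hi everyone, I'm the Captain (艦長).

In the previous article we looked at Gartner's Hype Cycle and saw clearly that the keyword "Data Observability" is getting attention.

In this article we do some "term definition" work: we compare definitions from different sources to see what Data Observability is.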

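Definitions of Data Observability

Below I have collected definitions from several different sources:

  1. Monte Carlo Data

    As mentioned in the previous article, Monte Carlo Data is one of the main vendors promoting Data Observability, so naturally it has its own definition of the term.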

    “Data observability provides full visibility into the health of your data and systems so you are the first to know when the data is wrong, what broke, and how to fix it.”

    (Source of definition: https://www.montecarlodata.com/blog-what-is-data-observability/)

  2. IBM

    Also mentioned in the previous article: IBM, the global IT consulting and solutions giant, has not passed up Data Observability either. Its definition is as follows.

    “Data observability refers to the practice of monitoring, managing and maintaining data in a way that ensures its quality, availability and reliability across various processes, systems and pipelines within an organization.”

    (Source of definition: https://www.ibm.com/think/topics/data-observability)

  3. Dynatrace

    Dynatrace, a cloud vendor focused on Observability, also offers its view of Data Observability.

    “Data observability is a discipline that aims to address the needs of organizations to ensure data availability, reliability, and quality throughout the data lifecycle—from ingestion to analytics and automation. Ensuring data trustworthiness and security can pose significant hurdles for organizations that rely on data to inform business and product strategies, optimize and automate processes, and drive continuous improvements.”

    (Source of definition: https://www.dynatrace.com/knowledge-base/data-observability/)

  4. Splunk

    Splunk, another long-established vendor in the data world, has its own definition too.

    “Data observability is the term for your ability to fully understand, monitor, and manage the quality, reliability, and performance of data across various data pipelines. Observability provides a transparent view of data flows to ensure their accuracy and validity.

    Comprehensively speaking, data observability is a proactive approach to data management that allows businesses to gain insights and recognize issues in their data ecosystem. This can be done in real-time, contributing to enhanced decision-making.

    Data observability enables the ability to inspect, diagnose, and rectify data inconsistencies within an organization’s information system. Data observability also acts as a cornerstone for boosting business intelligence.”

    (Source of definition: https://www.splunk.com/en_us/blog/learn/data-observability.html)

  5. Azure

    As the previous article also mentioned, Azure, one of the big three clouds, has not passed up Data Observability either.

    “Data observability is your ability to understand the health of your data and data systems by collecting and correlating events across areas like data, storage, compute and processing pipelines.

    Building and operating a resilient, scalable, and performant data platform requires adopting proven DevOps-inspired processes across teams that represent functional domains. Data observability enables business owners, DevOps engineers, data architects, data engineers, and site reliability engineers to automate issue detection, prediction, and prevention, and to avoid downtime that can break production analytics and AI.”

    (Source of definition: https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/scenarios/cloud-scale-analytics/manage-observability)

  6. Datadog

    Datadog, the well-known cloud vendor for Monitoring and Metrics that now positions itself as an Observability service, has joined the fray as well.

    “Data Observability helps data teams detect, resolve, and prevent issues that impact data quality, performance, and cost. It enables teams to monitor anomalies, troubleshoot faster, and maintain trust in the data powering downstream systems.”

    (Source of definition: https://docs.datadoghq.com/data_observability/)

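  7. The O'Reilly book "Fundamentals of Data Observability"

    The author considers Gartner's definition consistent with the book's own, so the book quotes Gartner's definition directly. Since Gartner's definition is the very next item, I will not paste it twice; here is only the sentence the book uses to introduce it.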

    “It is worth noting that Gartner has defined data observability as the following, which aligns well with our definition:”

  8. Gartner

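    Continuing from the previous item: Gartner, the well-known consulting firm, of course has its own definition. Without paying, you generally cannot read Gartner's full reports. However, as noted in the previous item, "Fundamentals of Data Observability" quotes Gartner's definition, and you can also find other publicly searchable articles that quote it.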

    “Data observability is the ability of an organization to have a broad visibility of its data landscape and multilayer data dependencies (like data pipelines, data infrastructure, data applications) at all times with an objective to identify, control, prevent, escalate and remediate data outages rapidly within expectable SLAs.

    Data observability uses continuous multilayer signal collection, consolidation and analysis to achieve its goals as well as to inform and recommend better design for superior performance and better governance to match business goals.”

    In addition, Gartner has a separate definition for Data Observability tools.

    “Gartner defines data observability tools as software applications that enable organizations to understand the state and health of their data, data pipelines, data landscapes, data infrastructures, and the financial operational cost of the data across distributed environments. This is accomplished by continuously monitoring, tracking, alerting, analyzing and troubleshooting data workflows to reduce problems and prevent data errors or system downtime. The tools also provide impact analysis, solution recommendation, collaboration and incidence management. They go beyond traditional network or application monitoring by enabling users to observe changes, discover unknowns and take appropriate actions with goals to prevent firefighting and business interruption.”

    (Source of definition: https://www.gartner.com/reviews/market/data-observability-tools)

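What the Definitions Have in Common

Looking at the definitions above, what do you think they have in common? After reading them, I came away with three observations:

  1. The core goal of Data Observability is to ensure that data is trustworthy (available, reliable, and of good quality).
  2. Data Observability means an organization proactively monitors and manages its data according to its commercial and business needs.
  3. Data Observability means an organization has a more complete grasp of its data, data pipelines, and data infrastructure: it collects multilayer information across the data lifecycle and uses it to observe data more broadly and comprehensively.

In short, Data Observability means an organization adds "observability" on top of its data, so that it can manage, and then use, that data proactively and on a well-grounded basis (business needs, resources, and cost).

(Source: as told by Anna, General Manager of 炬識科技 Athemaster.)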
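To make the idea of "collecting multilayer information" and checking it against business expectations a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how any of the vendors above implement Data Observability: the three signals (row count, staleness, null rate), the `Expectation` thresholds, and the `loaded_at` and `amount` fields are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Expectation:
    """Hypothetical thresholds a data team might derive from business needs."""
    max_staleness: timedelta   # how old the newest row may be
    min_rows: int              # minimum acceptable row count
    max_null_rate: float       # maximum share of missing values in `column`
    column: str                # column whose completeness is checked


def collect_signals(rows, column, now):
    """Collect three simple signals from a batch of rows (a list of dicts)."""
    row_count = len(rows)
    # `loaded_at` is an assumed per-row load timestamp.
    newest = max((r["loaded_at"] for r in rows), default=None)
    nulls = sum(1 for r in rows if r.get(column) is None)
    return {
        "row_count": row_count,
        "staleness": (now - newest) if newest is not None else None,
        "null_rate": (nulls / row_count) if row_count else None,
    }


def evaluate(signals, exp):
    """Return a list of issues; an empty list means no expectation was violated."""
    issues = []
    if signals["row_count"] < exp.min_rows:
        issues.append("volume: too few rows")
    if signals["staleness"] is None or signals["staleness"] > exp.max_staleness:
        issues.append("freshness: data is stale or missing")
    if signals["null_rate"] is not None and signals["null_rate"] > exp.max_null_rate:
        issues.append(f"quality: too many nulls in {exp.column}")
    return issues


# Made-up sample batch: the newest row is 26 hours old and one amount is missing.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "amount": 10.0, "loaded_at": now - timedelta(hours=30)},
    {"order_id": 2, "amount": None, "loaded_at": now - timedelta(hours=26)},
]
exp = Expectation(
    max_staleness=timedelta(hours=24),
    min_rows=1,
    max_null_rate=0.1,
    column="amount",
)
signals = collect_signals(rows, "amount", now)
issues = evaluate(signals, exp)
print(issues)
```

With this sample batch, the sketch reports a freshness issue and a quality issue, while volume passes. The thresholds live in `Expectation` so that they can be set from business needs rather than hard-coded, which mirrors the second observation above.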

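Summary

In this article I tried to get to know Data Observability by comparing definitions from several sources. Commercial companies do compete to some degree over the "definition" of a hot keyword, and may even embellish it, but the general direction does not change, and it is rare for vendor B to put out a definition that is the complete opposite of, or contradicts, vendor A's.

At this point in the series, we have finally started to get to know Data Observability.

As earlier articles in the series put it, let's add "Observability" on top of Data to improve "information transparency" and "timeliness", so that the Data Team gains more ability to "proactively observe" and understand the Product it cares about most: Data.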

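Series Links

This series is still being written; links will be updated as articles are published.

  1. Data Is Not Just a Technical Problem
  2. What Is the Software World Paying Attention To?
  3. What Do Data Users Care About Most?
  4. Data in the Eyes of the Software World and the Data World
  5. How Do You Get a Glass of Clean Water (Data)?
  6. What Have We Done for Data Quality in the Past?
  7. With Monitoring in Place, Why Is It Still Not Enough?
  8. What Is Data Observability? First, a Look at Keyword Trends
  9. Defining Data Observability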

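When reposting this article, modification and commercial use are prohibited, and you must credit the original author Cheng Wei Chen of 「艦長,你有事嗎?」 and include a link to the original article.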
