Dataplex, Google Cloud’s data governance service, has announced data profiling and AutoDQ (automatic data quality) in public preview.
What it does: Automates data profiling and data quality scans with flexible data models.
Key features: Data Scans in Dataplex are serverless, require zero data copy, and can be scheduled or triggered on demand by data consumers, producers, and governors. Data profile scan results are offered in the UI with rich insights, and the service recommends rules with passing thresholds for each data quality dimension.
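To make the profile-then-recommend idea concrete, here is a minimal plain-Python sketch. It does not call Dataplex: the `profile_column` and `recommend_rules` helpers and their threshold heuristics are hypothetical, and the dimension labels only mirror common data quality dimension names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ColumnProfile:
    """Summary statistics for one numeric column (hypothetical, simplified profile)."""
    name: str
    row_count: int
    null_ratio: float
    distinct_ratio: float
    min_value: Optional[float]
    max_value: Optional[float]


def profile_column(name: str, values: Sequence[Optional[float]]) -> ColumnProfile:
    """Compute basic profile statistics for a numeric column."""
    non_null = [v for v in values if v is not None]
    n = len(values)
    return ColumnProfile(
        name=name,
        row_count=n,
        null_ratio=(n - len(non_null)) / n if n else 0.0,
        distinct_ratio=len(set(non_null)) / len(non_null) if non_null else 0.0,
        min_value=min(non_null) if non_null else None,
        max_value=max(non_null) if non_null else None,
    )


def recommend_rules(profile: ColumnProfile) -> List[dict]:
    """Turn a profile into candidate rules with passing thresholds.

    These heuristics are illustrative assumptions, not Dataplex's logic.
    """
    # Completeness: require at least the observed non-null fraction.
    rules = [{
        "dimension": "COMPLETENESS",
        "rule": "non_null",
        "threshold": round(1.0 - profile.null_ratio, 2),
    }]
    # Uniqueness: suggest only if every observed non-null value was distinct.
    if profile.distinct_ratio == 1.0:
        rules.append({"dimension": "UNIQUENESS", "rule": "unique", "threshold": 1.0})
    # Validity: values should stay inside the observed range.
    if profile.min_value is not None:
        rules.append({
            "dimension": "VALIDITY",
            "rule": f"between({profile.min_value}, {profile.max_value})",
            "threshold": 1.0,
        })
    return rules
```

For example, profiling `[1.0, 2.0, None, 4.0]` yields a 25% null ratio, so the sketch recommends a completeness rule with a 0.75 passing threshold alongside uniqueness and range rules.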
➝ AsyncEventsReceived measures the total number of events successfully queued for processing.
➝ AsyncEventAge measures the time between an event being successfully queued and the function being invoked.
➝ AsyncEventsDropped measures the number of events dropped without successful execution.
Azure VMware Solution (AVS), which offers private clouds powered by VMware vSphere clusters on bare-metal Azure infrastructure, has brought four features to general availability:
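The relationship between the three metrics can be illustrated with a toy model that derives analogous values from an in-memory list of event records. This is a hedged sketch only: the `AsyncEvent` record shape and the `summarize` helper are hypothetical, and in practice these values are read from CloudWatch rather than computed by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AsyncEvent:
    """One asynchronous invocation request (hypothetical record shape)."""
    queued_ms: int             # when the event was successfully queued
    invoked_ms: Optional[int]  # when the function was invoked, or None if never
    dropped: bool              # True if discarded without successful execution


def summarize(events: List[AsyncEvent]) -> dict:
    """Derive values analogous to the three async-event metrics."""
    # Queue-to-invocation delay for every event that reached the function.
    ages = [e.invoked_ms - e.queued_ms for e in events if e.invoked_ms is not None]
    return {
        # Every record here was successfully queued, so all count as received.
        "AsyncEventsReceived": len(events),
        # Report the maximum age as one simple statistic.
        "AsyncEventAgeMaxMs": max(ages) if ages else 0,
        "AsyncEventsDropped": sum(1 for e in events if e.dropped),
    }
```

With three queued events where one waits 120 ms, one waits 30 ms, and one is never invoked and dropped, `summarize` reports 3 received, a maximum age of 120 ms, and 1 dropped.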
➝ Azure Log Analytics for AVS with prebuilt queries
➝ New Node SKUs powered by Intel Xeon and NVMe-based SSDs
➝ Customer Managed Keys with Azure Key Vault
➝ Azure NetApp Files volumes as file share for AVS
Also: Stretched clusters, which assure 99.99% uptime for critical applications through automatic failover, are entering preview.
➝ New Cold Start feature requires only 50 labeled and 50 unlabeled fraud events
➝ Enables continuous model retraining as the dataset grows
➝ MTA (Migration Toolkit for Applications) supports large-scale Java app modernization and migration projects by providing line-by-line recommendations for your source code.
➝ Azure’s contributions include rulesets that provide guidance on configuring data sources and on using Java Key Store and file systems
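MTA's real rulesets use their own format. As a rough, hypothetical illustration of how pattern-based rules can produce line-by-line recommendations, the sketch below scans Java source text with a few regular expressions. The rule IDs, patterns, and messages are invented for this example and are not taken from MTA or from Azure's contributed rulesets.

```python
import re
from dataclasses import dataclass
from typing import List, Pattern, Tuple


@dataclass
class Rule:
    rule_id: str
    pattern: Pattern[str]
    message: str


# Hypothetical rules in the spirit of the topics above; not actual MTA rules.
RULES = [
    Rule("jks-keystore", re.compile(r'KeyStore\.getInstance\(\s*"JKS"\s*\)'),
         "Consider loading keys from a managed secret store instead of a local JKS file."),
    Rule("local-file-path", re.compile(r'new\s+File\(\s*"/'),
         "Absolute local paths may not exist in the target environment; consider mounted or cloud storage."),
    Rule("jdbc-localhost", re.compile(r'"jdbc:[a-z]+://localhost'),
         "Hard-coded localhost data source; externalize the connection string in configuration."),
]


def scan(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, rule id, recommendation) for every rule match."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in RULES:
            if rule.pattern.search(line):
                findings.append((lineno, rule.rule_id, rule.message))
    return findings
```

Because each finding carries its line number, results can be reported next to the offending source line, which is the kind of output the toolkit description refers to.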
Must-read Analysis & Advice
Context: Chargeback is a Cloud FinOps process of mapping cloud consumption to internal consumers, which enables recovery of cloud service costs. It helps incentivize individual teams to consume cloud resources efficiently.
For BigQuery: Under flat-rate pricing, use the total slot-milliseconds consumed by a team’s queries to compute its chargeback.
For GKE: Join exported usage data with billing export data for every SKU, and charge back based on resource requests rather than resource limits or actual resource usage.
For Compute Engine: Enable committed use discount (CUD) sharing across all projects.
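The BigQuery and GKE allocation rules above both reduce to proportional splits, sketched here in plain Python. This is a minimal illustration under stated assumptions: each BigQuery job is already attributed to a team (for example via job labels) with its `total_slot_ms` value, GKE usage rows carry per-namespace requested amounts per SKU, and the function names and row shapes are hypothetical rather than the export schemas.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bigquery_chargeback(jobs: Iterable[Tuple[str, int]], monthly_cost: float) -> Dict[str, float]:
    """Split a flat-rate BigQuery bill by each team's share of slot-milliseconds.

    jobs: (team, total_slot_ms) pairs, one per query job.
    """
    slot_ms: Dict[str, int] = defaultdict(int)
    for team, ms in jobs:
        slot_ms[team] += ms
    total = sum(slot_ms.values())
    if total == 0:
        return {team: 0.0 for team in slot_ms}
    return {team: monthly_cost * ms / total for team, ms in slot_ms.items()}


def gke_chargeback(usage: List[dict], billing: List[dict]) -> Dict[str, float]:
    """Allocate each SKU's billed cost to namespaces by their share of resource requests.

    usage:   rows like {"namespace": ..., "sku_id": ..., "requested_amount": ...}
    billing: rows like {"sku_id": ..., "cost": ...}
    """
    cost_by_sku: Dict[str, float] = defaultdict(float)
    for row in billing:
        cost_by_sku[row["sku_id"]] += row["cost"]

    requested_by_sku: Dict[str, float] = defaultdict(float)
    for row in usage:
        requested_by_sku[row["sku_id"]] += row["requested_amount"]

    # Join on SKU: each namespace pays its fraction of that SKU's requested capacity.
    charges: Dict[str, float] = defaultdict(float)
    for row in usage:
        total = requested_by_sku[row["sku_id"]]
        if total > 0:
            share = row["requested_amount"] / total
            charges[row["namespace"]] += share * cost_by_sku[row["sku_id"]]
    return dict(charges)
```

Charging by requests matches how capacity is reserved: the Kubernetes scheduler places pods according to their requests, so a team pays for what it reserves whether or not it uses it. In this simplified model, the full cost of each SKU is divided among the namespaces that requested it.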
A senior Azure consultant draws on lessons from a large-scale transformation for a professional sports organization.
Data flow implemented: Azure Data Factory writes to zones within the internal data lake. Databricks (Apache Spark) reads from these zones, transforms the data, and writes the resulting DataFrames to an external-facing data lake for analytics.
What was nice: Spark combined with Databricks Auto Loader made it possible to use the same codebase for both real-time and batch processing.
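On Databricks, incremental file discovery is handled by Auto Loader's `cloudFiles` source. The sketch below does not use Spark; it is a hypothetical plain-Python analog showing why one codebase can serve both modes: a single shared `transform` function, plus a toy ingestor (`IncrementalIngestor`, invented here) that remembers processed files so each run only picks up new ones. The record fields are made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set


def transform(records: Iterable[dict]) -> List[dict]:
    """Shared business logic, identical for batch and streaming runs."""
    return [
        {**r, "points_per_game": r["points"] / r["games"]}
        for r in records
        if r.get("games")  # skip rows with missing or zero games
    ]


class IncrementalIngestor:
    """Toy stand-in for incremental file discovery (not the Auto Loader API)."""

    def __init__(self, reader: Callable[[str], List[dict]]):
        self.reader = reader
        self.seen: Set[str] = set()  # files already processed

    def run_once(self, files: Iterable[str]) -> List[dict]:
        """Process only files not seen before, then apply the shared transform."""
        new_files = [f for f in sorted(files) if f not in self.seen]
        self.seen.update(new_files)
        records = [rec for f in new_files for rec in self.reader(f)]
        return transform(records)
```

A batch job is a single `run_once` call over every file, while a streaming job calls `run_once` repeatedly as files arrive; both paths end in the same `transform`, so the results match.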
What is required to add an analytics engine to an existing feature of a high-use application?
The product: Tax Insights, part of the Shopify Tax feature, informs retailers in a timely fashion of their tax liabilities across various states, which helps improve compliance.
What was needed: Creating new data models, modifying existing ones, building new functionality to handle dynamically changing data, and publishing results in a key-value store for end-user consumption.
Tip: The article details how the engineering team approached planning, data gathering, prototyping data models, productionizing jobs, and publishing insights.
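The article describes the actual pipeline; as a minimal plain-Python sketch of the final step only (aggregating results and publishing them to a key-value store for fast end-user reads), consider the following. The order schema, the `tax_insights:<merchant_id>` key format, and the use of a dict in place of a real key-value store are all assumptions for illustration.

```python
import json
from collections import defaultdict
from typing import Dict, List


def build_insights(orders: List[dict]) -> Dict[str, Dict[str, float]]:
    """Aggregate order amounts per merchant and state (hypothetical schema)."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for o in orders:
        totals[o["merchant_id"]][o["state"]] += o["amount"]
    return {merchant: dict(states) for merchant, states in totals.items()}


def publish(insights: Dict[str, Dict[str, float]], store: Dict[str, str]) -> None:
    """Write one key per merchant so the app can serve insights with a single lookup."""
    for merchant_id, by_state in insights.items():
        store[f"tax_insights:{merchant_id}"] = json.dumps(by_state, sort_keys=True)
```

Precomputing one value per merchant keeps the read path to a single key lookup, which suits a high-use application where the heavy aggregation runs offline.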
➝ While composability is driven by API-first solution design, it is primarily concerned with enabling interfaces and executing functions for enterprise business objects.
➝ See how composability drives domain ownership, abstracts technical interfaces from proprietary applications, and mitigates duplication of effort.
What is Conway’s Law: It states that the architecture of a software system tends to reflect the organizational structure of the teams involved. The conclusion commonly drawn from this is that team reorganization is the only effective way forward.
Key takeaways: Leadership should be clear on what to achieve and how to achieve it, including what the core building blocks are; these are the guardrails that should be in place. The discussion also offered another useful test: if you understand your role in the larger capability being built, the dependencies you are taking on, and so on, the impact of Conway’s Law can be ignored.
Azure NDm A100 v4-series training benchmarks for the T5 model with NVIDIA JAX show a scaling efficiency of 84% at 16 nodes (128 GPUs) for the Large T5 model and 82% for the XL T5 model.
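Scaling efficiency is commonly computed as measured throughput divided by ideal linear throughput, i.e. baseline throughput multiplied by the increase in node count. The benchmark may define its baseline differently, so treat this as a hedged sketch of the usual formula; the throughput figures in the example are hypothetical, chosen only to show what 84% means.

```python
def scaling_efficiency(base_throughput: float, base_nodes: int,
                       throughput: float, nodes: int) -> float:
    """Measured throughput divided by ideal linear scaling from a baseline run."""
    ideal = base_throughput * nodes / base_nodes
    return throughput / ideal


# Hypothetical numbers: 1,000 samples/s on 1 node, 13,440 samples/s on 16 nodes.
print(f"{scaling_efficiency(1000.0, 1, 13440.0, 16):.0%}")  # prints "84%"
```

Read the other way, 84% efficiency at 16 nodes means the cluster delivers about 13.4 times the single-node throughput rather than the ideal 16 times.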
Tech layoffs are not the end of the world. Non-tech companies are advertising significantly more tech jobs.
Amazon EC2 now lets you automatically roll back the changes made by the Auto Scaling instance refresh feature when a refresh fails.
Google Cloud has added multi-architecture support to Cloud Run, fixing an issue that prevented deploying multi-architecture container images to the service.