NPW Insights (Free): Week 2/4 for Solution Architect

Gregor Hohpe on cost impact of loose coupling, scaling media ML at Netflix, caching for Azure Container Registry, new services in AWS Step Functions, EKS Anywhere on Snow.

NPW Research

Top News

New Google Cloud pricing models to boost flexibility

What’s new: Flex Agreements, which give access to monthly spend discounts, CUDs, cloud credits, and professional services without upfront commitments. Standard, Enterprise, and Enterprise Plus pricing tiers will offer flexibility to choose features and functionality across the portfolio.
What’s changed: The Cloud Spanner free trial is extended to 90 days, and BigQuery autoscaling now works at a finer granularity.
Bottom line: Flex Agreements will bring pricing incentives based on monthly spend, and the new pricing tiers will offer feature sets tailored to business needs and your stage of cloud adoption.

2022 VOID report calls for mindset shift in incident reporting

Report finding: The Verica Open Incident Database, which assembles all publicly available security incident reports, offers several key insights. The duration of an incident has no correlation with its severity. Duration-based metrics like MTTR are therefore highly variable and offer little to no insight into system reliability.
Implications: Shallow metrics like MTTR and incident count should be used only as a starting point for understanding complex systems. Organizations should adopt new forms of incident reporting that reveal the costs of coordination and of response teams, such as socio-technical incident data, post-incident review data, and near-misses.
Read the detailed conversation with Courtney Nash, Safety Systems Analyst, who explains how organizations can go about adopting these new forms of reporting.
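
To illustrate why a mean-based metric like MTTR can mislead, here is a minimal Python sketch. The durations are made-up values for illustration only, not figures from the VOID report, and the `summarize` helper is a hypothetical name. It shows how a single long incident can dominate the mean while the median barely moves.

```python
import statistics

# Hypothetical incident durations in minutes (illustrative values only,
# not taken from the VOID report).
durations_min = [12, 18, 25, 30, 41, 55, 70, 95, 240, 1440]


def summarize(durations):
    """Return the mean (the 'MTTR'), the median, and the mean without the longest incident."""
    ordered = sorted(durations)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # Dropping just the single longest incident shows how sensitive the mean is.
        "mean_without_longest": statistics.mean(ordered[:-1]),
    }


if __name__ == "__main__":
    for name, value in summarize(durations_min).items():
        print(f"{name}: {value:.1f} min")
```

In this toy data set, one day-long incident accounts for most of the mean, which is the kind of variability the report warns about.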

Azure SQL GA updates for mid-February

Optimized locking for lower lock memory and improved DB concurrency.
Automatic key rotation for CMKs in SQL Database and SQL Managed Instance.
Max size configuration for TempDB, which persists across server restarts.

Azure Cache for Redis Premium tier now offers enhanced passive geo-replication

New metrics to track health of geo-replication link.
Single-click failover between geo-primary and geo-replica caches.
Global cache URL with automatic updating of DNS records after geo-failovers.

Must-read Analysis & Advice

How loose coupling affects your cloud bills

Gregor Hohpe, former Technical Director at Google and AWS Sr. Principal Evangelist, digs deep into the design and cost implications of decoupling in the cloud.
Starting point: Design-time and runtime decoupling come at a cost in the cloud and incur latency.
Observations: Decoupling makes cloud costs visible, and that’s a good thing because then the costs can be optimized. In his example scenario, 50ms latency reduction comes at a fixed cost, which makes the tradeoff clear.
Instead of defining service boundaries by their natural duties, consider the intent of the service – use pattern diagrams to define the topology.
In some cases, decoupling may not require an event broker – in the example scenario, replacing the event broker with SNS significantly reduces the cost of the solution.
Conclusion: Decoupling can actually reduce your cloud costs.

Mitigating DDoS attacks with Azure Front Door

The CDN service can redistribute both encrypted and unencrypted DDoS traffic away from source systems during an attack, and layer 3, 4, and 7 DDoS protection is included with AFD.
Key takeaways: Integrate Azure Web Application Firewall with AFD, and use rate limiting, bot protection rulesets, custom rules, and geo-filtering to block suspicious traffic. If internet-facing Azure resources don’t use AFD, use the Azure DDoS Protection product. Connect source systems to AFD via Private Link.
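
As a rough illustration of the rate-limiting idea mentioned above, here is a minimal Python sketch of a per-client sliding-window limiter. It is not how Azure WAF or AFD implements rate limiting (that is configured as WAF rules, not written as application code); the class name, thresholds, and IP addresses are assumptions chosen for the example.

```python
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of allowed requests

    def allow(self, client_ip, now):
        """Return True if the request at time `now` (in seconds) should be served."""
        hits = self._hits[client_ip]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: block or challenge this request
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=60)
    for t in (0, 1, 2, 3, 61):
        print(t, limiter.allow("203.0.113.7", now=t))
```

Timestamps are passed in explicitly so the behavior is easy to reason about and test; a real deployment would rely on the managed WAF rules rather than code like this.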

Lessons from scaling media machine learning at Netflix

The post covers challenges and solutions in scaling machine learning for video, audio, and image media, exemplified by the Match Cutting feature, a video editing technique that picks shots with similar visual framing, composition, or action to create a connection between two scenes.
Key challenges: Expensive media feature computation, lack of standardization in input files, triggering model runs on new assets in the pipeline, and wasted repeat computation.
The solution: Standardizing video encodes across the catalog, shot segmentation (feature served by infrastructure teams), storing feature values for reuse, serving high-scoring pairs to video editors through media search platform.
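
The "storing feature values for reuse" point can be sketched as a compute-once lookup keyed by the content of a shot. This is a minimal Python illustration under stated assumptions, not Netflix's implementation: `FeatureStore`, `get_or_compute`, and `toy_embedding` are hypothetical names, and the in-memory dictionary stands in for whatever persistent store a real pipeline would use.

```python
import hashlib


class FeatureStore:
    """In-memory stand-in for a persistent feature store (hypothetical)."""

    def __init__(self):
        self._values = {}
        self.compute_calls = 0  # counts how often the expensive function actually ran

    def get_or_compute(self, shot_bytes, feature_name, compute_fn):
        """Return the stored feature for this shot, computing it only on first request."""
        # Key on a content hash so identical shots share one stored value.
        key = (hashlib.sha256(shot_bytes).hexdigest(), feature_name)
        if key not in self._values:
            self.compute_calls += 1
            self._values[key] = compute_fn(shot_bytes)
        return self._values[key]


def toy_embedding(shot_bytes):
    """Placeholder for an expensive model; returns a tiny fake feature vector."""
    return [b / 255 for b in shot_bytes[:4]]


if __name__ == "__main__":
    store = FeatureStore()
    store.get_or_compute(b"shot-001", "embedding", toy_embedding)
    store.get_or_compute(b"shot-001", "embedding", toy_embedding)  # reused, not recomputed
    print("expensive computations run:", store.compute_calls)
```

Keying on the feature name as well as the content hash lets several models store different features for the same shot without collisions.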

Threat modeling of cloud services by cybersecurity expert

Ken Wolstencroft walks through a threat model built with the STRIDE framework, using the Google Cloud Storage service as an example.
STRIDE Framework: STRIDE stands for Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service and Elevation of Privilege.
What’s inside: He maps all potential weaknesses of Storage Service, provides a list of potential threats with STRIDE threat categorization, and 15 specific steps for threat mitigation. It’s an exhaustive threat model that details all possible threats by mapping threat actors (internal and external) and their attack goals, to Google Cloud Storage service features.
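
One way to picture the STRIDE categorization described above is as a small data structure that tags each threat with a category and an actor. The Python sketch below is only an illustration: the three example threats are hypothetical and are not taken from Wolstencroft's model, and the names `Threat`, `group_by_category`, and `uncovered_categories` are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information Disclosure"
    DENIAL_OF_SERVICE = "Denial of Service"
    ELEVATION_OF_PRIVILEGE = "Elevation of Privilege"


@dataclass(frozen=True)
class Threat:
    description: str
    category: Stride
    actor: str  # "internal" or "external"


# Hypothetical examples for a storage service; not from the original threat model.
EXAMPLE_THREATS = [
    Threat("Stolen service account key used to read objects", Stride.SPOOFING, "external"),
    Threat("Object overwritten without versioning enabled", Stride.TAMPERING, "internal"),
    Threat("Bucket made publicly readable by mistake", Stride.INFORMATION_DISCLOSURE, "internal"),
]


def group_by_category(threats):
    """Group threats under their STRIDE category."""
    grouped = defaultdict(list)
    for threat in threats:
        grouped[threat.category].append(threat)
    return dict(grouped)


def uncovered_categories(threats):
    """List STRIDE categories that have no threat recorded yet."""
    covered = {threat.category for threat in threats}
    return [category for category in Stride if category not in covered]


if __name__ == "__main__":
    for category in uncovered_categories(EXAMPLE_THREATS):
        print("No threats recorded yet for:", category.value)
```

Listing the uncovered categories is a simple completeness check: a threat model that aims to be exhaustive would expect entries under all six.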

How to right-size microservices: Lee Atchison

Making services too small simplifies the code but increases complexity for system architects.
Finding the right size entails trial and error, because it depends on the application and the organization.
Small services are better for less mature development teams, but require a mature service infrastructure and application architecture team.

Other Updates

Caching for Azure Container Registry, which lets users cache container images from Microsoft Artifact Registry and Docker Hub, enters public preview.

AWS Step Functions adds support for 35 more AWS services, including Amazon EMR Serverless, AWS Clean Rooms, IoT Fleetwise, and IoT Roborunner.

Virtualization-based security enclaves, which offer the data protection features of Always Encrypted in Azure SQL Database independent of the underlying hardware, are now in public preview.

AWS Incident Detection and Response now lets you ingest events from New Relic via Amazon EventBridge through a new integration.

EKS Anywhere on Snow, which lets you create and operate Kubernetes clusters on AWS Snow Family devices, is now generally available.

Serverless for Hyperscale in Azure SQL Database, which scales both compute and storage automatically based on workload demand for databases requiring up to 80 vCores and 100TB, is now in public preview.

Where was most of the action last week with AWS, Azure, and Google Cloud? Which products from these CSPs got the most attention? Which cloud topics generated the most interest? Based on usage analysis of our 12,000+ subscribers among software engineers, DevOps engineers, and solution architects.