AWS Topics for the Certified Solutions Architect
AWS Config conformance packs
AWS Config is a service that provides details on how your AWS resources are configured, how they relate to each other, and how they were configured in the past.
Features
- Specify the resource types you want AWS Config to record.
- S3 bucket to receive a configuration snapshot
- SNS to send configuration stream notifications
- Rules that you want AWS Config to use to evaluate compliance information
- Conformance packs, or a collection of AWS Config rules and remediation actions
- Aggregator to get a centralized view of your resource inventory and compliance - collects AWS Config configuration and compliance data from multiple AWS accounts and AWS Regions into a single account and Region.
- Write advanced queries by referring to the configuration schema of the AWS resource.
https://docs.aws.amazon.com/config/latest/developerguide/WhatIsConfig.html
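The advanced-query feature mentioned above takes a SQL-like expression written against the configuration schema of a resource type. A minimal sketch in Python (the field names follow the `AWS::EC2::Instance` schema; the instance type is illustrative, and the boto3 call is shown only in a comment):

```python
# Build an AWS Config advanced query (SQL-like syntax over the
# configuration schema of a resource type).
def ec2_instance_query(instance_type: str) -> str:
    """Return an advanced query selecting ID, AZ, and state of EC2
    instances of the given type."""
    return (
        "SELECT resourceId, availabilityZone, configuration.state.name "
        "WHERE resourceType = 'AWS::EC2::Instance' "
        f"AND configuration.instanceType = '{instance_type}'"
    )

# With boto3 (not imported here), the query would run as:
#   boto3.client("config").select_resource_config(
#       Expression=ec2_instance_query("t3.micro"))
```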
Ways to Use AWS Config
- notify you whenever resources are created, modified, or deleted
- evaluate the configuration settings of your AWS resources for compliance to rules and get notified of violations
- Auditing and Compliance with internal policies and best practices
- View how the resource you intend to modify is related to other resources and assess the impact of your change.
- Use the historical configurations of your resources provided by AWS Config to troubleshoot issues and to access the last known good configuration of a problem resource.
- Security Analysis - IAM policy at specific point in time, EC2 security groups configurations, etc
Conformance pack
A collection of AWS Config rules and remediation actions that can be deployed as a single entity in an account and a Region or across an organization in AWS Organizations.
Formats:
- A YAML template that contains the list of AWS Config managed or custom rules and remediation actions.
- Remediation actions are defined as AWS Systems Manager documents (SSM documents).
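As a sketch, a minimal conformance pack template containing a single AWS managed rule might look like the following (the logical resource name is illustrative; remediation actions would be added as `AWS::Config::RemediationConfiguration` resources):

```yaml
# Minimal conformance pack template (sketch): one AWS managed rule.
Resources:
  S3BucketVersioningEnabled:
    Type: AWS::Config::ConfigRule
    Properties:
      ConfigRuleName: s3-bucket-versioning-enabled
      Source:
        Owner: AWS
        SourceIdentifier: S3_BUCKET_VERSIONING_ENABLED
```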
AWS Security Hub
- Automates security best practice checks, aggregates security alerts into a single place and format, enables automated remediation, and helps you understand your overall security posture across all of your AWS accounts.
- Automated checks based on a collection of security controls curated by experts
- support for common frameworks like CIS, PCI DSS, and more.
- Integrates findings from other services like Config, Firewall Manager, etc
- Conduct Cloud Security Posture Management (CSPM)
- Security Orchestration, Automation, and Response (SOAR) workflows
- Integration with EventBridge.
- data ingestion into your Security Information and Event Management (SIEM), ticketing, and other tools by consolidating the integrations between AWS services and your downstream tooling and by normalizing your findings.
- Visualize your security findings to discover new insights
- Search, correlate, aggregate, and fine-tune diverse security findings by account and resource, and visualize findings in the Security Hub dashboard.
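The searching and fine-tuning above is driven by filters over the AWS Security Finding Format. A sketch of a filter for active, high-severity findings; the helper only builds the `Filters` payload, and the boto3 call is shown in a comment:

```python
# Build a Security Hub GetFindings filter for active, high-severity
# findings. Field names come from the AWS Security Finding Format.
def high_severity_filter() -> dict:
    return {
        "SeverityLabel": [{"Value": "HIGH", "Comparison": "EQUALS"}],
        "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
    }

# With boto3:
#   boto3.client("securityhub").get_findings(Filters=high_severity_filter())
```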
AWS Managed Microsoft AD
- Run Microsoft Active Directory (AD) as a managed service.
- Highly available pair of domain controllers connected to your virtual private cloud (VPC), run in different Availability Zones in a Region of your choice.
- Run directory-aware workloads in the AWS Cloud, including Microsoft SharePoint and custom .NET and SQL Server-based applications.
- Configure a trust relationship between AWS Managed Microsoft AD in the AWS Cloud and your existing on-premises Microsoft Active Directory, providing users and groups with access to resources in either domain, using AWS IAM Identity Center.
- Connect your AWS resources with an existing on-premises Microsoft Active Directory.
- Manage users and groups
- Provide single sign-on to applications and services
- Create and apply group policy
- Enable multi-factor authentication by integrating with your existing RADIUS-based MFA infrastructure to provide an additional layer of security when users access AWS applications
- Securely connect to Amazon EC2 Linux and Windows instances
Amazon S3 Event Notifications
Receive notifications when certain events happen in your S3 bucket.
The configuration is stored in the notification subresource that's associated with a bucket.
Publish notifications for the following events:
- New object created events
- Object removal events
- Restore object events
- Reduced Redundancy Storage (RRS) object lost events
- Replication events
- S3 Lifecycle expiration events
- S3 Lifecycle transition events
- S3 Intelligent-Tiering automatic archival events
- Object tagging events
- Object ACL PUT events
Supported destinations:
- Amazon Simple Notification Service (Amazon SNS) topics
- Amazon Simple Queue Service (Amazon SQS) queues
- AWS Lambda function
- Amazon EventBridge
For more information, see Supported event destinations.
Amazon SQS FIFO (First-In-First-Out) queues aren't supported as an event destination.
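A sketch of routing `ObjectCreated` events under a key prefix to a standard SQS queue; the helper only builds the `NotificationConfiguration` payload (the queue ARN and bucket name are placeholders), and the boto3 call is shown in a comment:

```python
# Build an S3 notification configuration that routes object-created
# events under `prefix` to a standard SQS queue (FIFO queues are not
# supported as S3 event destinations).
def sqs_notification_config(queue_arn: str, prefix: str = "logs/") -> dict:
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }

# With boto3:
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="my-bucket",
#       NotificationConfiguration=sqs_notification_config(queue_arn))
```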
AWS Glue
- Data integration service
- Discover, prepare, move, and integrate data from multiple sources.
- Tooling for authoring, running jobs, and implementing business workflows.
- Discover and connect to more than 70 diverse data sources and manage your data in a centralized data catalog.
- Visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.
- Search and query cataloged data using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
- Run ETL, ELT, and streaming workloads in one service.
- Integrates with AWS analytics services and Amazon S3 data lakes.
- Discover and organize data
- Automatically discover data – Use AWS Glue crawlers to automatically infer schema information and integrate it into your AWS Glue Data Catalog.
- Manage schemas and permissions – Validate and control access to your databases and tables.
- Connect to a wide variety of data sources – Tap into multiple data sources, both on premises and on AWS, using AWS Glue connections to build your data lake.
- Transform, prepare, and clean data for analysis
- Build complex ETL pipelines with simple job scheduling – Invoke AWS Glue jobs on a schedule, on demand, or based on an event.
- Clean and transform streaming data in transit – Enable continuous data consumption, and clean and transform it in transit. This makes it available for analysis in seconds in your target data store.
- Deduplicate and cleanse data with built-in machine learning – Clean and prepare your data for analysis without becoming a machine learning expert by using the FindMatches feature. This feature deduplicates and finds records that are imperfect matches for each other.
- Built-in job notebooks – AWS Glue job notebooks provide serverless notebooks with minimal setup in AWS Glue so you can get started quickly.
- Edit, debug, and test ETL code – With AWS Glue interactive sessions, you can interactively explore and prepare data. You can explore, experiment on, and process data interactively using the IDE or notebook of your choice.
- Define, detect, and remediate sensitive data – AWS Glue sensitive data detection lets you define, identify, and process sensitive data in your data pipeline and in your data lake.
- Automatically scale based on workload – Dynamically scale resources up and down based on workload. This assigns workers to jobs only when needed.
- Automate jobs with event-based triggers – Start crawlers or AWS Glue jobs with event-based triggers, and design a chain of dependent jobs and crawlers.
- Run and monitor jobs – Run AWS Glue jobs with your choice of engine, Spark or Ray. Monitor them with automated monitoring tools, AWS Glue job run insights, and AWS CloudTrail. Improve your monitoring of Spark-backed jobs with the Apache Spark UI.
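The scheduled-trigger mechanism above can be sketched as the arguments to `glue.create_trigger` (the job name is a placeholder; the cron syntax is the same as in EventBridge/CloudWatch Events):

```python
# Build arguments for glue.create_trigger: run a Glue job nightly at
# 02:00 UTC. Job and trigger names here are illustrative.
def nightly_trigger(job_name: str) -> dict:
    return {
        "Name": f"{job_name}-nightly",
        "Type": "SCHEDULED",
        "Schedule": "cron(0 2 * * ? *)",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }

# With boto3:
#   boto3.client("glue").create_trigger(**nightly_trigger("sales-etl"))
```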
Amazon Athena
- interactive query service
- analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL.
- Point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.
- Run data analytics using Apache Spark
- Submit Spark code for processing and receive the results directly.
- Simplified notebook experience in Amazon Athena console to develop Apache Spark applications using Python or Athena notebook APIs.
- Athena SQL and Apache Spark on Amazon Athena are serverless.
- Athena scales automatically, running queries in parallel, so results are fast even with large datasets and complex queries.
- Athena helps you analyze unstructured, semi-structured, and structured data stored in Amazon S3. Examples include CSV, JSON, or columnar data formats such as Apache Parquet and Apache ORC. You can use Athena to run ad-hoc queries using ANSI SQL, without the need to aggregate or load the data into Athena.
- Athena integrates with Amazon QuickSight for easy data visualization. You can use Athena to generate reports or to explore data with business intelligence tools or SQL clients connected with a JDBC or an ODBC driver. For more information, see What is Amazon QuickSight in the Amazon QuickSight User Guide and Connecting to Amazon Athena with ODBC and JDBC drivers.
- Athena integrates with the AWS Glue Data Catalog, which offers a persistent metadata store for your data in Amazon S3. This allows you to create tables and query data in Athena based on a central metadata store available throughout your Amazon Web Services account and integrated with the ETL and data discovery features of AWS Glue. For more information, see Integration with AWS Glue and What is AWS Glue in the AWS Glue Developer Guide.
- Amazon Athena makes it easy to run interactive queries against data directly in Amazon S3 without having to format data or manage infrastructure. For example, Athena is useful if you want to run a quick query on web logs to troubleshoot a performance issue on your site. With Athena, you can get started fast: you just define a table for your data and start querying using standard SQL.
- You should use Amazon Athena if you want to run interactive ad hoc SQL queries against data on Amazon S3, without having to manage any infrastructure or clusters. Amazon Athena provides the easiest way to run ad hoc queries for data in Amazon S3 without the need to set up or manage any servers.
- For a list of AWS services that Athena leverages or integrates with, see AWS service integrations with Athena.
- Query services like Amazon Athena, data warehouses like Amazon Redshift, and sophisticated data processing frameworks like Amazon EMR all address different needs and use cases.
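An ad hoc Athena query boils down to three things: the SQL, the database (typically from the Glue Data Catalog), and an S3 location for results. A sketch of the arguments to `athena.start_query_execution` (the output location is a placeholder URI you would own):

```python
# Build arguments for athena.start_query_execution.
def athena_query_params(sql: str, database: str, output_s3: str) -> dict:
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

# With boto3:
#   boto3.client("athena").start_query_execution(
#       **athena_query_params("SELECT * FROM logs LIMIT 10",
#                             "default", "s3://my-results-bucket/athena/"))
```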
Amazon EMR
- Run Hadoop, Spark, and Presto
- Running SQL queries,
- Run a wide variety of scale-out data processing tasks for applications such as machine learning, graph analytics, data transformation, streaming data, and virtually anything you can code.
- Use custom code to process and analyze extremely large datasets with the latest big data processing frameworks such as Spark, Hadoop, Presto, or HBase.
- Full control over the configuration of your clusters and the software installed on them.
- Use Amazon Athena to query data that you process using Amazon EMR.
- If you use EMR and already have a Hive metastore, you can run your DDL statements on Amazon Athena and query your data immediately without affecting your Amazon EMR jobs.
Amazon Redshift
- A data warehouse
- Pull together data from many different sources – like inventory systems, financial systems, and retail sales systems – into a common format, and store it for long periods of time.
- If you need to build sophisticated business reports from historical data, then a data warehouse like Amazon Redshift is the best choice.
- The query engine in Amazon Redshift has been optimized to perform especially well on running complex queries that join large numbers of very large database tables. When you need to run queries against highly structured data with lots of joins across lots of very large tables, choose Amazon Redshift.
Apache Parquet and AWS Glue
- AWS Glue supports using the Parquet format.
- A performance-oriented, column-based data format.
- Use AWS Glue to read Parquet files from Amazon S3 and from streaming sources as well as write Parquet files to Amazon S3.
- read and write bzip and gzip archives containing Parquet files from S3.
- Configure compression behavior on the S3 connection parameters instead of in the configuration discussed on this page.
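A sketch of the keyword arguments a Glue (PySpark) job would pass to `glueContext.write_dynamic_frame.from_options` to emit Parquet to S3; as the note above says, compression is set on the S3 connection parameters rather than in a format option (the path is a placeholder):

```python
# Build the sink options for writing a Glue DynamicFrame to S3 as
# Parquet; compression goes on the S3 connection parameters.
def parquet_sink_options(path: str) -> dict:
    return {
        "connection_type": "s3",
        "connection_options": {"path": path, "compression": "snappy"},
        "format": "parquet",
    }

# Inside a Glue job (glueContext and frame exist in that runtime):
#   glueContext.write_dynamic_frame.from_options(
#       frame=frame, **parquet_sink_options("s3://my-lake/curated/"))
```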
AWS Global Accelerator
- improve the availability, performance, and security of your public applications.
- provides two global static public IPs that act as a fixed entry point to your application endpoints, such as Application Load Balancers, Network Load Balancers, Amazon Elastic Compute Cloud (EC2) instances, and elastic IPs.
- Use cases
- Use traffic dials to route traffic to the nearest Region or achieve fast failover across Regions.
- Accelerate API workloads by up to 60%, leveraging TCP termination at the edge.
- Global static IP
- Simplify allowlisting in enterprise firewalling and IoT use cases.
- Low-latency gaming and media workloads
- Use custom routing to deterministically route traffic to a fleet of EC2 instances.
Amazon Kinesis Data Streams
- serverless streaming data service that makes it easy to capture, process, and store data streams at any scale.
- Stream data from sources such as mobile and IoT devices into Kinesis Data Streams
- KDS then ingests and stores data streams for processing - clickstream, service logs, sensor data, in-app user events
- Use KDS with Lambda, Amazon Managed Service for Apache Flink, Spark on Amazon EMR, or EC2 to output into dashboards or real-time applications
- Great for real time analytics
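Producers write to a stream with `PutRecords`, choosing a partition key that controls shard placement and per-key ordering. A sketch of building that batch (the stream name and the `user_id` field are illustrative):

```python
import json

# Build arguments for kinesis.put_records: one record per event,
# using a per-event user ID (illustrative field) as the partition key
# so each user's events stay ordered within a shard.
def put_records_batch(stream: str, events: list[dict]) -> dict:
    return {
        "StreamName": stream,
        "Records": [
            {"Data": json.dumps(e).encode(), "PartitionKey": str(e["user_id"])}
            for e in events
        ],
    }

# With boto3:
#   boto3.client("kinesis").put_records(
#       **put_records_batch("clicks", [{"user_id": 1, "page": "/"}]))
```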
Amazon Kinesis Data Firehose
- Extract, transform, and load (ETL) service that reliably captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services
- Lambda functions to transform the data
- Ingest from the AWS SDK, AWS services, and Kinesis Data Streams
- Built-in transformations are supported
- Write to destinations such as Amazon S3, Amazon Redshift, Amazon OpenSearch Service, and custom HTTP endpoints
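The Lambda-transform path can be sketched as the `ExtendedS3DestinationConfiguration` passed to `firehose.create_delivery_stream`: records flow through the Lambda processor before landing in S3 (all ARNs below are placeholders):

```python
# Build an ExtendedS3DestinationConfiguration for
# firehose.create_delivery_stream: deliver to S3 with a Lambda
# data-transformation processor.
def firehose_s3_with_lambda(bucket_arn: str, role_arn: str,
                            lambda_arn: str) -> dict:
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "Lambda",
                    "Parameters": [
                        {"ParameterName": "LambdaArn",
                         "ParameterValue": lambda_arn}
                    ],
                }
            ],
        },
    }

# With boto3:
#   boto3.client("firehose").create_delivery_stream(
#       DeliveryStreamName="events-to-lake",
#       ExtendedS3DestinationConfiguration=firehose_s3_with_lambda(
#           bucket_arn, role_arn, lambda_arn))
```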
Amazon Redshift Spectrum
- query data directly from files on Amazon S3.
- you need an Amazon Redshift cluster and a SQL client that's connected to your cluster so that you can run SQL commands. The cluster and the data files in Amazon S3 must be in the same AWS Region.
- efficiently query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables.
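Spectrum is set up by mapping an external schema to a Glue Data Catalog database; after that, S3-backed tables are queried like local ones from a SQL client connected to the cluster. A sketch of building that DDL (the schema, database, and IAM role ARN are placeholders):

```python
# Build the CREATE EXTERNAL SCHEMA statement that maps a Glue Data
# Catalog database into Redshift so Spectrum can query S3 files in
# place. Run the returned SQL from a client connected to the cluster.
def spectrum_ddl(schema: str, glue_db: str, role_arn: str) -> str:
    return (
        f"CREATE EXTERNAL SCHEMA {schema} "
        f"FROM DATA CATALOG DATABASE '{glue_db}' "
        f"IAM_ROLE '{role_arn}'"
    )
```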