Getting Started

CloudWanderer is made up of three core components.

  1. The cloud interface (e.g. CloudWandererAWSInterface). Responsible for discovering the resources that exist in your cloud provider.

  2. The storage connector (e.g. DynamoDbConnector). Responsible for storing the discovered resources in your storage mechanism of choice.

  3. The CloudWanderer class. Responsible for bringing the interface and storage connectors together to make them easier to work with.
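
The division of labour between the three components can be pictured with a toy sketch in plain Python. This is not CloudWanderer’s real implementation: ToyInterface, ToyMemoryConnector and ToyWanderer are hypothetical names invented for this illustration, and the resource record is made up.

```python
from typing import Dict, Iterator, List


class ToyInterface:
    """Hypothetical stand-in for a cloud interface: it discovers resources."""

    def get_resources(self) -> Iterator[Dict[str, str]]:
        # A real interface would query the cloud provider's API here.
        yield {"id": "vpc-1111111111", "resource_type": "vpc"}


class ToyMemoryConnector:
    """Hypothetical stand-in for a storage connector: it stores resources."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, str]] = {}

    def write_resource(self, resource: Dict[str, str]) -> None:
        self.records[resource["id"]] = resource


class ToyWanderer:
    """Hypothetical stand-in for the class that joins the two together."""

    def __init__(
        self, interface: ToyInterface, storage_connectors: List[ToyMemoryConnector]
    ) -> None:
        self.interface = interface
        self.storage_connectors = storage_connectors

    def write_resources(self) -> None:
        # Every discovered resource is written to every storage connector.
        for resource in self.interface.get_resources():
            for connector in self.storage_connectors:
                connector.write_resource(resource)


connector = ToyMemoryConnector()
ToyWanderer(ToyInterface(), [connector]).write_resources()
print(sorted(connector.records))
```

The point of the split is that the discovery side and the storage side never need to know about each other; only the class in the middle touches both.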

Testing with the Memory Connector

If you don’t mind your data being thrown away as soon as your Python process stops, you can test CloudWanderer using the Memory Storage Connector!

>>> import cloudwanderer
>>> wanderer = cloudwanderer.CloudWanderer(
...     storage_connectors=[cloudwanderer.storage_connectors.MemoryStorageConnector()]
... )

It’s wise to do this in an interactive environment, otherwise you may spend an inordinate amount of time re-querying your AWS environment!

Testing with a local DynamoDB

DynamoDB has a Docker Image that allows you to run a local, persistent DynamoDB. This provides us with a cheap and easy way to start trying out CloudWanderer.

$ docker run -p 8000:8000 -v $(pwd):/data amazon/dynamodb-local \
    -Djava.library.path=./DynamoDBLocal_lib \
    -jar DynamoDBLocal.jar \
    -sharedDb -dbPath /data/

This starts a DynamoDB Docker container on your local machine and tells it to persist data to a shared database file, shared-local-instance.db, in the current directory. This allows the data to persist even if you stop the container.

>>> import cloudwanderer
>>> from cloudwanderer.storage_connectors import DynamoDbConnector
>>> local_storage_connector = DynamoDbConnector(
...     endpoint_url='http://localhost:8000'
... )

This creates an alternative storage connector that points at your local DynamoDB.

>>> wanderer = cloudwanderer.CloudWanderer(storage_connectors=[local_storage_connector])

This passes the storage connector that points at your local DynamoDB into a new wanderer. All subsequent CloudWanderer operations will now occur against your local DynamoDB!

Writing Resources

Writing all Resources from all Regions

Writing all supported resources in all regions is as simple as calling the write_resources() method.

>>> import cloudwanderer
>>> storage_connector = cloudwanderer.storage_connectors.DynamoDbConnector()
>>> storage_connector.init()
>>> wanderer = cloudwanderer.CloudWanderer(storage_connectors=[storage_connector])
>>> wanderer.write_resources()

In that block we are:

  1. Creating a storage connector (in this case DynamoDB).

  2. Initialising the storage connector (in this case creating a DynamoDB table called cloud_wanderer).

  3. Creating a wanderer and using write_resources() to get all resources in all regions.

Important: This will create a DynamoDB table in your AWS account and write a potentially large number of records to it, which may incur some cost. See the earlier examples for how to test against a local DynamoDB or memory.

Writing VPCs from all Regions

Writing VPCs is as simple as passing the resource_types argument.

>>> wanderer.write_resources(resource_types=['vpcs'])

Excluding Resource Types

Some resource types take a very long time to query (e.g. EC2 Images) and, depending on what you’re using your data for, may not be worth the time.

>>> wanderer.write_resources(exclude_resources=['ec2:images'])
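
The idea behind an exclusion list can be sketched as a simple filter over service:resource_type names. This is an illustration only: filter_resource_types is a hypothetical helper rather than part of CloudWanderer, and the candidate names are made up to follow the format used above.

```python
from typing import Iterable, List


def filter_resource_types(
    candidates: Iterable[str], exclude_resources: Iterable[str]
) -> List[str]:
    """Return the candidates that are not excluded, keeping their order."""
    excluded = set(exclude_resources)
    return [name for name in candidates if name not in excluded]


# Hypothetical "service:resource_type" names.
candidates = ["ec2:vpcs", "ec2:images", "iam:roles"]
remaining = filter_resource_types(candidates, exclude_resources=["ec2:images"])
print(remaining)
```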

Writing Resource by URN

If you’re writing an event-driven discovery mechanism, it can be very useful to update an individual resource without discovering all of the other resources of that type as well.

>>> from cloudwanderer import URN
>>> urn = URN(
...     account_id="123456789012",
...     region="eu-west-2",
...     service="ec2",
...     resource_type="vpc",
...     resource_id="vpc-1111111111",
... )

>>> wanderer.write_resource(urn=urn)


If the resource no longer exists in AWS, it is deleted from your storage connector. This applies to both write_resource() and write_resources().
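
The write-or-delete behaviour described above can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: ToyURN is a simplified stand-in with the same fields as the example URN, the fetch callable stands in for a call to the cloud provider, and a dict stands in for the storage connector.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ToyURN:
    """Hypothetical, simplified URN with the same fields as the example above."""

    account_id: str
    region: str
    service: str
    resource_type: str
    resource_id: str


def write_resource(
    urn: ToyURN,
    fetch: Callable[[ToyURN], Optional[dict]],
    storage: Dict[ToyURN, dict],
) -> None:
    """Store the resource if it still exists, otherwise drop the stale record."""
    resource = fetch(urn)
    if resource is None:
        storage.pop(urn, None)  # Not found: remove any stored copy.
    else:
        storage[urn] = resource  # Found: insert or overwrite the stored copy.


urn = ToyURN("123456789012", "eu-west-2", "ec2", "vpc", "vpc-1111111111")
storage: Dict[ToyURN, dict] = {}

write_resource(urn, fetch=lambda _: {"state": "available"}, storage=storage)
stored_after_first_write = dict(storage)

write_resource(urn, fetch=lambda _: None, storage=storage)
print(len(stored_after_first_write), len(storage))
```

This is why a single-URN write is a good fit for event-driven updates: a deletion event and a modification event can both be handled by the same call.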

Reading Resources

Retrieving all VPCs from all Regions

>>> vpcs = storage_connector.read_resources(service='ec2', resource_type='vpc')
>>> for vpc in vpcs:
...     print('vpc_region:', vpc.urn.region)
...     vpc.load()
...     print('vpc_state: ', vpc.state)
...     print('is_default:', vpc.is_default)
vpc_region: us-east-1
vpc_state:  available
is_default: True
vpc_region: eu-west-2
vpc_state:  available
is_default: True

You’ll notice here we’re accessing the urn property in order to print the region. URNs are CloudWanderer’s way of uniquely identifying a resource.

You can also see we’re printing the VPC’s state and is_default attributes. It’s very important to notice the load() call beforehand, which loads the resource’s data. Resources returned from any read_resources() call on DynamoDbConnector are lazily loaded unless you specify the urn= argument. This is due to the sparsely populated global secondary indexes in the DynamoDB table schema.
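
The lazy loading pattern itself can be sketched as follows. LazyResource is a hypothetical class, not CloudWanderer’s resource class, and raising AttributeError before load() is an assumption of this sketch rather than documented behaviour of the library.

```python
from typing import Any, Callable, Dict, Optional


class LazyResource:
    """Hypothetical resource that only knows its URN until load() is called."""

    def __init__(self, urn: str, loader: Callable[[str], Dict[str, Any]]) -> None:
        self.urn = urn
        self._loader = loader
        self._data: Optional[Dict[str, Any]] = None

    def load(self) -> None:
        # Fetch the full record from storage on demand.
        self._data = self._loader(self.urn)

    def __getattr__(self, name: str) -> Any:
        # Only invoked when normal attribute lookup fails.
        data = self.__dict__.get("_data")
        if data is None:
            raise AttributeError(f"{name!r} is unavailable until load() is called")
        try:
            return data[name]
        except KeyError:
            raise AttributeError(name) from None


# A dict stands in for the full records held in storage.
full_records = {"example-vpc-urn": {"state": "available", "is_default": True}}
vpc = LazyResource("example-vpc-urn", loader=full_records.__getitem__)

try:
    vpc.state
except AttributeError as error:
    print("before load:", error)

vpc.load()
print("after load:", vpc.state)
```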

Once you’ve called load() you can access any property of the AWS resource that is returned by its describe method. E.g. for VPCs see EC2.Client.describe_vpcs. These attributes are stored as snake_case instead of the API’s camelCase, so isDefault becomes is_default.
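
If you need to predict an attribute name, the conversion can be approximated with a small helper. to_snake_case below is a hypothetical function written for this guide, not the routine CloudWanderer uses internally, and it does not handle runs of capitals (acronyms) specially.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase or PascalCase name to snake_case."""
    # Insert an underscore before every capital letter except a leading one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


for api_name in ["isDefault", "CidrBlock", "DhcpOptionsId"]:
    print(api_name, "->", to_snake_case(api_name))
```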

Reading Subresources

What is a Subresource?

In CloudWanderer, a subresource is a resource which does not have its own unique identifier in the cloud provider. It depends upon its parent resource for its identity.

An example of a subresource is an AWS IAM Role Inline Policy. The Role has an ARN (AWS’s unique identifier), but the policy does not. When interacting with the AWS API you can only retrieve an inline policy by specifying the policy name and the role name/ARN. This makes it qualify as a subresource in CloudWanderer terminology.

This is unlike Boto3, where a subresource is any resource dependent on a parent resource (e.g. a subnet is a subresource of a VPC). A subnet does not fit the CloudWanderer definition of a subresource, however, because it has its own unique identifier and can therefore be retrieved from the API without specifying the VPC of which it is a part.
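
The difference comes down to what you need in order to look something up. The toy lookup tables below are made up for illustration (they are not how CloudWanderer stores anything): a subnet can be found by its own ID, while an inline policy needs the parent role’s name as part of the key.

```python
from typing import Dict, Tuple

# A subnet has its own unique identifier, so one ID is enough to find it.
subnets: Dict[str, dict] = {
    "subnet-11111111": {"vpc_id": "vpc-11111111"},
}

# An inline policy has no identifier of its own, so the key includes the role.
inline_policies: Dict[Tuple[str, str], dict] = {
    ("test-role", "test-role-policy"): {"Effect": "Allow"},
    ("other-role", "test-role-policy"): {"Effect": "Deny"},
}

print(subnets["subnet-11111111"])
print(inline_policies[("test-role", "test-role-policy")])
```

Two different roles can each have an inline policy with the same name, which is why the policy name alone cannot identify one.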

How do I list Subresources?

Let’s say we want to get a list of role policies. We can start by getting the role.

>>> role = next(storage_connector.read_resources(service='iam', resource_type='role'))
>>> role.load()

Next we need to find out which policies are attached. We can do this either with the secondary attributes:

>>> role.get_secondary_attribute('role_inline_policy_attachments')
[{'PolicyNames': ['test-role-policy'], 'IsTruncated': False}]
>>> role.get_secondary_attribute(jmes_path='[].PolicyNames[0]')

Or we can do it with the subresource_urns property.

>>> role.subresource_urns
[URN(account_id='123456789012', region='us-east-1', service='iam', resource_type='role_policy', resource_id_parts=['test-role', 'test-role-policy'])]

Then we can look up the inline policy.

>>> inline_policy_urn = role.subresource_urns[0]
>>> inline_policy = storage_connector.read_resource(urn=inline_policy_urn)
>>> inline_policy.policy_document
{'Version': '2012-10-17', 'Statement': {'Effect': 'Allow', 'Action': 's3:ListBucket', 'Resource': 'arn:aws:s3:::example_bucket'}}

Reading Secondary Attributes

What is a Secondary Attribute?

Some resources require additional API calls beyond the initial list or describe call to retrieve all their metadata. These are known as Secondary Attributes. These secondary attributes are written as part of write_resources().

How do I retrieve Secondary Attributes?

Let’s say we want to get the value of enableDnsSupport for a VPC. We can get this in one of two ways: either by looping over the dictionaries in secondary_attributes on cloudwanderer_metadata, or by calling get_secondary_attribute() with a JMESPath.

>>> first_vpc = next(storage_connector.read_resources(service='ec2', resource_type='vpc'))
>>> first_vpc.load()

>>> first_vpc.cloudwanderer_metadata.secondary_attributes[0]['EnableDnsSupport']
{'Value': True}

>>> first_vpc.get_secondary_attribute(name='vpc_enable_dns_support')
[{'VpcId': 'vpc-11111111', 'EnableDnsSupport': {'Value': True}}]

>>> first_vpc.get_secondary_attribute(jmes_path='[].EnableDnsSupport.Value')

This special way of accessing secondary attributes ensures that secondary attributes do not conflict with primary attributes if they have the same name.
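
If JMESPath is unfamiliar, the expression [].EnableDnsSupport.Value can be read as "for each item in the list, take EnableDnsSupport, then Value, and skip items where that is missing". The sketch below spells that out in plain Python against the list shown above. It illustrates the JMESPath expression only; project_enable_dns_support is a hypothetical helper, and it makes no claim about the exact return value of get_secondary_attribute(jmes_path=...).

```python
from typing import Any, List

# The list returned for name='vpc_enable_dns_support' in the example above.
secondary_attribute = [
    {"VpcId": "vpc-11111111", "EnableDnsSupport": {"Value": True}}
]


def project_enable_dns_support(items: List[dict]) -> List[Any]:
    """Plain-Python equivalent of the JMESPath [].EnableDnsSupport.Value."""
    values = []
    for item in items:
        value = (item.get("EnableDnsSupport") or {}).get("Value")
        if value is not None:  # JMESPath projections drop null results.
            values.append(value)
    return values


print(project_enable_dns_support(secondary_attribute))
```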

Deleting Stale Resources

CloudWanderer automatically deletes resources which no longer exist when you run write_resources().

This has some complexity with regional resources that only exist via global APIs. For example, S3 buckets are regional resources, but S3 is a global service, so when you call write_resources() for S3 buckets in us-east-1 you will get buckets from all regions due to the nature of the API.

This also means that you will delete S3 buckets that no longer exist from all regions when you call write_resources() in us-east-1.
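
Conceptually, stale deletion is a set difference between what is stored and what was just discovered. The sketch below shows that idea with made-up bucket names; delete_stale and the (region, bucket_name) keys are hypothetical and are not CloudWanderer’s storage schema.

```python
from typing import Dict, Set, Tuple

# Hypothetical stored records keyed by (region, bucket_name).
stored: Dict[Tuple[str, str], dict] = {
    ("us-east-1", "logs"): {},
    ("eu-west-2", "assets"): {},
    ("eu-west-2", "old-backups"): {},
}

# Buckets returned by one call to the global API made from us-east-1.
discovered: Set[Tuple[str, str]] = {
    ("us-east-1", "logs"),
    ("eu-west-2", "assets"),
}


def delete_stale(
    stored: Dict[Tuple[str, str], dict], discovered: Set[Tuple[str, str]]
) -> Set[Tuple[str, str]]:
    """Remove every stored record that was not rediscovered; return the removed keys."""
    stale = set(stored) - discovered
    for key in stale:
        del stored[key]
    return stale


removed = delete_stale(stored, discovered)
print(removed)
```

Because the discovered set spans every region, the comparison has to span every region too, which is why a bucket in eu-west-2 is removed by a run against us-east-1.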

Individual Resources

Deleting individual resources, if necessary, can be done by calling delete_resource() directly on the storage connector.


>>> vpc = next(storage_connector.read_resources(
...     service='ec2',
...     resource_type='vpc',
... ))
>>> str(vpc.urn)
>>> storage_connector.delete_resource(urn=vpc.urn)
>>> vpc = storage_connector.read_resource(
...     urn=vpc.urn
... )
>>> print(vpc)