| Title: | Azure storage authentication toolkit |
|---|---|
| Description: | Handles authentication and basic tasks with Azure blob storage. |
| Authors: | Fran Barton [aut, cre] (ORCID: <https://orcid.org/0000-0002-5650-1176>), Tom Jemmett [ctb] |
| Maintainer: | Fran Barton <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3.1 |
| Built: | 2026-05-20 19:30:57 UTC |
| Source: | https://github.com/The-Strategy-Unit/azkit |
Check that a container looks like a real container
check_container_class(container)

| Argument | Description |
|---|---|
| container | An Azure container object, as returned by get_container |

Errors if x is equal to "", or if it is otherwise missing or invalid.
The one exception is NULL: if x is NULL, then NULL is passed through.
check_nzchar(x, message, pf = parent.frame())

| Argument | Description |
|---|---|
| x | The object to be checked |
| message | A custom error message, as a string. Will be shown to the user if the check does not pass. Can include … |
| pf | Set as parent.frame() by default |

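
The NULL pass-through described for check_nzchar can be pictured with a small base R stand-in. This is a sketch, not the package's code: `nzchar_or_null` is a hypothetical name, and reading "missing or invalid" as "not a single non-NA string" is an assumption.

```r
# Hypothetical stand-in for the behaviour described above (not azkit's code).
nzchar_or_null <- function(x, message) {
  # NULL is passed straight through
  if (is.null(x)) {
    return(NULL)
  }
  # Assumption: "valid" means a single, non-NA, non-empty string
  is_valid <- is.character(x) && length(x) == 1L && !is.na(x) && nzchar(x)
  if (!is_valid) {
    stop(message, call. = FALSE)
  }
  x
}

kept <- nzchar_or_null("my-container", "must not be empty")
passed <- nzchar_or_null(NULL, "must not be empty")
failed <- tryCatch(
  nzchar_or_null("", "`container_name` must not be empty"),
  error = function(e) conditionMessage(e)
)
```

A pass-through of this kind lets an optional argument stay NULL while still rejecting an empty string that was supplied by mistake.
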
This function makes it easy to use the is_scalar_* functions from {rlang}
to check the type of x, and that length(x) == 1, and supports the
seamless use of glue strings in the custom error message.
Possible values for the type parameter are: "character", "logical", "list",
"integer", "double", "string", "bool", "bytes", "raw", "vector", "complex".
check_scalar_type(x, type, message, pf = parent.frame())

| Argument | Description |
|---|---|
| x | The object to be checked |
| type | A string defining the R object type that x is checked against |
| message | A custom error message, as a string. Will be shown to the user if the predicate check does not succeed. Can include glue strings. |
| pf | Set as parent.frame() by default |

If the predicate function is true of x then x is returned. Otherwise,
an error is thrown with a custom message.
check_that(x, predicate, message, pf = parent.frame())

| Argument | Description |
|---|---|
| x | The object to be checked |
| predicate | The predicate function used to check x |
| message | A custom error message, as a string. Will be shown to the user if the predicate check does not succeed. Can include … |
| pf | Set as parent.frame() by default |

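
The return-or-error pattern that check_that describes can be sketched in base R as follows. This is an illustration only: `check_that_sketch` is a hypothetical stand-in, it leaves out the pf argument and any glue interpolation, and it is not the package's implementation.

```r
# Hypothetical stand-in for the check-and-return pattern (not azkit's code).
check_that_sketch <- function(x, predicate, message) {
  if (isTRUE(predicate(x))) {
    return(x)
  }
  stop(message, call. = FALSE)
}

# A passing check returns its input, so the call can sit inline in other code.
n <- check_that_sketch(5L, is.integer, "`n` must be an integer")

# A failing check raises an error carrying the custom message.
result <- tryCatch(
  check_that_sketch("five", is.integer, "`n` must be an integer"),
  error = function(e) conditionMessage(e)
)
```
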
This function makes it easy to use the {purrr} functions every(),
some() and none() to handle vector inputs of length >= 1, and supports
the seamless use of glue strings in the custom error message.
Not suitable for checking if length(x) == 1 as it will check vectors
element-wise, so will potentially return TRUE even if length(x) > 1
check_vec(x, predicate, message, which = c("every", "some", "none"), pf = parent.frame())

| Argument | Description |
|---|---|
| x | The object to be checked |
| predicate | The predicate function used to check elements of x |
| message | A custom error message, as a string. Will be shown to the user if the predicate check does not succeed. Can include glue strings. |
| which | One of "every", "some", "none", as a string. Defines which {purrr} function is used. |
| pf | Set as parent.frame() by default |

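
The element-wise behaviour of "every", "some" and "none", and the length caveat noted for check_vec, can be illustrated with a base R sketch. `vec_passes` is a hypothetical helper that only returns the outcome of the check. It uses all() and any() in place of the {purrr} functions and is not the package's code.

```r
# Hypothetical helper: does the predicate hold for every/some/none of the elements?
vec_passes <- function(x, predicate, which = c("every", "some", "none")) {
  which <- match.arg(which)
  hits <- vapply(x, function(el) isTRUE(predicate(el)), logical(1))
  switch(
    which,
    every = all(hits),
    some = any(hits),
    none = !any(hits)
  )
}

paths <- c("a.csv", "b.csv")
all_strings <- vec_passes(paths, is.character, "every")
some_empty <- vec_passes(c("a.csv", ""), function(p) !nzchar(p), "some")
none_na <- vec_passes(paths, is.na, "none")

# The caveat: a vector of length 2 still passes an element-wise check,
# so this kind of check says nothing about length(x) == 1.
passes_despite_length <- all_strings && length(paths) > 1
```
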
A custom error message generator for the check_scalar_type function

cst_error_msg(text)

| Argument | Description |
|---|---|
| text | The main text of the error message |

A custom error message generator for the check_that function

ct_error_msg(text)

| Argument | Description |
|---|---|
| text | The main text of the error message |

A custom error message generator for the check_vec function

cv_error_msg(text)

| Argument | Description |
|---|---|
| text | The main text of the error message |

A helper function to generate appropriate values for the resource parameter
in get_auth_token. Ensure that the version argument matches the aad_version
argument to get_auth_token.
It's unlikely that you will ever want to set authorise to FALSE but it's
here as an option since AzureAuth::get_azure_token supports it. Similarly,
you are likely to want to keep refresh turned on (this argument has no
effect on v1 tokens, it only applies to v2).
generate_resource(version = 1, url = "https://storage.azure.com", path = "/.default", authorise = TRUE, refresh = TRUE)

| Argument | Description |
|---|---|
| version | Numeric. The AAD version, either 1 or 2 (1 by default) |
| url | The URL of the Azure resource host |
| path | For v2, the path designating the access scope |
| authorise | Boolean, whether to return a token with authorisation scope (TRUE, the default) or one that just provides authentication. You are unlikely to want to turn this off. |
| refresh | Boolean, applies to v2 tokens only, whether to return a token that has a refresh token also supplied. |

A scalar character, or (in most v2 situations) a character vector
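
To illustrate the two shapes of return value, here is a minimal sketch. It is not the package's implementation: `resource_sketch` is a hypothetical helper, and it assumes that the v2 form is a vector of scopes in which `offline_access` is the scope that requests a refresh token, as is the convention on the Microsoft identity platform. The authorise argument is left out because the source does not say how it changes the result.

```r
# Hypothetical illustration of v1 versus v2 resource values (not azkit's code).
resource_sketch <- function(version = 1,
                            url = "https://storage.azure.com",
                            path = "/.default",
                            refresh = TRUE) {
  # v1: the resource is a single URL
  if (version == 1) {
    return(url)
  }
  # v2 (assumption): a vector of scopes; offline_access asks for a refresh token
  c(paste0(url, path), if (refresh) "offline_access")
}

v1 <- resource_sketch(version = 1)
v2 <- resource_sketch(version = 2)
v2_no_refresh <- resource_sketch(version = 2, refresh = FALSE)
```
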
This function retrieves an Azure token for a specified resource.
refresh_token refreshes an existing token without the need to re-authenticate
online. This seems to work only with v1 tokens. (v2 tokens always seem to
refresh via online re-authentication, although they ought to refresh
automatically.)
To generate a completely fresh token instead, set force_refresh = TRUE in
get_auth_token.
get_auth_token(resource = generate_resource(), tenant = "common", client_id = NULL, auth_method = "authorization_code", aad_version = 1, force_refresh = FALSE, ...)

refresh_token(token)

| Argument | Description |
|---|---|
| resource | For v1, a simple URL such as … |
| tenant | A string specifying the Azure tenant. Defaults to "common". |
| client_id | A string specifying the application ID (aka client ID). If … |
| auth_method | A string specifying the authentication method. Defaults to "authorization_code". |
| aad_version | Numeric. The AAD version, either 1 or 2 (1 by default) |
| force_refresh | Logical. Whether to use a stored token if available (FALSE, the default) or to generate a fresh one (TRUE). |
| ... | Optional arguments (eg …) |
| token | An Azure authentication token |

It will try to get a managed token when used within a managed resource such as Azure VM or Azure App Service.
If this method does not return a token, it will try to retrieve a user token
using the provided parameters, requiring the user to have authenticated
using their device. If force_refresh is set to TRUE, a fresh web
authentication process should be launched. Otherwise it will attempt to use
a cached token matching the given resource, tenant and aad_version.
An Azure token object
An Azure authentication token

    ## Not run:
    # Get a token for the default resource
    token <- get_auth_token()

    # Force generation of a new token via online reauthentication
    token <- get_auth_token(force_refresh = TRUE)

    # Get a token for a specific resource and tenant
    token <- get_auth_token(
      resource = "https://graph.microsoft.com",
      tenant = "my-tenant-id"
    )

    # Get a token using a specific app ID
    token <- get_auth_token(client_id = "my-app-id")

    # Use a secret
    token <- get_auth_token(
      tenant = "my-tenant-id",
      client_id = "my-app-id",
      auth_method = "client_credentials",
      password = "123459878&%^"
    )
    ## End(Not run)

It may be helpful to set the environment variable "AZ_STORAGE_EP". This can contain your usual Azure storage endpoint URL should you not wish to pass it in explicitly to the function. You may find it helpful to use list_container_names to get a list of available container names.
get_container(container_name, endpoint_url = Sys.getenv("AZ_STORAGE_EP"), token = get_auth_token())

| Argument | Description |
|---|---|
| container_name | Name of the container as a string. |
| endpoint_url | An Azure endpoint URL. |
| token | An Azure authentication token, or a function that returns one. Uses get_auth_token by default. |

An Azure blob container (list object of class "blob_container")
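
As a small sketch of the environment variable approach mentioned above, the endpoint can be set once per session and then picked up by the default argument. The URL below is a made-up placeholder, and the commented-out call assumes a container named "my-container" exists at that endpoint.

```r
# Hypothetical endpoint URL; substitute your own storage account.
Sys.setenv(AZ_STORAGE_EP = "https://mystorageaccount.blob.core.windows.net")

# This is the value that the default endpoint_url argument will read.
endpoint_url <- Sys.getenv("AZ_STORAGE_EP")

# With the variable set, endpoint_url can be left at its default:
# container <- get_container("my-container")
```

To make the setting persist across sessions, the same name-value pair can be placed in an .Renviron file, which R reads at startup.
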
This function checks if the Azure Instance Metadata Service (IMDS) is available by attempting to make a request to the IMDS endpoint. The result is cached in an environment variable for future use, saving the need for repeated checks.
imds_available()
You can also set the IMDS_AVAILABLE environment variable manually to
"TRUE" or "FALSE" to override the automatic check, which can be useful for
testing or in environments where the check may not work correctly.
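
A manual override for a test run might look like the sketch below, which uses only base R. It saves the previous value and restores it afterwards so that the override does not leak into later code. Where the override is placed relative to your own calls is up to you.

```r
# Remember the current value (NA if the variable is not set at all).
old <- Sys.getenv("IMDS_AVAILABLE", unset = NA)

# Override the automatic check for the duration of a test.
Sys.setenv(IMDS_AVAILABLE = "FALSE")
override <- Sys.getenv("IMDS_AVAILABLE")

# ... code that relies on imds_available() would run here ...

# Restore the previous state.
if (is.na(old)) {
  Sys.unsetenv("IMDS_AVAILABLE")
} else {
  Sys.setenv(IMDS_AVAILABLE = old)
}
```
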
Return a list of container names that are found at the endpoint
list_container_names(endpoint_url = Sys.getenv("AZ_STORAGE_EP"), token = get_auth_token())

| Argument | Description |
|---|---|
| endpoint_url | An Azure endpoint URL. |
| token | An Azure authentication token, or a function that returns one. Uses get_auth_token by default. |

A character vector of all container names found
Lists all files (recursively, if desired) found in a container within a
given directory (dir). The search can be restricted to files with a
specific extension.
list_files(container, dir = "", ext = "", recursive = FALSE)

| Argument | Description |
|---|---|
| container | An Azure container object, as returned by get_container |
| dir | (optional) The directory of the container to list files within. Defaults to "". |
| ext | (optional) A string giving the extension of a particular file type to restrict the list to. No need to include the initial ".". The default is "". |
| recursive | Logical: whether to list files recursively. Default FALSE. |

The function does not support filtering by file name, only by file extension.
The returned file list (character vector) contains the full paths to the
files, ready to be passed perhaps to a read_azure_* function, or filtered
further. If you just want the names of the files without the folder path,
use basename to extract these.
A vector of file names, or an empty character vector if none found

    ## Not run:
    list_files(get_container("example"), ext = "csv")
    ## End(Not run)

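
Since list_files filters only by extension, filtering by file name has to happen on the returned vector. The sketch below shows one way to do that with base R. The `files` vector is a made-up stand-in for the function's output, not real data.

```r
# Stand-in for the character vector that list_files() returns (made-up paths).
files <- c(
  "data/2024/sales.csv",
  "data/2024/costs.csv",
  "data/2025/sales.csv"
)

# File names without the folder path.
file_names <- basename(files)

# Keep only the full paths whose file name starts with "sales".
sales_files <- files[grepl("^sales", file_names)]
```

The filtered vector still holds full paths, so each element can be passed on as the file argument of a read_azure_* function.
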
Read a csv file from Azure storage
read_azure_csv(container, file, ...)

| Argument | Description |
|---|---|
| container | An Azure container object, as returned by get_container |
| file | String. The path to the file to be read. |
| ... | Optional arguments to be passed through to readr::read_delim |

A tibble
Read any file from Azure storage
read_azure_file(container, file, ...)

| Argument | Description |
|---|---|
| container | An Azure container object, as returned by get_container |
| file | String. The path to the file to be read. |
| ... | Optional arguments to be passed through to AzureStor::download_blob |

A raw data stream
Read a json file from Azure storage
read_azure_json(container, file, ...)

| Argument | Description |
|---|---|
| container | An Azure container object, as returned by get_container |
| file | String. The path to the file to be read. |
| ... | Optional arguments to be passed through to yyjsonr::read_json_raw |

A list
Read a json.gz file from Azure storage
read_azure_jsongz(container, file, ...)

| Argument | Description |
|---|---|
| container | An Azure container object, as returned by get_container |
| file | String. The path to the file to be read. |
| ... | Optional arguments to be passed through to yyjsonr::read_json_file |

A list
Read a parquet file from Azure storage
read_azure_parquet(container, file, ...)

| Argument | Description |
|---|---|
| container | An Azure container object, as returned by get_container |
| file | String. The path to the file to be read. |
| ... | Optional arguments to be passed through to arrow::read_parquet |

A tibble

    ## Not run:
    read_azure_parquet(cont, "data/folder/path/1.parquet")
    ## End(Not run)

Read an rds file from Azure storage
read_azure_rds(container, file, ...)

| Argument | Description |
|---|---|
| container | An Azure container object, as returned by get_container |
| file | String. The path to the file to be read. |
| ... | Optional arguments to be passed through to AzureStor::storage_load_rds. For example, a compression type (one of c("unknown", "gzip", "bzip2", "xz", "zstd", "none")) can be provided using the argument type. |

The data object that was stored in the rds file
Read in data from an Azure table
read_azure_table(table_name, table_endpoint = Sys.getenv("AZ_TABLE_EP"), token = get_auth_token(), filter = NULL, select = NULL, top = NULL)

| Argument | Description |
|---|---|
| table_name | Name of the table to be read. |
| table_endpoint | An Azure table endpoint URL. |
| token | An Azure authentication token, or a function that returns one. Uses get_auth_token by default. |
| filter | An OData filter string to filter the results. |
| select | An OData select string to specify which properties to return. |
| top | An integer specifying the maximum number of records to return. |

A tibble
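
As an illustration of what the filter and select strings might contain, the sketch below builds an OData filter with base R. `odata_eq` is a hypothetical helper, and the Region and Amount property names and the values are made up. PartitionKey and RowKey are the standard Azure table system properties. Single quotes inside a string value are doubled, which is how OData string literals escape them.

```r
# Hypothetical helper: build an OData "eq" comparison against a string value.
odata_eq <- function(property, value) {
  escaped <- gsub("'", "''", value, fixed = TRUE)
  sprintf("%s eq '%s'", property, escaped)
}

# Combine two conditions with "and" (made-up property names and values).
filter <- paste(
  odata_eq("PartitionKey", "sales"),
  odata_eq("Region", "O'Brien"),
  sep = " and "
)

# A comma-separated list of the properties to return.
select <- "RowKey,Region,Amount"

# These could then be passed on, assuming a table called "orders" exists:
# read_azure_table("orders", filter = filter, select = select, top = 10)
```
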
Read in a single entity from an Azure table
read_azure_table_single_entity(table_name, partition_key, row_key, table_endpoint = Sys.getenv("AZ_TABLE_EP"), token = get_auth_token())

| Argument | Description |
|---|---|
| table_name | Name of the table to be read. |
| partition_key | The partition key of the entity to be read. |
| row_key | The row key of the entity to be read. |
| table_endpoint | An Azure table endpoint URL. |
| token | An Azure authentication token, or a function that returns one. Uses get_auth_token by default. |

A tibble