Using Presto/Trino CLI w/ TLS & Passwords Enabled

Not much of a post here; just recording a handy way to run queries automatically in the Presto CLI.

./presto-cli-350-executable.jar \
      --server https://your.cluster.dns:443 \
      --catalog hive \
      --schema default \
      --client-request-timeout "10s" \
      --user "john.humphreys" \
      --password \
      --execute "select * from some_db.some_table limit 10"

If you need to supply the password as well, you can pipe it into the command with the yes command on Linux. I also suspect there is an environment variable you can pre-set for that, but I didn’t dig in to double-check.
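
For example, something along these lines should work. Treat it as a sketch: putting the real password in a shell variable is for illustration only; populate it from somewhere safer in practice.

# MY_PASSWORD is a hypothetical variable; source it from somewhere safe.
yes "$MY_PASSWORD" | ./presto-cli-350-executable.jar \
      --server https://your.cluster.dns:443 \
      --user "john.humphreys" \
      --password \
      --execute "select 1"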

Python – Find IPs for DNS Name

We recently needed to find all of the IPs to open in a firewall from an Apache proxy, which meant resolving a huge number of DNS records to IPs (and the relevant ports) programmatically.

I found this very elegant way of getting all the IPs for a DNS name, I hope you find it useful!

import socket

# getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples,
# where sockaddr is (ip, port, ...); the comprehensions below pull out
# the unique IPs.
net_info = socket.getaddrinfo("stackoverflow.com", None)
ip_list = set([x[0] for x in [x[4] for x in net_info]])
print(ip_list)

Output:

{'151.101.129.69', '151.101.1.69', '151.101.193.69', '151.101.65.69'}
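
If you need to do this for many names from a Linux shell instead, getent can produce the same information. This is a sketch assuming a hypothetical hosts.txt with one DNS name per line (getent ahosts prints ADDRESS TYPE NAME):

# Print unique "hostname ip" pairs for every name in hosts.txt.
while read -r host; do
  getent ahosts "$host" | awk -v h="$host" '{print h, $1}' | sort -u
done < hosts.txt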

Using Athena From DBeaver with your IAM Role / Profile

I just spent about 30 minutes working out how to connect DBeaver to Athena using my normal AWS credentials file / default profile.

Thankfully, I stumbled across this GitHub issue comment, and it worked like a charm: https://github.com/dbeaver/dbeaver/issues/3918#issuecomment-511484596.

Here are the relevant notes (slightly modified for easier understanding):

  1. Do your normal AWS login process to refresh your credentials (in our case, we use okta + gimme_aws_creds for this).
  2. Go to driver properties on your DBeaver Athena connection and set:
    • AwsCredentialsProviderClass to com.simba.athena.amazonaws.auth.profile.ProfileCredentialsProvider
    • AwsCredentialsProviderArguments equal to the name of the profile you want to use (see ~/.aws/config to see which profiles you have) – we use “default”.
  3. Test Connection and it should work (a couple of optional sanity checks are shown below).
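
If the connection test fails, a quick shell check helps confirm that the profile exists and its credentials are still valid; for example, with the "default" profile used above:

# List the profile names in your AWS config.
grep '^\[' ~/.aws/config

# Confirm the profile has working credentials.
aws sts get-caller-identity --profile default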

Manually Load Docker Image – Avoid Rate Limiting

You can manually load a Docker image onto a server when you need to. This is useful for getting around Docker Hub rate limiting in an urgent situation.

# On your laptop.
docker pull busybox:1.34.1
docker save busybox:1.34.1 > busybox-1-34-1.tar
aws s3 cp busybox-1-34-1.tar s3://your-s3-bucket/busybox-1-34-1.tar

# On remote node.
aws s3 cp s3://your-s3-bucket/busybox-1-34-1.tar /tmp/busybox-1-34-1.tar
docker load -i busybox-1-34-1.tar

You can use any transfer mechanism that both your local host and the target host have access to; I just used S3 because it was the most convenient in my case. We have SSH disabled on our production nodes, but if you have it available, you could just SSH the image across instead, as sketched below.
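
For reference, the SSH variant is a one-liner, assuming a hypothetical user@remote-node you can reach (docker load reads the tar from stdin by default):

docker save busybox:1.34.1 | ssh user@remote-node 'docker load'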

PrestoSQL / Presto UI – Get stats programmatically via API

If you’re having trouble getting the /ui/api/stats info programmatically, you can use this script. It’s ill-advised, as those APIs may change at any time; but since some of the UI stats are better/more correct than the Prometheus stats, you may need them as we did.

COOKIE_VALUE=$(curl --location --request POST 'https://some.company.com/ui/login' \
--data-urlencode 'username=john.humphreys' \
--data-urlencode 'password=<password>' --cookie-jar - --output /dev/null --silent | awk '{print $7}' | tail -1)

curl 'https://some.company.com/ui/api/stats' -H "Cookie: Presto-UI-Token=$COOKIE_VALUE" | jq --color-output

Output:

{
  "runningQueries": 8,
  "blockedQueries": 0,
  "queuedQueries": 0,
  "activeCoordinators": 1,
  "activeWorkers": 35,
  "runningDrivers": 3957,
  "totalAvailableProcessors": 2450,
  "reservedMemory": 2770000473,
  "totalInputRows": 1133212564136,
  "totalInputBytes": 10872687401451,
  "totalCpuTimeSecs": 777021
}
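
Once you have the cookie, the same endpoint is handy for feeding a single number into a script; for example, pulling the worker count from the output above:

curl --silent 'https://some.company.com/ui/api/stats' \
  -H "Cookie: Presto-UI-Token=$COOKIE_VALUE" | jq '.activeWorkers'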