Querying LDAP From Python the Easy Way

Historically, using LDAP in Python could be fairly painful because you had to install python-ldap, which could be hard depending on your environment. For example, getting it installed in a Jupyter notebook where I work proved impossible without changing the notebook's underlying Docker image.

Most search results will still lead you to python-ldap, but now you can and should use ldap3 instead. This library is pure Python and does not have any awkward OS dependencies, so it “just works” and is much lighter.

Here is an example of how to login with a service account and query a user via email.

import ldap3

# Connection parameters; fill these in for your environment.
SERVICE_ACCOUNT = "<user>"
SERVICE_ACCOUNT_PASSWORD = "<password>"
LDAP_URI = "ldaps://<your-ldap-dns>:636"

search_base = 'DC=foo,DC=bar'
search_filter = '(&(mail=john.doe@somecompany.com))'
attrs = ["*"]  # "*" returns all attributes on the matched entries.

server = ldap3.Server(LDAP_URI)
with ldap3.Connection(server, auto_bind=True, user=SERVICE_ACCOUNT, password=SERVICE_ACCOUNT_PASSWORD) as conn:
    conn.search(search_base, search_filter, attributes=attrs)
    print(len(conn.entries))

To run this, you just have to do a quick pip install as shown below. I recommend you use the latest version; I locked it here just to remind you that locking a version is smart in most Python projects, since version drift causes many production issues.

pip install ldap3==2.9.1
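
Once the search completes, conn.entries holds the matching results as ldap3 Entry objects. Here is a minimal sketch of reading them back inside the same with block as above; treat it as illustrative rather than a drop-in:

with ldap3.Connection(server, auto_bind=True, user=SERVICE_ACCOUNT, password=SERVICE_ACCOUNT_PASSWORD) as conn:
    conn.search(search_base, search_filter, attributes=attrs)
    for entry in conn.entries:
        print(entry.entry_dn)                   # Distinguished name of the match.
        print(entry.entry_attributes_as_dict)   # All returned attributes as a plain dict.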

Using Presto/Trino CLI w/ TLS & Passwords Enabled

Not much of a post here; I'm just recording a good way to automatically run queries in the Presto CLI.

./presto-cli-350-executable.jar \
      --server https://your.cluster.dns:443 \
      --catalog hive \
      --schema default \
      --client-request-timeout "10s" \
      --user "john.humphreys" \
      --password \
      --execute "select * from some_db.some_table limit 10"

If you need to supply the password non-interactively as well, you can pipe it into the command with the yes command in Linux. I also suspect there is an environment variable you can pre-set for it, but I didn’t dig in to double-check.

Python – Find IPs for DNS Name

We were recently trying to find all the IPs we needed to open in a firewall for traffic from an Apache proxy. So, we had to resolve a huge number of DNS names to IPs (and relevant ports) programmatically.

I found this very elegant way of getting all the IPs for a DNS name; I hope you find it useful!

import socket

# getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples,
# and sockaddr[0] is the IP address.
net_info = socket.getaddrinfo("stackoverflow.com", None)
ip_list = {info[4][0] for info in net_info}
print(ip_list)

Output:

{'151.101.129.69', '151.101.1.69', '151.101.193.69', '151.101.65.69'}
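
We actually had a long list of hostnames to run through, so we just wrapped the same call in a loop. A rough sketch of that (the hostnames below are placeholders for your own list):

import socket

# Placeholder list; swap in the hostnames you need to open in the firewall.
hostnames = ["stackoverflow.com", "serverfault.com"]

ips_by_host = {
    host: {info[4][0] for info in socket.getaddrinfo(host, None)}
    for host in hostnames
}
print(ips_by_host)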

Using Athena From DBeaver with your IAM Role / Profile

I just spent about 30 minutes working out how to connect DBeaver to Athena using my normal AWS credentials file / default credentials.

Thankfully I stumbled across this GitHub comment, and it worked like a charm: https://github.com/dbeaver/dbeaver/issues/3918#issuecomment-511484596.

Here are the relevant notes (slightly modified for easier understanding):

  1. Do your normal AWS login process to refresh your credentials (in our case, we use okta + gimme_aws_creds for this).
  2. Go to driver properties on your DBeaver Athena connection and set:
    • AwsCredentialsProviderClass to com.simba.athena.amazonaws.auth.profile.ProfileCredentialsProvider
    • AwsCredentialsProviderArguments to the name of the profile you want to use (check ~/.aws/config to see which profiles you have) – we use “default”.
  3. Test Connection and it should work.
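
If Test Connection fails, it can help to confirm the profile itself works outside DBeaver first. Here is a quick sanity check with boto3 (my own addition; any AWS CLI/SDK call against the profile would do, and “default” is just the profile name we happen to use):

import boto3

# Uses the same ~/.aws credentials/config files the Athena driver will read.
session = boto3.Session(profile_name="default")
print(session.client("sts").get_caller_identity()["Arn"])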

Manually Load Docker Image – Avoid Rate Limiting

You can manually load a Docker image onto a server when you need to. This is useful for getting around Docker Hub rate limiting in an urgent situation.

# On your laptop.
docker pull busybox:1.34.1
docker save busybox:1.34.1 > busybox-1-34-1.tar
aws s3 cp busybox-1-34-1.tar s3://your-s3-bucket/busybox-1-34-1.tar

# On remote node.
aws s3 cp s3://your-s3-bucket/busybox-1-34-1.tar /tmp/busybox-1-34-1.tar
docker load -i /tmp/busybox-1-34-1.tar

You can use anything that both your laptop and the target host have access to; I just used S3 because it was most convenient in my case. We have SSH disabled on our production nodes; otherwise you could have just scp’d it across instead.