Is Presto Slow When Returning Millions of Records / Big Result Set?

What Was Our Problem?

We were having issues with people reporting that Presto was slow when they were exporting hundreds of millions of records from much larger tables.  The queries were simple where clause filters selecting a few fields from some hundred-billion record tables.

Was It Actually Slow?

No! At least, not when parallelized well and tested properly.  I wrote a Java app to

  • Split a query into N =100 parts by using where clauses with a modulus on an integer column.
  • Query presto in parallel with 30 threads, going through the 100 queries.
  • Output results to standard out.

In an external bash script, I also grepped the results just to show some statistics I outputted.

This was slow!

Why Was it Slow?

First of all, let’s talk about the presto cluster setup:

  • 1 coordinator.
  • 15 workers.
  • All m5.8xlarge = 128GB RAM / 32 processor cores.

This is pretty decent.  So, what could our bottlenecks  be?

  1. Reading from s3.
  2. Processing results in workers.
  3. Slow coordinator due to having 15 workers talk through it.
  4. Slow consumer (our client running the queries).

To rule out 1/2/3 respectively I:

  1. Did a count(*) query which would force a scan over all relevant s3 data.  It came back pretty fast (in 15 seconds or so).
  2. Added more workers.  Having more workers  had minimal effect on the final timings, so we’re not worker bound.
  3. Switched the coordinator to a very large, compute-optimized node type.  This had minimal effect on the timings as well.

So, the problem appears to be with the client!

Why Was the Client Slow?

Our client really wasn’t doing a lot.  It was running 30 parallel queries and outputting the results, which were being grepped.  It was a similarly sized node to our presto coordinator, and it had plenty of CPU, RAM, decent network and disks (EBS).

It turned out though that once we stopped doing the grep and once we stopped writing the results to stdout, and we just held counters/statistics on the results we read, it went from ~25 minutes to ~2 minutes.

If we had run this in Spark or some other engine with good parallel behavior, we would have seen the workload distribute better over more nodes with sufficient ability to parallel process their portions of the records.  But, since we were running on a single node, with all results, the threads/CPU and other resoruces we were using capped out and could not go any faster.

Note: we did not see the client server as having high utilization, but some threads were at 100%.  So, the client app likely had a bottleneck we could avoid if we improved it.

Summary

So… next time you think presto can’t handle returning large numbers of results from the coordinator, take some time to evaluate your testing methodology.  Presto isn’t designed to route hundreds of millions of results, but it does it quite well in our experience.

 

Add a New UDF SQL Function to Presto – Support unix_timestamp From Hive

Some Context

We deploy presto in-house and manage a fork of the code base so that we can introduce company-specific features for auditing, security, or integrations.

One of the main features we provided is support for hive views (which I now see is coming into presto-sql version 330 (but that’s not really relevant here).

https://prestosql.io/docs/current/release/release-330.html

Add experimental support for executing basic Hive views. To enable this feature, the hive.views-execution.enabled configuration property must be set to true. (#2715)

A recent feature request for our implementation was to add support for the unix_timestamp function so that this expression worked:

"unix_timestamp"(a.creation_datetime, 'yyyy-MM-dd')

The Code

There is some good information on making functions here: https://prestosql.github.io/docs.prestosql.io/current/develop/functions.html.  But I mostly just referenced other DateTime functions in the DateTimeFunctions class within the presto code base (which also happens to be where I added the new unix_timestamp function implementation shown below.

    @ScalarFunction("unix_timestamp")
    @LiteralParameters({"x", "y"})
    @SqlType(StandardTypes.DOUBLE)
    public static double unixTimestamp(@SqlType("varchar(x)") Slice datetime, @SqlType("varchar(y)") Slice formatString)
    {
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat(formatString.toStringUtf8());
        simpleDateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date parse = simpleDateFormat.parse(datetime.toStringUtf8());
            return parse.toInstant().toEpochMilli() / 1000d;
        }
        catch (ParseException e) {
            throw new PrestoException(INVALID_FUNCTION_ARGUMENT, "'" + datetime.toStringUtf8() +
                    " cannot be parsed by format string " + formatString.toStringUtf8() + ".");
        }
    }

This function takes a varchar date or column, and a varchar format string, and it parses out the date in UTC to the unix time (seconds since 1970). This should follow the hive behavior for this overload (hopefully!).

Now, you can call the function (for this overload) in presto-sql just like you can in hive. We still have to do more testing and we need to provide more overloads, but this is a working implementation, so I hope it helps you do the same.

Launch Spring-Boot JAR From Different Main Class

I found this very useful stack overflow:

https://stackoverflow.com/a/36552613/857994

It shows you how to start a spring-boot JAR from a different main class.  It’s a quirky solution, but it worked great.  Here’s a slightly more obvious/cleaner copy of the command (mostly for my own future reference):

java -cp presto-ws-3.2.2.jar -Dloader.main=com.company.PrestoQueryRunner org.springframework.boot.loader.PropertiesLauncher

Spring Boot @Autowired Abstract/Base Class

Properly using (and not overusing) abstract classes is vital to having a cleanly organized and DRY application.  Developers tend to get confused when trying to inject services into abstract or base classes in Spring Boot though.

Setter injection actually works fine in base classes; but as noted here, you should  make sure to use the final keyword on the setter to prevent it from being overridden.

public abstract class AbstractSchemaService {
    ...
    private StringShorteningService shortener;

    @Autowired
    public final void setStringShorteningService(StringShorteningService shortener) {
        this.shortener = shortener;
    }
    ...
}

Java – Get Parent Class Private Variable in Sub Class

I’m working on making a derived version of the Presto Hive plugin.  Unfortunately, the base class I need to inherit from uses its own private class loader, and the function I need to override (which is override-able!) for some reason requires that class loader as a parameter.

Anyway, long story short, I need to get the parent object’s private field to use it in the sub class I’m creating.  Reflection to the rescue!

Note: This is not generally good programming practice. Understand what the code does and why it does it before doing this.

Solution

//Class A file.

public class ClassA {
    private String name;
    public ClassA() {
        this.name = "Hello World!";
    }
}

// Class B file.

import java.lang.reflect.Field;

public class ClassB extends ClassA {
    public ClassB() {
        super();
    }

    public void printSuperPrivateMember() throws Exception {
        Field nameField = ClassA.class.getDeclaredField("name");
        nameField.setAccessible(true);
        System.out.println((String) nameField.get(this));
    }

    public static void main(String[] args) throws Exception {
        ClassB b = new ClassB();
        b.printSuperPrivateMember();
    }
}

Spring Time out REST HTTP Calls With RestTemplate

No Timeouts By Default!

Spring’s RestTemplate is an extremely convenient way to make REST calls to web services.  But most people don’t realize initially that these calls have no timeout by default.  This means no connection timeout and no data call timeout.  So, potentially, your app can make a call that should take 1 second and could freeze up for a very long time if the back end is behaving badly.

Setting a Timeout

There are a lot of ways of doing this, but the best one I’ve seen recently (from this stackoverflow post) is to create the RestTemplate in an @Configuration class and then inject it into your services.  That way you know the RestTemplate you are using everywhere was configured properly with your desired timeouts.

Here is a full example.

package com.company.cloudops.config;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;
import java.time.Duration;

@Configuration
public class AppConfig {

    @Value("${rest.template.timeout}") private int restTemplateTimeoutMs;

    @Bean
    public RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder
                .setConnectTimeout(Duration.ofMillis(restTemplateTimeoutMs))
                .setReadTimeout(Duration.ofMillis(restTemplateTimeoutMs))
                .build();
    }
}

To use this RestTemplate in another Spring bean class, just pull it in with:

@Autowired private RestTemplate template;

Connection Pooling With Spring 2.0 Hikari – Verify Idle Timeouts are Working

Use Case

I’ve been working on an odd API project where each user needs their own connection to various back-end databases/data-sources.  This is a break from the norm because in general, you set up a connection pool of, say, 10 connections and everyone shares it and you’re golden.

If you have 500 users throughout the day though and each one gets some connections, that would be a disaster.  So, in my case making sure the pool is of limited size and making sure the idle timeout works is pretty vital.  So, I started playing around to see how I can verify old connections are really being removed.

My Configuration

I had started with an Apache BasicDataSource (old habits die hard).  But then when I enabled debug I didn’t see connections being dropped, or info on them being logged at all for that matter.  Before bothering with trace, I started reading about Hikari which is a connection pool I see spring using a lot… and it looked pretty awesome! See some good performance and usage info right here.

Anyway! I switched to Hikari quick which was easy since its already in Spring Boot 2.X (which I habitually use for everything these days).

Here’s my Spring config class/code. I have it set in properties to allow a minimum of 0 connections, to time out connections after 60 seconds, and to have a maximum of 4 connections. Connections are tested with “select 1” which is pretty safe on most databases.

@Configuration
public class Config {

    //Configuration for our general audit data source.
    private @Value("${audit.ds.url}") String auditDsUrl;
    private @Value("${audit.ds.user}") String auditDsUser;
    private @Value("${audit.ds.password}") String auditDsPassword;

    @Bean
    public DataSource auditDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(auditDsUrl);
        config.setUsername(auditDsUser);
        config.setPassword(auditDsPassword);
        config.setMaximumPoolSize(4);
        config.setMinimumIdle(0);
        config.setIdleTimeout(60000);
        config.setConnectionTestQuery("select 1");
        config.setPoolName("Audit Pool");
        config.setValidationTimeout(10000);
        return new HikariDataSource(config);
    }

    @Bean
    public NamedParameterJdbcTemplate auditJdbcTemplate() {
        return new NamedParameterJdbcTemplate(auditDataSource());
    }
}

Verifying it Works

After sending a query to my API, where it uses a basic JDBC template to execute the query, I see the logs do this (note that I removed the date/time/class/etc for brevity).

Audit Pool - Before cleanup stats (total=0, active=0, idle=0, waiting=0)
Audit Pool - After cleanup stats (total=0, active=0, idle=0, waiting=0)
Audit Pool - Before cleanup stats (total=1, active=0, idle=1, waiting=0)
Audit Pool - After cleanup stats (total=1, active=0, idle=1, waiting=0)
Audit Pool - Before cleanup stats (total=1, active=0, idle=1, waiting=0)
Audit Pool - After cleanup stats (total=1, active=0, idle=1, waiting=0)
Audit Pool - After cleanup stats (total=0, active=0, idle=0, waiting=0)
Audit Pool - Closing connection PG...: (connection has passed idleTimeout)

So, we can see that it went from 0 connections total, to 1 connection total. The connection looks idle pretty quick because it was a short query that was done before the regular output log. Then after a minute, the connection gets closed and the total goes back to 0.

So, we’re correctly timing out idle connections using our settings. Also, we’re getting our pool name (Audit Pool) in the logs which is awesome too!