Java Regex Capture/Extract Multiple Values

Use Case

When you need to parse complex log lines or pull fields out of structured strings, regular expression capture groups are one of the most useful tools available. Each parenthesized part of the pattern captures one value, so a single match can extract several fields at once.

This example comes from work, where I had to parse and analyze logs from jobs that load data into a database. A sample log line looks like this:

/data/SXF_SX_4906_2019-04-13.01.43.24.143.log:2019-04-13 01:43:28,320 INFO com.x.dc.db.schemagen.batch.listener.JobResultListener [tx.id=IF-TX-ID-a23c195c-673a-47ab-ab0c-7b8591821169] [main] Inside sendEmailNotification method: subject is prod alert:DB copy job STARTED for the dataset:4906

The Code

The relevant part of the code is here:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

//Literal dots are escaped (\\.) so they match only a '.' and not any character.
private static final String capturePattern =
"^/.*/SXF_SX_(\\d+)_(\\d{4}-\\d{2}-\\d{2}\\.\\d{2}\\.\\d{2}\\.\\d{2}\\.\\d{3})\\.log:(.*) INFO.*" +
"copy job (.*) for the dataset:.*";

//Leaving out the rest of the class; this is just the regex parsing portion.
//isValid, fullLogEntry, dataSetId, fileTimestamp, logTimestamp, status are all
//member variables of the class this constructor belongs to.
public DbLoadLog(String line) {

    isValid = true;

    Pattern r = Pattern.compile(capturePattern);
    Matcher m = r.matcher(line);

    //If you wanted to run over a multi-line string, you could put m.find() in a
    //while loop and keep going (compile with Pattern.MULTILINE so ^ matches at
    //each line start); but I'm just analyzing specific lines.
    if (m.find()) {
        fullLogEntry = line;
        dataSetId = Integer.valueOf(m.group(1));
        fileTimestamp = m.group(2);
        logTimestamp = m.group(3);
        status = m.group(4);
    }
    else {
        isValid = false;
    }
}
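
For anyone who wants to try the multi-line variant mentioned in the comment above, here is a minimal, self-contained sketch. It is not the code I used at work: the class name DbLoadLogDemo, the extractAll helper, and the pipe-joined output format are all made up for illustration. It also swaps the numbered groups for named groups, which is an optional variation, and assumes the literal dots in the file name should be escaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and output format are illustrative only.
class DbLoadLogDemo {

    // The sample log line from this post, split across string literals for readability.
    static final String SAMPLE =
        "/data/SXF_SX_4906_2019-04-13.01.43.24.143.log:2019-04-13 01:43:28,320 INFO "
        + "com.x.dc.db.schemagen.batch.listener.JobResultListener "
        + "[tx.id=IF-TX-ID-a23c195c-673a-47ab-ab0c-7b8591821169] [main] "
        + "Inside sendEmailNotification method: subject is prod alert:"
        + "DB copy job STARTED for the dataset:4906";

    // Same shape as the pattern above, using named groups instead of numbers.
    // Compiled once; MULTILINE makes ^ and $ match at every line boundary.
    static final Pattern LOG_LINE = Pattern.compile(
        "^/.*/SXF_SX_(?<dataSetId>\\d+)_"
        + "(?<fileTimestamp>\\d{4}-\\d{2}-\\d{2}\\.\\d{2}\\.\\d{2}\\.\\d{2}\\.\\d{3})\\.log:"
        + "(?<logTimestamp>.*) INFO.*copy job (?<status>.*) for the dataset:.*$",
        Pattern.MULTILINE);

    // Returns one "dataSetId|fileTimestamp|logTimestamp|status" entry per matching line.
    static List<String> extractAll(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = LOG_LINE.matcher(text);
        // Each find() call resumes where the previous match ended.
        while (m.find()) {
            results.add(String.join("|",
                m.group("dataSetId"),
                m.group("fileTimestamp"),
                m.group("logTimestamp"),
                m.group("status")));
        }
        return results;
    }

    public static void main(String[] args) {
        // Two matching lines with a non-matching line in between.
        String text = SAMPLE + "\nthis line does not match\n" + SAMPLE;
        for (String entry : extractAll(text)) {
            System.out.println(entry);
        }
    }
}
```

Named groups cost a little extra typing in the pattern, but m.group("status") keeps working if you later add or reorder groups, whereas m.group(4) would silently point at a different value.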