VectorWise DB (Actian) PDT Issues

Ingres Vector(wise) uses a PDT (positional delta tree) to store in-memory updates to tables that have not yet been flushed to disk. When you execute incremental updates using standard SQL operations, instead of the Actian-recommended combine operations, errors sometimes occur because data in the PDT is used improperly. These errors are simply bugs in VectorWise, which Actian tends to address quickly – but knowing how to detect them and narrow your problem down to them is a useful skill when working with this product. It lets you create a useful bug report and move on to other problems so you don't waste too much of your own time.

When does this matter?

Usually these problems manifest when you are moving data (inserts, updates, or deletions) from one table with pending in-memory updates to another table. So, for example, if you stage your incremental updates in table X and go to merge them into table Y, you may find that table X's in-memory changes conflict with the data you are currently working with.

What are the visible effects?

PDT issues can cause a number of problems, as you might imagine. You may be executing an update and be told that the update is ambiguous because a record in your current data set and a record in the PDT both match the given key. Again, this should not happen, but it sometimes does due to various bugs. Similarly, you may be told that a duplicate TID exists while doing a deletion if a similar deletion occurred in the past in the same table and it's still in the PDT. Finally, you may also be told that a duplicate insertion is occurring for the same reasons. This last one can result in a primary key violation, or in you literally ending up with two records in your target table when only one was in your current data set (the PDT version and the staging-table version both get inserted).

How do I tell if PDTs currently exist?

Assuming we create two test tables and insert two records into table 'test_one', the command "vwinfo -M" displays information as follows:

 +--------------------------------+--------------------------------+--------------------+--------------------+--------------------+--------------------+
 |schema_name                     |table_name                      |pdt_mem             |pdt_inserts_count   |pdt_deletes_count   |pdt_modifies_count  |
 +--------------------------------+--------------------------------+--------------------+--------------------+--------------------+--------------------+
 |$ingres                         |iiprimitives                    |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_expr                   |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_expr_global            |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_io                     |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_io_global              |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_op                     |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_op_global              |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_parse_tree             |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_parse_tree_global      |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_query                  |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_query_global           |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_stage                  |                   0|                   0|                   0|                   0|
 |$ingres                         |iivwprof_stage_global           |                   0|                   0|                   0|                   0|
 |ingres                          |test_one                        |                2542|                   2|                   0|                   0|
 |ingres                          |test_two                        |                   0|                   0|                   0|                   0|
 +--------------------------------+--------------------------------+--------------------+--------------------+--------------------+--------------------+

Most of this is system-table output, but the bottom two lines show the tables created for this sample. We can see both inserts accounted for, as well as the overall memory used by the PDTs for each table. Table test_two uses none since it hasn't been altered yet. If we had performed deletions or updates over multiple transactions, they would be represented here as well. If you cancel out operations within the same transaction (insert and delete the same record, for example), those operations will not show up in the PDTs when the transaction is committed.

How can I flush the PDTs?

Since PDTs can cause problems, the best thing you can do when you know there is an issue is to flush them to disk, after which all values in this "vwinfo" output will be 0. Two operations can flush PDTs: the first is 'combine' and the second is 'truncate'. The combine command, "call vectorwise(combine 'tablename');", can be executed from any SQL prompt and will flush in-memory updates to disk. You can also use it to do table arithmetic like table_1 - table_2 + table_2, which deletes all of table_2's values from table_1 and then reinserts them. Actian actually recommends that you use combine to do incremental updates in the first place rather than using straight SQL. The second command, truncate, is of the form "modify tablename to truncated;" and can also be executed from any SQL prompt. You should not use the truncate command on your data tables, as it will wipe them out. You can use it on your staging tables after every transaction to ensure their PDTs are totally removed, though, assuming you're okay with the staging tables having no data between transactions.

A new article will be written to talk about the usage of combine in more detail, so please look for that if you are interested in learning more.

Java LRU Cache with LinkedHashMap

We created an LRU cache from scratch in an earlier article located here. That approach is somewhat error-prone and takes a fair amount of work, because you have to track additions to the doubly-linked list and the map in parallel. Handling all of the null/not-null and empty/full edge cases gets rather burdensome. Enter the Java LinkedHashMap.

The LinkedHashMap Class

The LinkedHashMap class provides, by default, functionality very similar to what we need to form an LRU cache. If we derive from it, we can pass the constructor args (capacity + 1, 1.1f, true). The first two arguments leave one extra element's worth of space and set a load factor high enough that the map never resizes, and the third argument tells it to keep its entries in access order in its backing linked list (least recently accessed first in iteration order, most recently accessed last). These arguments were suggested by Pete Ford in this article. We avoid the resize with 1.1f just for efficiency, and we leave one extra element's worth of space because elements are added before they are removed (so, for a 5-element cache, we would technically hold 6 elements for a moment when a new element is added, before the eldest is automatically removed). This brings us to our next point…
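As a quick aside, the access-order behavior can be seen in isolation with a plain LinkedHashMap. This is just a minimal demo of my own, separate from the cache class itself:

```java
import java.util.LinkedHashMap;
import java.util.Map;

//Minimal demo of LinkedHashMap's access-order mode: with the third
//constructor argument set to true, a get() moves the accessed entry
//to the end of the iteration order, so the entry printed first is
//always the least recently used one.
public class AccessOrderDemo {
    public static void main(String[] args) {
        Map<Integer, String> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");
        System.out.println(map); // {1=a, 2=b, 3=c}
        map.get(1);              // touch 1; it moves to the end
        System.out.println(map); // {2=b, 3=c, 1=a}
    }
}
```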

Automatic Removal

The automatic removal is built into LinkedHashMap as well, but we need to give it some help, as it doesn't apply the removal policy we need by default. So, we have to explicitly tell it to remove the eldest entry when its size exceeds the capacity we would like to be our maximum. We can tell the underlying LinkedHashMap to use this policy by overriding removeEldestEntry(Map.Entry<K, V> eldest), as you will see in the code below.

The Code

The following code, minus the main method (which is present only as an example), is only about 8 lines excluding whitespace and braces. It creates a class deriving from LinkedHashMap, saves its capacity, and overrides the remove-eldest-entry method as we discussed. Despite its simplicity, it provides all of the functionality required in our first article. So, enjoy!

 import java.util.LinkedHashMap;
 import java.util.Map;

 public class LRUCacheLHM<K,V> extends LinkedHashMap<K,V> {

     private int capacity;

     public LRUCacheLHM(int capacity) {
         //1 extra element as add happens before remove (101), and load factor big
         //enough to avoid triggering resize.  True = keep in access order.
         super(capacity + 1, 1.1f, true);
         this.capacity = capacity;
     }

     @Override
     protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
         return size() > capacity;
     }

     //--------------------------------------------------------------------
     // That's it; the rest is just a main method to show example usage.
     //--------------------------------------------------------------------

     public static void main(String[] args) {

         //Create cache and add 8 elements.  The first 3 (hey, how, are) should drop off.
         LRUCacheLHM<Integer, String> cache = new LRUCacheLHM<>(5);
         cache.put(0, "Hey");
         cache.put(1, "how");
         cache.put(2, "are");
         cache.put(3, "you");
         cache.put(4, "doing?");
         cache.put(5, "I'm");
         cache.put(6, "good");
         cache.put(7, "thanks");
         System.out.println(cache);

         //Should move 3/you to the head.
         cache.get(3);
         System.out.println(cache);

         //Should put 8/x at the head.
         cache.put(8, "x");
         System.out.println(cache);
     }
 }


The Output

Running this class's main() method will produce the following output, which lets you see the eldest items being removed and the most recently accessed items being moved to the end of the list. The toString() representation of the cache shows its iteration order, and thus its removal order: the entry printed first is the eldest and is removed first.

 {3=you, 4=doing?, 5=I'm, 6=good, 7=thanks}
 {4=doing?, 5=I'm, 6=good, 7=thanks, 3=you}
 {5=I'm, 6=good, 7=thanks, 3=you, 8=x}

Java LRU Cache (Simplistic Version)

This sample shows how to create an LRU cache, which is a map where the least recently used element is available in constant time.

How Does it Work?

Making an LRU cache can seem difficult, but there is a relatively simple trick to it. The key is in combining two data structures, a map and a linked list. Basically, we store a map and an integer to represent its maximum capacity. Whenever we add an item to the map, we also add its key to the front of the linked list. When the map is at its maximum capacity and we add another item, the last key in the linked list identifies the least recently used item in the map. So, we remove that least recently used item from both the map and the list, add the new item, and return the removed item to the caller. We also need to make sure to update the map and the list when items are accessed with get() or when they are removed. An item accessed with get() becomes the most recently used item and has to be promoted to the front of the list.

Also note that we are really tracking the most recently used item and the least recently used item at the same time. One is the front of the linked list and the other is the back of the linked list.

Sample Implementation

The following code is heavily commented and fully functional. You may double-click it to copy it, so that you can run it in your IDE of choice to help understand it better.

 import java.util.LinkedList;
 import java.util.Map;
 import java.util.TreeMap;

 //This data structure is designed to hold a limited-size cache
 //in the form of a map, and to track the most and least recently
 //used elements in the cache so that they can be retrieved in
 //constant time.
 public class LRUCache {

     //Private members for the capacity, the map itself, and the
     //list used to age elements.
     private final int capacity;
     private Map<String, String> cache;
     private LinkedList<String> ageList;

     //Constructor for initializing the cache.
     public LRUCache(final int capacity) {
         this.capacity = capacity;
         cache = new TreeMap<>();
         ageList = new LinkedList<>();
     }

     //Put a new element into the cache.
     public String put(final String key, final String value) {

         //If the cache is full, remove the oldest element and
         //save it as the result.  The oldest element is the
         //last element in the age list.
         String result = null;
         if (cache.size() >= capacity) {
             final String removed = ageList.removeLast();
             cache.remove(removed);
             result = removed;
         }

         //Add the new element to the cache and the front of the
         //age list.
         cache.put(key, value);
         ageList.addFirst(key);

         //Return the removed element, or null if the cache was
         //not yet full.
         return result;
     }

     //Retrieve an existing element from the cache.
     public String get(final String key) {

         //Try and retrieve the value from the cache.
         final String value = cache.get(key);

         //If we found the value, re-add it to the front of the
         //age list to record that it has been recently used.
         if (value != null) {
             ageList.remove(key);
             ageList.addFirst(key);
         }

         //Return the value, or null if it was not found.
         return value;
     }

     //Remove a value from the cache.  If the given value is
     //really found in the cache, make sure to remove it from
     //the age list as well.
     public String remove(final String key) {
         final String value = cache.remove(key);
         if (value != null) {
             ageList.remove(key);
         }
         return value;
     }

     //Get the most recently used element in the cache.
     public String getNewestElement() {
         return ageList.getFirst();
     }

     //Get the least recently used element in the cache.
     public String getOldestElement() {
         return ageList.getLast();
     }

     //Return the cache contents.
     @Override
     public String toString() {
         return cache.toString();
     }

     public static void main(String[] args) {
         final LRUCache cache = new LRUCache(10);
         cache.put("A", "Armadillo");
         cache.put("B", "Bronco");
         cache.put("C", "Camel");
         cache.put("D", "Dingo");
         cache.put("E", "Elephant");
         cache.put("F", "Frog");
         cache.put("G", "Giraffe");
         cache.put("H", "Hawk");
         cache.put("I", "Iguana");
         cache.put("J", "Jaguar");

         System.out.println(cache);
         cache.put("K", "Koala");
         System.out.println(cache);
         System.out.println("Looking up B = " + cache.get("B"));
         cache.put("L", "Lion");
         System.out.println(cache);
         System.out.println("Removing D = " + cache.remove("D"));
         System.out.println(cache);
         System.out.println("Adding M.");
         cache.put("M", "Monkey");
         System.out.println(cache);
         System.out.println("Adding N.");
         cache.put("N", "Newt");
         System.out.println(cache);
     }
 }

Output From Example

 {A=Armadillo, B=Bronco, C=Camel, D=Dingo, E=Elephant, F=Frog, G=Giraffe, H=Hawk, I=Iguana, J=Jaguar}
 {B=Bronco, C=Camel, D=Dingo, E=Elephant, F=Frog, G=Giraffe, H=Hawk, I=Iguana, J=Jaguar, K=Koala}
 Looking up B = Bronco
 {B=Bronco, D=Dingo, E=Elephant, F=Frog, G=Giraffe, H=Hawk, I=Iguana, J=Jaguar, K=Koala, L=Lion}
 Removing D = Dingo
 {B=Bronco, E=Elephant, F=Frog, G=Giraffe, H=Hawk, I=Iguana, J=Jaguar, K=Koala, L=Lion}
 Adding M.
 {B=Bronco, E=Elephant, F=Frog, G=Giraffe, H=Hawk, I=Iguana, J=Jaguar, K=Koala, L=Lion, M=Monkey}
 Adding N.
 {B=Bronco, F=Frog, G=Giraffe, H=Hawk, I=Iguana, J=Jaguar, K=Koala, L=Lion, M=Monkey, N=Newt}

Binary Tree – Is Balanced? – Java

This is a basic algorithm for determining whether or not a binary tree is balanced.

How it Works

This algorithm builds on the Binary Tree Depth problem, and is just slightly more complex. From the referenced tree depth problem, the depth of a binary tree is equal to the level of its deepest node.

A binary tree is considered balanced if, at every level, the left sub-tree is balanced, the right sub-tree is balanced, and the difference between the heights of the left and right sub-trees is no more than 1. For this to make more sense, consider that every node in a binary tree is itself the root of a smaller binary tree. In the simplest case, a leaf is a tree of size 1, and it is balanced because the depth of its left and right sub-trees is equal (0). So, if we ever find a (non-leaf) sub-tree where there is depth 2 on the left and 0 on the right, or vice-versa, that sub-tree is not balanced, and thus the entire tree is not balanced. In essence, we're just recursively making sure that the depths of the sub-trees at any level differ by no more than 1.

Sample Implementation

Here is a sample implementation. You may double-click to copy the contents of the sample into your IDE so that you may play with it in a debugger to learn more.

public class BinaryTree {

     //A simple class to represent a node in the binary tree.
     private static class Node {
         public Node(int v) { this.value = v; }
         int value;
         Node left;
         Node right;
     }

     //This main method demonstrates the isBalanced function.
     public static void main(String[] args) {

         //Create a tree where the largest depth is 4 on the left.
         Node root = new Node(5);
         root.left = new Node(3);
         root.left.right = new Node(4);
         root.left.left = new Node(2);
         root.left.left.left = new Node(1);

         //We'll give the right side maximum depth 3.
         root.right = new Node(7);
         root.right.left = new Node(6);
         root.right.right = new Node(8);

         //Check if the tree is balanced (it is).
         System.out.println(isBalanced(root));

         //Unbalance the tree and recheck (will fail).
         root.left.left.left.left = new Node(0);
         System.out.println(isBalanced(root));
     }

     //Determine if the given tree is balanced by recursively
     //checking if each subtree is balanced, and if the depth
     //of the subtrees varies by no more than 1.
     public static boolean isBalanced(Node root) {

         //A null tree is considered balanced.
         if (root == null) return true;

         return
             //The depth difference between the  left and right
             //subtrees is no more than 1.
             (Math.abs(depth(root.left) - depth(root.right)) < 2)

             //and both subtrees are balanced.
             && isBalanced(root.left) && isBalanced(root.right);
     }

     //This is a recursive function for finding the depth of a tree.
     //The trick here is that as we recurse down to the next level
     //of the tree in each case, we add 1 to the depth.  We also
     //make sure that we only track the maximum depth between the
     //two subtrees as that is all we care about.
     public static int depth(Node root) {
         if (root == null) return 0;
         return 1 + Math.max(depth(root.left), depth(root.right));
     }
 }

Output of Example

This example simply outputs “true \n false” as the tree is balanced in the first check, and it is not balanced in the second check.
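As a footnote, the implementation above calls depth() inside isBalanced() at every node, so sub-trees get measured over and over. A common refinement, sketched here as a supplement rather than as part of the original code, computes the depth and the balance check in one pass by returning -1 as an "unbalanced" sentinel:

```java
public class BalancedOnePass {

    //Same simple node class as in the article.
    static class Node {
        int value;
        Node left;
        Node right;
        Node(int v) { this.value = v; }
    }

    //Returns the depth of the tree, or -1 if any sub-tree is
    //unbalanced.  Each node is visited exactly once, so this
    //runs in O(n).
    private static int checkDepth(Node root) {
        if (root == null) return 0;
        int left = checkDepth(root.left);
        if (left == -1) return -1;
        int right = checkDepth(root.right);
        if (right == -1) return -1;
        if (Math.abs(left - right) > 1) return -1;
        return 1 + Math.max(left, right);
    }

    public static boolean isBalanced(Node root) {
        return checkDepth(root) != -1;
    }

    public static void main(String[] args) {
        //Balanced: left depth 2, right depth 1.
        Node root = new Node(5);
        root.left = new Node(3);
        root.left.left = new Node(2);
        root.right = new Node(7);
        System.out.println(isBalanced(root)); // true

        //Extend the left side to depth 4; now unbalanced.
        root.left.left.left = new Node(1);
        root.left.left.left.left = new Node(0);
        System.out.println(isBalanced(root)); // false
    }
}
```

The behavior is identical to the recursive version above; it just avoids recomputing depths, which matters mostly on large trees.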

Binary Tree Depth – Java

This is a basic algorithm for finding the depth of a binary tree.

How it Works

The depth of a binary tree is defined as being the level of its deepest node. So, for example, a tree that is just a root has depth = 1. A tree whose root has a left node and a right node has depth = 2 as either node is just one from the root. A tree whose root has a left node, that also has its own left node has depth = 3, and so on. Only the deepest path is relevant in determining the tree depth.

The algorithm that we use to get the depth is very concise. All it does is recursively traverse the tree while tracking the count at each level. The result at each level is 1 plus the maximum depth of the left and right sub-trees. So, at the root, we compute count = 1 + max(depth(left-subtree), depth(right-subtree)). We continue this, adding one at each level, until we reach the leaf nodes, where the recursion bottoms out. Since we only retain the maximum count at each level, by the time the original call (which started with the root node) returns, we are left with only the maximum count, i.e. the depth, of the tree.

Sample Implementation

Here is a sample implementation. You may double-click to copy the contents of the sample into your IDE so that you may play with it in a debugger to learn more.

 public class BinaryTreeDepth {

     //A simple class to represent a node in the binary tree.
     private static class Node {
         public Node(int v) { this.value = v; }
         int value;
         Node left;
         Node right;
     }

     //This main method demonstrates the algorithm's use.
     public static void main(String[] args) {

         //Create a tree where the largest depth is 5 on the left.
         Node root = new Node(5);
         root.left = new Node(3);
         root.left.right = new Node(4);
         root.left.left = new Node(2);
         root.left.left.left = new Node(1);
         root.left.left.left.left = new Node(0);

         //We'll give the right side maximum depth 3.
         root.right = new Node(7);
         root.right.left = new Node(6);
         root.right.right = new Node(8);

         //Print the depth of the tree.
         System.out.println(depth(root));
     }

     //This is a recursive function for finding the depth of a tree.
     //The trick here is that as we recurse down to the next level
     //of the tree in each case, we add 1 to the depth.  We also
     //make sure that we only track the maximum depth between the
     //two subtrees as that is all we care about.
     public static int depth(Node root) {
         if (root == null) return 0;
         return 1 + Math.max(depth(root.left), depth(root.right));
     }
 }

Binary Tree – Mirror Image Algorithm

This algorithm is designed to take in a binary tree (not necessarily a BST/Binary Search Tree) and reverse the nodes at each level to generate the mirror image of that binary tree.

How Does it Work?

The mirror image of a binary tree is simply the result of reversing the children at every level. So, the trick is to recursively visit every node, starting with the root and descending into its left and right sub-trees. At every node visited, including the root itself, you just swap the left and right sub-trees. When a null node is reached, the recursion returns.

Sample Implementation

Here is a sample implementation of the algorithm. It is fully working code, so you may double click on it and copy the whole example into your preferred IDE to test it and debug it for better understanding. Sample output is provided below the sample.

 //Algorithm to get the mirror image of a binary tree and
 //print out the before and after images.
 public class MirrorTree {

     private static class Node {
         public int value;
         public Node left;
         public Node right;
         public Node(int value) { this.value = value; }
     }

     //Example of algorithm being used.
     public static void main(String[] args) {

         //Create a tree on which to run the mirror algorithm.
         Node root = new Node(1);
         root.left = new Node(2);
         root.left.left = new Node(3);
         root.left.right = new Node(4);
         root.right = new Node(5);
         root.right.right = new Node(6);
         root.right.right.right = new Node(7);

         //Print out the original tree.
         System.out.println("Tree before mirror:");
         printTree(root);

         //Run the mirror algorithm.
         mirror(root);

         //Print out the resulting tree.
         System.out.println("\nTree after mirror:");
         printTree(root);
     }

     //The mirror algorithm itself.
     public static void mirror(Node root) {

         //Terminate when the root is null.
         if (root == null) return;

         //Swap the left and right subtrees.
         Node tmp = root.left;
         root.left = root.right;
         root.right = tmp;

         //Recursively repeat on left and right subtrees.
         mirror(root.left);
         mirror(root.right);
     }

     //Utility function to print the binary tree as an example.
     public static void printTree(Node root) {
         if (root == null) return;
         printTree(root, 0);
     }

     //Private utility function for formatting the tree display.
     private static void printTree(Node root, int level) {
         if (root == null) return;
         printTree(root.left, level + 1);
         for (int i = 0; i < level; ++i) System.out.print("  ");
         System.out.print(root.value + "\n");
         printTree(root.right, level + 1);
     }
 }


Output From Example

This is an admittedly lazy way of printing a binary tree, but if you read it as though the left edge of the screen were the top of the tree, it gets the job done. You can see that every node in the second printout is in the mirror-image position of its original.

 Tree before mirror:
     3
   2
     4
 1
   5
     6
       7
 
 Tree after mirror:
       7
     6
   5
 1
     4
   2
     3


Merge Sort Algorithm in Java

This is a standard implementation of the merge-sort algorithm. It is heavily commented to help you understand it.

Merge-sort is a divide and conquer based algorithm with a very simple premise.  It keeps dividing the provided data set in half until only 1 element remains in each recursive call.  Then it merges the sorted sub-sections of the array on the way back up.

  1. Take an array as input.
  2. Recursively divide the array in half and merge sort each half (so, one recursive call in each invocation for the left half of the array, and one for the right half).
  3. When just one element remains, return that element.
  4. After the recursive calls, merge the two halves.  Following the recursive logic, each half will already be sorted in itself.  So, the merge just requires repeatedly taking the next smallest element from either half until both are exhausted.


 import java.util.Arrays;

 public class MergeSort {

     public static void main(String[] args) {
         int[] arr1 = {1,7,2,9,3,5,2,3,4,7,8,9,6,1,3};
         int[] arr2 = {1,7,2,9,3,5,2,3,4,7,8,9,6,1,3};
         System.out.println("Starting array (unsorted):");
         System.out.println(Arrays.toString(arr1));
         mergeSort(arr1);
         System.out.println("Personal mergesort result:");
         System.out.println(Arrays.toString(arr1));
         Arrays.sort(arr2);
         System.out.println("Built in sort result:");
         System.out.println(Arrays.toString(arr2));

     }

     public static void mergeSort(int[] a) {
         mergeSort(a, new int[a.length], 0, a.length - 1);
     }

     private static void mergeSort(int[] array, int[] buffer,
             int start, int end) {

          //Continue until start is equal to end (i.e. there is
          //only one element left).
         if (start < end) {

              //Calculate the midpoint.  The unsigned shift (>>> 1)
              //also avoids overflow if start + end exceeds
              //Integer.MAX_VALUE.
              int mid = (start + end) >>> 1;

             //Recurse and sort the lower and upper halves of the
             //array.
             mergeSort(array, buffer, start, mid);
             mergeSort(array, buffer, mid + 1, end);

             //Merge the resulting arrays together.
             merge(array, buffer, start, mid, end);
         }
     }

     private static void merge(int[] array, int[] buffer, int start,
            int mid, int end) {

          //Copy the full range of elements from the array to the
          //temporary buffer.
         for (int i = start; i <= end; ++i) buffer[i] = array[i];

         int low = start;
         int high = mid + 1;
         int curr = start;

          //While there are still low-half and high-half elements to
          //compare, take the smaller one and add it to the real
          //array from the helper buffer.  Must track lower array
          //index, upper array index, and real array index.  Also
          //must remember that this ends when we get to the end of
          //either half of the array.
         while (low <= mid && high <= end) {
             if (buffer[low] <= buffer[high])
                array[curr++] = buffer[low++];
             else 
                array[curr++] = buffer[high++];
         }

          //We got to the end of one of the halves first.  If it was
          //the bottom half, then the upper half is already in the
          //correct position, which is fine.  If we got to the end of
          //the upper half first, then we need to copy the bottom
          //half's remainder to the end.
         while (low <= mid) {
             array[curr++] = buffer[low++];
         }
     }
 }