Debugging HBase Unit Tests

This is likely an obvious process for those who use IDEs and develop in Maven daily, but for those who do operations or otherwise need to work on the JUnit tests in HBase only infrequently, here is how I worked when submitting a patch for HBASE-16700.

First, create your code:

Here I was adding a MasterObserver coprocessor to HBase, so I could work relatively easily writing my code as it was simply one class. I was able to do the following — very crude — workflow:

  1. Add the following to my HBase master’s hbase-site.xml:
    <property>
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.hadoop.hbase.security.access.AccessController</value>
    </property>

  2. export CLASSPATH=$(hbase classpath)
  3. vi <my code>.java
  4. javac <my code>.java
  5. Copy my class file into my HBase master’s lib directory
  6. Restart my HBase master

Next, create a test:

This is the novel part for operators. You simply need to create a file under the relevant directory for the feature you are committing but, in traditional Java fashion, it will be under src/test while your feature will go under src/main. HBase has some guidelines on writing a test. Similarly, a useful class for writing hbase-server tests which need a minicluster is HBaseTestingUtility. Remember to write positive and negative tests (prove that your code does what you expect and handles unexpected operations gracefully).

Testing your test

To test your test you can ask Maven to run a build and test just your test class via the following: mvn -X test '-Dtest=org.apache.hadoop.hbase.security.access.TestCoprocessorWhitelistMasterObserver'.

Now, the -X is not necessary; it runs Maven in debug mode, which is useful here. To see the log output of your test you will want to run it standalone, and Maven in debug mode gives you the proper incantation with the classes it built. You will see a line akin to the following while your test is forked off: Forking command line: /bin/sh -c cd hbase/hbase-server && /usr/lib/jvm/jdk1.8.0_101/jre/bin/java -enableassertions -Dhbase.build.id=2016-11-13T22:53:41Z -Xmx2800m -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -jar hbase/hbase-server/target/surefire/surefirebooter5454815236698078750.jar hbase/hbase-server/target/surefire/surefire4890497615179486565tmp hbase/hbase-server/target/surefire/surefire_09143864480388952525tmp
Running org.apache.hadoop.hbase.security.access.TestCoprocessorWhitelistMasterObserver

This line is useful as you can copy-and-paste it to run your test manually, which lets you see the log output. With this line you can also attach a debugger!

Attaching a debugger

To attach a debugger, one needs to launch Java with some options for it to wait until the debugger attaches. I was using the particular incantation: export JAVA_DEBUG='-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000'. I would simply add $JAVA_DEBUG just after the path to the java binary in the forked command line. Here we ask the process to listen on port 8000 for the debugger to connect (and, as one could guess, suspend=n will not wait for a debugger to connect).
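
As a rough illustration of that splice (a sketch only; the helper name add_debug_opts and the abbreviated command line are my own assumptions, not anything Maven or surefire provides), the following Ruby inserts the debug options right after the first token ending in /java:

```ruby
# Hypothetical helper: splice JDWP options into a forked test command line.
JAVA_DEBUG = '-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000'

def add_debug_opts(command_line, debug_opts = JAVA_DEBUG)
  # Match the first whitespace-delimited token ending in "/java" and
  # re-emit it followed by the debug options.
  command_line.sub(%r{(\S*/java)(\s)}) { "#{$1} #{debug_opts}#{$2}" }
end

# Abbreviated, made-up stand-in for the "Forking command line" output.
forked = '/bin/sh -c cd hbase/hbase-server && ' \
         '/usr/lib/jvm/jdk1.8.0_101/jre/bin/java -enableassertions -jar surefirebooter.jar'
puts add_debug_opts(forked)
```

The resulting string can be pasted into a shell; with suspend=y the JVM should then wait for a debugger to attach on port 8000.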

Java ships a command-line debugger (jdb) but it has no command line history or class tab completion which is a pain. I used Andrew Pimlott’s rlwrap-jdb to provide these features. I could spin up a debugger with: CLASSPATH=hbase/hbase-server/target/test-classes/org/apache/hadoop/hbase/security/access/:hbase/hbase-server/target/ ./list-java-breakpoints 2>/dev/null > breakpoints_file && ./rlwrap-jdb --breakpoints-file breakpoints_file jdb -attach 8000.

Running a debugger on an already running process

As a side-note, getting familiar with the debugger is quite useful, as one can use this on production systems to inspect an already running Java daemon. From the JPDA Connection and Invocation documentation one can track down a number of Java debugger connector processes. The useful one for an already running process is the SA PID Attaching Connector run via jdb -connect sun.jvm.hotspot.jdi.SAPIDAttachingConnector:pid=<pid>.

Similarly, today I often take jmap -dump:format=b,file=<filename> dumps of misbehaving Java processes for later analysis with jhat but figure in the future I should perhaps investigate using sun.jvm.hotspot.jdi.SACoreAttachingConnector on core files of the misbehaving process to get a different view of the world.

Finding HBase Region Locations

HBase Region Locality

HBase provides information on region locality via JMX, per region server, through the hbase.regionserver.percentFilesLocal metric. However, there is a challenge when running a multi-tenant environment or doing performance analysis. This percentage of local files is for the entire region server, but of course each region server can serve regions for multiple tables. Further, each region can be made up of multiple store files, each with its own location.

If one is doing a performance evaluation for a table, these metrics are not sufficient!

How to See Each Region

To see a more detailed breakdown, we can use HDFS to tell us where a file’s blocks live. Further, we can point HDFS to the files making up a table by looking under the HBase hbase.rootdir and build up a list of LocatedFileStatus objects for each file. Nicely, LocatedFileStatus provides getBlockLocations() which can provide the serving hosts for each HDFS block.

Lastly, all we need to do is correlate which region servers have local blocks for regions they are serving; now we can come up with a table locality percentage.
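
The correlation step can be sketched in plain Ruby. This is a minimal illustration on made-up data, assuming a nested Hash of region to serving host and per-block replica hosts (the same shape the script later in this post builds); the function name locality and the sample store file names are hypothetical:

```ruby
require 'set'

# Fraction of HDFS blocks with a replica on the host serving the region.
# regions: region => { "host" => String,
#                      "cfs"  => { cf => { hfile => [Set of hosts, one per block] } } }
def locality(regions)
  total = 0
  local = 0
  regions.each_value do |region|
    region["cfs"].each_value do |hfiles|
      hfiles.each_value do |blocks|
        blocks.each do |hosts|
          total += 1
          local += 1 if hosts.include?(region["host"])
        end
      end
    end
  end
  # No blocks at all (e.g. an empty table): there is nothing to report.
  total.zero? ? nil : local.to_f / total
end

# Made-up sample: one region, two store files, three blocks, two of them local.
regions = {
  "3fe594363a2c13a3550f752db147194b" => {
    "host" => "r1n1.example.com",
    "cfs" => {
      "f1" => {
        "hfile-a" => [Set["r1n1.example.com", "r1n2.example.com"],
                      Set["r1n2.example.com", "r1n3.example.com"]],
        "hfile-b" => [Set["r1n1.example.com", "r1n3.example.com"]]
      }
    }
  }
}
puts locality(regions)
```

Here two of the three blocks have a replica on the serving host, so the sample table is two-thirds local.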

Implementation

One can do nifty things in the HBase shell as it is really a full JRuby shell. In particular, one can call arbitrary Java, which works great for debugging or running performance tests. The following is the needed JRuby, which can be saved to a file and executed via hbase shell <file name> or simply copied and pasted into the shell.

require 'set'
require 'pathname'
include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.HColumnDescriptor
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.HTableDescriptor
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.io.Text
 
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import java.util.NoSuchElementException
import java.io.FileNotFoundException
 
# Return a Hash of region UUIDs to hostnames with column family stubs
#
# tableName - table to return regions for
#
# Example
# getRegionUUIDs "TestTable"
# # => {"3fe594363a2c13a3550f752db147194b"=>{"host" => "r1n1.example.com", "cfs" => {"f1" => {}, "f2" => {}},
#       "da19a80cc403daa9a8f82ac9a1253e9d"=>{"host" => "r1n2.example.com", "cfs" => {"f1" => {}, "f2" => {}}}}
#
def getRegionUUIDs(tableName)
  c = HBaseConfiguration.new()
  tableNameObj = TableName.valueOf(tableName)
  t = HTable.new(c, tableNameObj)
  regions = t.getRegionsInRange(t.getStartKeys[0],
                                t.getEndKeys[t.getEndKeys.size-1])
  # get all column families -- XXX do all regions have to host all CF's?
  cfs = HTable.new(c, tableNameObj).getTableDescriptor.getFamilies().map{ |cf| cf.getNameAsString() }
 
  r_to_host = regions.map{|r| [r.getRegionInfo().getEncodedName(), Hash["host" => r.getHostname(), "cfs" => Hash[cfs.map{|cf| [cf, Hash.new()] }]]] }
 
  Hash[r_to_host]
end
 
def findHDFSBlocks(regions, tableName)
  # augment regions with HDFS block locations
  augmented = regions.clone
  c = HBaseConfiguration.new()
  fs = FileSystem.newInstance(c)
  hbase_rootdir = c.select{|r| r.getKey() == "hbase.rootdir"}.first.getValue
  tableNameObj = TableName.valueOf(tableName)
  nameSpace = tableNameObj.getNamespaceAsString
  baseTableName = tableNameObj.getQualifierAsString
  # use the default namespace if none given
  nameSpace = "default" if nameSpace == tableName
 
  regions.each do |r, values|
    values["cfs"].keys().each do |cf|
      rPath = Path.new(Pathname.new(hbase_rootdir).join("data", nameSpace, baseTableName, r, cf).to_s)
      begin
        files = fs.listFiles(rPath, true)
      rescue java.io.FileNotFoundException
        next
      end
 
      begin
        begin
          fStatus = files.next()
          hosts = fStatus.getBlockLocations().map { |block| Set.new(block.getHosts().to_a) }
          augmented[r]["cfs"][cf][File.basename(fStatus.getPath().toString())] = hosts
        rescue NativeException, java.util.NoSuchElementException
          fStatus = false
        end
      end until fStatus == false
    end
  end
  augmented
end
 
def computeLocalityByBlock(regions)
  non_local_blocks = []
  regions.each do |r, values|
    values["cfs"].each do |cf, hFiles|
      hFiles.each do |id, blocks|
        blocks.each_index do |idx|
          non_local_blocks.push(Pathname.new(r).join(cf, id, idx.to_s).to_s) unless blocks[idx].include?(values["host"])
        end
      end
    end
  end
  non_local_blocks
end
 
def totalBlocks(regions)
  regions.map do |r, values|
    values["cfs"].map do |cf, hFiles|
      hFiles.map do |id, blocks|
        blocks.count
      end
    end
  end.flatten().reduce(0, :+)
end
 
tables = list
tables.each do |tableName|
  puts tableName
  begin
    regions = getRegionUUIDs(tableName)
    hdfs_blocks_by_region = findHDFSBlocks(regions, tableName)
    non_local_blocks = computeLocalityByBlock(hdfs_blocks_by_region)
    total_blocks = totalBlocks(hdfs_blocks_by_region)
    puts non_local_blocks.length().to_f/total_blocks if total_blocks > 0 # e.g. if table not empty or disabled
  rescue org.apache.hadoop.hbase.TableNotFoundException
    true
  end
end

 

One will get output of the form table name, newline, float. Note that the float printed is the fraction of blocks which are not local (0.0-1.0), so subtract it from 1.0 to get the locality fraction. Should the table be offline, deleted (TableNotFoundException), an HDFS block moved, etc., the exception will be swallowed. In the case of a table not being calculated, no float will appear in the output (the line is simply skipped); in the case of HDFS data not being found, the locality computation will assume that block to be non-local.

Post-Script

Some nice follow-on work to make this data into a useful metric might be to augment it with the size of the blocks (in records or bytes) and determine a locality percentage based on size, not only block count. Further, for folks using stand-by regions, breaking out the locality of replicated blocks may be important as well.
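
A size-weighted variant might look like the following Ruby sketch. It is only an illustration of the idea: the Hash layout, the function name locality_by_bytes and the block sizes are assumptions of mine (in a real script the byte counts could presumably come from each HDFS BlockLocation's getLength()):

```ruby
# Byte-weighted locality: each block counts in proportion to its length.
# Each block is a hypothetical Hash: { "bytes" => Integer, "hosts" => [String] }.
def locality_by_bytes(serving_host, blocks)
  total_bytes = blocks.sum { |b| b["bytes"] }
  return nil if total_bytes.zero?

  local_bytes = blocks.select { |b| b["hosts"].include?(serving_host) }
                      .sum { |b| b["bytes"] }
  local_bytes.to_f / total_bytes
end

mb = 1024 * 1024
blocks = [
  { "bytes" => 128 * mb, "hosts" => ["r1n1.example.com", "r1n2.example.com"] },
  { "bytes" => 128 * mb, "hosts" => ["r1n2.example.com", "r1n3.example.com"] },
  { "bytes" => 64 * mb,  "hosts" => ["r1n1.example.com", "r1n3.example.com"] }
]
puts locality_by_bytes("r1n1.example.com", blocks)
```

In this made-up sample, two of three blocks are local by count, but only 192 of 320 MB are local by size, which is the kind of gap a size-based metric would expose.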

Map/Reduce diff(1)

This has sadly been a draft for years, so time to release it…

diff(1)

For those who use Unix, you have likely come across two files and wanted to see what was different between the two. Certainly, one can compare size (highly inaccurate), use a hash function (if a strong cryptographic hash, it will be accurate — but very information free) or one can use the obvious choice, diff(1). One usually gets output like

$ cat << EOF > one
foo
blah
baz raz
has
EOF
$ cat << EOF > two
blah
yar raz
has
EOF
$ diff one two
1d0
< foo
3c2
< baz raz
---
> yar raz

Here we see that the left file (file one) has an extra entry on line one and line three differs between the two files. Further, we can see that the algorithm matched lines, as blah was matched between the files despite the leading foo in file one.
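
The line matching can be illustrated with a longest-common-subsequence (LCS) computation, the classic idea behind line-based diffs. This is a small Ruby sketch for the two files above, not how any particular diff(1) implementation is written; real tools use more efficient algorithms:

```ruby
# Longest common subsequence of two arrays of lines, via dynamic programming.
def lcs(a, b)
  # table[i][j] holds the LCS length of a[i..] and b[j..].
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      table[i][j] = if a[i] == b[j]
                      table[i + 1][j + 1] + 1
                    else
                      [table[i + 1][j], table[i][j + 1]].max
                    end
    end
  end

  # Walk the table from the top-left corner to recover the matched lines.
  matched = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      matched << a[i]
      i += 1
      j += 1
    elsif table[i + 1][j] >= table[i][j + 1]
      i += 1
    else
      j += 1
    end
  end
  matched
end

one = ["foo", "blah", "baz raz", "has"]
two = ["blah", "yar raz", "has"]
puts lcs(one, two).inspect
```

The matched lines are blah and has; everything else in either file is what diff reports as deleted or changed.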

Map/Reduce

Map/Reduce gained visibility after Google’s initial publication and certainly now that Hadoop has gained significant adoption. For my work, I mostly use Apache Pig, a high-level language which compiles down to a map/reduce plan and runs on Hadoop Map/Reduce, Apache Tez or Apache Spark.

There are UDF approaches (such as the Pig built-in DIFF). The built-in DIFF does have one flaw for this work: it only accepts two bags (a bag being an unordered collection of tuples) and, as each set of data would be a bag, each file must fit into a container’s memory. That is not efficient for differencing two large files.

For implementing code to generate a difference, I settled on two easy ways to operate. One was a UNION-based approach, the other a JOIN-based approach. Either gets the data from each file into one Pig data structure (a relation); however, the approaches differ dramatically in the row size of the relation.

Despite the data size differences, the run-time performance a number of years ago was roughly the same for both using Hadoop Map/Reduce. I found on 2012 hardware it took 10 minutes to difference over 200GB (1,055,687,930 rows) using LZO-compressed input with 18 nodes. Further, each approach takes only one Map/Reduce cycle.

Also, one has to decide the quality of diff one would like. Options range from enumerating the records (lines) with line numbers before the join, if one were beginning a context-diff implementation, to something as simple as reporting whether a line matched or the count of matches (if a line is duplicated in a single source).

Simply, unlike the Unix diff(1) tool, order is not important; effectively the JOIN approach performs sort -u <foo.txt> | diff while UNION performs sort <foo> | diff.
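
The two flavors of order-insensitive difference can be illustrated on small in-memory arrays. This Ruby sketch only mirrors the semantics described here (it is not a translation of the Pig scripts), and the function names are hypothetical:

```ruby
# Membership-based difference (the JOIN flavor): a line is reported only if
# the other file does not contain it at all.
def diff_by_membership(first, second)
  { first_only: first - second, second_only: second - first }
end

# Count-based difference (the UNION flavor): surplus copies of a line in one
# file are reported once per extra copy.
def diff_by_count(first, second)
  count = ->(lines) { lines.each_with_object(Hash.new(0)) { |l, h| h[l] += 1 } }
  counts_first = count.call(first)
  counts_second = count.call(second)

  first_only = []
  second_only = []
  (counts_first.keys | counts_second.keys).each do |line|
    surplus = counts_first[line] - counts_second[line]
    first_only.concat([line] * surplus) if surplus > 0
    second_only.concat([line] * -surplus) if surplus < 0
  end
  { first_only: first_only, second_only: second_only }
end

first = %w[a a b]
second = %w[a c]
puts diff_by_membership(first, second).inspect
puts diff_by_count(first, second).inspect
```

With these inputs the membership flavor reports only b and c, while the count flavor additionally reports the extra copy of a in the first file.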

Implementation

UNION

The UNION operator in Pig is like the SQL UNION ALL operator (duplicates are kept). For differencing, one only needs to augment each file’s data with the data’s source, group and then count sources to find matches. While more lines of code than a JOIN approach, one can easily add in more metadata to each line (such as if the line is duplicated in each file but with a different quantity of replication).

Code

SET job.name 'Diff(1)'

-- Erase Outputs
rmf first_only
rmf second_only

-- Process Inputs
a_raw = LOAD 'a.csv.lzo' USING com.twitter.elephantbird.pig.load.LzoPigStorage('\n') AS Row: chararray;
b_raw = LOAD 'b.csv.lzo' USING com.twitter.elephantbird.pig.load.LzoPigStorage('\n') AS Row: chararray;

a_tagged = FOREACH a_raw GENERATE Row, (int)1 AS File;
b_tagged = FOREACH b_raw GENERATE Row, (int)2 AS File;

-- Combine Data
combined = UNION a_tagged, b_tagged;
c_group = GROUP combined BY Row;

-- Find Unique Lines
%declare NULL_BAG 'TOBAG(((chararray)\'place_holder\',(int)0))'

counts = FOREACH c_group {
             firsts = FILTER combined BY File == 1;
             seconds = FILTER combined BY File == 2;
             GENERATE
                FLATTEN(
                        (COUNT(firsts) - COUNT(seconds) == (long)0 ? $NULL_BAG :
                            (COUNT(firsts) - COUNT(seconds) > 0 ?
                                TOP((int)(COUNT(firsts) - COUNT(seconds)), 0, firsts) :
                                TOP((int)(COUNT(seconds) - COUNT(firsts)), 0, seconds))
                        )
                ) AS (Row, File); };

-- Output Data
SPLIT counts INTO first_only_raw IF File == 1,
                  second_only_raw IF File == 2;
first_only = FOREACH first_only_raw GENERATE Row;
second_only = FOREACH second_only_raw GENERATE Row;
STORE first_only INTO 'first_only' USING PigStorage();
STORE second_only INTO 'second_only' USING PigStorage();

JOIN

One can perform a difference via an outer join as well. Here one has a more compact expression to achieve the desired results: a FULL OUTER join, keeping only the records (lines) which appear in one file but not the other, which one then returns to report the asymmetry. The JOIN approach does collapse duplicates (so, if one file has more duplicates than the other, this approach will not output the duplicate).

Code

SET job.name 'Diff(1) Via Join'

-- Erase Outputs
rmf first_only
rmf second_only

-- Process Inputs
a = LOAD 'a.csv.lzo' USING com.twitter.elephantbird.pig.load.LzoPigStorage('\n') AS First: chararray;
b = LOAD 'b.csv.lzo' USING com.twitter.elephantbird.pig.load.LzoPigStorage('\n') AS Second: chararray;

-- Combine Data
combined = JOIN a BY First FULL OUTER, b BY Second;

-- Output Data
SPLIT combined INTO first_raw IF Second IS NULL,
                    second_raw IF First IS NULL;
first_only = FOREACH first_raw GENERATE First;
second_only = FOREACH second_raw GENERATE Second;
STORE first_only INTO 'first_only' USING PigStorage();
STORE second_only INTO 'second_only' USING PigStorage();

 

CloudStack

Compute Infrastructure-as-a-Service

Today’s software development world is hosted on massive computing machines — lots of memory, lots of disk space, lots of CPU power. However, software development and testing is still often done at small scale; developers use vi, run unit tests running in python and run build scripts written for ant and mvn. How can one best use these massive machines for their development at small scale and still run tests on them at large scale, when necessary?

Unless you’ve been living in a cave the last few years, virtualization has been firmly burned in your mind by IT marketing material. In particular, taking those massive machines and cutting them up into many smaller virtual machines is the solution converged on by much of the industry. I agree! And, here are my notes on how I moved my group into this era.

CloudStack

Enter Citrix, Cloud.COM and now the Apache Foundation; CloudStack is an incubating Apache project. CloudStack is a very slick application which effectively implements Amazon’s EC2 UI, API and features — including a nice web front-end for starting and managing your VMs, storage and networks. However, as with all new software and certainly a piece of software which is as complex as a data center in a box, there are bugs and lots of knobs to turn for configuration.

Setup

I took a very conservative approach to working with CloudStack. I needed only many VMs, running on the same network with little isolation and with only workable storage space and reliability. I did not need very high performance or high reliability; I only needed to slice up a few machines in the same physical datacenter. Further, I am currently using CloudStack 3.0.2 on CentOS 6.3 with the KVM hypervisor. As CloudStack development moves VERY quickly, I expect their upcoming 4.0 release will be very different; and, as the OS vendors do not stand still, I’m sure a different RedHat-based distro or even CentOS version would be quite different.

I followed the CloudStack Quick Install guide and set up a Basic Zone. As I did want my VMs reachable from the world outside CloudStack, I needed to ensure I selected a network offering supporting Security Groups (DefaultSharedNetworkOfferingWithSGService).

Further, I reused another machine I had handy with a terabyte of disk space as my NFS server, but did enable local storage for user VMs to stretch all the disks I could get at. I used one IP network for my management and guest networks. (I do hope to get the machines running bonded 1GigE soon for their physical connections though.) Simply to reduce IP usage, were I to do it again, I would have used a second, non-routable network (RFC 1918 address space) and set up the management server to act as a NAT box to my broader network.

Configuration

CloudStack is much more centralized than some other infrastructure-as-a-service cloud offerings. One only needs to understand a few roles and daemons to understand the major touch points of CloudStack:

  • Management Server
    • This runs the Tomcat server which hosts the UI and does most of the coordination activities amongst the various CloudStack components.
    • /etc/init.d/cloud-management
    • /var/log/cloud/management/management-server.log
    • /var/log/cloud/management/catalina.out
  • Usage-Server
    • This is the usage server which collects metrics from CloudStack for external analysis (e.g. billing)
    • /etc/init.d/cloud-usage
    • /var/log/cloud/usage/usage.log
  • Agent
    • This runs on the CloudStack machines which host guest VMs.
    • /etc/init.d/cloud-agent
    • /var/log/cloud/agent/agent.log

This centralization of components makes configuration and debugging an easier process but still managing all the how-to documents for a system as big as CloudStack became a bit daunting; below are my most used how-to’s and pitfalls which I ran across.

Agent Reboots

One issue which was very confusing arose when I initially set up my compute hardware as CloudStack agents. They would immediately reboot, and keep rebooting! (This was not behavior I was expecting.) This taught me to check the logs early and check the logs often, as I found (in /var/log/cloud/agent/agent.log):

2012-10-09 16:18:50,466{GMT} WARN  [resource.computing.KVMHAMonitor] (Thread-27:) write heartbeat failed: Failed to create /mnt/031d9475-063d-30b5-b910-7ee710ff81b0/KVMHA//hb-172.20.7.136; reboot the host

Luckily, others had been here before. The fix was nicely documented and ever so easy: sed -i 's/reboot/#reboot/g' /usr/lib64/cloud/agent/scripts/vm/hypervisor/kvm/kvmheartbeat.sh. It also showed me an invaluable setting to enable outputting the DEBUG messages from the CloudStack agent: sed -i 's/INFO/DEBUG/g' /etc/cloud/agent/log4j-cloud.xml

Agent dies at start with: Unable to start agent: Unable to find the guid

Next, I had issues with starting the agent. The wizard or I would run cloud-setup-agent and then checking /etc/init.d/cloud-agent status would show the agent dead. This too was an easy fix which again someone else had documented. One simply needs to add the following to their /etc/cgconfig.conf and restart their cgconfig service:

group virt {
  cpu {
    cpu.shares = 9216;
  }
}

Set your hostname

While the Quick Install Guide says to ensure your hostname is set (e.g. checked via the hostname --fqdn command), also ensure that you have /etc/hosts and /etc/sysconfig/network set with your fully-qualified hostname. One error you may see can be found in the ever helpful CloudStack forum.

Automatic VM Password Generation

To add password generation and reset support to your own templates, you can follow the instructions for CloudStack 4.0; I have tested the Linux script, at least. (There is also the ability to use ssh key-pairs, like Amazon EC2 does. I have not yet tried that, but it is well documented, if not supported by the UI.)

LDAP

Setting up LDAP for CloudStack is quite easy but it requires doing some setting outside the UI, and with the API, as documented in the instructions (or original). (There are some notes on using port 8096, as the documentation does.) There is also one bug CS-14680 which has to be worked around, as the LDAP authentication does not use MD5 hashing like the built-in MySQL authentication does.

Due to CS-14680, if you need to allow authentication against both LDAP and the built-in MySQL, then a bit of HTML changes are necessary too. The changes are documented in CS-16325.

Lastly, as one needs to set up the accounts for CloudStack to use from LDAP, there is a Ruby script which can synchronize your LDAP server to CloudStack. But remember, if setting up accounts from an LDAP server which might control sensitive services (e.g. Active Directory, in my case), you will likely want to use SSL on your Management Server, so that passwords are encrypted.

Usage

Cleanly Restart a Host

If you need to restart one of your VM hosting machines, there is a bit more forethought required than one would normally have for a Linux box. The steps are:

  1. Mark it in maintenance
  2. Then, restart
  3. Mark it as available

If a machine is not properly shutdown:

  1. Get it back online by toggling the maintenance state of the host
  2. Look at the zone’s system VMs; they might be in a wedged starting state and need to be unstuck
    1. May need to enable/disable the zone
    2. Restart the management server
    3. Ensure the VMs are not running on the host they claim (using virsh) and set them to stopped in MySQL

Storage Migration

One very cool feature of CloudStack is that you can migrate your VMs (live!) and you can migrate the storage they are running upon too (storage migration). This is especially useful if using local storage and needing to move a VM off for host maintenance; but beware, there is a good performance optimization still left to be made to lessen the load on secondary storage when doing a storage migration.

Local Storage

Using local storage is of huge help if your infrastructure does not have much shared storage. However, if you are like me, it is easiest to create templates which have relatively small root disks (say 20GB) but for many needs, you will then need to attach the bulk of the storage as an extra volume. While there is a check box to have a system’s root disk be local, there is no equivalent for a disk offering (for making said extra volumes).

I tried to implement local storage disk offerings by using storage tags. I set the tag “LOCAL” on the local primary storage pools and made a disk offering requiring the volumes to be made only on pools with the tag “LOCAL”, but this failed. I could create the volume (but that only makes a database record in CloudStack and does not actually pick out storage); when I attached the storage to the VM (at which point CloudStack would actually create the volume), it failed. I got:

2012-10-18 22:27:45,385 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-95:job-524) Checking if storage pool is suitable, name: cloud0.domain ,poolId: 211
2012-10-18 22:27:45,385 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-95:job-524) Is localStorageAllocationNeeded? false
2012-10-18 22:27:45,385 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-95:job-524) Is storage pool shared? false
2012-10-18 22:27:45,385 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-95:job-524) StoragePool is not of correct type, skipping this pool
2012-10-18 22:27:45,385 DEBUG [storage.allocator.FirstFitStoragePoolAllocator] (Job-Executor-95:job-524) FirstFitStoragePoolAllocator returning 0 suitable storage pools
2012-10-18 22:27:45,385 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-95:job-524) No suitable pools found for volume: Vol[142|vm=106|ROOT] under cluster: 6
2012-10-18 22:27:45,385 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-95:job-524) No suitable pools found
2012-10-18 22:27:45,385 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-95:job-524) No suitablestoragePools found under this Cluster: 6
2012-10-18 22:27:45,385 DEBUG [cloud.deploy.FirstFitPlanner] (Job-Executor-95:job-524) Could not find suitable Deployment Destination for this VM under any clusters, returning.

This was annoying: the volume I explicitly wanted local, CloudStack was trying to make shared and was ruling out the local pools! However, thankfully someone who was trying to run their cloud without any shared storage had hit upon a solution in CS-11840. (Despite the original filer claiming this failed, UPDATE disk_offering SET use_local_storage = 1 WHERE display_text LIKE "%LOCAL%"; worked for me on 3.0.2.) This did not solve the whole problem immediately, however, as I was trying with local VMs and local storage and getting the following error:

2012-10-18 16:11:51,537 DEBUG [cloud.async.AsyncJobManagerImpl] (http-6443-exec-4:null) submit async job-440, details: AsyncJobVO {id:440, userId: 3, accountId: 3, sessionKey: null, instanceType: Volume, instanceId: 104, cmd: com.cloud.api.commands.AttachVolumeCmd, cmdOriginator: null, cmdInfo: {"response":"json","id": "f0089a1b-32f4-4e49-89fa-0dfe0935b4b4","sessionkey":"r2D/wVCutA/UwOkAXxtfzSDjU7o\u003d","ctxUserId":"3","virtualMachineId":"fe907255-063a-4e72-95a3-43abe53f1867 ","_":"1350591111196","projectid":"6c8ef680-752f-47d4-a0a1-fe9d68197a18","ctxAccountId":"3","ctxStartEventId":"4420"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 964251491601, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2012-10-18 16:11:51,538 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-11:job-440) Executing com.cloud.api.commands.AttachVolumeCmd for job-440
2012-10-18 16:11:51,721 INFO  [cloud.api.ApiDispatcher] (Job-Executor-11:job-440) Please specify a volume that has been created on a shared storage pool.

But once I realized that I do not care if the small root disk is shared, as long as the massive storage volume is local, I had success with shared storage VMs having local volumes.

Deleting a Zone

If you need to delete a zone, you will want to make sure to follow the correct steps to delete the zone or it is possible to get the zone wedged in an un-deletable state (e.g. CS-14297: [Can] not delete primary storage without going into the database). The steps and correct order are outlined in CS-15991.

Troubleshooting

Much troubleshooting is done by investigating the MySQL database which underlies CloudStack; sometimes things get wedged enough that they require changes. The database schema is very easy to understand. While one needs to be restrained in modification (as referential integrity can be compromised, causing confusion, if a row is removed or an incorrect ID is entered), there are constraints in place to try and prevent errant state.

Storage Issues

Storage issues can be some of the most insidious issues one will encounter with CloudStack. Errors can be bizarre! Issues I ran across with templates alone:

  • Templates are listed in the UI under “Templates” but not visible when I go to create a VM
    • Usually you can see that the template is still downloading or had an error when clicked on in the UI under templates.
    • These issues can often be resolved by verifying the Secondary Storage VM (SSVM) is working okay. First, make sure your SSVM even started by going to Infrastructure->Zones->System VMs and ensuring your SSVM is “Running”. Luckily, there is a nice write-up on how to check the SSVM for its other common sicknesses.
    • Deleting a template fails, or a template is wedged with Failed post download script: Checksum failed, not proceeding with install (fixed in CS-14555). This happened to me on the built-in CentOS 5.6 VM template, which I simply wanted to remove but, since it was wedged, had no UI option to remove it. As such, I followed the (now slightly outdated) steps in a forum thread.
  • Uploading a template/ISO fails
    • Getting “Connection Refused” as a template status when trying to upload is by design, as templates can only be uploaded from accepted sites. I simply had to change the secstorage.allowed.internal.sites configuration variable to allow the host.
    • Getting “Please specify a valid qcow2” when uploading a template may be due to the file name not ending in .qcow2.
    • Trying to upload an ISO kept failing for me reporting java.lang.IllegalStateException: java.lang.IllegalStateException: unsupported protocol: 'ftp' with an http:// URL, so I filed a bug CLOUDSTACK-370 which awesomely got two responses before I could even git clone the CloudStack source and reproduce the issue.

One painful issue I encountered was when my CloudStack hosts acquired hostnames in DNS, a few days after the cloud was set up. The NFS server providing my primary storage had the cloud machines’ IP addresses in its /etc/exports ACL to ensure they could mount and write with no root squashing. But when the hosts entered DNS, the server started rejecting their mount requests and write updates too! (I was using a wildcard for the hosts’ IP addresses for access control, which may explain why the hosts were rejected after acquiring hostnames; this NFS guide says: Do not use wildcards in IP addresses, as they are intermittent in IP addresses.) This led to the quizzical error when trying to start a new VM:

2012-10-18 22:18:06,516 DEBUG [storage.allocator.AbstractStoragePoolAllocator] (Job-Executor-93:job-522) Cannot allocate this pool 207 for storage since its allocated percentage: Infinity has crossed the allocated pool.storage.allocated.capacity.disablethreshold: 0.85, skipping this pool

Luckily, again I was not the first to encounter this weird infinity issue; indeed, like the previous poster, my MySQL database had 0 for the allocated and available bytes for my primary storage, and after fixing /etc/exports all was happy.
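
One plausible way such a value arises is a floating-point division by a capacity of zero. The following Ruby sketch is purely illustrative: the method name, its arguments and the formula are my assumptions rather than CloudStack's allocator code; only the 0.85 threshold comes from the log line above.

```ruby
# Hypothetical stand-in for a pool threshold check; not CloudStack code.
DISABLE_THRESHOLD = 0.85

def pool_usable?(allocated_bytes, requested_bytes, capacity_bytes)
  # Float division by zero does not raise; it yields Infinity (or NaN for 0/0).
  fraction = (allocated_bytes + requested_bytes).to_f / capacity_bytes
  fraction <= DISABLE_THRESHOLD
end

gib = 2**30
puts((20 * gib).to_f / 0)                # Infinity
puts pool_usable?(0, 20 * gib, 0)        # capacity recorded as 0 bytes
puts pool_usable?(100 * gib, 20 * gib, 1024 * gib)
```

Under these assumptions, a pool whose capacity is recorded as 0 bytes can never pass the check, which would match the pool being skipped until the database values are repopulated.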

Addding Hosts

For a few quick tests, I had removed one of my agent nodes and came across this strange issue when trying to add the host back to the original zone:

libvir: Storage error : Storage pool not found: no pool with matching uuid
2012-10-15 03:52:00,287{GMT} WARN  [utils.nio.Task] (Agent-Handler-1:) Caught the following exception but pushing on
java.lang.NullPointerException
        at com.cloud.agent.storage.LibvirtStorageAdaptor.createStoragePool(LibvirtStorageAdaptor.java:504)
        at com.cloud.agent.storage.KVMStoragePoolManager.createStoragePool(KVMStoragePoolManager.java:57)
        at com.cloud.agent.resource.computing.LibvirtComputingResource.initialize(LibvirtComputingResource.java:2978)
        at com.cloud.agent.Agent.sendStartup(Agent.java:316)
        at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:846)
        at com.cloud.utils.nio.Task.run(Task.java:79)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)

This had me puzzled for a bit, as I had not needed to use virsh(1) much before this adventure. It seems the old storage pools were still present, which was preventing their being re-created. I believe the error was akin to:

[clayb@cloud_machine ~]$ sudo virsh pool-create /tmp/test_pool.xml
error: Failed to create pool from /tmp/test_pool.xml
error: operation failed: Storage source conflict with pool: '031d9475-063d-30b5-b910-7ee710ff81b0'

Eventually, I did the following to good success:

  1. Checked if virsh reported any storage pools in existence (since this host was not successfully added, it should not have had any) — virsh pool-list
  2. Ensured all pools were destroyed with virsh pool-destroy <pool> (and virsh pool-undefine <pool> for any pool with a persistent definition)
  3. Cleaned-up any residual files in /etc/libvirt/storage/
  4. Cleaned-up my residual files in my machine’s local storage volume /var/lib/libvirt/images

Passwords

CloudStack encrypts passwords from what I have seen in the MySQL database and configuration files. Indeed this is a change for the 3.0 release. To encrypt passwords like CloudStack one can do the following.

Admin Password Reset

When needing to reset the administrator password for CloudStack, one must resort to modifying the MySQL database, but the procedure is quite painless.

System VM passwords

There is a useful setting, system.vm.random.password, if you want to ensure the system VMs are only accessible via SSH key. I have verified that the /etc/shadow file has a different hash for root between system VM instances, but I did have a problem the first time I set it. After seeing the management server was in a wonky state (no MySQL logins worked), I found the following log message:

[cbaenziger1@cloud_machine ~]$ grep 'Error while decrypting:' /var/log/cloud/management/management-server.log
2012-10-15 06:27:28,120 DEBUG [utils.crypt.DBEncryptionUtil] (main:null) Error while decrypting: VG3fYbhx

Your failed decrypting string will likely vary; mine did! I verified and tried resolving the issue by doing the following:

mysql> USE cloud;
mysql> SELECT name,value FROM configuration WHERE value LIKE "%VG3fYbhx%";
| system.vm.password | VG3fYbhx |
mysql> UPDATE configuration SET value = "false" WHERE name = "system.vm.random.password";
Query OK, 1 row affected (0.07 sec)
Rows matched: 1  Changed: 1  Warnings: 0

But, I still had issues starting the Management Server:

2012-10-15 06:38:28,977 DEBUG [utils.crypt.DBEncryptionUtil] (main:null) Error while decrypting: VG3fYbhx
2012-10-15 06:38:28,978 ERROR [utils.component.ComponentLocator] (main:null) Unable to load configuration for management-server from components.xml net.sf.cglib.core.CodeGenerationException: org.jasypt.exceptions.EncryptionOperationNotPossibleException-->null

Realizing that the value in system.vm.password did not look like an encrypted password, I looked in the database for another encrypted string I could use and ended up copying the value from secstorage.copy.password. Then I could start the Management Server. I have since re-enabled system.vm.random.password, but I do not see the value in system.vm.password changing.

Default Passwords

I have also seen one security disclosure on CloudStack. While CloudStack seems solid, I will certainly keep my cloud infrastructure off the hostile Internet, just as I do my Hadoop cluster (which as of CDH3U5 does not have such holistic security).

Make sure to change the default for the admin user too!

Configuration Files

Many systems have requirements to store configuration parameters. In these systems, a number of choices can be made for how to store that data; sometimes this diversity is painful, however. Choices for storing configuration data are often:

  • Firefox uses, and Apple often chooses to use, SQLite3 databases
  • Python programs often use ConfigParser to process initialization (.ini) files
  • Apache Ant, amongst many other applications, consumes XML configurations
  • Java programs often use Properties files, in XML or traditional form
  • JavaScript Object Notation (JSON) is used by programs for configuration; a number of my group’s programs use this, for example
  • Domain Specific Languages (DSLs) are sometimes used. For example, the Puppet configuration management system has its own DSL written in Ruby

This diversity of configuration formats sometimes sees cross-pollination, however. Sometimes an application reads only one format while another application writes only a different one. Sometimes one has a toolset which works with only one format, and many an application grown organically can find itself using several formats.

Annoyingly, not all formats support the same set of features either. For example, SQLite3 and XML can be multidimensional: SQLite3 supports multiple tables of N rows by M columns in a single SQLite3 file, while XML supports a hierarchical tree structure of tags with multiple leaves, plus attributes on tags. JSON is comparable to XML, offering rich structure for organizing one’s data. The initialization file implementation in Python is only a two-level hierarchy; Java Properties files are flat but often use Java dot-notation to make namespaces which can represent an arbitrarily deep hierarchy. Domain specific languages can be as rich or simple as desired, but there is no commonality or set of properties inherent in such a configuration format.

This asymmetry can make conversion across formats difficult in general but one should always be able to go from a less rich to a more rich structure. And when possible, it is nice to have some tools to go between them.
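
As a rough illustration of going from a less rich to a more rich structure, here is a minimal Python sketch that expands flat, dot-notation keys (as in a Java Properties file) into nested dictionaries (as one might then dump to JSON). The function name nest_properties is made up for this example, and the sketch assumes no key is both a value and a prefix of another key (for example both user and user.name), which would need extra handling.

```python
def nest_properties(flat):
    """Turn {'a.b.c': 'v'} into {'a': {'b': {'c': 'v'}}}."""
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        parts = dotted_key.split(".")
        for part in parts[:-1]:
            # Descend one level, creating intermediate dicts as needed
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


flat = {
    "oozie.wf.application.path": "/user/john_doe/my_workflow/workflow.xml",
    "oozie.coord.application.path": "/export/my_workflow/coordinator.xml",
    "queueName": "default",
}
nested = nest_properties(flat)
print(nested["oozie"]["wf"]["application"]["path"])
```

Going the other way (nested to flat) is just as mechanical for dictionaries, but lists and mixed content have no obvious flat spelling, which is where the asymmetry bites.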

Java Properties Files

Using with Python

One can find a recipe to read and write Java Properties files from Python. This re-implementation of the java.util.Properties class provides a convenient interface for working with properties files:

>>> import properties
>>> p=properties.Properties()
>>> with open("my.properties") as f:
...     p.read(f)
>>> p.getPropertyDict()['some_property_I_want']
'this_is_not_the_property_value_you_want!'
>>> p.setProperty('some_property_I_want', 'with_the_value_I_want!')
>>> # re-open the file for writing before storing the updated properties
>>> with open("my.properties", "w") as f:
...     p.store(f)

Properties in XML

One can write an XML version of a Java properties file within Java by simply calling the storeToXML() method on a Properties() object.
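
For a feel of what that XML looks like, the layout Java uses is a properties root element holding one entry element per key, declared against the properties.dtd document type. The following is only a rough Python 3 sketch that writes the same layout by hand; the helper name to_java_xml_properties is invented for this example, and it is not a substitute for calling storeToXML() in Java.

```python
from xml.sax.saxutils import escape, quoteattr


def to_java_xml_properties(props, comment=None):
    """Render a dict of strings in the Java XML properties layout."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8" standalone="no"?>',
        '<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">',
        "<properties>",
    ]
    if comment is not None:
        lines.append("<comment>%s</comment>" % escape(comment))
    for key in sorted(props):
        # quoteattr() adds the surrounding quotes and escapes the key
        lines.append("<entry key=%s>%s</entry>"
                     % (quoteattr(key), escape(props[key])))
    lines.append("</properties>")
    return "\n".join(lines) + "\n"


xml_text = to_java_xml_properties(
    {"queueName": "default", "user.name": "john_doe"},
    comment="example only",
)
print(xml_text, end="")
```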

Oozie’s XML outputs

I use a lot of Hadoop programs which store their outputs in various XML forms, but one which always drives me nuts is Apache Oozie. Oozie will dump out a workflow job configuration in XML, but not as a standard Java XML properties file. Oozie takes in the workflow properties as a non-XML Java properties file, but it will not accept the XML it produces. However, via the joys of XSL Transformations (XSLT), we can write a simple stylesheet which converts the XML into a properties file!

An example (Oozie) Properties file in XML:

<configuration>
  <property>
    <name>date</name>
    <value>2011-12-01T00:00Z</value>
  </property>
  <property>
    <name>endTime</name>
    <value>2011-12-01T23:59Z</value>
  </property>
  <property>
    <name>frequency</name>
    <value>1440</value>
  </property>
  <property>
    <name>group.name</name>
    <value>users</value>
  </property>
  <property>
    <name>jobTracker</name>
    <value>jobtracker.example.com:9001</value>
  </property>
  <property>
    <name>nameNode</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <property>
    <name>oozie.coord.application.path</name>
    <value>/export/my_workflow/coordinator.xml</value>
  </property>
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs://namenode.example.com:9000/user/john_doe/my_workflow/workflow.xml</value>
  </property>
  <property>
    <name>queueName</name>
    <value>default</value>
  </property>
  <property>
    <name>startTime</name>
    <value>2011-12-01T00:00Z</value>
  </property>
  <property>
    <name>user.name</name>
    <value>john_doe</value>
  </property>
</configuration>

General XSLT transformation from XML to Java properties file

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" version="1.0" omit-xml-declaration="yes"/>
  <xsl:template match="/*">
    <xsl:for-each select="property">
      <xsl:value-of select="name"/><xsl:text>=</xsl:text><xsl:value-of select="value"/><xsl:text>&#xa;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Resulting Java properties file

date=2011-12-01T00:00Z
endTime=2011-12-01T23:59Z
frequency=1440
group.name=users
jobTracker=jobtracker.example.com:9001
nameNode=hdfs://namenode.example.com:9000
oozie.coord.application.path=/export/my_workflow/coordinator.xml
oozie.wf.application.path=hdfs://namenode.example.com:9000/user/john_doe/my_workflow/workflow.xml
queueName=default
startTime=2011-12-01T00:00Z
user.name=john_doe

For those who are not very programming language literate, on Linux, one can nicely use the simple libxml tool xsltproc(1) to run this conversion. For example, to take in my_config.xml in Oozie’s XML format and produce the same settings in Java properties format one would run: xsltproc to_property.xslt my_config.xml > my_config.properties
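
Where xsltproc is not at hand, the same conversion can be approximated with Python’s standard library. This is a minimal Python 3 sketch, not a replacement for the stylesheet: the function name oozie_xml_to_properties is my own, and like the XSLT above it does no escaping of characters that are special in properties files.

```python
import xml.etree.ElementTree as ET


def oozie_xml_to_properties(xml_text):
    """Emit one name=value line per <property> child of the root element."""
    root = ET.fromstring(xml_text)
    lines = []
    for prop in root.findall("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        lines.append("%s=%s" % (name, value))
    return "\n".join(lines) + "\n"


sample = """<configuration>
  <property>
    <name>queueName</name>
    <value>default</value>
  </property>
  <property>
    <name>user.name</name>
    <value>john_doe</value>
  </property>
</configuration>"""

properties_text = oozie_xml_to_properties(sample)
print(properties_text, end="")
# queueName=default
# user.name=john_doe
```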

JSON

JSON provides a rich language for expression, similar to XML. It is often used for data interchange, for example in AJAX web requests.

Using with Python

Python has a very feature-rich JSON module which takes JSON objects and arrays, with all their pairs and members, and represents them as native Python dict() and list() objects. Further, the JSON module provides very rich encoding and decoding functionality, as evidenced in the module’s PyDoc, particularly when using hooks for encoding and decoding.
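
As a small sketch of those hooks (assuming Python 3), the example below uses the json module’s object_hook parameter to turn timestamp strings into datetime objects while decoding, and the default parameter to turn them back into strings while encoding. The convention that keys ending in Time hold timestamps, and the helper names decode_times and encode_times, are assumptions made up for this example.

```python
import json
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%MZ"


def decode_times(obj):
    """object_hook: called with every decoded JSON object (a dict)."""
    for key, value in obj.items():
        if key.endswith("Time") and isinstance(value, str):
            obj[key] = datetime.strptime(value, TIME_FORMAT)
    return obj


def encode_times(value):
    """default hook: called only for values json cannot serialize itself."""
    if isinstance(value, datetime):
        return value.strftime(TIME_FORMAT)
    raise TypeError("cannot serialize %r" % (value,))


text = '{"queueName": "default", "startTime": "2011-12-01T00:00Z", "frequency": 1440}'
config = json.loads(text, object_hook=decode_times)
print(type(config["startTime"]).__name__)

round_trip = json.dumps(config, default=encode_times, sort_keys=True)
print(round_trip)
```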

Alias does not work in KSH functions

Alias does not work as expected in KSH functions

One will sometimes read that a function is recommended over an alias, but it is not always obvious why. Certainly, one can do more in a more syntactically elegant way in a function than an alias; but why else?

Today, I ran into an aggravating situation. I had a command which is set up by sourcing a script with some alias commands in it, but the script was failing to resolve the alias: I got alias_command: command not found instead of proper resolution. Even more aggravating, running the type built-in on the alias showed that it existed and was set as expected, yet calling it still failed.

The problem seems to be that the alias is unavailable when the function is called. I do not quite understand why, but in the O’Reilly book Classic Shell Scripting: Hidden Commands that Unlock the Power of Unix, Figure 7-1 shows that alias resolution and function lookup happen at very different points in the parsing stack. It seems, though, that the eval loop for calling a function should resolve the alias?

See below for a simple test-case to present the issues; and show some work-arounds.

Simple alias definition in a function

This fails! This is the initial example of the failed behavior.

#!/bin/ksh

# aliases do not seem to work in the function in which they are defined
function alias_in_function_does_not_work {
    print "\n\nalias in a function does not work:"
    alias bar='ls'
    type bar
    bar
}

alias_in_function_does_not_work

alias in a function does not work:
bar is an alias for ls
/tmp/t.sh[11]: alias_in_function_does_not_work[8]: bar: not found [No such file or directory]

Simple alias definition in a function with eval

This works! The extra eval in the following code block causes the shell to properly parse the alias.

#!/bin/ksh

# aliases work in functions if preceded with eval
function alias_with_eval_works {
    print "\n\nalias in a function works with eval:"
    alias foobar='ls'
    type foobar
    eval foobar
}

alias_with_eval_works

alias in a function works with eval:
foobar is an alias for ls
file

Functions can replace aliases successfully

This works! A function can replace an alias and perform (often) the same behavior. However, more thought is needed if you want to use alias substitution in clever ways.

#!/bin/ksh

# a function being called by a function is okay
function baz {
    print "functions work:"
    ls
}

function function_works {
    print "\n\nfunctions calling functions work:"
    baz
}

function_works

functions calling functions work:
functions work:
file

Where the alias is defined matters

I do not recommend this! Here we show the code is indeed linearly parsed. A function can use an alias defined earlier in the code. (But this gets awfully convoluted quickly!)

#!/bin/ksh

# an alias defined after this function definition
# in the script is not resolved inside the function
function pre_existing_alias_does_not_work {
    print "\n\nalias already defined does not yet work:"
    type foo
    foo
}

# example showing aliases do not resolve in functions
print "\n\nbare alias works:"
alias foo='ls'
type foo
foo

pre_existing_alias_does_not_work

# a pre-existing alias can be called only if the function
# definition comes after the alias definition in the script
function pre_existing_alias_now_works {
    print "\n\nalias already defined now works:"
    type foo
    foo
}

pre_existing_alias_now_works

bare alias works:
foo is an alias for ls
file

alias already defined does not yet work:
foo is an alias for ls
/tmp/t.sh[16]: pre_existing_alias_does_not_work[7]: foo: not found [No such file or directory]

alias already defined now works:
foo is an alias for ls
file

Defining aliases in functions fails for other functions

This fails! One cannot create an alias in a function and then use it in another function, but one can use it in the main-line code.

#!/bin/ksh

function setup_alias {
    print "\n\nalias setup..."
    alias foo='ls'
}

# a pre-existing alias can not be called if
# the alias was defined in a function in the script
function use_alias {
    print "\n\nalias already defined does not work:"
    type foo
    foo
}

setup_alias
use_alias
print "\n\nbare alias works:"
type foo
foo

alias setup...

alias already defined does not work:
foo is an alias for ls
/tmp/t.sh[18]: use_alias[14]: foo: not found [No such file or directory]

bare alias works:
foo is an alias for ls
file

In summary…

If you write shell scripts with functions, alias resolution really matters, but how and why an alias gets resolved may not be obvious. Certainly, if you have answers or resources to better explain this, please leave a comment.

Accessing Kerberized HDFS via Jython

Why Kerberos?

So, you want to do some testing of your shiny new Oozie workflow, or you want to write some simple data management task — nothing complex — but your cluster is Kerberized?

Certainly there are many reasons to use Kerberos on your cluster. A cluster with no permissions is dangerous in even the most relaxed development environment, and while simple Unix authentication can suffice for some sharing of a Hadoop cluster, Kerberos is currently the answer if you want to be reasonably sure people are not subverting your ACLs or staring at data they should not be.

When do people work?

Ever wonder when people are actually working?

It can be hard to answer, “when are people at work?” On a distributed team, with many co-workers and the typical corporate dotted-line relationships, it is even harder! Inevitably, schedule shifts and desired schedules go uncommunicated. A few years ago, this occurred for folks I worked with.

How I work with IPS repos from the slim_source gate

Oh, how I knew System V packages…

Back in the bad old days before ON and slim_source had moved to building only IPS packages, one could pkgadd -d <location> SUNW<package> and easily drop their test code on a machine. Now, with the move to IPS packages, getting the test code to a machine can be much easier, but the setup is a bit more complicated. There is a tool to do this automatically for ON called onu (see it in action here). However, for slim_source it is pretty easy to do manually, once you know what you need to do.


Network Interactions of a Net Booted X86 AI Client

What all does an X86 do while net booting and installing?

I often get asked how the OpenSolaris Automated Installer works. The big question is how all the pieces tie together. To help answer these questions I have drafted a few UML sequence diagrams showing the boot process of an X86 type machine net booting and installing via the Automated Installer.

[Sequence diagrams: PXE running DHCP; PXE running TFTP; GRUB; live-fs-root; manifest-locator script; auto-installer]
