Configuration Files

Many systems need to store configuration parameters, and there are a number of choices for how to store that data; sometimes this diversity is painful. Common choices include:

  • Firefox, and often Apple, use SQLite3 databases[1]
  • Python programs often use the ConfigParser module to process initialization (.ini) files
  • Apache Ant, amongst many other applications, consumes XML configuration files
  • Java programs often use Properties files, in either XML or traditional form
  • JavaScript Object Notation (JSON) is used by programs for configuration; a number of my group’s programs use it, for example
  • Domain-specific languages (DSLs) are sometimes used. For example, the Puppet configuration management system has its own DSL, implemented in Ruby

This diversity of configuration formats also leads to cross-pollination. One application may read only one format while another writes only a different one, a toolset may work with only one format, and an application that has grown organically can end up using several formats itself.

Annoyingly, not all formats support the same set of features either. For example, SQLite3 and XML can both be multidimensional: a SQLite3 file can hold multiple tables, each N rows by M columns, while XML supports a hierarchical tree of nested elements, each of which can also carry attributes. JSON is comparable to XML, offering rich structure for organizing one’s data. Python’s initialization-file support offers only a two-level hierarchy (sections containing options). Java Properties files are flat, but their keys often use Java-style dot notation to create namespaces that can represent an arbitrarily deep hierarchy. Domain-specific languages can be as rich or as simple as desired, but such formats share no common structure or properties.
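To make the two-level structure of Python’s INI support concrete, here is a minimal sketch using the standard library’s configparser module (named ConfigParser in Python 2). The helper name ini_to_dict and the sample section are illustrative assumptions. The point is that every value sits exactly two levels down: first under a section, then under an option name.

```python
import configparser

def ini_to_dict(text):
    """Read INI-formatted text into {section: {option: value}}.

    ini_to_dict is a hypothetical helper name, not a library function.
    """
    # interpolation=None keeps '%' characters in values literal
    parser = configparser.ConfigParser(interpolation=None)
    # Option names are lower-cased by default; keep them as written
    parser.optionxform = str
    parser.read_string(text)
    # Two levels only: sections, then the options inside each section
    return {name: dict(parser[name]) for name in parser.sections()}

example = """
[oozie]
jobTracker = jobtracker.example.com:9001
queueName = default
"""
print(ini_to_dict(example))
```

Because the result is plain nested dicts, json.dumps() can write it out directly.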

This asymmetry can make conversion between formats difficult in general, but one should always be able to go from a less rich structure to a richer one. Where possible, it is nice to have tools to convert between them.
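As one example of going from a less rich structure to a richer one, here is a minimal sketch that expands flat, dot-notation Properties keys into nested dictionaries, which json.dumps() can then write out as JSON. The helper name nest_dotted is hypothetical. There is one case it cannot handle without an extra convention: a key that is both a value and a namespace prefix (for example, a=1 alongside a.b=2). The sketch simply raises an error there.

```python
import json

def nest_dotted(flat, sep="."):
    """Expand dot-notation keys: {'group.name': 'users'} -> {'group': {'name': 'users'}}.

    nest_dotted is a hypothetical helper, not part of any library.
    """
    root = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = root
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                # e.g. 'a=1' and 'a.b=2': 'a' cannot be both a value and a namespace
                raise ValueError("key %r conflicts with a value at %r" % (key, part))
        if isinstance(node.get(leaf), dict):
            raise ValueError("key %r conflicts with an existing namespace" % key)
        node[leaf] = value
    return root

props = {
    "group.name": "users",
    "user.name": "john_doe",
    "queueName": "default",
    "oozie.wf.application.path": "/w.xml",
}
print(json.dumps(nest_dotted(props), indent=2, sort_keys=True))
```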

Java Properties Files

Using with Python

One can find a recipe to read and write Java Properties files from Python. This re-implementation of the java.util.Properties class provides a convenient interface for working with properties files:

>>> import properties  # the recipe, saved as properties.py
>>> p = properties.Properties()
>>> with open("my.properties") as f:  # open read-only for loading
...     p.load(f)
...
>>> p.getPropertyDict()['some_property_I_want']
'this_is_not_the_property_value_you_want!'
>>> p.setProperty('some_property_I_want', 'with_the_value_I_want!')
>>> with open("my.properties", "w") as f:  # store() needs a file opened for writing
...     p.store(f)
...
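If the recipe is not available, a plain-Python sketch can cover the simple subset of the format that files like the one above typically use. This is a simplification, not a full implementation of java.util.Properties.load(). It handles blank lines, comment lines (starting with # or !), and a key separated from its value by =, :, or whitespace. It does not handle line continuations, backslash escapes, or unicode escapes. The function name load_simple_properties is hypothetical.

```python
def load_simple_properties(lines):
    """Parse a simple subset of the Java .properties format into a dict.

    load_simple_properties is a hypothetical helper, not the recipe's API.
    Handles blank lines, '#'/'!' comment lines, and key/value pairs
    separated by '=', ':' or whitespace. Does NOT handle line
    continuations, backslash escapes or unicode escapes.
    """
    props = {}
    for raw in lines:
        # Drop the newline and leading whitespace; trailing spaces in a
        # value are kept, as Java does.
        line = raw.rstrip("\r\n").lstrip(" \t\f")
        if not line or line[0] in "#!":
            continue
        # The key ends at the first '=', ':' or whitespace character.
        end = 0
        while end < len(line) and line[end] not in "=: \t\f":
            end += 1
        key, rest = line[:end], line[end:].lstrip(" \t\f")
        # At most one '=' or ':' may follow, with optional whitespace around it.
        if rest[:1] in ("=", ":"):
            rest = rest[1:].lstrip(" \t\f")
        props[key] = rest  # a bare key maps to the empty string
    return props

sample = [
    "# comment lines are skipped\n",
    "date=2011-12-01T00:00Z\n",
    "nameNode = hdfs://namenode.example.com:9000\n",
    "queueName default\n",
]
print(load_simple_properties(sample))
# For a real file (Properties.load(InputStream) assumes ISO-8859-1):
# with open("my.properties", encoding="latin-1") as f:
#     props = load_simple_properties(f)
```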

Properties in XML

Within Java, one can write an XML version of a properties file simply by calling the storeToXML() method on a java.util.Properties object; loadFromXML() reads such a file back.
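Outside Java, the same layout can be produced by hand. The Python sketch below writes a dict in the XML form that storeToXML() emits: a properties root element with one entry element per key, where the key name is stored in a key attribute. Note that this layout differs from the configuration/property/name/value layout that Hadoop and Oozie use. The function name is hypothetical. Sorting the keys is this sketch’s own choice, made for stable output; Java does not promise any particular order.

```python
import xml.etree.ElementTree as ET

def to_java_xml_properties(props, comment=None):
    """Serialize a dict of str -> str in the XML layout used by
    java.util.Properties.storeToXML() and read by loadFromXML().

    to_java_xml_properties is a hypothetical helper name.
    """
    root = ET.Element("properties")
    if comment is not None:
        ET.SubElement(root, "comment").text = comment
    # Sorted for stable output; Java itself promises no particular order.
    for key in sorted(props):
        ET.SubElement(root, "entry", key=key).text = props[key]
    header = ('<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
              '<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">\n')
    # ElementTree escapes &, < and > in text, and quotes in attributes.
    return header + ET.tostring(root, encoding="unicode") + "\n"

print(to_java_xml_properties({"queueName": "default", "user.name": "john_doe"}))
```

Since the header declares UTF-8, the result should be written to a file opened with encoding="utf-8".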

Oozie’s XML outputs

I use a lot of Hadoop programs which store their outputs in various XML forms, but the one that always drives me nuts is Apache Oozie. Oozie will dump out a workflow job configuration in XML, but not as a standard Java XML properties file. Oozie accepts workflow properties as a plain (non-XML) Java properties file, yet it will not accept the XML it produces. However, via the joys of XSL Transformations (XSLT), we can write a simple stylesheet that converts one into the other!

An example (Oozie) Properties file in XML:

<configuration>
  <property>
    <name>date</name>
    <value>2011-12-01T00:00Z</value>
  </property>
  <property>
    <name>endTime</name>
    <value>2011-12-01T23:59Z</value>
  </property>
  <property>
    <name>frequency</name>
    <value>1440</value>
  </property>
  <property>
    <name>group.name</name>
    <value>users</value>
  </property>
  <property>
    <name>jobTracker</name>
    <value>jobtracker.example.com:9001</value>
  </property>
  <property>
    <name>nameNode</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <property>
    <name>oozie.coord.application.path</name>
    <value>/export/my_workflow/coordinator.xml</value>
  </property>
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs://namenode.example.com:9000/user/john_doe/my_workflow/workflow.xml</value>
  </property>
  <property>
    <name>queueName</name>
    <value>default</value>
  </property>
  <property>
    <name>startTime</name>
    <value>2011-12-01T00:00Z</value>
  </property>
  <property>
    <name>user.name</name>
    <value>john_doe</value>
  </property>
</configuration>

General XSLT transformation from XML to Java properties file

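<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Emit plain text rather than XML -->
<xsl:output method="text" version="1.0" omit-xml-declaration="yes"/>
  <!-- Match the root element (e.g. <configuration>), whatever its name -->
  <xsl:template match="/*">
    <!-- One name=value line per property child; names and values are copied verbatim, with no .properties escaping -->
    <xsl:for-each select="property">
      <xsl:value-of select="name"/><xsl:text>=</xsl:text><xsl:value-of select="value"/><xsl:text>&#xa;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>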

Resulting Java properties file

date=2011-12-01T00:00Z
endTime=2011-12-01T23:59Z
frequency=1440
group.name=users
jobTracker=jobtracker.example.com:9001
nameNode=hdfs://namenode.example.com:9000
oozie.coord.application.path=/export/my_workflow/coordinator.xml
oozie.wf.application.path=hdfs://namenode.example.com:9000/user/john_doe/my_workflow/workflow.xml
queueName=default
startTime=2011-12-01T00:00Z
user.name=john_doe

For those less comfortable with programming, on Linux one can use xsltproc(1), the command-line tool from libxslt, to run this conversion. For example, to take in my_config.xml in Oozie’s XML format and produce the same settings as a Java properties file, save the stylesheet above as to_property.xslt and run:

xsltproc to_property.xslt my_config.xml > my_config.properties
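Readers who prefer to stay in Python can do the same conversion with the standard library’s xml.etree.ElementTree instead of XSLT. In this minimal sketch, the function names are hypothetical. hadoop_xml_to_properties() mirrors the stylesheet above and additionally escapes backslashes, plus the = and : separators in names. properties_to_hadoop_xml() goes the other direction, building a configuration document from a dict. The sketch does not escape spaces in names, leading whitespace in values, or characters outside ISO-8859-1; a complete writer would need to handle all of these.

```python
import xml.etree.ElementTree as ET

def hadoop_xml_to_properties(xml_text):
    """Turn <configuration><property><name/><value/></property>...</configuration>
    into name=value lines, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    lines = []
    for prop in root.findall("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        # Minimal escaping so Properties.load() reads the same strings back:
        # backslashes everywhere, plus '=' and ':' inside the key.
        key = name.replace("\\", "\\\\").replace("=", "\\=").replace(":", "\\:")
        lines.append(key + "=" + value.replace("\\", "\\\\"))
    return "".join(line + "\n" for line in lines)

def properties_to_hadoop_xml(props):
    """Build a <configuration> document (not pretty-printed) from a dict
    (hypothetical helper)."""
    root = ET.Element("configuration")
    for name in sorted(props):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = props[name]
    return ET.tostring(root, encoding="unicode")

# Example, reusing the file name from the xsltproc command above:
# with open("my_config.xml", "rb") as f:
#     print(hadoop_xml_to_properties(f.read()), end="")
```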

JSON

JSON provides a rich, expressive structure similar to XML. It is widely used for data interchange, for example in AJAX web requests. However, JSON,

Using with Python

Python has a feature-rich json module, which maps JSON objects to native Python dict objects and JSON arrays to list objects (with strings, numbers, true/false, and null becoming str, int or float, True/False, and None). The module also provides richly customizable encoding and decoding, as its documentation (PyDoc) shows. This is particularly true of its hooks: object_hook and object_pairs_hook when decoding, and the default parameter (or a JSONEncoder subclass) when encoding.
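As a small illustration of those hooks, here is a sketch whose helper names are hypothetical. It uses object_pairs_hook to reject duplicate keys, which json.loads() otherwise accepts silently by keeping the last value. It also uses default so that json.dumps() can write a set, which it cannot serialize on its own. Both behaviors are assumptions about what one might want from a configuration loader, not defaults of the json module.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook: build a dict, but refuse repeated keys.

    Hypothetical helper; plain json.loads() keeps the last value instead.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate key in JSON config: %r" % key)
        result[key] = value
    return result

def encode_extra(obj):
    """default hook: json.dumps() calls this for objects it cannot encode."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # a set has no JSON form; emit a sorted list
    raise TypeError("%s is not JSON serializable" % type(obj).__name__)

config = json.loads('{"queueName": "default", "frequency": 1440}',
                    object_pairs_hook=reject_duplicates)
text = json.dumps({"groups": {"users", "admins"}},
                  default=encode_extra, sort_keys=True)
print(config)
print(text)
```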
