Category Archives: Hadoop

Posts about Apache Hadoop and the related ecosystem

Map/Reduce diff(1)

This has sadly been a draft for years, so it is time to release it… diff(1): If you use Unix, you have likely come across two files and wanted to see what differed between them. Certainly, one can compare size (highly inaccurate), use a hash function (if a strong cryptographic hash, it will be […]

Configuration Files

Many systems need to store configuration parameters. A number of choices exist for how to store that data, and this diversity is sometimes painful. Common choices for storing configuration data include: sqlite3 databases, which Firefox uses and Apple often chooses; ConfigParser, which Python programs often use to process initialization (.ini) […]

Accessing Kerberized HDFS via Jython

How to access HDFS on a Kerberos-secured Hadoop cluster: code and background!