Sunday, May 28, 2017

FreeBSD with Poudriere on ZFS with custom compiler toolchain

The SolarNetwork main infrastructure has always run on FreeBSD. FreeBSD is great for allowing packages to be built with exactly the options suited to how you want to use them, by building packages from source via the ports tree. In the years since SolarNetwork started, FreeBSD has evolved to distributing binary packages via the pkg tool. That can save a lot of time, since you no longer have to compile all your software from source, but it doesn't work if some package needs a different set of compiled-in options than the ones FreeBSD provides. Additionally, I'd been compiling the packages using a specific version of Clang/LLVM rather than the one used by FreeBSD (originally because one package wouldn't compile without a newer compiler version than FreeBSD shipped with).

Fast forward to now, and FreeBSD has a tool called poudriere, which can compile a set of packages with exactly the options needed and publish them as a FreeBSD package repository, from which any FreeBSD machine can then download the binary packages and install them via pkg. It's a bit like starting your own Linux distro: picking just the software and compile options you need and distributing them as pre-built binary packages.

Finally I took the time to set up a FreeBSD build machine running poudriere (in a virtual machine), and can now perform updates on the SolarNetwork infrastructure much more easily. There was just one major stumbling block along the way: I didn't know how to get poudriere to use the specific version of Clang I needed. There is plenty of information online about setting up poudriere, but I wasn't able to find anything about getting it to use a custom compiler toolchain. After some trial and error, here's how I finally ended up accomplishing it:

Create toolchain package repository

Poudriere works with FreeBSD jails to manage package repositories. Each package distribution uses its own jail, with its own configuration such as what compiler options to use and which packages to compile. The first task is to create a package repository with the toolchain packages needed; in my case that is provided by the devel/llvm39 port. The toolchain packages from this repository can then be installed into other poudriere build jails to serve as their compiler.

Once poudriere was installed and configured properly, the steps looked like this:

# Create jail
poudriere jail -c -j toolchain_103x64 -v 10.3-RELEASE
mkdir /usr/local/etc/poudriere.d/toolchain_103x64-options

# Create port list (for this jail, just the toolchain needed, devel/llvm39)
echo 'devel/llvm39' >/usr/local/etc/poudriere.d/toolchain-port-list

# Update to latest (each time build)
poudriere jail -u -j toolchain_103x64
poudriere ports -u -p HEAD

# Configure options
poudriere options -j toolchain_103x64 -p HEAD \
    -f /usr/local/etc/poudriere.d/toolchain-port-list

# Build packages
poudriere bulk -j toolchain_103x64 -p HEAD \
    -f /usr/local/etc/poudriere.d/toolchain-port-list
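
Note these steps assume the HEAD ports tree was already created as part of the initial poudriere setup; if it wasn't, it can be created first:

# Create the ports tree (only needed once)
poudriere ports -c -p HEAD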

After quite some time (llvm takes a terribly long time to compile!) the toolchain packages were built and I had nginx configured to serve them up via HTTP.
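
The nginx configuration amounts to a static file server pointed at the poudriere package directory. A minimal sketch, assuming poudriere's default data directory of /usr/local/poudriere/data and that the build host answers to the name poudriere, looks something like this:

server {
    listen       80;
    server_name  poudriere;

    # Serve the built package sets, e.g. /packages/toolchain_103x64-HEAD/
    location /packages {
        root      /usr/local/poudriere/data;
        autoindex on;
    }
}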

Create target system package repository

Now it was time to build the packages for a specific target system. In this case I am using the example of building a Postgres 9.6 based database server system, but the steps are the same for any system.

First, I created the system's poudriere jail:

# Create jail
poudriere jail -c -j postgres96_103x64 -v 10.3-RELEASE

# Create port list for packages needed
echo 'databases/postgresql96-server' \
    >/usr/local/etc/poudriere.d/postgres96-port-list

echo 'databases/postgresql96-contrib' \
    >>/usr/local/etc/poudriere.d/postgres96-port-list

echo 'databases/postgresql-plv8js' \
    >>/usr/local/etc/poudriere.d/postgres96-port-list

# Configure options
poudriere options -j postgres96_103x64 -p HEAD \
    -f /usr/local/etc/poudriere.d/postgres96-port-list

Second, I installed the llvm39 toolchain into the jail, using the custom toolchain repository:

# chroot into the build jail
chroot /usr/local/poudriere/jails/postgres96_103x64

# enable dns resolution for the build server (if DNS names to be used)
echo 'nameserver 192.168.1.1' > /etc/resolv.conf

# Copy /usr/local/etc/ssl/certs/poudriere.cert from HOST
# to /usr/local/etc/ssl/certs/poudriere.cert in JAIL
mkdir -p /usr/local/etc/ssl/certs
# manually copy poudriere.cert here
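
The copy itself has to happen from a second shell on the host (outside the chroot); with the jail path from above, something like:

# On the HOST, copy the repository certificate into the jail
cp /usr/local/etc/ssl/certs/poudriere.cert \
    /usr/local/poudriere/jails/postgres96_103x64/usr/local/etc/ssl/certs/poudriere.cert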

Then I configured pkg to use the toolchain repository via a /usr/local/etc/pkg/repos/poudriere.conf file:

poudriere: {
    url: "http://poudriere/packages/toolchain_103x64-HEAD/",
    mirror_type: "http",
    signature_type: "pubkey",
    pubkey: "/usr/local/etc/ssl/certs/poudriere.cert",
    enabled: yes,
    priority: 100
}

The URL in this configuration resolves to the directory where poudriere built the packages, served by nginx. Next I installed the toolchain, explicitly telling pkg to use this repository:

pkg update
pkg install -r poudriere llvm39

# clean up and exit the chroot
rm /etc/resolv.conf
exit

Now I can configure poudriere to use the toolchain by creating a /usr/local/etc/poudriere.d/postgres96_103x64-make.conf file with content like this:

# Use clang
CC=clang39
CXX=clang++39
CPP=clang-cpp39

DEFAULT_VERSIONS+=pgsql=9.6 ssl=openssl

The next step is what took me the longest to figure out, probably because I had not studied how poudriere works with ZFS very carefully. It turns out poudriere makes a ZFS snapshot of the jail named clean, and then clones that snapshot each time it performs a build. So all I needed to do was re-create that snapshot so it included the toolchain I had just installed:
 
# Recreate snapshot for build
zfs destroy zpoud/poudriere/jails/postgres96_103x64@clean
zfs snapshot zpoud/poudriere/jails/postgres96_103x64@clean
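
Since every build clones that clean snapshot, this has to be repeated any time the jail is modified by hand again. The snapshot can be double-checked with something like:

# Verify the clean snapshot exists
zfs list -t snapshot | grep postgres96_103x64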

Finally, the build can begin normally, and the custom toolchain will be used:

# Build packages
poudriere bulk -j postgres96_103x64 -p HEAD \
    -f /usr/local/etc/poudriere.d/postgres96-port-list

Update target system to use poudriere repository

Once the system's packages are built, it is possible to configure pkg on the target system to use this new package repository via a /usr/local/etc/pkg/repos/poudriere.conf file:

poudriere: {
    url: "http://poudriere/packages/postgres96_103x64-HEAD/",
    mirror_type: "http",
    signature_type: "pubkey",
    pubkey: "/usr/local/etc/ssl/certs/poudriere.cert",
    enabled: yes,
    priority: 100
}

Then I copied the certificate from the build host to the pubkey path configured above; an example of that copy is shown further below. I no longer want to use the default FreeBSD packages on this system, so I created a /usr/local/etc/pkg/repos/freebsd.conf file to disable that repository, with the following content:

FreeBSD: {
    enabled: no
}
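
The certificate copy can be done however is convenient; assuming the build host is reachable over SSH by the name poudriere, something like:

# Copy the repository signing certificate from the build host
mkdir -p /usr/local/etc/ssl/certs
scp poudriere:/usr/local/etc/ssl/certs/poudriere.cert \
    /usr/local/etc/ssl/certs/poudriere.cert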

Done! Now, after running pkg update, all packages will install from the poudriere repository, and I no longer need to compile the software on the system itself.
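
For a machine that already had packages installed from the official FreeBSD repository, forcing an upgrade should reinstall everything from the new repository, for example:

pkg update
pkg upgrade -f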

Friday, May 26, 2017

VivaTech 2017

This year Greenstage will be at VivaTech in France from June 15th to 17th.  We will be sharing our distributed energy solutions with the world and showing off our latest and greatest R&D.

Viva Technology
If you are planning on coming along, check us out in the VivaTech Vinci Energy Lab.

See you there!

Sunday, December 11, 2016

SolarQuant: Experimental Deep Learning for SolarNetwork : Part 1


Experiments with Energy Signatures

One of the interesting aspects of energy management is being able to react to trends in your energy flows to optimise the way energy is used at your home or business. Understanding the patterns in a building’s energy use allows energy managers to plan and scope the kinds of energy sources needed and figure out how much money that energy will cost them today and in the future. Until recently, this has been something studied by local authorities and planners within large corporations or organisations. However, as rooftop solar systems are becoming more and more common, these energy flows are something you can actually inspect yourself. With SolarNetwork, we’ve demonstrated easy and affordable ways to roll out energy monitoring across your home or business, using a variety of devices. Once that data is stored in SolarNet, the cloud repository, you can also get to it via the SolarQuery RESTful API from just about any development platform you choose. What we are experimenting with now is one of those applications, which we’re calling SolarQuant.

SolarQuant is an application server being written in PHP and MySQL, accessing a non-linear neural network program called emergent. The aim is to produce a "point and shoot" application server that can watch the data coming from a SolarNode - whether it is a single circuit or multiple circuits - and develop an "energy signature" to characterise how energy is used or generated at that location.

emergent is only one of many open-source “deep learning” applications that “learn” based on the examples within a prepared data set. The way a neural network works, as I understand it, is that it is software able to find patterns in data after being exposed to a set of “trials”: examples of how a system has performed in a certain context, represented by input variables. It looks at each of these examples and runs through them iteratively as a set, creating a series of weights that drive a non-linear solution to the problem. It is iterative because it aims to find the best solution by modifying those weights to see if that helps it get closer to the solution - but it needs to run through all the examples to gauge whether a modification was an improvement. The theory is that by doing this over and over, always following what worked best last time, the neural network gains “experience” with the data, and eventually builds a model - described mostly by the weights it settles on - that may closely describe the real world as measured by the data you collected. And it’s not a “rule of thumb” or a simple calculation - it is a site-specific, non-linear solution that possibly can only be approached using this empirical methodology. I don’t know if the “three-body problem” is a proper analogy, but I always think of the fact that while two planets in space can have their orbits with relation to each other described by a formula involving their respective masses and the distance between them, there is no such formula for three planetary bodies. It’s not that the orbits of three planets cannot be found, it’s just that finding them relies on circumstantial, empirical data and not a formula you can plug values into. Similarly, I believe there is no real “formula” for how you use or generate energy - it just depends on the day, the time, what you’re doing, what kind of loads you have and possibly what the weather is like. We’re calling that resulting pattern your energy signature, because the line of energy use - mostly a continuous time series - has that kind of shape:

Example of an Energy Signature (lighting)

Example of an Energy Signature (hot water)
 
ARM vs. x86-64 vs. GPU

So there are a few neural network applications, and they run on different processors. On top of that, there are different ‘algorithms’ each one can use to develop the correlations. In our tests we are currently using a 3-layer Back-Propagation network, which suits the task and is one of the most standard types of neural networks. Much more complex architectures exist today, and their complexity is handled by what seems to be a geometric growth of faster and cheaper multi-processor machines connected in clusters.

Possibly the most interesting innovation these days for deep learning has been the use of GPUs, via software frameworks that can tap the power of their parallel processing ability. One such software framework is CUDA (Compute Unified Device Architecture) from nVidia. This framework has recently been supported by emergent, and from the tests carried out by members of that project, the promised acceleration versus CPU is compelling - possibly a 2X - 10X multiple over a 4-core Intel processor - and this performance can be achieved with a rather average gaming video card. Given the time needed to develop an accurate energy signature for a building, we thought it would be good to test this out with SolarQuant.

Selecting your video card

At the moment, the only GPU-harnessing libraries supported by emergent are CUDA based, which means leveraging nVidia chipsets. One measure of expected performance seems to be the number of CUDA “cores” on the video card. Given it was an experiment, I purchased the lowest cost nVidia card I could, namely the GTX750Ti, for around $238 NZD. This device seems to have 640 cores. To put this in context, you can see the list of different nVidia adapters and their cores here. So there are clearly more powerful units out there with several thousand CUDA cores - wow! That’s something to look forward to. Given the beast of a machine that could be built to do this job, I thought we had better give these machines names to keep track. So this one is going to be called Taniwha One as a 4-core x86-64 box on emergent v7, and Taniwha Two as a 640-core CUDA computer on emergent v8.

Ubuntu 16.04 and LAMP

I had the most recent version of Ubuntu, 16.10, loaded on a 3.4GHz quad-core Intel i7 2600 machine, but when it came to compiling the special CUDA version of emergent, it did not work due to a change in the Qt5 webengine module which was needed for the application. So I installed a parallel boot of Ubuntu 16.04, and also installed the CUDA 8.0 SDK available for download (quite large I thought at 1.9GB, but nothing major these days with broadband) from nVidia for Ubuntu 16.04 here. I followed the Build Linux directions for emergent here, but with the added argument to the ./configure command:
--cuda
It chugged away for quite a while, but once it finished I set the LD_LIBRARY_PATH environment variable in my .bashrc file:

export LD_LIBRARY_PATH=$HOME/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH


And rebooted. I was then able to log in and type into a command prompt window:

emergent_cuda


and up came the familiar GUI development toolset - but it looked pretty stylish this time compared with version 7.
Emergent version 8
One of the nice things about emergent has been that you can do all your network design, testing and data loading using a GUI tool. The flashy and responsive visualisation of the neural network learning process shows the training as it happens, and the falling SSE, or Sum Squared Error, as it “learns”. For our application, that means the lower the SSE goes, the better the signature we’ve developed for that SolarNode. You can see a short video of emergent running on an early network we built here, but essentially what matters is that over “Batches” of “Epochs” the network trains on data sets, and develops a signature to describe each data set. The chart below shows, for example, 3 batches, each with 1000 epochs, an epoch being one run through your data set of several thousand trials. As for the results of this training, you can see that while not exact, the actual data (orange) versus the neural network model of that data once trained (blue) is pretty close.
3 Batches showing falling SSE as the NN learns

Training results (orange=actual, blue=trained)

Compiling emergent v7 and emergent v8

emergent has a new version (v8), but because it is still pretty new, a lot of the SolarQuant development underway has been tested on a stable version (v7.01), which in my experience runs well on both Debian and Ubuntu Linux. And because emergent can run as a graphical application or via the command line, it’s a great environment to try out new correlations in a one-off manner with the GUI on your workstation or laptop, or to run scheduled training sessions overnight with cron scripting via the command line. Changing between the two versions on the same boot OS I believe is possible... but just to keep things clean and working, I have separated the two boot partitions. You wouldn't think a laptop could be powerful enough to run a massively compute-intensive program like emergent, but it runs fine alongside a LAMP server on a $269 (USD) Dell Inspiron 14. I recently saw this machine however, and thought it might be a great workstation for this kind of work. And depending on how fast this GTX750Ti is, it may be something to start saving for - the GTX1060 has 1920 cores! - a multiple of 3 over Taniwha Two. Taniwha Three perhaps?

In each of the emergent runtimes, you can set the number of epochs you want to run - which is essentially how many times you go through your data set - and then batches of these epochs. You can see those 3 falling peaks in the chart above - those are batches. I understand you can even split those batches across multiple processors for further scalability, although I have not tried this yet. After each successful run, you can also output the weights matrix that your network finally arrived at - these small files hold the cumulative product of the training. You reference this weights matrix when you want to bounce new data off a trained network to get an "answer".

Rebuilding my Back-Propagation network

The only "algorithm" within emergent that can currently take advantage of the CUDA framework is the "bp" or Back-Propagation algorithm. While not the most advanced out there, it is probably appropriate for these kinds of time series experiments, and luckily that is what I've been testing with in SolarQuant. One of the cool aspects of emergent is that it contains a whole scripting language of its own, which you can use to customise what happens when your network project gets loaded. By passing arguments to the emergent command line, you can tell it which program you want to run, which batches you want to run, and other important information like input data files to work with or weights matrices to start with. Because version 8 is a whole new build, the project file that I developed in v7 cannot simply be opened in v8; it has to be “rebuilt”, which is essentially creating it again - defining the layers and setting up the programs - using the new GUI. We’ll see how that process goes, but I intend to be able to do side by side comparisons between Taniwha One and Taniwha Two.

Next post: Check in on how we’re doing with performance testing in Part 2...

Monday, December 5, 2016

University of Canterbury Motorsport (UCM)

Greenstage is proud to be supporting the University of Canterbury Motorsport team with their first full electric entry in the Formula SAE competition.  They have put together a beast of a car which is looking likely to do the business!

UCM's 2016 Formula SAE entry 
Earlier in the year Greenstage provided advice on battery pack building, and the UCM team paid a few visits to make use of our Miyachi Unitek battery welder (which we use for nickel or copper foil welding during pack building with 18650 or 26650 cells).

The engineering skills and quick iterations and improvements in the UCM pack designs were impressive.
UCM engineers in action
It's pleasing to see these skills and capabilities being fostered by the University of Canterbury.  I would strongly recommend any budding engineering student get involved in a Formula SAE team if they get the chance (especially if it's electric).

Good luck to the UCM team as they head into the competition phase!

Monday, September 19, 2016

SolarNetwork guides updated

I've been updating and adding to our user and developer guides for the SolarNetwork platform. Remember the whole SolarNetwork platform is open source, so it's easy to jump in and start using it.  If you need any help or you have a project that needs development resources, please do get in touch.  We are ready and waiting to help.

Tuesday, July 28, 2015

PicoGrid Energy Audit

While there is no doubt that renewable energy has the potential to be very cost-effective and a highly beneficial option for many buildings, it is important to select a reliable project partner with your best interests at heart.

Before going solar, you need to determine the right type and size of system based on your roof size, location and angle, and on your electricity consumption habits.  That way you can ensure you are investing in the best possible system for your future.

A PicoGrid Energy Audit allows you to better understand this context by providing you with a personalised assessment of your energy consumption habits and the generation potential of your roof.

We believe that the best way to promote the development of renewable energy is to make renewable energy projects more business-friendly for all project partners. Indeed, the aim is not to sell you the biggest solar PV system possible at the cheapest possible price, but to help you select the most suitable solution for your building in order to maximise the financial return (ROI) and hence reduce your costs.

What you want to avoid is buying an oversized PV system that exports excessive power to the grid; with New Zealand's low feed-in tariffs, that is just wasting money.  It is better to stagger the system over time to suit your needs, and to invest in appropriate energy efficiency measures to improve the financial return of the project.

This is all about knowing and understanding your energy information, and this is what the PicoGrid Energy Audit will do for you.  Below is an overview of the PicoGrid Energy Audit process:

PicoGrid Energy Audit - For optimal Solar PV project design.
For more information on PicoGrid Energy Audits, please check out the PicoGrid Energy Audit website, or contact us on 0508 742 647 or via email to book an appointment.