Tuesday, October 24, 2017

SolarQuant gets a push forward from University of Auckland

One of the exciting developments at Greenstage Power has been our collaboration with the University of Auckland on our experimental machine learning module called SolarQuant. This is the stand-alone app server that takes consumption or generation data and aims to learn how that energy flow happened, based on the context of when it happened and what the environmental conditions were at the time. We had a visitor to New Zealand from MIT Engineering named Paige Studer, and she was instrumental in giving SolarQuant a push forward. We interviewed Paige about the project below:


Tell us a little about what you worked on at UofA
During my time at the University of Auckland, I had the privilege to work on SolarQuant, which is a program that aims to accurately predict a building's energy consumption given a set of inputs such as time, weather, temperature, etc. When I arrived at the UofA, SolarQuant could take those inputs and a building's energy consumption and find weights for each of the inputs. Then, given only the inputs and the weights it had found, it would show how closely the calculated energy consumption matched the actual energy consumption. The next step was to see if we could get similar results from predicted inputs: feed them in, get a calculated energy consumption, and compare that with the actual consumption measured over the same period.
One of the main factors in being able to do this was formatting the predicted weather so that it looked the same as the actual weather, with the exception of a type id showing that it was predicted and not actual. The predicted weather was taken from a Norwegian weather service, in the form of an XML file. The program goes through the file, finds entries that have all of the information we need, and adds them to an initial array. This initial array of predicted weather data has problems, such as not being sorted and containing repeated information, so it needs to be cleaned up and adjusted to look like real data. A second array is constructed so that the times of the predictions are in chronological order and separated by thirty minutes, without any repeated or missing times. Once this is complete, the program walks through that array, creates weather datum objects and places those objects into a database to be used in the future.
Because the future weather in the database looks the same as the actual weather in the database, we could use it on the SolarQuant platform. From here the program downloads the future weather data and, instead of training against energy consumption, skips straight to the questioning stage, since the data is in the future and there is no energy consumption to train on. After this, John was going to add his code and we would hopefully see predicted energy consumption, and eventually compare it with actual energy consumption for the same time period.
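The cleanup and resampling step described above is roughly the following, shown here as a simplified Python sketch. The real SolarQuant code is PHP against a MySQL database, and the field names, the carry-forward fill strategy and the thirty-minute step are illustrative assumptions rather than the actual implementation:

from datetime import timedelta

def clean_forecast(raw_entries, step=timedelta(minutes=30)):
    """Sort raw forecast entries, drop duplicate timestamps, and resample
    to a regular thirty-minute series (illustrative field names only)."""
    required = ("time", "temperature", "sky_conditions")
    usable = [e for e in raw_entries if all(k in e for k in required)]

    # Chronological order, keeping only the first entry per timestamp
    usable.sort(key=lambda e: e["time"])
    deduped = {}
    for e in usable:
        deduped.setdefault(e["time"], e)
    entries = list(deduped.values())
    if not entries:
        return []

    # Walk from the first to the last prediction in fixed steps, carrying
    # the most recent prediction forward so no times repeat or go missing
    resampled, i = [], 0
    t = entries[0]["time"]
    while t <= entries[-1]["time"]:
        while i + 1 < len(entries) and entries[i + 1]["time"] <= t:
            i += 1
        resampled.append(dict(entries[i], time=t))
        t += step
    return resampled

Each cleaned entry is then turned into a weather datum object and stored in the database, flagged with the type id marking it as predicted rather than actual weather.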
Do you think it will work?
Yes, of course I think it will work! Theoretically it will, so if it doesn't right away it will be due to some bugs in the code that can be fixed. I'm very excited to see where it goes in the future once it is working, because there are some pretty cool applications. One in particular that I find interesting: if we can accurately predict the weather and a building's energy consumption, then with a solar/battery system you could potentially get much smarter about when to charge and discharge your battery.
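A toy illustration of that last idea, assuming you already had half-hourly predictions of both solar generation and consumption (the function name, units and decision rule here are hypothetical, just to show the shape of the problem):

def plan_battery(predicted_solar_kwh, predicted_load_kwh):
    """Naive half-hourly plan: charge on predicted surplus solar,
    discharge when predicted load exceeds predicted solar."""
    plan = []
    for solar, load in zip(predicted_solar_kwh, predicted_load_kwh):
        surplus = solar - load
        if surplus > 0:
            plan.append(("charge", surplus))
        elif surplus < 0:
            plan.append(("discharge", -surplus))
        else:
            plan.append(("idle", 0.0))
    return plan

# plan_battery([0.0, 1.0, 1.5], [0.5, 0.5, 1.0])
# -> [('discharge', 0.5), ('charge', 0.5), ('charge', 0.5)]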
As a developer what are the challenges SolarQuant is going to have – what should we get ready for?
I think that SolarQuant will only get better and faster, and it will be important to stay flexible and be able to adjust with the program. For instance, one thing that John and I talked about was possibly using a different weather source for predicted weather and how to handle it. Do you write one function that can handle all the different weather sources, write a function for each weather source, etc.? Being open to changes in the code and data sources will make a difference in how well SolarQuant continues to progress. One helpful idea that John reiterated is that we want to walk before we run, meaning let's make small additions and changes and make sure they work before progressing. We don't want to write all this code and then have it not work without knowing why.
Did you like NZ? We heard you went bungy jumping!?
New Zealand was absolutely awesome! I loved meeting new people, learning about the Maori culture, and especially loved the adventure atmosphere of New Zealand. On the weekends I was able to go on lots of side trips, my favorite being Queenstown, where I did the Kawarau Bridge bungy jump; I also went black water rafting in Waitomo and sand boarding while visiting the Bay of Islands.
What are the next plans, where to?
I will begin working at a solar energy company in Southern California that specializes in getting schools solar energy, often in the form of carports. I will be an Assistant Project Engineer there, and I hope to learn more about solar energy projects, continue to grow my skillsets, and make a positive impact on the community.

SolarQuant comparing predicted consumption (blue) with actual (orange) consumption after training on 1 year of data

Some of the early results from Paige's work are shown above in a screenshot of the SolarQuant interface. The trained network's predicted time series is shown in light blue, and the actual power consumption in orange. Thanks once again to Dr. Nirmal Nair at the University of Auckland ECE who made this possible! Developers can check out SolarQuant as it progresses here: git@github.com:SolarNetwork/solarquant.git

Sunday, May 28, 2017

FreeBSD with Poudriere on ZFS with custom compiler toolchain

The SolarNetwork main infrastructure has always run on FreeBSD. FreeBSD is great at letting you build packages with options suited to how you want to use them, by building from source via the ports tree. In the years since SolarNetwork started, FreeBSD has also evolved to distributing binary packages via the pkg tool. That can save a lot of time, since you don't have to compile all the software you use from source, but it doesn't work if some package needs a different set of compiled-in options than FreeBSD itself provides. Additionally, I'd been compiling the packages using a specific version of Clang/LLVM rather than the one used by FreeBSD (originally because one package wouldn't compile without a newer compiler version than FreeBSD used).

Fast forward to now, and FreeBSD has a tool called poudriere, which can compile a set of packages with exactly the options needed and publish them as a FreeBSD package repository, from which any FreeBSD machine can then download the binary packages and install them via pkg. It's a bit like starting your own Linux distro, picking just the software and compile options you need and distributing them as pre-built binary packages.

Finally I took the time to set up a FreeBSD build machine running poudriere (in a virtual machine) and can now perform updates on the SolarNetwork infrastructure much more easily. There was just one major stumbling block along the way: I didn't know how to get poudriere to use the specific version of Clang I needed. There is plenty of information online about setting up poudriere, but I wasn't able to find anything about getting it to use a custom compiler toolchain. After some trial and error, here's how I finally ended up accomplishing it:

Create toolchain package repository

Poudriere works with FreeBSD jails to manage package repositories. Each package distribution uses its own jail with its own configuration, such as which compiler options to use and which packages to compile. The first task is to create a package repository with the toolchain packages needed; in my case that is provided by the devel/llvm39 port. This toolchain repository can then be installed in other poudriere build jails to serve as their compiler.

Once poudriere was installed and configured properly, the steps looked like this:

# Create jail
poudriere jail -c -j toolchain_103x64 -v 10.3-RELEASE
mkdir /usr/local/etc/poudriere.d/toolchain_103x64-options

# Create port list (for this jail, just the toolchain needed, devel/llvm39)
echo 'devel/llvm39' >/usr/local/etc/poudriere.d/toolchain-port-list

# Update to latest (run each time you build)
poudriere jail -u -j toolchain_103x64
poudriere ports -u -p HEAD

# Configure options
poudriere options -j toolchain_103x64 -p HEAD \
    -f /usr/local/etc/poudriere.d/toolchain-port-list

# Build packages
poudriere bulk -j toolchain_103x64 -p HEAD \
    -f /usr/local/etc/poudriere.d/toolchain-port-list

After quite some time (llvm takes a terribly long time to compile!) the toolchain packages were built and I had nginx configured to serve them up via HTTP.

Create target system package repository

Now it was time to build the packages for a specific target system. In this case I am using the example of building a Postgres 9.6 based database server system, but the steps are the same for any system.

First, I created the system's poudriere jail:

# Create jail
poudriere jail -c -j postgres96_103x64 -v 10.3-RELEASE

# Create port list for packages needed
echo 'databases/postgresql96-server' \
    >/usr/local/etc/poudriere.d/postgres96-port-list

echo 'databases/postgresql96-contrib' \
    >>/usr/local/etc/poudriere.d/postgres96-port-list

echo 'databases/postgresql-plv8js' \
    >>/usr/local/etc/poudriere.d/postgres96-port-list

# Configure options
poudriere options -j postgres96_103x64 -p HEAD \
    -f /usr/local/etc/poudriere.d/postgres96-port-list

Second, I installed the llvm39 toolchain into the new jail, using the custom toolchain repository:

# chroot into the build jail
chroot /usr/local/poudriere/jails/postgres96_103x64

# enable dns resolution for the build server (if DNS names to be used)
echo 'nameserver 192.168.1.1' > /etc/resolv.conf

# Copy /usr/local/etc/ssl/certs/poudriere.cert from HOST
# to /usr/local/etc/ssl/certs/poudriere.cert in JAIL
mkdir -p /usr/local/etc/ssl/certs
# manually copy poudriere.cert here

Then, still inside the chroot, I configured pkg to use the toolchain repository via a /usr/local/etc/pkg/repos/poudriere.conf file:

poudriere: {
    url: "http://poudriere/packages/toolchain_103x64-HEAD/",
    mirror_type: "http",
    signature_type: "pubkey",
    pubkey: "/usr/local/etc/ssl/certs/poudriere.cert",
    enabled: yes,
    priority: 100
}

The URL in this configuration resolves to the directory where poudriere built the packages, served by nginx. Next I installed the toolchain, explicitly telling pkg to use this repository:

pkg update
pkg install -r poudriere llvm39

# clean up and exit the chroot
rm /etc/resolv.conf
exit

Now I can configure poudriere to use the toolchain by creating a /usr/local/etc/poudriere.d/postgres96_103x64-make.conf file with content like this:

# Use clang
CC=clang39
CXX=clang++39
CPP=clang-cpp39

DEFAULT_VERSIONS+=pgsql=9.6 ssl=openssl

The next step is what took me the longest to figure out, probably because I had not studied how poudriere works with ZFS very carefully. It turns out poudriere makes a snapshot of the jail named clean, and then clones that snapshot each time it performs a build. So all I needed to do was re-create that snapshot:
 
# Recreate snapshot for build
zfs destroy zpoud/poudriere/jails/postgres96_103x64@clean
zfs snapshot zpoud/poudriere/jails/postgres96_103x64@clean

Finally, the build can begin normally, and the custom toolchain will be used:

# Build packages
poudriere bulk -j postgres96_103x64 -p HEAD \
    -f /usr/local/etc/poudriere.d/postgres96-port-list

Update target system to use poudriere repository

Once the system's build is complete, it is possible to configure pkg on that system to use the new package repository via a /usr/local/etc/pkg/repos/poudriere.conf file:

poudriere: {
    url: "http://poudriere/packages/postgres96_103x64-HEAD/",
    mirror_type: "http",
    signature_type: "pubkey",
    pubkey: "/usr/local/etc/ssl/certs/poudriere.cert",
    enabled: yes,
    priority: 100
}

Then I copied the certificate from the build host to the path configured above. I no longer want to use the default FreeBSD packages on this system, so I created a /usr/local/etc/pkg/repos/freebsd.conf file to disable the default repository, with the following content:

FreeBSD: {
    enabled: no
}

Done! Now, after running pkg update, all packages will install from the poudriere repository, and I no longer need to compile the software on the system itself.

Friday, May 26, 2017

VivaTech 2017

This year Greenstage will be at VivaTech in France from June 15th to 17th.  We will be sharing our distributed energy solutions with the world and showing off our latest and greatest R&D.

Viva Technology
If you are planning on coming along, check us out in the VivaTech Vinci Energy Lab.

See you there!

Sunday, December 11, 2016

SolarQuant: Experimental Deep Learning for SolarNetwork : Part 1


Experiments with Energy Signatures

One of the interesting aspects of energy management is being able to react to trends in your energy flows to optimise the way energy is used at your home or business. Understanding the patterns in a building's energy use allows energy managers to plan and scope the kinds of energy sources needed and figure out how much money that energy will cost them today and in the future. Until recently, this has been something studied by local authorities and by planners within large corporations or organisations. However, as rooftop solar systems become more and more common, these energy flows are something you can actually inspect yourself. With SolarNetwork, we've demonstrated easy and affordable ways to roll out energy monitoring across your home or business, using a variety of devices. Once that data is stored in SolarNet, the cloud repository, you can also get to it via the SolarQuery RESTful API from just about any development platform you choose. What we are experimenting with now is one such application, which we're calling SolarQuant.
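As an aside on that last point, pulling data back out of SolarNet really is only a small amount of code on most platforms. Here is a rough Python sketch of querying the public SolarQuery datum endpoint; treat the host, path, parameters and response fields shown as illustrative, and check the SolarQuery API documentation for the exact query syntax for your nodes and sources:

import requests

BASE = "https://data.solarnetwork.net/solarquery/api/v1/pub"

def list_datum(node_id, start_date, end_date, aggregate="Hour"):
    """Fetch aggregated datum records for a node between two dates."""
    resp = requests.get(BASE + "/datum/list", params={
        "nodeId": node_id,          # the SolarNode to query
        "startDate": start_date,    # e.g. "2016-12-01"
        "endDate": end_date,        # e.g. "2016-12-08"
        "aggregate": aggregate,     # e.g. "Hour" for hourly totals
    })
    resp.raise_for_status()
    return resp.json()["data"]["results"]

# for datum in list_datum(123, "2016-12-01", "2016-12-08"):
#     print(datum["created"], datum.get("wattHours"))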

SolarQuant is an application server being written in PHP and MySQL, accessing a non-linear neural network program called emergent. The aim is to produce a "point and shoot" application server that can watch the data coming from a SolarNode - whether it is a single circuit or multiple circuits - and develop an "energy signature" to characterise how energy is used or generated at that location.

emergent is only one of many open-source “deep learning” applications that “learn” from the examples within a prepared data set. As I understand it, a neural network is software that can find patterns in data after being exposed to a set of “trials”, which represent examples of how a system has performed in a certain context described by input variables. It looks at each of these examples and runs through them iteratively as a set, creating a series of weights that drive a non-linear solution to the problem. It is iterative because it aims to find the best solution by modifying those input weights to see if that helps it get closer to the solution - but it needs to run through all the examples to gauge whether the modification was an improvement. The theory is that by doing this over and over, always following what worked best last time, the neural network gains “experience” with the data, and eventually builds a model - described mostly by the weights it settles on - that may closely describe the real world as measured by the data you collected.

And it's not a “rule of thumb” or a simple calculation - it is a site-specific, non-linear solution that possibly can only be approached using this empirical methodology. I don't know if the “Three-body Problem” is a proper analogy, but I always think of the fact that while two planets in space can have their orbits with relation to each other described by a formula involving their respective masses and the distance between them, there is no such formula for 3 planetary bodies. It's not that the orbits of 3 planets cannot be found, it's just that finding them relies on circumstantial, empirical data and not a formula you can plug values into. Similarly, I believe there is no real ‘formula’ for how you use or generate energy - it just depends on the day, the time, what you're doing, what kind of loads you have and possibly what the weather is like. We're calling that resulting pattern your energy signature, because the line of energy use - mostly a continuous time series - has that kind of shape:

Example of an Energy Signature (lighting)

Example of an Energy Signature (hot water)
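To make the iterative weight-adjustment idea described above a little more concrete, here is a drastically simplified Python sketch: a single linear "neuron" trained by plain gradient descent. It is nothing like emergent's full three-layer back-propagation network, but the basic loop is the same - run through every trial, measure the error, nudge the weights, and repeat for many epochs:

import random

def train(trials, n_inputs, epochs=1000, rate=0.01):
    """trials: list of (inputs, target) pairs, e.g. ([hour, temperature], watts).
    Inputs are assumed to be scaled to a similar range. Returns the learned
    weights plus the final sum squared error."""
    weights = [random.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        sse = 0.0
        for inputs, target in trials:
            prediction = bias + sum(w * x for w, x in zip(weights, inputs))
            error = target - prediction
            sse += error * error
            # Nudge each weight in the direction that reduces the error
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
        # Watching sse fall across epochs is the "learning"
    return weights, bias, sse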
 
ARM vs. x86-64 vs. GPU

So there are a few neural network applications, and they run on different processors. On top of that, each uses different ‘algorithms’ to develop its correlations. In our tests we are currently using a 3-layer Back-Propagation network, which suits the task and is one of the most standard types of neural network. Much more complex architectures exist today, and their complexity is actively handled by what seems to be a geometric growth of faster and cheaper multi-processor machines connected in clusters.

Possibly the most interesting innovation these days for deep learning has been the use of GPUs, via software frameworks that can tap the power of their parallel processing ability. One such framework is CUDA (Compute Unified Device Architecture) from nVidia. This framework has recently been supported by emergent, and from the tests carried out by members of that project the promised acceleration versus CPU is compelling - possibly a 2x to 10x multiple over a 4-core Intel processor, and that performance can be achieved with a rather average gaming video card. Given the timeframe needed to develop an accurate energy signature for a building, we thought it would be good to test this out with SolarQuant.

Selecting your video card

At the moment, the only GPU harnessing libraries supported by emergent are CUDA based, which leverage the nVidia chipsets. One measure of expected performance seems to be the number of CUDA “cores” on the video card. Given this was an experiment, I purchased the lowest cost nVidia card I could, namely the GTX750Ti, for around $238 NZD. This device seems to have 640 cores. To put this in context, you can see the list of different nVidia adapters and their cores here. So there are clearly more powerful units out there with several thousand CUDA cores - wow! That's something to look forward to. Given the beast of a machine that could be built to do this job, I thought we had better give these machines names to keep track. So this one is going to be called Taniwha One as a 4-core x86-64 box on emergent v7, and Taniwha Two as a 640-core CUDA computer on emergent v8.

Ubuntu 16.04 and LAMP

I had the most recent version of Ubuntu, 16.10, loaded on a 3.4GHz quad-core Intel i7 2600 machine, but when it came to compiling the special CUDA version of emergent it did not work, due to a change in the Qt5 webengine module which the application needed. So I installed a parallel boot of Ubuntu 16.04, and also installed the CUDA 8.0 SDK available for download from nVidia for Ubuntu 16.04 here (quite large I thought at 1.9GB, but nothing major these days with broadband). I followed the Build Linux directions for emergent here, but with this added argument to the ./configure command:
--cuda
It chugged away for quite a while, but once it finished I set the LD_LIBRARY_PATH environment variable in my .bashrc file:

export LD_LIBRARY_PATH=$HOME/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH


And rebooted. I was then able to log in and type the following at a command prompt:

emergent_cuda


and up came the familiar GUI development toolset - but it looked pretty stylish this time compared with version 7.
Emergent version 8
One of the nice things about emergent has been that you can do all your network design, testing and data loading using a GUI tool. The flashy and responsive visualisation of the neural network learning process shows the training as it happens, with the SSE, or Sum Squared Error, falling as it “learns”. For our application, that means the lower the SSE goes, the better the signature we've developed for this SolarNode. You can see a short video of emergent running on an early network we built here, but essentially what matters is that over "Batches" of "Epochs" the network trains on data sets and develops a signature to describe each data set. The chart below shows, for example, 3 batches of 1000 epochs each, an epoch being one run through your data set of several thousand trials. As for the results of this training, you can see that while not exact, the trained neural network model of the data (blue) tracks the actual data (orange) pretty closely.
3 Batches showing falling SSE as the NN learns

Training results (orange=actual, blue=trained)
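For reference, the SSE plotted above is nothing mysterious: it is just the difference between what the network predicted and what actually happened, squared and summed over the trials in the epoch - something like:

def sum_squared_error(targets, predictions):
    """Lower is better: the current weights reproduce the training data more closely."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))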

Compiling emergent v7 and emergent v8

emergent has a new version (v8), but because it is still pretty new, a lot of the SolarQuant development under way has been tested on a stable version (v7.01), which in my experience runs well on both Debian and Ubuntu Linux. And because emergent can run as a graphical application or via the command line, it's a great environment to try out new correlations in a one-off manner with the GUI on your workstation or laptop, or to run scheduled training sessions overnight with cron scripting via the command line. Changing between the two versions on the same boot OS is possible, I believe... but just to keep things clean and working, I have separated the two boot partitions. You wouldn't think a laptop could be powerful enough to run a massively compute-intensive program like emergent, but it runs fine alongside a LAMP server on a $269 (USD) Dell Inspiron 14. I recently saw this machine, however, and thought it might be a great workstation for this kind of work. And depending on how fast this GTX750Ti is, it may be something to start saving for - the GTX1060 has 1920 cores! - three times Taniwha Two. Taniwha Three perhaps?

In each of the emergent runtimes you can set the number of epochs you want to run through - which is essentially how many times you go through your data set - and then batches of these epochs. You can see those 3 falling peaks in the chart above - those are batches. I understand you can even split those batches across multiple processors for further scalability, although I have not tried this yet. After each successful run, you can also output the weights matrix that your network finally arrived at - these small files hold the cumulative product of the training. You reference this weights matrix when you want to bounce new data off a trained network to get an "answer".

Rebuilding my Back-Propagation network

The only "algorithm" within emergent that can currently take advantage of the CUDA framework is the "bp" or Back-Propagation algorithm. While not the most advanced out there, it is probably appropriate for these kinds of time series experiments, and luckily that is what I've been testing with in SolarQuant. One of the cool aspects of emergent is that it contains a whole scripting language of its own, which you can use to customise what happens when your network project gets loaded. By passing arguments on the command line, you can tell emergent which program you want to run, which batches you want to run, and other important information such as input data files to work with or matrix weights to start from. Because version 8 is a whole new build, the project file that I developed in v7 cannot simply be opened in v8; it has to be “rebuilt”, which essentially means creating it again - defining the layers and setting the programs - using the new GUI. We'll see how that process goes, but I intend to be able to do side by side comparisons between Taniwha One and Taniwha Two.

Next post: Check in on how we're doing with performance testing in Part 2...

Monday, December 5, 2016

University of Canterbury Motorsport (UCM)

Greenstage is proud to be supporting the University of Canterbury Motorsport team with their first full electric entry in the Formula SAE competition.  They have put together a beast of a car which is looking likely to do the business!

UCM's 2016 Formula SAE entry 
Earlier in the year Greenstage provided advice on battery pack building, and the UCM team paid a few visits to make use of our Miyachi Unitek battery welder (which we use for nickel or copper foil welding during pack building with 18650 or 26650 cells).

The engineering skills and quick iterations and improvements in the UCM pack designs were impressive.
UCM engineers in action
It's pleasing to see these skills and capabilities being fostered by the University of Canterbury.  I would strongly recommend that any budding engineering student get involved in a Formula SAE team if they get the chance (especially if it's electric).

Good luck to the UCM team as they head into the competition phase!

Monday, September 19, 2016

SolarNetwork guides updated

I've been updating and adding to our user and developer guides for the SolarNetwork platform. Remember the whole SolarNetwork platform is open source, so it's easy to jump in and start using it.  If you need any help or you have a project that needs development resources, please do get in touch.  We are ready and waiting to help.