Can Haskell save energy?
an exploration of the SBV library
Published on April 25, 2014
I am maintaining the software Plug4Green, which is aiming at saving energy in data centres. It is based on Constraint Programming (CP), which is a programming paradigm devoted to solve Constraint Satisfaction Problems (CSP). Plug4Green is written in Java, using the libraries Choco and BtrPlace.
While CP was a very good choice and fulfilled most of the requirements, there are some practical drawbacks. The first being that the languages used, Choco and Java, are very verbose. Defining a new constraint takes a lot of lines of code and is very error prone. The debugging period for each new constraint is also quite long. This diminishes the flexibility of the tool: creating a new constraints to fit a new requirement arriving in a data centre should be easy and short.
I started exploring alternatives to the couple Constraint Programming/Java, using my favorite programming language: Haskell. Programs in Haskell tend to be much less verbose than in Java (in the order of ten time less lines). It is also a declarative language, like Constraint Programming is, so the expression of constraints is more clear and natural. Haskell is a pure language, which means that no function can have side effects. Every function, for the same arguments, will always yield the same results. This property (purity), combined with the strong type system of Haskell allow to reduce drastically the number of bugs.
In parallel, Satisfiability Modulo Theories (SMT) is an active research area, slightly different than Constraint Programming. A SMT problem is the problem of determining the satisfiability of logical formulas with respect to background theories. Modern SMT solvers integrate a Boolean satisfiability (SAT) solver with specialized solvers for a set of literals belonging to each theory. The constraints are stated over specific domains, typically: Booleans, integers, rationals, reals, finite domains, or combinations of them. The problem consists in finding an assignment to the variables that satisfy all constraints.
To show the usability of both SMT and pure functional languages to tackle energy efficiency problems, I implemented the classical problem of packing VMs on servers using the library SBV, with only one dimension for the sake of simplicity. In the example underneath (full code here), each VM has a demand in term of CPU, and each server has a certain CPU capacity to offer. The objective is to find the placement of the VMs on the servers that minimizes the number of servers needed. The only constraint applied is that the total CPU consumption of the VMs that will be running on a server must not exceed the capacity of that server.
--concrete IDs for VMs and servers
type VMID = Integer
type SID = Integer
--symbolic IDs of the servers
type SSID = SBV SID
--A VM is just a name and a cpuDemand
data VM = VM { vmName :: String,
cpuDemand :: Integer}
--a server has got a name and a certain amount of free CPU
data Server = Server { serverName :: String,
cpuCapacity :: Integer}
--list of VMs
vms :: Map VMID VM
= fromList $ zip [0..] [VM "VM1" 100, VM "VM2" 50, VM "VM3" 15]
vms
--list of servers
servers :: Map SID Server
= fromList $ zip [0..] [Server "Server1" 100, Server "Server2" 100, Server "Server3" 200]
servers
--number of servers ON (which we'll try to minimize)
numberServersOn :: Map VMID SSID -> SInteger
= count . elems . M.map (./= 0) . vmCounts
numberServersOn
--computes the number of VMs on each servers
vmCounts :: Map VMID SSID -> Map SID SInteger
= M.mapWithKey count servers where
vmCounts vmls = sum [ite (mysid .== literal sid) 1 0 | mysid <- elems vmls]
count sid _
--All the CPU constraints
cpuConstraints :: Map VMID SSID -> SBool
= bAnd $ elems $ M.mapWithKey criteria (serverCPUHeights vmls) where
cpuConstraints vmls criteria :: SID -> SInteger -> SBool
= (literal $ cpuCapacity $ fromJust $ M.lookup sid servers) .> height
criteria sid height
--computes the CPU consummed by the VMs on each servers
serverCPUHeights :: Map VMID SSID -> Map SID SInteger
= M.mapWithKey sumVMsHeights servers where
serverCPUHeights vmls sumVMsHeights :: SID -> Server -> SInteger
= sum [ite (sid' .== literal sid) (literal $ cpuDemand $ fromJust $ M.lookup vmid vms) 0 | (vmid, sid') <- M.assocs vmls]
sumVMsHeights sid _
--solves the VM placement problem
vmPlacementProblem :: IO (Maybe (Map VMID SID))
= minimize' numberServersOn cpuConstraints
vmPlacementProblem
= do
main <- vmPlacementProblem
s putStrLn $ show s
When run, this program returns the placement for the VMs that minimizes the number of necessary servers.
In this case, it will place all three VMs on the third server.
While it is difficult to compare, it is anyway striking that this program is shorter than its equivalent in Java/Choco (for example this implementation of bin packing).
The definition of a constraint takes only a few lines (for example numberServersOn
takes 2 lines) and flows with the program definition.
Furthermore, as it is usually the case in Haskell, the type signature of the functions are carrying a lot of information that can be used both by the programmer to understand and reason about the program, and by the compiler to prove its correctness.
For example, the type signature numberServersOn :: Map VMID SSID -> SInteger
makes it clear that the function numberServersOn
is a constraint that takes the positions of all the VMs on the servers (denoted as a mapping between the VM ids and the server symbolic ids) and returns a symbolic integer representing the necessary number of servers.
Furthermore, programming at the symbolic level, as it is required when designing a CSP, is not very different than programming in concrete Haskell.
All we have to do to “switch” to the symbolic level is to prepend SBV
to the concrete type: SBV Integer
(SBV is a type function).
What is elegant is that a lot of the Haskell standard functions, like the function \(sum\) in our example program, can be reused in a constraint programming program.
The definition of sum
in the standard library of Haskell is generic enough to be used also at the symbolic level.
On the other hand, programming in Choco is completely different than programming in concrete Java: all the operators are necessarily different, due to the low genericity of Java.
Therefore, the intuition of the Java programmer cannot be completely reused.
SBV is also a theorem prover, and that can be used to prove properties of the constraints expressed.
For example, we might want to prove some properties about our constraint vmCounts
.
This function counts the number of VMs present on each servers.
We want to prove the property that the count of VMs on a server has for absolute maximum the total numbers of VMs present in the data centre.
This is easily done, for example using GHCi, the interactive interpreter of Haskell.
Main> prove $ \x y -> bAll (.<= 2) $ vmCounts' [x, y]
Q.E.D.
This example shows how we can ask SBV to prove that the number of VMs per server computed by the constraint vmCounts
cannot exceed the total number of VMs (in this simplified example with only 2 VMs and a version of vmCounts defined for lists instead of maps).
SBV simply replies with Q.E.D
, showing that it found a proof for our property.
A proof is a much more powerful way to ensure the correctness of a program than testing: by testing a property of a program, one is effectively trying only a subset of all the possible behaviours of a program, while a proof is exhaustive (is fact, a proof corresponds to a superset of all program behaviours for a certain property).
What is left to do? Benchmarking, of course! There is no guaranty that SMT/SBV will scale up with this type of problems: what if we add more VMs, more servers, more dimensions (RAM, HD, network…) and more constraints? Furthermore, I plan to test scheduling problems, which is necessary when tackeling resource management in an energy efficient way.
Backlink: Reddit