**SQL Apprentice Question**

I am working on an agriculture project whose objective is to find the "optimal values" of the operating conditions that affect the outcome (the amount of meat produced, i.e. the weight) of an animal production, chicken broilers in my case. To do so, I have to use historical data from previous productions as my training dataset. A production cycle typically lasts around 44 days. For each production, a data acquisition system stores the real-time and historical data of hundreds of parameters. These parameters are sensor measurements of all the operating conditions (current temperature, set-point temperature, humidity, static pressure, etc.), and these are what I refer to as the inputs. The operating costs and the production outcome are what I refer to as the outputs. The operating cost is computed indirectly from parameters like water consumption, feed consumption, heater/cooling runtimes, and lighting runtime; the outcome of a production is defined by parameters like animal mortality and conversion factor (the amount of feed in lb needed to produce 1 lb of meat). So the main objective of this project is to find the set of "optimal daily values" (one value per day) for the inputs that would minimize the operating costs and the conversion factor.
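
To make the conversion factor concrete, here is a minimal sketch of the arithmetic in Python. The function name and the flock totals are hypothetical, chosen only for illustration; they do not come from the project's data.

```python
def feed_conversion_ratio(feed_lb: float, weight_gain_lb: float) -> float:
    """Pounds of feed consumed per pound of meat produced (lower is better)."""
    if weight_gain_lb <= 0:
        raise ValueError("weight gain must be positive")
    return feed_lb / weight_gain_lb


# Hypothetical flock totals for one production cycle, for illustration only.
fcr = feed_conversion_ratio(feed_lb=9000.0, weight_gain_lb=5000.0)
print(fcr)  # 1.8
```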

The biggest problem I am facing right now is the following: the historical data that I have in the DB are time series, one for each measured parameter. Some of these time series follow a cyclic pattern (e.g. daily water/feed consumption), while others follow an increasing or decreasing trend (animal weight, total heater runtime, total water/feed consumption). My goal is to come up with a model that suggests a set of curves for the optimal daily values throughout the production cycle, one curve for each measured input/output parameter. This model would allow the farmer to monitor his production closely on a daily basis and make sure his production parameters follow the "optimal curves" suggested by my model. I have looked at ANNs, and I think they might be the solution to my problem since they can model problems with multiple inputs and outputs (am I wrong?), but I could not figure out a way to model the inputs/outputs as time series (an array of values for each parameter). As far as I know, all kinds of classifiers accept only single-valued samples.

One approach would be to create one classifier per day (e.g. for day 1: extract a single value for each parameter, use these values as a training sample, and repeat this for all previous productions to construct the training set). The problem with this approach is that 44 or so classifiers would have to be constructed (hard to manage), and each of the resulting ANNs would be some kind of "typical average" of the training data, not necessarily the "optimal values" leading to the best production outcome, if I am not mistaken.

Another approach would be to find a way to feed in the inputs and outputs as time series (an array of 44 daily values for each input/output parameter). In this case, there would be only one resulting ANN, and each training sample would be a set of arrays, one per parameter, as opposed to the single daily parameter values of the first case. The problem is that I could not find any classifier that would allow me to do that.
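
One common way to hand a whole time series to a model that expects a flat vector per sample is to concatenate the daily rows of each cycle. The sketch below shows only that reshaping step, in plain Python; the `cycle[day][param]` layout, the helper name, and the toy numbers are assumptions made for illustration.

```python
from typing import List

# Assumed layout: cycle[day][param], one row of daily values per day.
Cycle = List[List[float]]


def flatten_cycle(cycle: Cycle) -> List[float]:
    """Concatenate the daily rows of one cycle into a single feature vector."""
    n_params = len(cycle[0])
    if any(len(day) != n_params for day in cycle):
        raise ValueError("every day must have the same number of parameters")
    return [value for day in cycle for value in day]


# Toy data, made up: 2 cycles, 3 days, 2 parameters (temperature, humidity).
cycles = [
    [[32.0, 60.0], [31.5, 61.0], [31.0, 62.0]],
    [[33.0, 58.0], [32.0, 59.0], [31.0, 60.0]],
]
X = [flatten_cycle(c) for c in cycles]  # one training row per cycle
print(len(X), len(X[0]))  # 2 6
```

With 44 days and P parameters, each cycle becomes one vector of 44 × P values, so the number of features grows quickly relative to the number of available cycles.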

Another issue that I have is the amount of data. While a single production cycle can represent 1-2 GB of data, the length of the production cycle (44 days) makes it difficult to collect historical data for hundreds of cycles, as I can gather data for no more than 7 full cycles per year. Fortunately, a farm can have many production units (5-10 barns per site at big sites), which makes it possible to have 40-70 cycles per year. My question is: would this be enough to come up with an acceptably accurate model, or is it necessary to have hundreds of samples?

**Celko Answers**

I seem to remember "Evolutionary Operation" (EvOp) from chemical manufacturing. The basic idea is to make small adjustments in multiple factors to get an optimal setting for a process. It assumes there is a local optimum among the parameters, but only a relatively small sample is needed to adjust things.
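
The idea of small adjustments around the current setting can be sketched roughly as follows. This is a simplified illustration, not a full EvOp procedure: the function names, the step sizes, and the toy cost function are all hypothetical, and in practice each evaluation would be a real (noisy) production run rather than a formula.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Setting = Tuple[float, ...]


def evop_step(center: Setting, deltas: Sequence[float],
              cost: Callable[[Setting], float]) -> Setting:
    """Try the current setting plus every +/- delta corner; keep the cheapest."""
    candidates = [tuple(center)]
    for signs in product((-1, 1), repeat=len(center)):
        candidates.append(
            tuple(c + s * d for c, s, d in zip(center, signs, deltas))
        )
    return min(candidates, key=cost)


def toy_cost(setting: Setting) -> float:
    """Made-up cost surface with its minimum at temperature 30, humidity 60."""
    temp, humidity = setting
    return (temp - 30.0) ** 2 + 0.1 * (humidity - 60.0) ** 2


point: Setting = (32.0, 56.0)  # hypothetical starting operating point
for _ in range(10):
    point = evop_step(point, deltas=(0.5, 1.0), cost=toy_cost)
print(point)  # (30.0, 60.0)
```

Each step only ever moves the process by one small increment per factor, which is why the method relies on a local optimum being reachable from the current operating point.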
