Channel: Active questions tagged python - Stack Overflow

Assistance formatting and solving a very large system of equations problem


This is a real world problem that has been solved in the past via thousands of manual iterations.

Please forgive my inexperience with Stack Overflow and with how to format my question. Dave rewrote my question in the comments, I believe correctly, and I quote him below. I would also thank him if I could figure out how.

This is likelier to get a good response if you make it more focused, if you define terms before you use them, and if you clarify what are the inputs. E.g., "A data point is a container which has a positive integer called 'value', an attribute called 'A' which contains an integer, and attributes B, C, D, each of which contains integer-float pairs where the floats for each attribute sum to 1.0. My input is about 20k of these data points, and my goal is to find new values for the points, leaving everything else unchanged, which maximizes (new - old value)

I have a collection of roughly 20,000 data points (Xi). Each data point contains exactly one value, which is greater than zero. Each data point also has 4 attributes. Attribute A has 99 possible categories, and a data point may belong to only one of them. The remaining three attributes may split the value across multiple categories of the attribute. For example: 80% of the value of Xi belongs to category 2 of attribute B, while the remaining 20% belongs to category 5 of attribute B.

I also have the previous values of each data point for several years (one value per year).

Point  Prev Value  New Value  Diff    Attr A   Attr B  Attr C   Attr D
X1     68          72         4       1: 100%  2: 80%  3: 100%  7: 90%
                                               5: 20%           9: 10%
X2     56,000      66,000     10,000  7: 100%  1: 50%  3: 90%   2: 100%
                                               5: 50%  6: 10%
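One way to hold such data points in Python is sketched below. This is only an illustration: the class name `DataPoint`, its field names, and the `validate` helper are assumptions, not part of any existing code, and the two instances simply restate the two example rows.

```python
from dataclasses import dataclass
from math import isclose
from typing import Dict


@dataclass
class DataPoint:
    """Hypothetical container for one data point (names are illustrative)."""
    name: str
    prev_value: float
    attr_a: int                 # attribute A: exactly one category
    attr_b: Dict[int, float]    # attributes B, C, D: category -> share
    attr_c: Dict[int, float]
    attr_d: Dict[int, float]

    def shares(self) -> Dict[str, Dict[int, float]]:
        """Return all four attributes in one uniform category -> share form."""
        return {
            "A": {self.attr_a: 1.0},
            "B": self.attr_b,
            "C": self.attr_c,
            "D": self.attr_d,
        }

    def validate(self) -> None:
        """Check that the value is positive and each attribute's shares sum to 1."""
        if self.prev_value <= 0:
            raise ValueError(f"{self.name}: value must be greater than zero")
        for attr, split in self.shares().items():
            if not isclose(sum(split.values()), 1.0, abs_tol=1e-9):
                raise ValueError(f"{self.name}: shares of attribute {attr} do not sum to 1")


# The two example rows, with percentages written as fractions.
x1 = DataPoint("X1", 68, 1, {2: 0.8, 5: 0.2}, {3: 1.0}, {7: 0.9, 9: 0.1})
x2 = DataPoint("X2", 56_000, 7, {1: 0.5, 5: 0.5}, {3: 0.9, 6: 0.1}, {2: 1.0})
points = [x1, x2]
for point in points:
    point.validate()
```

Treating attribute A as a split with a single 100% share keeps all four attributes in the same form, which simplifies any later code that sums values per category.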

I need to solve for the New Value column subject to the following constraints:

New Value must be positive

Sum of all (Xi) = Known Value

For each category of each attribute: Sum of all (Xi * percent of Xi allocated to that category) = Target Value of that category

For each attribute: Sum of the Target Values of all its categories = Known Value
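Written out mathematically, these hard constraints could look like the following. The symbols are my own assumptions for the sake of notation: x_i is the new value of data point i, K is the Known Value, w^{(k)}_{ij} is the share of data point i allocated to category j of attribute k, m_k is the number of categories of attribute k, and T^{(k)}_j is the Target Value of that category.

```latex
\begin{aligned}
& x_i > 0, && i = 1, \dots, n \\
& \sum_{i=1}^{n} x_i = K \\
& \sum_{i=1}^{n} w^{(k)}_{ij}\, x_i = T^{(k)}_j,
  && k \in \{A, B, C, D\},\ j = 1, \dots, m_k \\
& \sum_{j=1}^{m_k} T^{(k)}_j = K, && k \in \{A, B, C, D\} \\
& w^{(k)}_{ij} \ge 0, \qquad \sum_{j=1}^{m_k} w^{(k)}_{ij} = 1
\end{aligned}
```

Because the shares of each attribute sum to 1 for every data point, adding up the category equations of any single attribute reproduces the grand total equation, so that equation is implied by the others. All of the equalities are linear in the unknowns x_i.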

Attr A

A1: Target Value

A2: Target Value

...

A99: Target Value

A1 + A2 +... A99 = Known Value

Attr B

B1: Target Value

B2: Target Value

...

B27: Target Value

B1 + B2 +... B27 = Known Value

Attr C

C1: Target Value

C2: Target Value

...

C18: Target Value

C1 + C2 +... C18 = Known Value

Attr D

D1: Target Value

D2: Target Value

...

D18: Target Value

D1 + D2 +... D18 = Known Value
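The per-category totals that get compared against the targets can be computed in a single pass over the data. The following is a minimal sketch, assuming values and shares are kept in plain dictionaries; the function names `category_totals` and `gaps` and the target figures for attribute A are hypothetical and only there to show the shape of the calculation.

```python
from collections import defaultdict


def category_totals(values, shares):
    """Sum value * share into each (attribute, category) bucket.

    values: name -> value
    shares: name -> attribute -> {category: share}
    """
    totals = defaultdict(float)
    for name, value in values.items():
        for attr, split in shares[name].items():
            for cat, share in split.items():
                totals[(attr, cat)] += value * share
    return dict(totals)


def gaps(values, shares, targets):
    """Return target minus current total for every targeted (attribute, category)."""
    totals = category_totals(values, shares)
    return {key: targets[key] - totals.get(key, 0.0) for key in targets}


# New values and shares of the two example rows.
values = {"X1": 72.0, "X2": 66_000.0}
shares = {
    "X1": {"A": {1: 1.0}, "B": {2: 0.8, 5: 0.2}, "C": {3: 1.0}, "D": {7: 0.9, 9: 0.1}},
    "X2": {"A": {7: 1.0}, "B": {1: 0.5, 5: 0.5}, "C": {3: 0.9, 6: 0.1}, "D": {2: 1.0}},
}
totals = category_totals(values, shares)

# Hypothetical targets for two categories of attribute A, for illustration only.
targets = {("A", 1): 80.0, ("A", 7): 66_000.0}
remaining = gaps(values, shares, targets)
```

A gap of zero for every category of every attribute, together with all values being positive, would mean the hard constraints are met.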

Soft Constraints

The value in the Diff column should be positive

Smaller values should have more freedom to change than larger values

Ultimate goal

Seek to minimize the value in the Diff column for each row. When looking at the percent change from the previous value, try to distribute the change evenly across rows rather than have it all attributed to a single row.
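One possible way to turn the soft constraints and the goal into an objective is sketched below. Everything here is an assumption made for illustration, not an established formulation of this problem: p_i is the previous value of data point i, P is the sum of all previous values, K is the new Known Value, and the weight exponent \beta and penalty factor \lambda are hypothetical tuning parameters.

```latex
\min_{x} \;
\sum_{i=1}^{n} \omega_i \left( \frac{x_i}{p_i} - \frac{K}{P} \right)^{2}
\; + \;
\lambda \sum_{i=1}^{n} \max\bigl(0,\; p_i - x_i\bigr)^{2},
\qquad
\omega_i = p_i^{\beta}, \quad \beta \ge 0, \quad \lambda \ge 0
```

The first term penalizes rows whose growth factor x_i / p_i strays from the overall growth factor K / P, which spreads the percent change evenly. With \beta > 0 the weights grow with p_i, so smaller values are penalized less and have more freedom to move. The second term penalizes a negative Diff without forbidding it. Both terms are convex in x, so combined with linear equality constraints this would be a convex quadratic-type problem.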

As stated previously, this problem is solved once per year via manual iteration. We have developed a workflow that first populates a starting value for each data point by converting the previous value to a percent of the previous grand total and then multiplying that by the new known value, which is always larger than the previous grand total. We use a BI tool to view all of the total differences between the starting values and the target values at the same time. From there, manual iterations of the form "subtract 10,000 from X34 and add 10,000 to X452" are added to a running list, and the BI tool is updated to show the new status. That running list can be many thousands of lines long. Also, by "solving" I mean that we can eventually generate a solution that fits all of the constraints, but we are well aware that it is a solution and not the best solution.
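The two mechanical steps of this workflow, the proportional starting value and the running list of transfers, can be sketched in a few lines of Python. The function names `starting_values` and `apply_transfers` and all figures below are hypothetical; only the point names X34 and X452 and the 10,000 transfer come from the description above.

```python
def starting_values(prev, new_total):
    """Scale each previous value by its share of the previous grand total."""
    old_total = sum(prev.values())
    return {name: value / old_total * new_total for name, value in prev.items()}


def apply_transfers(values, transfers):
    """Apply a running list of (source, destination, amount) moves.

    Each move leaves the grand total unchanged. A move that would make
    the source value non-positive is rejected.
    """
    out = dict(values)
    for src, dst, amount in transfers:
        if out[src] - amount <= 0:
            raise ValueError(f"transfer would make {src} non-positive")
        out[src] -= amount
        out[dst] += amount
    return out


# Hypothetical previous values and new known grand total.
prev = {"X34": 50_000.0, "X452": 30_000.0, "X7": 20_000.0}
start = starting_values(prev, new_total=110_000.0)

# One manual iteration: subtract 10,000 from X34 and add 10,000 to X452.
adjusted = apply_transfers(start, [("X34", "X452", 10_000.0)])
```

Since every transfer moves an amount from one point to another, the grand total constraint stays satisfied throughout, and only the category totals shift.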

We've made several attempts to automate the iteration portion of the process in Python, with some degree of success. We've also talked with reps from MATLAB who were confident that they could get close using fmincon. We still might pursue that as a solution, but I would like to investigate alternatives.
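As one candidate for automating the iteration, here is a rough sketch of a multiplicative scaling scheme in the spirit of iterative proportional fitting (also called raking), adapted to fractional category shares. This is not the method we currently use, and it is not fmincon. It assumes the targets are mutually consistent, that every targeted category has at least one data point with a positive share, and that every category appearing in the shares has a target. It ignores the soft constraint on Diff, and it may converge slowly on real data. The function name `rake` and the toy data are hypothetical.

```python
import math


def rake(start, shares, targets, max_cycles=5000, tol=1e-9):
    """Rescale values multiplicatively until category totals match the targets.

    start:   name -> positive starting value
    shares:  name -> attribute -> {category: share}, shares sum to 1 per attribute
    targets: attribute -> {category: target total}
    Returns (values, largest remaining absolute gap).
    """
    x = dict(start)

    def totals(attr):
        tot = {cat: 0.0 for cat in targets[attr]}
        for name, value in x.items():
            for cat, share in shares[name][attr].items():
                tot[cat] += value * share
        return tot

    def worst_gap():
        return max(
            abs(targets[attr][cat] - total)
            for attr in targets
            for cat, total in totals(attr).items()
        )

    for _ in range(max_cycles):
        if worst_gap() <= tol:
            break
        for attr in targets:
            tot = totals(attr)
            log_ratio = {cat: math.log(targets[attr][cat] / tot[cat]) for cat in tot}
            for name in x:
                # Share-weighted geometric mean of the target/current ratios.
                # Factors are always positive, so values stay positive.
                x[name] *= math.exp(
                    sum(s * log_ratio[cat] for cat, s in shares[name][attr].items())
                )
    return x, worst_gap()


# Toy data: three points, two attributes, targets built to be consistent.
start = {"X1": 10.0, "X2": 20.0, "X3": 30.0}
shares = {
    "X1": {"A": {1: 1.0}, "B": {1: 0.8, 2: 0.2}},
    "X2": {"A": {1: 1.0}, "B": {1: 0.5, 2: 0.5}},
    "X3": {"A": {2: 1.0}, "B": {2: 1.0}},
}
targets = {"A": {1: 34.0, 2: 36.0}, "B": {1: 20.6, 2: 49.4}}
solution, gap = rake(start, shares, targets)
```

Because each step multiplies a value by a positive factor, the positivity constraint holds automatically, and a change made for one category propagates to the others on the next pass. On the full problem with roughly 20,000 points, the cycle limit and tolerance would need tuning, and an infeasible set of targets would simply fail to reach the tolerance.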

What I am requesting in this post is not necessarily a solution (although I would accept one) but links to resources on similar problems (where a change to one element results in changes to others), or possibly assistance in defining the problem more clearly in mathematical terms. I have looked into several optimization and genetic algorithms, but nothing seems to match what I'm after.

