
how to do abstractionless with actual parametric compression
============================================================

The original "abstractionless" design drives its refactorings from editing operations (copy & paste), to keep the amount
of refactoring work that has to happen behind the scenes manageable. But that also makes the factoring level of the code inexact, and the algorithms
quite complicated (a LOT of possible checks have to happen to make sure redundancy cannot creep in).

Conceptually much simpler would be to see the whole problem as one of parametric compression, independent
of editing operations.

Compression is hard to make work, because simply expanding an entire code base to its flat form would not fit in memory.

But maybe this can be done as a compressed storage format.

Start with all source as tree nodes, with only sharing (thru a hash table), but not parametric abstraction.

This would still be too big: even though the leaves are shared, variables (i.e. differences in the expanded version) very often end up at the leaves,
so the overall code size would still be about half that of a fully flat version, i.e. still too big.
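A minimal sketch of this sharing-only starting point, assuming a hash-consing table keyed on the operator plus the identities of already-interned children (Python; `Node` and `intern` are illustrative names, not part of any existing design):

```python
# Hash-consing sketch: identical subtrees become one shared node.
class Node:
    def __init__(self, op, children=()):
        self.op = op
        self.children = tuple(children)

_table = {}

def intern(op, children=()):
    """Return the unique shared node for (op, children)."""
    key = (op, tuple(id(c) for c in children))  # children are already interned
    if key not in _table:
        _table[key] = Node(op, children)
    return _table[key]

# 1*10+1 == 2*10+1: the two sides differ, but 10 and the trailing 1 are shared
ten, one, two = intern(10), intern(1), intern(2)
lhs = intern('+', [intern('*', [one, ten]), one])
rhs = intern('+', [intern('*', [two, ten]), one])
assert lhs is not rhs                       # different trees stay distinct
assert lhs.children[1] is rhs.children[1]   # ...but the shared 1 is one node
```

Note how the differing leaves (1 vs 2) keep the parent chains above them distinct all the way up, which is exactly why sharing alone is not enough.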

Next you apply a bottom-up 1-level function finder. Basically, any node whose children are used more
than once in a similar-type node is made into a function application. So say:

	1*10+1==2*10+1

is really

	b*a+b==2*a+b
	a = 10
	b = 1

where "a =" indicates sharing.
This now becomes:

	g(f(b))==g(f(2))
	f = X*10
	g = X+b
	b = 1

where X is a magical placeholder node, and f and g are 2 levels of simple functions that are found.
Now, this so far has only increased the number of nodes, so the next step is to merge any function call node with a child function call node (f into g).

This is a pure gain if the occurrences of f are all exactly inside g. We get:

	g(b)==g(2)
	g = X*10+b
	b = 1
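The 1-level finder step can be sketched as grouping same-type parents that share a child in the same position; each group is a candidate for one function with placeholders for the differing children (Python; nodes as tuples, sharing = object identity, all names illustrative, and matches across different child positions ignored for simplicity):

```python
from collections import defaultdict

# Nodes are (op, child...) tuples; sharing means "the same tuple object".
def candidates(parents):
    """Group same-op parents that share a child in the same position."""
    groups = defaultdict(list)
    for p in parents:
        for i, child in enumerate(p[1:]):
            groups[(p[0], i, id(child))].append(p)
    return [g for g in groups.values() if len(g) > 1]

ten = ('num', 10)
m1 = ('*', ('num', 1), ten)   # 1*10
m2 = ('*', ('num', 2), ten)   # 2*10
# both '*' nodes share `ten` at position 1 -> one candidate function X*10
assert candidates([m1, m2]) == [[m1, m2]]
```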

Now assume I would type, by hand, some additional code:

	g(b)==g(2)&&3*10+1
	g = X*10+b
	b = 1

doing bottom up hashing test:

	g(b)==g(2)&&3*a+b
	g = X*a+b
	a = 10
	b = 1

finding functions:

	g(b)==g(2)&&f(3)+b
	g = f(X)+b
	f = X*10
	b = 1

	g(b)==g(2)&&h(f(3))
	g = h(f(X))
	h = X+b
	f = X*10
	b = 1

	g(b)==g(2)&&h(3)
	g = h(X)
	h = X*10+b
	b = 1

	h(b)==h(2)&&h(3)
	h = X*10+b
	b = 1

Which results in the same code after a while.
Now if instead the code would have been:

	g(b)==g(2)&&3*10+5
	g = X*10+b
	b = 1

We would get:

	g(b)==g(2)&&f(3)+5
	g = f(X)+b
	f = X*10
	b = 1

and it would end there, which is also fine.

Function call merging can also be done if there are more occurrences of f than there are of g, by simply moving f inside g first:

	g(f(b))==g(f(2))
	f = X*10
	g = X+b
	b = 1

becomes:

	g(b)==g(2)
	f = X*10
	g = f(X)+b
	b = 1

In fact, this operation can ALWAYS be done first, since if all f's then match g, the above becomes a simple case of inlining f because there is only one reference.

The case that remains is when there are fewer f's than g's, e.g.

	1*10+1==2*10+1&&3/10+1

becomes:

	g(f(b))==g(f(2))&&g(3/a)
	g = X+b
	f = X*a
	a = 10
	b = 1

Now, we can't inline f. If the combination of f & g occurs a LOT together, then compression (trying to achieve the lowest number of nodes) would say to
combine f & g anyway, and leave a separate copy for the 3rd use. This, of course, introduces redundancy, and you would start to need heuristics
to determine exactly the ratio between the number of occurrences, the size of duplicated code, etc. This makes things complicated, not least
because as edits are made, the balance may shift and the decisions made by these heuristics may need to be undone.

It is MUCH easier to stick to a purely non-redundant representation, as for those, the rules are very simple. It only has one big drawback:
a single use of new code somewhere (in this case +1) can instantly split up a widely used function into multiple parts, which may hurt
readability. I don't think it will bloat up the code that much though, as this often happens near the leaves.

Though, the solution may therefore be to use the non-redundant representation as the internal storage representation, since that's easy to
maintain and fully deterministic, and then use compression heuristics on top of it to show the code to the user. So the user can be shown g, and g', which has
f inlined. Basically, whenever displaying a function from a particular POV, if that POV does happen to have all f's inside, then it can
show f inlined, though maybe it needs to be graphically marked somehow.


so a code base can be maintained non-redundant by these simple ops, applied recursively to any modified/new code, in this order:

1. check a tree for duplicates using the hashtable, and if so, make it a shared node
2. if a node has a shared child, whose other occurrences have the same parent node type, make that node (and the other occurrences)
   into function calls to a shared function with placeholders for all non-corresponding children 
3. if any arg to a function call g is always the function call f, move the call to f inside g
4. unshare nodes whose refc drops to 1

That is really simple! 
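The maintenance loop these four ops imply can be sketched as a worklist run to a fixed point; the rule bodies below are hypothetical stand-ins, only the control flow is the point:

```python
# Worklist sketch: after an edit, push the modified nodes and retry the ops
# in order until none fires. Rules return a rewritten node or None.
def normalize(worklist, rules):
    fired = []
    while worklist:
        node = worklist.pop()
        for name, rule in rules:
            result = rule(node)
            if result is not None:      # a rule rewrote the node:
                fired.append(name)
                worklist.append(result) # recheck it from rule 1 again
                break
    return fired

# toy stand-ins for ops 1-4: only "share" fires, turning 'dup' into 'shared'
rules = [('share',    lambda n: 'shared' if n == 'dup' else None),
         ('abstract', lambda n: None),
         ('merge',    lambda n: None),
         ('unshare',  lambda n: None)]
assert normalize(['dup'], rules) == ['share']
```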

==

One extra issue is that in the above, a placeholder has a unique identity within a function. So X+1 can only be shared with another
X+1 if it's within the same function. This is pretty easy to do. It makes that shared node "local" to the function, but that doesn't
really matter at all for the algorithm.

the current notation can't differentiate between functions and shared vars with placeholders, so let's use a more verbose notation:
example:

    g(2) == g(3)
    g(X) = f+f      or f(X)+f(X)
    f = X*1         or f(X) = X*1

makes

    2*1+2*1 == 3*1+3*1
    
which normally would have produced:

    a+a == b+b
    a = f(2)
    b = f(3)
    f(X) = X*1

sounds like there needs to be a rule 2b:

2b. if a node has a shared child, and a second occurrence of that child appears inside another child, abstract out that parameter.

Which after a while produces the initial code above (the verbose version with no free vars).

So that means we do not really need to care about free vars. We can simply let the internal representation
be the verbose version above, and then display them non-verbose (as if f is a local function to g).
That only works if f's arg is directly a g placeholder of course.
Even if f is used in other places, it can be displayed as local depending on code focus.

Another example (with a shared item deeper down):

    1+1*3 == 2+2*3

becomes

    1+f(1) == 2+f(2)
    f = X*3

this SHOULD become

    g(1)==g(2)
    g = X+f(X)
    f = X*3

-> same as above: with 2b, this will produce the above.

What if the 2 occurrences of the shared child both are not direct children, like:

    g(1)==g(2)
    g = f(X)+h(X)
    f = X*3
    h = X*4

    1*3+1*4 == 2*3+2*4
       
    f(1)+h(1) == f(2)+h(2)
    f = X*3
    h = X*4

At this point, if it applied rule 2b, you'd get the intended effect.
How does it know to make g cover both the 1 and the 2 case? -> we also need to hash function definitions.

How high up does it search for a common parent?

It seems like 2b is getting us into heuristic territory, because it potentially has to pick from multiple
conflicting abstraction choices, but it seems that is no issue, e.g.:

    f(1,2)+h(1,2)
    g = f(X,2)+h(X,2)
    g = f(X,Y)+h(X,Y)

    f(1,h(1,2),2)
    g = f(X,h(X,2),2)
    g = f(X,h(X,Y),Y)

so in theory, 2b could bring things all the way up to the root
-> no it doesn't, it will first become an arg of g, at which point it COULD
be raised up again if whatever function holds the call to g contains another
reference to the same shared item. But often it won't. 

So 2b is purely a question of: are there 2 or more uses of the same shared item
inside 1 function body?

Do 2a and 2b ever clash? assuming f & h are used in unrelated places:

    f = 1+g(1,X)
    h = 1+i(1,X)

If I do 2a first:

    f = j(g(1,X))
    h = j(i(1,X))
    j = 1+X

If I do 2b first:

    f = Y+g(Y,X)
    h = Y+i(Y,X)

How to decide between the two? Neither one is redundant, technically.
In this case, 2b appears preferable, but I'm not sure that's always the case.
2b appears preferable even if 2a encapsulates all uses of 1; if it didn't, it would
be even less attractive. 2a could be more attractive if 1 appeared multiple times,
such as for:

    f = k(1,1,g(1,X))
    h = k(1,1,i(1,X))

Hmm, even there 2b would be at least as good.
However, the ultimate solution for both would be:

    f = j(g(X))
    h = j(i(X))
    j = 1+X(1)

i.e. if you had "currying". Currying is a natural fit for 2a, since it suffers from
the fact that it abstracts a parameter, yet it can't reach all occurrences of it.
Currying would have to be a specific feature since it requires free vars, so can't
be rewritten with regular functions.
That is a pretty high price to pay, even when 2b may STILL be better.
And this solution only worked because f and h were so analogous, if we had the better
example instead:

    f(3)+h
    f = 1+g(1,X)
    h = 1+2

If I do 2a first:

    f(3)+h
    f = j(g(1,X))
    h = j(2)
    j = 1+X

If I do 2b first:

    f(3,1)+h
    f = Y+g(Y,X)
    h = 1+2

Actually it's more like this, because we can't always inline f:

    f(3)+h
    f = j(X, 1)
    j = Y+g(Y,X)
    h = 1+2

Here, currying doesn't even work. Now 2a and 2b are more equal. In both cases, you lose
an opportunity for abstraction for the other.

interestingly, 2b produces single-refc functions, whereas 2a doesn't. So we can only inline
if args are all used once.

Even if this isn't resolved, it doesn't seem a big issue; however, it
seems that even at this level of compression, heuristics are inescapable. Maybe doing 2a
first is still a bit "simpler".

We could resolve the above of course if we made 2b such that if it can't find a way
to resolve the sharing locally, it DOES go all the way to the root. In this case:

    r(1)
    r = f(3,X)+h(X)
    f = j(g(Y,X),X)
    h = j(2,X)
    j = Y+X

But then you'd have to inline j:

    r(1)
    r = f(3,X)+h(X)
    f = X+g(Y,X)
    h = X+2

This DOES make all shared blocks go away! Only functions and function args remain this way.
Of course, this can produce enormous amounts of args further up the tree. But this can be resolved
by showing any arg that does not depend on variables (and single use functions) as a where-clause,
and omit any args that come directly from parent args and just have them as free vars.

this may be the ultimate solution.. it actually simplifies stuff. 

This could actually help with the problem that a number like 1 is shared throughout the entire program.
In fact, all common constants will be args to the root function!
But now at least you can do scoped changes of 1 to a different number, and only affect that subtree!

btw there needs to be a rule 5 about inlining:

5. inline single use functions, if all args are used once in the body, and inline many use functions that map directly to their body
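A sketch of the rule-5 precondition check (Python; tuple-encoded bodies, all names hypothetical; the "many-use function that maps directly to its body" case is omitted):

```python
# can_inline: a single-use callee whose placeholders each occur exactly once
# in the body can be inlined without duplicating work or code.
def count_var(node, name):
    if node[0] == 'var':
        return int(node[1] == name)
    if node[0] == 'num':
        return 0
    return sum(count_var(k, name) for k in node[1:])

def can_inline(body, params, uses):
    return uses == 1 and all(count_var(body, p) == 1 for p in params)

assert can_inline(('+', ('var', 'X'), ('num', 1)), ['X'], uses=1)
# X used twice in the body: inlining would duplicate the argument expression
assert not can_inline(('+', ('var', 'X'), ('var', 'X')), ['X'], uses=1)
```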


Multiple-arg functions: 2a produces them easily on n-ary functions where only 1 arg corresponds, and 2b produces them when it comes across free vars while creating new functions.


Must define hashing/comparison algo. X must be the same if we compare 2 functions, yet it must be different if we compare subtrees.
I guess the comparison simply walks down the tree, and notes what variables are "bound". Only bound variables return equality.

Can we define it such that X+Y and Y+X are deemed equivalent? I don't see why not.


====

The system above appears to want either normal order eval, or referential transparency.

The system for combining eager & normal order in the edit-based system from before could work, but is a bit clumsy,
since you don't want to give the user control over evaluation at the function level, but rather at the expression level.

So have {}, which can contain 0 or more exps. The whole thing is evaluated normal order, i.e. it is a block, so
it can freely be refactored by the system.

If a child of {} that has side effects gets shared, it automatically gets {} added.

A side-effecting exp that does not appear directly inside {}, but gets shared, is guaranteed to be evaluated exactly once.

so:

	write(read())

copy paste the read():

	write(a+a)
	a = read()

this means only 1 read. Change it to:

	write(a+a)
	a = { read() }

to have 2 reads.
Now, if it became:

	write(a+a)
	a = read()+1

	write(a(1)+a(2))
	a = read()+X

This would normally force multiple reads, so:

	write(a(1)+a(2))
	a = b+X
	b = read()

(side effect abstraction rule).
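The sharing semantics above can be mimicked with Python thunks (read is a toy stand-in that just counts its invocations):

```python
# A shared side effect runs once; a {}-braced expression re-runs per use.
reads = []
def read():
    reads.append(1)
    return 7

#   write(a+a), a = read()     -> one read, result reused
a = read()
assert a + a == 14 and len(reads) == 1

#   write(a+a), a = { read() } -> braces mean re-evaluate per use
reads.clear()
a = lambda: read()
assert a() + a() == 14 and len(reads) == 2
```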

Of course, the user has to be specific when he adds code like "+1", since:

	a = { read() }+1

still gets evaluated once, so the {} have to be kept at the top of shared exp:

	a = { read()+1 }

Now, if a gets refactored because all its contexts have a +2, you have to move
the {} to its new parent automatically to preserve meaning:

	b = { read()+1+2 }

But now if read() gets shared again, it needs to receive {} again? and b can lose
the curlies, though any shared exp that relies on a normal order shared exp, is
really automatically normal order itself! So this looks like it may be easier
to just keep track of on the level of shared nodes, like before.


- no recursion? how to do higher order functions/loops without vars?
- sharing of single numbers?



==

functional side effects by auto plumbing a "world" object?

Let's ignore the multiple return values read() will give us, and look at write():

    W' = write(W, X)

so this is guaranteed to write in sequence:

    main = write(write(W, 1), 2)
    
so to read two numbers in sequence, add them, then write them out:

    main = h(f(read(W)))
    f = g(read(first(X)), second(X))
    g = [first(X), second(X)+Y]
    h = write(first(X), second(X))

Which is terrible. Using PM sugar:

    main(W) = h(f(read(W)))
    f([W,Y]) = g(read(W),Y)
    g([W,Y],Z) = [W,Y+Z]
    h([W,Y]) = write(W,Y)
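A toy Python rendering of the W-threading (the world is just a log list here, and read always returns a fixed value; purely illustrative):

```python
# Every effect takes the world and returns a new one, so data dependence
# forces evaluation order.
def write(w, x):
    return w + [('write', x)]

def read(w):
    return w + [('read',)], 42   # [W', value]

w1 = write(write([], 1), 2)
assert w1 == [('write', 1), ('write', 2)]   # guaranteed in-sequence writes

# read two numbers in sequence, add them, write the sum
w2, x = read([])
w3, y = read(w2)
w4 = write(w3, x + y)
assert w4 == [('read',), ('read',), ('write', 84)]
```

The where-clause version further down is exactly this chain with the intermediate worlds named W', W'' instead of w2, w3.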

Functions like g hint at what monads are good for.
Using where clauses:

    main(W) = write(W',Y)
        where [W',Y] = g(read(W''),Y')
            where [W'',Y'] = read(W)
    g([W,Y],Z) = [W,Y+Z]

    main(W) = write(W',Y)
        where [W',Y] = [W''',Y''+Z]
                       where [W''',Y''] = read(W'')
                             Z = Y'
            where [W'',Y'] = read(W)

    main(W) = write(W',Y)
        where [W',Y] = [W''',Y''+Z]
            where [W''',Y''] = read(W'')
                  Z = Y'
                where [W'',Y'] = read(W)


    main(W) = write(W'',X+Y)
        where [W'',Y] = read(W')
              [W' ,X] = read(W)

Better, but still not great. And how to extend this with another read() that automatically gets linked into the chain
is not obvious. And even if that works, we don't want the code littered with endless copies of W.
EVERY function it goes through has to have these PMs; that's not acceptable. 

If you would somehow hide the W's...


===

Hmm.. yet another idea. What if you had a separate read operation, and a function for accessing the last read values?
If you combined that with {} forcing normal order of any side effecting related functions, you could have:

    f(read(), pop())
    f = { X; X; Y*Y }       -- 2 reads
    f = { X; dup; Y*Y }     -- 1 read
    f = { X; top()*pop() }  -- 1 read







TODO: how to do loops? maps & filter is easy, fold and general loops are not (see array programming for abstractionless.txt)

    f(W,[1..10])
    f = FOLD write(write(write(`W,X)," = "),fac(X))






=================================================

Side effects revisited:

The entire language is normal order by default.
Pure expressions will be evaluated eagerly whenever that doesn't break normal order semantics (they have no side-effecting subexpressions and they terminate).

A cell expression makes sure its child gets evaluated first time, and result reused (lazy evaluation).
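A cell can be sketched as a memoizing thunk (Python; pop is a toy stand-in that counts invocations):

```python
# Cell: evaluate the child on first use, reuse the result afterwards.
class Cell:
    def __init__(self, thunk):
        self.thunk, self.done, self.value = thunk, False, None
    def get(self):
        if not self.done:
            self.value, self.done = self.thunk(), True
        return self.value

pops = []
def pop():
    pops.append(1)
    return 5

c = Cell(pop)
assert c.get() + c.get() == 10 and len(pops) == 1   # < pop(S) >: one pop
pops.clear()
assert pop() + pop() == 10 and len(pops) == 2       # { pop(S) }: two pops
```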


to make it look more familiar, a side-effecting operation is always surrounded by {} to show it is delayed.
eager expressions are not, since you don't care.
Let's denote cells with <> for now

So, by default:

    f({ pop(S) })
    f(X) = X+X

pops twice.
if a refactoring adds +1 to pop, that doesn't change a thing, since {} are the default anyway, and only make sense on args. So
it becomes, without any particular work:

    f({ pop(S)+1 })

Now if the user wanted a cell, you'd get:

    f(< pop(S) >)

Cells also only make sense on args. Now if you'd get a refactoring that adds +1, you'd want the +1 inside the cell, since there is no reason
not to... though it really doesn't matter. If it was +sideeffect, then maybe you would not want that inside the cell?

    f(< pop(S) >)
    f(X) = (X+1)*X

    f(< pop(S) >)
    f(X) = (X+1)*(X+1)          // user edit
    
    f(< pop(S) >)
    f(X) = g(X)*g(X)            // 2a
    g(X) = X+1

    f(< pop(S) >)
    f(X) = a*a where a = g(X)   // 1
    g(X) = X+1

    f(< pop(S) >)
    f(X) = h(g(X))
    g(X) = X+1
    h(X) = X*X                  // 2b

    h({ g(< pop(S) >) })        // 5, when inlining, do we want to move <> to the outside?
    g(X) = X+1
    h(X) = X*X

    h({ < pop(S) >+1 })         // 5
    h(X) = X*X

probably not, since if it had been:

    f(< pop(S) >, { pop(T) })   // pops once, pops twice
    f(X,Y) = h(g(X), Y)
    g(X) = X+1
    h(X, Y) = X*X+Y*Y

    h({ g(< pop(S) >) }, { pop(T) })    // 5
    g(X) = X+1
    h(X, Y) = X*X+Y*Y

not a good example, since we can deal with it per arg. What about:

    f(< pop(S) >,T)
    f(X,Y) = h(g(X,Y))
    g(X,Y) = X+pop(Y)               // executed twice
    h(X) = X*X

This already shows normal order may be confusing, as it's hard to tell when something will be executed.
{} needs to be percolated up, e.g.:

    f(< pop(S) >,T)
    f(X,Y) = h({ g(X,Y) })
    g(X,Y) = X+pop(Y)               // executed twice
    h(X) = X*X

Which at least warns the user when he follows the sequence of calls. But lets assume we WANT it to be executed twice:

    h({ g(< pop(S) >,T) })
    g(X,Y) = X+pop(Y)               // executed twice
    h(X) = X*X

This now has the correct behavior. Clearly, <> can't be moved outwards at this point, as it would cause the deeper pop to run just once.

    h({ < pop(S) >+pop(T) })
    h(X) = X*X

So <> has to be possible on any tree (that contains a side effect).

Now the original issue, of having

    h({ < pop(S)+1 > })

if 1 is abstracted out, you'd get

    h({ k(S,1) })
    k(S,X) = < pop(S)+X >

But what does that even mean? In pure normal order, it would be expanded out first, so there'd be two copies.

normal order for return values????


What about instead, for each *node*, by default:
- pure -> eager
- impure, no retval -> normal
- impure, retval -> lazy
The user can switch these on a node, and the functional abstraction merely shows these graphically

In this case, return values drive the evaluation, so in a sense anything directly under a return value
is "eager" (i.e., if it gets to evaluating the function, there are no more delays).

    f({ g({ pop()+1 }) })

if pop() is lazy, then +1 must also be lazy for the pop to keep being done in the right order if it was
abstracted out. So laziness percolates up, IF it sits inside an arg (if it sits directly under a retval,
it's irrelevant). g must now also be lazy. So f drives the evaluation, towards g, then towards the pop().

The many {} may look confusing, but actually, in MANY cases they can be left out.
- if there's only one side effect happening in g in anyway
- if the other side effect inside g happens after pop anyway
- even if there are multiple side effects, even as args, if they are in the right order (args should be in order
  of evaluation anyway... even if this is not always detectable for sure with laziness, it is in many cases).
  -> NOTE: system must have auto arg ordering system.
In essence, this is what + is doing already: it should be { pop() }+1, and then be lazy itself, but omits the {}.
The only time we need {} is when the body may do side effects before its last side-effecting arg placeholder (!)
So {} purely become an evaluation order warning! wow. (and they still have consequences for implementation efficiency too, but that's almost beside the point).
And this system is purely compositional, i.e. whether g needs {} or not has no effect on whether its callers need them.

The original problem we had of pop()+1 becoming a function (pop()+X) isn't really an issue at all here.
In the current system, sharing pop()+1 means it becomes an arg. This should probably be shown as a local def
whenever there is a scope. If the 1 is then changed in this scope, and it becomes a function call, the uses of
that function will be lazy too, so the number of evaluations stays 1:

g(X) = pop()+X
f(X) = g(1)*g(2)

Hmm, but the uses of g are not shared? < g(1) > * < g(2) > doesn't make much sense either?

Also, this means most lazy functions by default just have a single invocation across the entire
program. We can make em normal order, but this doesn't allow sharing with only particular uses?

Sounds like we need a way to tie together syntactically just a few uses...
Btw this is an issue for data too: we cannot identify a stack purely by its constructor if we want
to use 2 of them throughout... never mind numbers.

We can't do this scoped either, as uses may be mixed: in pop()*pop()+pop()/pop() the first and the
last occurrence may be required to be the same, as are the middle two.

This whole issue was easier in the editing-based system, as there, identity was formed by sharing
indicated by the user.

So maybe default for unpure retval should be normal order too with sharing indicated by the user???

This however would make the sharing non-syntactical. Either the side effecting function itself, or
some wrapper bracing (say []) would have to invisibly store an id. By default either all ids can
be the same, or generated different, with operations to set an id to be the same.
But that really gets rather ugly, a syntactical solution would be better.

To some extent the aforementioned stack idea can work:

    pop(); pop(); second()*first()+first()/second()

can make it such that no popping ever is required, the stack automatically only keeps the last N,
which makes dups etc unnecessary. 

if first and second are parameterized, then they'll need {}, as they are side effects!

Though still, first & second are now pseudo variable names.

Also, it doesn't really solve the identity problem for global data structures, as stacks will be too
hard to dig through over larger code spans.

To solve that, we could adopt some form of tuple spaces. That way, you can "cheat" to have global
"variables" by using data structures instead. But it is not so much cheating, as they are really
data structures, and serve other purposes too (there can be many copies), and can help with
concurrency too.

Traditionally a tuple space is a queue, which is still needed for all things where order is important.
But for other things (like side effects), stack-like behavior can be more appropriate. So maybe 
you can simply have 2 different out operations (or 2 ins... no, outs probably better, it's the producer
that has more insight into usage than the consumer).

A tuple space can simply duplicate a stack too when stuff that gets put in is not tagged with a tuple
at all. So pop() can put an item at the top of the tuple space, at which point functions like first/second
can be used again, but now you'd need read vs consume versions of each.

So it can be seen simply as a stack, where you can read/consume by type, and you can push at the bottom too.

Maybe a way where for each type you can set the max number of elements? You can set them to 1, which would
be like a singleton, and make "overwriting" easier by not first having to pop the old value. Or set it to
10 or so for things like read side effects, or unlimited where that makes sense.
Builtin types could all have limits, so if you want to preserve a value for longer, simply put it back in
wrapped in a custom tuple.
 
    first(T)    // read from top-0
    second(T)   // read from top-1
    third(T)    // read from top-2
    
    pop(T)      // consume top
    
    push(V)     // create on top
    queue(V)    // create on bottom

The last 2 are the ONLY "void" functions needed in the entire language? We can of course make
them functions too by making them:

    push(V,E)   // create on top, then eval E
    queue(V,E)  // create on bottom, then eval E

leaving no further use for {}. The only point to still have "statements" would be as syntactic sugar for a
whole bunch of them strung together. Maybe instead can have a V -> E syntax that naturally lends itself
to chaining... a "pushthen" operator.
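A toy model of this read/consume stack (Python, deque-based; type-based lookup and per-type limits omitted, operation names taken from the list above):

```python
from collections import deque

class Space:
    """Stack-like tuple space: read by depth, consume the top, create at
    either end. The left end of the deque is the top."""
    def __init__(self):
        self.items = deque()
    def first(self):    return self.items[0]         # read top-0
    def second(self):   return self.items[1]         # read top-1
    def third(self):    return self.items[2]         # read top-2
    def pop(self):      return self.items.popleft()  # consume top
    def push(self, v):  self.items.appendleft(v)     # create on top
    def queue(self, v): self.items.append(v)         # create on bottom

t = Space()
t.push(2); t.push(1); t.queue(3)                     # top->bottom: 1, 2, 3
assert (t.first(), t.second(), t.third()) == (1, 2, 3)
assert t.pop() == 1 and t.first() == 2
```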

Could stacks help with the loop problem too? Yes it would seem so: you can simply do your major iterative thing
with a fold/map/filter, and any remaining items could be placed on the stack. It wouldn't be very pretty, and it
would be best to make as much as possible work with fold/map/filter, but at least it makes it _possible_, which wasn't
the case before. Can we do tree recursion?

    qsort:
    
    rec if(L==nil) nil
        else recurse(filter T<=P) ++ [P] ++ recurse(filter T>P)
            where P = head L
                  T = tail L

Does not appear to be a problem at all. Would not work with mutual recursion, but that's not totally important now.
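The qsort above, transcribed directly into Python with filters (head/tail spelled out):

```python
def qsort(l):
    if not l:                                  # if(L==nil) nil
        return []
    p, t = l[0], l[1:]                         # P = head L, T = tail L
    return (qsort([x for x in t if x <= p])    # recurse(filter T<=P)
            + [p]                              # ++ [P] ++
            + qsort([x for x in t if x > p]))  # recurse(filter T>P)

assert qsort([3, 1, 4, 1, 5]) == [1, 1, 3, 4, 5]
```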

pattern matching? seems selectors would fit this paradigm much better.

Hmm.. not sure if limiting the stack is such a good idea. It is not just about being able to nest things in
terms of having lots on there, but also nesting in terms of keeping state local. If functions always
leave their shit on the stack, then it will be harder working with them, since an implementation change
could leave more on the stack, and thus change the situation for the caller. Though, since this feature
is only used for side effects and not general computation, it could be ok. Then again, you may need temp
values for recursion, but you can always use custom data types for that.

operators for all of this?

V -> E    // pushthen
V => E    // queuethen

T?      // first
T??     // second



the vast majority of OS calls will become data structures... is that ok? In fact, there will be very few
pure "builtin functions"; these can all be given an operator syntax, so the traditional function
syntax can be reserved for constructors? We'd have to define all keyworded operators at the same
precedence level to avoid the usual confusion




=============
auto naming system

Ideally, like everything else, placeholder names should NOT be tied to a function, but depend on nodes.
First of all, we can probably do a pretty good job with auto names. We can collect information about
a placeholder from two sources: its set of callers, and its use in the body. This often results at
least in a type which can be part of the name. If we have records/tuples, those can have names which
help placeholders too. Or any nodes can be assigned names which percolate up to placeholders that may
use them. Can make a good study of what makes a name in human code to find good tricks. Would be
great if the naming system became so self sufficient that we don't need manual placeholder naming.
Manual naming is painful even in normal languages.

