Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect result in compiler mode? #2189

Closed
bertram-gil opened this issue Feb 15, 2022 · 21 comments
Closed

Incorrect result in compiler mode? #2189

bertram-gil opened this issue Feb 15, 2022 · 21 comments
Labels
bug - identified Bugs with an identified cause

Comments

@bertram-gil
Copy link

Hi guys,

Consider the following program:

.decl graph(from:number, to:number)
.decl eql(a: number, b: number)
.decl rel(a: number, b: number) btree_delete
.decl C(x:number) btree_delete
.decl D(x:number, y:number)
.decl E(x:number, y:number) btree_delete
.decl a__(a:number, b:number) 

.output a__ // Output node

graph(1, 2).
graph(1, 3).
graph(3, 4).
graph(2, 5).
graph(4, 6).
graph(5, 6).
graph(6, 7).

eql(x,x) :- eql(x,_); eql(_,x).
eql(x,y) :- eql(y,x).
eql(x,z) :- eql(x,y), eql(y,z).
rel(1, 2).
rel(1, 3).
eql(x, x) :- rel(x, _).
eql(2, 3).
rel(a, b) <= rel(c, d) :- eql(a, c), eql(b, d), a <= c, b <= d.

C(1).
D(1, 1).
C(x1) <= C(x2) :- D(x1, x2), x1 <= x2.

E(1,1).
E(1,2).
E(_, x1) <= E(_, x2) :- x1 <= x2.

a__(b, c) :- graph(g, b), !rel(b, d), C(c), E(d, a).

When I run the above program in compiler mode, I get the following result for the relation a__:

$ souffle_59260/src/souffle -w -c orig_rules.dl && cat a__.csv
2	1
3	1
4	1
5	1
6	1
7	1

However, if I add a subgoal E(d, e) in rule a__ to get the a__(b, c) :- graph(g, b), !rel(b, d), C(c), E(d, a), E(d, e)., the result of a__ should not change. But running the following program computes an empty result for a__.


.decl graph(from:number, to:number)
.decl eql(a: number, b: number)
.decl rel(a: number, b: number) btree_delete
.decl C(x:number) btree_delete
.decl D(x:number, y:number)
.decl E(x:number, y:number) btree_delete
.decl a__(a:number, b:number) 

.output a__ // Output node

graph(1, 2).
graph(1, 3).
graph(3, 4).
graph(2, 5).
graph(4, 6).
graph(5, 6).
graph(6, 7).

eql(x,x) :- eql(x,_); eql(_,x).
eql(x,y) :- eql(y,x).
eql(x,z) :- eql(x,y), eql(y,z).
rel(1, 2).
rel(1, 3).
eql(x, x) :- rel(x, _).
eql(2, 3).
rel(a, b) <= rel(c, d) :- eql(a, c), eql(b, d), a <= c, b <= d.

C(1).
D(1, 1).
C(x1) <= C(x2) :- D(x1, x2), x1 <= x2.

E(1,1).
E(1,2).
E(_, x1) <= E(_, x2) :- x1 <= x2.

a__(b, c) :- graph(g, b), !rel(b, d), C(c), E(d, a), E(d, e).
$ souffle_59260/src/souffle -w -c orig_rules.dl && cat a__.csv

FYI, Running both programs in interpreter mode returns:

2	1
3	1
4	1
5	1
6	1
7	1

I am using the latest commit: 592605c

@bertram-gil
Copy link
Author

Hi guys, is this a confirmed bug or am i doing something wrong?

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

Is this happening for the interpreter as well?

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

Ok - I can reproduce the problem. Let me study it.

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

Here is the output of the compiler with all its intermediate relations (NB: I renamed a__ to A):

---------------
D
x	y
===============
1	1
===============
---------------
C
x
===============
1
===============
---------------
E
x	y
===============
1	2
===============
---------------
eql
a	b
===============
1	1
2	2
2	3
3	2
3	3
===============
---------------
rel
a	b
===============
1	3
===============
---------------
A
a	b
===============
===============

and this is the output of the interpreter:

---------------
D
x	y
===============
1	1
===============
---------------
C
x
===============
1
===============
---------------
E
x	y
===============
1	2
===============
---------------
eql
a	b
===============
1	1
2	2
2	3
3	2
3	3
===============
---------------
rel
a	b
===============
1	3
===============
---------------
A
a	b
===============
2	1
3	1
4	1
5	1
6	1
7	1
===============

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

I believe I have found the culprit. We have an UNDEF in the last RAM Query:

QUERY
    IF (((NOT ISEMPTY(C)) AND (NOT ISEMPTY(E))) AND (NOT ISEMPTY(graph)))
     FOR t0 IN graph
      FOR t1 IN C
       FOR t2 IN E
        IF ((t2.0,UNDEF) IN E AND (NOT (t0.1,t2.0) IN rel))
         INSERT (t0.1, t1.0) INTO A
   END QUERY

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

If you rewrite the query to:

A(b, c) :- graph(_, b), !rel(b, d), C(c), E(d, _), E(d, _).

so that you don't have warnings while compiling it works.

The query code is translated as follows:

 QUERY
    IF (((NOT ISEMPTY(E)) AND (NOT ISEMPTY(graph))) AND (NOT ISEMPTY(C)))
     FOR t0 IN graph
      FOR t1 IN C
       IF EXISTS t2 IN E WHERE (NOT (t0.1,t2.0) IN rel)
        INSERT (t0.1, t1.0) INTO A
   END QUERY

@b-scholz b-scholz added the bug - identified Bugs with an identified cause label Mar 4, 2022
@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

The query changes depending on whether unnamed variables are used or not. The first version (with named names) produces

A(b,c) :- 
   graph(_,b),
   !rel(b,d),
   C(c),
   E(d,_),
   E(d,_).

The second version produces (with unnamed variables)

A(b,c) :- 
   graph(_,b),
   !rel(b,d),
   C(c),
   E(d,_).

when running the optimisation.

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

After more investigation, the range-check is failing in the C++ program for the condition ((t2.0,UNDEF) IN E.

The btree_delete data structure fails to deliver a non-empty range here for the check:

range<t_ind_1::iterator> lowerUpperRange_10(
            const t_tuple& lower, const t_tuple& upper, context& h) const {
        t_comparator_1 comparator;
        int cmp = comparator(lower, upper);
        if (cmp > 0) {
            return make_range(ind_1.end(), ind_1.end());
        }
        return make_range(
                ind_1.lower_bound(lower, h.hints_1_lower), ind_1.upper_bound(upper, h.hints_1_upper));
    }

for the query

!rel_4_E->lowerUpperRange_10(
                                                     Tuple<RamDomain, 2>{{ramBitCast(env2[0]),
                                                             ramBitCast<RamDomain>(MIN_RAM_SIGNED)}},
                                                     Tuple<RamDomain, 2>{{ramBitCast(env2[0]),
                                                             ramBitCast<RamDomain>(MAX_RAM_SIGNED)}},
                                                     READ_OP_CONTEXT(rel_4_E_op_ctxt))

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

Both lower/upper iterators return ind_1.end() although it should enumerate the only tuple in E.

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

The range search fails with a single tuple (1,2) in relation E.

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

The bounds are set correctly:

l:1 -2147483648
u:1 2147483647

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

The second index ind_1 is empty, and the tuple (1,2) is missing from there but index ind_0 is populated with tuple (1,2).

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

It seems that the subsumptive clause

E(_, x1) <= E(_, x2) :- x1 <= x2.

leaves the data structure in an inconsistent state.

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

Deleting the tuple (1,1) from the data structure destroys the index ind_1 (makes it empty).

@b-scholz
Copy link
Member

b-scholz commented Mar 4, 2022

Here is a small C++ program that shows that erase() is erroneous:

#include "souffle/CompiledSouffle.h"

using namespace souffle;

int main()
{
    using t_tuple = Tuple<RamDomain, 2>;
    struct t_comparator {
        int operator()(const t_tuple& a, const t_tuple& b) const {
            return (ramBitCast<RamSigned>(a[0]) < ramBitCast<RamSigned>(b[0]))   ? -1
                   : (ramBitCast<RamSigned>(a[0]) > ramBitCast<RamSigned>(b[0])) ? 1

 : (0);
        }
        bool less(const t_tuple& a, const t_tuple& b) const {
            return (ramBitCast<RamSigned>(a[0]) < ramBitCast<RamSigned>(b[0]));
        }
        bool equal(const t_tuple& a, const t_tuple& b) const {
            return (ramBitCast<RamSigned>(a[0]) == ramBitCast<RamSigned>(b[0]));
        }
    };
    using t_ind = btree_delete_multiset<t_tuple, t_comparator>;
    t_ind idx;

    t_tuple t1 = {1,1};
    idx.insert(t1);

    t_tuple t2 = {1,2};
    idx.insert(t2);

    std::cout << "Before erase:" << std::endl;
    for(const auto & t: idx) {
        std::cout << t[0] << " " << t[1] << std::endl;
    }

    idx.erase(t1);

    std::cout << "After erase:" << std::endl;
    for(const auto & t: idx) {
        std::cout << t[0] << " " << t[1] << std::endl;
    }

    return 0;
}

The example shows that after deleting tuple (1,1) the relation is empty instead of containing the tuple (1,2) only.

Here is the output:

Before erase:
1 1
1 2
After erase:

@julienhenry
Copy link
Member

julienhenry commented Mar 7, 2022

@b-scholz The reason why the erase operation deletes the two tuples is because t1 equals t2 given the provided t_comparator. In that case, the call to idx.erase(t1) erases all the tuples t such that t_comparator::equal(t1, t)

I'm not sure what should be the fix.

@b-scholz
Copy link
Member

b-scholz commented Mar 7, 2022

@julienhenry - that is a great observation. I did not look carefully. We have a partial comparator but we need a full one for erase.

@b-scholz
Copy link
Member

b-scholz commented Mar 7, 2022

If we have a btree_delete relation, we need to instruct the index analysis that all attributes need to be included even for secondary indices (the first one is always a complete index).

Interestingly, this works for the interpreter because we have always a complete order in the interpreter.

@b-scholz
Copy link
Member

b-scholz commented Mar 7, 2022

We do something similar for provenance:
// for provenance, all indices must be full so we use btree_set

@b-scholz
Copy link
Member

b-scholz commented Mar 9, 2022

@bertram-gil: It would be great to generate smaller test cases in the future. For example,

.decl G(from:number, to:number)
G(1, 1).

.decl R(a: number, b: number)
R(2,2).

.decl E(x:number, y:number) btree_delete
E(1,1).
E(1,2).
E(_, x1) <= E(_, x2) :- x1 <= x2.

.decl A(a:number)
A(b) :- G(g, b), !R(b, d), E(d, a), E(d, e).
.output A 

exhibits the same behaviour as your example.

@b-scholz
Copy link
Member

b-scholz commented Mar 9, 2022

Perhaps you can use notions of delta debugging for Datalog.

@b-scholz b-scholz closed this as completed Mar 9, 2022
SamArch27 added a commit that referenced this issue Mar 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug - identified Bugs with an identified cause
Projects
None yet
Development

No branches or pull requests

3 participants