Incorrect result in compiler mode? #2189

bertram-gil · 2022-02-15T12:52:06Z

Hi guys,

Consider the following program:

.decl graph(from:number, to:number)
.decl eql(a: number, b: number)
.decl rel(a: number, b: number) btree_delete
.decl C(x:number) btree_delete
.decl D(x:number, y:number)
.decl E(x:number, y:number) btree_delete
.decl a__(a:number, b:number) 

.output a__ // Output node

graph(1, 2).
graph(1, 3).
graph(3, 4).
graph(2, 5).
graph(4, 6).
graph(5, 6).
graph(6, 7).

eql(x,x) :- eql(x,_); eql(_,x).
eql(x,y) :- eql(y,x).
eql(x,z) :- eql(x,y), eql(y,z).
rel(1, 2).
rel(1, 3).
eql(x, x) :- rel(x, _).
eql(2, 3).
rel(a, b) <= rel(c, d) :- eql(a, c), eql(b, d), a <= c, b <= d.

C(1).
D(1, 1).
C(x1) <= C(x2) :- D(x1, x2), x1 <= x2.

E(1,1).
E(1,2).
E(_, x1) <= E(_, x2) :- x1 <= x2.

a__(b, c) :- graph(g, b), !rel(b, d), C(c), E(d, a).

When I run the above program in compiler mode, I get the following result for the relation a__:

$ souffle_59260/src/souffle -w -c orig_rules.dl && cat a__.csv
2	1
3	1
4	1
5	1
6	1
7	1

However, if I add a subgoal E(d, e) in rule a__ to get the a__(b, c) :- graph(g, b), !rel(b, d), C(c), E(d, a), E(d, e)., the result of a__ should not change. But running the following program computes an empty result for a__.


.decl graph(from:number, to:number)
.decl eql(a: number, b: number)
.decl rel(a: number, b: number) btree_delete
.decl C(x:number) btree_delete
.decl D(x:number, y:number)
.decl E(x:number, y:number) btree_delete
.decl a__(a:number, b:number) 

.output a__ // Output node

graph(1, 2).
graph(1, 3).
graph(3, 4).
graph(2, 5).
graph(4, 6).
graph(5, 6).
graph(6, 7).

eql(x,x) :- eql(x,_); eql(_,x).
eql(x,y) :- eql(y,x).
eql(x,z) :- eql(x,y), eql(y,z).
rel(1, 2).
rel(1, 3).
eql(x, x) :- rel(x, _).
eql(2, 3).
rel(a, b) <= rel(c, d) :- eql(a, c), eql(b, d), a <= c, b <= d.

C(1).
D(1, 1).
C(x1) <= C(x2) :- D(x1, x2), x1 <= x2.

E(1,1).
E(1,2).
E(_, x1) <= E(_, x2) :- x1 <= x2.

a__(b, c) :- graph(g, b), !rel(b, d), C(c), E(d, a), E(d, e).

$ souffle_59260/src/souffle -w -c orig_rules.dl && cat a__.csv

FYI, Running both programs in interpreter mode returns:

I am using the latest commit: 592605c

The text was updated successfully, but these errors were encountered:

bertram-gil · 2022-03-02T13:59:03Z

Hi guys, is this a confirmed bug or am i doing something wrong?

b-scholz · 2022-03-04T03:34:41Z

Is this happening for the interpreter as well?

b-scholz · 2022-03-04T03:37:52Z

Ok - I can reproduce the problem. Let me study it.

b-scholz · 2022-03-04T03:41:45Z

Here is the output of the compiler with all its intermediate relations (NB: I renamed a__ to A):

---------------
D
x	y
===============
1	1
===============
---------------
C
x
===============
1
===============
---------------
E
x	y
===============
1	2
===============
---------------
eql
a	b
===============
1	1
2	2
2	3
3	2
3	3
===============
---------------
rel
a	b
===============
1	3
===============
---------------
A
a	b
===============
===============

and this is the output of the interpreter:

---------------
D
x	y
===============
1	1
===============
---------------
C
x
===============
1
===============
---------------
E
x	y
===============
1	2
===============
---------------
eql
a	b
===============
1	1
2	2
2	3
3	2
3	3
===============
---------------
rel
a	b
===============
1	3
===============
---------------
A
a	b
===============
2	1
3	1
4	1
5	1
6	1
7	1
===============

b-scholz · 2022-03-04T03:46:07Z

I believe I have found the culprit. We have an UNDEF in the last RAM Query:

QUERY
    IF (((NOT ISEMPTY(C)) AND (NOT ISEMPTY(E))) AND (NOT ISEMPTY(graph)))
     FOR t0 IN graph
      FOR t1 IN C
       FOR t2 IN E
        IF ((t2.0,UNDEF) IN E AND (NOT (t0.1,t2.0) IN rel))
         INSERT (t0.1, t1.0) INTO A
   END QUERY

b-scholz · 2022-03-04T03:51:50Z

If you rewrite the query to:

A(b, c) :- graph(_, b), !rel(b, d), C(c), E(d, _), E(d, _).

so that you don't have warnings while compiling it works.

The query code is translated as follows:

 QUERY
    IF (((NOT ISEMPTY(E)) AND (NOT ISEMPTY(graph))) AND (NOT ISEMPTY(C)))
     FOR t0 IN graph
      FOR t1 IN C
       IF EXISTS t2 IN E WHERE (NOT (t0.1,t2.0) IN rel)
        INSERT (t0.1, t1.0) INTO A
   END QUERY

b-scholz · 2022-03-04T08:02:16Z

The query changes depending on whether unnamed variables are used or not. The first version (with named names) produces

A(b,c) :- 
   graph(_,b),
   !rel(b,d),
   C(c),
   E(d,_),
   E(d,_).

The second version produces (with unnamed variables)

A(b,c) :- 
   graph(_,b),
   !rel(b,d),
   C(c),
   E(d,_).

when running the optimisation.

b-scholz · 2022-03-04T09:36:53Z

After more investigation, the range-check is failing in the C++ program for the condition ((t2.0,UNDEF) IN E.

The btree_delete data structure fails to deliver a non-empty range here for the check:

range<t_ind_1::iterator> lowerUpperRange_10(
            const t_tuple& lower, const t_tuple& upper, context& h) const {
        t_comparator_1 comparator;
        int cmp = comparator(lower, upper);
        if (cmp > 0) {
            return make_range(ind_1.end(), ind_1.end());
        }
        return make_range(
                ind_1.lower_bound(lower, h.hints_1_lower), ind_1.upper_bound(upper, h.hints_1_upper));
    }

for the query

!rel_4_E->lowerUpperRange_10(
                                                     Tuple<RamDomain, 2>{{ramBitCast(env2[0]),
                                                             ramBitCast<RamDomain>(MIN_RAM_SIGNED)}},
                                                     Tuple<RamDomain, 2>{{ramBitCast(env2[0]),
                                                             ramBitCast<RamDomain>(MAX_RAM_SIGNED)}},
                                                     READ_OP_CONTEXT(rel_4_E_op_ctxt))

b-scholz · 2022-03-04T09:57:26Z

Both lower/upper iterators return ind_1.end() although it should enumerate the only tuple in E.

b-scholz · 2022-03-04T10:06:35Z

The range search fails with a single tuple (1,2) in relation E.

b-scholz · 2022-03-04T10:07:02Z

The bounds are set correctly:

l:1 -2147483648
u:1 2147483647

b-scholz · 2022-03-04T10:20:27Z

The second index ind_1 is empty, and the tuple (1,2) is missing from there but index ind_0 is populated with tuple (1,2).

b-scholz · 2022-03-04T10:37:51Z

It seems that the subsumptive clause

E(_, x1) <= E(_, x2) :- x1 <= x2.

leaves the data structure in an inconsistent state.

b-scholz · 2022-03-04T10:42:10Z

Deleting the tuple (1,1) from the data structure destroys the index ind_1 (makes it empty).

b-scholz · 2022-03-04T20:17:45Z

Here is a small C++ program that shows that erase() is erroneous:

#include "souffle/CompiledSouffle.h"

using namespace souffle;

int main()
{
    using t_tuple = Tuple<RamDomain, 2>;
    struct t_comparator {
        int operator()(const t_tuple& a, const t_tuple& b) const {
            return (ramBitCast<RamSigned>(a[0]) < ramBitCast<RamSigned>(b[0]))   ? -1
                   : (ramBitCast<RamSigned>(a[0]) > ramBitCast<RamSigned>(b[0])) ? 1

 : (0);
        }
        bool less(const t_tuple& a, const t_tuple& b) const {
            return (ramBitCast<RamSigned>(a[0]) < ramBitCast<RamSigned>(b[0]));
        }
        bool equal(const t_tuple& a, const t_tuple& b) const {
            return (ramBitCast<RamSigned>(a[0]) == ramBitCast<RamSigned>(b[0]));
        }
    };
    using t_ind = btree_delete_multiset<t_tuple, t_comparator>;
    t_ind idx;

    t_tuple t1 = {1,1};
    idx.insert(t1);

    t_tuple t2 = {1,2};
    idx.insert(t2);

    std::cout << "Before erase:" << std::endl;
    for(const auto & t: idx) {
        std::cout << t[0] << " " << t[1] << std::endl;
    }

    idx.erase(t1);

    std::cout << "After erase:" << std::endl;
    for(const auto & t: idx) {
        std::cout << t[0] << " " << t[1] << std::endl;
    }

    return 0;
}

The example shows that after deleting tuple (1,1) the relation is empty instead of containing the tuple (1,2) only.

Here is the output:

Before erase:
1 1
1 2
After erase:

julienhenry · 2022-03-07T11:12:16Z

@b-scholz The reason why the erase operation deletes the two tuples is because t1 equals t2 given the provided t_comparator. In that case, the call to idx.erase(t1) erases all the tuples t such that t_comparator::equal(t1, t)

I'm not sure what should be the fix.

b-scholz · 2022-03-07T11:31:33Z

@julienhenry - that is a great observation. I did not look carefully. We have a partial comparator but we need a full one for erase.

b-scholz · 2022-03-07T11:35:43Z

If we have a btree_delete relation, we need to instruct the index analysis that all attributes need to be included even for secondary indices (the first one is always a complete index).

Interestingly, this works for the interpreter because we have always a complete order in the interpreter.

b-scholz · 2022-03-07T11:42:54Z

We do something similar for provenance:
// for provenance, all indices must be full so we use btree_set

b-scholz · 2022-03-09T09:04:55Z

@bertram-gil: It would be great to generate smaller test cases in the future. For example,

.decl G(from:number, to:number)
G(1, 1).

.decl R(a: number, b: number)
R(2,2).

.decl E(x:number, y:number) btree_delete
E(1,1).
E(1,2).
E(_, x1) <= E(_, x2) :- x1 <= x2.

.decl A(a:number)
A(b) :- G(g, b), !R(b, d), E(d, a), E(d, e).
.output A

exhibits the same behaviour as your example.

b-scholz · 2022-03-09T09:05:35Z

Perhaps you can use notions of delta debugging for Datalog.

Subsumption fix for issue #2189

b-scholz added the bug - identified Bugs with an identified cause label Mar 4, 2022

b-scholz mentioned this issue Mar 9, 2022

Subsumption fix for issue #2189 #2204

Merged

b-scholz closed this as completed Mar 9, 2022

SamArch27 added a commit that referenced this issue Mar 10, 2022

Merge pull request #2204 from b-scholz/subsumption_fix

fce85d9

Subsumption fix for issue #2189

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect result in compiler mode? #2189

Incorrect result in compiler mode? #2189

bertram-gil commented Feb 15, 2022

bertram-gil commented Mar 2, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022 •

edited

Loading

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022 •

edited

Loading

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022 •

edited

Loading

julienhenry commented Mar 7, 2022 •

edited

Loading

b-scholz commented Mar 7, 2022

b-scholz commented Mar 7, 2022

b-scholz commented Mar 7, 2022

b-scholz commented Mar 9, 2022

b-scholz commented Mar 9, 2022

Incorrect result in compiler mode? #2189

Incorrect result in compiler mode? #2189

Comments

bertram-gil commented Feb 15, 2022

bertram-gil commented Mar 2, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022 • edited Loading

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022 • edited Loading

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022

b-scholz commented Mar 4, 2022 • edited Loading

julienhenry commented Mar 7, 2022 • edited Loading

b-scholz commented Mar 7, 2022

b-scholz commented Mar 7, 2022

b-scholz commented Mar 7, 2022

b-scholz commented Mar 9, 2022

b-scholz commented Mar 9, 2022

b-scholz commented Mar 4, 2022 •

edited

Loading

b-scholz commented Mar 4, 2022 •

edited

Loading

b-scholz commented Mar 4, 2022 •

edited

Loading

julienhenry commented Mar 7, 2022 •

edited

Loading