[Slony1-general] size of requests stored in sl_log_x

Mon Sep 17 01:46:29 PDT 2007


Jan Wieck wrote:
> On 9/16/2007 10:06 AM, Cyril SCETBON wrote:
>>
>> Jan Wieck wrote:
>>> On 9/16/2007 4:27 AM, Cyril SCETBON wrote:
>>>>
>>>> Filip Rembiałkowski wrote:
>>>>> Guys, this discussion went quite offtopic, but maybe you missed one
>>>>> fact that Cyril may no know about.
>>>>>
>>>>> In postgres, setting N columns to NULL is just N bits of physical 
>>>>> writes.
>>>>>
>>>>> So the overhead of
>>>>>
>>>>> INSERT INTO t1 ( id, data1, data2, data3, ..., data100 )
>>>>> VALUES( 12345, 'the only non-null data', NULL,NULL, ..., NULL )
>>>>>
>>>>> is not so terrible.
>>>>>   
>>>> Thanks filip, but I didn't talk about the performance when applying 
>>>> this request, but the fact that storing a longer request than a 
>>>> simple insert into t1(col1,col2) values(valcol1,valcol2) causes 
>>>> slony tables to grow faster, and needs more network bandwidth, 
>>>> that's all :-)
>>>
>>> And we can't do that because imagine you have a schema
>>>
>>>     create table t1 (
>>>         a int primary key,
>>>         b text default 'foo'
>>>     );
>>>
>>> and then do
>>>
>>>     insert into t1 (a, b) values (1, NULL);
>>>
>>> the resulting row, that make it into the masters table, will be
>>>
>>>     (1, NULL)
>>>
>>> Yeah, another one of those little things where MySQL behaves 
>>> different and where Postgres is right according to ANSI. If we omit 
>>> that NULL column now from the INSERT on the subscriber, the default 
>>> will be used there, resulting in
>>>
>>>     (1, 'foo')
>>>
>>> and we have the subscriber out of sync.
>>>
>>>
>>> Jan
>>>
>> The suggestion was to catch the request entered so here (a,b) (1, 
>> NULL) would be stored. but if user enter insert into t1(b) 
>> values('b') , then (b) ('b') would be stored, and as every subscribe 
>> has the same schema the default value would be used on them. 
>
> And what exactly do you capture as "request entered" if the original 
> query was "insert into foo (a, b, c) select x, y, z from temp_bar;" ?
>
> There certainly is some optimization possible. No doubt. But it isn't 
> achievable as cheap as you think. 
I agree.
> One would first have to agree that all nodes in the whole cluster have 
> to have tables containing the same columns. Then, the column names can 
> be mapped to short integers in a new slony configuration table. 
> Finally, the sl_log tables query field has to be split up into a short 
> integer array specifying the table column order in the second field, 
> containing all the values.
>
> The question is, are you willing to develop such a patch, 
I've got no experience with postgresql internals or slony to do it :-(
> or at least willing to sponsor development of such a patch?
I may ask this question to the firm where I'm working.
>
>
> Jan
>

-- 
Cyril SCETBON