数据库内核月报 - 2018 / 07

PgSQL · 应用案例 · PostgreSQL flashback(闪回) 功能实现与介绍

背景

闪回的需求往往是救命的需求,因为通常情况下数据库正常运行是不需要闪回的,往往是出现了误操作,被攻击,被注入后,数据库的数据被删除或恶意纂改并且纂改的事务已提交,也就是说纂改已经被持久化了。

这种情况下需要闪回来救命,回到被破坏前的状态。

闪回的目标分为两种:

DML闪回和DDL闪回。

DML闪回指对INSET, UPDATE, DELETE操作的闪回。DDL闪回指DROP, TRUNCATE操作的闪回。

闪回的实现分两种:

1、物理回退,相当于使用物理备份和归档进行时间点恢复,全库恢复到误操作前的状态。

(可以新建一个库用于恢复,恢复到目标时间点,恢复后,将误操作前的数据导出来,再导入线上数据库。)

2、在当前库回退,在当前库,将误操作影响的数据找出来。

闪回的手段:

1、物理回退,PG内核已支持时间点恢复,只要有误操作前的全量备份和所有归档即可。

2、当前库回退,使用HOOK,可以实现DROP和TRUNCATE操作的回收站功能。

3、使用延迟垃圾回收、脏读、行头事务号、事务提交日志,可以实现DML操作的闪回。

以前写的一些闪回方案:

《PostgreSQL 闪回 - flash back query emulate by trigger》

《PostgreSQL Oracle 兼容性之 - 事件触发器实现类似Oracle的回收站功能》

《PostgreSQL 回收站功能 - 基于HOOK的recycle bin pgtrashcan》

《PostgreSQL 最佳实践 - 任意时间点恢复源码分析》

《PostgreSQL 最佳实践 - 在线增量备份与任意时间点恢复》

《阿里云ApsaraDB RDS for PostgreSQL 最佳实践 - 6 任意时间点恢复》

《PostgreSQL PITR THREE recovery target MODE: name,xid,time USE CASE - 2》

《PostgreSQL PITR THREE recovery target MODE: name,xid,time USE CASE - 1》

本文将详细介绍“使用延迟垃圾回收、脏读、行头事务号、事务提交日志,实现DML操作的闪回。”的方法。

flashback 前提

1、延迟VACUUM,确保误操作的数据还没有被垃圾回收。

vacuum_defer_cleanup_age = 1000000  # 延迟100万个事务再回收垃圾,误操作后在100万个事务内,如果发现了误操作,才有可能使用本文提到的方法闪回。  

2、记录未被freeze,确保无操作的数据,以及后面提交的事务号没有被freeze(抹去)。

vacuum_freeze_min_age = 50000000   # 事务年龄大于5000万时,才可能被抹去事务号。  

3、开启事务提交时间跟踪,确保可以从xid得到事务结束的时间。(开启事务结束时间跟踪后,会开辟一块共享内存区存储这个信息。)。

track_commit_timestamp = on     # 开启事务结束时间跟踪  

4、事务提交时间跟踪未抹去。可以加大BUFFER来增加可跟踪的事务数。(重启数据库不影响,有持久化已有的跟踪记录)

Size  
CommitTsShmemBuffers(void)  
{  
        return Min(16, Max(4, NBuffers / 1024));  
}  
cd $PGDATA  
  
ll  
drwx------ 2 digoal users 4.0K Oct 10 10:28 pg_commit_ts  

flashback 准备工作

1、将xid转换为txid的函数,见本文末尾。(因为记录的行头部,存储的是XID,而不是TXID,而查询事务提交状态,用的是TXID,因此需要转换一下。)

2、脏读插件pg_dirtyread(用于读取脏页)

https://github.com/ChristophBerg/pg_dirtyread

create extension pg_dirtyread ;  

3、根据txid查询事务提交状态

postgres=# select txid_status(2);  
 txid_status   
-------------  
 committed  
(1 row)  

4、根据xid查询事务提交时间

postgres=#  select xmin,xmax,xid_to_txid(xmin),txid_status(xid_to_txid(xmin)),pg_xact_commit_timestamp(xmin) from trac ;  
   xmin   |   xmax   | xid_to_txid | txid_status |   pg_xact_commit_timestamp      
----------+----------+-------------+-------------+-------------------------------  
 40477717 | 40477727 | 25810281493 | committed   | 2017-10-10 10:29:21.269612+08  
 40477719 | 40477727 | 25810281495 | committed   | 2017-10-10 11:15:05.875067+08  
(2 rows)  

5、使用脏读插件,将需要flashback的数据(连同头信息xmin,xmax)写入临时表,根据以上两种方法生成如下字段:

写入事务提交状态、事务提交时间。(xmin)

删除事务提交状态、事务提交时间。(xmax)

create table tmp_xxx as   
select   
       xid_to_txid(xmin) as xmin_txid, txid_status(xid_to_txid(xmin)) as xmin_cmstat, pg_xact_commit_timestamp(xmin) as xmin_ts,   
       xid_to_txid(xmax) as xmax_txid, txid_status(xid_to_txid(xmax)) as xmax_cmstat, pg_xact_commit_timestamp(case xmax when 0 then null else xmax end) as xmax_ts,  
       xmin as xmin1,xmax as xmax1,dead,oid,  
       id  -- 目标表字段  
from   
(  
  SELECT * FROM pg_dirtyread('table'::regclass)  
  as t (tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean, oid oid,   
       id int)  -- 目标表字段  
) t;  

例如

postgres=# select   
       xid_to_txid(xmin) as xmin_txid, txid_status(xid_to_txid(xmin)) as xmin_cmstat, pg_xact_commit_timestamp(xmin) as xmin_ts,   
       xid_to_txid(xmax) as xmax_txid, txid_status(xid_to_txid(xmax)) as xmax_cmstat, pg_xact_commit_timestamp(case xmax when 0 then null else xmax end) as xmax_ts,  
       xmin as xmin1,xmax as xmax1,dead,oid,  
       id  -- 目标表字段  
from   
(  
  SELECT * FROM pg_dirtyread('trac'::regclass)  
  as t (tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean, oid oid,   
       id int)  -- 目标表字段  
) t;  
  
  
  
  xmin_txid  | xmin_cmstat |            xmin_ts            |  xmax_txid  | xmax_cmstat |            xmax_ts            |  xmin1   |  xmax1   | dead | oid | id   
-------------+-------------+-------------------------------+-------------+-------------+-------------------------------+----------+----------+------+-----+----  
 25810281493 | committed   | 2017-10-10 10:29:21.269612+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477717 | 40477750 | f    |   0 |  1  
 25810281495 | committed   | 2017-10-10 11:15:05.875067+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477719 | 40477752 | f    |   0 |  2  
 25810281503 | aborted     |                               |           0 |             |                               | 40477727 |        0 | t    |   0 |  2  
 25810281503 | aborted     |                               |           0 |             |                               | 40477727 |        0 | t    |   0 |  2  
 25810281513 | committed   | 2017-10-10 16:56:12.206339+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477737 | 40477752 | f    |   0 |  2  
 25810281514 | committed   | 2017-10-10 16:56:13.706233+08 |           0 |             |                               | 40477738 |        0 | f    |   0 |  3  
 25810281515 | committed   | 2017-10-10 16:56:15.108331+08 |           0 |             |                               | 40477739 |        0 | f    |   0 |  4  
 25810281516 | committed   | 2017-10-10 16:56:16.092184+08 | 25810281521 | committed   | 2017-10-10 16:56:58.684832+08 | 40477740 | 40477745 | f    |   0 |  5  
 25810281517 | committed   | 2017-10-10 16:56:17.207356+08 | 25810281519 | committed   | 2017-10-10 16:56:48.011544+08 | 40477741 | 40477743 | f    |   0 |  6  
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  6  
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  6  
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  6  
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  7  
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  7  
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  7  
 25810281519 | committed   | 2017-10-10 16:56:48.011544+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477743 | 40477752 | f    |   0 |  2  
 25810281520 | aborted     |                               |           0 |             |                               | 40477744 |        0 | t    |   0 |  2  
 25810281521 | committed   | 2017-10-10 16:56:58.684832+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477745 | 40477752 | f    |   0 |  2  
 25810281523 | committed   | 2017-10-10 17:21:39.000214+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477747 | 40477750 | f    |   0 |  1  
 25810281523 | committed   | 2017-10-10 17:21:39.000214+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477747 | 40477750 | f    |   0 |  1  
 25810281523 | committed   | 2017-10-10 17:21:39.000214+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477747 | 40477750 | f    |   0 |  1  
 25810281523 | committed   | 2017-10-10 17:21:39.000214+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477747 | 40477750 | f    |   0 |  1  
 25810281523 | committed   | 2017-10-10 17:21:39.000214+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477747 | 40477750 | f    |   0 |  1  
(23 rows)  
  
  
  
postgres=# select xmin,xmax,* from trac ;  
   xmin   | xmax | id   
----------+------+----  
 40477738 |    0 |  3  
 40477739 |    0 |  4  
(2 rows)  

flashback 实践

1、根据pg_xlogdump找到精确的误操作xid,以及事务提交对应的时间戳。(或者使用用户提供的时间戳,大概的误操作前的时间。)

参考:

《PostgreSQL 使用pg_xlogdump找到误操作事务号》

2、回退到过去的某个时间点(采用基于临时表的VIEW来展现) (根据事务提交顺序,逆序,逐个事务排除,逐个事务回退。)

select * from tmp_xxx where   
  (  
    xmin_cmstat='committed' and xmin_ts <= '$ts'   
    and  
    (xmax_cmstat='committed' and xmax_ts < '$ts') is distinct from true  
  )  
;  

例子

select * from tmp_xxx where   
  (  
    xmin_cmstat='committed' and xmin_ts <= '2017-10-10 16:56:58.684832+08'   
    and  
    (xmax_cmstat='committed' and xmax_ts < '2017-10-10 16:56:58.684832+08') is distinct from true  
  )  
;  
  
  xmin_txid  | xmin_cmstat |            xmin_ts            |  xmax_txid  | xmax_cmstat |            xmax_ts            |  xmin1   |  xmax1   | dead | oid | id   
-------------+-------------+-------------------------------+-------------+-------------+-------------------------------+----------+----------+------+-----+----  
 25810281493 | committed   | 2017-10-10 10:29:21.269612+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477717 | 40477750 | f    |   0 |  1  
 25810281495 | committed   | 2017-10-10 11:15:05.875067+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477719 | 40477752 | f    |   0 |  2  
 25810281513 | committed   | 2017-10-10 16:56:12.206339+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477737 | 40477752 | f    |   0 |  2  
 25810281514 | committed   | 2017-10-10 16:56:13.706233+08 |           0 |             |                               | 40477738 |        0 | f    |   0 |  3  
 25810281515 | committed   | 2017-10-10 16:56:15.108331+08 |           0 |             |                               | 40477739 |        0 | f    |   0 |  4  
 25810281516 | committed   | 2017-10-10 16:56:16.092184+08 | 25810281521 | committed   | 2017-10-10 16:56:58.684832+08 | 40477740 | 40477745 | f    |   0 |  5  
 25810281519 | committed   | 2017-10-10 16:56:48.011544+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477743 | 40477752 | f    |   0 |  2  
 25810281521 | committed   | 2017-10-10 16:56:58.684832+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477745 | 40477752 | f    |   0 |  2  
(8 rows)  
  

3、找出被误操作事务 删除、更新、插入 的记录(OLD ROW(被删除、更新前的), NEW ROW(更新后的、误插入的))。

select * from tmp_xxx where xmax1=? and xmax_cmstat='committed' and xmin_cmstat='committed';    -- 被某个XID删除、更新前的数据。  
  
select * from tmp_xxx where xmin1=? and xmin_cmstat='committed';                                -- 被某个XID插入、更新后的数据。  

例子

postgres=# select * from tmp_xxx where xmax1=40477745 and xmax_cmstat='committed' and xmin_cmstat='committed';   
  xmin_txid  | xmin_cmstat |            xmin_ts            |  xmax_txid  | xmax_cmstat |            xmax_ts            |  xmin1   |  xmax1   | dead | oid | id   
-------------+-------------+-------------------------------+-------------+-------------+-------------------------------+----------+----------+------+-----+----  
 25810281516 | committed   | 2017-10-10 16:56:16.092184+08 | 25810281521 | committed   | 2017-10-10 16:56:58.684832+08 | 40477740 | 40477745 | f    |   0 |  5  
(1 row)  
  
postgres=# select * from tmp_xxx where xmin1=40477745 and xmin_cmstat='committed';  
  xmin_txid  | xmin_cmstat |            xmin_ts            |  xmax_txid  | xmax_cmstat |            xmax_ts            |  xmin1   |  xmax1   | dead | oid | id   
-------------+-------------+-------------------------------+-------------+-------------+-------------------------------+----------+----------+------+-----+----  
 25810281521 | committed   | 2017-10-10 16:56:58.684832+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477745 | 40477752 | f    |   0 |  2  
(1 row)  

flashback 可能造成的负面影响

flashback的前提,前面已经讲了,但是可能带来一些负面影响。

1、由于设置了vacuum_defer_cleanup_age,因此每次都会被触发VACUUM,扫描表,但是又有一些垃圾不能被回收;

2、可能导致表膨胀。

解决负面影响

1、提高autovacuum_naptime,从而降低频繁扫描的概率。

2、膨胀可能无法解决。建议修改内核,实现可以仅针对重要的表设置 vacuum_defer_cleanup_age。从而避免全库膨胀。

相关知识点

相关参数 - 跟踪事务时间、延迟回收垃圾

track_commit_timestamp = on         # collect timestamp of transaction commit  
                                    # (change requires restart)  
  
vacuum_defer_cleanup_age = 100000   # number of xacts by which cleanup is delayed  

相关函数 - 事务状态、事务结束时间

Name Return Type Description
txid_status(bigint) txid_status report the status of the given transaction: committed, aborted, in progress, or null if the transaction ID is too old
pg_xact_commit_timestamp(xid) timestamp with time zone get commit timestamp of a transaction

https://www.postgresql.org/docs/10/static/functions-info.html

txid_status(bigint)   
  
reports the commit status of a recent transaction.   
  
Applications may use it to determine whether a transaction committed or aborted   
when the application and database server become disconnected while a COMMIT is in progress.   
  
The status of a transaction will be reported as either in progress, committed, or aborted,   
provided that the transaction is recent enough that the system retains the commit status of that transaction.   
  
If is old enough that no references to that transaction survive in the system and the commit status information has been discarded,   
this function will return NULL. Note that prepared transactions are reported as in progress;   
  
applications must check pg_prepared_xacts if they need to determine whether the txid is a prepared transaction.  

读脏页插件

https://github.com/ChristophBerg/pg_dirtyread

如何跟踪事务结束时间

开启事务结束时间跟踪后,会开辟一块共享内存,跟踪事务结束时间。

/*  
 * Number of shared CommitTS buffers.  
 *  
 * We use a very similar logic as for the number of CLOG buffers; see comments  
 * in CLOGShmemBuffers.  
 */  
Size  
CommitTsShmemBuffers(void)  
{  
        return Min(16, Max(4, NBuffers / 1024));  
}  
  
/*  
 * Shared memory sizing for CommitTs  
 */  
Size  
CommitTsShmemSize(void)  
{  
        return SimpleLruShmemSize(CommitTsShmemBuffers(), 0) +  
                sizeof(CommitTimestampShared);  
}  

xid和txid的区别

xid是32位的整型,会被FREEZE,循环使用,xmin,xmax都是XID类型。通过xid可以得到事务结束时间。

txid是64位整型,不会被循环使用,通过epoch可以从xid转换为txid。通过txid可以得到事务的提交状态。

/*  
 *      Export internal transaction IDs to user level.  
 *  
 * Note that only top-level transaction IDs are ever converted to TXID.  
 * This is important because TXIDs frequently persist beyond the global  
 * xmin horizon, or may even be shipped to other machines, so we cannot  
 * rely on being able to correlate subtransaction IDs with their parents  
 * via functions such as SubTransGetTopmostTransaction().  
 *  
/* txid will be signed int8 in database, so must limit to 63 bits */  
#define MAX_TXID   UINT64CONST(0x7FFFFFFFFFFFFFFF)  
  
/*  
 * do a TransactionId -> txid conversion for an XID near the given epoch  
 */  
static txid  
convert_xid(TransactionId xid, const TxidEpoch *state)  
{  
        uint64          epoch;  
  
        /* return special xid's as-is */  
        if (!TransactionIdIsNormal(xid))  
                return (txid) xid;  
  
        /* xid can be on either side when near wrap-around */  
        epoch = (uint64) state->epoch;  
        if (xid > state->last_xid &&  
                TransactionIdPrecedes(xid, state->last_xid))  
                epoch--;  
        else if (xid < state->last_xid &&  
                         TransactionIdFollows(xid, state->last_xid))  
                epoch++;  
  
        return (epoch << 32) | xid;  
}  

如何将xid转换为txid

我们可以自定义一个函数,将xid转换为txid。

vi xid_to_txid.c  
  
#include "postgres.h"  
#include "fmgr.h"  
#include "access/xact.h"  
#include "access/transam.h"  
#include "access/xlog.h"  
  
/* Use unsigned variant internally */  
typedef uint64 txid;  
  
/*  
 * Epoch values from xact.c  
 */  
typedef struct  
{  
        TransactionId last_xid;  
        uint32          epoch;  
} TxidEpoch;  
  
PG_MODULE_MAGIC;  
  
PG_FUNCTION_INFO_V1(xid_to_txid);  
  
/*  
 * do a TransactionId -> txid conversion for an XID near the given epoch  
 */  
static txid  
convert_xid(TransactionId xid, const TxidEpoch *state)  
{  
        uint64          epoch;  
  
        /* return special xid's as-is */  
        if (!TransactionIdIsNormal(xid))  
                return (txid) xid;  
  
        /* xid can be on either side when near wrap-around */  
        epoch = (uint64) state->epoch;  
        if (xid > state->last_xid &&  
                TransactionIdPrecedes(xid, state->last_xid))  
                epoch--;  
        else if (xid < state->last_xid &&  
                         TransactionIdFollows(xid, state->last_xid))  
                epoch++;  
  
        return (epoch << 32) | xid;  
}  
    
/*  
 * Fetch epoch data from xact.c.  
 */  
static void  
load_xid_epoch(TxidEpoch *state)  
{  
        GetNextXidAndEpoch(&state->last_xid, &state->epoch);  
}  
  
  
Datum  
xid_to_txid(PG_FUNCTION_ARGS)  
{  
        txid            val;  
        TxidEpoch       state;  
        TransactionId   xid;  
  
        xid = DatumGetTransactionId(PG_GETARG_DATUM(0));  
          
        load_xid_epoch(&state);  
  
        val = convert_xid(xid, &state);  
  
        PG_RETURN_INT64(val);  
}  

编译

gcc -O3 -Wall -Wextra -Werror -I /home/digoal/postgresql-10beta4/src/include -g -fPIC -c ./xid_to_txid.c -o xid_to_txid.o  
  
gcc -O3 -Wall -Wextra -Werror -I /home/digoal/postgresql-10beta4/src/include -g -shared xid_to_txid.o -o libxid_to_txid.so  

拷贝到数据库软件LIB目录

cp libxid_to_txid.so pgsql10/lib/  

创建函数

create or replace function xid_to_txid(xid) returns int8 as '$libdir/libxid_to_txid.so', 'xid_to_txid' language C STRICT;  

测试

-- 当前值  
postgres=# select xmin,xmax,xid_to_txid(xmin),txid_status(xid_to_txid(xmin)),pg_xact_commit_timestamp(xmin) from trac ;  
   xmin   | xmax | xid_to_txid | txid_status |   pg_xact_commit_timestamp      
----------+------+-------------+-------------+-------------------------------  
 40477717 |    0 |  8630412309 | committed   | 2017-10-10 10:29:21.269612+08  
 40477719 |    0 |  8630412311 | committed   | 2017-10-10 11:15:05.875067+08  
(2 rows)  
  
-- 产生一些DML  
postgres=# begin;  
BEGIN  
postgres=# update trac set id =2;  
UPDATE 2  
postgres=# rollback;  
ROLLBACK  
  
-- 值的变化  
postgres=# select xmin,xmax,xid_to_txid(xmin),txid_status(xid_to_txid(xmin)),pg_xact_commit_timestamp(xmin) from trac ;  
   xmin   |   xmax   | xid_to_txid | txid_status |   pg_xact_commit_timestamp      
----------+----------+-------------+-------------+-------------------------------  
 40477717 | 40477727 |  8630412309 | committed   | 2017-10-10 10:29:21.269612+08  
 40477719 | 40477727 |  8630412311 | committed   | 2017-10-10 11:15:05.875067+08  
(2 rows)  
  
postgres=# select xmin,xmax,xid_to_txid(xmin),txid_status(xid_to_txid(xmin)),pg_xact_commit_timestamp(xmin),xid_to_txid(xmax),txid_status(xid_to_txid(xmax)),pg_xact_commit_timestamp(xmax) from trac ;  
   xmin   |   xmax   | xid_to_txid | txid_status |   pg_xact_commit_timestamp    | xid_to_txid | txid_status | pg_xact_commit_timestamp   
----------+----------+-------------+-------------+-------------------------------+-------------+-------------+--------------------------  
 40477717 | 40477727 |  8630412309 | committed   | 2017-10-10 10:29:21.269612+08 |  8630412319 | aborted     |   
 40477719 | 40477727 |  8630412311 | committed   | 2017-10-10 11:15:05.875067+08 |  8630412319 | aborted     |   
(2 rows)  
  
-- 停库,修改epoch  
  
pg_ctl stop -m fast  
  
pg_resetwal -e 5 $PGDATA  
Write-ahead log reset  
  
-- 通过控制文件可以得到这个epoch  
pg_controldata   
Latest checkpoint's NextXID:          5:40477728  
  
-- 继续修改epoch  
  
pg_resetwal -e 6 $PGDATA  
Write-ahead log reset  
  
-- 通过控制文件可以得到这个epoch  
pg_controldata   
Latest checkpoint's NextXID:          6:40477728  
  
-- 启动  
pg_ctl start  
  
  
-- epoch修改后,txid发生了变化  
postgres=# select xmin,xmax,xid_to_txid(xmin),txid_status(xid_to_txid(xmin)),pg_xact_commit_timestamp(xmin),xid_to_txid(xmax),txid_status(xid_to_txid(xmax)),pg_xact_commit_timestamp(xmax) from trac ;  
   xmin   |   xmax   | xid_to_txid | txid_status |   pg_xact_commit_timestamp    | xid_to_txid | txid_status | pg_xact_commit_timestamp   
----------+----------+-------------+-------------+-------------------------------+-------------+-------------+--------------------------  
 40477717 | 40477727 | 25810281493 | committed   | 2017-10-10 10:29:21.269612+08 | 25810281503 | aborted     |   
 40477719 | 40477727 | 25810281495 | committed   | 2017-10-10 11:15:05.875067+08 | 25810281503 | aborted     |   
(2 rows)  

为什么会有EPOCH呢?因为XID是循环使用的,每一个循环都会使得EPOCH自增1。从而使得TXID可以持续增长。

参考

《PostgreSQL 闪回 - flash back query emulate by trigger》

《Use pg_resetxlog simulate tuple disappear within PostgreSQL》

《PostgreSQL xid(int4) to txid(int8)》

《PostgreSQL 使用pg_xlogdump找到误操作事务号》

https://github.com/ChristophBerg/pg_dirtyread

https://www.postgresql.org/docs/10/static/functions-info.html

《PostgreSQL Oracle 兼容性之 - 事件触发器实现类似Oracle的回收站功能》

《PostgreSQL 回收站功能 - 基于HOOK的recycle bin pgtrashcan》

其他

误操作后,如果又发生了DDL,例如新增字段,修改字段等。使用本文提供的闪回方法,有没有问题?

测一下就知道了,不会有问题。

postgres=# alter table trac add column info text;  
ALTER TABLE  
postgres=# insert into trac values (2,'test');  
INSERT 0 1  
  
postgres=# select   
       xid_to_txid(xmin) as xmin_txid, txid_status(xid_to_txid(xmin)) as xmin_cmstat, pg_xact_commit_timestamp(xmin) as xmin_ts,   
       xid_to_txid(xmax) as xmax_txid, txid_status(xid_to_txid(xmax)) as xmax_cmstat, pg_xact_commit_timestamp(case xmax when 0 then null else xmax end) as xmax_ts,  
       xmin as xmin1,xmax as xmax1,dead,oid,  
       id,info  -- 目标表字段  
from   
(  
  SELECT * FROM pg_dirtyread('trac'::regclass)  
  as t (tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean, oid oid,   
       id int, info text)  -- 目标表字段  
) t;  
  xmin_txid  | xmin_cmstat |            xmin_ts            |  xmax_txid  | xmax_cmstat |            xmax_ts            |  xmin1   |  xmax1   | dead | oid | id | info   
-------------+-------------+-------------------------------+-------------+-------------+-------------------------------+----------+----------+------+-----+----+------  
 25810281493 | committed   | 2017-10-10 10:29:21.269612+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477717 | 40477750 | f    |   0 |  1 |   
 25810281495 | committed   | 2017-10-10 11:15:05.875067+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477719 | 40477752 | f    |   0 |  2 |   
 25810281503 | aborted     |                               |           0 |             |                               | 40477727 |        0 | t    |   0 |  2 |   
 25810281503 | aborted     |                               |           0 |             |                               | 40477727 |        0 | t    |   0 |  2 |   
 25810281513 | committed   | 2017-10-10 16:56:12.206339+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477737 | 40477752 | f    |   0 |  2 |   
 25810281514 | committed   | 2017-10-10 16:56:13.706233+08 |           0 |             |                               | 40477738 |        0 | f    |   0 |  3 |   
 25810281515 | committed   | 2017-10-10 16:56:15.108331+08 |           0 |             |                               | 40477739 |        0 | f    |   0 |  4 |   
 25810281516 | committed   | 2017-10-10 16:56:16.092184+08 | 25810281521 | committed   | 2017-10-10 16:56:58.684832+08 | 40477740 | 40477745 | f    |   0 |  5 |   
 25810281517 | committed   | 2017-10-10 16:56:17.207356+08 | 25810281519 | committed   | 2017-10-10 16:56:48.011544+08 | 40477741 | 40477743 | f    |   0 |  6 |   
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  6 |   
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  6 |   
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  6 |   
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  7 |   
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  7 |   
 25810281518 | aborted     |                               |           0 |             |                               | 40477742 |        0 | t    |   0 |  7 |   
 25810281519 | committed   | 2017-10-10 16:56:48.011544+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477743 | 40477752 | f    |   0 |  2 |   
 25810281520 | aborted     |                               |           0 |             |                               | 40477744 |        0 | t    |   0 |  2 |   
 25810281521 | committed   | 2017-10-10 16:56:58.684832+08 | 25810281528 | committed   | 2017-10-10 17:25:47.410793+08 | 40477745 | 40477752 | f    |   0 |  2 |   
 25810281523 | committed   | 2017-10-10 17:21:39.000214+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477747 | 40477750 | f    |   0 |  1 |   
 25810281523 | committed   | 2017-10-10 17:21:39.000214+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477747 | 40477750 | f    |   0 |  1 |   
 25810281523 | committed   | 2017-10-10 17:21:39.000214+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477747 | 40477750 | f    |   0 |  1 |   
 25810281523 | committed   | 2017-10-10 17:21:39.000214+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477747 | 40477750 | f    |   0 |  1 |   
 25810281523 | committed   | 2017-10-10 17:21:39.000214+08 | 25810281526 | committed   | 2017-10-10 17:25:34.593189+08 | 40477747 | 40477750 | f    |   0 |  1 |   
 25810281532 | committed   | 2017-10-10 18:58:50.720095+08 |           0 |             |                               | 40477756 |        0 | f    |   0 |  2 | test  
(24 rows)  

小结

本文介绍了原地闪回的一种方法:

1、新增脏读功能、新增表级vacuum_defer_cleanup_age(避免全库膨胀)功能、开启track_commit_timestamp。可以实现dml flashback(闪回)。

2、增加回收站(通过HOOK)功能,可以实现DDL flashback。

全库闪回,可以参考PostgreSQL的PITR的功能。

原地闪回的另一种方法:利用redo的undo内容进行闪回,这种方法可以避免膨胀的问题,回退到什么时间点,与保留的REDO文件数有关。使用redo来闪回也会引入一个问题,UNDO需要记录更多的内容,导致REDO文件内容变多。因此建议也是设置表级redo的UNDO内容。

create table\alter table

REPLICA IDENTITY { DEFAULT | USING INDEX index_name | FULL | NOTHING }
  
需要设置为FULL,存储所有的OLD VALUE,才能实现UNDO。  
REPLICA IDENTITY

This form changes the information which is written to the write-ahead log 
to identify rows which are updated or deleted. 

This option has no effect except when logical replication is in use. 

DEFAULT (the default for non-system tables) records the old values of the columns 
of the primary key, if any. 

USING INDEX records the old values of the columns covered by the named index, 
which must be unique, not partial, not deferrable, and include only columns marked NOT NULL. 

FULL records the old values of all columns in the row. 

NOTHING records no information about the old row. 
(This is the default for system tables.) 

In all cases, no old values are logged unless at least one of the columns that would 
be logged differs between the old and new versions of the row.