Reference: SQL

This section describes the SQL syntax that HAWQ supports.

ABORT

Aborts the current transaction.

Synopsis

ABORT [ WORK | TRANSACTION ]

Description

ABORT rolls back the current transaction and causes all the updates made by the transaction to be discarded. This command is identical in behavior to the standard SQL command ROLLBACK, and is present only for historical reasons.

Parameters

WORK
TRANSACTION

Optional key words. They have no effect.

Notes

Use COMMIT to successfully terminate a transaction.

Issuing ABORT when not inside a transaction does no harm, but it will provoke a warning message.
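
Examples

To roll back the current transaction and discard all of its updates:

ABORT;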

Compatibility

This command is a HAWQ extension present for historical reasons. ROLLBACK is the equivalent standard SQL command.

ALTER AGGREGATE

Changes the definition of an aggregate function.

Synopsis

ALTER AGGREGATE <name> ( <type> [ , ... ] ) RENAME TO <new_name>

ALTER AGGREGATE <name> ( <type> [ , ... ] ) OWNER TO <new_owner>

ALTER AGGREGATE <name> ( <type> [ , ... ] ) SET SCHEMA <new_schema>

Description

ALTER AGGREGATE changes the definition of an aggregate function.

You must own the aggregate function to use ALTER AGGREGATE. To change the schema of an aggregate function, you must also have CREATE privilege on the new schema. To alter the owner, you must also be a direct or indirect member of the new owning role, and that role must have CREATE privilege on the aggregate function’s schema. (These restrictions enforce that altering the owner does not do anything you could not do by dropping and recreating the aggregate function. However, a superuser can alter ownership of any aggregate function anyway.)

Parameters

<name>

The name (optionally schema-qualified) of an existing aggregate function.

<type>

An input data type on which the aggregate function operates. To reference a zero-argument aggregate function, write * in place of the list of input data types.

<new_name>

The new name of the aggregate function.

<new_owner>

The new owner of the aggregate function.

<new_schema>

The new schema for the aggregate function.

Examples

To rename the aggregate function myavg for type integer to my_average:

ALTER AGGREGATE myavg(integer) RENAME TO my_average;

To change the owner of the aggregate function myavg for type integer to joe:

ALTER AGGREGATE myavg(integer) OWNER TO joe;

To move the aggregate function myavg for type integer into schema myschema:

ALTER AGGREGATE myavg(integer) SET SCHEMA myschema;

Compatibility

There is no ALTER AGGREGATE statement in the SQL standard.

ALTER DATABASE

Changes the attributes of a database.

Synopsis

ALTER DATABASE <name> SET <parameter> { TO | = } { <value> | DEFAULT }

ALTER DATABASE <name> RESET <parameter>

Description

ALTER DATABASE changes the attributes of a HAWQ database.

SET and RESET changes the session default for a configuration parameter for a HAWQ database. Whenever a new session is subsequently started in that database, the specified value becomes the session default value. The database-specific default overrides whatever setting is present in the server configuration file (hawq-site.xml). Only the database owner or a superuser can change the session defaults for a database. Certain parameters cannot be set this way, or can only be set by a superuser.

Parameters

<name>

The name of the database whose attributes are to be altered. Note: HAWQ reserves the database “hcatalog” for system use. You cannot connect to or alter the system “hcatalog” database.

<parameter>

Set this database’s session default for the specified configuration parameter to the given value. If value is DEFAULT or if RESET is used, the database-specific setting is removed, so the system-wide default setting will be inherited in new sessions. Use RESET ALL to clear all database-specific settings. See About Server Configuration Parameters for information about user-settable configuration parameters.

Notes

It is also possible to set a configuration parameter session default for a specific role (user) rather than to a database. Role-specific settings override database-specific ones if there is a conflict.

Examples

To set the default schema search path for the mydatabase database:

ALTER DATABASE mydatabase SET search_path TO myschema, public, pg_catalog;
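
To remove the database-specific setting so that new sessions inherit the system-wide default again:

ALTER DATABASE mydatabase RESET search_path;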

Compatibility

The ALTER DATABASE statement is a HAWQ extension.

ALTER FUNCTION

Changes the definition of a function.

Synopsis

ALTER FUNCTION <name> ( [ [<argmode>] [<argname>] <argtype> [, ...] ] )
<action> [, ... ] [RESTRICT]

ALTER FUNCTION <name> ( [ [<argmode>] [<argname>] <argtype> [, ...] ] )
RENAME TO <new_name>

ALTER FUNCTION <name> ( [ [<argmode>] [<argname>] <argtype> [, ...] ] )
OWNER TO <new_owner>

ALTER FUNCTION <name> ( [ [<argmode>] [<argname>] <argtype> [, ...] ] )
SET SCHEMA <new_schema>

where <action> is one of:

{ CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT }
{ IMMUTABLE | STABLE | VOLATILE }
{ [EXTERNAL] SECURITY INVOKER | [EXTERNAL] SECURITY DEFINER }

Description

ALTER FUNCTION changes the definition of a function.

You must own the function to use ALTER FUNCTION. To change a function’s schema, you must also have CREATE privilege on the new schema. To alter the owner, you must also be a direct or indirect member of the new owning role, and that role must have CREATE privilege on the function’s schema. (These restrictions enforce that altering the owner does not do anything you could not do by dropping and recreating the function. However, a superuser can alter ownership of any function anyway.)

Parameters

<name>

The name (optionally schema-qualified) of an existing function.

<argmode>

The mode of an argument: either IN, OUT, or INOUT. If omitted, the default is IN. Note that ALTER FUNCTION does not actually pay any attention to OUT arguments, since only the input arguments are needed to determine the function’s identity. So it is sufficient to list the IN and INOUT arguments.

<argname>

The name of an argument. Note that ALTER FUNCTION does not actually pay any attention to argument names, since only the argument data types are needed to determine the function’s identity.

<argtype>

The data type(s) of the function’s arguments (optionally schema-qualified), if any.

<new_name>

The new name of the function.

<new_owner>

The new owner of the function. Note that if the function is marked SECURITY DEFINER, it will subsequently execute as the new owner.

<new_schema>

The new schema for the function.

CALLED ON NULL INPUT

RETURNS NULL ON NULL INPUT

STRICT

CALLED ON NULL INPUT changes the function so that it will be invoked when some or all of its arguments are null. RETURNS NULL ON NULL INPUT or STRICT changes the function so that it is not invoked if any of its arguments are null; instead, a null result is assumed automatically. See CREATE FUNCTION for more information.

IMMUTABLE

STABLE

VOLATILE

Change the volatility of the function to the specified setting. See CREATE FUNCTION for details.

[ EXTERNAL ] SECURITY INVOKER

[ EXTERNAL ] SECURITY DEFINER

Change whether the function is a security definer or not. The key word EXTERNAL is ignored for SQL conformance. See CREATE FUNCTION for more information about this capability.

RESTRICT

Ignored for conformance with the SQL standard.

Notes

HAWQ has limitations on the use of functions defined as STABLE or VOLATILE. See CREATE FUNCTION for more information.

Examples

To rename the function sqrt for type integer to square_root:

ALTER FUNCTION sqrt(integer) RENAME TO square_root;

To change the owner of the function sqrt for type integer to joe:

ALTER FUNCTION sqrt(integer) OWNER TO joe;

To change the schema of the function sqrt for type integer to math:

ALTER FUNCTION sqrt(integer) SET SCHEMA math;
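
To change the volatility of the function sqrt for type integer to IMMUTABLE (an illustrative use of the <action> form shown in the synopsis):

ALTER FUNCTION sqrt(integer) IMMUTABLE;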

Compatibility

This statement is partially compatible with the ALTER FUNCTION statement in the SQL standard. The standard allows more properties of a function to be modified, but does not provide the ability to rename a function, make a function a security definer, or change the owner, schema, or volatility of a function. The standard also requires the RESTRICT key word, which is optional in HAWQ.

ALTER OPERATOR

Changes the definition of an operator.

Synopsis

ALTER OPERATOR <name> ( {<lefttype> | NONE} , {<righttype> | NONE} )
       OWNER TO <newowner>

Description

ALTER OPERATOR changes the definition of an operator. The only currently available functionality is to change the owner of the operator.

You must own the operator to use ALTER OPERATOR. To alter the owner, you must also be a direct or indirect member of the new owning role, and that role must have CREATE privilege on the operator’s schema. (These restrictions enforce that altering the owner does not do anything you could not do by dropping and recreating the operator. However, a superuser can alter ownership of any operator anyway.)

Parameters

<name>

The name (optionally schema-qualified) of an existing operator.

<lefttype>

The data type of the operator’s left operand; write NONE if the operator has no left operand.

<righttype>

The data type of the operator’s right operand; write NONE if the operator has no right operand.

<newowner>

The new owner of the operator.

Examples

Change the owner of a custom operator a @@ b for type text:

ALTER OPERATOR @@ (text, text) OWNER TO joe;

Compatibility

There is no ALTER OPERATOR statement in the SQL standard.

ALTER OPERATOR CLASS

Changes the definition of an operator class.

Synopsis

ALTER OPERATOR CLASS <name> USING <index_method> RENAME TO <newname>

ALTER OPERATOR CLASS <name> USING <index_method> OWNER TO <newowner>

Description

ALTER OPERATOR CLASS changes the definition of an operator class.

You must own the operator class to use ALTER OPERATOR CLASS. To alter the owner, you must also be a direct or indirect member of the new owning role, and that role must have CREATE privilege on the operator class’s schema. (These restrictions enforce that altering the owner does not do anything you could not do by dropping and recreating the operator class. However, a superuser can alter ownership of any operator class anyway.)

Parameters

<name>

The name (optionally schema-qualified) of an existing operator class.

<index_method>

The name of the index method this operator class is for.

<newname>

The new name of the operator class.

<newowner>

The new owner of the operator class.
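
Examples

An illustrative sketch (the operator class name widget_ops and the btree index method are examples only, not objects defined elsewhere in this documentation): rename an operator class, then change its owner:

ALTER OPERATOR CLASS widget_ops USING btree RENAME TO widget_ops_new;

ALTER OPERATOR CLASS widget_ops_new USING btree OWNER TO joe;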

Compatibility

There is no ALTER OPERATOR CLASS statement in the SQL standard.

ALTER RESOURCE QUEUE

Modify an existing resource queue.

Synopsis

ALTER RESOURCE QUEUE <name> WITH (<queue_attribute>=<value> [, ... ])

where <queue_attribute> is:

   [MEMORY_LIMIT_CLUSTER=<percentage>]
   [CORE_LIMIT_CLUSTER=<percentage>]
   [ACTIVE_STATEMENTS=<integer>]
   [ALLOCATION_POLICY='even']
   [VSEG_RESOURCE_QUOTA='mem:<memory_units>']
   [RESOURCE_OVERCOMMIT_FACTOR=<double>]
   [NVSEG_UPPER_LIMIT=<integer>]
   [NVSEG_LOWER_LIMIT=<integer>]
   [NVSEG_UPPER_LIMIT_PERSEG=<double>]
   [NVSEG_LOWER_LIMIT_PERSEG=<double>]

   <memory_units> ::= {128mb|256mb|512mb|1024mb|2048mb|4096mb|
                   8192mb|16384mb|1gb|2gb|4gb|8gb|16gb}
   <percentage> ::= <integer>%

Description

Changes attributes for an existing resource queue in HAWQ. You cannot change the parent of an existing resource queue, and you cannot change a resource queue while it is active. Only a superuser can modify a resource queue.

Resource queues with an ACTIVE_STATEMENTS threshold set a maximum limit on the number of parallel active query statements that can be executed by roles assigned to the leaf queue. It controls the number of active queries that are allowed to run at the same time. The value for ACTIVE_STATEMENTS should be an integer greater than 0. If not specified, the default value is 20.

When modifying the resource queue, use the MEMORY_LIMIT_CLUSTER and CORE_LIMIT_CLUSTER attributes to tune the allowed resource usage of the resource queue. MEMORY_LIMIT_CLUSTER and CORE_LIMIT_CLUSTER must be equal for the same resource queue. In addition, the sum of the percentages of MEMORY_LIMIT_CLUSTER (and CORE_LIMIT_CLUSTER) for resource queues that share the same parent cannot exceed 100%.

To modify the role associated with the resource queue, use the ALTER ROLE or CREATE ROLE command. You can only assign roles to leaf-level resource queues (resource queues that do not have any children).

The default memory allotment can be overridden on a per-query basis by using the hawq_rm_stmt_vseg_memory and hawq_rm_stmt_nvseg configuration parameters.

Parameters

<name>

Required. The name of the resource queue you wish to modify.

MEMORY_LIMIT_CLUSTER=<percentage>

Required. Defines how much memory a resource queue can consume from its parent resource queue and consequently dispatch to the execution of parallel statements. The valid values are 1% to 100%. The value of MEMORY_LIMIT_CLUSTER must be identical to the value of CORE_LIMIT_CLUSTER. The sum of values for MEMORY_LIMIT_CLUSTER of this queue plus other queues that share the same parent cannot exceed 100%. The HAWQ resource manager periodically validates this restriction.

Note: If you want to increase the percentage, you may need to decrease the percentage of any resource queue(s) that share the same parent resource queue first. The total cannot exceed 100%.

CORE_LIMIT_CLUSTER=<percentage>

Required. The percentage of consumable CPU (virtual core) resources that the resource queue can take from its parent resource queue. The valid values are 1% to 100%. The value of CORE_LIMIT_CLUSTER must be identical to the value of MEMORY_LIMIT_CLUSTER. The sum of values for CORE_LIMIT_CLUSTER of this queue and queues that share the same parent cannot exceed 100%.

Note: If you want to increase the percentage, you may need to decrease the percentage of any resource queue(s) that share the same parent resource queue first. The total cannot exceed 100%.

ACTIVE_STATEMENTS=<integer>

Optional. Defines the limit of the number of parallel active statements in one leaf queue. The maximum number of connections cannot exceed this limit. If this limit is reached, the HAWQ resource manager queues more query allocation requests. Note that a single session can have several concurrent statement executions that occupy multiple connection resources. The value for ACTIVE_STATEMENTS should be an integer greater than 0. The default value is 20.

ALLOCATION_POLICY=<string>

Optional. Defines the resource allocation policy for parallel statement execution. The default value is even. Note: This release only supports an even allocation policy. Even if you do not specify this attribute, the resource queue still applies an even allocation policy. Future releases will support alternative allocation policies.

Setting the allocation policy to even means resources are always evenly dispatched based on current concurrency. When multiple query resource allocation requests are queued, the resource queue tries to evenly dispatch resources to queued requests until one of the following conditions is encountered:

  • There are no more allocated resources in this queue to dispatch, or
  • The ACTIVE_STATEMENTS limit has been reached

For each query resource allocation request, the HAWQ resource manager determines the minimum and maximum size of a virtual segment based on multiple factors including query cost, user configuration, table properties, and so on. For example, a hash-distributed table requires a fixed virtual segment size. With an even allocation policy, the HAWQ resource manager uses the minimum virtual segment size requirement and evenly dispatches resources to each query resource allocation request in the resource queue.

VSEG_RESOURCE_QUOTA='mem:{128mb | 256mb | 512mb | 1024mb | 2048mb | 4096mb | 8192mb | 16384mb | 1gb | 2gb | 4gb | 8gb | 16gb}'

Optional. This quota defines how resources are split across multiple virtual segments. For example, when the HAWQ resource manager determines that 256GB memory and 128 vcores should be allocated to the current resource queue, there are multiple ways to divide the resources across virtual segments: for example, a) 2GB/1 vcore * 128 virtual segments or b) 1GB/0.5 vcore * 256 virtual segments. You can use this attribute to make the HAWQ resource manager calculate the number of virtual segments based on how the memory is divided. For example, if VSEG_RESOURCE_QUOTA='mem:512mb', then the resource queue will use 512MB/0.25 vcore * 512 virtual segments. The default value is 'mem:256mb'.

Note: To avoid resource fragmentation, make sure that the segment resource capacity configured for HAWQ (in HAWQ standalone mode, hawq_rm_memory_limit_perseg; in YARN mode, yarn.nodemanager.resource.memory-mb) is a multiple of the resource quotas for all virtual segments, and that the CPU-to-memory ratio is a multiple of the amount configured for yarn.scheduler.minimum-allocation-mb.
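
For example, the following statement (using the test_queue_1 queue from the examples below) sets a 512MB virtual segment memory quota:

ALTER RESOURCE QUEUE test_queue_1 WITH (VSEG_RESOURCE_QUOTA='mem:512mb');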

RESOURCE_OVERCOMMIT_FACTOR=<double>

Optional. This factor defines how much a resource can be overcommitted. The default value is 2.0. For example, if RESOURCE_OVERCOMMIT_FACTOR is set to 3.0 and MEMORY_LIMIT_CLUSTER is set to 30%, then the maximum possible resource allocation in this queue is 90% (30% x 3.0). If the resulting maximum is bigger than 100%, then 100% is adopted. The minimum value that this attribute can be set to is 1.0.

NVSEG_UPPER_LIMIT=<integer> / NVSEG_UPPER_LIMIT_PERSEG=<double>

Optional. These limits restrict the range of the number of virtual segments allocated in this resource queue for executing one query statement. NVSEG_UPPER_LIMIT defines an upper limit of virtual segments for one statement execution regardless of the actual cluster size, while NVSEG_UPPER_LIMIT_PERSEG defines the same limit by using the average number of virtual segments in one physical segment. Therefore, the limit defined by NVSEG_UPPER_LIMIT_PERSEG varies dynamically according to the changing size of the HAWQ cluster. For example, if you set NVSEG_UPPER_LIMIT=10, all query resource requests are allocated no more than 10 virtual segments. If you set NVSEG_UPPER_LIMIT_PERSEG=2 and there are currently 5 available HAWQ segments in the cluster, query resource requests are allocated at most 10 virtual segments.

NVSEG_UPPER_LIMIT cannot be set to a lower value than NVSEG_LOWER_LIMIT if both limits are enabled. In addition, the upper limit cannot be set to a value larger than the values of the global configuration parameters hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit. By default, both limits are set to -1, which means the limits are disabled. NVSEG_UPPER_LIMIT has higher priority than NVSEG_UPPER_LIMIT_PERSEG. If both limits are set, then NVSEG_UPPER_LIMIT_PERSEG is ignored. If you have enabled resource quotas for the query statement, then these limits are ignored.

Note: If the actual lower limit of the number of virtual segments becomes greater than the upper limit, then the lower limit is automatically reduced to be equal to the upper limit. This situation is possible when a user sets both NVSEG_UPPER_LIMIT and NVSEG_LOWER_LIMIT_PERSEG. After expanding the cluster, the dynamic lower limit may become greater than the value set for the fixed upper limit.

NVSEG_LOWER_LIMIT=<integer> / NVSEG_LOWER_LIMIT_PERSEG=<double>

Optional. These limits specify the minimum number of virtual segments allocated for one statement execution in order to guarantee query performance. NVSEG_LOWER_LIMIT defines the lower limit of virtual segments for one statement execution regardless of the actual cluster size, while NVSEG_LOWER_LIMIT_PERSEG defines the same limit by using the average number of virtual segments in one physical segment. Therefore, the limit defined by NVSEG_LOWER_LIMIT_PERSEG varies dynamically along with the size of the HAWQ cluster. NVSEG_UPPER_LIMIT_PERSEG cannot be less than NVSEG_LOWER_LIMIT_PERSEG if both limits are enabled.

For example, if you set NVSEG_LOWER_LIMIT=10 and one statement execution potentially needs no fewer than 10 virtual segments, then this request has at least 10 virtual segments allocated. If you set NVSEG_LOWER_LIMIT_PERSEG=2, and assuming there are currently 5 available HAWQ segments in the cluster, a statement execution that potentially needs no fewer than 10 virtual segments will be allocated at least 10 virtual segments. If one statement execution needs at most 4 virtual segments, the resource manager will allocate at most 4 virtual segments instead of 10, since this resource request does not need that many virtual segments.

By default, both limits are set to -1, which means the limits are disabled. NVSEG_LOWER_LIMIT has higher priority than NVSEG_LOWER_LIMIT_PERSEG. If both limits are set, then NVSEG_LOWER_LIMIT_PERSEG is ignored. If you have enabled resource quotas for the query statement, then these limits are ignored.

Note: If the actual lower limit of the number of virtual segments becomes greater than the upper limit, then the lower limit is automatically reduced to be equal to the upper limit. This situation is possible when a user sets both NVSEG_UPPER_LIMIT and NVSEG_LOWER_LIMIT_PERSEG. After expanding the cluster, the dynamic lower limit may become greater than the value set for the fixed upper limit.

Examples

Change the memory and core limit of a resource queue:

ALTER RESOURCE QUEUE test_queue_1 WITH (MEMORY_LIMIT_CLUSTER=40%,
CORE_LIMIT_CLUSTER=40%);

Change the active statements maximum for the resource queue:

ALTER RESOURCE QUEUE test_queue_1 WITH (ACTIVE_STATEMENTS=50);
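
Set fixed upper and lower bounds on the number of virtual segments per statement (the values are illustrative):

ALTER RESOURCE QUEUE test_queue_1 WITH (NVSEG_UPPER_LIMIT=10, NVSEG_LOWER_LIMIT=2);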

Compatibility

ALTER RESOURCE QUEUE is a HAWQ extension. There is no provision for resource queues or workload management in the SQL standard.

ALTER ROLE

Changes a database role (user or group).

Synopsis

ALTER ROLE <name> RENAME TO <newname>

ALTER ROLE <name> RESOURCE QUEUE {<queue_name> | NONE}

ALTER ROLE <name> [ [WITH] <option> [ ... ] ]

where <option> can be:

   SUPERUSER | NOSUPERUSER
 | CREATEDB | NOCREATEDB
 | CREATEROLE | NOCREATEROLE
 | CREATEEXTTABLE | NOCREATEEXTTABLE
 [ ( <attribute>='<value>'[, ...] ) ]
        where attribute and value are:
        type='readable'|'writable'
        protocol='gpfdist'|'http'

 | INHERIT | NOINHERIT
 | LOGIN | NOLOGIN
 | CONNECTION LIMIT <connlimit>
 | [ENCRYPTED | UNENCRYPTED] PASSWORD '<password>'
 | VALID UNTIL '<timestamp>'
 | [ DENY <deny_point> ]
 | [ DENY BETWEEN <deny_point> AND <deny_point>]
 | [ DROP DENY FOR <deny_point> ]

Description

ALTER ROLE changes the attributes of a HAWQ role. There are several variants of this command:

  • RENAME — Changes the name of the role. Database superusers can rename any role. Roles having CREATEROLE privilege can rename non-superuser roles. The current session user cannot be renamed (connect as a different user to rename a role). Because MD5-encrypted passwords use the role name as cryptographic salt, renaming a role clears its password if the password is MD5-encrypted.
  • RESOURCE QUEUE — Assigns the role to a workload management resource queue. The role would then be subject to the limits assigned to the resource queue when issuing queries. Specify NONE to assign the role to the default resource queue. A role can only belong to one resource queue. For a role without LOGIN privilege, resource queues have no effect. See CREATE RESOURCE QUEUE for more information.
  • WITH <option> — Changes many of the role attributes that can be specified in CREATE ROLE. Attributes not mentioned in the command retain their previous settings. Database superusers can change any of these settings for any role. Roles having CREATEROLE privilege can change any of these settings, but only for non-superuser roles. Ordinary roles can only change their own password.

Note: SET and RESET commands are currently not supported in connection with ALTER ROLE and will result in an error. See SET and About Server Configuration Parameters for information about user-settable configuration parameters.

Parameters

<name>

The name of the role whose attributes are to be altered.

<newname>

The new name of the role.

<queue_name>

The name of the resource queue to which the user-level role is to be assigned. Only roles with LOGIN privilege can be assigned to a resource queue. To unassign a role from a resource queue and put it in the default resource queue, specify NONE. A role can only belong to one resource queue.

SUPERUSER | NOSUPERUSER
CREATEDB | NOCREATEDB
CREATEROLE | NOCREATEROLE
CREATEEXTTABLE | NOCREATEEXTTABLE [(<attribute>='<value>')]

If CREATEEXTTABLE is specified, the role is allowed to create external tables. The default type is readable and the default protocol is gpfdist if not specified. NOCREATEEXTTABLE (the default) denies the role the ability to create external tables. Using the file protocol when creating external tables is not supported, because HAWQ cannot guarantee scheduling executors on a specific host. Likewise, you cannot use the EXECUTE command with ON ALL and ON HOST for the same reason. Use the ON MASTER/<number>/SEGMENT <number> syntax to specify which segment instances are to execute the command.

INHERIT | NOINHERIT
LOGIN | NOLOGIN
CONNECTION LIMIT <connlimit>
PASSWORD '<password>'
ENCRYPTED | UNENCRYPTED
VALID UNTIL '<timestamp>'

These clauses alter role attributes originally set by CREATE ROLE.

DENY <deny_point>
DENY BETWEEN <deny_point> AND <deny_point>

The DENY and DENY BETWEEN keywords set time-based constraints that are enforced at login. DENY sets a day or a day and time to deny access. DENY BETWEEN sets an interval during which access is denied. Both use the <deny_point> parameter, which has the following format:

DAY <day> [ TIME '<time>' ]

The two parts of the parameter use the following formats:

For <day>:

{'Sunday' | 'Monday' | 'Tuesday' |'Wednesday' | 'Thursday' | 'Friday' |
'Saturday' | 0-6 }

For <time>:

{ 00-23 : 00-59 | 01-12 : 00-59 { AM | PM }}

The DENY BETWEEN clause uses two <deny_point> parameters:

DENY BETWEEN <deny_point> AND <deny_point>

DROP DENY FOR <deny_point>

The DROP DENY FOR clause removes a time-based constraint from the role. It uses the <deny_point> parameter described above.

Notes

Use GRANT and REVOKE for adding and removing role memberships.

Caution must be exercised when specifying an unencrypted password with this command. The password will be transmitted to the server in clear text, and it might also be logged in the client's command history or the server log. The psql command-line client contains a meta-command, \password, that can be used to safely change a role's password.

It is also possible to tie a session default to a specific database rather than to a role. Role-specific settings override database-specific ones if there is a conflict.

Examples

Change the password for a role:

ALTER ROLE daria WITH PASSWORD 'passwd123';

Change a password expiration date:

ALTER ROLE scott VALID UNTIL 'May 4 12:00:00 2015 +1';


Make a password valid forever:

ALTER ROLE luke VALID UNTIL 'infinity';

Give a role the ability to create other roles and new databases:

ALTER ROLE joelle CREATEROLE CREATEDB;

Give a role a non-default setting of the maintenance_work_mem parameter:

ALTER ROLE admin SET maintenance_work_mem = 100000;

Assign a role to a resource queue:

ALTER ROLE sammy RESOURCE QUEUE poweruser;
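
Remove a role from its resource queue and return it to the default resource queue:

ALTER ROLE sammy RESOURCE QUEUE NONE;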

Give a role permission to create writable external tables:

ALTER ROLE load CREATEEXTTABLE (type='writable');

Alter a role so it does not allow login access on Sundays:

ALTER ROLE user3 DENY DAY 'Sunday';

Alter a role to remove the constraint that does not allow login access on Sundays:

ALTER ROLE user3 DROP DENY FOR DAY 'Sunday';
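
Alter a role so it does not allow login access between Saturday midnight and Sunday 23:59 (an illustrative use of the DENY BETWEEN form):

ALTER ROLE user3 DENY BETWEEN DAY 'Saturday' TIME '00:00' AND DAY 'Sunday' TIME '23:59';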

Compatibility

The ALTER ROLE statement is a HAWQ extension.

ALTER TABLE

Changes the definition of a table.

Synopsis

ALTER TABLE [ONLY] <name> RENAME [COLUMN] <column> TO <new_column>

ALTER TABLE <name> RENAME TO <new_name>

ALTER TABLE <name> SET SCHEMA <new_schema>

ALTER TABLE [ONLY] <name> SET
    DISTRIBUTED BY (<column>, [ ... ] )
    | DISTRIBUTED RANDOMLY
    | WITH (REORGANIZE=true|false)

ALTER TABLE [ONLY] <name>
            <action> [, ... ]

ALTER TABLE <name>
    [ ALTER PARTITION { <partition_name> | FOR (RANK(<number>))
    | FOR (<value>) } <partition_action> [...] ]
    <partition_action>

where <action> is one of:

    ADD [COLUMN] <column_name> <type>
        [ ENCODING ( <storage_directive> [,...] ) ]
        [<column_constraint> [ ... ]]
DROP [COLUMN] <column> [RESTRICT | CASCADE]
ALTER [COLUMN] <column> TYPE <type> [USING <expression>]
ALTER [COLUMN] <column> SET DEFAULT <expression>
ALTER [COLUMN] <column> DROP DEFAULT
ALTER [COLUMN] <column> { SET | DROP } NOT NULL
ALTER [COLUMN] <column> SET STATISTICS <integer>
ADD <table_constraint>
DROP CONSTRAINT <constraint_name> [RESTRICT | CASCADE]
SET WITHOUT OIDS
INHERIT <parent_table>
NO INHERIT <parent_table>
OWNER TO <new_owner>

where <partition_action> is one of:

    ALTER DEFAULT PARTITION
    DROP DEFAULT PARTITION [IF EXISTS]
    DROP PARTITION [IF EXISTS] { <partition_name> |
        FOR (RANK(<number>)) | FOR (<value>) } [CASCADE]
    TRUNCATE DEFAULT PARTITION
    TRUNCATE PARTITION { <partition_name> | FOR (RANK(<number>)) |
        FOR (<value>) }
    RENAME DEFAULT PARTITION TO <new_partition_name>
    RENAME PARTITION { <partition_name> | FOR (RANK(<number>)) |
        FOR (<value>) } TO <new_partition_name>
    ADD DEFAULT PARTITION <name> [ ( <subpartition_spec> ) ]
    ADD PARTITION <name>
            <partition_element>
        [ ( <subpartition_spec> ) ]
    EXCHANGE DEFAULT PARTITION WITH TABLE <table_name>
            [ WITH | WITHOUT VALIDATION ]
    EXCHANGE PARTITION { <partition_name> | FOR (RANK(<number>)) |
            FOR (<value>) } WITH TABLE <table_name>
            [ WITH | WITHOUT VALIDATION ]
    SET SUBPARTITION TEMPLATE (<subpartition_spec>)
    SPLIT DEFAULT PARTITION
        { AT (<list_value>)
        | START([<datatype>] <range_value>) [INCLUSIVE | EXCLUSIVE]
            END([<datatype>] <range_value>) [INCLUSIVE | EXCLUSIVE] }
        [ INTO ( PARTITION <new_partition_name>,
            PARTITION <default_partition_name> ) ]
        SPLIT PARTITION { <partition_name> | FOR (RANK(<number>)) |
            FOR (<value>) } AT (<value>)
            [ INTO (PARTITION <partition_name>, PARTITION <partition_name>)]
where <partition_element> is:

    VALUES (<list_value> [,...] )
| START ([<datatype>] '<start_value>') [INCLUSIVE | EXCLUSIVE]
    [ END ([<datatype>] '<end_value>') [INCLUSIVE | EXCLUSIVE] ]
| END ([<datatype>] '<end_value>') [INCLUSIVE | EXCLUSIVE]
[ WITH ( <partition_storage_parameter>=<value> [, ... ] ) ]
[ TABLESPACE <tablespace> ]
where <subpartition_spec> is:

        <subpartition_element> [, ...]
and <subpartition_element> is:

    DEFAULT SUBPARTITION <subpartition_name>
    | [SUBPARTITION <subpartition_name>] VALUES (<list_value> [,...] )
    | [SUBPARTITION <subpartition_name>]
        START ([<datatype>] '<start_value>') [INCLUSIVE | EXCLUSIVE]
        [ END ([<datatype>] '<end_value>') [INCLUSIVE | EXCLUSIVE] ]
        [ EVERY ( [<number> | <datatype>] '<interval_value>') ]
    | [SUBPARTITION <subpartition_name>]
        END ([<datatype>] '<end_value>') [INCLUSIVE | EXCLUSIVE]
        [ EVERY ( [<number> | <datatype>] '<interval_value>') ]
    [ WITH ( <partition_storage_parameter>=<value> [, ... ] ) ]
    [ TABLESPACE <tablespace> ]
where <storage_parameter> is:

    APPENDONLY={TRUE}
    BLOCKSIZE={8192-2097152}
    ORIENTATION={ROW | PARQUET}
    COMPRESSTYPE={ZLIB|SNAPPY|GZIP|NONE}
    COMPRESSLEVEL={0-9}
    FILLFACTOR={10-100}
    OIDS[=TRUE|FALSE]
where <storage_directive> is:

    COMPRESSTYPE={ZLIB|SNAPPY|GZIP|NONE}
    | COMPRESSLEVEL={0-9}
    | BLOCKSIZE={8192-2097152}
where <column_reference_storage_directive> is:

    COLUMN <column_name> ENCODING ( <storage_directive> [, ... ] ), ...
    | DEFAULT COLUMN ENCODING ( <storage_directive> [, ... ] )

Note: When using multi-level partition designs, the following operations are not supported with ALTER TABLE:

  • ADD DEFAULT PARTITION
  • ADD PARTITION
  • DROP DEFAULT PARTITION
  • DROP PARTITION
  • SPLIT PARTITION
  • All operations that involve modifying subpartitions

Limitations

HAWQ does not support using ALTER TABLE to ADD or DROP a column in an existing Parquet table.

Parameters

ONLY

Only perform the operation on the table name specified. If the ONLY keyword is not used, the operation will be performed on the named table and any child table partitions associated with that table.

<name>

The name (possibly schema-qualified) of an existing table to alter. If ONLY is specified, only that table is altered. If ONLY is not specified, the table and all its descendant tables (if any) are updated.

Note: Constraints can only be added to an entire table, not to a partition. Because of that restriction, the <name> parameter can only contain a table name, not a partition name.

<column>

Name of a new or existing column. Note that HAWQ distribution key columns must be treated with special care. Altering or dropping these columns can change the distribution policy for the table.

<new_column>

New name for an existing column.

<new_name>

New name for the table.

<type>

Data type of the new column, or new data type for an existing column. If changing the data type of a HAWQ distribution key column, you are only allowed to change it to a compatible type (for example, text to varchar is OK, but text to int is not).

<table_constraint>

New table constraint for the table. Note that foreign key constraints are currently not supported in HAWQ. Also a table is only allowed one unique constraint and the uniqueness must be within the HAWQ distribution key.

<constraint_name>

Name of an existing constraint to drop.

CASCADE

Automatically drop objects that depend on the dropped column or constraint (for example, views referencing the column).

RESTRICT

Refuse to drop the column or constraint if there are any dependent objects. This is the default behavior.

ALL

Disable or enable all triggers belonging to the table including constraint related triggers. This requires superuser privilege.

USER

Disable or enable all user-created triggers belonging to the table.

DISTRIBUTED RANDOMLY | DISTRIBUTED BY (<column>)

Specifies the distribution policy for a table. The default is RANDOM distribution. Changing a distribution policy will cause the table data to be physically redistributed on disk, which can be resource intensive. If you declare the same distribution policy or change from random to hash distribution, data will not be redistributed unless you declare SET WITH (REORGANIZE=true).

REORGANIZE=true|false

Use REORGANIZE=true when the distribution policy has not changed or when you have changed from a random to a hash distribution, and you want to redistribute the data anyway.

<parent_table>

A parent table to associate or de-associate with this table.

<new_owner>

The role name of the new owner of the table.

<new_tablespace>

The name of the tablespace to which the table will be moved.

<new_schema>

The name of the schema to which the table will be moved.

<parent_table_name>

When altering a partitioned table, the name of the top-level parent table.

ALTER [DEFAULT] PARTITION

If altering a partition deeper than the first level of partitions, the ALTER PARTITION clause is used to specify which subpartition in the hierarchy you want to alter.

DROP [DEFAULT] PARTITION

Note: Cannot be used with multi-level partitions.

Drops the specified partition. If the partition has subpartitions, the subpartitions are automatically dropped as well.

TRUNCATE [DEFAULT] PARTITION

Truncates the specified partition. If the partition has subpartitions, the subpartitions are automatically truncated as well.

RENAME [DEFAULT] PARTITION

Changes the partition name of a partition (not the relation name). Partitioned tables are created using the naming convention: <parentname>_<level>_prt_<partition_name>.

ADD DEFAULT PARTITION

Note: Cannot be used with multi-level partitions.

Adds a default partition to an existing partition design. When data does not match to an existing partition, it is inserted into the default partition. Partition designs that do not have a default partition will reject incoming rows that do not match to an existing partition. Default partitions must be given a name.

ADD PARTITION

Note: Cannot be used with multi-level partitions.
<partition_element> - Using the existing partition type of the table (range or list), defines the boundaries of the new partition you are adding.

<name> - A name for this new partition.

VALUES - For list partitions, defines the value(s) that the partition will contain.

START - For range partitions, defines the starting range value for the partition. By default, start values are INCLUSIVE. For example, if you declared a start date of '2008-01-01', then the partition would contain all dates greater than or equal to '2008-01-01'. Typically the data type of the START expression is the same type as the partition key column. If that is not the case, then you must explicitly cast to the intended data type.

END - For range partitions, defines the ending range value for the partition. By default, end values are EXCLUSIVE. For example, if you declared an end date of '2008-02-01', then the partition would contain all dates less than but not equal to '2008-02-01'. Typically the data type of the END expression is the same type as the partition key column. If that is not the case, then you must explicitly cast to the intended data type.

WITH - Sets the table storage options for a partition. For example, you may want older partitions to be append-only tables and newer partitions to be regular heap tables. See CREATE TABLE for a description of the storage options.

TABLESPACE - The name of the tablespace in which the partition is to be created.

<subpartition_spec> - Only allowed on partition designs that were created without a subpartition template. Declares a subpartition specification for the new partition you are adding. If the partitioned table was originally defined using a subpartition template, then the template will be used to generate the subpartitions automatically.

EXCHANGE [DEFAULT] PARTITION

Exchanges another table into the partition hierarchy in place of an existing partition. In a multi-level partition design, you can only exchange the lowest level partitions (those that contain data).
WITH TABLE <table_name> - The name of the table you are swapping in to the partition design.

WITH | WITHOUT VALIDATION - Validates that the data in the table matches the CHECK constraint of the partition you are exchanging. The default is to validate the data against the CHECK constraint.
SET SUBPARTITION TEMPLATE

Modifies the subpartition template for an existing partition. After a new subpartition template is set, all new partitions added will have the new subpartition design (existing partitions are not modified).

SPLIT DEFAULT PARTITION

Note: Cannot be used with multi-level partitions. Splits a default partition. In a multi-level partition design, you can only split the lowest level default partitions (those that contain data). Splitting a default partition creates a new partition containing the values specified and leaves the default partition containing any values that do not match to an existing partition.

AT - For list partitioned tables, specifies a single list value that should be used as the criteria for the split.

START - For range partitioned tables, specifies a starting value for the new partition.

END - For range partitioned tables, specifies an ending value for the new partition.

INTO - Allows you to specify a name for the new partition. When using the INTO clause to split a default partition, the second partition name specified should always be that of the existing default partition. If you do not know the name of the default partition, you can look it up using the pg_partitions view.

SPLIT PARTITION

Note: Cannot be used with multi-level partitions. Splits an existing partition into two partitions. In a multi-level partition design, you can only split the lowest level partitions (those that contain data).

AT - Specifies a single value that should be used as the criteria for the split. The partition will be divided into two new partitions with the split value specified being the starting range for the latter partition.

INTO - Allows you to specify names for the two new partitions created by the split.

<partition_name>

The given name of a partition.

FOR (RANK(<number>))

For range partitions, the rank of the partition in the range.

FOR ('<value>')

Specifies a partition by declaring a value that falls within the partition boundary specification. If the value declared with FOR matches to both a partition and one of its subpartitions (for example, if the value is a date and the table is partitioned by month and then by day), then FOR will operate on the first level where a match is found (for example, the monthly partition). If your intent is to operate on a subpartition, you must declare so as follows:

ALTER TABLE name ALTER PARTITION FOR ('2008-10-01') DROP PARTITION FOR ('2008-10-01');

Notes

Take special care when altering or dropping columns that are part of the HAWQ distribution key as this can change the distribution policy for the table. HAWQ does not currently support foreign key constraints.

Note: The table name specified in the ALTER TABLE command cannot be the name of a partition within a table.

Adding a CHECK or NOT NULL constraint requires scanning the table to verify that existing rows meet the constraint.

When a column is added with ADD COLUMN, all existing rows in the table are initialized with the column’s default value (NULL if no DEFAULT clause is specified). Adding a column with a non-null default or changing the type of an existing column will require the entire table to be rewritten. This may take a significant amount of time for a large table; and it will temporarily require double the disk space.

You can specify multiple changes in a single ALTER TABLE command, which will be done in a single pass over the table.

The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed.

The fact that ALTER TYPE requires rewriting the whole table is sometimes an advantage, because the rewriting process eliminates any dead space in the table. For example, to reclaim the space occupied by a dropped column immediately, the fastest way is: ALTER TABLE <table> ALTER COLUMN <anycol> TYPE <anytype>; where <anycol> is any remaining table column and <anytype> is the same type that column already has. This results in no semantically-visible change in the table, but the command forces rewriting, which gets rid of no-longer-useful data.

If a table is partitioned or has any descendant tables, it is not permitted to add, rename, or change the type of a column in the parent table without doing the same to the descendants. This ensures that the descendants always have columns matching the parent.

A recursive DROP COLUMN operation will remove a descendant table's column only if the descendant does not inherit that column from any other parents and never had an independent definition of the column. A nonrecursive DROP COLUMN (ALTER TABLE ONLY … DROP COLUMN) never removes any descendant columns, but instead marks them as independently defined rather than inherited. The OWNER action never recurses to descendant tables; that is, it always acts as though ONLY were specified. Adding a constraint can recurse only for CHECK constraints.

Changing any part of a system catalog table is not permitted.

Examples

Add a column to a table:

ALTER TABLE distributors ADD COLUMN address varchar(30);

Rename an existing column:

ALTER TABLE distributors RENAME COLUMN address TO city;

Rename an existing table:

ALTER TABLE distributors RENAME TO suppliers;

Add a not-null constraint to a column:

ALTER TABLE distributors ALTER COLUMN street SET NOT NULL;

Add a check constraint to a table:

ALTER TABLE distributors ADD CONSTRAINT zipchk CHECK (char_length(zipcode) = 5);

Move a table to a different schema:

ALTER TABLE myschema.distributors SET SCHEMA yourschema;
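
Change the distribution policy of a table to hash distribution on one of its columns, and redistribute a table's data without changing its distribution policy (a sketch based on the SET DISTRIBUTED BY and REORGANIZE forms in the synopsis; the table and column come from the earlier examples):

ALTER TABLE distributors SET DISTRIBUTED BY (city);

ALTER TABLE distributors SET WITH (REORGANIZE=true);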

Add a new partition to a partitioned table:

ALTER TABLE sales ADD PARTITION
    START (date '2009-02-01') INCLUSIVE
    END (date '2009-03-01') EXCLUSIVE;

Add a default partition to an existing partition design:

ALTER TABLE sales ADD DEFAULT PARTITION other;

Rename a partition:

ALTER TABLE sales RENAME PARTITION FOR ('2008-01-01') TO jan08;

Drop the first (oldest) partition in a range sequence:

ALTER TABLE sales DROP PARTITION FOR (RANK(1));

Exchange a table into your partition design:

ALTER TABLE sales EXCHANGE PARTITION FOR ('2008-01-01') WITH TABLE jan08;

Split the default partition (where the existing default partition’s name is other) to add a new monthly partition for January 2009:

ALTER TABLE sales SPLIT DEFAULT PARTITION
    START ('2009-01-01') INCLUSIVE
    END ('2009-02-01') EXCLUSIVE
    INTO (PARTITION jan09, PARTITION other);

Split a monthly partition into two with the first partition containing dates January 1-15 and the second partition containing dates January 16-31:

ALTER TABLE sales SPLIT PARTITION FOR ('2008-01-01')
    AT ('2008-01-16')
    INTO (PARTITION jan081to15, PARTITION jan0816to31);

Compatibility

The ADD, DROP, and SET DEFAULT forms conform with the SQL standard. The other forms are HAWQ extensions of the SQL standard. Also, the ability to specify more than one manipulation in a single ALTER TABLE command is an extension. ALTER TABLE DROP COLUMN can be used to drop the only column of a table, leaving a zero-column table. This is an extension of SQL, which disallows zero-column tables.

ALTER TABLESPACE

Changes the definition of a tablespace.

Synopsis

ALTER TABLESPACE <name> RENAME TO <newname>

ALTER TABLESPACE <name> OWNER TO <newowner>

Description

ALTER TABLESPACE changes the definition of a tablespace.

You must own the tablespace to use ALTER TABLESPACE. To alter the owner, you must also be a direct or indirect member of the new owning role. (Note that superusers have these privileges automatically.)

Parameters

<name>

The name of an existing tablespace.

<newname>

The new name of the tablespace. The new name cannot begin with pg_ (reserved for system tablespaces).

<newowner>

The new owner of the tablespace.

Examples

Rename tablespace index_space to fast_raid:

ALTER TABLESPACE index_space RENAME TO fast_raid;

Change the owner of tablespace index_space:

ALTER TABLESPACE index_space OWNER TO mary;

Compatibility

There is no ALTER TABLESPACE statement in the SQL standard.

ALTER TYPE

Changes the definition of a data type.

Synopsis

ALTER TYPE <name>
    OWNER TO <new_owner> | SET SCHEMA <new_schema>

Description

ALTER TYPE changes the definition of an existing type. You can change the owner and the schema of a type.

You must own the type to use ALTER TYPE. To change the schema of a type, you must also have CREATE privilege on the new schema. To alter the owner, you must also be a direct or indirect member of the new owning role, and that role must have CREATE privilege on the type's schema. (These restrictions enforce that altering the owner does not do anything you could not do by dropping and recreating the type. However, a superuser can alter ownership of any type anyway.)

Parameters

<name>

The name (optionally schema-qualified) of an existing type to alter.

<new_owner>

The user name of the new owner of the type.

<new_schema>

The new schema for the type.

Examples

To change the owner of the user-defined type email to joe:

ALTER TYPE email OWNER TO joe;

To change the schema of the user-defined type email to customers:

ALTER TYPE email SET SCHEMA customers;

Compatibility

There is no ALTER TYPE statement in the SQL standard.

ALTER USER

Changes the definition of a database role (user).

Synopsis

ALTER USER <name> RENAME TO <newname>

ALTER USER <name> SET <config_parameter> {TO | =} {<value> | DEFAULT}

ALTER USER <name> RESET <config_parameter>

ALTER USER <name> [ [WITH] <option> [ ... ] ]

where <option> can be:

     SUPERUSER | NOSUPERUSER
    | CREATEDB | NOCREATEDB
    | CREATEROLE | NOCREATEROLE
    | CREATEUSER | NOCREATEUSER
    | INHERIT | NOINHERIT
    | LOGIN | NOLOGIN
    | [ ENCRYPTED | UNENCRYPTED ] PASSWORD '<password>'
    | VALID UNTIL '<timestamp>'

Description

ALTER USER is a deprecated command but is still accepted for historical reasons. It is an alias for ALTER ROLE. See ALTER ROLE for more information.

Compatibility

The ALTER USER statement is a HAWQ extension. The SQL standard leaves the definition of users to the implementation.

ANALYZE

Collects statistics about a database.

Synopsis

ANALYZE [VERBOSE] [ROOTPARTITION] [<table> [ (<column> [, ...] ) ]]

Description

ANALYZE collects statistics about the contents of tables in the database, and stores the results in the system table pg_statistic. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.

With no parameter, ANALYZE examines every table in the current database. With a parameter, ANALYZE examines only that table. It is further possible to give a list of column names, in which case only the statistics for those columns are collected.

Parameters

VERBOSE

Enables display of progress messages. When specified, ANALYZE emits progress messages to indicate which table is currently being processed. Various statistics about the tables are printed as well.

ROOTPARTITION

For partitioned tables, ANALYZE on the parent (the root in multi-level partitioning) table without this option will collect statistics on each individual leaf partition as well as the global partition table, both of which are needed for query planning. In scenarios when all the individual child partitions have up-to-date statistics (for example, after loading and analyzing a daily partition), the ROOTPARTITION option can be used to collect only the global stats on the partition table. This could save the time of re-analyzing each individual leaf partition.

If you use ROOTPARTITION on a non-root or non-partitioned table, ANALYZE will skip the option and issue a warning. You can also analyze all root partition tables in the database by using ROOTPARTITION ALL.

Note: Use ROOTPARTITION ALL to analyze all root partition tables in the database.

<table>

The name (possibly schema-qualified) of a specific table to analyze. Defaults to all tables in the current database.

<column>

The name of a specific column to analyze. Defaults to all columns.

Notes

It is a good idea to run ANALYZE periodically, or just after making major changes in the contents of a table. Accurate statistics will help the query planner to choose the most appropriate query plan, and thereby improve the speed of query processing. A common strategy is to run VACUUM and ANALYZE once a day during a low-usage time of day.

ANALYZE requires only a read lock on the target table, so it can run in parallel with other activity on the table.

ANALYZE skips tables if the user is not the table owner or database owner.

The statistics collected by ANALYZE usually include a list of some of the most common values in each column and a histogram showing the approximate data distribution in each column. One or both of these may be omitted if ANALYZE deems them uninteresting (for example, in a unique-key column, there are no common values) or if the column data type does not support the appropriate operators.

For large tables, ANALYZE takes a random sample of the table contents, rather than examining every row. This allows even very large tables to be analyzed in a small amount of time. Note, however, that the statistics are only approximate, and will change slightly each time ANALYZE is run, even if the actual table contents did not change. This may result in small changes in the planner’s estimated costs shown by EXPLAIN. In rare situations, this non-determinism will cause the query optimizer to choose a different query plan between runs of ANALYZE. To avoid this, raise the amount of statistics collected by ANALYZE by adjusting the default_statistics_target configuration parameter, or on a column-by-column basis by setting the per-column statistics target with ALTER TABLE … ALTER COLUMN … SET STATISTICS (see ALTER TABLE). The target value sets the maximum number of entries in the most-common-value list and the maximum number of bins in the histogram. The default target value is 25, but this can be adjusted up or down to trade off accuracy of planner estimates against the time taken for ANALYZE and the amount of space occupied in pg_statistic. In particular, setting the statistics target to zero disables collection of statistics for that column. It may be useful to do that for columns that are never used as part of the WHERE, GROUP BY, or ORDER BY clauses of queries, since the planner will have no use for statistics on such columns.

The largest statistics target among the columns being analyzed determines the number of table rows sampled to prepare the statistics. Increasing the target causes a proportional increase in the time and space needed to do ANALYZE.
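
For example, to raise the per-column statistics target mentioned above (mytable is reused from the example below; the column name is illustrative):

ALTER TABLE mytable ALTER COLUMN revenue SET STATISTICS 100;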

Examples

Collect statistics for the table mytable:

ANALYZE mytable;
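
Collect only the root-level statistics for a partitioned table (the table name sales is illustrative):

ANALYZE ROOTPARTITION sales;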

Compatibility

There is no ANALYZE statement in the SQL standard.

BEGIN

Starts a transaction block.

Synopsis

BEGIN [WORK | TRANSACTION] [SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED]
      [READ WRITE | READ ONLY]

Description

BEGIN initiates a transaction block, that is, all statements after a BEGIN command will be executed in a single transaction until an explicit COMMIT or ROLLBACK is given. By default (without BEGIN), HAWQ executes transactions in autocommit mode, that is, each statement is executed in its own transaction and a commit is implicitly performed at the end of the statement (if execution was successful, otherwise a rollback is done).

Statements are executed more quickly in a transaction block, because transaction start/commit requires significant CPU and disk activity. Execution of multiple statements inside a transaction is also useful to ensure consistency when making several related changes: other sessions will be unable to see the intermediate states wherein not all the related updates have been done.

Parameters

WORK
TRANSACTION

Optional key words. They have no effect.

SERIALIZABLE
REPEATABLE READ
READ COMMITTED
READ UNCOMMITTED

The SQL standard defines four transaction isolation levels: READ COMMITTED, READ UNCOMMITTED, SERIALIZABLE, and REPEATABLE READ. The default behavior is that a statement can only see rows committed before it began (READ COMMITTED). In HAWQ, READ UNCOMMITTED is treated the same as READ COMMITTED. SERIALIZABLE is supported the same as REPEATABLE READ wherein all statements of the current transaction can only see rows committed before the first statement was executed in the transaction. SERIALIZABLE is the strictest transaction isolation. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. Applications using this level must be prepared to retry transactions due to serialization failures.

READ WRITE
READ ONLY

Determines whether the transaction is read/write or read-only. Read/write is the default. When a transaction is read-only, the following SQL commands are disallowed: INSERT and COPY FROM if the table they would write to is not a temporary table; all CREATE, ALTER, and DROP commands; GRANT, REVOKE, TRUNCATE; and EXPLAIN ANALYZE and EXECUTE if the command they would execute is among those listed.

Notes

Use COMMIT or ROLLBACK to terminate a transaction block.

Issuing BEGIN when already inside a transaction block will provoke a warning message. The state of the transaction is not affected. To nest transactions within a transaction block, use savepoints (see SAVEPOINT).
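
As a minimal sketch of nesting with a savepoint (table name hypothetical):

BEGIN;
INSERT INTO audit_log VALUES ('step 1');
SAVEPOINT before_step_2;
INSERT INTO audit_log VALUES ('step 2');
ROLLBACK TO SAVEPOINT before_step_2;
COMMIT;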

Examples

To begin a transaction block:

1
BEGIN;

Compatibility

BEGIN is a HAWQ language extension. It is equivalent to the SQL-standard command START TRANSACTION.

Incidentally, the BEGIN key word is used for a different purpose in embedded SQL. You are advised to be careful about the transaction semantics when porting database applications.

CHECKPOINT

Forces a transaction log checkpoint.

Synopsis

1
CHECKPOINT

Description

Write-Ahead Logging (WAL) puts a checkpoint in the transaction log every so often. The automatic checkpoint interval is set per HAWQ segment instance by the server configuration parameters checkpoint_segments and checkpoint_timeout. The CHECKPOINT command forces an immediate checkpoint when the command is issued, without waiting for a scheduled checkpoint.

A checkpoint is a point in the transaction log sequence at which all data files have been updated to reflect the information in the log. All data files will be flushed to disk.

Only superusers may call CHECKPOINT. The command is not intended for use during normal operation.

Compatibility

The CHECKPOINT command is a HAWQ language extension.

CLOSE

Closes a cursor.

Synopsis

1
CLOSE <cursor_name>

Description

CLOSE frees the resources associated with an open cursor. After the cursor is closed, no subsequent operations are allowed on it. A cursor should be closed when it is no longer needed.

Every non-holdable open cursor is implicitly closed when a transaction is terminated by COMMIT or ROLLBACK. A holdable cursor is implicitly closed if the transaction that created it aborts via ROLLBACK. If the creating transaction successfully commits, the holdable cursor remains open until an explicit CLOSE is executed, or the client disconnects.

Parameters

1
<cursor_name>

The name of an open cursor to close.

Notes

HAWQ does not have an explicit OPEN cursor statement. A cursor is considered open when it is declared. Use the DECLARE statement to declare (and open) a cursor.

You can see all available cursors by querying the pg_cursors system view.
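
As an illustrative sketch (cursor and table names are hypothetical), a cursor is opened by DECLARE, can be listed through pg_cursors, and is then closed:

BEGIN;
DECLARE sales_cur CURSOR FOR SELECT * FROM sales;
SELECT name, statement FROM pg_cursors;
CLOSE sales_cur;
COMMIT;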

Examples

Close the cursor portala:

1
CLOSE portala;

Compatibility

CLOSE is fully conforming with the SQL standard.

COMMIT

Commits the current transaction.

Synopsis

1
COMMIT [WORK | TRANSACTION]

Description

COMMIT commits the current transaction. All changes made by the transaction become visible to others and are guaranteed to be durable if a crash occurs.

Parameters

1
2
WORK
TRANSACTION

Optional key words. They have no effect.

Notes

Use ROLLBACK to abort a transaction.

Issuing COMMIT when not inside a transaction does no harm, but it will provoke a warning message.

Examples

To commit the current transaction and make all changes permanent:

1
COMMIT;

Compatibility

The SQL standard only specifies the two forms COMMIT and COMMIT WORK. Otherwise, this command is fully conforming.

COPY

Copies data between a file and a table.

Synopsis

COPY <table> [(<column> [, ...])] FROM {'<file>' | STDIN}
     [ [WITH]
       [OIDS]
       [HEADER]
       [DELIMITER [ AS ] '<delimiter>']
       [NULL [ AS ] '<null string>']
       [ESCAPE [ AS ] '<escape>' | 'OFF']
       [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
       [CSV [QUOTE [ AS ] '<quote>']
            [FORCE NOT NULL <column> [, ...]]
       [FILL MISSING FIELDS]
       [[LOG ERRORS INTO <error_table> [KEEP]
       SEGMENT REJECT LIMIT <count> [ROWS | PERCENT] ]

COPY {<table> [(<column> [, ...])] | (<query>)} TO {'<file>' | STDOUT}
     [ [WITH]
       [OIDS]
       [HEADER]
       [DELIMITER [ AS ] '<delimiter>']
       [NULL [ AS ] '<null string>']
       [ESCAPE [ AS ] '<escape>' | 'OFF']
       [CSV [QUOTE [ AS ] '<quote>']
            [FORCE QUOTE <column> [, ...]] ]

Description

COPY moves data between HAWQ tables and standard file-system files. COPY TO copies the contents of a table to a file, while COPY FROM copies data from a file to a table (appending the data to whatever is in the table already). COPY TO can also copy the results of a SELECT query.

If a list of columns is specified, COPY will only copy the data in the specified columns to or from the file. If there are any columns in the table that are not in the column list, COPY FROM will insert the default values for those columns.

COPY with a file name instructs the HAWQ master host to directly read from or write to a file. The file must be accessible to the master host and the name must be specified from the viewpoint of the master host. When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the master.

If SEGMENT REJECT LIMIT is used, a COPY FROM operation will operate in single row error isolation mode. In this release, single row error isolation mode only applies to rows in the input file with format errors — for example, extra or missing attributes, attributes of a wrong data type, or invalid client encoding sequences. Constraint errors such as violation of a NOT NULL, CHECK, or UNIQUE constraint will still be handled in 'all-or-nothing' input mode. The user can specify the number of error rows acceptable (on a per-segment basis), after which the entire COPY FROM operation will be aborted and no rows will be loaded. Note that the count of error rows is per-segment, not per entire load operation. If the per-segment reject limit is not reached, all rows not containing an error will be loaded and any error rows discarded. If you would like to keep error rows for further examination, you can optionally declare an error table using the LOG ERRORS INTO clause; any rows containing a format error will then be logged to the specified error table.

Outputs

On successful completion, a COPY command returns a command tag of the following form, where <count> is the number of rows copied:

1
COPY <count>

If running a COPY FROM command in single row error isolation mode, the following notice message will be returned if any rows were not loaded due to format errors, where <count> is the number of rows rejected:

1
NOTICE: Rejected <count> badly formatted rows.

Parameters

1
<table>

The name (optionally schema-qualified) of an existing table.

1
<column>

An optional list of columns to be copied. If no column list is specified, all columns of the table will be copied.

1
<query>

A SELECT or VALUES command whose results are to be copied. Note that parentheses are required around the query.

1
<file>

The absolute path name of the input or output file.

1
STDIN

Specifies that input comes from the client application.

1
STDOUT

Specifies that output goes to the client application.

1
OIDS

Specifies copying the OID for each row. (An error is raised if OIDS is specified for a table that does not have OIDs, or in the case of copying a query.)

1
<delimiter>

The delimiter that separates columns within each row (line) of the file. The default is a tab character in text mode, a comma in CSV mode.

Before version 2.2.0, only a single ASCII character is supported as the delimiter. Version 2.2.0 adds support for non-ASCII characters (such as Chinese characters) and multi-character delimiters.

1
<null string>

The string that represents a null value. The default is \N (backslash-N) in text mode, and an empty value with no quotes in CSV mode. You might prefer an empty string even in text mode for cases where you don't want to distinguish nulls from empty strings. When using COPY FROM, any data item that matches this string will be stored as a null value, so you should make sure that you use the same string as you used with COPY TO.

1
<escape>

Specifies the single character that is used for C escape sequences (such as \n, \t, \100, and so on) and for quoting data characters that might otherwise be taken as row or column delimiters. Make sure to choose an escape character that is not used anywhere in your actual column data. The default escape character is \ (backslash) for text files or " (double quote) for CSV files; however, it is possible to specify any other character to represent an escape. It is also possible to disable escaping on text-formatted files by specifying the value 'OFF' as the escape value. This is very useful for data such as web log data that has many embedded backslashes that are not intended to be escapes.

1
NEWLINE

Specifies the newline used in your data files — LF (Line feed, 0x0A), CR (Carriage return, 0x0D), or CRLF (Carriage return plus line feed, 0x0D 0x0A). If not specified, a HAWQ segment will detect the newline type by looking at the first row of data it receives and using the first newline type encountered.

1
CSV

Selects Comma Separated Value (CSV) mode.

1
HEADER

Specifies that a file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table, and on input, the first line is ignored.

1
<quote>

Specifies the quotation character in CSV mode. The default is double-quote.

1
FORCE QUOTE

In CSV COPY TO mode, forces quoting to be used for all non-NULL values in each specified column. NULL output is never quoted.

1
FORCE NOT NULL

In CSV COPY FROM mode, process each specified column as though it were quoted and hence not a NULL value. For the default null string in CSV mode (nothing between two delimiters), this causes missing values to be evaluated as zero-length strings.

1
FILL MISSING FIELDS

In COPY FROM mode for both TEXT and CSV, specifying FILL MISSING FIELDS will set missing trailing field values to NULL (instead of reporting an error) when a row of data has missing data fields at the end of a line or row. Blank rows, fields with a NOT NULL constraint, and trailing delimiters on a line will still report an error.

1
LOG ERRORS INTO <error_table> [KEEP]

This is an optional clause that can precede a SEGMENT REJECT LIMIT clause to log information about rows with formatting errors. The INTO clause specifies an error table where rows with formatting errors will be logged when running in single row error isolation mode. You can then examine this error table to see error rows that were not loaded (if any). If the specified error table already exists, it will be used. If it does not exist, it will be automatically generated. If the command auto-generates the error table and no errors are produced, the default is to drop the error table after the operation completes unless KEEP is specified. If the table is auto-generated and the error limit is exceeded, the entire transaction is rolled back and no error data is saved. If you want the error table to persist in this case, create the error table prior to running the COPY. An error table is defined as follows:

1
CREATE TABLE <error_table> ( cmdtime timestamptz, relname text, filename text, linenum int, bytenum int, errmsg text, rawdata text, rawbytes bytea ) DISTRIBUTED RANDOMLY;
1
SEGMENT REJECT LIMIT <count> [ROWS | PERCENT]

Runs a COPY FROM operation in single row error isolation mode. If the input rows have format errors they will be discarded provided that the reject limit count is not reached on any HAWQ segment instance during the load operation. The reject limit count can be specified as number of rows (the default) or percentage of total rows (1-100). If PERCENT is used, each segment starts calculating the bad row percentage only after the number of rows specified by the parameter gp_reject_percent_threshold has been processed. The default for gp_reject_percent_threshold is 300 rows. Constraint errors such as violation of a NOT NULL or CHECK constraint will still be handled in ‘all-or-nothing’ input mode. If the limit is not reached, all good rows will be loaded and any error rows discarded.

Notes

COPY can only be used with tables, not with views. However, you can write COPY (SELECT * FROM viewname) TO …

The BINARY key word causes all data to be stored/read as binary format rather than as text. It is somewhat faster than the normal text mode, but a binary-format file is less portable across machine architectures and HAWQ versions. Also, you cannot run COPY FROM in single row error isolation mode if the data is in binary format.

You must have SELECT privilege on the table whose values are read by COPY TO, and insert privilege on the table into which values are inserted by COPY FROM.

Files named in a COPY command are read or written directly by the database server, not by the client application. Therefore, they must reside on or be accessible to the HAWQ master host machine, not the client. They must be accessible to and readable or writable by the HAWQ system user (the user ID the server runs as), not the client. COPY naming a file is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access.

COPY FROM will invoke any check constraints on the destination table. However, it will not invoke rewrite rules. Note that in this release, violations of constraints are not evaluated for single row error isolation mode.

COPY input and output is affected by DateStyle. To ensure portability to other HAWQ installations that might use non-default DateStyle settings, DateStyle should be set to ISO before using COPY TO.
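
For example, a portable unload might be performed as follows (the table and file names are hypothetical):

SET datestyle TO ISO;
COPY sales TO '/home/gpadmin/sales_dump.txt';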

By default, COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This may amount to a considerable amount of wasted disk space if the failure happened well into a large COPY FROM operation. You may wish to use single row error isolation mode to filter out error rows while still loading good rows.

COPY supports creating readable foreign tables with error tables. The default for concurrently inserting into the error table is 127. You can use error tables with foreign tables under the following circumstances:

  • Multiple foreign tables can use different error tables
  • Multiple foreign tables cannot use the same error table

File Format

File formats supported by COPY.

Text Format

When COPY is used without the BINARY or CSV options, the data read or written is a text file with one line per table row. Columns in a row are separated by the delimiter character (tab by default). The column values themselves are strings generated by the output function, or acceptable to the input function, of each attribute's data type. The specified null string is used in place of columns that are null. COPY FROM will raise an error if any line of the input file contains more or fewer columns than are expected. If OIDS is specified, the OID is read or written as the first column, preceding the user data columns.

The data file has two reserved characters that have special meaning to COPY:

  • The designated delimiter character (tab by default), which is used to separate fields in the data file.
  • A UNIX-style line feed (\n or 0x0A), which is used to designate a new row in the data file. It is strongly recommended that applications generating COPY data convert data line feeds to UNIX-style line feeds rather than Microsoft Windows style carriage return line feeds (\r\n or 0x0D 0x0A).

If your data contains either of these characters, you must escape the character so COPY treats it as data and not as a field separator or new row.

By default, the escape character is \ (backslash) for text-formatted files and " (double quote) for csv-formatted files. If you want to use a different escape character, you can do so using the ESCAPE AS clause. Make sure to choose an escape character that is not used anywhere in your data file as an actual data value. You can also disable escaping in text-formatted files by using ESCAPE 'OFF'.

For example, suppose you have a table with three columns and you want to load the following three fields using COPY.

percentage sign = %
vertical bar = |
backslash = \

Your designated <delimiter> character is | (pipe character), and your designated <escape> character is * (asterisk). The formatted row in your data file would look like this:

percentage sign = %|vertical bar = *||backslash = \

Notice how the pipe character that is part of the data has been escaped using the asterisk character (*). Also notice that we do not need to escape the backslash since we are using an alternative escape character.

The following characters must be preceded by the escape character if they appear as part of a column value: the escape character itself, newline, carriage return, and the current delimiter character. You can specify a different escape character using the ESCAPE AS clause.

CSV Format

This format is used for importing and exporting the Comma Separated Value (CSV) file format used by many other programs, such as spreadsheets. Instead of the escaping used by HAWQ standard text mode, it produces and recognizes the common CSV escaping mechanism.

The values in each record are separated by the DELIMITER character. If the value contains the delimiter character, the QUOTE character, the ESCAPE character (which is double quote by default), the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character. You can also use FORCE QUOTE to force quotes when outputting non-NULL values in specific columns.

The CSV format has no standard way to distinguish a NULL value from an empty string. HAWQ COPY handles this by quoting. A NULL is output as the NULL string and is not quoted, while a data value matching the NULL string is quoted. Therefore, using the default settings, a NULL is written as an unquoted empty string, while an empty string is written with double quotes (“”). Reading values follows similar rules. You can use FORCE NOT NULL to prevent NULL input comparisons for specific columns.
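
For example (table and column names hypothetical), FORCE NOT NULL can be used on load so that unquoted empty fields in the named column are read as empty strings rather than NULLs:

COPY products FROM '/home/gpadmin/products.csv' CSV FORCE NOT NULL description;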

Because backslash is not a special character in the CSV format, \., the end-of-data marker, could also appear as a data value. To avoid any misinterpretation, a \. data value appearing as a lone entry on a line is automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker. If you are loading a file created by another application that has a single unquoted column and might have a value of \., you might need to quote that value in the input file.

Note: In CSV mode, all characters are significant. A quoted value surrounded by white space, or any characters other than DELIMITER, will include those characters. This can cause errors if you import data from a system that pads CSV lines with white space out to some fixed width. If such a situation arises you might need to preprocess the CSV file to remove the trailing white space, before importing the data into HAWQ.

Note: CSV mode will both recognize and produce CSV files with quoted values containing embedded carriage returns and line feeds. Thus the files are not strictly one line per table row like text-mode files.

Binary Format

The BINARY format consists of a file header, zero or more tuples containing the row data, and a file trailer. Headers and data are in network byte order.

  • File Header — The file header consists of 15 bytes of fixed fields, followed by a variable-length header extension area. The fixed fields are:

    • Signature — 11-byte sequence PGCOPY\n\377\r\n\0 — note that the zero byte is a required part of the signature. (The signature is designed to allow easy identification of files that have been munged by a non-8-bit-clean transfer. This signature will be changed by end-of-line-translation filters, dropped zero bytes, dropped high bits, or parity changes.)
    • Flags field — 32-bit integer bit mask to denote important aspects of the file format. Bits are numbered from 0 (LSB) to 31 (MSB). Note that this field is stored in network byte order (most significant byte first), as are all the integer fields used in the file format. Bits 16-31 are reserved to denote critical file format issues; a reader should abort if it finds an unexpected bit set in this range. Bits 0-15 are reserved to signal backwards-compatible format issues; a reader should simply ignore any unexpected bits set in this range. Currently only one flag is defined, and the rest must be zero (Bit 16: 1 if data has OIDs, 0 if not).
    • Header extension area length — 32-bit integer, length in bytes of remainder of header, not including self. Currently, this is zero, and the first tuple follows immediately. Future changes to the format might allow additional data to be present in the header. A reader should silently skip over any header extension data it does not know what to do with. The header extension area is envisioned to contain a sequence of self-identifying chunks. The flags field is not intended to tell readers what is in the extension area. Specific design of header extension contents is left for a later release.
  • Tuples — Each tuple begins with a 16-bit integer count of the number of fields in the tuple. (Presently, all tuples in a table will have the same count, but that might not always be true.) Then, repeated for each field in the tuple, there is a 32-bit length word followed by that many bytes of field data. (The length word does not include itself, and can be zero.) As a special case, -1 indicates a NULL field value. No value bytes follow in the NULL case. There is no alignment padding or any other extra data between fields. Presently, all data values in a COPY BINARY file are assumed to be in binary format (format code one). It is anticipated that a future extension may add a header field that allows per-column format codes to be specified. If OIDs are included in the file, the OID field immediately follows the field-count word. It is a normal field except that it’s not included in the field-count. In particular it has a length word — this will allow handling of 4-byte vs. 8-byte OIDs without too much pain, and will allow OIDs to be shown as null if that ever proves desirable.

  • File Trailer — The file trailer consists of a 16-bit integer word containing -1. This is easily distinguished from a tuple’s field-count word. A reader should report an error if a field-count word is neither -1 nor the expected number of columns. This provides an extra check against somehow getting out of sync with the data.

Examples

Copy a table to the client using the vertical bar (|) as the field delimiter:

1
COPY country TO STDOUT WITH DELIMITER '|';

Copy data from a file into the country table:

1
COPY country FROM '/home/usr1/sql/country_data';

Copy into a file just the countries whose names start with ‘A’:

1
2
COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO
'/home/usr1/sql/a_list_countries.copy';

Create an error table called err_sales to use with single row error isolation mode:

1
2
CREATE TABLE err_sales ( cmdtime timestamptz, relname text,
filename text, linenum int, bytenum int, errmsg text, rawdata text, rawbytes bytea ) DISTRIBUTED RANDOMLY;

Copy data from a file into the sales table using single row error isolation mode:

1
2
COPY sales FROM '/home/usr1/sql/sales_data' LOG ERRORS INTO
err_sales SEGMENT REJECT LIMIT 10 ROWS;

Compatibility

There is no COPY statement in the SQL standard.

CREATE AGGREGATE

Defines a new aggregate function.

Synopsis

1
2
3
4
5
6
7
 CREATE [ORDERED] AGGREGATE <name> (<input_data_type> [ , ... ])
     ( SFUNC = <sfunc>,
         STYPE = <state_data_type>
         [, PREFUNC = <prefunc>]
         [, FINALFUNC = <ffunc>]
         [, INITCOND = <initial_condition>]
         [, SORTOP = <sort_operator>] )

Description

CREATE AGGREGATE defines a new aggregate function. Some basic and commonly-used aggregate functions such as count, min, max, sum, avg and so on are already provided in HAWQ. If one defines new types or needs an aggregate function not already provided, then CREATE AGGREGATE can be used to provide the desired features.

An aggregate function is identified by its name and input data types. Two aggregate functions in the same schema can have the same name if they operate on different input types. The name and input data types of an aggregate function must also be distinct from the name and input data types of every ordinary function in the same schema.

An aggregate function is made from one, two or three ordinary functions (all of which must be IMMUTABLE functions):

  • A state transition function <sfunc>
  • An optional preliminary segment-level calculation function <prefunc>
  • An optional final calculation function <ffunc>

These functions are used as follows:

1
2
3
 sfunc( internal-state, next-data-values ) ---> next-internal-state
 prefunc( internal-state, internal-state ) ---> next-internal-state
 ffunc( internal-state ) ---> aggregate-value

You can specify PREFUNC as a method for optimizing aggregate execution. By specifying PREFUNC, the aggregate can be executed in parallel on segments first and then on the master. When a two-level execution is performed, SFUNC is executed on the segments to generate partial aggregate results, and PREFUNC is executed on the master to aggregate the partial results from segments. If single-level aggregation is performed, all the rows are sent to the master and SFUNC is applied to the rows.

Single-level aggregation and two-level aggregation are equivalent execution strategies. Either type of aggregation can be implemented in a query plan. When you implement the functions <sfunc> and <prefunc>, you must ensure that the invocation of <sfunc> on the segment instances followed by <prefunc> on the master produces the same result as single-level aggregation that sends all the rows to the master and then applies only <sfunc> to the rows.

HAWQ creates a temporary variable of data type <state_data_type> to hold the current internal state of the aggregate function. At each input row, the aggregate argument values are calculated and the state transition function is invoked with the current state value and the new argument values to calculate a new internal state value. After all the rows have been processed, the final function is invoked once to calculate the aggregate return value. If there is no final function then the ending state value is returned as-is.

An aggregate function can provide an optional initial condition, an initial value for the internal state value. This is specified and stored in the database as a value of type text, but it must be a valid external representation of a constant of the state value data type. If it is not supplied then the state value starts out NULL.

If the state transition function is declared STRICT, then it cannot be called with NULL inputs. With such a transition function, aggregate execution behaves as follows. Rows with any null input values are ignored (the function is not called and the previous state value is retained). If the initial state value is NULL, then at the first row with all non-null input values, the first argument value replaces the state value, and the transition function is invoked at subsequent rows with all non-null input values. This is useful for implementing aggregates like max. Note that this behavior is only available when <state_data_type> is the same as the first <input_data_type>. When these types are different, you must supply a non-null initial condition or use a nonstrict transition function.

If the state transition function is not declared STRICT, then it will be called unconditionally at each input row, and must deal with NULL inputs and NULL transition values for itself. This allows the aggregate author to have full control over the aggregate handling of NULL values.

If the final function is declared STRICT, then it will not be called when the ending state value is NULL; instead a NULL result will be returned automatically. (This is the normal behavior of STRICT functions.) In any case the final function has the option of returning a NULL value. For example, the final function for avg returns NULL when it sees there were zero input rows.

Single argument aggregate functions, such as min or max, can sometimes be optimized by looking into an index instead of scanning every input row. If this aggregate can be so optimized, indicate it by specifying a sort operator. The basic requirement is that the aggregate must yield the first element in the sort ordering induced by the operator; in other words:

1
SELECT agg(col) FROM tab;

must be equivalent to:

1
SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;

Further assumptions are that the aggregate function ignores NULL inputs, and that it delivers a NULL result if and only if there were no non-null inputs. Ordinarily, a data type’s < operator is the proper sort operator for MIN, and > is the proper sort operator for MAX. Note that the optimization will never actually take effect unless the specified operator is the “less than” or “greater than” strategy member of a B-tree index operator class.
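
As a hedged sketch of a MAX-like aggregate that declares its sort operator (the function and aggregate names latest_ts and newest are hypothetical, and the function body is illustrative only):

CREATE FUNCTION latest_ts(timestamp, timestamp) RETURNS timestamp
   AS 'SELECT CASE WHEN $2 > $1 THEN $2 ELSE $1 END'
   LANGUAGE SQL IMMUTABLE STRICT;

CREATE AGGREGATE newest(timestamp) (
   SFUNC = latest_ts,
   STYPE = timestamp,
   SORTOP = > );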

Ordered Aggregates

If the optional qualification ORDERED appears, the created aggregate function is an ordered aggregate. In this case, the preliminary aggregation function, prefunc cannot be specified.

An ordered aggregate is called with the following syntax.

1
<name> ( <arg> [ , ... ] [ORDER BY <sortspec> [ , ...]] )

If the optional ORDER BY is omitted, a system-defined ordering is used. The transition function <sfunc> of an ordered aggregate function is called on its input arguments in the specified order and on a single segment. There is a new column aggordered in the pg_aggregate table to indicate the aggregate function is defined as an ordered aggregate.
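
For example, an ordered aggregate (here the hypothetical name string_concat over hypothetical tables and columns) could be invoked with a per-group ordering:

SELECT dept, string_concat(emp_name ORDER BY hire_date)
FROM employees
GROUP BY dept;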

Parameters

1
<name>

The name (optionally schema-qualified) of the aggregate function to create.

1
<input_data_type>

An input data type on which this aggregate function operates. To create a zero-argument aggregate function, write * in place of the list of input data types. An example of such an aggregate is count(*).

1
<sfunc>

The name of the state transition function to be called for each input row. For an N-argument aggregate function, the <sfunc> must take N+1 arguments, the first being of type <state_data_type> and the rest matching the declared input data types of the aggregate. The function must return a value of type <state_data_type>. This function takes the current state value and the current input data values, and returns the next state value.

1
<state_data_type>

The data type for the aggregate state value.

1
<prefunc>

The name of a preliminary aggregation function. This is a function of two arguments, both of type <state_data_type>. It must return a value of <state_data_type>. A preliminary function takes two transition state values and returns a new transition state value representing the combined aggregation. In HAWQ, if the result of the aggregate function is computed in a segmented fashion, the preliminary aggregation function is invoked on the individual internal states in order to combine them into an ending internal state. Note that this function is also called in hash aggregate mode within a segment. Therefore, if you call this aggregate function without a preliminary function, hash aggregate is never chosen. Since hash aggregate is efficient, consider defining preliminary function whenever possible.

PREFUNC is optional. If defined, it is executed on master. Input to PREFUNC is partial results from segments, and not the tuples. If PREFUNC is not defined, the aggregate cannot be executed in parallel. PREFUNC and gp_enable_multiphase_agg are used as follows:

  • gp_enable_multiphase_agg = off: SFUNC is executed sequentially on master. PREFUNC, even if defined, is unused.
  • gp_enable_multiphase_agg = on and PREFUNC is defined: SFUNC is executed in parallel, on segments. PREFUNC is invoked on master to aggregate partial results from segments.
1
2
3
4
CREATE OR REPLACE FUNCTION my_avg_accum(bytea,bigint) returns bytea as 'int8_avg_accum' language internal strict immutable;
CREATE OR REPLACE FUNCTION my_avg_merge(bytea,bytea) returns bytea as 'int8_avg_amalg' language internal strict immutable;
CREATE OR REPLACE FUNCTION my_avg_final(bytea) returns numeric as 'int8_avg' language internal strict immutable;
CREATE AGGREGATE my_avg(bigint) (   stype = bytea,sfunc = my_avg_accum,prefunc = my_avg_merge,finalfunc = my_avg_final,initcond = ''  );
1
<ffunc>

The name of the final function called to compute the aggregate result after all input rows have been traversed. The function must take a single argument of type state_data_type. The return data type of the aggregate is defined as the return type of this function. If <ffunc> is not specified, then the ending state value is used as the aggregate result, and the return type is <state_data_type>.

1
<initial_condition>

The initial setting for the state value. This must be a string constant in the form accepted for the data type <state_data_type>. If not specified, the state value starts out NULL.

1
 <sort_operator>

The associated sort operator for a MIN- or MAX-like aggregate function. This is just an operator name (possibly schema-qualified). The operator is assumed to have the same input data types as the aggregate function (which must be a single-argument aggregate function).

Notes

The ordinary functions used to define a new aggregate function must be defined first. Note that in this release of HAWQ, it is required that the <sfunc>, <prefunc>, and <ffunc> functions used to create the aggregate are defined as IMMUTABLE.

Any compiled code (shared library files) for custom functions must be placed in the same location on every host in your HAWQ array (master and all segments). This location must also be in the LD_LIBRARY_PATH so that the server can locate the files.

Examples

Create a sum of cubes aggregate:

1
2
3
4
5
6
7
8
9
CREATE FUNCTION scube_accum(numeric, numeric) RETURNS numeric
   AS 'select $1 + $2 * $2 * $2'
   LANGUAGE SQL
   IMMUTABLE
   RETURNS NULL ON NULL INPUT;
CREATE AGGREGATE scube(numeric) (
   SFUNC = scube_accum,
   STYPE = numeric,
   INITCOND = 0 );

To test this aggregate:

1
2
3
CREATE TABLE x(a INT);
INSERT INTO x VALUES (1),(2),(3);
SELECT scube(a) FROM x;

Correct answer for reference:

1
SELECT sum(a*a*a) FROM x;

Compatibility

CREATE AGGREGATE is a HAWQ language extension. The SQL standard does not provide for user-defined aggregate functions.

CREATE DATABASE

Creates a new database.

Synopsis

1
2
3
4
5
6
7
8
9
CREATE DATABASE <database_name> [ [WITH] <database_attribute>=<value> [ ... ]]

where <database_attribute> is:

    [OWNER=<database_owner>]
    [TEMPLATE=<template>]
    [ENCODING=<encoding>]
    [TABLESPACE=<tablespace>]
    [CONNECTION LIMIT=<connection_limit>]

Description

CREATE DATABASE creates a new database. To create a database, you must be a superuser or have the special CREATEDB privilege.

The creator becomes the owner of the new database by default. Superusers can create databases owned by other users by using the OWNER clause. They can even create databases owned by users with no special privileges. Non-superusers with CREATEDB privilege can only create databases owned by themselves.

By default, the new database will be created by cloning the standard system database template1. A different template can be specified by writing TEMPLATE=<template>.

Parameters

1
<database_name>

The name of a database to create. Note: HAWQ reserves the database name “hcatalog” for system use.

1
TEMPLATE=<template>

The name of the template from which to create the new database, or DEFAULT to use the default template (template1).

1
ENCODING=<encoding>

Character set encoding to use in the new database. Specify a string constant (such as ‘SQL_ASCII’), an integer encoding number, or DEFAULT to use the default encoding.

1
TABLESPACE=<tablespace>

The name of the tablespace that will be associated with the new database, or DEFAULT to use the template database’s tablespace. This tablespace will be the default tablespace used for objects created in this database.

1
CONNECTION LIMIT=<connection_limit>

The maximum number of concurrent connections possible. The default of -1 means there is no limitation.

Notes

CREATE DATABASE cannot be executed inside a transaction block.

When you copy a database by specifying its name as the template, no other sessions can be connected to the template database while it is being copied. New connections to the template database are locked out until CREATE DATABASE completes.
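
For example, a sketch of cloning an existing database by naming it as the template (database names are hypothetical):

CREATE DATABASE sales_test TEMPLATE=sales;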

The CONNECTION LIMIT is not enforced against superusers.

Examples

To create a new database:

1
CREATE DATABASE hawq;

To create a database sales owned by user salesapp with a default tablespace of salesspace:

1
CREATE DATABASE sales OWNER=salesapp TABLESPACE=salesspace;

To create a database music which supports the ISO-8859-1 character set:

1
CREATE DATABASE music ENCODING='LATIN1';

Compatibility

There is no CREATE DATABASE statement in the SQL standard. Databases are equivalent to catalogs, whose creation is implementation-defined.

CREATE EXTERNAL TABLE

Defines a new external table.

Synopsis

CREATE READABLE EXTERNAL TABLE <table_name>
     ( <column_name>
           <data_type> [, ...] | LIKE <other_table> )
     LOCATION ('gpfdist://<filehost>[:<port>]/<file_pattern>[#<transform>]' [, ...])
         | ('gpfdists://<filehost>[:<port>]/<file_pattern>[#<transform>]' [, ...])
         | ('hdfs://<host>[:<port>]/<path-to-data-directory>' [, ...])
         | ('hive://<host>[:<port>]/<hive-db-name>/<hive-table-name>' [, ...])

         FORMAT 'TEXT'
             [( [HEADER]
                [DELIMITER '<delimiter>' | 'OFF']
                [NULL '<null string>']
                [ESCAPE '<escape>' | 'OFF']
                [NEWLINE 'LF' | 'CR' | 'CRLF']
                [FILL MISSING FIELDS] )]
        | 'CSV'
            [( [HEADER]
                [QUOTE '<quote>']
                [DELIMITER '<delimiter>']
                [NULL '<null string>']
                [FORCE NOT NULL <column> [, ...]]
                [ESCAPE '<escape>']
                [NEWLINE 'LF' | 'CR' | 'CRLF']
                [FILL MISSING FIELDS] )]
        | 'ORC'
            [( [COMPRESSTYPE 'NONE' | 'SNAPPY' | 'LZ4']
               [DICTHRESHOLD [0-1]]
               [BLOOMFILTER [column_name] [, ...]] )]
        | 'CUSTOM' (Formatter=<formatter specifications>)
  [ ENCODING '<encoding>' ]
  [ [LOG ERRORS INTO <error_table>] SEGMENT REJECT LIMIT <count>
    [ROWS | PERCENT] ]

 CREATE READABLE EXTERNAL WEB TABLE <table_name>
    ( <column_name>
           <data_type> [, ...] | LIKE <other_table> )
     LOCATION ('http://<webhost>[:<port>]/<path>/<file>' [, ...])
   | EXECUTE '<command>' ON { MASTER | <number_of_segments> | SEGMENT #<num> }
     FORMAT 'TEXT'
         [( [HEADER]
            [DELIMITER '<delimiter>' | 'OFF']
            [NULL '<null string>']
            [ESCAPE '<escape>' | 'OFF']
            [NEWLINE 'LF' | 'CR' | 'CRLF']
            [FILL MISSING FIELDS] )]
        | 'CSV'
         [( [HEADER]
            [QUOTE '<quote>']
            [DELIMITER '<delimiter>']
            [NULL '<null string>']
            [FORCE NOT NULL <column> [, ...]]
            [ESCAPE '<escape>']
            [NEWLINE 'LF' | 'CR' | 'CRLF']
            [FILL MISSING FIELDS] )]
        | 'CUSTOM' (Formatter=<formatter specifications>)
  [ ENCODING '<encoding>' ]
  [ [LOG ERRORS INTO <error_table>] SEGMENT REJECT LIMIT <count>
    [ROWS | PERCENT] ]

 CREATE [WRITABLE] EXTERNAL TABLE <table_name>
      ( <column_name>
           <data_type> [, ...] | LIKE <other_table> )
      LOCATION('gpfdist://<outputhost>[:<port>]/<filename>[#<transform>]'
      | ('gpfdists://<outputhost>[:<port>]/<file_pattern>[#<transform>]'
           [, ...])
      | ('hdfs://<outputhost>[:<port>]/<path-to-data-directory>')
      | ('hive://<outputhost>[:<port>]/<dbname>/<tablename>')

      FORMAT 'TEXT'
              [( [DELIMITER  '<delimiter>']
              [NULL  '<null string>']
              [ESCAPE  '<escape>' | 'OFF'] )]
       | 'CSV'
            [([QUOTE  '<quote>']
            [DELIMITER  '<delimiter>']
            [NULL  '<null string>']
            [FORCE QUOTE <column> [, ...]] ]
            [ESCAPE  '<escape>'] )]
        | 'CUSTOM' (Formatter=<formatter specifications>)
    [ ENCODING '<write_encoding>' ]
    [ [LOG ERRORS INTO <error_table>] SEGMENT REJECT LIMIT <count> [ROWS]]
    [ DISTRIBUTED BY (<column>, [ ... ] ) | DISTRIBUTED RANDOMLY ]

 CREATE [WRITABLE] EXTERNAL WEB TABLE <table_name>
    ( <column_name>
           <data_type> [, ...] | LIKE <other_table> )
    EXECUTE '<command>' ON #<num>
    FORMAT 'TEXT'
             [( [DELIMITER  '<delimiter>']
             [NULL  '<null string>']
             [ESCAPE  '<escape>' | 'OFF'] )]
       | 'CSV'
            [([QUOTE  '<quote>']
            [DELIMITER  '<delimiter>']
            [NULL  '<null string>']
            [FORCE QUOTE <column> [, ...]] ]
            [ESCAPE  '<escape>'] )]
       | 'CUSTOM' (Formatter=<formatter specifications>)
     [ ENCODING '<write_encoding>' ]
     [ DISTRIBUTED BY (<column>, [ ... ] ) | DISTRIBUTED RANDOMLY ]

Description

CREATE EXTERNAL TABLE or CREATE EXTERNAL WEB TABLE creates a new readable external table definition in HAWQ. Readable external tables are typically used for fast, parallel data loading. Once an external table is defined, you can query its data directly (and in parallel) using SQL commands. For example, you can select, join, or sort external table data. You can also create views for external tables. DML operations (UPDATE, INSERT, DELETE, or TRUNCATE) are not permitted on readable external tables.

CREATE WRITABLE EXTERNAL TABLE or CREATE WRITABLE EXTERNAL WEB TABLE creates a new writable external table definition in HAWQ. Writable external tables are typically used for unloading data from the database into a set of files or named pipes.

Writable external web tables can also be used to output data to an executable program. Once a writable external table is defined, data can be selected from database tables and inserted into the writable external table. Writable external tables only allow INSERT operations – SELECT, UPDATE, DELETE, or TRUNCATE are not allowed.

Web external tables access dynamic data sources – either on a web server or by executing OS commands or scripts.

The LOCATION clause specifies the location of the external data. The location string begins with a protocol string that specifies the storage type and protocol used to access the data. The gpfdist:// protocol specifies data files served by one or more instances of the HAWQ file server gpfdist. The http:// protocol specifies one or more HTTP URLs and is used with web tables. The hdfs:// protocol specifies a data file served by a Hadoop HDFS filesystem. The hive:// protocol specifies a data file served by a HIVE database.

Note: The file:// protocol is deprecated. Instead, use the gpfdist://, gpfdists://, hive://, hdfs:// or the COPY command.

The FORMAT clause is used to describe how external table files are formatted. Valid flat file formats, including files in HDFS, are delimited text (TEXT) and comma separated values (CSV) format for gpfdist protocols. Version 3.0 adds support for orc format in HDFS protocol. If the data in the file does not use the default column delimiter, escape character, null string, and so on, you must specify the additional formatting options so that the data in the external file is read correctly by HAWQ.

A writable hdfs:// external table can also be read, which is a convenient way to use a single external table for both high-performance reading and writing of external data. This kind of external table also supports high-performance, scalable data exchange.

Parameters

1
READABLE | WRITABLE

Specifies the type of external table, readable being the default. Readable external tables are used for loading data into HAWQ. Writable external tables are used for unloading data.

1
WEB

Creates a readable or writable web external table definition in HAWQ. There are two forms of readable web external tables – those that access files via the http:// protocol or those that access data by executing OS commands. Writable web external tables output data to an executable program that can accept an input stream of data. Web external tables are not rescannable during query execution.

1
<table_name>

The name of the new external table.

1
<column_name>

The name of a column to create in the external table definition. Unlike regular tables, external tables do not have column constraints or default values, so do not specify those.

1
LIKE <other_table>

The LIKE clause specifies a table from which the new external table automatically copies all column names, data types and HAWQ distribution policy. If the original table specifies any column constraints or default column values, those will not be copied over to the new external table definition.

1
<data_type>

The data type of the column.

LOCATION ('<protocol>://<host>[:<port>]/<path>/<file>' [, ...])

For readable external tables, specifies the URI of the external data source(s) to be used to populate the external table or web table. Regular readable external tables allow the file (deprecated), gpfdist, and hdfs protocols. Web external tables allow the http protocol. If <port> is omitted, the http and gpfdist protocols assume port 8080 and the hdfs protocol assumes the <host> is a high availability nameservice string.

If using the gpfdist protocol, the <path> is relative to the directory from which gpfdist is serving files (the directory specified when you started the gpfdist program). Also, the <path> can use wildcards (or other C-style pattern matching) in the <file> name part of the location to denote multiple files in a directory. For example:

'gpfdist://filehost:8081/*'
'gpfdist://masterhost/my_load_file'
'http://intranet.example.com/finance/expenses.csv'

If using the hdfs protocol, the <path> is the target directory in Hadoop HDFS. All files contained in the specified directory will be read when the table is scanned; the directory should not contain sub-directories. For example:

'hdfs://namenode:9000/directory1'

If using the hive protocol, the <path> is the Hive database name of the target table, and <file> is the table name. For example:

'hive://hivehost:9083/dbname/tablename'

For writable gpfdist external tables, specifies the URI location of the gpfdist process that will collect data output from the HAWQ segments and write it to the named file. The <path> is relative to the directory from which gpfdist is serving files (the directory specified when you started the gpfdist program). If multiple gpfdist locations are listed, the segments sending data will be evenly divided across the available output locations. For example:

'gpfdist://outputhost:8081/data1.out',
'gpfdist://outputhost:8081/data2.out'

With two gpfdist locations listed as in the above example, half of the segments would send their output data to the data1.out file and the other half to the data2.out file.

For writable hdfs external tables, specifies the URI of the target directory in Hadoop HDFS where output files are saved; only one directory is accepted. You can also query a writable hdfs external table like a readable one. For example:

'hdfs://namenode:9000/directory2'

For writable hive external tables, specifies the URI of the target table in the Hive database where output files are saved; only one directory is accepted. You can also query a writable hive external table like a readable one. For example:

'hive://hivehost:9083/dbname/tablename'
1
EXECUTE '<command>' ON ...

Allowed for readable web external tables or writable external tables only. For readable web external tables, specifies the OS command to be executed by the segment instances. The <command> can be a single OS command or a script. If <command> executes a script, that script must reside in the same location on all of the segment hosts and be executable by the HAWQ superuser (gpadmin). For writable external tables, the <command> specified in the EXECUTE clause must be prepared to have data piped into it, as segments having data to send write their output to the specified program. HAWQ uses virtual elastic segments to run its queries.

The ON clause is used to specify which segment instances will execute the given command. For writable external tables, only ON <number> is supported.

Note: ON ALL/HOST is deprecated when creating a readable external table, as HAWQ cannot guarantee scheduling executors on a specific host. Instead, use ON MASTER, ON <number>, or SEGMENT <virtual_segment> to specify which segment instances will execute the command.

  • ON MASTER runs the command on the master host only.
  • ON <number> means the command will be executed by the specified number of virtual segments. The particular segments are chosen by the HAWQ system’s Resource Manager at runtime.
  • ON SEGMENT <virtual_segment> means the command will be executed only once by the specified segment.
1
FORMAT 'TEXT | CSV' (<options>)

Specifies the format of the external or web table data - either plain text (TEXT) or comma separated values (CSV) format.

1
DELIMITER

Specifies a delimiter that separates columns within each row (line) of data. The default is a tab character in TEXT mode, a comma in CSV mode. In TEXT mode for readable external tables, the delimiter can be set to OFF for special use cases in which unstructured data is loaded into a single-column table.

Before version 2.2.0, only a single ASCII character is supported as the delimiter. Version 2.2.0 adds support for non-ASCII characters (such as Chinese characters) and multi-character delimiters.

Note: hdfs external table does not support the 'OFF' option.

1
NULL

Specifies the string that represents a NULL value. The default is \N (backslash-N) in TEXT mode, and an empty value with no quotations in CSV mode. You might prefer an empty string even in TEXT mode for cases where you do not want to distinguish NULL values from empty strings. When using external and web tables, any data item that matches this string will be considered a NULL value.

1
ESCAPE

Specifies the single character that is used for C escape sequences (such as \n, \t, \100, and so on) and for escaping data characters that might otherwise be taken as row or column delimiters. Make sure to choose an escape character that is not used anywhere in your actual column data. The default escape character is a \ (backslash) for text-formatted files and a " (double quote) for csv-formatted files; however, it is possible to specify another character to represent an escape. It is also possible to disable escaping in text-formatted files by specifying the value 'OFF' as the escape value. This is very useful for data such as text-formatted web log data that has many embedded backslashes that are not intended to be escapes.

Note: hdfs external table does not support the 'OFF' option.

1
NEWLINE

Specifies the newline used in your data files – LF (Line feed, 0x0A), CR (Carriage return, 0x0D), or CRLF (Carriage return plus line feed, 0x0D 0x0A). If not specified, a HAWQ segment will detect the newline type by looking at the first row of data it receives and using the first newline type encountered.

1
HEADER

For readable external tables, specifies that the first line in the data file(s) is a header row (contains the names of the table columns) and should not be included as data for the table. If using multiple data source files, all files must have a header row. Note: The HEADER formatting option is not allowed with PXF. For CSV files or other files that include a header line, use an error table instead of the HEADER formatting option.

1
QUOTE

Specifies the quotation character for CSV mode. The default is double-quote (").

1
FORCE NOT NULL

In CSV mode, processes each specified column as though it were quoted and hence not a NULL value. For the default null string in CSV mode (nothing between two delimiters), this causes missing values to be evaluated as zero-length strings.

1
FORCE QUOTE

In CSV mode for writable external tables, forces quoting to be used for all non-NULL values in each specified column. NULL output is never quoted.

1
FILL MISSING FIELDS

In both TEXT and CSV mode for readable external tables, specifying FILL MISSING FIELDS will set missing trailing field values to NULL (instead of reporting an error) when a row of data has missing data fields at the end of a line or row. Blank rows, fields with a NOT NULL constraint, and trailing delimiters on a line will still report an error.

1
COMPRESSTYPE '<compresstype>'

In ORC mode, specifies the compression type used for ORC files. Supported values are none, snappy, and lz4. The default is none.

1
DICTHRESHOLD [0-1]

In ORC mode, version 3.1.0 adds dictionary encoding for string columns in ORC tables. If the number of distinct keys in a dictionary is greater than this fraction of the total number of non-null rows, dictionary encoding is turned off. The default is zero.
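
As a hedged sketch combining these ORC options on a readable external table (the table, directory, and column names are hypothetical; the option layout follows the synopsis above):

CREATE EXTERNAL TABLE orc_events (id int, label text)
   LOCATION ('hdfs://namenode:9000/orc_events_dir')
   FORMAT 'ORC' (COMPRESSTYPE 'SNAPPY' DICTHRESHOLD 0.8 BLOOMFILTER label);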

1
ENCODING '<encoding>'

Character set encoding to use for the external table. Specify a string constant (such as ‘SQL_ASCII’), an integer encoding number, or DEFAULT to use the default client encoding.

1
LOG ERRORS INTO <error_table>

This is an optional clause that can precede a SEGMENT REJECT LIMIT clause to log information about rows with formatting errors. It specifies an error table where rows with formatting errors will be logged when running in single row error isolation mode. You can then examine this <error_table> to see error rows that were not loaded (if any). If the <error_table> specified already exists, it will be used. If it does not exist, it will be automatically generated.

Note: An error table can be specified for an hdfs writable external table, which supports reading data from that table. The error table for an hdfs external table must itself be an hdfs external table, and you can define it directly in advance. If the specified error table does not exist, HAWQ automatically creates it along with a new directory. You can query pg_exttable to find the generated path for the error table. You should manually remove the error table directory when it is no longer needed.

1
SEGMENT REJECT LIMIT <count> [ROWS | PERCENT]

Runs a COPY FROM operation in single row error isolation mode. If the input rows have format errors they will be discarded provided that the reject limit is not reached on any HAWQ segment instance during the load operation. The reject limit can be specified as number of rows (the default) or percentage of total rows (1-100). If PERCENT is used, each segment starts calculating the bad row percentage only after the number of rows specified by the parameter gp_reject_percent_threshold has been processed. The default for gp_reject_percent_threshold is 300 rows. Constraint errors such as violation of a NOT NULL or CHECK constraint will still be handled in “all-or-nothing” input mode. If the limit is not reached, all good rows will be loaded and any error rows discarded.

Note: hdfs external table does not support the 'PERCENT' option.

1
DISTRIBUTED BY (<column>, [ ... ] ) | DISTRIBUTED RANDOMLY

Used to declare the HAWQ distribution policy for a writable external table. By default, writable external tables are distributed randomly. If the source table you are exporting data from has a hash distribution policy, defining the same distribution key column(s) for the writable external table will improve unload performance by eliminating the need to move rows over the interconnect. When you issue an unload command such as INSERT INTO wex_table SELECT * FROM source_table, the rows that are unloaded can be sent directly from the segments to the output location if the two tables have the same hash distribution policy.

Notes

It is also possible to set a configuration parameter session default for a specific role (user) rather than to a database. Role-specific settings override database-specific ones if there is a conflict. See ALTER ROLE.

Examples

Start the gpfdist file server program in the background on port 8081 serving files from directory /var/data/staging:

1
gpfdist -p 8081 -d /var/data/staging -l /home/gpadmin/log &

Create a readable external table named ext_customer using the gpfdist protocol and any text formatted files (*.txt) found in the gpfdist directory. The files are formatted with a pipe (|) as the column delimiter and an empty space as NULL. Also access the external table in single row error isolation mode:

1
2
3
4
5
CREATE EXTERNAL TABLE ext_customer
     (id int, name text, sponsor text)
     LOCATION ( 'gpfdist://filehost:8081/*.txt' )
     FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
     LOG ERRORS INTO err_customer SEGMENT REJECT LIMIT 5;

Create the same readable external table definition as above, but with CSV formatted files:

1
2
3
4
CREATE EXTERNAL TABLE ext_customer
   (id int, name text, sponsor text)
   LOCATION ( 'gpfdist://filehost:8081/*.csv' )
   FORMAT 'CSV' ( DELIMITER ',' );

Create a readable web external table that executes a script on five virtual segment hosts. (The script must reside at the same location on all segment hosts.)

1
2
3
CREATE EXTERNAL WEB TABLE log_output (linenum int, message text)
EXECUTE '/var/load_scripts/get_log_data.sh' ON 5
FORMAT 'TEXT' (DELIMITER '|');

Create a writable external table named sales_out that uses gpfdist to write output data to a file named sales.out. The files are formatted with a pipe (|) as the column delimiter and an empty space as NULL.

1
2
3
4
CREATE WRITABLE EXTERNAL TABLE sales_out (LIKE sales)
   LOCATION ('gpfdist://etl1:8081/sales.out')
   FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
   DISTRIBUTED BY (txn_id);

The following command sequence shows how to create a writable external web table using a specified number of elastic virtual segments to run the query:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
postgres=# CREATE TABLE a (i int);
CREATE TABLE
postgres=# INSERT INTO a VALUES(1);
INSERT 0 1
postgres=# INSERT INTO a VALUES(2);
INSERT 0 1
postgres=# INSERT INTO a VALUES(10);
INSERT 0 1
postgres=# CREATE WRITABLE EXTERNAL WEB TABLE externala (output text)
postgres-# EXECUTE 'cat > /tmp/externala' ON 3
postgres-# FORMAT 'TEXT' DISTRIBUTED RANDOMLY;
CREATE EXTERNAL TABLE
postgres=# INSERT INTO externala SELECT * FROM a;
INSERT 0 3

Create a writable external web table that pipes output data received by the segments to an executable script named to_adreport_etl.sh:

1
2
3
CREATE WRITABLE EXTERNAL WEB TABLE campaign_out (LIKE campaign)
EXECUTE '/var/unload_scripts/to_adreport_etl.sh'
FORMAT 'TEXT' (DELIMITER '|');

Use the writable external table defined above to unload selected data:

1
2
INSERT INTO campaign_out
    SELECT * FROM campaign WHERE customer_id=123;

The following command sequence shows how to create a writable HDFS external table that can be used to load, unload, query, and insert data:

1
CREATE WRITABLE EXTERNAL TABLE tbexternal (a int, b decimal(10,2), t text, d date) LOCATION ('hdfs://host1:port1/tbexternaldir') FORMAT 'csv' (DELIMITER '|') LOG ERRORS INTO err_tbexternal SEGMENT REJECT LIMIT 5;

This table supports both reading and writing. All files located in the path '/tbexternaldir' can be read for queries or loading, and you can also unload or insert data into this table, which writes the data into the same path '/tbexternaldir'. Some examples of operating on this external table are shown below:

1
2
3
4
5
6
7
INSERT INTO tbexternal SELECT * FROM tb;
INSERT INTO tbexternal SELECT * FROM tb WHERE t like 'test%';
INSERT INTO tb SELECT * FROM tbexternal;
INSERT INTO tb SELECT * FROM tbexternal WHERE t like 'test%';

SELECT a,b FROM tbexternal WHERE b > 10;
SELECT T1.a, T1.d FROM tbexternal T1, tbinternal T2 WHERE T1.t = T2.t;

You can also update files in the path '/tbexternaldir' directly through Hadoop HDFS file operation services and then use this table to read the data. In that case, the error table can be useful for filtering out malformed rows.
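
For example, a quick way to inspect rejected rows is a sketch like the following (the column names assume the standard error-table layout):

1
2
-- Review formatting errors captured while reading tbexternal.
SELECT cmdtime, errmsg, rawdata FROM err_tbexternal LIMIT 10;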

The following command shows how to create a writable HDFS external table in ORC format:

1
CREATE WRITABLE EXTERNAL TABLE orcexternal (p text, q text) LOCATION ('hdfs://host1:port1/orcexternaldir') FORMAT 'orc' (COMPRESSTYPE 'lz4');

The following command shows how to create a writable Hive external table in ORC format:

1
CREATE WRITABLE EXTERNAL TABLE orcexternal (p text, q text) LOCATION ('hive://host1:port1/dbname/orctablename') FORMAT 'orc' (COMPRESSTYPE 'lz4');

Compatibility

CREATE EXTERNAL TABLE is a HAWQ extension. The SQL standard makes no provisions for external tables.

CREATE FUNCTION

Defines a new function.

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
CREATE [OR REPLACE] FUNCTION <name>
     ( [ [<argmode>] [<argname>] <argtype> [, ...] ] )
       [ RETURNS { [ SETOF ] <rettype>
         | TABLE ([{ <argname> <argtype> | LIKE <other table> }
       [, ...]])
     } ]
   { LANGUAGE <langname>
   | IMMUTABLE | STABLE | VOLATILE
   | CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT
   | [EXTERNAL] SECURITY INVOKER | [EXTERNAL] SECURITY DEFINER
   | AS '<definition>'
   | AS '<obj_file>', '<link_symbol>' } ...
   [ WITH ({ DESCRIBE = <describe_function>
        } [, ...] ) ]

Description

CREATE FUNCTION defines a new function. CREATE OR REPLACE FUNCTION will either create a new function, or replace an existing definition.

The name of the new function must not match any existing function with the same argument types in the same schema. However, functions of different argument types may share a name (overloading).

To update the definition of an existing function, use CREATE OR REPLACE FUNCTION. It is not possible to change the name or argument types of a function this way (this would actually create a new, distinct function). Also, CREATE OR REPLACE FUNCTION will not let you change the return type of an existing function. To do that, you must drop and recreate the function. If you drop and then recreate a function, you will have to drop existing objects (rules, views, and so on) that refer to the old function. Use CREATE OR REPLACE FUNCTION to change a function definition without breaking objects that refer to the function.

For more information about creating functions, see User-Defined Functions.

Limited Use of VOLATILE and STABLE Functions

To prevent data from becoming out-of-sync across the segments in HAWQ, any function classified as STABLE or VOLATILE cannot be executed at the segment level if it contains SQL or modifies the database in any way. For example, functions such as random() or timeofday() are not allowed to execute on distributed data in HAWQ because they could potentially cause inconsistent data between the segment instances.

To ensure data consistency, VOLATILE and STABLE functions can safely be used in statements that are evaluated on and execute from the master. For example, the following statements are always executed on the master (statements without a FROM clause):

1
2
SELECT setval('myseq', 201);
SELECT foo();

In cases where a statement has a FROM clause containing a distributed table and the function used in the FROM clause simply returns a set of rows, execution may be allowed on the segments:

1
SELECT * FROM foo();

One exception to this rule is functions that return a table reference (rangeFuncs) or functions that use the refCursor data type. Note that you cannot return a refcursor from any kind of function in HAWQ.

Parameters

1
<name>

The name (optionally schema-qualified) of the function to create.

1
<argmode>

The mode of an argument: either IN, OUT, or INOUT. If omitted, the default is IN.

1
<argname>

The name of an argument. Some languages (currently only PL/pgSQL) let you use the name in the function body. For other languages the name of an input argument is just extra documentation. But the name of an output argument is significant, since it defines the column name in the result row type. (If you omit the name for an output argument, the system will choose a default column name.)

1
<argtype>

The data type(s) of the function’s arguments (optionally schema-qualified), if any. The argument types may be base, composite, or domain types, or may reference the type of a table column. Depending on the implementation language it may also be allowed to specify pseudotypes such as cstring. Pseudotypes indicate that the actual argument type is either incompletely specified, or outside the set of ordinary SQL data types.

The type of a column is referenced by writing <table_name>.<column_name>%TYPE. Using this feature can sometimes help make a function independent of changes to the definition of a table.
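
A minimal sketch of this syntax, assuming a hypothetical table accounts with a numeric column balance:

1
2
3
CREATE FUNCTION apply_fee(amount accounts.balance%TYPE)
    RETURNS accounts.balance%TYPE
    AS 'SELECT $1 * 0.99;' LANGUAGE SQL IMMUTABLE;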

1
<rettype>

The return data type (optionally schema-qualified). The return type can be a base, composite, or domain type, or may reference the type of a table column. Depending on the implementation language it may also be allowed to specify pseudotypes such as cstring. If the function is not supposed to return a value, specify void as the return type. When there are OUT or INOUT parameters, the RETURNS clause may be omitted. If present, it must agree with the result type implied by the output parameters: RECORD if there are multiple output parameters, or the same type as the single output parameter.

The SETOF modifier indicates that the function will return a set of items, rather than a single item.

The type of a column is referenced by writing <table_name>.<column_name>%TYPE.

1
<langname>

The name of the language that the function is implemented in. May be SQL, C, internal, or the name of a user-defined procedural language. See CREATE LANGUAGE for the procedural languages supported in HAWQ. For backward compatibility, the name may be enclosed by single quotes.

1
2
3
IMMUTABLE
STABLE
VOLATILE

These attributes inform the query optimizer about the behavior of the function. At most one choice may be specified. If none of these appear, VOLATILE is the default assumption. Since HAWQ currently has limited use of VOLATILE functions, if a function is truly IMMUTABLE, you must declare it as so to be able to use it without restrictions. IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values. It does not do database lookups or otherwise use information not directly present in its argument list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.

STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements. This is the appropriate selection for functions whose results depend on database lookups, parameter variables (such as the current time zone), and so on. Also note that the current_timestamp family of functions qualifies as stable, since their values do not change within a transaction.

VOLATILE indicates that the function value can change even within a single table scan, so no optimizations can be made. Relatively few database functions are volatile in this sense; some examples are random(), currval(), timeofday(). But note that any function that has side-effects must be classified volatile, even if its result is quite predictable, to prevent calls from being optimized away; an example is setval().

1
2
3
CALLED ON NULL INPUT
RETURNS NULL ON NULL INPUT
STRICT

CALLED ON NULL INPUT (the default) indicates that the function will be called normally when some of its arguments are null. It is then the function author’s responsibility to check for null values if necessary and respond appropriately. RETURNS NULL ON NULL INPUT or STRICT indicates that the function always returns null whenever any of its arguments are null. If this parameter is specified, the function is not executed when there are null arguments; instead a null result is assumed automatically.

1
2
[EXTERNAL] SECURITY INVOKER
[EXTERNAL] SECURITY DEFINER

SECURITY INVOKER (the default) indicates that the function is to be executed with the privileges of the user that calls it. SECURITY DEFINER specifies that the function is to be executed with the privileges of the user that created it. The key word EXTERNAL is allowed for SQL conformance, but it is optional since, unlike in SQL, this feature applies to all functions not just external ones.
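
A minimal sketch of a SECURITY DEFINER function, assuming a hypothetical audit_log table; the function body executes with the privileges of the function owner rather than those of the caller:

1
2
3
4
CREATE FUNCTION log_event(msg text) RETURNS void AS $$
    -- Runs with the owner's privileges because of SECURITY DEFINER.
    INSERT INTO audit_log (logged_at, message) VALUES (now(), $1);
$$ LANGUAGE SQL VOLATILE SECURITY DEFINER;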

1
<definition>

A string constant defining the function; the meaning depends on the language. It may be an internal function name, the path to an object file, an SQL command, or text in a procedural language.

1
<obj_file>, <link_symbol>

This form of the AS clause is used for dynamically loadable C language functions when the function name in the C language source code is not the same as the name of the SQL function. The string <obj_file> is the name of the file containing the dynamically loadable object, and <link_symbol> is the name of the function in the C language source code. If the link symbol is omitted, it is assumed to be the same as the name of the SQL function being defined. A good practice is to locate shared libraries either relative to $libdir (which is located at $GPHOME/lib) or through the dynamic library path (set by the dynamic_library_path server configuration parameter). This simplifies version upgrades if the new installation is at a different location.
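
A hedged sketch of this form of the AS clause, assuming a shared library my_udfs installed under $libdir whose C entry point is named my_concat_text:

1
2
3
CREATE FUNCTION concat_text(text, text) RETURNS text
    AS '$libdir/my_udfs', 'my_concat_text'   -- object file, then link symbol
    LANGUAGE C IMMUTABLE STRICT;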

1
<describe_function>

The name of a callback function to execute when a query that calls this function is parsed. The callback function returns a tuple descriptor that indicates the result type.

Notes

Any compiled code (shared library files) for custom functions must be placed in the same location on every host in your HAWQ array (master and all segments). This location must also be in the LD_LIBRARY_PATH so that the server can locate the files. Consider locating shared libraries either relative to $libdir (which is located at $GPHOME/lib) or through the dynamic library path (set by the dynamic_library_path server configuration parameter) on all master segment instances in the HAWQ array.

The full SQL type syntax is allowed for input arguments and return value. However, some details of the type specification (such as the precision field for type numeric) are the responsibility of the underlying function implementation and are not recognized or enforced by the CREATE FUNCTION command.

HAWQ allows function overloading. The same name can be used for several different functions so long as they have distinct argument types. However, the C names of all functions must be different, so you must give overloaded C functions different C names (for example, use the argument types as part of the C names).

Two functions are considered the same if they have the same names and input argument types, ignoring any OUT parameters. Thus for example these declarations conflict:

1
2
CREATE FUNCTION foo(int) ...
CREATE FUNCTION foo(int, out text) ...

When repeated CREATE FUNCTION calls refer to the same object file, the file is only loaded once. To unload and reload the file, use the LOAD command.

To be able to define a function, the user must have the USAGE privilege on the language.

It is often helpful to use dollar quoting to write the function definition string, rather than the normal single quote syntax. Without dollar quoting, any single quotes or backslashes in the function definition must be escaped by doubling them. A dollar-quoted string constant consists of a dollar sign ($), an optional tag of zero or more characters, another dollar sign, an arbitrary sequence of characters that makes up the string content, a dollar sign, the same tag that began this dollar quote, and a dollar sign. Inside the dollar-quoted string, single quotes, backslashes, or any character can be used without escaping. The string content is always written literally. For example, here are two different ways to specify the string “Dianne’s horse” using dollar quoting:

1
2
$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$

Examples

A very simple addition function:

1
2
3
4
5
CREATE FUNCTION add(integer, integer) RETURNS integer
    AS 'select $1 + $2;'
    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT;

Increment an integer, making use of an argument name, in PL/pgSQL:

1
2
3
4
5
6
CREATE OR REPLACE FUNCTION increment(i integer) RETURNS
integer AS $$
        BEGIN
                RETURN i + 1;
        END;
$$ LANGUAGE plpgsql;

Return a record containing multiple output parameters:

1
2
3
4
CREATE FUNCTION dup(in int, out f1 int, out f2 text)
    AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
    LANGUAGE SQL;
SELECT * FROM dup(42);

You can do the same thing more verbosely with an explicitly named composite type:

1
2
3
4
5
CREATE TYPE dup_result AS (f1 int, f2 text);
CREATE FUNCTION dup(int) RETURNS dup_result
    AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
    LANGUAGE SQL;
SELECT * FROM dup(42);

Compatibility

CREATE FUNCTION is defined in SQL:1999 and later. The HAWQ version of the command is similar, but not fully compatible. The attributes are not portable, neither are the different available languages.

For compatibility with some other database systems, <argmode> can be written either before or after <argname>. But only the first way is standard-compliant.

CREATE GROUP

Defines a new database role.

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
CREATE GROUP <name> [ [WITH] <option> [ ... ] ]
where <option> can be:

      SUPERUSER | NOSUPERUSER
    | CREATEDB | NOCREATEDB
    | CREATEROLE | NOCREATEROLE
    | CREATEUSER | NOCREATEUSER
    | INHERIT | NOINHERIT
    | LOGIN | NOLOGIN
    | [ ENCRYPTED | UNENCRYPTED ] PASSWORD '<password>'
    | VALID UNTIL '<timestamp>'
    | IN ROLE <rolename> [, ...]
    | IN GROUP <rolename> [, ...]
    | ROLE <rolename> [, ...]
    | ADMIN <rolename> [, ...]
    | USER <rolename> [, ...]
    | SYSID <uid>

Description

CREATE GROUP has been replaced by CREATE ROLE, although it is still accepted for backwards compatibility.

Compatibility

There is no CREATE GROUP statement in the SQL standard.

CREATE LANGUAGE

Defines a new procedural language.

Synopsis

1
2
3
4
CREATE [PROCEDURAL] LANGUAGE <name>

CREATE [TRUSTED] [PROCEDURAL] LANGUAGE <name>
       HANDLER <call_handler> [VALIDATOR <valfunction>]

Description

CREATE LANGUAGE registers a new procedural language with a HAWQ database. Subsequently, functions can be defined in this new language. You must be a superuser to register a new language.

When you register a new procedural language, you effectively associate the language name with a call handler that is responsible for executing functions written in that language. For a function written in a procedural language (a language other than C or SQL), the database server has no built-in knowledge about how to interpret the function’s source code. The task is passed to a special handler that knows the details of the language. The handler could either do all the work of parsing, syntax analysis, execution, and so on, or it could serve as a bridge between HAWQ and an existing implementation of a programming language. The handler itself is a C language function compiled into a shared object and loaded on demand, just like any other C function.

There are two forms of the CREATE LANGUAGE command. In the first form, the user specifies the name of the desired language and the HAWQ server uses the pg_pltemplate system catalog to determine the correct parameters. In the second form, the user specifies the language parameters as well as the language name. You can use the second form to create a language that is not defined in pg_pltemplate.

When the server finds an entry in the pg_pltemplate catalog for the given language name, it will use the catalog data even if the command includes language parameters. This behavior simplifies loading of old dump files, which are likely to contain out-of-date information about language support functions.

Parameters

1
TRUSTED

Ignored if the server has an entry for the specified language name in pg_pltemplate. Specifies that the call handler for the language is safe and does not offer an unprivileged user any functionality to bypass access restrictions. If this key word is omitted when registering the language, only users with the superuser privilege can use this language to create new functions.

1
PROCEDURAL

Indicates that this is a procedural language.

1
<name>

The name of the new procedural language. The language name is case insensitive. The name must be unique among the languages in the database. Built-in support is included for plpgsql, plpython, plpythonu, and plr. plpgsql is installed by default in HAWQ.

1
HANDLER <call_handler>

Ignored if the server has an entry for the specified language name in pg_pltemplate. The name of a previously registered function that will be called to execute the procedural language functions. The call handler for a procedural language must be written in a compiled language such as C with version 1 call convention and registered with HAWQ as a function taking no arguments and returning the language_handler type, a placeholder type that is simply used to identify the function as a call handler.

1
VALIDATOR <valfunction>

Ignored if the server has an entry for the specified language name in pg_pltemplate. <valfunction> is the name of a previously registered function that will be called when a new function in the language is created, to validate the new function. If no validator function is specified, then a new function will not be checked when it is created. The validator function must take one argument of type oid, which will be the OID of the to-be-created function, and will typically return void. A validator function would typically inspect the function body for syntactical correctness, but it can also look at other properties of the function, for example if the language cannot handle certain argument types. To signal an error, the validator function should use the ereport() function. The return value of the function is ignored.
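
As a sketch only, a validator can be registered together with the call handler; this assumes the plsample library used in the example later in this section also exports a validator entry point:

1
2
3
4
5
CREATE FUNCTION plsample_validator(oid) RETURNS void
    AS '$libdir/plsample'
    LANGUAGE C;
CREATE LANGUAGE plsample
    HANDLER plsample_call_handler VALIDATOR plsample_validator;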

Notes

The procedural language packages included in the standard HAWQ distribution are:

  • PL/pgSQL - registered in all databases by default
  • PL/Perl
  • PL/Python
  • PL/Java

HAWQ supports a language handler for PL/R, but the PL/R language package is not pre-installed with HAWQ.

The system catalog pg_language records information about the currently installed languages.

To create functions in a procedural language, a user must have the USAGE privilege for the language. By default, USAGE is granted to PUBLIC (everyone) for trusted languages. This may be revoked if desired.

Procedural languages are local to individual databases. However, a language can be installed into the template1 database, which will cause it to be available automatically in all subsequently-created databases.

The call handler function and the validator function (if any) must already exist if the server does not have an entry for the language in pg_pltemplate. But when there is an entry, the functions need not already exist; they will be automatically defined if not present in the database.

Any shared library that implements a language must be located in the same LD_LIBRARY_PATH location on all segment hosts in your HAWQ array.

Examples

The preferred way of creating any of the standard procedural languages in a database:

1
2
3
CREATE LANGUAGE plr;
CREATE LANGUAGE plpythonu;
CREATE LANGUAGE plperl;

For a language not known in the pg_pltemplate catalog:

1
2
3
4
5
6
CREATE FUNCTION plsample_call_handler() RETURNS
language_handler
    AS '$libdir/plsample'
    LANGUAGE C;
CREATE LANGUAGE plsample
    HANDLER plsample_call_handler;

Compatibility

CREATE LANGUAGE is a HAWQ extension.

CREATE OPERATOR

Defines a new operator.

Synopsis

1
2
3
4
5
6
7
8
CREATE OPERATOR <name> (
       PROCEDURE = <funcname>
       [, LEFTARG = <lefttype>] [, RIGHTARG = <righttype>]
       [, COMMUTATOR = <com_op>] [, NEGATOR = <neg_op>]
       [, RESTRICT = <res_proc>] [, JOIN = <join_proc>]
       [, HASHES] [, MERGES]
       [, SORT1 = <left_sort_op>] [, SORT2 = <right_sort_op>]
       [, LTCMP = <less_than_op>] [, GTCMP = <greater_than_op>] )

Description

CREATE OPERATOR defines a new operator. The user who defines an operator becomes its owner.

The operator name is a sequence of up to NAMEDATALEN-1 (63 by default) characters from the following list: + - * / < > = ~ ! @ # % ^ & | ` ?

There are a few restrictions on your choice of name:

-- and /* cannot appear anywhere in an operator name, since they will be taken as the start of a comment. A multicharacter operator name cannot end in + or -, unless the name also contains at least one of these characters: ~ ! @ # % ^ & | ` ? For example, @- is an allowed operator name, but *- is not. This restriction allows HAWQ to parse SQL-compliant commands without requiring spaces between tokens.

The operator != is mapped to <> on input, so these two names are always equivalent.

At least one of LEFTARG and RIGHTARG must be defined. For binary operators, both must be defined. For right unary operators, only LEFTARG should be defined, while for left unary operators only RIGHTARG should be defined.

The <funcname> procedure must have been previously defined using CREATE FUNCTION, must be IMMUTABLE, and must be defined to accept the correct number of arguments (either one or two) of the indicated types.

The other clauses specify optional operator optimization clauses. These clauses should be provided whenever appropriate to speed up queries that use the operator. But if you provide them, you must be sure that they are correct. Incorrect use of an optimization clause can result in server process crashes, subtly wrong output, or other unexpected results. You can always leave out an optimization clause if you are not sure about it.

Parameters

1
<name>

The (optionally schema-qualified) name of the operator to be defined. Two operators in the same schema can have the same name if they operate on different data types.

1
<funcname>

The function used to implement this operator (must be an IMMUTABLE function).

1
<lefttype>

The data type of the operator’s left operand, if any. This option would be omitted for a left-unary operator.

1
<righttype>

The data type of the operator’s right operand, if any. This option would be omitted for a right-unary operator.

1
<com_op>

The optional COMMUTATOR clause names an operator that is the commutator of the operator being defined. We say that operator A is the commutator of operator B if (x A y) equals (y B x) for all possible input values x, y. Notice that B is also the commutator of A. For example, operators < and > for a particular data type are usually each other's commutators, and operator + is usually commutative with itself. But operator - is usually not commutative with anything. The left operand type of a commutable operator is the same as the right operand type of its commutator, and vice versa. So the name of the commutator operator is all that needs to be provided in the COMMUTATOR clause.

1
<neg_op>

The optional NEGATOR clause names an operator that is the negator of the operator being defined. We say that operator A is the negator of operator B if both return Boolean results and (x A y) equals NOT (x B y) for all possible inputs x, y. Notice that B is also the negator of A. For example, < and >= are a negator pair for most data types. An operator’s negator must have the same left and/or right operand types as the operator to be defined, so only the operator name need be given in the NEGATOR clause.

1
<res_proc>

The optional RESTRICT clause names a restriction selectivity estimation function for the operator. Note that this is a function name, not an operator name. RESTRICT clauses only make sense for binary operators that return boolean. The idea behind a restriction selectivity estimator is to guess what fraction of the rows in a table will satisfy a WHERE-clause condition of the form:

1
column OP constant

for the current operator and a particular constant value. This assists the optimizer by giving it some idea of how many rows will be eliminated by WHERE clauses that have this form.

You can usually just use one of the following system standard estimator functions for many of your own operators:

1
2
3
4
5
6
7
eqsel for =

neqsel for <>

scalarltsel for < or <=

scalargtsel for > or >=
1
<join_proc>

The optional JOIN clause names a join selectivity estimation function for the operator. Note that this is a function name, not an operator name. JOIN clauses only make sense for binary operators that return boolean. The idea behind a join selectivity estimator is to guess what fraction of the rows in a pair of tables will satisfy a WHERE-clause condition of the form: table1.column1 OP table2.column2 for the current operator. This helps the optimizer by letting it figure out which of several possible join sequences is likely to take the least work.

You can usually just use one of the following system standard join selectivity estimator functions for many of your own operators:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
eqjoinsel for =

neqjoinsel for <>

scalarltjoinsel for < or <=

scalargtjoinsel for > or >=

areajoinsel for 2D area-based comparisons

positionjoinsel for 2D position-based comparisons

contjoinsel for 2D containment-based comparisons
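
As a hedged sketch tying the RESTRICT and JOIN clauses together, here is an equality operator for a hypothetical type mytype, implemented by a hypothetical IMMUTABLE function mytype_eq, that reuses the standard estimators:

1
2
3
4
5
6
7
8
CREATE OPERATOR = (
    leftarg = mytype,
    rightarg = mytype,
    procedure = mytype_eq,
    commutator = =,
    restrict = eqsel,
    join = eqjoinsel
);
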
1
HASHES

The optional HASHES clause tells the system that it is permissible to use the hash join method for a join based on this operator. HASHES only makes sense for a binary operator that returns boolean. The hash join operator can only return true for pairs of left and right values that hash to the same hash code. If two values get put in different hash buckets, the join will never compare them at all, implicitly assuming that the result of the join operator must be false. So it never makes sense to specify HASHES for operators that do not represent equality.

To be marked HASHES, the join operator must appear in a hash index operator class. Attempts to use the operator in hash joins will fail at run time if no such operator class exists. The system needs the operator class to find the data-type-specific hash function for the operator’s input data type. You must also supply a suitable hash function before you can create the operator class. Care should be exercised when preparing a hash function, because there are machine-dependent ways in which it might fail to do the right thing.

1
MERGES

The MERGES clause, if present, tells the system that it is permissible to use the merge-join method for a join based on this operator. MERGES only makes sense for a binary operator that returns boolean, and in practice the operator must represent equality for some data type or pair of data types. Merge join is based on the idea of sorting the left- and right-hand tables into order and then scanning them in parallel. So, both data types must be capable of being fully ordered, and the join operator must be one that can only succeed for pairs of values that fall at the same place in the sort order. In practice this means that the join operator must behave like equality. It is possible to merge-join two distinct data types so long as they are logically compatible. For example, the smallint-versus-integer equality operator is merge-joinable. We only need sorting operators that will bring both data types into a logically compatible sequence.

Execution of a merge join requires that the system be able to identify four operators related to the merge-join equality operator: less-than comparison for the left operand data type, less-than comparison for the right operand data type, less-than comparison between the two data types, and greater-than comparison between the two data types. It is possible to specify these operators individually by name, as the SORT1, SORT2, LTCMP, and GTCMP options respectively. The system will fill in the default names if any of these are omitted when MERGES is specified.

1
<left_sort_op>

If this operator can support a merge join, the less-than operator that sorts the left-hand data type of this operator. < is the default if not specified.

1
<right_sort_op>

If this operator can support a merge join, the less-than operator that sorts the right-hand data type of this operator. < is the default if not specified.

1
<less_than_op>

If this operator can support a merge join, the less-than operator that compares the input data types of this operator. < is the default if not specified.

1
<greater_than_op>

If this operator can support a merge join, the greater-than operator that compares the input data types of this operator. > is the default if not specified. To give a schema-qualified operator name in optional arguments, use the OPERATOR() syntax, for example:

1
COMMUTATOR = OPERATOR(myschema.===) ,

Notes

Any functions used to implement the operator must be defined as IMMUTABLE.

Examples

Here is an example of creating an operator for adding two complex numbers, assuming we have already created the definition of type complex. First define the function that does the work, then define the operator:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE FUNCTION complex_add(complex, complex)
    RETURNS complex
    AS 'filename', 'complex_add'
    LANGUAGE C IMMUTABLE STRICT;
CREATE OPERATOR + (
    leftarg = complex,
    rightarg = complex,
    procedure = complex_add,
    commutator = +
);

To use this operator in a query:

1
SELECT (a + b) AS c FROM test_complex;

Compatibility

CREATE OPERATOR is a HAWQ language extension. The SQL standard does not provide for user-defined operators.

CREATE OPERATOR CLASS

Defines a new operator class.

Synopsis

1
2
3
4
5
6
CREATE OPERATOR CLASS <name> [DEFAULT] FOR TYPE <data_type>
  USING <index_method> AS
  { OPERATOR <strategy_number> <op_name> [(<op_type>, <op_type>)] [RECHECK]
  | FUNCTION <support_number> <funcname> (<argument_type> [, ...])
  | STORAGE <storage_type>
  } [, ... ]


CREATE RESOURCE QUEUE

Defines a new resource queue.

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
CREATE RESOURCE QUEUE <name> WITH (<queue_attribute>=<value> [, ... ])

where <queue_attribute> is:

    PARENT=<queue_name>
    MEMORY_LIMIT_CLUSTER=<percentage>
    CORE_LIMIT_CLUSTER=<percentage>
    [ACTIVE_STATEMENTS=<integer>]
    [ALLOCATION_POLICY='even']
    [VSEG_RESOURCE_QUOTA='mem:<memory_units>']
    [RESOURCE_OVERCOMMIT_FACTOR=<double>]
    [NVSEG_UPPER_LIMIT=<integer>]
    [NVSEG_LOWER_LIMIT=<integer>]
    [NVSEG_UPPER_LIMIT_PERSEG=<double>]
    [NVSEG_LOWER_LIMIT_PERSEG=<double>]
     <memory_units> ::= {128mb|256mb|512mb|1024mb|2048mb|4096mb|
                        8192mb|16384mb|1gb|2gb|4gb|8gb|16gb}
     <percentage> ::= <integer>%

Description

Creates a new resource queue for HAWQ workload management. A resource queue must specify a parent queue. Only a superuser can create a resource queue.

Resource queues with an ACTIVE_STATEMENTS threshold set a maximum limit on the number of queries that can be executed by roles assigned to that queue. It controls the number of active queries that are allowed to run at the same time. The value for ACTIVE_STATEMENTS should be an integer greater than 0. If not specified, the default value is 20.

When creating the resource queue, use the MEMORY_LIMIT_CLUSTER and CORE_LIMIT_CLUSTER queue attributes to tune the allowed resource usage of the resource queue. MEMORY_LIMIT_CLUSTER and CORE_LIMIT_CLUSTER must be set to the same value for a resource queue. In addition, the sum of the percentages of MEMORY_LIMIT_CLUSTER (and CORE_LIMIT_CLUSTER) for resource queues that share the same parent cannot exceed 100%.

You can optionally configure the maximum or minimum number of virtual segments to use when executing a query by setting NVSEG_UPPER_LIMIT/NVSEG_LOWER_LIMIT or NVSEG_UPPER_LIMIT_PERSEG/NVSEG_LOWER_LIMIT_PERSEG attributes for the resource queue.

After defining a resource queue, you can assign a role to the queue by using the ALTER ROLE or CREATE ROLE command. You can only assign roles to the leaf-level resource queues (resource queues that do not have any children.)

Parameters

1
<name>

Required. The name of the resource queue. The name must not already be in use and must not be pg_default or pg_root.

1
PARENT=<queue_name>

Required. The parent queue of the new resource queue. The parent queue must already exist. This attribute is used to organize resource queues into a tree structure. You cannot specify pg_default as a parent queue. Resource queues that are parents to other resource queues are also called branch queues. Resource queues without any children are also called leaf queues. If you do not have any existing resource queues, use pg_root as the starting point for new resource queues.

The parent queue cannot have any roles assigned.
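
For example, the following sketch (queue names are illustrative) creates a branch queue under pg_root and a leaf queue under that branch; roles may then be assigned only to the leaf queue:

1
2
3
4
CREATE RESOURCE QUEUE dept_queue WITH (PARENT='pg_root',
    MEMORY_LIMIT_CLUSTER=40%, CORE_LIMIT_CLUSTER=40%);
CREATE RESOURCE QUEUE dept_adhoc WITH (PARENT='dept_queue',
    MEMORY_LIMIT_CLUSTER=50%, CORE_LIMIT_CLUSTER=50%, ACTIVE_STATEMENTS=10);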

1
MEMORY_LIMIT_CLUSTER=<percentage>

Required. Defines how much memory a resource queue can consume from its parent resource queue and consequently dispatch to the execution of parallel statements. Since a resource queue obtains its memory from its parent, the actual memory limit is based from its parent queue. The valid values are 1% to 100%. The value of MEMORY_LIMIT_CLUSTER must be identical to the value of CORE_LIMIT_CLUSTER. The sum of values for MEMORY_LIMIT_CLUSTER of this queue plus other queues that share the same parent cannot exceed 100%. The HAWQ resource manager periodically validates this restriction.

1
CORE_LIMIT_CLUSTER=<percentage>

Required. The percentage of consumable CPU (virtual core) resources that the resource queue can take from its parent resource queue. The valid values are 1% to 100%. The value of MEMORY_LIMIT_CLUSTER must be identical to the value of CORE_LIMIT_CLUSTER. The sum of values for CORE_LIMIT_CLUSTER of this queue and queues that share the same parent cannot exceed 100%.

1
ACTIVE_STATEMENTS=<integer>

Optional. Defines the limit of the number of parallel active statements in one leaf queue. The maximum number of connections cannot exceed this limit. If this limit is reached, the HAWQ resource manager queues more query allocation requests. Note that a single session can have several concurrent statement executions that occupy multiple connection resources. The value for ACTIVE_STATEMENTS should be an integer greater than 0. The default value is 20.

1
ALLOCATION_POLICY=<string>

Optional. Defines the resource allocation policy for parallel statement execution. The default value is even.

Note: This release only supports an even allocation policy. Even if you do not specify this attribute, the resource queue still applies an even allocation policy. Future releases will support alternative allocation policies.

Setting the allocation policy to even means resources are always evenly dispatched based on current concurrency. When multiple query resource allocation requests are queued, the resource queue tries to evenly dispatch resources to queued requests until one of the following conditions is encountered:

  • There are no more allocated resources in this queue to dispatch, or
  • The ACTIVE_STATEMENTS limit has been reached

For each query resource allocation request, the HAWQ resource manager determines the minimum and maximum size of a virtual segment based on multiple factors including query cost, user configuration, table properties, and so on. For example, a hash distributed table requires fixed size of virtual segments. With an even allocation policy, the HAWQ resource manager uses the minimum virtual segment size requirement and evenly dispatches resources to each query resource allocation request in the resource queue.

1
VSEG_RESOURCE_QUOTA='mem:{128mb | 256mb | 512mb | 1024mb | 2048mb | 4096mb | 8192mb | 16384mb | 1gb | 2gb | 4gb | 8gb | 16gb}'

Optional. This quota defines how resources are split across multiple virtual segments. For example, when the HAWQ resource manager determines that 256GB memory and 128 vcores should be allocated to the current resource queue, there are multiple solutions on how to divide the resources across virtual segments. For example, you could use a) 2GB/1 vcore * 128 virtual segments or b) 1GB/0.5 vcore * 256 virtual segments. Therefore, you can use this attribute to make the HAWQ resource manager calculate the number of virtual segments based on how to divide the memory. For example, if VSEG_RESOURCE_QUOTA='mem:512mb', then the resource queue will use 512MB/0.25 vcore * 512 virtual segments. The default value is 'mem:256mb'.

Note: To avoid resource fragmentation, make sure that the segment resource capacity configured for HAWQ (in HAWQ Standalone mode: hawq_rm_memory_limit_perseg; in YARN mode: yarn.nodemanager.resource.memory-mb) is a multiple of the resource quotas for all virtual segments, and that the CPU-to-memory ratio is a multiple of the amount configured for yarn.scheduler.minimum-allocation-mb.
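
An illustrative sketch (the queue name is hypothetical) of a leaf queue whose virtual segments are sized at 512MB each:

1
2
3
CREATE RESOURCE QUEUE etl_queue WITH (PARENT='pg_root',
    MEMORY_LIMIT_CLUSTER=30%, CORE_LIMIT_CLUSTER=30%,
    VSEG_RESOURCE_QUOTA='mem:512mb');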

1
RESOURCE_OVERCOMMIT_FACTOR=<double>

Optional. This factor defines how much a resource can be overcommitted. For example, if RESOURCE_OVERCOMMIT_FACTOR is set to 3.0 and MEMORY_LIMIT_CLUSTER is set to 30%, then the maximum possible resource allocation in this queue is 90% (30% x 3.0). If the resulting maximum is bigger than 100%, then 100% is adopted. The minimum value that this attribute can be set to is 1.0. The default value is 2.0.

1
NVSEG_UPPER_LIMIT=<integer> / NVSEG_UPPER_LIMIT_PERSEG=<double>

Optional. These limits restrict the range of the number of virtual segments allocated in this resource queue for executing one query statement. NVSEG_UPPER_LIMIT defines an upper limit of virtual segments for one statement execution regardless of actual cluster size, while NVSEG_UPPER_LIMIT_PERSEG defines the same limit by using the average number of virtual segments in one physical segment. Therefore, the limit defined by NVSEG_UPPER_LIMIT_PERSEG varies dynamically according to the changing size of the HAWQ cluster. For example, if you set NVSEG_UPPER_LIMIT=10, all query resource requests are strictly allocated no more than 10 virtual segments. If you set NVSEG_UPPER_LIMIT_PERSEG=2 and assume that there are currently 5 available HAWQ segments in the cluster, query resource requests are allocated 10 virtual segments at the most.

NVSEG_UPPER_LIMIT cannot be set to a lower value than NVSEG_LOWER_LIMIT if both limits are enabled. In addition, the upper limit cannot be set to a value larger than the values set in the global configuration parameters hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit.

By default, both limits are set to -1, which means the limits are disabled. NVSEG_UPPER_LIMIT has higher priority than NVSEG_UPPER_LIMIT_PERSEG. If both limits are set, then NVSEG_UPPER_LIMIT_PERSEG is ignored. If you have enabled resource quotas for the query statement, then these limits are ignored.

Note: If the actual lower limit of the number of virtual segments becomes greater than the upper limit, then the lower limit is automatically reduced to be equal to the upper limit. This situation is possible when the user sets both NVSEG_UPPER_LIMIT and NVSEG_LOWER_LIMIT_PERSEG. After expanding the cluster, the dynamic lower limit may become greater than the value set for the fixed upper limit.
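
A brief sketch (the queue name is hypothetical) that caps any single statement in the queue at 8 virtual segments:

1
2
3
CREATE RESOURCE QUEUE adhoc_queue WITH (PARENT='pg_root',
    MEMORY_LIMIT_CLUSTER=20%, CORE_LIMIT_CLUSTER=20%,
    NVSEG_UPPER_LIMIT=8);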

1
NVSEG_LOWER_LIMIT=<integer> / NVSEG_LOWER_LIMIT_PERSEG=<double>

Optional. These limits specify the minimum number of virtual segments allocated for one statement execution in order to guarantee query performance. NVSEG_LOWER_LIMIT defines the lower limit of virtual segments for one statement execution regardless of the actual cluster size, while NVSEG_LOWER_LIMIT_PERSEG defines the same limit as the average number of virtual segments per physical segment. Therefore, the limit defined by NVSEG_LOWER_LIMIT_PERSEG varies dynamically along with the size of the HAWQ cluster. NVSEG_UPPER_LIMIT_PERSEG cannot be less than NVSEG_LOWER_LIMIT_PERSEG if both limits are enabled.

For example, if you set NVSEG_LOWER_LIMIT=10, and one statement execution potentially needs no fewer than 10 virtual segments, then this request has at least 10 virtual segments allocated. If you set NVSEG_LOWER_LIMIT_PERSEG=2, assuming there are currently 5 available HAWQ segments in the cluster, and one statement execution potentially needs no fewer than 10 virtual segments, then the query resource request will be allocated at least 10 virtual segments. If one statement execution needs at most 4 virtual segments, the resource manager allocates at most 4 virtual segments instead of 10, since the request does not need more. By default, both limits are set to -1, which means the limits are disabled. NVSEG_LOWER_LIMIT has higher priority than NVSEG_LOWER_LIMIT_PERSEG. If both limits are set, then NVSEG_LOWER_LIMIT_PERSEG is ignored. If you have enabled resource quotas for the query statement, then these limits are ignored.

Note: If the actual lower limit of the number of virtual segments becomes greater than the upper limit, then the lower limit is automatically reduced to be equal to the upper limit. This situation is possible when the user sets both NVSEG_UPPER_LIMIT and NVSEG_LOWER_LIMIT_PERSEG. After expanding the cluster, the dynamic lower limit may become greater than the value set for the fixed upper limit.

Notes

To check the configuration of a resource queue, you can query the pg_resqueue catalog table. To see the runtime status of all resource queues, you can use the pg_resqueue_status view. See Checking Existing Resource Queues.
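
For example:

1
2
SELECT * FROM pg_resqueue;
SELECT * FROM pg_resqueue_status;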

CREATE RESOURCE QUEUE cannot be run within a transaction.

To see the status of a resource queue, see Checking Existing Resource Queues.

Examples

Create a resource queue as a child of pg_root with an active query limit of 20 and memory and core limits of 50%:

1
2
CREATE RESOURCE QUEUE myqueue WITH (PARENT='pg_root', ACTIVE_STATEMENTS=20,
MEMORY_LIMIT_CLUSTER=50%, CORE_LIMIT_CLUSTER=50%);

Create a resource queue as a child of pg_root with memory and CPU limits and a resource overcommit factor:

1
2
CREATE RESOURCE QUEUE test_queue_1 WITH (PARENT='pg_root',
MEMORY_LIMIT_CLUSTER=50%, CORE_LIMIT_CLUSTER=50%, RESOURCE_OVERCOMMIT_FACTOR=2);

Compatibility

CREATE RESOURCE QUEUE is a HAWQ extension. There is no provision for resource queues or workload management in the SQL standard.

CREATE ROLE

Defines a new database role (user or group).

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
CREATE ROLE <name> [[WITH] <option> [ ... ]]
where <option> can be:

      SUPERUSER | NOSUPERUSER
    | CREATEDB | NOCREATEDB
    | CREATEROLE | NOCREATEROLE
    | CREATEEXTTABLE | NOCREATEEXTTABLE
      [ ( <attribute>='<value>'[, ...] ) ]
           where attribute and value are:
           type='readable'|'writable'
           protocol='gpfdist'|'http'
    | INHERIT | NOINHERIT
    | LOGIN | NOLOGIN
    | CONNECTION LIMIT <connlimit>
    | [ ENCRYPTED | UNENCRYPTED ] PASSWORD '<password>'
    | VALID UNTIL '<timestamp>'
    | IN ROLE <rolename> [, ...]
    | ROLE <rolename> [, ...]
    | ADMIN <rolename> [, ...]
    | RESOURCE QUEUE <queue_name>
    | [ DENY <deny_point> ]
    | [ DENY BETWEEN <deny_point> AND <deny_point>]

Description

CREATE ROLE adds a new role to a HAWQ system. A role is an entity that can own database objects and have database privileges. A role can be considered a user, a group, or both depending on how it is used. You must have CREATEROLE privilege or be a database superuser to use this command.

Note that roles are defined at the system-level and are valid for all databases in your HAWQ system.

Parameters

1
<name>

The name of the new role.

1
2
SUPERUSER,
NOSUPERUSER

If SUPERUSER is specified, the role being defined will be a superuser, who can override all access restrictions within the database. Superuser status is dangerous and should be used only when really needed. You must yourself be a superuser to create a new superuser. NOSUPERUSER is the default.

1
2
CREATEDB,
NOCREATEDB

If CREATEDB is specified, the role being defined will be allowed to create new databases. NOCREATEDB (the default) will deny a role the ability to create databases.

1
2
CREATEROLE,
NOCREATEROLE

If CREATEROLE is specified, the role being defined will be allowed to create new roles, alter other roles, and drop other roles. NOCREATEROLE (the default) will deny a role the ability to create roles or modify roles other than their own.

1
2
CREATEEXTTABLE,
NOCREATEEXTTABLE

If CREATEEXTTABLE is specified, the role being defined is allowed to create external tables. The default type is readable and the default protocol is gpfdist if not specified. NOCREATEEXTTABLE (the default) denies the role the ability to create external tables. Using the file protocol when creating external tables is not supported, because HAWQ cannot guarantee scheduling executors on a specific host. Likewise, you cannot use the EXECUTE command with ON ALL and ON HOST for the same reason. Use the ON MASTER/<number>/SEGMENT <segment_id> syntax to specify which segment instances are to execute the command.
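
For example, a sketch (the role name is illustrative) of a role that may create readable gpfdist external tables:

1
2
CREATE ROLE etl_loader LOGIN
    CREATEEXTTABLE (type='readable', protocol='gpfdist');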

1
2
INHERIT,
NOINHERIT

If specified, INHERIT (the default) allows the role to use whatever database privileges have been granted to all roles it is directly or indirectly a member of. With NOINHERIT, membership in another role only grants the ability to SET ROLE to that other role.

1
2
LOGIN,
NOLOGIN

If specified, LOGIN allows a role to log in to a database. A role having the LOGIN attribute can be thought of as a user. Roles with NOLOGIN (the default) are useful for managing database privileges, and can be thought of as groups.

1
CONNECTION LIMIT <connlimit>

The maximum number of concurrent connections this role can make. The default of -1 means there is no limit.

1
PASSWORD <password>

Sets the user password for roles with the LOGIN attribute. If you do not plan to use password authentication you can omit this option. If no <password> is specified, the password will be set to null and password authentication will always fail for that user. A null password can optionally be written explicitly as PASSWORD NULL.

1
2
ENCRYPTED,
UNENCRYPTED

These key words control whether the password is stored encrypted in the system catalogs. (If neither is specified, the default behavior is determined by the configuration parameter password_encryption.) If the presented password string is already in MD5-encrypted format, then it is stored encrypted as-is, regardless of whether ENCRYPTED or UNENCRYPTED is specified (since the system cannot decrypt the specified encrypted password string). This allows reloading of encrypted passwords during dump/restore. Note that older clients may lack support for the MD5 authentication mechanism that is needed to work with passwords that are stored encrypted.

1
VALID UNTIL '<timestamp>'

The VALID UNTIL clause sets a date and time after which the role’s password is no longer valid. If this clause is omitted the password will never expire.

1
IN ROLE <rolename>

Adds the new role as a member of the named roles. Note that there is no option to add the new role as an administrator; use a separate GRANT command to do that.

1
ROLE <rolename>

Adds the named roles as members of this role, making this new role a group.

1
ADMIN <rolename>

The ADMIN clause is like ROLE, but the named roles are added to the new role WITH ADMIN OPTION, giving them the right to grant membership in this role to others.

1
RESOURCE QUEUE <queue_name>

The name of the resource queue to which the new user-level role is to be assigned. Only roles with LOGIN privilege can be assigned to a resource queue. The special keyword NONE means that the role is assigned to the default resource queue. A role can only belong to one resource queue.

1
2
DENY <deny_point>,
DENY BETWEEN <deny_point> AND <deny_point>

The DENY and DENY BETWEEN keywords set time-based constraints that are enforced at login. DENY sets a day or a day and time to deny access. DENY BETWEEN sets an interval during which access is denied. Both use the parameter <deny_point>, which has the following format:

1
DAY <day> [ TIME '<time>' ]

The two parts of the <deny_point> parameter use the following formats:

For <day>:

1
2
{'Sunday' | 'Monday' | 'Tuesday' | 'Wednesday' | 'Thursday' | 'Friday' |
'Saturday' | 0-6 }

For <time>:

1
{ 00-23 : 00-59 | 01-12 : 00-59 { AM | PM }}

The DENY BETWEEN clause uses two <deny_point> parameters:

1
DENY BETWEEN <deny_point> AND <deny_point>
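
A hedged sketch (the role name is illustrative) that denies logins from Friday evening through the end of Sunday:

1
2
CREATE ROLE batch_runner LOGIN
    DENY BETWEEN DAY 'Friday' TIME '18:00' AND DAY 'Sunday' TIME '23:59';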

Notes

The preferred way to add and remove role members (manage groups) is to use GRANT and REVOKE.

The VALID UNTIL clause defines an expiration time for a password only, not for the role. The expiration time is not enforced when logging in using a non-password-based authentication method.

The INHERIT attribute governs inheritance of grantable privileges (access privileges for database objects and role memberships). It does not apply to the special role attributes set by CREATE ROLE and ALTER ROLE. For example, being a member of a role with CREATEDB privilege does not immediately grant the ability to create databases, even if INHERIT is set.

The INHERIT attribute is the default for reasons of backwards compatibility. In prior releases of HAWQ, users always had access to all privileges of groups they were members of. However, NOINHERIT provides a closer match to the semantics specified in the SQL standard.

Be careful with the CREATEROLE privilege. There is no concept of inheritance for the privileges of a CREATEROLE-role. That means that even if a role does not have a certain privilege but is allowed to create other roles, it can easily create another role with different privileges than its own (except for creating roles with superuser privileges). For example, if a role has the CREATEROLE privilege but not the CREATEDB privilege, it can create a new role with the CREATEDB privilege. Therefore, regard roles that have the CREATEROLE privilege as almost-superuser-roles.

The CONNECTION LIMIT option is never enforced for superusers.

Caution must be exercised when specifying an unencrypted password with this command. The password will be transmitted to the server in clear text, and it might also be logged in the client's command history or the server log. The client program createuser, however, transmits the password encrypted. Also, psql contains a \password command that can be used to safely change the password later.

Examples

Create a role that can log in, but don’t give it a password:

1
CREATE ROLE jonathan LOGIN;

Create a role that belongs to a resource queue:

1
CREATE ROLE jonathan LOGIN RESOURCE QUEUE poweruser;

Create a role with a password that is valid until the end of 2009 (CREATE USER is the same as CREATE ROLE except that it implies LOGIN):

1
CREATE USER joelle WITH PASSWORD 'jw8s0F4' VALID UNTIL '2010-01-01';

Create a role that can create databases and manage other roles:

1
CREATE ROLE admin WITH CREATEDB CREATEROLE;

Create a role that does not allow login access on Sundays:

1
CREATE ROLE user3 DENY DAY 'Sunday';

Compatibility

The SQL standard defines the concepts of users and roles, but it regards them as distinct concepts and leaves all commands defining users to be specified by the database implementation. In HAWQ, users and roles are unified into a single type of object. Roles therefore have many more optional attributes than they do in the standard.

CREATE ROLE is in the SQL standard, but the standard only requires the syntax:

1
CREATE ROLE <name> [WITH ADMIN <rolename>]

Allowing multiple initial administrators, and all the other options of CREATE ROLE, are HAWQ extensions.

The behavior specified by the SQL standard is most closely approximated by giving users the NOINHERIT attribute, while roles are given the INHERIT attribute.

CREATE SCHEMA

Defines a new schema.

Synopsis

1
2
3
4
CREATE SCHEMA <schema_name> [AUTHORIZATION <username>]
[<schema_element> [ ... ]]

CREATE SCHEMA AUTHORIZATION <rolename> [<schema_element> [ ... ]]

Description

CREATE SCHEMA enters a new schema into the current database. The schema name must be distinct from the name of any existing schema in the current database.

A schema is essentially a namespace: it contains named objects (tables, data types, functions, and operators) whose names may duplicate those of other objects existing in other schemas. Named objects are accessed either by qualifying their names with the schema name as a prefix, or by setting a search path that includes the desired schema(s). A CREATE command specifying an unqualified object name creates the object in the current schema (the one at the front of the search path, which can be determined with the function current_schema).

Optionally, CREATE SCHEMA can include subcommands to create objects within the new schema. The subcommands are treated essentially the same as separate commands issued after creating the schema, except that if the AUTHORIZATION clause is used, all the created objects will be owned by that role.
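
For example, a schema and some of its initial objects can be created in a single statement (a sketch; the schema, table, and view names are illustrative):

CREATE SCHEMA hollywood
    CREATE TABLE films (title text, release date, awards text[])
    CREATE VIEW winners AS
        SELECT title, release FROM films WHERE awards IS NOT NULL;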

Parameters

1
<schema_name>

The name of a schema to be created. If this is omitted, the user name is used as the schema name. The name cannot begin with pg_, as such names are reserved for system catalog schemas.

1
<rolename>

The name of the role that will own the schema. If omitted, defaults to the role executing the command. Only superusers may create schemas owned by roles other than themselves.

1
<schema_element>

An SQL statement defining an object to be created within the schema. Currently, only CREATE TABLE, CREATE VIEW, CREATE INDEX, CREATE SEQUENCE, and GRANT are accepted as clauses within CREATE SCHEMA. Other kinds of objects may be created in separate commands after the schema is created.

Notes

To create a schema, the invoking user must have the CREATE privilege for the current database or be a superuser.

Examples

Create a schema:

1
CREATE SCHEMA myschema;

Create a schema for role joe (the schema will also be named joe):

1
CREATE SCHEMA AUTHORIZATION joe;

Compatibility

The SQL standard allows a DEFAULT CHARACTER SET clause in CREATE SCHEMA, as well as more subcommand types than are presently accepted by HAWQ.

The SQL standard specifies that the subcommands in CREATE SCHEMA may appear in any order. The present HAWQ implementation does not handle all cases of forward references in subcommands; it may sometimes be necessary to reorder the subcommands in order to avoid forward references.

According to the SQL standard, the owner of a schema always owns all objects within it. HAWQ allows schemas to contain objects owned by users other than the schema owner. This can happen only if the schema owner grants the CREATE privilege on the schema to someone else.

CREATE SEQUENCE

Defines a new sequence generator.

Synopsis

1
2
3
4
5
6
7
8
CREATE [TEMPORARY | TEMP] SEQUENCE <name>
    [INCREMENT [BY] <increment>]
    [MINVALUE <minvalue> | NO MINVALUE]
    [MAXVALUE <maxvalue> | NO MAXVALUE]
    [START [ WITH ] <start>]
    [CACHE <cache>]
    [[NO] CYCLE]
    [OWNED BY { <table>.<column> | NONE }]

Description

CREATE SEQUENCE creates a new sequence number generator. This involves creating and initializing a new special single-row table. The generator will be owned by the user issuing the command.

If a schema name is given, then the sequence is created in the specified schema. Otherwise it is created in the current schema. Temporary sequences exist in a special schema, so a schema name may not be given when creating a temporary sequence. The sequence name must be distinct from the name of any other sequence, table, or view in the same schema.

After a sequence is created, you use the nextval function to operate on the sequence. For example, to insert a row into a table that gets the next value of a sequence:

1
INSERT INTO distributors VALUES (nextval('myserial'), 'acme');

You can also use the function setval to operate on a sequence, but only for queries that do not operate on distributed data. For example, the following query is allowed because it resets the sequence counter value for the sequence generator process on the master:

1
SELECT setval('myserial', 201);

But the following query will be rejected in HAWQ because it operates on distributed data:

1
INSERT INTO product VALUES (setval('myserial', 201), 'gizmo');

In a regular (non-distributed) database, functions that operate on the sequence go to the local sequence table to get values as they are needed. In HAWQ, however, keep in mind that each segment is its own distinct database process. Therefore the segments need a single point of truth to go to for sequence values, so that all segments get incremented correctly and the sequence moves forward in the right order. A sequence server process runs on the master and is the point of truth for a sequence in a HAWQ distributed database. Segments get sequence values at runtime from the master.

Because of this distributed sequence design, there are some limitations on the functions that operate on a sequence in HAWQ:

  • lastval and currval functions are not supported.
  • setval can only be used to set the value of the sequence generator on the master; it cannot be used in subqueries to update records in distributed table data.
  • nextval sometimes grabs a block of values from the master for a segment to use, depending on the query. So values may sometimes be skipped in the sequence if all of the block turns out not to be needed at the segment level. Note that a regular PostgreSQL database does this too, so this is not something unique to HAWQ.

Although you cannot update a sequence directly, you can use a query like:

1
SELECT * FROM <sequence_name>;

to examine the parameters and current state of a sequence. In particular, the last_value field of the sequence shows the last value allocated by any session.

Parameters

1
TEMPORARY | TEMP

If specified, the sequence object is created only for this session, and is automatically dropped on session exit. Existing permanent sequences with the same name are not visible (in this session) while the temporary sequence exists, unless they are referenced with schema-qualified names.

1
<name>

The name (optionally schema-qualified) of the sequence to be created.

1
<increment>

Specifies which value is added to the current sequence value to create a new value. A positive value will make an ascending sequence, a negative one a descending sequence. The default value is 1.

1
2
<minvalue>
NO MINVALUE

Determines the minimum value a sequence can generate. If this clause is not supplied or NO MINVALUE is specified, then defaults will be used. The defaults are 1 and -2^63-1 for ascending and descending sequences, respectively.

1
2
<maxvalue>
NO MAXVALUE

Determines the maximum value for the sequence. If this clause is not supplied or NO MAXVALUE is specified, then default values will be used. The defaults are 2^63-1 and -1 for ascending and descending sequences, respectively.

1
<start>

Allows the sequence to begin anywhere. The default starting value is <minvalue> for ascending sequences and <maxvalue> for descending ones.

1
<cache>

Specifies how many sequence numbers are to be preallocated and stored in memory for faster access. The minimum (and default) value is 1 (no cache).

1
2
CYCLE
NO CYCLE

Allows the sequence to wrap around when the <maxvalue> (for ascending) or <minvalue> (for descending) has been reached. If the limit is reached, the next number generated will be the <minvalue> (for ascending) or <maxvalue> (for descending). If NO CYCLE is specified, any calls to nextval after the sequence has reached its maximum value will return an error. If not specified, NO CYCLE is the default.

1
2
OWNED BY <table>.<column>
OWNED BY NONE

Causes the sequence to be associated with a specific table column, such that if that column (or its whole table) is dropped, the sequence will be automatically dropped as well. The specified table must have the same owner and be in the same schema as the sequence. OWNED BY NONE, the default, specifies that there is no such association.
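
For example, a sequence can be tied to a column so that dropping the column (or its table) also drops the sequence (a sketch; the orders table and its id column are assumed to exist and to share the sequence’s owner and schema):

CREATE SEQUENCE order_id_seq OWNED BY orders.id;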

Notes

Sequences are based on bigint arithmetic, so the range cannot exceed the range of an eight-byte integer (-9223372036854775808 to 9223372036854775807).

Although multiple sessions are guaranteed to allocate distinct sequence values, the values may be generated out of sequence when all the sessions are considered. For example, session A might reserve values 1..10 and return nextval=1, then session B might reserve values 11..20 and return nextval=11 before session A has generated nextval=2. Thus, you should only assume that the nextval values are all distinct, not that they are generated purely sequentially. Also, last_value will reflect the latest value reserved by any session, whether or not it has yet been returned by nextval.

Examples

Create a sequence named myseq:

1
CREATE SEQUENCE myseq START 101;

Insert a row into a table that gets the next value:

1
INSERT INTO distributors VALUES (nextval('myseq'), 'acme');

Reset the sequence counter value on the master:

1
SELECT setval('myseq', 201);

Illegal use of setval in HAWQ (setting sequence values on distributed data):

1
INSERT INTO product VALUES (setval('myseq', 201), 'gizmo');

Compatibility

CREATE SEQUENCE conforms to the SQL standard, with the following exceptions:

  • The AS data_type expression specified in the SQL standard is not supported.
  • Obtaining the next value is done using the nextval() function instead of the NEXT VALUE FOR expression specified in the SQL standard.
  • The OWNED BY clause is a HAWQ extension.

CREATE TABLE

Defines a new table.

Synopsis

CREATE [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE <table_name> (
[ { <column_name> <data_type> [ DEFAULT <default_expr> ]
   [<column_constraint> [ ... ]
[ ENCODING ( <storage_directive> [,...] ) ]
]
   | <table_constraint>
   | LIKE <other_table> [{INCLUDING | EXCLUDING}
                      {DEFAULTS | CONSTRAINTS}] ...} ]
    [, ... ] ]
    [<column_reference_storage_directive> [, …] ]
    )
    [ INHERITS ( <parent_table> [, ... ] ) ]
    [ WITH ( <storage_parameter>=<value> [, ... ] ) ]
    [ ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP} ]
    [ TABLESPACE <tablespace> ]
    [ DISTRIBUTED BY (<column>, [ ... ] ) | DISTRIBUTED RANDOMLY ]
    [ PARTITION BY <partition_type> (<column>)
    [ SUBPARTITION BY <partition_type> (<column>) ]
       [ SUBPARTITION TEMPLATE ( <template_spec> ) ]
    [...]
    ( <partition_spec> )
       | [ SUBPARTITION BY partition_type (<column>) ]
           [...]
    ( <partition_spec>
      [ ( <subpartition_spec>
           [(...)]
         ) ]
    )

 CREATE [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE <table_name> (
 [ { <column_name> <data_type> [ DEFAULT <default_expr> ]
 [ ENCODING ( <storage_directive> [,...] ) ]
     | <table_constraint>
     | LIKE <other_table> [{INCLUDING | EXCLUDING}
                        {DEFAULTS | CONSTRAINTS}] ...} ]
       [, ... ]
       [<column_reference_storage_directive> [, …] ]
       )
     FORMAT 'TEXT'
             [( [HEADER]
             [DELIMITER '<delimiter>' | 'OFF']
             [NULL '<null string>']
             [ESCAPE '<escape>' | 'OFF']
             [NEWLINE 'LF' | 'CR' | 'CRLF']
             [FILL MISSING FIELDS] )]
       | 'CSV'
             [( [HEADER]
             [QUOTE '<quote>']
             [DELIMITER '<delimiter>']
             [NULL '<null string>']
             [FORCE NOT NULL <column> [, ...]]
             [ESCAPE '<escape>']
             [NEWLINE 'LF' | 'CR' | 'CRLF']
             [FILL MISSING FIELDS] )]
       | 'ORC'
             [( [COMPRESSTYPE 'NONE' | 'SNAPPY' | 'LZ4']
             [DICTHRESHOLD [0-1]]
             [BLOOMFILTER [column_name] [, ...]] )]
     [ ENCODING '<encoding>' ]
     [ DISTRIBUTED BY (<column>, [ ... ] ) | DISTRIBUTED RANDOMLY ]
     [ PARTITION BY <partition_type> (<column>)
     [ SUBPARTITION BY <partition_type> (<column>) ]
        [ SUBPARTITION TEMPLATE ( <template_spec> ) ]
     [...]
     ( <partition_spec> )
        | [ SUBPARTITION BY partition_type (<column>) ]
            [...]
     ( <partition_spec>
       [ ( <subpartition_spec>
            [(...)]
          ) ]


where <column_constraint> is:

   [CONSTRAINT <constraint_name>]
   NOT NULL | NULL
   | CHECK ( <expression> )

where <storage_directive> for a column is:

   COMPRESSTYPE={ZLIB | SNAPPY | GZIP | NONE}
  | COMPRESSLEVEL={0-9}
  | BLOCKSIZE={8192-2097152}

where <storage_parameter> for a table is:

   APPENDONLY={TRUE}
   BLOCKSIZE={8192-2097152}
   bucketnum={<x>}
   ORIENTATION={ROW | PARQUET}
   COMPRESSTYPE={ZLIB | SNAPPY | GZIP | NONE}
   COMPRESSLEVEL={0-9}
   FILLFACTOR={10-100}
   OIDS=[TRUE|FALSE]
   PAGESIZE={1024-1073741823}
   ROWGROUPSIZE={1024-1073741823}

and <table_constraint> is:

   [CONSTRAINT <constraint_name>]
   | CHECK ( <expression> )

where <partition_type> is:

    LIST  | RANGE

where <partition_specification> is:

            <partition_element> [, ...]

and <partition_element> is:

   DEFAULT PARTITION <name>
  | [PARTITION <name>] VALUES (<list_value> [,...] )
  | [PARTITION <name>]
     START ([<datatype>] '<start_value>') [INCLUSIVE | EXCLUSIVE]
     [ END ([<datatype>] '<end_value>') [INCLUSIVE | EXCLUSIVE] ]
     [ EVERY ([<datatype>] [<number> | INTERVAL] '<interval_value>') ]
  | [PARTITION <name>]
     END ([<datatype>] '<end_value>') [INCLUSIVE | EXCLUSIVE]
     [ EVERY ([<datatype>] [<number> | INTERVAL] '<interval_value>') ]
 [ WITH ( <partition_storage_parameter>=<value> [, ... ] ) ]
 [<column_reference_storage_directive> [, …] ]
 [ TABLESPACE <tablespace> ]

where <subpartition_spec> or <template_spec> is:

            <subpartition_element> [, ...]
and <subpartition_element> is:

   DEFAULT SUBPARTITION <name>
  | [SUBPARTITION <name>] VALUES (<list_value> [,...] )
  | [SUBPARTITION <name>]
     START ([<datatype>] '<start_value>') [INCLUSIVE | EXCLUSIVE]
     [ END ([<datatype>] '<end_value>') [INCLUSIVE | EXCLUSIVE] ]
     [ EVERY ([<datatype>] [<number> | INTERVAL] '<interval_value>') ]
  | [SUBPARTITION <name>]
     END ([<datatype>] '<end_value>') [INCLUSIVE | EXCLUSIVE]
     [ EVERY ([<datatype>] [<number> | INTERVAL] '<interval_value>') ]
[ WITH ( <partition_storage_parameter>=<value> [, ... ] ) ]
[<column_reference_storage_directive> [, …] ]
[ TABLESPACE <tablespace> ]

where <storage_directive> is:

   COMPRESSTYPE={ZLIB | SNAPPY | GZIP | NONE}
 | COMPRESSLEVEL={0-9}
 | BLOCKSIZE={8192-2097152}

where <column_reference_storage_directive> is:

   COLUMN column_name ENCODING (<storage_directive> [, ... ] ), ...
 |
    DEFAULT COLUMN ENCODING (<storage_directive> [, ... ] )

where <storage_parameter> for a partition is:

   APPENDONLY={TRUE}
   BLOCKSIZE={8192-2097152}
   ORIENTATION={ROW | PARQUET}
   COMPRESSTYPE={ZLIB | SNAPPY | GZIP | NONE}
   COMPRESSLEVEL={0-9}
   FILLFACTOR={10-100}
   OIDS=[TRUE|FALSE]
   PAGESIZE={1024-1073741823}
   ROWGROUPSIZE={1024-1073741823}

Description

CREATE TABLE creates a new, initially empty table in the current database. The table is owned by the user issuing the command. If a schema name is given then the table is created in the specified schema. Otherwise it is created in the current schema. Temporary tables exist in a special schema, so a schema name may not be given when creating a temporary table. The name of the table must be distinct from the name of any other table, external table, sequence, or view in the same schema.

The optional constraint clauses specify conditions that new rows must satisfy for an insert operation to succeed. A constraint is an SQL object that helps define the set of valid values in the table in various ways. Constraints apply to tables, not to partitions. You cannot add a constraint to a partition or subpartition.

There are two ways to define constraints: table constraints and column constraints. A column constraint is defined as part of a column definition. A table constraint definition is not tied to a particular column, and it can encompass more than one column. Every column constraint can also be written as a table constraint; a column constraint is only a notational convenience for use when the constraint only affects one column.

When creating a table, there is an additional clause to declare the HAWQ distribution policy. If a DISTRIBUTED BY clause is not supplied, HAWQ assigns a RANDOM distribution policy to the table, where the rows are distributed based on a round-robin or random distribution. You can also choose to distribute data with a hash-based policy, where the bucketnum attribute sets the number of hash buckets used by a hash-distributed table. Columns of geometric or user-defined data types are not eligible as HAWQ distribution key columns. The number of buckets affects how many virtual segments will be used in processing.

By default, a HASH-distributed table is created with the number of hash buckets specified by the parameter default_hash_table_bucket_number. This can be changed at the session level, or in the CREATE TABLE DDL with the bucketnum storage parameter.
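
For example, the bucket number can be set for the session or per table (a sketch; the table name and values are illustrative):

-- session-level default for subsequently created hash-distributed tables
SET default_hash_table_bucket_number = 96;

-- per-table override via the bucketnum storage parameter
CREATE TABLE clicks (id int, ts timestamp)
WITH (bucketnum=32)
DISTRIBUTED BY (id);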

Note: Column-oriented tables are no longer supported. Use Parquet tables for HAWQ internal tables.

The PARTITION BY clause allows you to divide the table into multiple sub-tables (or parts) that, taken together, make up the parent table and share its schema. Though the sub-tables exist as independent tables, HAWQ restricts their use in important ways. Internally, partitioning is implemented as a special form of inheritance. Each child table partition is created with a distinct CHECK constraint which limits the data the table can contain, based on some defining criteria. The CHECK constraints are also used by the query planner to determine which table partitions to scan in order to satisfy a given query predicate. These partition constraints are managed automatically by HAWQ.

Parameters

1
GLOBAL | LOCAL

These keywords are present for SQL standard compatibility, but have no effect in HAWQ.

1
TEMPORARY | TEMP

If specified, the table is created as a temporary table. Temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT). Existing permanent tables with the same name are not visible to the current session while the temporary table exists, unless they are referenced with schema-qualified names. Any indexes created on a temporary table are automatically temporary as well.

1
<table_name>

The name (optionally schema-qualified) of the table to be created.

1
<column_name>

The name of a column to be created in the new table.

1
<data_type>

The data type of the column. This may include array specifiers.

1
DEFAULT <default_expr>

The DEFAULT clause assigns a default data value for the column whose column definition it appears within. The value is any variable-free expression (subqueries and cross-references to other columns in the current table are not allowed). The data type of the default expression must match the data type of the column. The default expression will be used in any insert operation that does not specify a value for the column. If there is no default for a column, then the default is null.

1
INHERITS

The optional INHERITS clause specifies a list of tables from which the new table automatically inherits all columns. Use of INHERITS creates a persistent relationship between the new child table and its parent table(s). Schema modifications to the parent(s) normally propagate to children as well, and by default the data of the child table is included in scans of the parent(s).

In HAWQ, the INHERITS clause is not used when creating partitioned tables. Although the concept of inheritance is used in partition hierarchies, the inheritance structure of a partitioned table is created using the PARTITION BY clause.

If the same column name exists in more than one parent table, an error is reported unless the data types of the columns match in each of the parent tables. If there is no conflict, then the duplicate columns are merged to form a single column in the new table. If the column name list of the new table contains a column name that is also inherited, the data type must likewise match the inherited column(s), and the column definitions are merged into one. However, inherited and new column declarations of the same name need not specify identical constraints: all constraints provided from any declaration are merged together and all are applied to the new table. If the new table explicitly specifies a default value for the column, this default overrides any defaults from inherited declarations of the column. Otherwise, any parents that specify default values for the column must all specify the same default, or an error will be reported.

1
LIKE <other_table> [{INCLUDING | EXCLUDING} {DEFAULTS | CONSTRAINTS}]

The LIKE clause specifies a table from which the new table automatically copies all column names, data types, not-null constraints, and distribution policy. Storage properties like append-only or partition structure are not copied. Unlike INHERITS, the new table and original table are completely decoupled after creation is complete. Default expressions for the copied column definitions will only be copied if INCLUDING DEFAULTS is specified. The default behavior is to exclude default expressions, resulting in the copied columns in the new table having null defaults.

Not-null constraints are always copied to the new table. CHECK constraints will only be copied if INCLUDING CONSTRAINTS is specified; other types of constraints will never be copied. Also, no distinction is made between column constraints and table constraints — when constraints are requested, all check constraints are copied.

Note also that unlike INHERITS, copied columns and constraints are not merged with similarly named columns and constraints. If the same name is specified explicitly or in another LIKE clause an error is signalled.
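
For example, assuming the films table shown in the examples later in this section, a new table copying its column definitions and default expressions might be created like this (illustrative):

CREATE TABLE films_copy (LIKE films INCLUDING DEFAULTS);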

1
NULL | NOT NULL

Specifies if the column is or is not allowed to contain null values. NULL is the default.

1
CHECK ( <expression> )

The CHECK clause specifies an expression producing a Boolean result which new rows must satisfy for an insert operation to succeed. Expressions evaluating to TRUE or UNKNOWN succeed. Should any row of an insert operation produce a FALSE result an error exception is raised and the insert does not alter the database. A check constraint specified as a column constraint should reference that column’s value only, while an expression appearing in a table constraint may reference multiple columns. CHECK expressions cannot contain subqueries nor refer to variables other than columns of the current row.
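
For example, the same kind of rule can be written as a column constraint or, when it involves more than one column, as a table constraint (a sketch; table and column names are illustrative):

CREATE TABLE price_list (
    item text,
    price numeric CHECK (price > 0),        -- column constraint
    discounted numeric,
    CHECK (discounted <= price)             -- table constraint spanning two columns
)
DISTRIBUTED BY (item);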

1
WITH ( <storage_option>=<value> )

The WITH clause can be used to set storage options for the table or its indexes. Note that you can also set storage parameters on a particular partition or subpartition by declaring the WITH clause in the partition specification. Note: You cannot create a table with both column encodings and compression parameters in a WITH clause.

The following storage options are available:

  • APPENDONLY — Set to TRUE to create the table as an append-only table. If FALSE is specified, an error message displays stating that heap tables are not supported.
  • BLOCKSIZE — Set to the size, in bytes for each block in a table. The BLOCKSIZE must be between 8192 and 2097152 bytes, and be a multiple of 8192. The default is 32768.
  • bucketnum — Set to the number of hash buckets to be used in creating a hash-distributed table, specified as an integer greater than 0 and no more than the value of default_hash_table_bucket_number. The default when the table is created is 6 times the segment count. However, explicitly setting the bucket number when creating a hash table is recommended.
  • ORIENTATION — Set to row (the default) for row-oriented storage, or parquet. The parquet column-oriented format can be more efficient for large-scale queries. This option is only valid if APPENDONLY=TRUE.
  • COMPRESSTYPE — Set to ZLIB, SNAPPY, or GZIP to specify the type of compression used. ZLIB provides more compact compression ratios at lower speeds. Parquet tables support SNAPPY and GZIP compression. Append-only tables support SNAPPY and ZLIB compression. This option is valid only if APPENDONLY=TRUE.
  • COMPRESSLEVEL — Set to an integer value from 1 (fastest compression) to 9 (highest compression ratio). If not specified, the default is 1. This option is valid only if APPENDONLY=TRUE and COMPRESSTYPE=[ZLIB|GZIP].
  • OIDS — Set to OIDS=FALSE (the default) so that rows do not have object identifiers assigned to them. Do not enable OIDS when creating a table. On large tables, such as those in a typical HAWQ system, using OIDs for table rows can cause wrap-around of the 32-bit OID counter. Once the counter wraps around, OIDs can no longer be assumed to be unique, which not only makes them useless to user applications, but can also cause problems in the HAWQ system catalog tables. In addition, excluding OIDs from a table reduces the space required to store the table on disk by 4 bytes per row, slightly improving performance. OIDS are not allowed on partitioned tables.
1
ON COMMIT

The behavior of temporary tables at the end of a transaction block can be controlled using ON COMMIT. The three options are:

  • PRESERVE ROWS - No special action is taken at the ends of transactions for temporary tables. This is the default behavior.
  • DELETE ROWS - All rows in the temporary table will be deleted at the end of each transaction block. Essentially, an automatic TRUNCATE is done at each commit.
  • DROP - The temporary table will be dropped at the end of the current transaction block.
1
TABLESPACE <tablespace>

The name of the tablespace in which the new table is to be created. If not specified, the database’s default tablespace dfs_default is used. Creating a table in the tablespace pg_default is not allowed.

1
2
DISTRIBUTED BY (<column>, [ … ] )
DISTRIBUTED RANDOMLY

Used to declare the HAWQ distribution policy for the table. The default is RANDOM distribution. DISTRIBUTED BY can use hash distribution with one or more columns declared as the distribution key. If hash distribution is desired, it must be specified using the first eligible column of the table as the distribution key.

1
PARTITION BY

Declares one or more columns by which to partition the table.

1
<partition_type>

Declares partition type: LIST (list of values) or RANGE (a numeric or date range).

1
<partition_specification>

Declares the individual partitions to create. Each partition can be defined individually or, for range partitions, you can use the EVERY clause (with a START and optional END clause) to define an increment pattern to use to create the individual partitions.

  • DEFAULT PARTITION <name> — Declares a default partition. When data does not match to an existing partition, it is inserted into the default partition. Partition designs that do not have a default partition will reject incoming rows that do not match to an existing partition.
  • PARTITION <name> — Declares a name to use for the partition. Partitions are created using the following naming convention: parentname_level#_prt_givenname.
  • VALUES — For list partitions, defines the value(s) that the partition will contain.
  • START — For range partitions, defines the starting range value for the partition. By default, start values are INCLUSIVE. For example, if you declared a start date of ’2008-01-01’, then the partition would contain all dates greater than or equal to ’2008-01-01’. Typically the data type of the START expression is the same type as the partition key column. If that is not the case, then you must explicitly cast to the intended data type.
  • END — For range partitions, defines the ending range value for the partition. By default, end values are EXCLUSIVE. For example, if you declared an end date of ’2008-02-01’, then the partition would contain all dates less than but not equal to ’2008-02-01’. Typically the data type of the END expression is the same type as the partition key column. If that is not the case, then you must explicitly cast to the intended data type.
  • EVERY — For range partitions, defines how to increment the values from START to END to create individual partitions. Typically the data type of the EVERY expression is the same type as the partition key column. If that is not the case, then you must explicitly cast to the intended data type.
  • WITH — Sets the table storage options for a partition. For example, you may want older partitions to be append-only tables and newer partitions to be regular heap tables.
  • TABLESPACE — The name of the tablespace in which the partition is to be created.
1
SUBPARTITION BY

Declares one or more columns by which to subpartition the first-level partitions of the table. The format of the subpartition specification is similar to that of a partition specification described above.

1
SUBPARTITION TEMPLATE

Instead of declaring each subpartition definition individually for each partition, you can optionally declare a subpartition template to be used to create the subpartitions. This subpartition specification would then apply to all parent partitions.

Notes

Using OIDs in new applications is not recommended. Avoid assuming that OIDs are unique across tables; if you need a database-wide unique identifier, use the combination of table OID and row OID for the purpose.

Primary key and foreign key constraints are not supported in HAWQ. For inherited tables, table privileges are not inherited in the current implementation.

HAWQ also supports the parquet columnar storage format. Parquet tables can be more efficient for large-scale queries.

Setting Parameters for Parquet Tables

You can set three kinds of parameters for a parquet table.

  • Set the parquet orientation parameter:
1
with (appendonly=true, orientation=parquet);
  • Set the compression type parameter. Parquet tables can be compressed using either SNAPPY or GZIP. GZIP supports compression level values between 1 and 9. SNAPPY does not support compression level; providing a compression level when using SNAPPY will cause the create table operation to fail. Specifying a compression level but no compression type when creating a parquet table will default to GZIP compression. Note: For best performance with parquet storage, use SNAPPY compression.
  • Set the data storage parameters: By default, PAGESIZE and ROWGROUPSIZE are set to 1 MB and 8 MB, respectively, for both regular and partitioned tables. Note: The page size should be less than the rowgroup size, because a rowgroup includes the metadata of at least one page even for a single-column table. The parameters PAGESIZE and ROWGROUPSIZE are valid for parquet tables, while BLOCKSIZE is valid for append-only tables. (See the example following this list.)
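
The following sketch combines the three kinds of parquet parameters in one table definition (the table name and sizes are illustrative; PAGESIZE and ROWGROUPSIZE are given in bytes):

CREATE TABLE sales_pq (id int, amt decimal(10,2))
WITH (APPENDONLY=true, ORIENTATION=parquet, COMPRESSTYPE=snappy,
      PAGESIZE=1048576, ROWGROUPSIZE=8388608)
DISTRIBUTED BY (id);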

About Orc Storage

An ORC table is a kind of pluggable-storage HDFS external table. It supports the compression types none, snappy, and lz4. Version 3.1.0 adds dictionary encoding for string columns; if the number of distinct keys in a dictionary is greater than DICTHRESHOLD times the total number of non-null rows, dictionary encoding is turned off. For best performance, lz4 compression with DICTHRESHOLD=0.8 is recommended. The supported data types are bool, int2, int4, int8, float4, float8, char, varchar, text, bytea, date, time, timestamp, and arrays. Version 3.1.0 also supports partitioned tables; partition split and exchange are not supported yet.

ORC Examples

Create an orc table with partition specification:

1
2
3
4
5
6
CREATE TABLE sales (id int, date date) format 'orc'
    PARTITION BY RANGE (date)
( START (date '2016-01-01') INCLUSIVE
  END (date '2016-01-06') EXCLUSIVE
  EVERY (INTERVAL '2 day')
  with (format 'orc' (compresstype 'lz4')));
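
Create a non-partitioned ORC table, setting the compression type and dictionary threshold (a sketch; the table name is illustrative and the option list follows the synopsis above, with the options listed without commas):

CREATE TABLE events_orc (id int, name text, ts timestamp)
FORMAT 'ORC' (COMPRESSTYPE 'lz4' DICTHRESHOLD 0.8)
DISTRIBUTED BY (id);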

About Parquet Storage

DDL and DML: Most DDL and DML operations are valid for a parquet table. The usage for DDL and DML operations is similar to append-only tables. Valid operations on parquet tables include:

  • Parquet table creation (with/without partition, with/without compression type)
  • Insert and Select

Compression type and level: You can only set the compression type at the table level. HAWQ does not support setting column-level compression. The specified compression type is propagated to the columns. All the columns must have the same compression type and level.

Using SNAPPY compression with parquet files is recommended for best performance.

Data type: HAWQ supports all data types except arrays and user defined types.

Alter table: HAWQ does not support adding a new column to an existing parquet table or dropping a column. You can use ALTER TABLE for a partition operation.

FillFactor/OIDS/Checksum: HAWQ does not support these options when creating parquet tables. The default value of checksum for a parquet table is false; you cannot set this value, nor specify fillfactor or oids.

Memory occupation: When inserting or loading data to a parquet table, the whole rowgroup is stored in physical memory until the size exceeds the threshold or the end of the INSERT operation. Once either occurs, the entire rowgroup is flushed to disk. Also, at the beginning of the INSERT operation, each column is pre-allocated a page buffer. The column pre-allocated page buffer size should be min(pageSizeLimit, rowgroupSizeLimit/estimatedColumnWidth/estimatedRecordWidth) for the first rowgroup. For the following rowgroup, it should be min(pageSizeLimit, actualColumnChunkSize in last rowgroup * 1.05), of which 1.05 is the estimated scaling factor. When reading data from a parquet table, the requested columns of the row group are loaded into memory. Memory is allocated 8 MB by default. Ensure that memory occupation does not exceed physical memory when setting ROWGROUPSIZE or PAGESIZE, otherwise you may encounter an out of memory error.

Bulk vs. trickle loads: Only bulk loads are recommended for use with parquet tables. Trickle loads can result in bloated footers and larger data files.

Parquet Examples

Create an append-only table using the parquet format:

1
2
3
4
CREATE TABLE customer ( id integer, fname text, lname text,
    address text, city text, state text, zip text )
WITH (APPENDONLY=true, ORIENTATION=parquet, OIDS=FALSE)
DISTRIBUTED BY (id);

Create a parquet table with twelve monthly partitions:

1
2
3
4
5
6
7
8
CREATE TABLE sales (id int, date date, amt decimal(10,2))
WITH (APPENDONLY=true, ORIENTATION=parquet, OIDS=FALSE)
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
  ( START (date '2016-01-01') INCLUSIVE
    END   (date '2017-01-01') EXCLUSIVE
    EVERY (INTERVAL '1 month')
  );

Add a new partition to the sales table:

1
2
3
ALTER TABLE sales ADD PARTITION
   START (date '2017-01-01') INCLUSIVE
   END (date '2017-02-01') EXCLUSIVE;

AO Examples

Append-only tables support ZLIB and SNAPPY compression types.
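
For example, a compressed append-only table might be created as follows (a sketch; the table name, compression level, and block size are illustrative):

CREATE TABLE measurements (id int, reading float8)
WITH (APPENDONLY=true, COMPRESSTYPE=zlib, COMPRESSLEVEL=5, BLOCKSIZE=65536)
DISTRIBUTED BY (id);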

Create a table named rank in the schema named baby and distribute the data using the columns rank, gender, and year:

1
2
CREATE TABLE baby.rank ( id int, rank int, year smallint, gender char(1), count int )
DISTRIBUTED BY (rank, gender, year);

Create table films and table distributors. The first column will be used as the HAWQ distribution key by default:

1
2
3
4
5
6
7
8
9
CREATE TABLE films (
    code char(5), title varchar(40) NOT NULL, did integer NOT NULL,
    date_prod date, kind varchar(10), len interval hour to minute
);

CREATE TABLE distributors (
    did integer,
    name varchar(40) NOT NULL CHECK (name <> '')
);

Create a three level partitioned table using subpartition templates and default partitions at each level:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
CREATE TABLE sales (id int, year int, month int, day int,
region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
SUBPARTITION BY RANGE (month)
SUBPARTITION TEMPLATE (
START (1) END (13) EVERY (1),
DEFAULT SUBPARTITION other_months )
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE (
SUBPARTITION usa VALUES ('usa'),
SUBPARTITION europe VALUES ('europe'),
SUBPARTITION asia VALUES ('asia'),
DEFAULT SUBPARTITION other_regions)
( START (2002) END (2010) EVERY (1),
DEFAULT PARTITION outlying_years);

Create a hash-distributed table named “sales” with 100 buckets.

1
2
3
CREATE TABLE sales(id int, profit float)
WITH (bucketnum=100)
DISTRIBUTED BY (id);

Compatibility

The CREATE TABLE command conforms to the SQL standard, with the following exceptions:

  • Temporary Tables — In the SQL standard, temporary tables are defined just once and automatically exist (starting with empty contents) in every session that needs them. HAWQ instead requires each session to issue its own CREATE TEMPORARY TABLE command for each temporary table to be used. This allows different sessions to use the same temporary table name for different purposes, whereas the standard’s approach constrains all instances of a given temporary table name to have the same table structure. The standard’s distinction between global and local temporary tables is not in HAWQ. HAWQ will accept the GLOBAL and LOCAL keywords in a temporary table declaration, but they have no effect. If the ON COMMIT clause is omitted, the SQL standard specifies that the default behavior is ON COMMIT DELETE ROWS. However, the default behavior in HAWQ is ON COMMIT PRESERVE ROWS. The ON COMMIT DROP option does not exist in the SQL standard.
  • Column Check Constraints — The SQL standard says that CHECK column constraints may only refer to the column they apply to; only CHECK table constraints may refer to multiple columns. HAWQ does not enforce this restriction; it treats column and table check constraints alike.
  • NULL Constraint — The NULL constraint is a HAWQ extension to the SQL standard that is included for compatibility with some other database systems (and for symmetry with the NOT NULL constraint). Since it is the default for any column, its presence is not required.
  • Inheritance — Multiple inheritance via the INHERITS clause is a HAWQ language extension. SQL:1999 and later define single inheritance using a different syntax and different semantics. SQL:1999-style inheritance is not yet supported by HAWQ.
  • Partitioning — Table partitioning via the PARTITION BY clause is a HAWQ language extension.
  • Zero-column tables — HAWQ allows a table of no columns to be created (for example, CREATE TABLE foo();). This is an extension from the SQL standard, which does not allow zero-column tables. Zero-column tables are not in themselves very useful, but disallowing them creates odd special cases for ALTER TABLE DROP COLUMN, so this spec restriction is ignored.
  • WITH clause — The WITH clause is an extension; neither storage parameters nor OIDs are in the standard.
  • Tablespaces — The HAWQ concept of tablespaces is not part of the SQL standard. The clauses TABLESPACE and USING INDEX TABLESPACE are extensions.
  • Data Distribution — The HAWQ concept of a parallel or distributed database is not part of the SQL standard. The DISTRIBUTED clauses are extensions.

CREATE TABLE AS

Defines a new table from the results of a query.

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
CREATE [ [GLOBAL | LOCAL] {TEMPORARY | TEMP} ] TABLE <table_name>
   [(<column_name> [, ...] )]
   [ WITH ( <storage_parameter>=<value> [, ... ] ) ]
   [ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP}]
   [TABLESPACE <tablespace>]
   AS <query>
   [DISTRIBUTED BY (<column>, [ ... ] ) | DISTRIBUTED RANDOMLY]

where <storage_parameter> is:

   APPENDONLY={TRUE}
   BLOCKSIZE={8192-2097152}
   bucketnum={<x>}
   ORIENTATION={ROW | PARQUET}
   COMPRESSTYPE={ZLIB | SNAPPY | GZIP | NONE}
   COMPRESSLEVEL={0-9 | 1}
   FILLFACTOR={10-100}
   OIDS=[TRUE | FALSE]
   PAGESIZE={1024-1073741823}
   ROWGROUPSIZE={1024-1073741823}

Description

CREATE TABLE AS creates a table and fills it with data computed by a SELECT command. The table columns have the names and data types associated with the output columns of the SELECT; however, you can override the column names by giving an explicit list of new column names.

CREATE TABLE AS creates a new table and evaluates the query just once to fill the new table initially. The new table will not track subsequent changes to the source tables of the query.

Parameters

1
GLOBAL | LOCAL

These keywords are present for SQL standard compatibility, but have no effect in HAWQ.

1
TEMPORARY | TEMP

If specified, the new table is created as a temporary table. Temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT). Existing permanent tables with the same name are not visible to the current session while the temporary table exists, unless they are referenced with schema-qualified names. Any indexes created on a temporary table are automatically temporary as well.

1
<table_name>

The name (optionally schema-qualified) of the new table to be created.

1
<column_name>

The name of a column in the new table. If column names are not provided, they are taken from the output column names of the query. If the table is created from an EXECUTE command, a column name list cannot be specified.

1
WITH (<storage_parameter>=<value> )

The WITH clause can be used to set storage options for the table or its indexes. Note that you can also set different storage parameters on a particular partition or subpartition by declaring the WITH clause in the partition specification. The following storage options are available:

  • APPENDONLY — Set to TRUE to create the table as an append-only table. If FALSE, an error message displays stating that heap tables are not supported.
  • BLOCKSIZE — Set to the size, in bytes for each block in a table. The BLOCKSIZE must be between 8192 and 2097152 bytes, and be a multiple of 8192. The default is 32768.
  • bucketnum — Set to the number of hash buckets to be used in creating a hash-distributed table. To change the number of hash buckets, specify bucketnum in the WITH clause when creating the hash-distributed table. If the distribution is specified by column, the table inherits the value.
  • ORIENTATION — Set to row (the default) for row-oriented storage, or parquet. This option is only valid if APPENDONLY=TRUE. Heap-storage tables can only be row-oriented.
  • COMPRESSTYPE — Set to ZLIB, SNAPPY, or GZIP to specify the type of compression used. ZLIB provides more compact compression ratios at lower speeds. Parquet tables support SNAPPY and GZIP compression. Append-only tables support SNAPPY and ZLIB compression. This option is valid only if APPENDONLY=TRUE.
  • COMPRESSLEVEL — Set to an integer value from 1 (fastest compression) to 9 (highest compression ratio). If not declared, the default is 1. This option is valid only if APPENDONLY=TRUE and COMPRESSTYPE=[ZLIB|GZIP].
  • OIDS — Set to OIDS=FALSE (the default) so that rows do not have object identifiers assigned to them. Do not enable OIDS when creating a table. On large tables, such as those in a typical HAWQ system, using OIDs for table rows can cause wrap-around of the 32-bit OID counter. Once the counter wraps around, OIDs can no longer be assumed to be unique, which not only makes them useless to user applications, but can also cause problems in the HAWQ system catalog tables. In addition, excluding OIDs from a table reduces the space required to store the table on disk by 4 bytes per row, slightly improving performance.
1
ON COMMIT

The behavior of temporary tables at the end of a transaction block can be controlled using ON COMMIT. The three options are:

  • PRESERVE ROWS — No special action is taken at the ends of transactions for temporary tables. This is the default behavior.
  • DELETE ROWS — All rows in the temporary table will be deleted at the end of each transaction block. Essentially, an automatic TRUNCATE is done at each commit.
  • DROP — The temporary table will be dropped at the end of the current transaction block.
1
TABLESPACE <tablespace>

The tablespace is the name of the tablespace in which the new table is to be created. If not specified, the database’s default tablespace is used.

1
AS <query>

A SELECT command, or an EXECUTE command that runs a prepared SELECT query.

1
2
DISTRIBUTED BY (<column>, [ … ] )
DISTRIBUTED RANDOMLY

Used to declare the HAWQ distribution policy for the table. The default is RANDOM distribution. DISTRIBUTED BY can use hash distribution with one or more columns declared as the distribution key. If hash distribution is desired, it can be specified using the bucketnum attribute, with the first eligible column of the table as the distribution key.

Notes

This command is functionally similar to SELECT INTO, but it is preferred since it is less likely to be confused with other uses of the SELECT INTO syntax. Furthermore, CREATE TABLE AS offers a superset of the functionality offered by SELECT INTO.

CREATE TABLE AS can be used for fast data loading from external table data sources. See CREATE EXTERNAL TABLE.
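
For example, data staged in an external table can be loaded into an internal parquet table in one step (a sketch; ext_sales_staging is a hypothetical readable external table that includes an id column):

CREATE TABLE sales_fact
WITH (APPENDONLY=true, ORIENTATION=parquet, COMPRESSTYPE=snappy)
AS SELECT * FROM ext_sales_staging
DISTRIBUTED BY (id);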

Examples

Create a new table films_recent consisting of only recent entries from the table films:

1
2
CREATE TABLE films_recent AS SELECT * FROM films WHERE
date_prod >= '2007-01-01';

Create a new temporary table films_recent, consisting of only recent entries from the table films, using a prepared statement. The new table has OIDs and will be dropped at commit:

1
2
3
4
PREPARE recentfilms(date) AS SELECT * FROM films WHERE
date_prod > $1;
CREATE TEMP TABLE films_recent WITH (OIDS) ON COMMIT DROP AS
EXECUTE recentfilms('2007-01-01');

Compatibility

CREATE TABLE AS conforms to the SQL standard, with the following exceptions:

  • The standard requires parentheses around the subquery clause; in HAWQ, these parentheses are optional.
  • The standard defines a WITH [NO] DATA clause; this is not currently implemented by HAWQ. The behavior provided by HAWQ is equivalent to the standard’s WITH DATA case. WITH NO DATA can be simulated by appending LIMIT 0 to the query.
  • HAWQ handles temporary tables differently from the standard; see CREATE TABLE for details.
  • The WITH clause is a HAWQ extension; neither storage parameters nor OIDs are in the standard.
  • The HAWQ concept of tablespaces is not part of the standard. The TABLESPACE clause is an extension.

CREATE TABLESPACE

Defines a new tablespace.

Synopsis

1
2
CREATE TABLESPACE <tablespace_name> [OWNER <username>]
       FILESPACE <filespace_name>

Description

CREATE TABLESPACE registers a new tablespace for your HAWQ system. The tablespace name must be distinct from the name of any existing tablespace in the system.

A tablespace allows superusers to define an alternative location on the file system where the data files containing database objects (such as tables) may reside.

A user with appropriate privileges can pass a tablespace name to CREATE DATABASE or CREATE TABLE to have the data files for these objects stored within the specified tablespace.

In HAWQ, there must be a file system location defined for the master and each segment in order for the tablespace to have a location to store its objects across an entire HAWQ system. This collection of file system locations is defined in a filespace object. A filespace must be defined before you can create a tablespace. See hawq filespace for more information.

Parameters

1
<tablespace_name>

The name of a tablespace to be created. The name cannot begin with pg_, as such names are reserved for system tablespaces.

1
OWNER <username>

The name of the user who will own the tablespace. If omitted, defaults to the user executing the command. Only superusers may create tablespaces, but they can assign ownership of tablespaces to non-superusers.

1
FILESPACE <filespace_name>

The name of a HAWQ filespace that was defined using the hawq filespace management utility.

Notes

You must first create a filespace to be used by the tablespace. See “hawq filespace” for more information.

Tablespaces are only supported on systems that support symbolic links.

CREATE TABLESPACE cannot be executed inside a transaction block.

Examples

Create a new tablespace by specifying the corresponding filespace to use:

1
CREATE TABLESPACE mytblspace FILESPACE myfilespace;

Compatibility

CREATE TABLESPACE is a HAWQ extension.

CREATE TYPE

Defines a new data type.

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
CREATE TYPE <name> AS ( <attribute_name>
         <data_type> [, ... ] )

CREATE TYPE <name> (
    INPUT = <input_function>,
    OUTPUT = <output_function>
    [, RECEIVE = <receive_function>]
    [, SEND = <send_function>]
    [, INTERNALLENGTH = {<internallength> | VARIABLE}]
    [, PASSEDBYVALUE]
    [, ALIGNMENT = <alignment>]
    [, STORAGE = <storage>]
    [, DEFAULT = <default>]
    [, ELEMENT = <element>]
    [, DELIMITER = <delimiter>] )

CREATE TYPE <name>

Description

CREATE TYPE registers a new data type for use in the current database. The user who defines a type becomes its owner.

If a schema name is given then the type is created in the specified schema. Otherwise it is created in the current schema. The type name must be distinct from the name of any existing type or domain in the same schema. The type name must also be distinct from the name of any existing table in the same schema.

Composite Types

The first form of CREATE TYPE creates a composite type. This is the only form currently supported by HAWQ. The composite type is specified by a list of attribute names and data types. This is essentially the same as the row type of a table, but using CREATE TYPE avoids the need to create an actual table when all that is wanted is to define a type. A stand-alone composite type is useful as the argument or return type of a function.
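
For example, a composite type can be defined and then used as a function argument type (a sketch following the standard PostgreSQL example; the names are illustrative):

CREATE TYPE inventory_item AS (name text, supplier_id integer, price numeric);

CREATE FUNCTION price_extension(inventory_item, integer) RETURNS numeric
    AS 'SELECT $1.price * $2' LANGUAGE SQL IMMUTABLE;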

Base Types

The second form of CREATE TYPE creates a new base type (scalar type). The parameters may appear in any order, not only that shown in the syntax, and most are optional. You must register two or more functions (using CREATE FUNCTION) before defining the type. The support functions <input_function> and <output_function> are required, while the functions <receive_function> and <send_function> are optional. Generally these functions have to be coded in C or another low-level language. In HAWQ, any function used to implement a data type must be defined as IMMUTABLE.

The <input_function> converts the type’s external textual representation to the internal representation used by the operators and functions defined for the type. <output_function> performs the reverse transformation. The input function may be declared as taking one argument of type cstring, or as taking three arguments of types cstring, oid, integer. The first argument is the input text as a C string, the second argument is the type’s own OID (except for array types, which instead receive their element type’s OID), and the third is the typmod of the destination column, if known (-1 will be passed if not). The input function must return a value of the data type itself. Usually, an input function should be declared STRICT; if it is not, it will be called with a NULL first parameter when reading a NULL input value. The function must still return NULL in this case, unless it raises an error. (This case is mainly meant to support domain input functions, which may need to reject NULL inputs.) The output function must be declared as taking one argument of the new data type. The output function must return type cstring. Output functions are not invoked for NULL values.

The optional <receive_function> converts the type’s external binary representation to the internal representation. If this function is not supplied, the type cannot participate in binary input. The binary representation should be chosen to be cheap to convert to internal form, while being reasonably portable. (For example, the standard integer data types use network byte order as the external binary representation, while the internal representation is in the machine’s native byte order.) The receive function should perform adequate checking to ensure that the value is valid. The receive function may be declared as taking one argument of type internal, or as taking three arguments of types internal, oid, integer. The first argument is a pointer to a StringInfo buffer holding the received byte string; the optional arguments are the same as for the text input function. The receive function must return a value of the data type itself. Usually, a receive function should be declared STRICT; if it is not, it will be called with a NULL first parameter when reading a NULL input value. The function must still return NULL in this case, unless it raises an error. (This case is mainly meant to support domain receive functions, which may need to reject NULL inputs.) Similarly, the optional <send_function> converts from the internal representation to the external binary representation. If this function is not supplied, the type cannot participate in binary output. The send function must be declared as taking one argument of the new data type. The send function must return type bytea. Send functions are not invoked for NULL values.

You should at this point be wondering how the input and output functions can be declared to have results or arguments of the new type, when they have to be created before the new type can be created. The answer is that the type should first be defined as a shell type, which is a placeholder type that has no properties except a name and an owner. This is done by issuing the command CREATE TYPE name, with no additional parameters. Then the I/O functions can be defined referencing the shell type. Finally, CREATE TYPE with a full definition replaces the shell entry with a complete, valid type definition, after which the new type can be used normally.
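
The shell-type workflow might look like the following sketch (the complex type, its I/O functions, and the shared-library filename are illustrative placeholders; the C implementations are not shown):

-- 1. Create a placeholder (shell) type with no properties
CREATE TYPE complex;

-- 2. Define the I/O functions, which can now reference the shell type
CREATE FUNCTION complex_in(cstring) RETURNS complex
    AS 'filename', 'complex_in' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION complex_out(complex) RETURNS cstring
    AS 'filename', 'complex_out' LANGUAGE C IMMUTABLE STRICT;

-- 3. Replace the shell entry with the full type definition
CREATE TYPE complex (
    INPUT = complex_in,
    OUTPUT = complex_out,
    INTERNALLENGTH = 16,
    ALIGNMENT = double
);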

While the details of the new type’s internal representation are only known to the I/O functions and other functions you create to work with the type, there are several properties of the internal representation that must be declared to HAWQ. Foremost of these is <internallength>. Base data types can be fixed-length, in which case <internallength> is a positive integer, or variable length, indicated by setting <internallength> to VARIABLE. (Internally, this is represented by setting typlen to -1.) The internal representation of all variable-length types must start with a 4-byte integer giving the total length of this value of the type.

The optional flag PASSEDBYVALUE indicates that values of this data type are passed by value, rather than by reference. You may not pass by value types whose internal representation is larger than the size of the Datum type (4 bytes on most machines, 8 bytes on a few).

The <alignment> parameter specifies the storage alignment required for the data type. The allowed values equate to alignment on 1, 2, 4, or 8 byte boundaries. Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.

The <storage> parameter allows selection of storage strategies for variable-length data types. (Only plain is allowed for fixed-length types.) plain specifies that data of the type will always be stored in-line and not compressed. extended specifies that the system will first try to compress a long data value, and will move the value out of the main table row if it’s still too long. external allows the value to be moved out of the main table, but the system will not try to compress it. main allows compression, but discourages moving the value out of the main table. (Data items with this storage strategy may still be moved out of the main table if there is no other way to make a row fit, but they will be kept in the main table preferentially over extended and external items.)

A default value may be specified, in case a user wants columns of the data type to default to something other than the null value. Specify the default with the DEFAULT key word. (Such a default may be overridden by an explicit DEFAULT clause attached to a particular column.)

To indicate that a type is an array, specify the type of the array elements using the ELEMENT key word. For example, to define an array of 4-byte integers (int4), specify ELEMENT = int4. More details about array types appear below.

To indicate the delimiter to be used between values in the external representation of arrays of this type, delimiter can be set to a specific character. The default delimiter is the comma (,). Note that the delimiter is associated with the array element type, not the array type itself.

Array Types

Whenever a user-defined base data type is created, HAWQ automatically creates an associated array type, whose name consists of the base type’s name prepended with an underscore. The parser understands this naming convention, and translates requests for columns of type foo[] into requests for type _foo. The implicitly-created array type is variable length and uses the built-in input and output functions array_in and array_out.

You might reasonably ask why there is an ELEMENT option, if the system makes the correct array type automatically. The only case where it’s useful to use ELEMENT is when you are making a fixed-length type that happens to be internally an array of a number of identical things, and you want to allow these things to be accessed directly by subscripting, in addition to whatever operations you plan to provide for the type as a whole. For example, type name allows its constituent char elements to be accessed this way. A 2-D point type could allow its two component numbers to be accessed like point[0] and point[1]. Note that this facility only works for fixed-length types whose internal form is exactly a sequence of identical fixed-length fields. A subscriptable variable-length type must have the generalized internal representation used by array_in and array_out. For historical reasons, subscripting of fixed-length array types starts from zero, rather than from one as for variable-length arrays.

Parameters

1
<name>

The name (optionally schema-qualified) of a type to be created.

1
<attribute_name>

The name of an attribute (column) for the composite type.

1
<data_type>

The name of an existing data type to become a column of the composite type.

1
<input_function>

The name of a function that converts data from the type’s external textual form to its internal form.

1
<output_function>

The name of a function that converts data from the type’s internal form to its external textual form.

1
<receive_function>

The name of a function that converts data from the type’s external binary form to its internal form.

1
<send_function>

The name of a function that converts data from the type’s internal form to its external binary form.

1
<internallength>

A numeric constant that specifies the length in bytes of the new type’s internal representation. The default assumption is that it is variable-length.

1
<alignment>

The storage alignment requirement of the data type. Must be one of char, int2, int4, or double. The default is int4.

1
<storage>

The storage strategy for the data type. Must be one of plain, external, extended, or main. The default is plain.

1
<default>

The default value for the data type. If this is omitted, the default is null.

1
<element>

The type being created is an array; this specifies the type of the array elements.

1
<delimiter>

The delimiter character to be used between values in arrays made of this type.

Notes

User-defined type names cannot begin with the underscore character (_) and can only be 62 characters long (or in general NAMEDATALEN - 2, rather than the NAMEDATALEN - 1 characters allowed for other names). Type names beginning with underscore are reserved for internally-created array type names.

Because there are no restrictions on use of a data type once it’s been created, creating a base type is tantamount to granting public execute permission on the functions mentioned in the type definition. (The creator of the type is therefore required to own these functions.) This is usually not an issue for the sorts of functions that are useful in a type definition. But you might want to think twice before designing a type in a way that would require ‘secret’ information to be used while converting it to or from external form.

Examples

This example creates a composite type and uses it in a function definition:

1
2
3
4
5
CREATE TYPE compfoo AS (f1 int, f2 text);

CREATE FUNCTION getfoo() RETURNS SETOF compfoo AS $$
    SELECT fooid, fooname FROM foo
$$ LANGUAGE SQL;

This example creates the base data type box and then uses the type in a table definition:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
CREATE FUNCTION my_box_in_function(cstring) RETURNS box AS
... ;

CREATE FUNCTION my_box_out_function(box) RETURNS cstring AS
... ;

CREATE TYPE box (
    INTERNALLENGTH = 16,
    INPUT = my_box_in_function,
    OUTPUT = my_box_out_function
);

CREATE TABLE myboxes (
    id integer,
    description box
);

If the internal structure of box were an array of four float4 elements, we might instead use:

1
2
3
4
5
6
CREATE TYPE box (
    INTERNALLENGTH = 16,
    INPUT = my_box_in_function,
    OUTPUT = my_box_out_function,
    ELEMENT = float4
);

which would allow a box value’s component numbers to be accessed by subscripting. Otherwise the type behaves the same as before.

This example creates a large object type and uses it in a table definition:

1
2
3
4
5
6
7
8
9
CREATE TYPE bigobj (
    INPUT = lo_filein, OUTPUT = lo_fileout,
    INTERNALLENGTH = VARIABLE
);

CREATE TABLE big_objs (
    id integer,
    obj bigobj
);

Compatibility

The CREATE TYPE command is a HAWQ extension. There is a CREATE TYPE statement in the SQL standard that is rather different in detail.

CREATE USER

Defines a new database role with the LOGIN privilege by default.

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
CREATE USER <name> [ [WITH] <option> [ ... ] ]
where <option> can be:

   SUPERUSER | NOSUPERUSER
 | CREATEDB | NOCREATEDB
 | CREATEROLE | NOCREATEROLE
 | CREATEUSER | NOCREATEUSER
 | INHERIT | NOINHERIT
 | LOGIN | NOLOGIN
 | [ ENCRYPTED | UNENCRYPTED ] PASSWORD '<password>'
 | VALID UNTIL '<timestamp>'
 | IN ROLE <rolename> [, ...]
 | IN GROUP <rolename> [, ...]
 | ROLE <rolename> [, ...]
 | ADMIN <rolename> [, ...]
 | USER <rolename> [, ...]
 | SYSID <uid>
 | RESOURCE QUEUE <queue_name>

Description

HAWQ does not support CREATE USER. This command has been replaced by CREATE ROLE.

The only difference between CREATE ROLE and CREATE USER is that LOGIN is assumed by default with CREATE USER, whereas NOLOGIN is assumed by default with CREATE ROLE.
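
For example, rather than issuing CREATE USER, define a login-capable role directly (the role name and password are illustrative):

CREATE ROLE jsmith WITH LOGIN PASSWORD 'changeme';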

Compatibility

There is no CREATE USER statement in the SQL standard.

CREATE VIEW

Defines a new view.

Synopsis

1
2
3
CREATE [OR REPLACE] [TEMP | TEMPORARY] VIEW <name>
       [ ( <column_name> [, ...] ) ]
       AS <query>

Description

CREATE VIEW defines a view of a query. The view is not physically materialized. Instead, the query is run every time the view is referenced in a query.

CREATE OR REPLACE VIEW is similar, but if a view of the same name already exists, it is replaced. You can only replace a view with a new query that generates the identical set of columns (same column names and data types).
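
A minimal sketch of replacing a view while keeping the same column list (the table and column names follow the films examples used later in this section):

CREATE VIEW comedies AS
    SELECT code, title FROM films WHERE kind = 'comedy';

-- The replacement must produce the same column names and data types:
CREATE OR REPLACE VIEW comedies AS
    SELECT code, title FROM films WHERE kind = 'comedy' AND did = 101;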

If a schema name is given then the view is created in the specified schema. Otherwise it is created in the current schema. Temporary views exist in a special schema, so a schema name may not be given when creating a temporary view. The name of the view must be distinct from the name of any other view, table, sequence, or index in the same schema.

Parameters

1
TEMPORARY | TEMP

If specified, the view is created as a temporary view. Temporary views are automatically dropped at the end of the current session. Existing permanent relations with the same name are not visible to the current session while the temporary view exists, unless they are referenced with schema-qualified names. If any of the tables referenced by the view are temporary, the view is created as a temporary view (whether TEMPORARY is specified or not).

1
<name>

The name (optionally schema-qualified) of a view to be created.

1
<column_name>

An optional list of names to be used for columns of the view. If not given, the column names are deduced from the query.

1
<query>

A SELECT command which will provide the columns and rows of the view.

Notes

Views in HAWQ are read only. The system will not allow an insert, update, or delete on a view. You can get the effect of an updatable view by creating rewrite rules on the view into appropriate actions on other tables. For more information see CREATE RULE.

Be careful that the names and data types of the view’s columns will be assigned the way you want. For example, if you run the following command:

1
CREATE VIEW vista AS SELECT 'Hello World';

The result is poor: the column name defaults to ?column?, and the column data type defaults to unknown. If you want a string literal in a view’s result, use the following command:

1
CREATE VIEW vista AS SELECT text 'Hello World' AS hello;

Check that you have permission to access the tables referenced in the view. View ownership determines permissions, not the status of the current user. This is true even if you are a superuser. This concept is unusual, since superusers typically have access to all objects. In the case of views, even superusers must be explicitly granted access to the referenced tables if they do not own the view.

However, functions called in the view are treated the same as if they had been called directly from the query using the view. Therefore the user of a view must have permissions to call any functions used by the view.

If you create a view with an ORDER BY clause, the ORDER BY clause is ignored when you do a SELECT from the view.
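
For example, assuming the films table used elsewhere in this section, the ORDER BY inside the view definition below has no effect on later queries; order the outer query instead:

CREATE VIEW recent_films AS
    SELECT * FROM films ORDER BY date_prod DESC;      -- this ORDER BY is ignored

SELECT * FROM recent_films ORDER BY date_prod DESC;   -- order the query on the view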

Examples

Create a view consisting of all comedy films:

1
2
CREATE VIEW comedies AS SELECT * FROM films WHERE kind =
'comedy';

Create a view that gets the top ten ranked baby names:

1
2
CREATE VIEW topten AS SELECT name, rank, gender, year FROM
names, rank WHERE rank < '11' AND names.id=rank.id;

Compatibility

The SQL standard specifies some additional capabilities for the CREATE VIEW statement that are not in HAWQ. The optional clauses for the full SQL command in the standard are:

  • CHECK OPTION — This option has to do with updatable views. All INSERT commands on the view will be checked to ensure data satisfy the view-defining condition (that is, the new data would be visible through the view). If they do not, the insert will be rejected.
  • LOCAL — Check for integrity on this view.
  • CASCADED — Check for integrity on this view and on any dependent view. CASCADED is assumed if neither CASCADED nor LOCAL is specified.

CREATE OR REPLACE VIEW is a HAWQ language extension. So is the concept of a temporary view.

DEALLOCATE

Deallocates a prepared statement.

Synopsis

1
DEALLOCATE [PREPARE] <name>

Description

DEALLOCATE is used to deallocate a previously prepared SQL statement. If you do not explicitly deallocate a prepared statement, it is deallocated when the session ends.

For more information on prepared statements, see PREPARE.

Parameters

1
PREPARE

Optional key word which is ignored.

Examples

Deallocate the previously prepared statement named insert_names:

1
DEALLOCATE insert_names;

Compatibility

The SQL standard includes a DEALLOCATE statement, but it is only for use in embedded SQL.

DECLARE

Defines a cursor.

Synopsis

1
2
3
DECLARE <name> [BINARY] [INSENSITIVE] [NO SCROLL] CURSOR
     [{WITH | WITHOUT} HOLD]
     FOR <query> [FOR READ ONLY]

Description

DECLARE allows a user to create cursors, which can be used to retrieve a small number of rows at a time out of a larger query. Cursors can return data either in text or in binary format using FETCH.

Normal cursors return data in text format, the same as a SELECT would produce. Since data is stored natively in binary format, the system must do a conversion to produce the text format. Once the information comes back in text form, the client application may need to convert it to a binary format to manipulate it. In addition, data in the text format is often larger in size than in the binary format. Binary cursors return the data in a binary representation that may be more easily manipulated. Nevertheless, if you intend to display the data as text anyway, retrieving it in text form will save you some effort on the client side.

As an example, if a query returns a value of one from an integer column, you would get a string of 1 with a default cursor whereas with a binary cursor you would get a 4-byte field containing the internal representation of the value (in big-endian byte order).
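
A minimal sketch of a binary cursor, run inside a transaction block (the table and cursor names are illustrative; note that psql does not display binary results usefully):

BEGIN;
DECLARE rawcur BINARY CURSOR FOR SELECT did FROM films;
FETCH 1 FROM rawcur;   -- returns the 4-byte internal form of the integer
CLOSE rawcur;
COMMIT;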

Binary cursors should be used carefully. Many applications, including psql, are not prepared to handle binary cursors and expect data to come back in the text format.

Note: When the client application uses the ‘extended query’ protocol to issue a FETCH command, the Bind protocol message specifies whether data is to be retrieved in text or binary format. This choice overrides the way that the cursor is defined. The concept of a binary cursor as such is thus obsolete when using extended query protocol — any cursor can be treated as either text or binary.

Parameters

1
<name>

The name of the cursor to be created.

1
BINARY

Causes the cursor to return data in binary rather than in text format.

1
INSENSITIVE

Indicates that data retrieved from the cursor should be unaffected by updates to the tables underlying the cursor while the cursor exists. In HAWQ, all cursors are insensitive. This key word currently has no effect and is present for compatibility with the SQL standard.

1
NO SCROLL

A cursor cannot be used to retrieve rows in a nonsequential fashion. This is the default behavior in HAWQ, since scrollable cursors (SCROLL) are not supported.

1
2
WITH HOLD
WITHOUT HOLD

WITH HOLD specifies that the cursor may continue to be used after the transaction that created it successfully commits. WITHOUT HOLD specifies that the cursor cannot be used outside of the transaction that created it. WITHOUT HOLD is the default.

1
<query>

A SELECT command which will provide the rows to be returned by the cursor.

1
FOR READ ONLY

FOR READ ONLY indicates that the cursor is used in a read-only mode. Cursors can only be used in a read-only mode in HAWQ. HAWQ does not support updatable cursors (FOR UPDATE), so this is the default behavior.

Notes

Unless WITH HOLD is specified, the cursor created by this command can only be used within the current transaction. Thus, DECLARE without WITH HOLD is useless outside a transaction block: the cursor would survive only to the completion of the statement. Therefore HAWQ reports an error if this command is used outside a transaction block. Use BEGIN, COMMIT and ROLLBACK to define a transaction block.

If WITH HOLD is specified and the transaction that created the cursor successfully commits, the cursor can continue to be accessed by subsequent transactions in the same session. (But if the creating transaction is aborted, the cursor is removed.) A cursor created with WITH HOLD is closed when an explicit CLOSE command is issued on it, or the session ends. In the current implementation, the rows represented by a held cursor are copied into a temporary file or memory area so that they remain available for subsequent transactions.
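
A minimal sketch of a held cursor surviving its creating transaction (names are illustrative):

BEGIN;
DECLARE heldcur CURSOR WITH HOLD FOR SELECT * FROM films;
COMMIT;                  -- the cursor remains open after the commit

FETCH 5 FROM heldcur;    -- can still fetch in a later transaction
CLOSE heldcur;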

Scrollable cursors are not currently supported in HAWQ. You can only use FETCH to move the cursor position forward, not backwards.

You can see all available cursors by querying the pg_cursors system view.

Examples

Declare a cursor:

1
DECLARE mycursor CURSOR FOR SELECT * FROM mytable;

Compatibility

The SQL standard allows cursors only in embedded SQL and in modules. HAWQ permits cursors to be used interactively.

HAWQ does not implement an OPEN statement for cursors. A cursor is considered to be open when it is declared.

The SQL standard allows cursors to move both forward and backward. All HAWQ cursors are forward moving only (not scrollable).

Binary cursors are a HAWQ extension.

DROP AGGREGATE

Removes an aggregate function.

Synopsis

1
DROP AGGREGATE [IF EXISTS] <name> ( <type> [, ...] ) [CASCADE | RESTRICT]

Description

DROP AGGREGATE will delete an existing aggregate function. To execute this command the current user must be the owner of the aggregate function.

Parameters

1
IF EXISTS

Do not throw an error if the aggregate does not exist. A notice is issued in this case.

1
<name>

The name (optionally schema-qualified) of an existing aggregate function.

1
<type>

An input data type on which the aggregate function operates. To reference a zero-argument aggregate function, write * in place of the list of input data types.

1
CASCADE

Automatically drop objects that depend on the aggregate function.

1
RESTRICT

Refuse to drop the aggregate function if any objects depend on it. This is the default.

Examples

To remove the aggregate function myavg for type integer:

1
DROP AGGREGATE myavg(integer);

Compatibility

There is no DROP AGGREGATE statement in the SQL standard.

DROP DATABASE

Removes a database.

Synopsis

1
DROP DATABASE [IF EXISTS] <name>

Description

DROP DATABASE drops a database. It removes the catalog entries for the database and deletes the directory containing the data. It can only be executed by the database owner. Also, it cannot be executed while you or anyone else are connected to the target database. (Connect to template1 or any other database to issue this command.)

Warning: DROP DATABASE cannot be undone. Use it with care!

Parameters

1
IF EXISTS

Do not throw an error if the database does not exist. A notice is issued in this case.

1
<name>

The name of the database to remove.

Notes

DROP DATABASE cannot be executed inside a transaction block.

This command cannot be executed while connected to the target database. Thus, it might be more convenient to use the program dropdb instead, which is a wrapper around this command.

Examples

Drop the database named testdb:

1
DROP DATABASE testdb;

Compatibility

There is no DROP DATABASE statement in the SQL standard.

DROP EXTERNAL TABLE

Removes an external table definition.

Synopsis

1
DROP EXTERNAL [WEB] TABLE [IF EXISTS] <name> [CASCADE | RESTRICT]

Description

DROP EXTERNAL TABLE drops an existing external table definition from the database system. The external data sources or files are not deleted. To execute this command you must be the owner of the external table.

Parameters

1
WEB

Optional keyword for dropping external web tables.

1
IF EXISTS

Do not throw an error if the external table does not exist. A notice is issued in this case.

1
<name>

The name (optionally schema-qualified) of an existing external table.

1
CASCADE

Automatically drop objects that depend on the external table (such as views).

1
RESTRICT

Refuse to drop the external table if any objects depend on it. This is the default.

Examples

Remove the external table named staging if it exists:

1
DROP EXTERNAL TABLE IF EXISTS staging;

Compatibility

There is no DROP EXTERNAL TABLE statement in the SQL standard.

DROP FILESPACE

Removes a filespace.

Synopsis

1
DROP FILESPACE [IF EXISTS]  <filespacename>

Description

DROP FILESPACE removes a filespace definition and its system-generated data directories from the system.

A filespace can only be dropped by its owner or a superuser. The filespace must be empty of all tablespace objects before it can be dropped. It is possible that tablespaces in other databases may still be using a filespace even if no tablespaces in the current database are using the filespace.

Parameters

1
IF EXISTS

Do not throw an error if the filespace does not exist. A notice is issued in this case.

1
<filespacename>

The name of the filespace to remove.

Examples

Remove the filespace myfs:

1
DROP FILESPACE myfs;

Compatibility

There is no DROP FILESPACE statement in the SQL standard or in PostgreSQL.

DROP FUNCTION

Removes a function.

Synopsis

1
2
DROP FUNCTION [IF EXISTS] <name> ( [ [<argmode>] [<argname>] <argtype>
    [, ...] ] ) [CASCADE | RESTRICT]

Description

DROP FUNCTION removes the definition of an existing function. To execute this command the user must be the owner of the function. The argument types to the function must be specified, since several different functions may exist with the same name and different argument lists.

Parameters

1
IF EXISTS

Do not throw an error if the function does not exist. A notice is issued in this case.

1
<name>

The name (optionally schema-qualified) of an existing function.

1
<argmode>

The mode of an argument: either IN, OUT, or INOUT. If omitted, the default is IN. Note that DROP FUNCTION does not actually pay any attention to OUT arguments, since only the input arguments are needed to determine the function’s identity. So it is sufficient to list the IN and INOUT arguments.

1
<argname>

The name of an argument. Note that DROP FUNCTION does not actually pay any attention to argument names, since only the argument data types are needed to determine the function’s identity.

1
<argtype>

The data type(s) of the function’s arguments (optionally schema-qualified), if any.

1
CASCADE

Automatically drop objects that depend on the function such as operators.

1
RESTRICT

Refuse to drop the function if any objects depend on it. This is the default.

Examples

Drop the square root function:

1
DROP FUNCTION sqrt(integer);

Compatibility

A DROP FUNCTION statement is defined in the SQL standard, but it is not compatible with this command.

DROP GROUP

Removes a database role.

Synopsis

1
DROP GROUP [IF EXISTS] <name> [, ...]

Description

DROP GROUP is an obsolete command, though still accepted for backwards compatibility. Groups (and users) have been superseded by the more general concept of roles. See DROP ROLE for more information.

Parameters

1
IF EXISTS

Do not throw an error if the role does not exist. A notice is issued in this case.

1
<name>

The name of an existing role.

Compatibility

There is no DROP GROUP statement in the SQL standard.

DROP OPERATOR

Removes an operator.

Synopsis

1
2
DROP OPERATOR [IF EXISTS] <name> ( {<lefttype> | NONE} ,
   {<righttype> | NONE} ) [CASCADE | RESTRICT]

Description

DROP OPERATOR drops an existing operator from the database system. To execute this command you must be the owner of the operator.

Parameters

1
IF EXISTS

Do not throw an error if the operator does not exist. A notice is issued in this case.

1
<name>

The name (optionally schema-qualified) of an existing operator.

1
<lefttype>

The data type of the operator’s left operand; write NONE if the operator has no left operand.

1
<righttype>

The data type of the operator’s right operand; write NONE if the operator has no right operand.

1
CASCADE

Automatically drop objects that depend on the operator.

1
RESTRICT

Refuse to drop the operator if any objects depend on it. This is the default.

Examples

Remove the power operator a^b for type integer:

1
DROP OPERATOR ^ (integer, integer);

Remove the left unary bitwise complement operator ~b for type bit:

1
DROP OPERATOR ~ (none, bit);

Remove the right unary factorial operator x! for type bigint:

1
DROP OPERATOR ! (bigint, none);

Compatibility

There is no DROP OPERATOR statement in the SQL standard.

DROP OPERATOR CLASS

Removes an operator class.

Synopsis

1
DROP OPERATOR CLASS [IF EXISTS] <name> USING <index_method> [CASCADE | RESTRICT]

Description

DROP OPERATOR CLASS drops an existing operator class. To execute this command you must be the owner of the operator class.

Parameters

1
IF EXISTS

Do not throw an error if the operator class does not exist. A notice is issued in this case.

1
<name>

The name (optionally schema-qualified) of an existing operator class.

1
<index_method>

The name of the index access method the operator class is for.

1
CASCADE

Automatically drop objects that depend on the operator class.

1
RESTRICT

Refuse to drop the operator class if any objects depend on it. This is the default.

Notes

This command will not succeed if there are any existing indexes that use the operator class. Add CASCADE to drop such indexes along with the operator class.

Examples

Remove the B-tree operator class widget_ops:

1
DROP OPERATOR CLASS widget_ops USING btree;

Compatibility

There is no DROP OPERATOR CLASS statement in the SQL standard.

DROP OWNED

Removes database objects owned by a database role.

Synopsis

1
DROP OWNED BY <name> [, ...] [CASCADE | RESTRICT]

Description

DROP OWNED drops all the objects in the current database that are owned by one of the specified roles. Any privileges granted to the given roles on objects in the current database will also be revoked.

Parameters

1
<name>

The name of a role whose objects will be dropped, and whose privileges will be revoked.

1
CASCADE

Automatically drop objects that depend on the affected objects.

1
RESTRICT

Refuse to drop the objects owned by a role if any other database objects depend on one of the affected objects. This is the default.

Notes

DROP OWNED is often used to prepare for the removal of one or more roles. Because DROP OWNED only affects the objects in the current database, it is usually necessary to execute this command in each database that contains objects owned by a role that is to be removed.

Using the CASCADE option may make the command recurse to objects owned by other users.

The REASSIGN OWNED command is an alternative that reassigns the ownership of all the database objects owned by one or more roles.
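
A typical sequence when retiring a role (role names are illustrative; run the first two commands in each database that contains objects owned by, or privileges granted to, the role):

REASSIGN OWNED BY sally TO admin;   -- transfer ownership of sally's objects
DROP OWNED BY sally;                -- revoke any remaining privileges granted to sally
DROP ROLE sally;                    -- now the role itself can be removed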

Examples

Remove any database objects owned by the role named sally:

1
DROP OWNED BY sally;

Compatibility

The DROP OWNED statement is a HAWQ extension.

DROP RESOURCE QUEUE

Removes a resource queue.

Synopsis

1
DROP RESOURCE QUEUE <queue_name>

Description

This command removes a resource queue from HAWQ. To drop a resource queue, the queue cannot have any roles assigned to it, nor can it have any statements waiting in the queue or have any children resource queues. Only a superuser can drop a resource queue.

Note: The pg_root and pg_default resource queues cannot be dropped.

Parameters

1
<queue_name>

The name of a resource queue to remove.

Notes

Use ALTER ROLE to remove a user from a resource queue.

To see all the currently active queries for all resource queues, perform the following query of the pg_locks table joined with the pg_roles and pg_resqueue tables:

1
2
3
4
SELECT rolname, rsqname, locktype, objid, transaction, pid,
mode, granted FROM pg_roles, pg_resqueue, pg_locks WHERE
pg_roles.rolresqueue=pg_locks.objid AND
pg_locks.objid=pg_resqueue.oid;

To see the roles assigned to a resource queue, perform the following query of the pg_roles and pg_resqueue system catalog tables:

1
2
SELECT rolname, rsqname FROM pg_roles, pg_resqueue WHERE
pg_roles.rolresqueue=pg_resqueue.oid;

Examples

Remove a role from a resource queue (and move the role to the default resource queue, pg_default):

1
ALTER ROLE bob RESOURCE QUEUE NONE;

Remove the resource queue named adhoc:

1
DROP RESOURCE QUEUE adhoc;

Compatibility

The DROP RESOURCE QUEUE statement is a HAWQ extension.

DROP ROLE

Removes a database role.

Synopsis

1
DROP ROLE [IF EXISTS] <name> [, ...]

Description

DROP ROLE removes the specified role(s). To drop a superuser role, you must be a superuser yourself. To drop non-superuser roles, you must have CREATEROLE privilege.

A role cannot be removed if it is still referenced in any database; an error will be raised if so. Before dropping the role, you must drop all the objects it owns (or reassign their ownership) and revoke any privileges the role has been granted. The REASSIGN OWNED and DROP OWNED commands can be useful for this purpose.

However, it is not necessary to remove role memberships involving the role; DROP ROLE automatically revokes any memberships of the target role in other roles, and of other roles in the target role. The other roles are not dropped nor otherwise affected.

Parameters

1
IF EXISTS

Do not throw an error if the role does not exist. A notice is issued in this case.

1
<name>

The name of the role to remove.

Examples

Remove the roles named sally and bob:

1
DROP ROLE sally, bob;

Compatibility

The SQL standard defines DROP ROLE, but it allows only one role to be dropped at a time, and it specifies different privilege requirements than HAWQ uses.

DROP SCHEMA

Removes a schema.

Synopsis

1
DROP SCHEMA [IF EXISTS] <name> [, ...] [CASCADE | RESTRICT]

Description

DROP SCHEMA removes schemas from the database. A schema can only be dropped by its owner or a superuser. Note that the owner can drop the schema (and thereby all contained objects) even if he does not own some of the objects within the schema.

Parameters

1
IF EXISTS

Do not throw an error if the schema does not exist. A notice is issued in this case.

1
<name>

The name of the schema to remove.

1
CASCADE

Automatically drops any objects contained in the schema (tables, functions, etc.).

1
RESTRICT

Refuse to drop the schema if it contains any objects. This is the default.

Examples

Remove the schema mystuff from the database, along with everything it contains:

1
DROP SCHEMA mystuff CASCADE;

Compatibility

DROP SCHEMA is fully conforming with the SQL standard, except that the standard only allows one schema to be dropped per command. Also, the IF EXISTS option is a HAWQ extension.

DROP SEQUENCE

Removes a sequence.

Synopsis

1
DROP SEQUENCE [IF EXISTS]  <name> [, ...] [CASCADE | RESTRICT]

Description

DROP SEQUENCE removes a sequence generator table. You must own the sequence to drop it (or be a superuser).

Parameters

1
IF EXISTS

Do not throw an error if the sequence does not exist. A notice is issued in this case.

1
<name>

The name (optionally schema-qualified) of the sequence to remove.

1
CASCADE

Automatically drop objects that depend on the sequence.

1
RESTRICT

Refuse to drop the sequence if any objects depend on it. This is the default.

Examples

Remove the sequence myserial:

1
DROP SEQUENCE myserial;

Compatibility

DROP SEQUENCE is fully conforming with the SQL standard, except that the standard only allows one sequence to be dropped per command. Also, the IF EXISTS option is a HAWQ extension.

DROP TABLE

Removes a table.

Synopsis

1
DROP TABLE [IF EXISTS] <name> [, ...] [CASCADE | RESTRICT]

Description

DROP TABLE removes tables from the database. Only its owner may drop a table. To empty a table of rows without removing the table definition, use TRUNCATE.

DROP TABLE always removes any indexes, rules, and constraints that exist for the target table. However, to drop a table that is referenced by a view, CASCADE must be specified. CASCADE will remove a dependent view entirely.

Parameters

1
IF EXISTS

Do not throw an error if the table does not exist. A notice is issued in this case.

1
<name>

The name (optionally schema-qualified) of the table to remove.

1
CASCADE

Automatically drop objects that depend on the table (such as views).

1
RESTRICT

Refuse to drop the table if any objects depend on it. This is the default.

Examples

Remove the table mytable:

1
DROP TABLE mytable;

Compatibility

DROP TABLE is fully conforming with the SQL standard, except that the standard only allows one table to be dropped per command. Also, the IF EXISTS option is a HAWQ extension.

DROP TABLESPACE

Removes a tablespace.

Synopsis

1
DROP TABLESPACE [IF EXISTS] <tablespacename>

Description

DROP TABLESPACE removes a tablespace from the system.

A tablespace can only be dropped by its owner or a superuser. The tablespace must be empty of all database objects before it can be dropped. It is possible that objects in other databases may still reside in the tablespace even if no objects in the current database are using the tablespace.

Parameters

1
IF EXISTS

Do not throw an error if the tablespace does not exist. A notice is issued in this case.

1
<tablespacename>

The name of the tablespace to remove.

Examples

Remove the tablespace mystuff:

1
DROP TABLESPACE mystuff;

Compatibility

DROP TABLESPACE is a HAWQ extension.

DROP TYPE

Removes a data type.

Synopsis

1
DROP TYPE [IF EXISTS] <name> [, ...] [CASCADE | RESTRICT]

Description

DROP TYPE will remove a user-defined data type. Only the owner of a type can remove it.

Parameters

1
IF EXISTS

Do not throw an error if the type does not exist. A notice is issued in this case.

1
<name>

The name (optionally schema-qualified) of the data type to remove.

1
CASCADE

Automatically drop objects that depend on the type (such as table columns, functions, operators).

1
RESTRICT

Refuse to drop the type if any objects depend on it. This is the default.

Examples

Remove the data type box:

1
DROP TYPE box;

Compatibility

This command is similar to the corresponding command in the SQL standard, apart from the IF EXISTS option, which is a HAWQ extension. But note that the CREATE TYPE command and the data type extension mechanisms in HAWQ differ from the SQL standard.

DROP USER

Removes a database role.

Synopsis

1
DROP USER [IF EXISTS] <name> [, ...]

Description

DROP USER is an obsolete command, though still accepted for backwards compatibility. Users (and groups) have been superseded by the more general concept of roles. See DROP ROLE for more information.

Parameters

1
IF EXISTS

Do not throw an error if the role does not exist. A notice is issued in this case.

1
<name>

The name of an existing role.

Compatibility

There is no DROP USER statement in the SQL standard. The SQL standard leaves the definition of users to the implementation.

DROP VIEW

Removes a view.

Synopsis

1
DROP VIEW [IF EXISTS] <name> [, ...] [CASCADE | RESTRICT]

Description

DROP VIEW will remove an existing view. Only the owner of a view can remove it.

Parameters

1
IF EXISTS

Do not throw an error if the view does not exist. A notice is issued in this case.

1
<name>

The name (optionally schema-qualified) of the view to remove.

1
CASCADE

Automatically drop objects that depend on the view (such as other views).

1
RESTRICT

Refuse to drop the view if any objects depend on it. This is the default.

Examples

Remove the view topten:

1
DROP VIEW topten;

Compatibility

DROP VIEW is fully conforming with the SQL standard, except that the standard only allows one view to be dropped per command. Also, the IF EXISTS option is a HAWQ extension.

END

Commits the current transaction.

Synopsis

1
END [WORK | TRANSACTION]

Description

END commits the current transaction. All changes made by the transaction become visible to others and are guaranteed to be durable if a crash occurs. This command is a HAWQ extension that is equivalent to COMMIT.

Parameters

1
2
WORK
TRANSACTION

Optional keywords. They have no effect.

Examples

Commit the current transaction:

1
END;

Compatibility

END is a HAWQ extension that provides functionality equivalent to COMMIT, which is specified in the SQL standard.

EXECUTE

Executes a prepared SQL statement.

Synopsis

1
EXECUTE <name> [ (<parameter> [, ...] ) ]

Description

EXECUTE is used to execute a previously prepared statement. Since prepared statements only exist for the duration of a session, the prepared statement must have been created by a PREPARE statement executed earlier in the current session.

If the PREPARE statement that created the statement specified some parameters, a compatible set of parameters must be passed to the EXECUTE statement, or else an error is raised. Note that (unlike functions) prepared statements are not overloaded based on the type or number of their parameters; the name of a prepared statement must be unique within a database session.

For more information on the creation and usage of prepared statements, see PREPARE.

Parameters

1
<name>

The name of the prepared statement to execute.

1
<parameter>

The actual value of a parameter to the prepared statement. This must be an expression yielding a value that is compatible with the data type of this parameter, as was determined when the prepared statement was created.

Examples

Create a prepared statement for an INSERT statement, and then execute it:

1
2
3
PREPARE fooplan (int, text, bool, numeric) AS INSERT INTO
foo VALUES($1, $2, $3, $4);
EXECUTE fooplan(1, 'Hunter Valley', 't', 200.00);

Compatibility

The SQL standard includes an EXECUTE statement, but it is only for use in embedded SQL. This version of the EXECUTE statement also uses a somewhat different syntax.

EXPLAIN

Shows the query plan of a statement.

Synopsis

1
EXPLAIN [ANALYZE] [VERBOSE] <statement>

Description

EXPLAIN displays the query plan that the HAWQ planner generates for the supplied statement. A query plan is a tree of plan nodes. Each node in the plan represents a single operation, such as a table scan, join, aggregation, or sort.

Plans should be read from the bottom up as each node feeds rows into the node directly above it. The bottom nodes of a plan are usually table scan operations. If the query requires joins, aggregations, or sorts (or other operations on the raw rows), then there will be additional nodes above the scan nodes to perform these operations. The topmost plan nodes are usually the HAWQ motion nodes (redistribute, explicit redistribute, broadcast, or gather motions). These are the operations responsible for moving rows between the segment instances during query processing.

The output of EXPLAIN has one line for each node in the plan tree, showing the basic node type plus the following cost estimates that the planner made for the execution of that plan node:

  • cost — measured in units of disk page fetches; that is, 1.0 equals one sequential disk page read. The first estimate is the start-up cost (cost of getting to the first row) and the second is the total cost (cost of getting all rows). Note that the total cost assumes that all rows will be retrieved, which may not always be the case (if using LIMIT for example).
  • rows — the total number of rows output by this plan node. This is usually less than the actual number of rows processed or scanned by the plan node, reflecting the estimated selectivity of any WHERE clause conditions. Ideally the top-level nodes estimate will approximate the number of rows actually returned, updated, or deleted by the query.
  • width — total bytes of all the rows output by this plan node.

It is important to note that the cost of an upper-level node includes the cost of all its child nodes. The topmost node of the plan has the estimated total execution cost for the plan. This is the number that the planner seeks to minimize. It is also important to realize that the cost only reflects things that the query planner cares about. In particular, the cost does not consider the time spent transmitting result rows to the client.

EXPLAIN ANALYZE causes the statement to be actually executed, not only planned. The EXPLAIN ANALYZE plan shows the actual results along with the planner’s estimates. This is useful for seeing whether the planner’s estimates are close to reality. In addition to the information shown in the EXPLAIN plan, EXPLAIN ANALYZE will show the following additional information:

  • The total elapsed time (in milliseconds) that it took to run the query.
  • The number of workers (segments) involved in a plan node operation. Only segments that return rows are counted.
  • The maximum number of rows returned by the segment that produced the most rows for an operation. If multiple segments produce an equal number of rows, the one with the longest time to end is the one chosen.
  • The segment id number of the segment that produced the most rows for an operation.
  • For relevant operations, the amount of work_mem used by the operation. If work_mem was not sufficient to perform the operation in memory, the plan will show how much data was spilled to disk and how many passes over the data were required for the lowest-performing segment. For example:
1
2
3
4
5
6
Work_mem used: 64K bytes avg, 64K bytes max (seg0).
Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to abate workfile
I/O affecting 2 workers.
[seg0] pass 0: 488 groups made from 488 rows; 263 rows written to
workfile
[seg0] pass 1: 263 groups made from 263 rows

Note: You cannot set the work_mem property; it is shown for information only.

  • The time (in milliseconds) it took to retrieve the first row from the segment that produced the most rows, and the total time taken to retrieve all rows from that segment. The <time> to first row may be omitted if it is the same as the <time> to end.

Important: Keep in mind that the statement is actually executed when EXPLAIN ANALYZE is used. Although EXPLAIN ANALYZE will discard any output that a SELECT would return, other side effects of the statement will happen as usual. If you wish to use EXPLAIN ANALYZE on a DML statement without letting the command affect your data, use this approach:

1
2
3
BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;

Parameters

1
ANALYZE

Carry out the command and show the actual run times.

1
VERBOSE

Show the full internal representation of the plan tree, rather than just a summary.

1
<statement>

Any SELECT, INSERT, EXECUTE, DECLARE, or CREATE TABLE AS statement whose execution plan you wish to see.

Notes

In order to allow the query planner to make reasonably informed decisions when optimizing queries, the ANALYZE statement should be run to record statistics about the distribution of data within the table. If you have not done this (or if the statistical distribution of the data in the table has changed significantly since the last time ANALYZE was run), the estimated costs are unlikely to conform to the real properties of the query, and consequently an inferior query plan may be chosen.
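
For example, before examining plans for the names table used in the example below, collect its statistics:

ANALYZE names;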

Examples

To illustrate how to read an EXPLAIN query plan, consider the following example for a very simple query:

1
2
3
4
5
6
7
EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
                  QUERY PLAN
------------------------------------------------------------
Gather Motion 2:1 (slice1) (cost=0.00..20.88 rows=1 width=13)

-> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
      Filter: name::text ~~ 'Joelle'::text

If we read the plan from the bottom up, the query planner starts by doing a sequential scan of the names table. Notice that the WHERE clause is being applied as a filter condition. This means that the scan operation checks the condition for each row it scans, and outputs only the ones that pass the condition.

The results of the scan operation are passed up to a gather motion operation. In HAWQ, a gather motion is when segments send rows up to the master. In this case we have 2 segment instances sending to 1 master instance (2:1). This operation is working on slice1 of the parallel query execution plan. In HAWQ, a query plan is divided into slices so that portions of the query plan can be worked on in parallel by the segments.

The estimated start-up cost for this plan is 00.00 (no cost) and the total cost is 20.88 disk page fetches. The planner estimates that this query will return one row.

Compatibility

There is no EXPLAIN statement defined in the SQL standard.

FETCH

Retrieves rows from a query using a cursor.

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
FETCH [ <forward_direction> { FROM | IN } ] <cursorname>

where forward_direction can be empty or one of:

    NEXT
    FIRST
    LAST
    ABSOLUTE <count>
    RELATIVE <count>
    <count>
    ALL
    FORWARD
    FORWARD <count>
    FORWARD ALL

Description

FETCH retrieves rows using a previously-created cursor.

A cursor has an associated position, which is used by FETCH. The cursor position can be before the first row of the query result, on any particular row of the result, or after the last row of the result. When created, a cursor is positioned before the first row. After fetching some rows, the cursor is positioned on the row most recently retrieved. If FETCH runs off the end of the available rows then the cursor is left positioned after the last row. FETCH ALL will always leave the cursor positioned after the last row.

The forms NEXT, FIRST, LAST, ABSOLUTE, RELATIVE fetch a single row after moving the cursor appropriately. If there is no such row, an empty result is returned, and the cursor is left positioned before the first row or after the last row as appropriate.

The forms using FORWARD retrieve the indicated number of rows moving in the forward direction, leaving the cursor positioned on the last-returned row (or after all rows, if the count exceeds the number of rows available). Note that it is not possible to move a cursor position backwards in HAWQ, since scrollable cursors are not supported. You can only move a cursor forward in position using FETCH.

RELATIVE 0 and FORWARD 0 request fetching the current row without moving the cursor, that is, re-fetching the most recently fetched row. This will succeed unless the cursor is positioned before the first row or after the last row, in which case no row is returned.

Outputs

On successful completion, a FETCH command returns a command tag of the form

1
FETCH count

The count is the number of rows fetched (possibly zero). Note that in psql, the command tag will not actually be displayed, since psql displays the fetched rows instead.

Parameters

1
<forward_direction>

Defines the fetch direction and number of rows to fetch. Only forward fetches are allowed in HAWQ. It can be one of the following:

1
NEXT

Fetch the next row. This is the default if direction is omitted.

1
FIRST

Fetch the first row of the query (same as ABSOLUTE 1). Only allowed if it is the first FETCH operation using this cursor.

1
LAST

Fetch the last row of the query (same as ABSOLUTE -1).

1
ABSOLUTE <count>

Fetch the specified row of the query. Position after last row if count is out of range. Only allowed if the row specified by count moves the cursor position forward.

1
RELATIVE <count>

Fetch the specified row of the query count rows ahead of the current cursor position. RELATIVE 0 re-fetches the current row, if any. Only allowed if count moves the cursor position forward.

1
<count>

Fetch the next count number of rows (same as FORWARD count).

1
ALL

Fetch all remaining rows (same as FORWARD ALL).

1
FORWARD

Fetch the next row (same as NEXT).

1
FORWARD <count>

Fetch the next count number of rows. FORWARD 0 re-fetches the current row.

1
FORWARD ALL

Fetch all remaining rows.

1
<cursorname>

The name of an open cursor.

Notes

HAWQ does not support scrollable cursors, so you can only use FETCH to move the cursor position forward.

ABSOLUTE fetches are not any faster than navigating to the desired row with a relative move: the underlying implementation must traverse all the intermediate rows anyway.

Updating data via a cursor is currently not supported by HAWQ.

DECLARE is used to define a cursor. Use MOVE to change cursor position without retrieving data.
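
For example, assuming an open cursor mycursor, MOVE repositions it without returning rows:

MOVE FORWARD 5 FROM mycursor;   -- skip five rows
FETCH 1 FROM mycursor;          -- return the next row after the skipped ones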

Examples

Start the transaction:

1
BEGIN;

Set up a cursor:

1
DECLARE mycursor CURSOR FOR SELECT * FROM films;

Fetch the first 5 rows in the cursor mycursor:

1
2
3
4
5
6
7
8
FETCH FORWARD 5 FROM mycursor;
 code  |          title          | did | date_prod  |   kind   |  len
-------+-------------------------+-----+------------+----------+-------
 BL101 | The Third Man           | 101 | 1949-12-23 | Drama    | 01:44
 BL102 | The African Queen       | 101 | 1951-08-11 | Romantic | 01:43
 JL201 | Une Femme est une Femme | 102 | 1961-03-12 | Romantic | 01:25
 P_301 | Vertigo                 | 103 | 1958-11-14 | Action   | 02:08
 P_302 | Becket                  | 103 | 1964-02-03 | Drama    | 02:28

Close the cursor and end the transaction:

1
2
CLOSE mycursor;
COMMIT;

Compatibility

The SQL standard allows cursors only in embedded SQL and in modules. HAWQ permits cursors to be used interactively.

The variant of FETCH described here returns the data as if it were a SELECT result rather than placing it in host variables. Other than this point, FETCH is fully upward-compatible with the SQL standard.

The FETCH forms involving FORWARD, as well as the forms FETCH count and FETCH ALL, in which FORWARD is implicit, are HAWQ extensions. BACKWARD is not supported.

The SQL standard allows only FROM preceding the cursor name; the option to use IN is an extension.

GRANT

Defines access privileges.

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
GRANT { {SELECT | INSERT | UPDATE | DELETE | REFERENCES }
[,...] | ALL [PRIVILEGES] }
    ON [TABLE] <tablename> [, ...]
    TO {<rolename> | PUBLIC} [, ...] [WITH GRANT OPTION]

GRANT { {USAGE | SELECT | UPDATE} [,...] | ALL [PRIVILEGES] }
    ON SEQUENCE <sequencename> [, ...]
    TO { <rolename> | PUBLIC } [, ...] [WITH GRANT OPTION]

GRANT { {CREATE | CONNECT | TEMPORARY | TEMP} [,...] | ALL
[PRIVILEGES] }
    ON DATABASE <dbname> [, ...]
    TO {<rolename> | PUBLIC} [, ...] [WITH GRANT OPTION]

GRANT { EXECUTE | ALL [PRIVILEGES] }
    ON FUNCTION <funcname> ( [ [<argmode>] [<argname>] <argtype> [, ...]
    ] ) [, ...]
    TO {<rolename> | PUBLIC} [, ...] [WITH GRANT OPTION]

GRANT { USAGE | ALL [PRIVILEGES] }
    ON LANGUAGE <langname> [, ...]
    TO {<rolename> | PUBLIC} [, ...] [WITH GRANT OPTION]

GRANT { {CREATE | USAGE} [,...] | ALL [PRIVILEGES] }
    ON SCHEMA <schemaname> [, ...]
    TO {<rolename> | PUBLIC} [, ...] [WITH GRANT OPTION]

GRANT { CREATE | ALL [PRIVILEGES] }
    ON TABLESPACE <tablespacename> [, ...]
    TO {<rolename> | PUBLIC} [, ...] [WITH GRANT OPTION]

GRANT <parent_role> [, ...]
    TO <member_role> [, ...] [WITH ADMIN OPTION]

GRANT { SELECT | INSERT | ALL [PRIVILEGES] }
    ON PROTOCOL <protocolname>
    TO <username>

Description

The GRANT command has two basic variants: one that grants privileges on a database object (table, view, sequence, database, function, procedural language, schema, or tablespace), and one that grants membership in a role.

GRANT on Database Objects

This variant of the GRANT command gives specific privileges on a database object to one or more roles. These privileges are added to those already granted, if any.

The key word PUBLIC indicates that the privileges are to be granted to all roles, including those that may be created later. PUBLIC may be thought of as an implicitly defined group-level role that always includes all roles. Any particular role will have the sum of privileges granted directly to it, privileges granted to any role it is presently a member of, and privileges granted to PUBLIC.

If WITH GRANT OPTION is specified, the recipient of the privilege may in turn grant it to others. Without a grant option, the recipient cannot do that. Grant options cannot be granted to PUBLIC.

There is no need to grant privileges to the owner of an object (usually the role that created it), as the owner has all privileges by default. The right to drop an object, or to alter its definition in any way is not described by a grantable privilege; it is inherent in the owner, and cannot be granted or revoked. The owner implicitly has all grant options for the object, too.

Depending on the type of object, the initial default privileges may include granting some privileges to PUBLIC. The default is no public access for tables, schemas, and tablespaces; CONNECT privilege and TEMP table creation privilege for databases; EXECUTE privilege for functions; and USAGE privilege for languages. The object owner may of course revoke these privileges.
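
If those defaults are not wanted, the owner (or a superuser) can revoke them; for example (the database name is illustrative):

REVOKE CONNECT, TEMPORARY ON DATABASE mydb FROM PUBLIC;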

GRANT on Roles

This variant of the GRANT command grants membership in a role to one or more other roles. Membership in a role is significant because it conveys the privileges granted to a role to each of its members.

If WITH ADMIN OPTION is specified, the member may in turn grant membership in the role to others, and revoke membership in the role as well. Database superusers can grant or revoke membership in any role to anyone. Roles having CREATEROLE privilege can grant or revoke membership in any role that is not a superuser.

Unlike the case with privileges, membership in a role cannot be granted to PUBLIC.

Parameters

1
SELECT

Allows SELECT from any column of the specified table, view, or sequence. Also allows the use of COPY TO. For sequences, this privilege also allows the use of the currval function.

1
INSERT

Allows INSERT of a new row into the specified table. Also allows COPY FROM.

1
UPDATE

Allows UPDATE of any column of the specified table. SELECT … FOR UPDATE and SELECT … FOR SHARE also require this privilege (as well as the SELECT privilege). For sequences, this privilege allows the use of the nextval and setval functions.

1
DELETE

Allows DELETE of a row from the specified table.

1
REFERENCES

This keyword is accepted, although foreign key constraints are currently not supported in HAWQ. To create a foreign key constraint, it is necessary to have this privilege on both the referencing and referenced tables.

1
TRIGGER

Allows the creation of a trigger on the specified table. Note: HAWQ does not support triggers.

1
CREATE

For databases, allows new schemas to be created within the database. For schemas, allows new objects to be created within the schema. To rename an existing object, you must own the object and have this privilege for the containing schema.

For tablespaces, allows tables and indexes to be created within the tablespace, and allows databases to be created that have the tablespace as their default tablespace. (Note that revoking this privilege will not alter the placement of existing objects.)

1
CONNECT

Allows the user to connect to the specified database. This privilege is checked at connection startup (in addition to checking any restrictions imposed by pg_hba.conf).

1
2
TEMPORARY
TEMP

Allows temporary tables to be created while using the database.

1
EXECUTE

Allows the use of the specified function and the use of any operators that are implemented on top of the function. This is the only type of privilege that is applicable to functions. (This syntax works for aggregate functions, as well.)

1
USAGE

For procedural languages, allows the use of the specified language for the creation of functions in that language. This is the only type of privilege that is applicable to procedural languages. For schemas, allows access to objects contained in the specified schema (assuming that the objects’ own privilege requirements are also met). Essentially this allows the grantee to look up objects within the schema.

For sequences, this privilege allows the use of the currval and nextval functions.

1
ALL PRIVILEGES

Grant all of the available privileges at once. The PRIVILEGES key word is optional in HAWQ, though it is required by strict SQL.

1
PUBLIC

A special group-level role that denotes that the privileges are to be granted to all roles, including those that may be created later.

1
WITH GRANT OPTION

The recipient of the privilege may in turn grant it to others.

1
WITH ADMIN OPTION

The member of a role may in turn grant membership in the role to others.

Notes

Database superusers can access all objects regardless of object privilege settings. One exception to this rule is view objects. Access to tables referenced in the view is determined by permissions of the view owner not the current user (even if the current user is a superuser).

If a superuser chooses to issue a GRANT or REVOKE command, the command is performed as though it were issued by the owner of the affected object. In particular, privileges granted via such a command will appear to have been granted by the object owner. For role membership, the membership appears to have been granted by the containing role itself.

GRANT and REVOKE can also be done by a role that is not the owner of the affected object, but is a member of the role that owns the object, or is a member of a role that holds privileges WITH GRANT OPTION on the object. In this case the privileges will be recorded as having been granted by the role that actually owns the object or holds the privileges WITH GRANT OPTION.

Granting permission on a table does not automatically extend permissions to any sequences used by the table, including sequences tied to SERIAL columns. Permissions on a sequence must be set separately.
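
For example, a role that inserts through a SERIAL column also needs privileges on the backing sequence (the table, sequence, and role names are illustrative):

GRANT INSERT ON mytable TO webuser;
GRANT USAGE, UPDATE ON SEQUENCE mytable_id_seq TO webuser;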

HAWQ does not support granting or revoking privileges for individual columns of a table. One possible workaround is to create a view having just the desired columns and then grant privileges to that view.
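
A sketch of that workaround, exposing only selected columns through a view (names are illustrative):

CREATE VIEW films_public AS
    SELECT code, title, kind FROM films;

GRANT SELECT ON films_public TO PUBLIC;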

Use psql’s \z meta-command to obtain information about existing privileges for an object.

Examples

Grant insert privilege to all roles on table mytable:

1
GRANT INSERT ON mytable TO PUBLIC;

Grant all available privileges to role sally on the view topten. Note that while this command will indeed grant all privileges if executed by a superuser or the owner of topten, when executed by someone else it will only grant those permissions for which the granting role has grant options.

1
GRANT ALL PRIVILEGES ON topten TO sally;

Grant membership in role admins to user joe:

1
GRANT admins TO joe;

Compatibility

The PRIVILEGES key word is required in the SQL standard, but optional in HAWQ. The SQL standard does not support setting the privileges on more than one object per command.

HAWQ allows an object owner to revoke his own ordinary privileges: for example, a table owner can make the table read-only to himself by revoking his own INSERT privileges. This is not possible according to the SQL standard. HAWQ treats the owner’s privileges as having been granted by the owner to himself; therefore he can revoke them too. In the SQL standard, the owner’s privileges are granted by an assumed system entity.

The SQL standard allows setting privileges for individual columns within a table.

The SQL standard provides for a USAGE privilege on other kinds of objects: character sets, collations, translations, domains.

Privileges on databases, tablespaces, schemas, and languages are HAWQ extensions.

INSERT

Creates new rows in a table.

Synopsis

1
2
3
INSERT INTO <table> [( <column> [, ...] )]
   {DEFAULT VALUES | VALUES ( {<expression> | DEFAULT} [, ...] )
   [, ...] | <query>}

Description

INSERT inserts new rows into a table. One can insert one or more rows specified by value expressions, or zero or more rows resulting from a query.

The target column names may be listed in any order. If no list of column names is given at all, the default is the columns of the table in their declared order. The values supplied by the VALUES clause or query are associated with the explicit or implicit column list left-to-right.

Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or null if there is no default.

If the expression for any column is not of the correct data type, automatic type conversion will be attempted.

You must have INSERT privilege on a table in order to insert into it.

Outputs

On successful completion, an INSERT command returns a command tag of the form:

1
INSERT oid  count

The count is the number of rows inserted. If count is exactly one, and the target table has OIDs, then oid is the OID assigned to the inserted row. Otherwise oid is zero.

Parameters

1
<table>

The name (optionally schema-qualified) of an existing table.

1
<column>

The name of a column in <table>. The column name can be qualified with a subfield name or array subscript, if needed. (Inserting into only some fields of a composite column leaves the other fields null.)

1
DEFAULT VALUES

All columns will be filled with their default values.

1
<expression>

An expression or value to assign to the corresponding column.

1
DEFAULT

The corresponding column will be filled with its default value.

1
<query>

A query (SELECT statement) that supplies the rows to be inserted. Refer to the SELECT statement for a description of the syntax.

Examples

Insert a single row into table films:

1
2
INSERT INTO films VALUES ('UA502', 'Bananas', 105,
'1971-07-13', 'Comedy', '82 minutes');

In this example, the length column is omitted and therefore it will have the default value:

1
2
INSERT INTO films (code, title, did, date_prod, kind) VALUES
('T_601', 'Yojimbo', 106, '1961-06-16', 'Drama');

This example uses the DEFAULT clause for the date_prod column rather than specifying a value:

1
2
INSERT INTO films VALUES ('UA502', 'Bananas', 105, DEFAULT,
'Comedy', '82 minutes');

To insert a row consisting entirely of default values:

1
INSERT INTO films DEFAULT VALUES;

To insert multiple rows using the multirow VALUES syntax:

1
2
3
INSERT INTO films (code, title, did, date_prod, kind) VALUES
    ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
    ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

This example inserts some rows into table films from a table tmp_films with the same column layout as films:

1
2
INSERT INTO films SELECT * FROM tmp_films WHERE date_prod <
'2004-05-07';

Compatibility

INSERT conforms to the SQL standard. The case in which a column name list is omitted, but not all the columns are filled from the VALUES clause or query, is disallowed by the standard.

Possible limitations of the query clause are documented under SELECT.

PREPARE

Prepare a statement for execution.

Synopsis

1
PREPARE <name> [ (<datatype> [, ...] ) ] AS <statement>

Description

PREPARE creates a prepared statement, possibly with unbound parameters. A prepared statement is a server-side object that can be used to optimize performance. A prepared statement may be subsequently executed with a binding for its parameters. HAWQ may choose to replan the query for different executions of the same prepared statement.

Prepared statements can take parameters: values that are substituted into the statement when it is executed. When creating the prepared statement, refer to parameters by position, using $1, $2, etc. A corresponding list of parameter data types can optionally be specified. When a parameter’s data type is not specified or is declared as unknown, the type is inferred from the context in which the parameter is used (if possible). When executing the statement, specify the actual values for these parameters in the EXECUTE statement.

Prepared statements only last for the duration of the current database session. When the session ends, the prepared statement is forgotten, so it must be recreated before being used again. This also means that a single prepared statement cannot be used by multiple simultaneous database clients; however, each client can create their own prepared statement to use. The prepared statement can be manually cleaned up using the DEALLOCATE command.

Prepared statements have the largest performance advantage when a single session is being used to execute a large number of similar statements. The performance difference will be particularly significant if the statements are complex to plan or rewrite, for example, if the query involves a join of many tables or requires the application of several rules. If the statement is relatively simple to plan and rewrite but relatively expensive to execute, the performance advantage of prepared statements will be less noticeable.

Parameters

1
<name>

An arbitrary name given to this particular prepared statement. It must be unique within a single session and is subsequently used to execute or deallocate a previously prepared statement.

1
<datatype>

The data type of a parameter to the prepared statement. If the data type of a particular parameter is unspecified or is specified as unknown, it will be inferred from the context in which the parameter is used. To refer to the parameters in the prepared statement itself, use $1, $2, etc.

1
<statement>

Any SELECT, INSERT, or VALUES statement.

Notes

In some situations, the query plan produced for a prepared statement will be inferior to the query plan that would have been chosen if the statement had been submitted and executed normally. This is because when the statement is planned and the planner attempts to determine the optimal query plan, the actual values of any parameters specified in the statement are unavailable. HAWQ collects statistics on the distribution of data in the table, and can use constant values in a statement to make guesses about the likely result of executing the statement. Since this data is unavailable when planning prepared statements with parameters, the chosen plan may be suboptimal. To examine the query plan HAWQ has chosen for a prepared statement, use EXPLAIN.

For more information on query planning and the statistics collected by HAWQ for that purpose, see the ANALYZE documentation.

You can see all available prepared statements of a session by querying the pg_prepared_statements system view.
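
For example, to list the prepared statements of the current session:

1
SELECT * FROM pg_prepared_statements;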

Examples

Create a prepared statement for an INSERT statement, and then execute it:

1
2
3
PREPARE fooplan (int, text, bool, numeric) AS INSERT INTO
foo VALUES($1, $2, $3, $4);
EXECUTE fooplan(1, 'Hunter Valley', 't', 200.00);

Create a prepared statement for a SELECT statement, and then execute it. Note that the data type of the second parameter is not specified, so it is inferred from the context in which $2 is used:

1
2
3
PREPARE usrrptplan (int) AS SELECT * FROM users u, logs l
WHERE u.usrid=$1 AND u.usrid=l.usrid AND l.date = $2;
EXECUTE usrrptplan(1, current_date);

Compatibility

The SQL standard includes a PREPARE statement, but it is only for use in embedded SQL. This version of the PREPARE statement also uses a somewhat different syntax.

REASSIGN OWNED

Changes the ownership of database objects owned by a database role.

Synopsis

1
REASSIGN OWNED BY <old_role> [, ...] TO <new_role>

Description

REASSIGN OWNED reassigns all the objects in the current database that are owned by <old_role> to <new_role>. Note that it does not change the ownership of the database itself.

Parameters

1
<old_role>

The name of a role. The ownership of all the objects in the current database owned by this role will be reassigned to <new_role>.

1
<new_role>

The name of the role that will be made the new owner of the affected objects.

Notes

REASSIGN OWNED is often used to prepare for the removal of one or more roles. Because REASSIGN OWNED only affects the objects in the current database, it is usually necessary to execute this command in each database that contains objects owned by a role that is to be removed.

The DROP OWNED command is an alternative that drops all the database objects owned by one or more roles.

The REASSIGN OWNED command does not affect the privileges granted to the old roles in objects that are not owned by them. Use DROP OWNED to revoke those privileges.
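
A typical role-removal sequence might therefore look like the following sketch (run the first two commands in each database that contains objects owned by sally):

1
2
3
REASSIGN OWNED BY sally TO admin;
DROP OWNED BY sally;
DROP ROLE sally;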

Examples

Reassign any database objects owned by the roles sally and bob to admin:

1
REASSIGN OWNED BY sally, bob TO admin;

Compatibility

The REASSIGN OWNED statement is a HAWQ extension.

RELEASE SAVEPOINT

Destroys a previously defined savepoint.

Synopsis

1
RELEASE [SAVEPOINT] <savepoint_name>

Description

RELEASE SAVEPOINT destroys a savepoint previously defined in the current transaction.

Destroying a savepoint makes it unavailable as a rollback point, but it has no other user visible behavior. It does not undo the effects of commands executed after the savepoint was established. (To do that, see ROLLBACK TO SAVEPOINT.) Destroying a savepoint when it is no longer needed may allow the system to reclaim some resources earlier than transaction end.

RELEASE SAVEPOINT also destroys all savepoints that were established after the named savepoint was established.

Parameters

1
<savepoint_name>

The name of the savepoint to destroy.

Examples

To establish and later destroy a savepoint:

1
2
3
4
5
6
BEGIN;
    INSERT INTO table1 VALUES (3);
    SAVEPOINT my_savepoint;
    INSERT INTO table1 VALUES (4);
    RELEASE SAVEPOINT my_savepoint;
COMMIT;

The above transaction will insert both 3 and 4.

Compatibility

This command conforms to the SQL standard. The standard specifies that the key word SAVEPOINT is mandatory, but HAWQ allows it to be omitted.

RESET

Restores the value of a system configuration parameter to the default value.

Synopsis

1
2
RESET <configuration_parameter>
RESET ALL

Description

RESET restores system configuration parameters to their default values. RESET is an alternative spelling for SET configuration_parameter TO DEFAULT.

The default value is defined as the value that the parameter would have had, had no SET ever been issued for it in the current session. The actual source of this value might be a compiled-in default, the master hawq-site.xml configuration file, command-line options, or per-database or per-user default settings.

Parameters

1
<configuration_parameter>

The name of a system configuration parameter. See HAWQ Parameter Reference for a list of configuration parameters.

1
ALL

Resets all settable configuration parameters to their default values.

Notes

It is also possible to set a configuration parameter session default for a specific role (user) rather than to a database. Role-specific settings override database-specific ones if there is a conflict. See ALTER ROLE.
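
For example, a per-role session default could be set with ALTER ROLE (the role and schema names here are hypothetical):

1
ALTER ROLE sally SET search_path TO myschema, public;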

Examples

Set the hawq_rm_stmt_vseg_memory configuration parameter to its default value:

1
RESET hawq_rm_stmt_vseg_memory;

Compatibility

RESET is a HAWQ extension.

REVOKE

Removes access privileges.

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
REVOKE [GRANT OPTION FOR] { {SELECT | INSERT | UPDATE | DELETE
       | REFERENCES | TRUNCATE } [,...] | ALL [PRIVILEGES] }
       ON [TABLE] <tablename> [, ...]
       FROM {<rolename> | PUBLIC} [, ...]
       [CASCADE | RESTRICT]

REVOKE [GRANT OPTION FOR] { {USAGE | SELECT | UPDATE} [,...]
       | ALL [PRIVILEGES] }
       ON SEQUENCE <sequencename> [, ...]
       FROM { <rolename> | PUBLIC } [, ...]
       [CASCADE | RESTRICT]

REVOKE [GRANT OPTION FOR] { {CREATE | CONNECT
       | TEMPORARY | TEMP} [,...] | ALL [PRIVILEGES] }
       ON DATABASE <dbname> [, ...]
       FROM {<rolename> | PUBLIC} [, ...]
       [CASCADE | RESTRICT]

REVOKE [GRANT OPTION FOR] {EXECUTE | ALL [PRIVILEGES]}
       ON FUNCTION <funcname> ( [[<argmode>] [<argname>] <argtype>
                              [, ...]] ) [, ...]
       FROM {<rolename> | PUBLIC} [, ...]
       [CASCADE | RESTRICT]

REVOKE [GRANT OPTION FOR] {USAGE | ALL [PRIVILEGES]}
       ON LANGUAGE <langname> [, ...]
       FROM {<rolename> | PUBLIC} [, ...]
       [ CASCADE | RESTRICT ]

REVOKE [GRANT OPTION FOR] { {CREATE | USAGE} [,...]
       | ALL [PRIVILEGES] }
       ON SCHEMA <schemaname> [, ...]
       FROM {<rolename> | PUBLIC} [, ...]
       [CASCADE | RESTRICT]

REVOKE [GRANT OPTION FOR] { CREATE | ALL [PRIVILEGES] }
       ON TABLESPACE <tablespacename> [, ...]
       FROM { <rolename> | PUBLIC } [, ...]
       [CASCADE | RESTRICT]

REVOKE [ADMIN OPTION FOR] <parent_role> [, ...]
       FROM <member_role> [, ...]
       [CASCADE | RESTRICT]

Description

The REVOKE command revokes previously granted privileges from one or more roles. The key word PUBLIC refers to the implicitly defined group of all roles.

See the description of the GRANT command for the meaning of the privilege types.

Note that any particular role will have the sum of privileges granted directly to it, privileges granted to any role it is presently a member of, and privileges granted to PUBLIC. Thus, for example, revoking SELECT privilege from PUBLIC does not necessarily mean that all roles have lost SELECT privilege on the object: those who have it granted directly or via another role will still have it.

If GRANT OPTION FOR is specified, only the grant option for the privilege is revoked, not the privilege itself. Otherwise, both the privilege and the grant option are revoked.

If a role holds a privilege with grant option and has granted it to other roles then the privileges held by those other roles are called dependent privileges. If the privilege or the grant option held by the first role is being revoked and dependent privileges exist, those dependent privileges are also revoked if CASCADE is specified, else the revoke action will fail. This recursive revocation only affects privileges that were granted through a chain of roles that is traceable to the role that is the subject of this REVOKE command. Thus, the affected roles may effectively keep the privilege if it was also granted through other roles.
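
As an illustration, the following sketch revokes only sally's grant option on a hypothetical table mytable, cascading to any dependent privileges she has granted to others:

1
REVOKE GRANT OPTION FOR SELECT ON mytable FROM sally CASCADE;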

When revoking membership in a role, GRANT OPTION is instead called ADMIN OPTION, but the behavior is similar.

Parameters

See GRANT.

Examples

Revoke insert privilege for the public on table films:

1
REVOKE INSERT ON films FROM PUBLIC;

Revoke all privileges from role sally on the view topten. Note that this actually means "revoke all privileges that the current role granted" (if the current role is not a superuser).

1
REVOKE ALL PRIVILEGES ON topten FROM sally;

Revoke membership in role admins from user joe:

1
REVOKE admins FROM joe;

Compatibility

The compatibility notes of the GRANT command also apply to REVOKE.

Either RESTRICT or CASCADE is required according to the standard, but HAWQ assumes RESTRICT by default.

ROLLBACK

Aborts the current transaction.

Synopsis

1
ROLLBACK [WORK | TRANSACTION]

Description

ROLLBACK rolls back the current transaction and causes all the updates made by the transaction to be discarded.

Parameters

1
2
WORK
TRANSACTION

Optional key words. They have no effect.

Notes

Use COMMIT to successfully end the current transaction.

Issuing ROLLBACK when not inside a transaction does no harm, but it will provoke a warning message.

Examples

To discard all changes made in the current transaction:

1
ROLLBACK;

Compatibility

The SQL standard only specifies the two forms ROLLBACK and ROLLBACK WORK. Otherwise, this command is fully conforming.

ROLLBACK TO SAVEPOINT

Rolls back the current transaction to a savepoint.

Synopsis

1
ROLLBACK [WORK | TRANSACTION] TO [SAVEPOINT] <savepoint_name>

Description

This command will roll back all commands that were executed after the savepoint was established. The savepoint remains valid and can be rolled back to again later, if needed.

ROLLBACK TO SAVEPOINT implicitly destroys all savepoints that were established after the named savepoint.

Parameters

1
2
WORK
TRANSACTION

Optional key words. They have no effect.

1
<savepoint_name>

The name of a savepoint to roll back to.

Notes

Use RELEASE SAVEPOINT to destroy a savepoint without discarding the effects of commands executed after it was established.

Specifying a savepoint name that has not been established is an error.

Cursors have somewhat non-transactional behavior with respect to savepoints. Any cursor that is opened inside a savepoint will be closed when the savepoint is rolled back. If a previously opened cursor is affected by a FETCH command inside a savepoint that is later rolled back, the cursor position remains at the position that FETCH left it pointing to (that is, FETCH is not rolled back). Closing a cursor is not undone by rolling back, either. A cursor whose execution causes a transaction to abort is put in a can’t-execute state, so while the transaction can be restored using ROLLBACK TO SAVEPOINT, the cursor can no longer be used.

Examples

To undo the effects of the commands executed after my_savepoint was established:

1
ROLLBACK TO SAVEPOINT my_savepoint;

Cursor positions are not affected by a savepoint rollback:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
BEGIN;
DECLARE foo CURSOR FOR SELECT 1 UNION SELECT 2;
SAVEPOINT foo;

FETCH 1 FROM foo;

column
----------
        1

ROLLBACK TO SAVEPOINT foo;
FETCH 1 FROM foo;

column
----------
        2

COMMIT;

Compatibility

The SQL standard specifies that the key word SAVEPOINT is mandatory, but HAWQ (and Oracle) allow it to be omitted. SQL allows only WORK, not TRANSACTION, as a noise word after ROLLBACK. Otherwise, this command conforms to the SQL standard.

SAVEPOINT

Defines a new savepoint within the current transaction.

Synopsis

1
SAVEPOINT <savepoint_name>

Description

SAVEPOINT establishes a new savepoint within the current transaction.

A savepoint is a special mark inside a transaction that allows all commands that are executed after it was established to be rolled back, restoring the transaction state to what it was at the time of the savepoint.

Parameters

1
<savepoint_name>

The name of the new savepoint.

Notes

Use ROLLBACK TO SAVEPOINT to roll back to a savepoint. Use RELEASE SAVEPOINT to destroy a savepoint, keeping the effects of commands executed after it was established.

Savepoints can only be established when inside a transaction block. There can be multiple savepoints defined within a transaction.

Examples

To establish a savepoint and later undo the effects of all commands executed after it was established:

1
2
3
4
5
6
7
BEGIN;
    INSERT INTO table1 VALUES (1);
    SAVEPOINT my_savepoint;
    INSERT INTO table1 VALUES (2);
    ROLLBACK TO SAVEPOINT my_savepoint;
    INSERT INTO table1 VALUES (3);
COMMIT;

The above transaction will insert the values 1 and 3, but not 2.

To establish and later destroy a savepoint:

1
2
3
4
5
6
BEGIN;
    INSERT INTO table1 VALUES (3);
    SAVEPOINT my_savepoint;
    INSERT INTO table1 VALUES (4);
    RELEASE SAVEPOINT my_savepoint;
COMMIT;

The above transaction will insert both 3 and 4.

Compatibility

SQL requires a savepoint to be destroyed automatically when another savepoint with the same name is established. In HAWQ, the old savepoint is kept, though only the more recent one will be used when rolling back or releasing. (Releasing the newer savepoint will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.

SELECT

Retrieves rows from a table or view.

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
SELECT [ALL | DISTINCT [ON (<expression> [, ...])]]
   * | <expression> [[AS] <output_name>] [, ...]
   [FROM <from_item> [, ...]]
   [WHERE <condition>]
   [GROUP BY <grouping_element> [, ...]]
   [HAVING <condition> [, ...]]
   [WINDOW <window_name> AS (<window_specification>)]
   [{UNION | INTERSECT | EXCEPT} [ALL] <select>]
   [ORDER BY <expression> [ASC | DESC | USING <operator>] [, ...]]
   [LIMIT {<count> | ALL}]
   [OFFSET <start>]
where <grouping_element> can be one of:

  ()
  <expression>
  ROLLUP (<expression> [,...])
  CUBE (<expression> [,...])
  GROUPING SETS ((<grouping_element> [, ...]))
where <window_specification> can be:

  [<window_name>]
  [PARTITION BY <expression> [, ...]]
  [ORDER BY <expression> [ASC | DESC | USING <operator>] [, ...]
     [{RANGE | ROWS}
          { UNBOUNDED PRECEDING
          | <expression> PRECEDING
          | CURRENT ROW
          | BETWEEN <window_frame_bound> AND <window_frame_bound> }]]
                    where <window_frame_bound> can be one of:
                        UNBOUNDED PRECEDING
                        <expression> PRECEDING
                        CURRENT ROW
                        <expression> FOLLOWING
                        UNBOUNDED FOLLOWING
where <from_item> can be one of:

[ONLY] <table_name> [[AS] <alias> [( <column_alias> [, ...] )]]
(<select>) [AS] <alias> [( <column_alias> [, ...] )]
<function_name> ( [<argument> [, ...]] ) [AS] <alias>
             [( <column_alias> [, ...]
                | <column_definition> [, ...] )]
<function_name> ( [<argument> [, ...]] ) AS
              ( <column_definition> [, ...] )
<from_item> [NATURAL] <join_type>
            <from_item>
          [ON <join_condition> | USING ( <join_column> [, ...] )]

Description

SELECT retrieves rows from zero or more tables. The general processing of SELECT is as follows:

  • All elements in the FROM list are computed. (Each element in the FROM list is a real or virtual table.) If more than one element is specified in the FROM list, they are cross-joined together.
  • If the WHERE clause is specified, all rows that do not satisfy the condition are eliminated from the output.
  • If the GROUP BY clause is specified, the output is divided into groups of rows that match on one or more of the defined grouping elements. If the HAVING clause is present, it eliminates groups that do not satisfy the given condition.
  • If a window expression (and optional WINDOW clause) is specified, the output is organized according to the positional (row) or value-based (range) window frame.
  • DISTINCT eliminates duplicate rows from the result. DISTINCT ON eliminates rows that match on all the specified expressions. ALL (the default) will return all candidate rows, including duplicates.
  • The actual output rows are computed using the SELECT output expressions for each selected row.
  • Using the operators UNION, INTERSECT, and EXCEPT, the output of more than one SELECT statement can be combined to form a single result set. The UNION operator returns all rows that are in one or both of the result sets. The INTERSECT operator returns all rows that are strictly in both result sets. The EXCEPT operator returns the rows that are in the first result set but not in the second. In all three cases, duplicate rows are eliminated unless ALL is specified.
  • If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.
  • If the LIMIT or OFFSET clause is specified, the SELECT statement only returns a subset of the result rows.

You must have SELECT privilege on a table to read its values.

Parameters

The SELECT List

The SELECT list (between the key words SELECT and FROM) specifies expressions that form the output rows of the SELECT statement. The expressions can (and usually do) refer to columns computed in the FROM clause.

Using the clause [AS] <output_name>, another name can be specified for an output column. This name is primarily used to label the column for display. It can also be used to refer to the column’s value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead. The AS keyword is optional in most cases (such as when declaring an alias for column names, constants, function calls, and simple unary operator expressions). In cases where the declared alias is a reserved SQL keyword, the <output_name> must be enclosed in double quotes to avoid ambiguity.
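
For example, an output name declared with AS can be reused in ORDER BY; this sketch uses the films table from the INSERT examples earlier, with a hypothetical alias:

1
2
3
SELECT code, title AS film_title
FROM films
ORDER BY film_title;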

An <expression> in the SELECT list can be a constant value, a column reference, an operator invocation, a function call, an aggregate expression, a window expression, a scalar subquery, and so on. There are a number of constructs that can be classified as an expression but do not follow any general syntax rules.

Instead of an expression, * can be written in the output list as a shorthand for all the columns of the selected rows. Also, you can write table_name.* as a shorthand for the columns coming from just that table.

The FROM Clause

The FROM clause specifies one or more source tables for the SELECT. If multiple sources are specified, the result is the Cartesian product (cross join) of all the sources. But usually qualification conditions are added to restrict the returned rows to a small subset of the Cartesian product. The FROM clause can contain the following elements:

1
<table_name>

The name (optionally schema-qualified) of an existing table or view. If ONLY is specified, only that table is scanned. If ONLY is not specified, the table and all its descendant tables (if any) are scanned.

1
<alias>

A substitute name for the FROM item containing the alias. An alias is used for brevity or to eliminate ambiguity for self-joins (where the same table is scanned multiple times). When an alias is provided, it completely hides the actual name of the table or function; for example given FROM foo AS f, the remainder of the SELECT must refer to this FROM item as f not foo. If an alias is written, a column alias list can also be written to provide substitute names for one or more columns of the table.

1
<select>

A sub-SELECT can appear in the FROM clause. This acts as though its output were created as a temporary table for the duration of this single SELECT command. Note that the sub-SELECT must be surrounded by parentheses, and an alias must be provided for it. A VALUES command can also be used here. See “Non-standard Clauses” in the Compatibility section for limitations of using correlated sub-selects in HAWQ.
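
A small sketch of a sub-SELECT in the FROM clause, reusing the films table from the earlier examples; the alias f is required:

1
2
3
SELECT f.kind, count(*) AS films_in_kind
FROM (SELECT * FROM films WHERE date_prod >= '2000-01-01') AS f
GROUP BY f.kind;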

1
<function_name>

Function calls can appear in the FROM clause. (This is especially useful for functions that return result sets, but any function can be used.) This acts as though its output were created as a temporary table for the duration of this single SELECT command. An alias may also be used. If an alias is written, a column alias list can also be written to provide substitute names for one or more attributes of the function’s composite return type. If the function has been defined as returning the record data type, then an alias or the key word AS must be present, followed by a column definition list in the form (<column_name> <data_type> [, … ] ). The column definition list must match the actual number and types of columns returned by the function.
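
For instance, the built-in set-returning function generate_series can be used in the FROM clause with an alias and a column alias list:

1
SELECT * FROM generate_series(1, 3) AS g(n);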

1
<join_type>

One of:

  • [INNER] JOIN
  • LEFT [OUTER] JOIN
  • RIGHT [OUTER] JOIN
  • FULL [OUTER] JOIN
  • CROSS JOIN

For the INNER and OUTER join types, a join condition must be specified, namely exactly one of NATURAL, ON <join_condition>, or USING (<join_column> [, ...]). See below for the meaning. For CROSS JOIN, none of these clauses may appear.

A JOIN clause combines two FROM items. Use parentheses if necessary to determine the order of nesting. In the absence of parentheses, JOINs nest left-to-right. In any case JOIN binds more tightly than the commas separating FROM items.

CROSS JOIN and INNER JOIN produce a simple Cartesian product, the same result as you get from listing the two items at the top level of FROM, but restricted by the join condition (if any). CROSS JOIN is equivalent to INNER JOIN ON (TRUE), that is, no rows are removed by qualification. These join types are just a notational convenience, since they do nothing you could not do with plain FROM and WHERE.

LEFT OUTER JOIN returns all rows in the qualified Cartesian product (i.e., all combined rows that pass its join condition), plus one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition. This left-hand row is extended to the full width of the joined table by inserting null values for the right-hand columns. Note that only the JOIN clause’s own condition is considered while deciding which rows have matches. Outer conditions are applied afterwards.

Conversely, RIGHT OUTER JOIN returns all the joined rows, plus one row for each unmatched right-hand row (extended with nulls on the left). This is just a notational convenience, since you could convert it to a LEFT OUTER JOIN by switching the left and right inputs.

FULL OUTER JOIN returns all the joined rows, plus one row for each unmatched left-hand row (extended with nulls on the right), plus one row for each unmatched right-hand row (extended with nulls on the left).

1
ON <join_condition>

<join_condition> is an expression resulting in a value of type boolean (similar to a WHERE clause) that specifies which rows in a join are considered to match.

1
USING (<join_column> [, …])

A clause of the form USING ( a, b, … ) is shorthand for ON left_table.a = right_table.a AND left_table.b = right_table.b …. Also, USING implies that only one of each pair of equivalent columns will be included in the join output, not both.

1
NATURAL

NATURAL is shorthand for a USING list that mentions all columns in the two tables that have the same names.
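
For example, a join written with USING, assuming that tmp_films from the INSERT examples shares the code column with films:

1
2
SELECT *
FROM films f LEFT JOIN tmp_films t USING (code);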

The WHERE Clause

The optional WHERE clause has the general form:

1
WHERE <condition>

where <condition> is any expression that evaluates to a result of type boolean. Any row that does not satisfy this condition will be eliminated from the output. A row satisfies the condition if it returns true when the actual row values are substituted for any variable references.

The GROUP BY Clause

The optional GROUP BY clause has the general form:

1
GROUP BY <grouping_element> [, ...]

where <grouping_element> can be one of:

1
2
3
4
5
()
<expression>
ROLLUP (<expression> [,...])
CUBE (<expression> [,...])
GROUPING SETS ((<grouping_element> [, ...]))

GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions. A grouping <expression> can be an input column name, or the name or ordinal number of an output column (SELECT list item), or an arbitrary expression formed from input-column values. In case of ambiguity, a GROUP BY name will be interpreted as an input-column name rather than an output column name.

Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without GROUP BY, an aggregate produces a single value computed across all the selected rows). When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.

HAWQ has the following additional OLAP grouping extensions (often referred to as supergroups):

1
ROLLUP

A ROLLUP grouping is an extension to the GROUP BY clause that creates aggregate subtotals that roll up from the most detailed level to a grand total, following a list of grouping columns (or expressions). ROLLUP takes an ordered list of grouping columns, calculates the standard aggregate values specified in the GROUP BY clause, then creates progressively higher-level subtotals, moving from right to left through the list. Finally, it creates a grand total. A ROLLUP grouping can be thought of as a series of grouping sets. For example:

1
GROUP BY ROLLUP (a,b,c)

is equivalent to:

1
GROUP BY GROUPING SETS( (a,b,c), (a,b), (a), () )

Notice that the n elements of a ROLLUP translate to n+1 grouping sets. Also, the order in which the grouping expressions are specified is significant in a ROLLUP.

1
CUBE

A CUBE grouping is an extension to the GROUP BY clause that creates subtotals for all of the possible combinations of the given list of grouping columns (or expressions). In terms of multidimensional analysis, CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions. For example: GROUP BY CUBE (a,b,c) is equivalent to:

1
GROUP BY GROUPING SETS( (a,b,c), (a,b), (a,c), (b,c), (a), (b), (c), () )

Notice that n elements of a CUBE translate to 2^n grouping sets. Consider using CUBE in any situation requiring cross-tabular reports. CUBE is typically most suitable in queries that use columns from multiple dimensions rather than columns representing different levels of a single dimension. For instance, a commonly requested cross-tabulation might need subtotals for all the combinations of month, state, and product.

1
GROUPING SETS

You can selectively specify the set of groups that you want to create using a GROUPING SETS expression within a GROUP BY clause. This allows precise specification across multiple dimensions without computing a whole ROLLUP or CUBE. For example:

1
GROUP BY GROUPING SETS( (a,c), (a,b) )

If you use the grouping extension clauses ROLLUP, CUBE, or GROUPING SETS, two challenges arise. First, how do you determine which result rows are subtotals, and what the exact level of aggregation is for a given subtotal? And how do you differentiate between result rows that contain stored NULL values and the “NULL” values created by ROLLUP or CUBE? Second, when duplicate grouping sets are specified in the GROUP BY clause, how do you determine which result rows are duplicates? There are two additional grouping functions you can use in the SELECT list to help with this:

  • grouping(<column> [, ...]) — The grouping function can be applied to one or more grouping attributes to distinguish super-aggregated rows from regular grouped rows. This can be helpful in distinguishing a “NULL” representing the set of all values in a super-aggregated row from a NULL value in a regular row. Each argument in this function produces a bit — either 1 or 0, where 1 means the result row is super-aggregated, and 0 means the result row is from a regular grouping. The grouping function returns an integer by treating these bits as a binary number and then converting it to a base-10 integer.
  • group_id() — For grouping extension queries that contain duplicate grouping sets, the group_id function is used to identify duplicate rows in the output. All unique grouping set output rows will have a group_id value of 0. For each duplicate grouping set detected, the group_id function assigns a group_id number greater than 0. All output rows in a particular duplicate grouping set are identified by the same group_id number.
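
A minimal sketch that applies the grouping function described above to a ROLLUP over the films table from the earlier examples:

1
2
3
4
SELECT kind, date_prod, count(*) AS films,
       grouping(kind, date_prod) AS grp
FROM films
GROUP BY ROLLUP (kind, date_prod);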

The WINDOW Clause

The WINDOW clause is used to define a named window that can be referenced by window (OLAP) functions elsewhere in the SELECT list; see <window_specification> in the Synopsis for the elements (PARTITION BY, ORDER BY, and an optional window frame) that such a definition can contain.

SELECT INTO

Defines a new table from the results of a query.

Synopsis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT [ALL | DISTINCT [ON ( <expression> [, ...] )]]
    * | <expression> [AS <output_name>] [, ...]
    INTO [TEMPORARY | TEMP] [TABLE] <new_table>
    [FROM <from_item> [, ...]]
    [WHERE <condition>]
    [GROUP BY <expression> [, ...]]
    [HAVING <condition> [, ...]]
    [{UNION | INTERSECT | EXCEPT} [ALL] <select>]
    [ORDER BY <expression> [ASC | DESC | USING <operator>] [, ...]]
    [LIMIT {<count> | ALL}]
    [OFFSET <start>]
    [FOR {UPDATE | SHARE} [OF <table_name> [, ...]] [NOWAIT]
    [...]]

Description

SELECT INTO creates a new table and fills it with data computed by a query. The data is not returned to the client, as it is with a normal SELECT. The new table’s columns have the names and data types associated with the output columns of the SELECT. Data is always distributed randomly.

Parameters

The majority of parameters for SELECT INTO are the same as SELECT.

1
2
TEMPORARY,
TEMP

If specified, the table is created as a temporary table.

1
<new_table>

The name (optionally schema-qualified) of the table to be created.

Examples

Create a new table films_recent consisting of only recent entries from the table films:

1
2
SELECT * INTO films_recent FROM films WHERE date_prod >=
'2006-01-01';

Compatibility

The SQL standard uses SELECT INTO to represent selecting values into scalar variables of a host program, rather than creating a new table. The HAWQ usage of SELECT INTO to represent table creation is historical. It is best to use CREATE TABLE AS for this purpose in new applications.
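
For example, the SELECT INTO statement above could instead be written as:

1
2
CREATE TABLE films_recent AS
SELECT * FROM films WHERE date_prod >= '2006-01-01';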

SET

Changes the value of a HAWQ configuration parameter.

Synopsis

1
2
SET [SESSION | LOCAL] <configuration_parameter> {TO | =} {<value> | '<value>' | DEFAULT}
SET [SESSION | LOCAL] TIME ZONE {<timezone> | LOCAL | DEFAULT}

Description

The SET command changes server configuration parameters. Any configuration parameter classified as a session parameter can be changed on-the-fly with SET. See About Server Configuration Parameters. SET only affects the value used by the current session.

If SET or SET SESSION is issued within a transaction that is later aborted, the effects of the SET command disappear when the transaction is rolled back. Once the surrounding transaction is committed, the effects will persist until the end of the session, unless overridden by another SET.

The effects of SET LOCAL only last till the end of the current transaction, whether committed or not. A special case is SET followed by SET LOCAL within a single transaction: the SET LOCAL value will be seen until the end of the transaction, but afterwards (if the transaction is committed) the SET value will take effect.
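
A minimal sketch of that interaction, using the datestyle parameter from the examples below:

1
2
3
4
5
6
BEGIN;
SET datestyle TO postgres, dmy;   -- session-level setting
SET LOCAL datestyle TO iso, ymd;  -- transaction-local setting
SHOW datestyle;                   -- reports the LOCAL value: ISO, YMD
COMMIT;
SHOW datestyle;                   -- the SET value remains for the session: Postgres, DMY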

Parameters

1
SESSION

Specifies that the command takes effect for the current session. This is the default.

1
LOCAL

Specifies that the command takes effect for only the current transaction. After COMMIT or ROLLBACK, the session-level setting takes effect again. Note that SET LOCAL will appear to have no effect if it is executed outside of a transaction.

1
<configuration_parameter>

The name of a HAWQ configuration parameter. Only parameters classified as session can be changed with SET. See About Server Configuration Parameters.

1
<value>

New value of parameter. Values can be specified as string constants, identifiers, numbers, or comma-separated lists of these. DEFAULT can be used to specify resetting the parameter to its default value. If specifying memory sizing or time units, enclose the value in single quotes.

1
TIME ZONE

SET TIME ZONE <value> is an alias for SET timezone TO <value>.

1
2
LOCAL,
DEFAULT

Set the time zone to your local time zone (the one that the server’s operating system defaults to).

1
<timezone>

A time zone specification. Examples of syntactically valid values:

1
2
3
4
'PST8PDT'
'Europe/Rome'
-7 (time zone 7 hours west from UTC)
INTERVAL '-08:00' HOUR TO MINUTE (time zone 8 hours west from UTC)

Examples

Set the schema search path:

1
SET search_path TO my_schema, public;

Set the style of date to traditional POSTGRES with “day before month” input convention:

1
SET datestyle TO postgres, dmy;

Set the time zone for San Mateo, California (Pacific Time):

1
SET TIME ZONE 'PST8PDT';

Set the time zone for Italy:

1
SET TIME ZONE 'Europe/Rome';

Compatibility

SET TIME ZONE extends the syntax defined in the SQL standard. The standard allows only numeric time zone offsets while HAWQ allows more flexible time-zone specifications. All other SET features are HAWQ extensions.

SET ROLE

Sets the current role identifier of the current session.

Synopsis

1
2
3
SET [SESSION | LOCAL] ROLE <rolename>
SET [SESSION | LOCAL] ROLE NONE
RESET ROLE

Description

This command sets the current role identifier of the current SQL-session context to be <rolename>. The role name may be written as either an identifier or a string literal. After SET ROLE, permissions checking for SQL commands is carried out as though the named role were the one that had logged in originally.

The specified <rolename> must be a role that the current session user is a member of. If the session user is a superuser, any role can be selected.

The NONE and RESET forms reset the current role identifier to be the current session role identifier. These forms may be executed by any user.

Parameters

1
SESSION

Specifies that the command takes effect for the current session. This is the default.

1
LOCAL

Specifies that the command takes effect for only the current transaction. After COMMIT or ROLLBACK, the session-level setting takes effect again. Note that SET LOCAL will appear to have no effect if it is executed outside of a transaction.

1
<rolename>

The name of a role to use for permissions checking in this session.

1
2
NONE
RESET

Reset the current role identifier to be the current session role identifier (that of the role used to log in).

Notes

Using this command, it is possible to either add privileges or restrict privileges. If the session user role has the INHERITS attribute, then it automatically has all the privileges of every role that it could SET ROLE to; in this case SET ROLE effectively drops all the privileges assigned directly to the session user and to the other roles it is a member of, leaving only the privileges available to the named role. On the other hand, if the session user role has the NOINHERITS attribute, SET ROLE drops the privileges assigned directly to the session user and instead acquires the privileges available to the named role.

In particular, when a superuser chooses to SET ROLE to a non-superuser role, she loses her superuser privileges.

SET ROLE has effects comparable to SET SESSION AUTHORIZATION, but the privilege checks involved are quite different. Also, SET SESSION AUTHORIZATION determines which roles are allowable for later SET ROLE commands, whereas changing roles with SET ROLE does not change the set of roles allowed to a later SET ROLE.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT SESSION_USER, CURRENT_USER;

 session_user | current_user
 --------------+--------------
 peter        | peter

 SET ROLE 'paul';
 SELECT SESSION_USER, CURRENT_USER;
  session_user | current_user
 --------------+--------------
  peter        | paul

Compatibility

HAWQ allows identifier syntax (<rolename>), while the SQL standard requires the role name to be written as a string literal. SQL does not allow this command during a transaction; HAWQ does not make this restriction. The SESSION and LOCAL modifiers are a HAWQ extension, as is the RESET syntax.

SET SESSION AUTHORIZATION

Sets the session role identifier and the current role identifier of the current session.

Synopsis

1
2
3
SET [SESSION | LOCAL] SESSION AUTHORIZATION <rolename>
SET [SESSION | LOCAL] SESSION AUTHORIZATION DEFAULT
RESET SESSION AUTHORIZATION

Description

This command sets the session role identifier and the current role identifier of the current SQL-session context to <rolename>. The role name may be written as either an identifier or a string literal. Using this command, it is possible, for example, to temporarily become an unprivileged user and later switch back to being a superuser.

The session role identifier is initially set to be the (possibly authenticated) role name provided by the client. The current role identifier is normally equal to the session user identifier, but may change temporarily in the context of setuid functions and similar mechanisms; it can also be changed by SET ROLE. The current user identifier is relevant for permission checking.

The session user identifier may be changed only if the initial session user (the authenticated user) had the superuser privilege. Otherwise, the command is accepted only if it specifies the authenticated user name.

The DEFAULT and RESET forms reset the session and current user identifiers to be the originally authenticated user name. These forms may be executed by any user.

Parameters

1
SESSION

Specifies that the command takes effect for the current session. This is the default.

1
LOCAL

Specifies that the command takes effect for only the current transaction. After COMMIT or ROLLBACK, the session-level setting takes effect again. Note that SET LOCAL will appear to have no effect if it is executed outside of a transaction.

1
<rolename>

The name of the role to assume.

1
2
DEFAULT
RESET

Reset the session and current role identifiers to be that of the role used to log in.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
SELECT SESSION_USER, CURRENT_USER;
 session_user | current_user
 --------------+--------------
  peter        | peter

SET SESSION AUTHORIZATION 'paul';
SELECT SESSION_USER, CURRENT_USER;
 session_user | current_user
--------------+--------------
 paul         | paul

Compatibility

The SQL standard allows some other expressions to appear in place of the literal <rolename>, but these options are not important in practice. HAWQ allows identifier syntax (<rolename>), while SQL does not. SQL does not allow this command during a transaction; HAWQ does not make this restriction. The SESSION and LOCAL modifiers are a HAWQ extension, as is the RESET syntax.

SHOW

Shows the value of a system configuration parameter.

Synopsis

1
2
SHOW <configuration_parameter>
SHOW ALL

Description

SHOW displays the current settings of HAWQ system configuration parameters. These parameters can be set using the SET statement, or by editing the hawq-site.xml configuration file of the HAWQ master. Note that some parameters viewable by SHOW are read-only — their values can be viewed but not set.

Parameters

1
<configuration_parameter>

The name of a system configuration parameter.

1
ALL

Shows the current value of all configuration parameters.

Examples

Show the current setting of the parameter search_path:

1
SHOW search_path;

Show the current setting of all parameters:

1
SHOW ALL;

Compatibility

SHOW is a HAWQ extension.

TRUNCATE

Empties a table of all rows.

Synopsis

1
TRUNCATE [TABLE] <name> [, ...] [CASCADE | RESTRICT]

Description

TRUNCATE quickly removes all rows from a table or set of tables. This is most useful on large tables.

Parameters

1
<name>

Required. The name (optionally schema-qualified) of a table to be truncated.

1
CASCADE

Since this key word applies to foreign key references (which are not supported in HAWQ) it has no effect.

1
RESTRICT

Since this key word applies to foreign key references (which are not supported in HAWQ) it has no effect.

Notes

Only the owner of a table may TRUNCATE it. TRUNCATE will not perform the following:

  • Run any user-defined ON DELETE triggers that might exist for the tables. Note: HAWQ does not support user-defined triggers.
  • Truncate any tables that inherit from the named table. Only the named table is truncated, not its child tables.

Examples

Empty the table films:

1
TRUNCATE films;

Compatibility

There is no TRUNCATE command in the SQL standard.

VACUUM

Garbage-collects and optionally analyzes a database.

Note: HAWQ VACUUM support is provided only for system catalog tables. VACUUMing a HAWQ user table has no effect.

Synopsis

1
2
3
VACUUM [FULL] [FREEZE] [VERBOSE] <table>
VACUUM [FULL] [FREEZE] [VERBOSE] ANALYZE
              [<table> [(<column> [, ...] )]]

Description

VACUUM reclaims storage occupied by deleted tuples. In normal HAWQ operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present on disk until a VACUUM is done. Therefore it is necessary to do VACUUM periodically, especially on frequently-updated catalog tables. (VACUUM has no effect on a normal HAWQ user table, since DELETE and UPDATE operations are not supported on such tables.)

With no parameter, VACUUM processes every table in the current database. With a parameter, VACUUM processes only that table. VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected table. This is a handy combination form for routine maintenance scripts. See ANALYZE for more details about its processing.

Plain VACUUM (without FULL) simply reclaims space and makes it available for re-use. This form of the command can operate in parallel with normal reading and writing of the table, as an exclusive lock is not obtained. VACUUM FULL does more extensive processing, including moving of tuples across blocks to try to compact the table to the minimum number of disk blocks. This form is much slower and requires an exclusive lock on each table while it is being processed.

Note: VACUUM FULL is not recommended in HAWQ.

Outputs

When VERBOSE is specified, VACUUM emits progress messages to indicate which table is currently being processed. Various statistics about the tables are printed as well.

Parameters

1
FULL

Selects a full vacuum, which may reclaim more space but takes much longer and exclusively locks the table. Note: A VACUUM FULL is not recommended in HAWQ. See Notes.

1
FREEZE

Specifying FREEZE is equivalent to performing VACUUM with the vacuum_freeze_min_age server configuration parameter set to zero. The FREEZE option is deprecated and will be removed in a future release.

1
VERBOSE

Prints a detailed vacuum activity report for each table.

1
ANALYZE

Updates statistics used by the planner to determine the most efficient way to execute a query.

1
<table>

The name (optionally schema-qualified) of a specific table to vacuum. Defaults to all tables in the current database.

1
<column>

The name of a specific column to analyze. Defaults to all columns.

Notes

VACUUM cannot be executed inside a transaction block.

A recommended practice is to vacuum active production databases frequently (at least nightly), in order to remove expired rows. After adding or deleting a large number of rows, it may be a good idea to issue a VACUUM ANALYZE command for the affected table. This will update the system catalogs with the results of all recent changes, and allow the HAWQ query planner to make better choices in planning queries.

VACUUM causes a substantial increase in I/O traffic, which can cause poor performance for other active sessions. Therefore, it is advisable to vacuum the database at low-usage times. The auto-vacuum daemon feature, which automates the execution of VACUUM and ANALYZE commands, is currently disabled in HAWQ.

Expired rows are held in what is called the free space map. The free space map must be sized large enough to cover the dead rows of all tables in your database. If not sized large enough, space occupied by dead rows that overflow the free space map cannot be reclaimed by a regular VACUUM command.

VACUUM FULL will reclaim all expired row space, but is a very expensive operation and may take an unacceptably long time to finish on large, distributed HAWQ tables. If you do get into a situation where the free space map has overflowed, it may be more timely to recreate the table with a CREATE TABLE AS statement and drop the old table.

VACUUM FULL is not recommended in HAWQ. It is best to size the free space map appropriately. The free space map is configured with the following server configuration parameters:

  • max_fsm_pages
  • max_fsm_relations

Examples

Vacuum all tables in the current database:

1
VACUUM;

Vacuum a specific table only:

1
VACUUM mytable;

Vacuum all tables in the current database and collect statistics for the query planner:

1
VACUUM ANALYZE;

Compatibility

There is no VACUUM statement in the SQL standard.