Solr for Latin Languages

When configuring Solr in non-english languages, in this case in Portuguese, one usually wants:

In the case one may wish to stem words, it’s also easy. Though, one should be careful in using stem, since it may produce too much false positives.
To split words by white spaces, use StandardTokenizerFactory or WhitespaceTokenizerFactory.
For stop words removal, use StopFilterFactory.
In the case of stem, use SnowballPorterFilterFactory.
Here’s an snippet of a Portuguese schema.xml:
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Portuguese" />
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Portuguese" />
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

DB2 copy schema with tables

Sometimes one needs to automate a DB2 schema copy, clone or simple backup.
If the requirements are just to create a new schema and copy the tables and their content into the new schema, it seems to be a simple task.
But this simple task soon reveals to be a hard one, which can be achieved with a quite simple script.

The first step is to created a specific schema for this automation task, e. g. COPY_DATABASE_SCHEMA, so all the procedures will be created there.

The procedure that copies the table content is easy to code. Just declare a cursor that selects the source table data and load it into the target table.
Here is the procedure that performs the copy of the table content:

/**
* Copies, with replacement, the content from the source table
* into the target table.
* It will not check for success, i.e. no loading validation
* will be performed.
*
* @sourceTable: the full qualified source table name
* @targetTable: the full qualified target table name
*/

CREATE PROCEDURE COPY_DATABASE_SCHEMA.COPY_TABLE(
sourceTable VARCHAR(128),
targetTable VARCHAR(128))
LANGUAGE SQL
BEGIN
DECLARE v_version_number INTEGER DEFAULT 1;
DECLARE v_cursor_statement VARCHAR(32672);
DECLARE v_load_command VARCHAR(32672);
DECLARE v_sqlcode INTEGER DEFAULT -1;
DECLARE v_sqlmessage VARCHAR(2048) DEFAULT '';
DECLARE v_rows_read BIGINT DEFAULT -1 ;
DECLARE v_rows_skipped BIGINT DEFAULT -1;
DECLARE v_rows_loaded BIGINT DEFAULT -1;
DECLARE v_rows_rejected BIGINT DEFAULT -1;
DECLARE v_rows_deleted BIGINT DEFAULT -1;
DECLARE v_rows_committed BIGINT DEFAULT -1;
DECLARE v_rows_part_read BIGINT DEFAULT -1;
DECLARE v_rows_part_rejected BIGINT DEFAULT -1;
DECLARE v_rows_part_partitioned BIGINT DEFAULT -1;
DECLARE v_mpp_load_summary VARCHAR(32672) DEFAULT NULL;

SET v_cursor_statement =
   'DECLARE C1 CURSOR FOR SELECT * from ' || sourceTable;
SET v_load_command =
   'load from C1 of cursor insert into ' || targetTable;

CALL db2load(1, v_cursor_statement, v_load_command, v_sqlcode,
         v_sqlmessage, v_rows_read, v_rows_skipped,
         v_rows_loaded, v_rows_rejected, v_rows_deleted,
         v_rows_committed, v_rows_part_read,
         v_rows_part_rejected, v_rows_part_partitioned,
         v_mpp_load_summary) ;
END#

Please note that the target table must already be created in order for this procedure to work, and the script terminator is # and not the usual ;.

This procedure is the hardest part of the script. Copying the contents of any table requires a dynamic prepared query statement to use as a cursor, and the load command is not available inside a procedure, so the db2load procedure from SYSPROC must be used.

The next procedure creates the schema, the tables, and copies the tables contents:

/**
* Copies an entire schema into a new schema named with the current date.
* @sourceSchema: the source schema name
* @targetSchema: the target schema name
* @tableNameSelection: the tables name to include ('%' for all tables)
*/

CREATE PROCEDURE COPY_DATABASE_SCHEMA.COPY_DATABASE(
sourceSchema VARCHAR(50),
targetSchema VARCHAR(50),
tableNameSelection VARCHAR(150)
)
LANGUAGE SQL
BEGIN
-- Variables
DECLARE stmtSchema VARCHAR(250);
DECLARE stmtTableStructure VARCHAR(200);
DECLARE stmtTableContents VARCHAR(250);
DECLARE stmtAlias VARCHAR(250);
DECLARE tableName VARCHAR(128);
DECLARE numPages BIGINT;
DECLARE nError INTEGER DEFAULT 0;
DECLARE at_end SMALLINT DEFAULT 0;
DECLARE not_found CONDITION FOR SQLSTATE '02000';
DECLARE V_SQL VARCHAR(200);
DECLARE V_STMT STATEMENT;
DECLARE V_LOAD_STMT STATEMENT;
DECLARE TGT_TABLE_CUR CURSOR WITH HOLD WITH RETURN FOR V_STMT;
DECLARE LOAD_CUR CURSOR FOR V_LOAD_STMT;
DECLARE CONTINUE HANDLER FOR not_found SET at_end = 1;


-- Create schema
SET stmtSchema = CHAR('CREATE SCHEMA ' concat CHAR(targetSchema));
EXECUTE IMMEDIATE stmtSchema;
SET CURRENT SCHEMA targetSchema;


-- Copy tables and views
SET V_SQL = 'SELECT name, npages FROM SYSIBM.SYSTABLES
       WHERE CREATOR = '
'' || TRIM(sourceSchema) || '''
       AND NAME LIKE '
'' || TRIM(tableNameSelection) || '''
       order by name'
;
PREPARE V_STMT FROM V_SQL;
OPEN TGT_TABLE_CUR;

fetch_loop:
LOOP
FETCH TGT_TABLE_CUR INTO tableName, numPages;
IF at_end &lt;&gt; 0 THEN
  LEAVE fetch_loop;
ELSE  
  SET stmtTableStructure = 'CREATE TABLE ' ||
      targetSchema || '.' || tableName || ' LIKE ' ||
      sourceSchema || '.' || tableName ;
  EXECUTE IMMEDIATE stmtTableStructure;

  IF numPages &gt; 0 THEN
     CALL COPY_DATABASE_SCHEMA.copy_table(
             sourceSchema || '.' || tableName,
             targetSchema || '.' || tableName);
  END IF;
END IF;
END LOOP fetch_loop;

CLOSE TGT_TABLE_CUR;
END#

It dynamically issues DB2 SQL commands to create the schema and the tables structure and makes use of the previously created COPY_TABLE to copy the tables content. See the code comments for more information.

To use this just

CALL COPY_DATABASE_SCHEMA.COPY_DATABASE('sourceSchemaName','targetSchemaName','%');

These procedures have some limitations, for instance views are not supported.
Nevertheless, it is a good start to all that need to clone a schema or a table, since all the missing items can be gathered from the database catalog.

If you to drop an old schema, for instance to recreate it with a a new updated version, you can drop drop the schema with this procedure.

DB2 drop schema and tables

Dropping a non-empty schema in DB2 is not as easy as it seems. The schema must be empty in order to be dropped, i. e. it cannot have any objects in it.

The following script procedure will drop a schema after all tables have been dropped:

/**
*  Drops an entire schema even if it has tables.
*  Views are not supported.
*  @schemaName: the name of the schema to drop
*/

CREATE PROCEDURE COPY_DATABASE_SCHEMA.DROP_SCHEMA(
schemaName VARCHAR(50)
)
LANGUAGE SQL
BEGIN
-- Variables
DECLARE stmtDropSchema VARCHAR(250);
DECLARE stmtDropTable VARCHAR(250);
DECLARE tableName VARCHAR(128);
DECLARE at_end SMALLINT DEFAULT 0;
DECLARE not_found CONDITION FOR SQLSTATE '02000';
DECLARE V_SQL VARCHAR(200);
DECLARE V_STMT STATEMENT;
DECLARE TGT_TABLE_CUR CURSOR WITH RETURN FOR V_STMT;
DECLARE CONTINUE HANDLER FOR not_found SET at_end = 1;

-- Copy tables and views
SET V_SQL = 'SELECT name FROM SYSIBM.SYSTABLES
  WHERE CREATOR = '
'' || TRIM(schemaName) || '''
  AND CREATOR NOT LIKE '
'SYS%''';
PREPARE V_STMT FROM V_SQL;
OPEN TGT_TABLE_CUR;

fetch_loop:
LOOP
   FETCH TGT_TABLE_CUR INTO tableName;
   IF at_end &lt;&gt; 0 THEN LEAVE fetch_loop;
   ELSE
      SET stmtDropTable = 'DROP TABLE ' ||
                  schemaName || '.' || tableName;
      EXECUTE IMMEDIATE stmtDropTable;
   END IF;
END LOOP fetch_loop;

CLOSE TGT_TABLE_CUR;

-- Drop schema
SET stmtDropSchema = CHAR('DROP SCHEMA ' concat
            CHAR(schemaName) concat ' RESTRICT');
EXECUTE IMMEDIATE stmtDropSchema;
END#

Note that the procedure ends with a # instead of the regular ; terminator.
Also note that the procedure is created under the COPY_DATABASE_SCHEMA schema, which most likely may not exist in your database.

To use this just

CALL COPY_DATABASE_SCHEMA.DROP_SCHEMA('deleteSchemaName');