GoSales but bigger

If you are looking for some sample data to play around with check out my post here.

The following describes how I generally create sample data, in this case using the GO SALES database from IBM and getting the Order tables up to approximately 133 million records.

Here we are going to multiply the number of orders.  Two tables are involved, ORDER_HEADER and ORDER_DETAIL and we want to maintain the relationship or ORDER_NUMBER between the two tables.

select count(*) as cnt From GOSALES.order_header;


select count(*) as cnt From GOSALES.order_details;


Create our target tables

create table gosales.test_order_details as (select * from GOSALES.order_details) with no data;
create table gosales.test_order_header  as (select * from GOSALES.order_header) with no data;


We will use the data that is already in these two tables, but ‘duplicate’ it with new order numbers, thus maintaining the relationship of the two tables.  Additionally we will change the Order Dates.


First find the maximum order number

select max(order_number) as last_order_number from gosales.order_header;


For Order Numbers we will add  1000000 to each order number so they remain unique and thus conform to the tables Primary Key.

For Order Dates we will use todays date and subtract a random amount of time, and for order close data we will do the same, except add a random amount of time.

Our SQL looks like this.

insert into gosales.test_order_header 
  ( select order_number + 001000000
         , retailer_name
         , retailer_name_mb 
         , retailer_site_code 
         , retailer_contact_code 
         , sales_staff_code 
         , sales_branch_code
         , current_timestamp -  ( RAND () *604800+ 1 ) seconds as order_date
         , current_timestamp +  ( RAND () *604800 + 1 ) seconds as order_close_date 
         , order_method_code 
     from gosales.order_header 

We now have 53267 records in our test table.
A sample showing the new order_number and dates.

select order_number, retailer_name, order_date, order_close_date from test_order_header fetch first 1 rows only with UR;

------------ --------------- ------------------- ------------------- 
1100001      Kavanagh Sports 2015-09-30 14:19:08 2015-10-04 13:12:43 

Repeat for as many as you like while incrementing the Order_number augmentation

insert into gosales.test_order_header ( select order_number + 002000000, retailer_name , 
insert into gosales.test_order_header ( select order_number + 003000000, retailer_name , 
insert into gosales.test_order_header ( select order_number + 300000000, retailer_name , 

Here I repeated the above 300 times going up to order_number + 300000000 and have 15,980,100 Orders records in my test order header table.

Now is a good time to run RUNSTATS and create an index on your new test order header table.

C:\Program Files\IBM\SQLLIB\BIN>db2 connect to gsdb user db2admin
Enter current password for db2admin:

   Database Connection Information

 Database server        = DB2/NT64 10.5.1
 SQL authorization ID   = DB2ADMIN
 Local database alias   = GSDB

C:\Program Files\IBM\SQLLIB\BIN>db2 runstats on table gosales.test_order_header
DB20000I  The RUNSTATS command completed successfully.

C:\Program Files\IBM\SQLLIB\BIN>db2 create index i01 on  gosales.test_order_header (order_number)
DB20000I  The SQL command completed successfully.

To get our Order Detail records join the existing Order detail table to our new test order header table. Here adding a random part-of-day to make up a ship date

insert into gosales.test_order_details 
(  select a.order_detail_code
        , b.order_number
        , b.order_close_date + ( RAND () *86400 + 1 ) seconds
        , a.product_number
        , a.promotion_code
        , a.quantity
        , a.unit_cost
        , a.unit_price
        , a.unit_sale_price
   from gosales.order_details a
     inner join test_order_header b
             on  b.order_number = a.order_number + 001000000

Repeat this to match the ORDER_HEADER table and you end up with some reasonably sized data to play with.

Here I’ve repeated it 300 times to match the 300x I used to generate data for the order header giving me the following

select count(*) as cnt from test_order_header;


select count(*) as cnt from gosales.test_order_details


Run runstats and add an index to this one as well

db2 runstats on table gosales.test_order_details 

db2 create index i02 on  gosales.test_order_details (order_number)

For some added volume you may want to look at creating a denormalized table combining the two tables plus other tables from the GoSales Database.

create view gosales.order_denorm_v as 
  select h.order_number
        ,b.city          as sales_city
        ,b.prov_state    as sales_prov_state
        ,b.postal_zone   as sales_postal_zone
        ,b.country_code  as sales_country_code
        ,c.country_en    as sales_country
   from gosales.test_order_header h
        inner join gosales.test_order_details d
                on h.order_number= d.order_number
        inner join gosales.order_method o
                on o.order_method_code = h.order_method_code
        inner join gosales.product p 
                on p.product_number = d.product_number
        inner join gosales.branch b
                on b.branch_code = h.sales_branch_code     
        inner join gosales.country c
                on b.country_code = c.country_code

create table gosales.order_denorm as ( select * from gosales.order_denorm_v ) with no data
insert into gosales.order_denorm (select * from gosales.order_denorm_v)

Not quite ready for ORM

Example using SQLAlchemy without ORM

references : http://docs.sqlalchemy.org/en/rel_0_9/core/connections.html

import sqlalchemy
from sqlalchemy import *

engine     = sqlalchemy.create_engine('ibm_db_sa://user:password@localhost:50000/gsdb').execution_options(autocommit=True)
connection = engine.connect()

results    = connection.execute(sql).fetchall()

for row in results:

  print "--------------------"
  print "creator : ", row['creator']
  print "name    : ", row['name']
  print "type    : ", row['TYPE']   
  print "ctime   : ", row['ctime']
  print "remarks : ", row['remarks']

A minor step towards SQLAlchemy proper

import sqlalchemy
from sqlalchemy import * 
from sqlalchemy.orm import * 

engine     = create_engine('ibm_db_sa://user:password@localhost:50000/gsdb')
meta       = MetaData(bind=engine)

Session    = sessionmaker(bind=engine)
session    = Session()

systables  = Table('SYSTABLES', meta ,autoload=True, autoload_with=engine, schema='SYSIBM')

query      = session.query(systables.c.creator,systables.c.name,systables.c.TYPE,systables.c.ctime,systables.c.remarks).filter(and_(systables.c.creator == 'GOSALES'))
results    = query.all()

#for c in query.column_descriptions:
#  print c['name']

for row in results:
  print "--------------------"
  for i in xrange( 0, len(row) ):
    print query.column_descriptions[i]['name'] + ' : ' + str(row[i])

Adding another Drive to your Windows EC2 server

Adding an additional drive to your Windows EC2 server can be done while the server is up and running.

Go to EC2 console and copy the instance ID of the server you want to add a drive to.



On the right hand side of the EC2 Services page else Volumes

And click on Create Volume



Select the Type & Size and click create



Right click the Volume after it has finished Creating and select Attach Volume



Enter Instance ID of the Server to attach it to and the Device will be automatically populated.


Click Attach.


In Windows go to Disk Manager and you will see your new disk as ‘Offline’


Right Click your new Disk and select Online


Right Click your new Disk again and select Initialize Disk and accept the defaults when prompted



This time right click the ‘space’ for the disk and select ‘New Simple Volume’


Follow the wizard and accept all the defaults.

You now have another Disk Drive on your server



Changing the size of an existing Drive on Windows EC2 Servers



Boto3 & S3

List all files in your buckets.

Note Boto3 uses your AWSCLI configuration file & credentials file located in ~/.aws/ For windows this will be C:\Users\[your user]\.aws\

import boto3

s3 = boto3.resource('s3')
buckets = s3.buckets.all()

for bucket in buckets:
  print 'bucket : s3://' + bucket.name + '/'

  for key in bucket.objects.all():
    if key.key.endswith('/'):
      print 'folder : s3://' + bucket.name + '/' + key.key
      print 'file : s3://' + bucket.name + '/' + key.key
C:\temp>c:\python27\python sthree.py
bucket    : s3://grainydata/
folder    : s3://grainydata/temp/
file      : s3://grainydata/temp/FactProductInventory.csv
folder    : s3://grainydata/temp/subdir/


Installing Boto on Windows for python

python -m pip install boto3

Netezza Emulator

Over at the IBM PureData-Netezza Developer Network (NDN) community you will find in the top right hand corner ‘Important Bookmarks’

Clicking on the IBM Netezza Software Emulator will take you to the IBM Netezza Developer Network logon.

You do need an ‘IBM ID’  and registering does mean handing over your name, address and phone number but once you have done that you get to download the following goodness



As well as items such as clients for Linux & Windows, Fluid Query


There is also IBM Fluid Query 1.5 if you are in to that.

The emulator is approximately 2GB and http downloads were problematic but after messing around to get IBM Download Directory working it downloaded fine.

In this case I downloaded IBMNetezzaEmulator7.1.exe

Double click to Install

If you get the following message we have to install from the command line


Use the following command

IBMNetezzaEmulator7.1.exe -I GUI

Default insallation directory will be C:\Users\[your user name]\IBM\IBM Netezza Software Emulator for Developers

Once installed rune the Console executable

“C:\Users\[your user name]\IBM\IBM Netezza Software Emulator for Developers\console.exe”

And click the Start button


Once fully up the Console will look something like the following and you will have a Host IP Address to connect to.


If you already have Putty installed you can click the IP Address hyperlink


Some basics to make sure we are up and running

[nz@netezza ~]$ env | grep NZ_
[nz@netezza ~]$
[nz@netezza ~]$ nzstate
System state is 'Online'.
[nz@netezza ~]$
[nz@netezza ~]$ nzsql -db SYSTEM -u admin -c "SELECT username, createdate, current_timestamp FROM _V_USER"


ADMIN    | 2014-02-28 09:36:22 | 2015-09-26 03:51:56

(1 row)

If you prefer GUI tools, go back and download the Netezza Client.

In this case I have downloaded “C:\Users\{your user name}\Downloads\nz-winclient-v7.2.0.0.zip”

Unzip and install the following

Nzadmin – (admin\nzsetup.exe)great for viewing whats running and overall status of the server

ODBC drivers   driversnzodbcsetup.exe  (If you are keen on a specific SQL IDE then JDBC drivers might be what you might want to install instead)

Admin, database user and data loading guide re found in \doc




Use the same IP Adress as shown in the Console and the default admin user and password



You may receive something similar to the following if you haven’t downloaded exactly the same verion of Client and Emulator.  Mostly you can ignore this.


Connecting successfully you can navigate to see that you have 4 SPU’s


You can also see Active Queries


Seeing as no usage has occurred there will be no databases or Query History.



Aginity Workbench for Netezza is so far the best suited IDE I have found although others often are fine.  The specific areas where Aginity seems to be ahead of the others is include showing Zone Maps, Distribution, context menu items specific to Netezza as well as being able to handle not having schemas.

You can get Aginity Workbench for Netezza from here http://www.aginity.com/workbench/netezza/  after a quick registration.

After you have installed Aginity, similarly use the IP Address shown in the Console to connect


Note, if Netezza ODBC is not populated in the drop down box as seen above, go back and install the ODBC drivers


Get your own server on the Cloud

Assuming you create and destroy lots of VM’s and don’t treat your VM images like pets!

If you are like me then you like to try new software and eventually end up clogging up your computer with junk.

After years of going through cycles of reinstalling Windows, I moved to using VM Workstation and  having a ‘clean Windows’ image that I would clone for trying out software.

After paying a few times to keep VM Workstation up to date, I thought it was time to move to AWS EC2 instances for such things.   In some cases Amazon’s AMI (Amazon Machine Image) market place often already had images with the software already installed.

Let’s walk through the process.

If you are new to Amazon Web Services, go to the getting started page and sign up. https://aws.amazon.com/getting-started/

Once you have an account go to the AWS Console  https://console.aws.amazon.com/console/

Here you see quite a large list of services.  The one we are looking for is EC2 located in the top left of the page.  EC2 stands for ‘Elastic Compute Cloud’ As the description says, it’s all about Virtual Servers in the cloud; Just what we are looking for.


Clicking on EC2 takes you to the EC2 Dashboard where it list’s your servers and related items such as volumes (disk).


Clicking on Running Instances gives you a list of your servers.  Here I have 2 that are not currently running; (in a stopped state)


Getting your own Server in the Cloud

Click Launch Instance and go through the steps provided by the AWS console.  

Step 1 Choose an AMI (Select the Server Operating System)


Step 2 Choose Instance Type (Select the server size)
For simple purposes I usually select the following

Comparisons of server specs can be found here https://aws.amazon.com/ec2/instance-types/

If you are keen on CPU power, try out this matrix that explains ECU vs CPU/Cores.   http://aws.amazon.com/ec2/instance-types/#instance-type-matrix

For windows I find that while available for selection, anything less than 8GB of memory is not pleasant to interact with.   You can resize it down later if you are not using it for a desktop experience.

Remember the bigger the server, the more it costs per hour to run.   If you leave it up and aren’t using it, it’s still costing you.  So the behaviour you should be moving to is; start the server -> use the server -> stop the server and when you no longer need it -> terminate it.
Pricing for servers are located here  https://aws.amazon.com/ec2/pricing/


Step 3 Configure Instance (Network and other options)
Use defaults


Step 4 Add storage
Take the defaults to get 30 GB SSD


Step 5 Tag Instance
Create a Tag


Step 6 Configure Security Group
Update the Security Group name to something that makes sense to you

Read the warning!

As you will only be connecting to the server select ‘My IP’ from the Source drop down.

If your IP address changes or you try and RDP from a different location, you will need to update the Security rule before you can connect to the server


Step 7 Review and Launch
If you have this warning, you missed the My IP RDP rule above;  go back and do it!


Step 8  Create a Key Pair
Enter a name that you will remember is associated with this server.  Us a name that doesn’t contain spaces i.e. use underscores instead of spaces and click “Download key Pair”.

A file with the name you entered and a “.pem” file exstenion will be downloaded to your browsers download directory;  Move the file to a safe location.  If it’s a server you intend on keeping; move it to a really really safe place and take a backup to another location.


Step 9 Click the “Launch Instances” button


Step 10 View Instances
Click on the View Instances button or go back to the EC2 Dashboard and you will see your server is starting


Step 11 RDP
Once Initialization has completed you can RDP to the server

Right click the server and select Connect


Click “Download Remote Desktop File”
A file with the name of the IP address of the server and a file extension of “.rdp” will be downloaded to your browsers download directory.  Move this file to a safe place

Click “Get Password”
Select the previously downloaded .pem file

Click “Decrypt Password”
You will be presented with the Administrator (user account) and the password for the account.

Remember the password!

Click Close and you are done with the AWS Console.

Locate the .rdp file previously downloaded and double click it
Click Connect if you receive the following warning

Select Other and enter the Administrator user name and the password previously decrypted.

If you see the following Warning Click Yes and you should be connected to your Server.

Congratulations you are connected to your new server that is in the Cloud!



To Stop (Pause) your server go back to the AWS EC2 Dashboard, right click your server and select Stop



Sample Databases

MS SQL Server  Adventure Works

To load the sample database for MS SQL Server simply create an EC2 instance using a SQL Express AMI from the AWS market place and attach the MDF file to the instance.

Note, remove the Log file when using the GUI

DB2 Go Sales

If you download and install DB2 Express-C then you will get a Sample database already created and preloaded with a small amount of data.

DB2 Express-C Installation

For DB2 download and install DB2 Express-C on to an EC2 instance.   For this example I used the same Windows server I had used above for MS SQL Server.

Installing DB2 Express-C is straight forward, just follow the prompts.

Note, the installation of DB2 you will create a ‘db2admin’ account and you will be required to set the password.

If you receive the following error message, it does not necessarily mean that your password is too short, it means that the password does not meet the minimum complexity policies for passwords.  Use a password that contains both upper and lower case characters as well as numbers and do give it a reasonable length.


Creating and Loading GoSales Database

Have your db2admin or other account details handy, extract the Downloaded file and run setupGSDB.bat


If you are using IBM Data Studio your connection might look something like this


RDP to Windows AWS EC2

A reminder of how easy it is!

  • In the EC2 Console
  • Right click the instance and select Connect
  • Download the RDP file
  • Click on the button ‘Get Password’  (Have your pem file handy)

Once you are in you probable want to start downloading some stuff

To toggle Internet Explorer Enhanced Security

How to disable Internet Explorer Enhanced Security Configuration (IE ESC) in Windows Server 2012



Connecting to Your Windows Instance Using RDP


AWS CLI for Windows

Installing on Windows

  • http://aws.amazon.com/cli/
  • On the right hand side of the page are the downloads
  • Click on 64-bit for Windows to download the installer for Windows 10
  • Run the downloaded file ( AWSCLI64.MSI ) to install
  • The default installation path is C:\Program Files\Amazon\AWSCLI\
  • Test by running the following on the command line
    "C:\Program Files\Amazon\AWSCLI\aws.exe" --version

    Expected output

    aws-cli/1.8.6 Python/2.7.9 Windows/8
  • To use it from anywhere, add the path to the executable to the path environment variable
    set path=%path%;c:\program files\amazon\awscli\
  • Test again
    C:\temp>aws --version

    Expected output

    aws-cli/1.8.6 Python/2.7.9 Windows/8