.brian & Verica:
NCL "tutorial"

Opening comments

How to deal with NCL. Normally, NCL code is written in a plain text file, and by convention we use the ".ncl" extension even though it isn't necessary. This means we have to be pretty good with a text editor. There are many options we use various flavors of emacs, but others like vi or nedit or whatever.

To run an NCL script, all that needs to be done is to type
>> ncl scriptname
at the prompt (denoted >>). In old versions we had to direct the input by typing "ncl < scriptname" but that isn't necessary any longer.

This assumes you have NCL installed correctly, including setting the environment variable NCARG_ROOT. Instructions and the binaries are available from http://www.ncl.ucar.edu/Download/index.shtml

NCL also has an interactive mode. To enter it, just type "ncl" at your prompt. A small message appears, and then a "ncl #>" which acts as your prompt within the NCL "environment."

[brian@siegfried:/]$ ncl
Copyright (C) 1995-2004 - All Rights Reserved
University Corporation for Atmospheric Research
NCAR Command Language Version 4.2.0.a032
The use of this software is governed by a License Agreement.
See http://ngwww.ucar.edu/ncl/ for more details.
ncl 0> x = 1
ncl 1> y = 2
ncl 2> print(y^x)
(0) 2
ncl 3>

The advantage of using the interactive environment is that you can quickly perform real-time operations or make sloppy plots without saving and re-running a script over and over. The downside is that it is a little cumbersome, especially for performing more elaborate calculations. To leave the NCL environment, just type "exit" at the prompt.

Getting started

I've provided several scripts to work through some basic NCL functionality. I suggest starting with tutorial_script.ncl, which covers some of the entry-level issues of NCL. It might not be the best way to start learning "programming," but if you've seen some Matlab, IDL, python, Fortran, C, C++, or other programming language before, it should be adequate.

Basic syntax is much like other languages. Functions or procedures have names, like funcname, and arguments that are used as input, such as arg1, arg2, ... So a to use a function, just give it input, and it will supply output, e.g.,
output = funcname(arg1,arg2)
If something goes wrong, NCL will spit out errors or warnings.

>> tutorial_script.ncl

Below is most of the text from the file tutorial_script.ncl, turned into pretty HTML. I highly recommend digging into the file itself when you start learning NCL.

For most practical purposes, there are a few sets of procedures and functions provided with NCL. To use them, you have to load them by saying load. This is usually done at the top of your script.

load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_code.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_csm.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/shea_util.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"

Use NCL like a Calculator

NCL distinguishes between integer, float (r4), and double (r8) values. The default is float for values with a decimal point and integer when it looks like an integer.

There are also strings in NCL, which are letters or numbers within quotation marks.

To print to the screen, use "print," or else you need to assign the output of the operation(s) to a variable (see below).

A. integer operations

print(3*12)

B. floating point operations, and built in functions

print(1.e9*cos(0.333))

Use NCL like a better calculator

Assign variable names to numbers or result of calculations.

n1 = 1.e10
pi = 4.*atan(1.)
n2 = 0.5
n3 = 180.
pio180 = pi/n3
something = n1*pio180 + n2/2.5 - n3^n2

to see the output:
print(something)

Conditional statements (boolean logic)

NCL easily handles simple "if statements," and includes "else" statements. However, unlike some languages, there is not "elseif" statement. You have to construct you own elseif by nesting standard if statements.

Also, a semicolon is the NCL comment character. Anything on a line following a ";" is not interpretted.

m = True ;logical variables can be "True" or "False"
if m then
print("m must be true")
else
print("i guess m must be false")
print(" m = " + m)
end if

side note: don't make your logical-type variables as string-type, or you won't get good results
mm = "True"
print(mm) ; see that it is a string.
if mm then
print("mm must be true")
end if
; this should return an error, since the argument isn't logical

Arrays and vectors

For most of us, arrays are the way we think of data. Arrays are just ordered lists of numbers (or other data). If there is just a single list of numbers in some order, think of it as a vector. If it is an array of numbers, it might be like a matrix. Imagine a map of the world with grid lines every degree of longitude and latitude; we could could represent that grid as a matrix of dimensions 360x180. We can extend that to higher dimensions as well, for example height and time. NCL doesn't care how many dimensions your data has, as you'll see below.

Vectors (1D arrays)

fspan stands for "float span," and makes a vector (1-D array) starting at the first argument, spanning to the second argument, using the value of the third argument as the number of equally spaced steps.

ispan is to make a vector of integers. The third argument is the stride (step size) instead of the number of elements.

(This difference in the third argument is the source of much frustration, even to experienced users!)

a_vector = fspan(1.,45.,50)
b_vector = ispan(1,10,2)
c_vector = (/10., 12., -33., 16., -2., 4.55, 100.89, 42./)
empty_vector = new( dimsizes(a_vector), float )

new is a way to allocate space for a variable. It is necessary when you need to assign values element by element (or row by row, etc) and you don't want to overwrite an existing variable. You can't just keep making an existing variable larger and larger using a loop (or you could, but it would be much more expensive and complicated).

The downside of allocating empty variables is that you have to know the size beforehand, which means you have to think!

filled_vector = 3.+empty_vector
random_vector = random_normal(2.5,5.,dimsizes(a_vector))
;there are a few random functions in NCL.

a_matrix = new( (/3, 8/), float)
a_matrix(:,0) = 1.
a_matrix(0,:) = 2.
a_matrix(2,7) = 27.

size_vector = dimsizes(a_matrix)
size will be a vector when there's more than 1 dimension in the matrix

rank_value = dimsizes(dimsizes(a_matrix))
rank of an array is a scalar, just tells how many dimensions you have

HINT: I find using "size" and "rank" to be very convenient in many data analysis scripts, especially when writing them to be flexible (used with an arbitrarily sized dataset).

b_matrix = new(dimsizes(a_matrix),typeof(a_matrix))
[More flexible code, you don't need to hard-wire a number into your script, this will work whenever you want b_matrix to be the same size of a_matrix (and same type).]

HINT: NCL is pretty good at "coecion," i.e., changing the type of variable, but it can get confused, esecially when you try to use double variables in a calculation expecting float variables, building your script using the typeof function can eliminate most of these problems.

Do Loops

do indx = 0,size_vector(0)-1
b_matrix(indx,:) = indx
end do

HINT: Have you noticed that NCL uses 0-based indexing yet? In the above, if you don't have the "-1" for the indx values, your loop will execute until it tries to use indx = size_vector(2), and will then break. Why? Because size_vector(0) = 3 and size_vector(1) = 8, but that's it. Asking for an element that isn't in an array is a sure way to kill your script, often after most of your calculation is done!

CAUTION: NCL is not efficient in loops! It is worth putting effort into
avoiding large loops whenever possible. It is especially valuable to
avoid nested loops when both loops are large. This is often called
"vectorizing" code. Almost every element-by-element operation is
easily vectorized in NCL.

Vectorize code

c_matrix = b_matrix * sqrt(a_matrix)

This replaces a costly loop:

c_matrix = new(dimsizes(b_matrix))
bdims = dimsizes(b)
do i = 0,bdims(0)-1
do j = 0,bdims(1)-1
c_matrix(i,j) = b_matrix(i,j) * sqrt(a_matrix(i,j))
end do
end do

Named coordinates & coordinate arrays

imagine a geophysical dataset with dimensions [time (31 days), height (30 pressure levels), latitude (by 2 degrees), longitude (by 2 degrees)]

DATA = new( (/31,30,90,180/),double)
printVarSummary(DATA)

now you want to name the dimensions
DATA!0 = "time"
DATA!1 = "lev"
DATA!2 = "lat"
DATA!3 = "lon"

printVarSummary(DATA)

but the dimensions might actually be coordinates
and they might have values
time_coord = ispan(1,31,1)
lev_coord = fspan(1000.,2.,30)
lat_coord = fspan(-89.,89.,90)
lon_coord = fspan(0,360,180)

coordinate can have attributes too
lev_coord@long_name = "Pressure Levels"
lev_coord@units = "hPa"
lat_coord@units = "degrees north"
lon_coord@units = "degrees east"

DATA&time = time_coord
DATA&lev = lev_coord
DATA&lat = lat_coord
DATA&lon = lon_coord

printVarSummary(DATA)

the data itself can contain meta data
DATA@long_name = "Brian's data"
DATA@units = "brians per second"

printVarSummary(DATA)

some potentially useful aspect is reordering and sampling:
DATAre = DATA({lev|1000:500},lat|:,lon|:,time|0)
printVarSummary(DATAre)

DATAre = (/random_normal(1.,1.,dimsizes(DATAre))/)
printVarSummary(DATAre)
print(avg(DATAre))

NCL has 3 distinct kinds of indexing, which can be confusing at first.
1. standard, Matlab/Fortran/etc style subscripts, so x(i,j,k) or x(i:i+1,j,k) or whatever
2. coordinate subscripts -- this uses values from the coordinate arrays,
so when you only want data from 45N-55N, x = DATA(:,:,{45:55},:)
3. named subscripts -- lets you reorder arrays, so when you want
just the data from 45N-55N and you need to have latitude be the rightmost dimension:
x = DATA(time|:,lev|:,lon|:,{lat|45:55})
This may not seem useful at first, but once you get used to it, you won't know
how you lived without it.
much more information available at http://www.ncl.ucar.edu/Document/Manuals/Ref_Manual/


MASKING

now imagine that this data came from somewhere real
and means something
but we know the instrument doesn't do well when the value
is below 1. which will bias the results
We want to mask those points that have bad measurements.
This is fairly easy in NCL.
You can do it by hand (loops), or you can use "mask"

DATA_filtered = mask(DATAre, DATAre .lt. 1., False)
copy_VarCoords(DATAre,DATA_filtered)
print(avg(DATA_filtered))

now all the points that are below 1 are set to the value
of DATAre@_FillValue (-9999) and are treated as missing values

Just for kicks, lets imagine this is geophysical data and plot it. Don't worry about the details here, we've provided a separate plotting tutorial, so for now you could just cut and paste this. It will make a window pop up. Click in the window (or push a button while it is the current window) to make the plot go away.

wks = gsn_open_wks("x11","random map")
gsn_define_colormap(wks,"testcmap")
mres = True
mres@gsnSpreadColors = True
mres@cnFillOn = True
;mres@cnFillMode = "RasterFill" ; this makes it raster-ized
mres@cnLinesOn = False
mres@cnRasterModeOn = True
mapplot = gsn_csm_contour_map(wks, DATA_filtered({1000.},:,:), mres)

compare with the original, unfiltered data:
mapplot = gsn_csm_contour_map(wks, DATAre({1000.},:,:), mres)

Doing more

I've also prepared some additional resources. The first one is a simple example of using a custom built function, which can often clean up your code and save you from writing the same code block over and over. It is called circle_example.ncl. It uses another file called circle_functions.ncl.

Second, I adapted Gretchen's radiosonde example [link] to show that NCL can do basically the same thing. This also serves as an example of reading ascii data as opposed to netCDF. The file is called sonde.ncl.