Say what? Analysing speech at SAS

I rarely get to talk about my work at SAS since mostly it’s experimental research and development, and therefore kept fairly hush hush. Recently, however, I had the opportunity to write a guest article for data-informed.com about a project I’ve been working on for the past few months.

Having partnered up with a Scottish speech-to-text company, we built a system which could take reams of audio, transcribe them, and then perform some pretty sophisticated text analysis to work out the people and locations mentioned in the audio (and what their connections were), as well as identifying any number of discussion topics.

The article itself is very high level, so you don’t need to be technical to understand it. It’s an introduction to the concept, and some of the applications for the method.

Follow the link below to read the article.

Speech to Analytics
http://data-informed.com/text-analytics-could-unlock-strategic-value-hidden-speech/

How to invoke a SAS macro stored in a catalog

Having not done the Advanced Base SAS certification, this was a nightmare to work out. I’m documenting it here for my own future use, and to help anyone else who finds themselves in the same situation.

What situation was that?

SAS Social Network Analysis can create networks from input data, and to do so it makes use of a pre-compiled “link macro” which is bundled with SNA. This link macro needs to be invoked from a base SAS program, but to do that, you need to tell SAS where to find it.

Note – There were literally zero Google hits for the exact name of this link macro. In case you’re curious, it’s called %sfs_net_main_link_macros.

Anyway, I eventually found these macros in a catalog file.

Note – Not easy to find, and not documented. If anyone is in the same situation as me, it was in my <SASHome>\SASFoundation\9.3\snamva\macros folder. The macros are compiled into the sasmacr.sas7bcat file.

So I have a catalog file, how do I invoke the macro held in it?

Once you know, it’s very very simple.

  1. Copy the catalog file into your working directory
  2. Add a libname statement pointing to this working directory
  3. Set the MSTORED and SASMSTORE= options

In other words your code should have the following statements:

libname mylib "D:\mylocation";
OPTIONS MSTORED SASMSTORE=mylib;

Any later invocation of the macro will now succeed, since SAS knows to look for compiled macros in the sasmacr catalog in your libname directory.
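If the macro still isn’t found, it can help to check what is actually stored in the catalog. The snippet below is a minimal sketch, assuming the catalog file has been copied to D:\mylocation as in the steps above. PROC CATALOG lists the entries, so you can confirm the macro name before invoking it.

```sas
/* Assumption: D:\mylocation holds a copy of sasmacr.sas7bcat */
libname mylib "D:\mylocation";

/* List the entries (compiled macros) held in the catalog */
proc catalog catalog=mylib.sasmacr;
   contents;
run;
quit;

/* Tell SAS to search mylib.sasmacr for stored compiled macros */
options mstored sasmstore=mylib;

/* The link macro can now be invoked by name. Its parameters are  */
/* not covered in this post, so no call is shown here.            */
```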

Removing duplicate rows in base SAS

If you ever need to remove duplicate rows from a SAS dataset, here’s an approach I use quite often.

Get your data.

Let’s assume it’s in the following format:

ID Name
123 John
456 Bob
123 John

Sort your data.

/* Step 1 - Sort data */
proc sort data=my_lib.my_dataset;

   /* Sort by a field which you want to be unique, 
   and which will be the same for duplicate rows */
   by id; 

run;

Which should give you the following:

ID Name
123 John
123 John
456 Bob


Remove Duplicates

Now that the data is in order, we can remove the duplicates, by only ever keeping the first entry which matches our unique ID.

/* Step 2 - Get rid of duplicates */
data my_lib.my_dataset;

   /* Iterate through this dataset row by row */
   set my_lib.my_dataset;
   /* Grouping each row by the field we sorted on */ 
   by id; 
   /* And only keep a row if it’s the first */
   if first.id; 

run;

Which should leave you with:

ID Name
123 John
456 Bob


Tadaa!

What happened there?

This approach has 3 facets:

  1. Grouping
  2. SAS’ special first.variable
  3. SAS’ “subsetting if” statement, which only appends (or “outputs”) a row to a dataset if its condition evaluates to true.


Grouping: So we effectively rearranged our data so that all identical IDs were grouped together. Given that the rows are identical and you only want to keep one of them, we choose to keep the first of each group.

First.variable: During execution, SAS iterates through each row of my_dataset and adds it to a new dataset (which eventually overwrites my_dataset). Because of the by statement, on each iteration SAS checks whether the row starts a new group of ID values. If it hits the first row with a given ID (for example, 123), it sets first.id to true. On the next row, the ID is still 123, so first.id is set to false. This gives us a handy flag which is only ever true on the first row of each group, which is also why the data has to be sorted first.
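To see the flag for yourself, here is a small sketch using made-up sample data that matches the table above (the work.demo and work.flags dataset names are just for illustration). The first.id and last.id variables are temporary and never written to the output, so the example copies them into ordinary variables before printing.

```sas
/* Hypothetical sample data, same shape as the example table */
data work.demo;
   input id name $;
   datalines;
123 John
456 Bob
123 John
;
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=work.demo;
   by id;
run;

data work.flags;
   set work.demo;
   by id;
   /* first.id / last.id are temporary, so copy them to keep them */
   is_first = first.id;
   is_last  = last.id;
run;

/* Expect is_first = 1 on the first 123 row and on the 456 row only */
proc print data=work.flags noobs;
run;
```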

Funny Statement Stuff: So how do we tell SAS to keep the row only when this flag is true? The answer is the “subsetting if”: an if statement with a condition but no then. If the condition evaluates to false, SAS stops processing the current row and moves on to the next one without outputting it, i.e. in this case, it doesn’t keep it.

So in simple terms, we’re saying – if you’ve seen this value before, don’t save it again.
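For comparison, here are two other ways of writing the same thing, sketched against the same my_lib.my_dataset (the work.deduped and work.deduped_sort output names are made up for the example). The first spells out the output that the subsetting if does implicitly. The second uses the NODUPKEY option of PROC SORT, which sorts and drops rows with a repeated BY value in one step.

```sas
/* Option 1: same logic, with the output made explicit.        */
/* Assumes my_lib.my_dataset is already sorted by id.          */
data work.deduped;
   set my_lib.my_dataset;
   by id;
   if first.id then output;
run;

/* Option 2: sort and keep one row per id in a single step.    */
/* OUT= writes to a new dataset and leaves the original as is. */
proc sort data=my_lib.my_dataset out=work.deduped_sort nodupkey;
   by id;
run;
```

Both of these write to a new dataset rather than overwriting the original, which is a bit safer while you’re still checking the results.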

SAS Dashboard – Fixing the “too many dials” issue

There’s an interesting feature of SAS BI Dashboard that caught me out when trying to put together some KPI gauges.

We wanted a set of dials that would show us the counts for several types of public security offences. Naturally we configured indicators using count as the measure, but found that the dashboard was showing an indicator per record in our system:

[Image: kpi1]

With a bit of faffing around, I eventually found that this was due to the columns I had selected from my data. Specifically, the “uniqueID” column. This led to my big discovery:

The number of dials = The number of unique combinations of column values

In other words, because I had selected a column where the value was always unique, I got a dial for every row in my table. If I selected just the “category” column, I got the aggregated view I expected:

[Image: kpi2]

Perhaps this just shows my SAS Dashboard naivety, but I thought I’d document it anyway.
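One way to picture the rule is as a GROUP BY over whichever columns you select. The sketch below is only an analogy in PROC SQL, not a description of how BI Dashboard works internally, and the work.offences table and its values are invented for the example.

```sas
/* Hypothetical table: one row per offence record */
data work.offences;
   input uniqueID category $;
   datalines;
1 Theft
2 Theft
3 Assault
;
run;

proc sql;
   /* Category only: one result row (one dial) per category */
   select category, count(*) as offence_count
   from work.offences
   group by category;

   /* Add uniqueID: every combination is unique, so one row */
   /* (one dial) per record, each with a count of 1         */
   select uniqueID, category, count(*) as offence_count
   from work.offences
   group by uniqueID, category;
quit;
```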

 

SAS – How to Export/Import packages

It’s also possible to export and import metadata with the equivalent wizard.

My team and I have been developing a solution which involves a degree of SAS reports and related metadata. I set up a scheduled, automated backup of our information maps, reports, etc. for posterity, and out of general paranoia. For this I used SAS’s command line export and import capabilities, which probably weren’t designed to be used that way, but which turned out to be really useful.

It took a wee bit of trial and error, so I thought I’d document it here.

(Handy reference link)

Note: If you’re using PuTTY to connect to your SAS server, make sure that your PuTTY session has “Enable X11 forwarding” ticked.

 Export

(Ignore any new lines in the text below – I’ve added those for readability)

/usr/local/SASHome/SASPlatformObjectFramework/9.3/ExportPackage -host "mysasmachine" -port 8561 -user myuser@saspw -password mypassword -package "myPackage.spk" -objects "/Shared Data/mySourceFolder(Folder)" -includeDep -subprop

  • “-includeDep” means that all objects that the exported objects depend on are also exported
  • You can also specify “-types” with the types of objects you wish to export
  • Without specifying “(Folder)” on mySourceFolder, all files will be exported “flat”, i.e. without their folder hierarchy

 Import

/usr/local/SASHome/SASPlatformObjectFramework/9.3/ImportPackage -host "mysasmachine" -port 8561 -user myuser@saspw -password mypassword -package "myPackage.spk" -target "/Shared Data/myTargetFolder(Folder)" -subprop myPackage.subprop

  •  Without specifying “(Folder)” on myTargetFolder it would create a new folder with the name of the old parent folder in the new parent folder (e.g. /myTargetFolder/mySourceFolder)