# Setting Up a Lexical Analyser and Parser in Ruby

I wrote this post as I was setting up the lexer and parser for Rubex, a new superset of Ruby that I’m developing.

Let’s walk through the basic workings of a lexical analyser and parser with a very simple addition program. Before you start, please make sure rake, oedipus_lex and racc are installed on your computer.

### Configuring the lexical analyser

The most fundamental need of any parser is that it needs string tokens to work with, which we will provide by way of lexical analysis by using the oedipus_lex gem (the logical successor of rexical). Go ahead and create a file lexer.rex with the following code:

In the above code, we have defined the lexical analyser using Oedipus Lex’s syntax inside the AddLexer class. Let’s go over each element of the lexer one by one:

macro

The macro keyword lets you define macros for certain regular expressions that you might need to write repeatedly. In the above lexer, the macro DIGIT is a regular expression (\d+) for detecting one or more digits. We place the regular expression inside forward slashes (/../) because oedipus_lex requires it that way. The lexer can handle any valid Ruby regular expression. See the Ruby docs for details on Ruby regexps.

rule

The section under the rule keyword defines your rules for the lexical analysis. Since we’ve defined a macro for detecting digits, using that macro in the rules requires placing it inside a Ruby string interpolation (#{..}). The code to the right of /#{DIGIT}/ states the action that must be taken when that regular expression matches. Thus the lexer will return a Ruby Array whose first element is :DIGIT. The second element uses the text variable, a reserved variable in the lexer that holds the text that was just matched. Similarly, the second rule will match any character (.) or a newline (\n) and return an Array with [text, text] inside it.

inner

Under the inner keyword you can specify any code that you want to appear inside your lexer class. This can be any logic that you want your lexer to execute. The Ruby code under the inner section is copied as-is into the final lexer class. In the above example, we’ve written an empty method called do_parse inside this section. This method is mandatory if you want your lexer to execute successfully. We’ll be coupling the lexer with racc shortly, so unless you want to write your own parsing logic, you should leave this method empty.
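To make the two rules concrete, here is a plain-Ruby sketch of the same tokenising behaviour using StringScanner from the standard library. This is not the code that oedipus_lex generates; TinyAddLexer and its tokens helper are names I made up for illustration, and I’m assuming the token text is returned as a String.

```ruby
require 'strscan'

# Plain-Ruby stand-in for the two lexer rules (hypothetical class,
# NOT the lexer that oedipus_lex generates).
class TinyAddLexer
  DIGIT = /\d+/ # plays the role of the DIGIT macro

  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Returns the next [type, text] pair, or nil at end of input.
  def next_token
    return nil if @scanner.eos?

    if (text = @scanner.scan(/#{DIGIT}/))
      [:DIGIT, text]          # first rule: one or more digits
    else
      text = @scanner.scan(/.|\n/)
      [text, text]            # second rule: any other single character
    end
  end

  # Convenience helper: collect every remaining token into an Array.
  def tokens
    result = []
    while (token = next_token)
      result << token
    end
    result
  end
end

p TinyAddLexer.new("2+2").tokens
# => [[:DIGIT, "2"], ["+", "+"], [:DIGIT, "2"]]
```

Note how the catch-all rule turns the + character into the token ["+", "+"], which is what lets a grammar refer to it as a literal '+'.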

### Configuring the parser

In order for our addition program to be successful, it needs to know what to do with the tokens that are generated by the lexer. For this purpose, we need racc, an LALR(1) parser generator for Ruby. It is similar to yacc or bison and lets you specify grammars easily.

Go ahead and create a file called parser.racc in the same folder as the previous lexer.rex and Rakefile, and put the following code inside it:

As you can see, we’ve put the logic for the parser inside the AddParser class. Notice that, unlike in oedipus_lex, only the parsing logic exists inside the class and everything else (i.e. under header and inner) exists outside the class. Let’s go over each part of the parser one by one:

This is the core class that contains the parsing logic for the addition parser. Similar to oedipus_lex, it contains a rule section that specifies the grammar. The parser expects tokens in the form of [:TOKEN_NAME, matched_text]. The :TOKEN_NAME must be a symbol. This token name is matched to the terminal of the same name in the grammar (DIGIT in the above case). token and expr are variables (non-terminals). Have a look at this introduction to LALR(1) grammars for further information.

The header keyword tells racc what code should be put at the top of the parser that it generates. You usually put your require statements here. In this case, we load the lexer class so that the parser can use it for accessing the tokens generated by the lexer. Notice that header has 4 hyphens (-) and a space before it. This is mandatory if your program is to not malfunction.

inner

The inner keyword tells racc what should be put inside the generated parser class. As you can see there are two methods in the above example: next_token and prepare_parser. The next_token method is mandatory for the parser to function and you must include it in your code. It should contain logic that will return the next token for the parser to consider. Moving on to the prepare_parser method, it takes the name of the file to be parsed as an argument (how we pass that argument in will be seen later), and initializes the lexer. It then calls the parse_file method, which is present in the lexer class by default.

The next_token method in turn uses the @lexer object’s next_token method to get a token generated by the lexer so that it can be used by the parser.
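The pull-based relationship between parser and lexer can be sketched in plain Ruby. The class below is a hand-written stand-in, not what racc generates: TinyAddParser and its expect helper are hypothetical, and the tokens come from a plain Array instead of a real lexer. It only recognises the single pattern DIGIT '+' DIGIT.

```ruby
# Hand-written stand-in for a generated parser (hypothetical class).
# It asks for tokens one at a time through next_token and recognises
# only the pattern: DIGIT '+' DIGIT.
class TinyAddParser
  def initialize(tokens)
    @tokens = tokens.dup
  end

  # Hands over one [type, text] pair at a time; nil once exhausted.
  def next_token
    @tokens.shift
  end

  # Consume the three expected tokens and return the sum.
  def do_parse
    left = expect(:DIGIT)
    expect('+')
    right = expect(:DIGIT)
    left.to_i + right.to_i
  end

  private

  # Fetch the next token and check that it has the expected type.
  def expect(type)
    token = next_token
    unless token && token[0] == type
      raise ArgumentError, "expected #{type.inspect}, got #{token.inspect}"
    end
    token[1]
  end
end

tokens = [[:DIGIT, "2"], ["+", "+"], [:DIGIT, "2"]]
puts TinyAddParser.new(tokens).do_parse # prints 4
```

A generated LALR(1) parser does the same pulling through next_token, but drives it from a state table built from the grammar rather than from hard-coded calls.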

### Putting it all together

Our lexical analyser and parser are now coupled to work with each other, and we now use them in a Ruby program to parse a file. Create a new file called adder.rb and put the following code in it:

The prepare_parser method is the same one that was defined in the inner section of parser.racc above. The do_parse method called on the parser will signal the parser to start doing its job.

In a separate file called text.txt put the following text:

Oedipus Lex does not have a command line tool like rexical for generating a lexer from the logic specified, but rather has a bunch of rake tasks defined for doing this job. So now create a Rakefile in the same folder and put this code inside it:

Running rake parser will generate two new files, lexer.rex.rb and parser.racc.rb, which will house the classes and logic for the lexer and parser, respectively. You can use your newly written lexer + parser with a ruby adder.rb text.txt command. It should output 4 as the answer.

You can find all the code in this blogpost here.

# Random Thoughts on Music Theory.

Title explains what this is about.

### 16 August 2016

Was checking out this video (Contortionist - Language 1) and learned about standard C# tuning on a 6 string bass guitar today. He’s used tuning G# C# F# B E A. Killer bass tone. This wiki says something different about C# standard, though.

# Searching for Graduate Degree Courses in USA and Japan.

I’m currently searching for master’s degree courses in various colleges in Japan and USA. I want to pursue a Computer Science degree specializing in distributed systems. Searching for the right graduate degree courses can get depressing. Here I’m posting various links and leads that I came across through the course of my search.

### 5 August 2016

Searching for options in Japan and started with University of Tokyo. Most of their courses seem to be in Japanese but there are a few in English as well. This page has some starting info about the English courses. Also found a collection of colleges here.

So apparently the process for getting into a Japanese college for Master’s can take two paths. The first is like so:

1. Talk to a professor and gain a research assistantship with him/her.
2. Give an exam and enroll for a 2 year master’s course if you pass that exam.

The second is to give the exam directly, but I’m not sure how that can be done since they all appear to be written examinations that are conducted in Japan.

### 16 August 2016

Having a look at the graduate schools of University of Tokyo, Tokyo Institute of Technology and Kyoto University today.

University of Tokyo

UoT seems to have some special selection process for international applicants (link), though it’s not useful for me. There’s a decent contact page here. They’ve also put up a check list for applications here.

Tokyo Inst. of Technology

This also has a good graduate program. Tokyo Inst. of Technology has an international graduate program (IGP) for overseas applicants. The courses seem to be mostly in English. The school of computer science also participates in the IGP and accepts the IGP(A), IGP(B) and IGP(C) types of applicants. I seem to be most qualified for the IGP(A) and IGP(C) applications.

The ‘Education Program of Advanced Information Technology Leaders’ seems to be most relevant to my case. This looks like a good PDF to brief about the program.

All the courses require students to arrange for a Tokyo Tech faculty member to serve as their academic supervisor. This handy web application allows you to do that. They also have the MEXT scholarship for outstanding students.

University of Kyoto

### 17 August 2016

Continuing my research on Tokyo Inst. of Technology. The PDF I pointed to yesterday brought out an interesting observation - IGP(A) students and IGP(C) students seem to have different course work.

### 18 August 2016

It seems the IGP(C) program at Tokyo Tech. is best for me. I will research that further today. Most probably I’ll need to do a 6 month research assistantship first. Here’s a list of the research groups of the Computer Sci. department at Tokyo Tech.

### 20 August 2016

Tokyo Inst. of Technology

Found a list of faculties under the IGP(C) program here.

### 23 August 2016

Had a look at Kyushu Inst. of Technology today. The program for international students looks good.

Also check out scholarship opportunities at Tokyo Inst. of Technology. Links - 1, 2, 3. There are a bunch of scholarships that can be applied to before you enrol in university. Have a look here.

There’s also the MEXT scholarship from the Japanese government.

### 24 August 2016

Found an interesting FAQ on the UoT website.

Also having a look at JASSO scholarships. Found some great scholarships here.

### 25 August 2016

Found some scholarships. I can also enrol as a privately funded research student at Tokyo Tech.

# Random Thoughts on Bass Tone

This post is about my learnings about bass tone. I’m currently using the following rig:

• Laney RB2 amplifier
• Tech 21 Sansamp Bass Driver Programmable DI
• Fender Mexican Standard Jazz Bass (4 string)

I will be updating this post as and when I learn something new that I’d like to document or share. Suggestions are welcome. You can email me (see the ‘about’ section) or post a comment below.

#### 26 July 2016

As of now I’m tweaking the sansamp and trying to achieve a good tone that will complement the post/prog rock sound of my band Cat Kamikazee. I’m also reading up on different terminologies and use cases on the internet. For instance, I found this explanation on DI boxes quite useful. I also learned that the ‘XLR Out Pad’ button on the sansamp actually provides a 20 dB cut to the soundboard if your signal is too hot.

I am trying to couple the sansamp with a basic overdrive pedal I picked up from a friend. This thread on talkbass is pretty useful for that. The guy who answered the question states that it’s better to place the sansamp last in the chain so that the DI can deliver the output of the sound chain.

So the BLEND knob on the sansamp controls how much of the dry signal is mixed with the sansamp tube amplifier emulation circuitry. This can be useful when chaining effects pedals with the sansamp: reduce the blend and more of the dry signal passes through. Btw the bass, treble and level controls remain active irrespective of the position of BLEND.

One thing that was a little confusing was the whole thing about ‘harmonic partials’. I found a pretty informative discussion about it in this TalkBass thread.

Here’s an interesting piece on compressors.

Some more useful links I came across over the course of the past few days:

• https://theproaudiofiles.com/amp-overdrive-vs-pedal-overdrive/
• http://www.offbeatband.com/2009/08/the-difference-between-gain-volume-level-and-loudness/

#### 28 July 2016

Found an interesting and informative piece on bass pedals here. It’s a good walkthrough of different pedal types and their functionality and purpose.

I wanted to check out some overdrive pedals today but was soon sinking in a sea of terminologies. One thing that intrigued me is the difference between an overdrive, distortion and fuzz. I found a pretty informative article on this topic. The author has the following to say about these 3 different but seemingly similar things.

I had a look at the Darkglass b3k and b7k pedals too. They look like promising overdrive pedals. I’ll explore the b3k more since the only difference between the 3 and the 7 is that the 7 also functions as a DI box and has an EQ, while the 3 doesn’t. I already have a DI with a 2 band EQ in the sansamp.

#### 29 July 2016

One thing that I noticed when tweaking my sansamp is the level of ‘distortion’ in my tone varies a LOT when you change the bass or treble keeping the drive at the same level. Why does this happen?

#### 2 August 2016

Trying to dive further into distortion today. Found this article kind of useful. It relates mostly to lead guitar tones, but I think it applies in a general case too. I learned about symmetric and asymmetric clipping in that article.

According to the article, symmetric clipping is more focused and clear, because it is only generating one set of harmonic overtones. Since asymmetric clipping can be hard-clipped on one side, and soft-clipped on the other, it has the potential to create very thick complex sounds. This means that if you want plenty of overtones, but do not want a lot of gain, asymmetric clipping can be useful. For full-blown distortion symmetric clipping is usually more suitable, since high-gain tones are already very harmonically complex. Typically asymmetric clipping will have a predominant first harmonic, which the symmetric clipping will not (that’s probably why in this video, the SD1 sounds brighter than the TS-9). High gain distortion tones sound best with most of the distortion coming from the pre-amp, so try to use a fairly neutral pickup or even a slightly ‘bright’ pickup.

The follow up to the above post talks about EQ in relation to distortion. It has stuff on pre and post EQ distortion and how it can affect the overall tone. If you place the EQ before the distortion, you can actually shape which frequencies will be clipped. However if you place it after the distortion then the EQ will only shape the already distorted tone. Pre-dist EQ is more useful in most cases since it lets you control the frequencies for clipping.

It also says that humbucking pickups have a mid-boost that is more focused by the lower part of the frequency range. Single coil pickups on the other hand have a mid-boost focused by the upper part of the frequency range. Single coils generally have clearer, more articulate bass end.

# Overview

I thought I’d try something new by recording screencasts for some of my work on Ruby open source libraries.

This is quite a change for me since I’m primarily focused on the programming and designing side of things. Creating documentation is something I’ve not ventured into a lot except the usual YARD markup for Ruby methods and classes.

In this blog post (which I will keep updating as time progresses) I hope to document my efforts in creating screencasts. Mind you this is the first time I’m creating a screencast so if you find any potential improvements in my methods please point them out in the comments.

# Creating the video

My first ever screencast will be for my benchmark-plot gem. For creating the video I’m mainly using two tools: Kdenlive for video editing and Kazam for recording screen activity. I initially tried using Pitivi and OpenShot for video editing, but the former did not seem user friendly and the latter kept crashing on my system. For the desktop recording I first tried using RecordMyDesktop but gave up on it since it’s too heavy on resources and recorded poor quality screencasts with not too many customization options.

For creating informative visuals, I’m using LibreOffice Impress so that I can create a slide, take its screenshot when in slideshow mode and put it in the screencast. However I’ve generally found that slides do not serve content delivery well in a screencast, so future screencasts will probably not feature too many of them.

Sublime Text 3 is my primary text editor. I use its built-in code execution functionality (by pressing Ctrl + Shift + B) to execute a code snippet and display the results immediately.

# Creating the audio

I am using Audacity for recording sound. Sadly my mic produces a lot of noise, so for removing that noise in Audacity, I use the inbuilt noise reduction tools.

Noise reduction in Audacity can be achieved by first selecting a small part of the sound that does not contain speech, then going to Effects -> Noise Reduction and clicking on ‘Get Noise Profile’. Then select the whole sound wave with Ctrl + A, go to Effects -> Noise Reduction again and click ‘OK’. It should considerably reduce static noise from your sound file.

All files are exported to Ogg Vorbis.

# Putting it all together

I did some research on the screencasting process and found this article by Avdi Grimm and this one by Sayanee Basu extremely helpful.

I first started by writing the transcript along with any code samples that I had to show. I made it a point to describe the code being typed/displayed on the screen since it’s generally more useful to have a voice over explaining the code than having to pause the video and go over it yourself.

Then I recorded the voice over just for the part that featured slides. I imported the screenshots of the slides in kdenlive and adjusted them such that they fit the voice over. Recording the code samples was a bit of a challenge. I started typing out the code and talking about it into the mic. This was more difficult than I thought, almost like playing a guitar and singing at the same time. I ended up recording the screencast in 4 separate parts, with several retakes for each part.

After importing the screencast with voice over into kdenlive and separating the audio and video components, I did some cuts to reduce redundancy or imperfections in my VO. Some of the parts of the video where there was a lot of typing had to be sped up by using kdenlive’s Speed tool.

Once this was up to my satisfaction, I exported it to mp4.

The video of my first screencast is now up on YouTube in the video below. Have a look and leave your feedback in the comments!

# Summary of Work This Summer for GSOC 2015

Over this summer as a part of Google Summer of Code 2015, daru received a lot of upgrades and new features which have made it a pretty robust tool for data analysis in pure ruby. Of course, a lot of work still remains for bringing daru on par with the other data analysis solutions on offer today, but I feel the work done this summer has put daru on that path.

The new features led to the inclusion of daru in many of SciRuby’s gems, which use daru’s data storage, access and indexing features for storing and carrying around data. Statsample, statsample-glm, statsample-timeseries, statsample-bivariate-extensions are all now compatible with daru and use Vector and DataFrame as their primary data structures. Daru’s plotting functionality, that interfaced with nyaplot for creating interactive plots directly from the data was also significantly overhauled.

Also, new gems developed by other GSOC students, notably Ivan’s GnuplotRB gem and Alexej’s mixed_models gem both accept data from daru data structures. Do see their repo pages for seeing interesting ways of using daru.

The work on daru is also proving to be quite useful for other people, which led to a talk/presentation at DeccanRubyConf 2015, which is one of the three major ruby conferences in India. You can see the slides and notebooks presented at the talk here. Given the current interest in data analysis and the need for a viable solution in ruby, I plan to take daru much further. Keep watching the repo for interesting updates :)

In the rest of this post I’ll elaborate on all the work done this summer.

## Pre-mid term submissions

Daru as a gem before GSOC was not exactly user friendly. There were many cases, particularly the iterators, that required some thinking before anybody used them. This is against the design philosophy of daru, or even ruby in general, where surprising programmers with ambiguous constructs is usually frowned upon by the community. So the first thing that I did mainly concerned overhauling daru’s many iterators for both Vector and DataFrame.

For example, the #map iterator from Enumerable returns an Array no matter what object you call it on. This was not the case before, where #map would return a Daru::Vector or Daru::DataFrame. This behaviour was changed, and now #map returns an Array. If you want a Vector or a DataFrame of the modified values, you should call #recode on Vector or DataFrame.
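The distinction is easy to see with a toy container. TinyVector below is a hypothetical stand-in that I wrote for illustration, not Daru::Vector; it only shows why a #map that comes from Enumerable yields an Array, while a #recode-style method can wrap the result back into the container.

```ruby
# TinyVector is a hypothetical stand-in, NOT Daru::Vector.
class TinyVector
  include Enumerable

  attr_reader :data

  def initialize(data)
    @data = data.dup
  end

  # Enumerable builds #map (and friends) on top of #each,
  # which is why #map hands back a plain Array.
  def each(&block)
    return to_enum(:each) unless block_given?

    @data.each(&block)
    self
  end

  # Same transformation as #map, but wrapped back into a TinyVector.
  def recode(&block)
    self.class.new(@data.map(&block))
  end
end

v = TinyVector.new([1, 2, 3])
p v.map { |x| x * 2 }            # => [2, 4, 6]
p v.map { |x| x * 2 }.class      # => Array
p v.recode { |x| x * 2 }.class   # => TinyVector
p v.recode { |x| x * 2 }.data    # => [2, 4, 6]
```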

Each of these iterators also accepts an optional argument, :row or :vector, which will define the axis over which iteration is supposed to be carried out. So now there are the #each, #map, #map!, #recode, #recode!, #collect, #collect_matrix, #all?, #any?, #keep_vector_if and #keep_row_if. To iterate over elements along with their respective indexes (or labels), you can likewise use #each_row_with_index, #each_vector_with_index, #map_rows_with_index, #map_vector_with_index, #collect_rows_with_index, #collect_vector_with_index or #each_index. I urge you to go over the docs of each of these methods to utilize the full power of daru.

Apart from this there was also quite a bit of refactoring involved for many methods (courtesy Alexej). This has made daru much faster than previous versions.

The next (major) thing to do was making daru compatible with statsample. This was essential since statsample is a very important tool for statistics in ruby and it was using its own Vector and Dataset classes, which weren’t very robust as computation tools and were very difficult to use when it came to cleaning or munging data. So I replaced statsample’s Vector and Dataset classes with Daru::Vector and Daru::DataFrame. It involved a significant amount of work on both statsample and daru. Statsample because many constructs had to be changed to make them compatible with daru, and daru because there was a lot of essential functionality in these classes that had to be ported to daru.

Porting code from statsample to daru improved daru significantly. There were a whole lot of statistics methods in statsample that were imported into daru and you can now use all of them from daru. Statsample also works well with rubyvis, a great tool for visualization. You can now do that with daru as well.

Many new methods for reading and writing data to and from files were also added to daru. You can now read and write data to and from CSV, Excel, plain text files or even SQL databases.

In effect, daru is now completely compatible with statsample (and all the other statsample extensions). You can use daru data structures for storing data and pass them to statsample for performing computations. The biggest advantage of this approach is that the analysed data can be passed around to other scientific ruby libraries (some of which are listed above) that use daru as well. Since daru offers in-built functions to better ‘see’ your data, better visualization is possible.

See these blogs and notebooks for a complete overview of daru’s new features.

Also see the notebooks in the statsample README for using daru with statsample.

## Post-mid term submissions

Most of the time after the mid term submissions was spent in implementing the time series functions for daru.

I implemented a new index, the DateTimeIndex, which can be used for indexing data on time stamps. It enables users to query data based on time stamps. Time stamps can either be specified with precise ruby DateTime objects or can be specified as strings, which will lead to retrieval of all the data falling under that time. For example specifying ‘2012’ returns all data that falls in the year 2012. See detailed usage of DateTimeIndex in conjunction with other daru constructs in the daru README.
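The idea behind querying with a partial date string can be sketched with nothing but the standard library. The helper below is hypothetical and is not how daru implements it; I’m assuming a Hash keyed by DateTime objects and a simple prefix match on the formatted date.

```ruby
require 'date'

# Hypothetical helper, NOT daru's API: keep the entries whose timestamp
# starts with the given partial date string ("2012", "2012-03", ...).
def slice_by_date_string(series, partial)
  series.select { |stamp, _value| stamp.strftime('%Y-%m-%d').start_with?(partial) }
end

series = {
  DateTime.new(2011, 12, 31) => 10,
  DateTime.new(2012, 1, 15)  => 20,
  DateTime.new(2012, 3, 2)   => 30,
  DateTime.new(2013, 1, 1)   => 40
}

p slice_by_date_string(series, '2012').values     # => [20, 30]
p slice_by_date_string(series, '2012-03').values  # => [30]
```

A linear scan like this is fine for a sketch; a real index would want something smarter than checking every entry.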

An essential utility in implementing DateTimeIndex was DateOffset, which is a new set of classes that offsets dates based on certain rules or business logic. It can advance or lag a ruby DateTime to the nearest day or any day of the week or the end or beginning of the month etc. DateOffset is an essential part of DateTimeIndex and can also be used as a standalone utility for advancing/lagging DateTime objects. This blog post elaborates more on the nuances of DateOffset and its usage.
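To give a feel for what such offsets compute, here is a small sketch built on the standard library’s Date class. TinyOffsets and its method names are hypothetical and unrelated to daru’s DateOffset classes, and I’m using Date rather than DateTime to keep things short.

```ruby
require 'date'

# Hypothetical offset helpers built on the stdlib,
# NOT daru's DateOffset classes.
module TinyOffsets
  module_function

  # Last day of the month that date falls in.
  def month_end(date)
    Date.new(date.year, date.month, -1)
  end

  # First day of the following month.
  def next_month_begin(date)
    Date.new(date.year, date.month, 1) >> 1
  end

  # The next occurrence of wday (0 = Sunday) strictly after date.
  def next_weekday(date, wday)
    step = (wday - date.wday) % 7
    date + (step.zero? ? 7 : step)
  end
end

d = Date.new(2012, 2, 10)                # a Friday
p TinyOffsets.month_end(d).to_s          # => "2012-02-29"
p TinyOffsets.next_month_begin(d).to_s   # => "2012-03-01"
p TinyOffsets.next_weekday(d, 1).to_s    # => "2012-02-13"
```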

The last thing done during the post mid term was complete compatibility with statsample-timeseries, which was created by Ankur Goel during GSOC 2013. It offers many useful functions for analysis of time series data. It now works with daru containers. See some use cases here.

That’s all, as far as I can remember.

# Elaboration on Certain Internals of Daru

In this blog post I will elaborate on how a few of the features in daru were implemented. Notably I will stress what spurred the need for that particular design of the code.

This post is primarily intended to serve as documentation for me and future contributors. If readers have any inputs on improving this post, I’d be happy to accept new contributions :)

## Index factory architecture

Daru currently supports three types of indexes, Index, MultiIndex and DateTimeIndex.

It became very tedious to write if statements in the Vector or DataFrame codebase whenever a new data structure was to be created, since there were 3 possible indexes that could be attached with every data set. This mainly depended on what kind of data was present in the index, i.e. tuples would create a MultiIndex, DateTime objects or date-like strings would create a DateTimeIndex, and everything else would create a Daru::Index.

This looked something like the perfect use case for the factory pattern, the only hurdle being that the factory pattern in the pure sense of the term would be a superclass, something called Daru::IndexFactory that created an Index, DateTimeIndex or MultiIndex index using some methods and logic. The problem is that I did not want to call a separate class for creating Indexes. This would break existing code and possibly cause problems in libraries that were already using daru (viz. statsample), not to mention confusing users about which class they’re actually supposed to be using.

The solution came after I read this blog post, which demonstrates that the .new method for any class can be overridden. Thus, instead of the default new allocating an object and calling initialize on it, the overridden new runs first and can then instantiate whatever class it chooses. It so happens that you can make new return any object you want, whereas the return value of initialize is ignored and the default new always hands back an instance of the class it was called on. Thus, for the factory pattern implementation of Daru::Index, we over-ride the .new method of Daru::Index and write logic such that it manufactures the appropriate kind of index based on the data that is passed to Daru::Index.new(data). The pseudo code for doing this looks something like this:

Also, since over-riding .new tampers with the subclasses of the class as well, an inherited hook that replaces the over-ridden .new of the inherited class with the original one was added to Daru::Index.
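Here is a minimal, runnable sketch of the overridden .new plus the inherited hook. The Sketch module, the simplified detection rules (Arrays mean tuples, DateTime objects mean dates, no date-like string parsing) and the empty subclasses are all assumptions made for illustration; daru’s real logic is more involved.

```ruby
require 'date'

# Simplified illustration of a factory via an overridden .new.
# The Sketch namespace and the detection rules are assumptions.
module Sketch
  class Index
    class << self
      # Keep a handle on the original Class#new before overriding it.
      alias_method :__new__, :new

      # Give every subclass the original .new back, so MultiIndex.new
      # builds a MultiIndex instead of re-entering the factory below.
      def inherited(subclass)
        super
        class << subclass
          alias_method :new, :__new__
        end
      end

      # The factory: pick the index class based on the data passed in.
      def new(data)
        if !data.empty? && data.all? { |e| e.is_a?(Array) }
          MultiIndex.new(data)
        elsif !data.empty? && data.all? { |e| e.is_a?(DateTime) }
          DateTimeIndex.new(data)
        else
          __new__(data)
        end
      end
    end

    attr_reader :data

    def initialize(data)
      @data = data
    end
  end

  class MultiIndex < Index; end
  class DateTimeIndex < Index; end
end

p Sketch::Index.new([:a, :b, :c]).class                # => Sketch::Index
p Sketch::Index.new([[:a, 1], [:a, 2]]).class          # => Sketch::MultiIndex
p Sketch::Index.new([DateTime.new(2012, 1, 1)]).class  # => Sketch::DateTimeIndex
```

Callers keep writing Index.new(data) and never need to know that a factory is involved, which is the whole point of doing it this way.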

## Working of the where clause

The where clause in daru lets users query data with an Array containing boolean values. So whenever you call where on Daru::Vector or DataFrame, and pass in an Array containing true or false values, all the rows corresponding to true will be returned as a Vector or DataFrame respectively.

Since the where clause works in conjunction with the comparator methods of Daru::Vector (which return a Boolean Array), it was essential for these boolean arrays to be combined together such that piecewise AND and OR operations could be performed between multiple boolean arrays. Hence, the Daru::Core::Query::BoolArray class was created, which is specialized for handling boolean arrays and performing piecewise boolean operations.

The BoolArray defines the #& method for piecewise AND operations and it defines the #| method for piecewise OR operations. They work as follows:
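A toy version shows the piecewise behaviour. TinyBoolArray is a hypothetical class written for this post, not Daru::Core::Query::BoolArray, and I’m assuming both operands have the same length.

```ruby
# Toy version of the idea; TinyBoolArray is hypothetical,
# NOT Daru::Core::Query::BoolArray.
class TinyBoolArray
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Piecewise AND: true only where both arrays hold true.
  def &(other)
    TinyBoolArray.new(@values.zip(other.values).map { |a, b| a && b })
  end

  # Piecewise OR: true where at least one array holds true.
  def |(other)
    TinyBoolArray.new(@values.zip(other.values).map { |a, b| a || b })
  end

  def to_a
    @values.dup
  end
end

a = TinyBoolArray.new([true, false, true,  false])
b = TinyBoolArray.new([true, true,  false, false])

p((a & b).to_a) # => [true, false, false, false]
p((a | b).to_a) # => [true, true, true, false]
```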

# Finding and Combining Data in Daru

## Arel-like query syntax

Arel is a very popular ruby gem that is one of the major components of the most popular ruby framework, Rails. It is an ORM-helper of sorts that exposes a beautiful and intuitive syntax for creating SQL strings by chaining Ruby methods.

Daru successfully adopts this syntax and the result is a very intuitive and readable syntax for obtaining any sort of data from a DataFrame or Vector.

As a quick demonstration, let’s create a DataFrame which looks like this:

To select all rows where df[:a] equals 2 or df[:c] equals 55, just write this:

As is easily seen above, the Daru::Vector class has special comparators defined on it, which allow it to check each value of the Vector and return an object that can be evaluated by the DataFrame#where method.

Notice that to club the two comparators above, we have used the union OR (|) operator.

Daru::Vector has a bunch of comparator methods defined on it, which can be used with #where for obtaining the desired results. All of these return an object of type Daru::Core::Query::BoolArray, which is read by #where. BoolArray uses the methods | (also aliased as #or) and & (also aliased as #and) for piecewise logical operations on other BoolArray objects.

BoolArray consists of an internal Array that contains true for every entry in the Vector that returns true for an operation between the comparable operand and a Vector entry.

For example,

The #& (or #and) and #| (or #or) methods on BoolArray apply a logical and and a logical or respectively between each element of the BoolArray and return another BoolArray that contains the results. For example:

The following comparators can be used with a Daru::Vector:

| Comparator Method | Description |
|---|---|
| eq | Uses == and returns true for each equal entry |
| not_eq | Uses != and returns true for each unequal entry |
| lt | Uses < and returns true for each entry less than the supplied object |
| lteq | Uses <= and returns true for each entry less than or equal to the supplied object |
| mt | Uses > and returns true for each entry more than the supplied object |
| mteq | Uses >= and returns true for each entry more than or equal to the supplied object |
| in | Uses == for each element in the collection (Array, Daru::Vector, etc.) passed and returns true for a match |
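To show how comparators and where fit together, here is a small plain-Ruby sketch. TinyColumn is a hypothetical stand-in for a vector and implements only a few of the comparators; for brevity the masks are plain Arrays of booleans combined with zip, not BoolArray objects.

```ruby
# TinyColumn is a hypothetical stand-in for a vector; the masks here are
# plain Arrays of booleans, not daru's BoolArray.
class TinyColumn
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Each comparator returns one boolean per entry.
  def eq(other)
    @values.map { |v| v == other }
  end

  def lt(other)
    @values.map { |v| v < other }
  end

  def mt(other)
    @values.map { |v| v > other }
  end

  def mteq(other)
    @values.map { |v| v >= other }
  end

  # Keep only the entries whose mask position is true.
  def where(mask)
    kept = @values.each_with_index.select { |_value, i| mask[i] }.map(&:first)
    TinyColumn.new(kept)
  end
end

col = TinyColumn.new([1, 5, 8, 12])

p col.mt(4)                    # => [false, true, true, true]
p col.where(col.mt(4)).values  # => [5, 8, 12]

# Piecewise AND of two masks: more than 4 and less than 10.
both = col.mt(4).zip(col.lt(10)).map { |a, b| a && b }
p col.where(both).values       # => [5, 8]
```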

A major advantage of using the #where clause over DataFrame#filter or Vector#keep_if, apart from better readability and usability, is that it is much faster. These benchmarks prove my point.

I’ll conclude this chapter with a little more complex example of using the arel-like query syntax with a Daru::Vector object:

For more examples on using the arel-like query syntax, see this notebook.

## Joins

Daru::DataFrame offers the #join method for performing SQL style joins between two DataFrames. Currently #join supports inner, left outer, right outer and full outer joins between DataFrames.

In order to demonstrate joins, let’s consider a simple example of an inner join on two DataFrames:
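For readers who haven’t met SQL joins before, the semantics of an inner join can be sketched on plain arrays of hashes. The inner_join helper and the sample rows are hypothetical and this is not DataFrame#join; it only shows which rows survive.

```ruby
# Hypothetical helper working on arrays of hashes;
# NOT Daru::DataFrame#join.
def inner_join(left, right, on:)
  # Index the right-hand rows by the join key for quick lookup.
  right_by_key = right.group_by { |row| row[on] }

  # Keep only left rows that have at least one partner on the right.
  left.flat_map do |l_row|
    (right_by_key[l_row[on]] || []).map { |r_row| l_row.merge(r_row) }
  end
end

left = [
  { id: 1, name: 'Pirate' },
  { id: 2, name: 'Monkey' },
  { id: 3, name: 'Ninja' }
]
right = [
  { id: 1, weapon: 'Sword' },
  { id: 3, weapon: 'Shuriken' },
  { id: 4, weapon: 'Banana' }
]

# Only ids 1 and 3 appear on both sides, so only those rows are printed.
inner_join(left, right, on: :id).each { |row| p row }
```

Left, right and full outer joins differ only in what happens to the unmatched rows (ids 2 and 4 here): they are kept and padded with missing values instead of being dropped. Note that this sketch lets the right row win if both sides share a non-key column name.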

For more examples please refer to this notebook.

# Analysis of Time Series in Daru

The newest release of daru brings along with it added support for time series data analysis, manipulation and visualization.

A time series is any data that is indexed (or labelled) by time. This includes the stock market index, prices of crude oil or precious metals, or even geo-locations over a period of time.

The primary manner in which daru implements a time series is by indexing data objects (i.e. Daru::Vector or Daru::DataFrame) on a new index called the DateTimeIndex. A DateTimeIndex consists of dates, which can be queried individually or sliced.

## Introduction

A very basic time series can be created with something like this:

In the above code, the DateTimeIndex.date_range function is creating a DateTimeIndex starting from a particular date and spanning 1000 periods, with a frequency of 1 day between periods. For a complete coverage of DateTimeIndex see this notebook. For an introduction to the date offsets used by daru see this blog post.

The index is passed into the Vector like a normal Daru::Index object.
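
To picture what such a date-indexed series holds, here is a plain-Ruby sketch using only the standard library. It is not daru code: a Hash keyed by Date stands in for the Vector and its DateTimeIndex, and the start date and values are made up.

```ruby
require 'date'

# Plain-Ruby stand-in for a daily time series; not daru's implementation.
start  = Date.new(2012, 1, 1)
dates  = Array.new(1000) { |i| start + i }  # 1000 periods, 1 day apart
values = Array.new(1000) { |i| i * 0.5 }    # made-up data

# Label each value with its date.
series = dates.zip(values).to_h

# Query a single date, or slice out a whole month.
single  = series[Date.new(2012, 1, 10)]
january = series.select { |date, _| date.year == 2012 && date.month == 1 }
```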

## Statistics functions and plotting for time series

Many functions are available in daru for computing useful statistics and analysis. A brief summary of the statistics methods available on time series is as follows:

| Method Name | Description |
|---|---|
| rolling_mean | Calculate moving average |
| rolling_median | Calculate moving median |
| rolling_std | Calculate moving standard deviation |
| rolling_variance | Calculate moving variance |
| rolling_max | Calculate moving maximum value |
| rolling_min | Calculate moving minimum value |
| rolling_count | Calculate moving count of non-missing values |
| rolling_sum | Calculate moving sum |
| ema | Calculate exponential moving average |
| macd | Moving Average Convergence-Divergence |
| acf | Calculate autocorrelation coefficients of the series |
| acvf | Provide the auto-covariance value |
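
For readers new to moving statistics, this plain-Ruby sketch shows what a moving average over a fixed window computes. It is not daru's implementation: rolling_mean_sketch is a hypothetical helper, and padding the first window - 1 positions with nil is an assumption I make here so the output lines up with the input.

```ruby
# Plain-Ruby illustration of a moving average; not daru's implementation.
# Hypothetical helper: mean of the last `window` values at each position.
def rolling_mean_sketch(values, window)
  values.each_index.map do |i|
    # Assumption: positions without a full window are reported as nil.
    next nil if i < window - 1

    slice = values[(i - window + 1)..i]
    slice.sum.to_f / window
  end
end

smoothed = rolling_mean_sketch([1, 2, 3, 4, 5], 3)
# => [nil, nil, 2.0, 3.0, 4.0]
```

The other rolling_* statistics follow the same pattern, with the mean replaced by a median, sum, maximum and so on over each window.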

To demonstrate, the rolling mean of a Daru::Vector can be computed as follows:

This time series can be very easily plotted with its rolling mean by using the GnuplotRB gem:

These methods are also available on DataFrame, which results in calling them on each of the numeric vectors:

In a manner similar to that done with Vectors above, we can easily plot each Vector of the DataFrame with gnuplot:

## Usage with statsample-timeseries

Daru now integrates with statsample-timeseries, a statsample extension that provides many useful statistical analysis tools commonly applied to time series.

Some working examples of daru with statsample-timeseries are coming soon. Stay tuned!

# Date Offsets in Daru

## Introduction

Daru’s (Data Analysis in RUby) latest release (0.2.0) brings in a host of new features, most important among them being time series manipulation functionality. In this post, we will go over the date offsets that daru offers, which can be used for creating date indexes of specific intervals. The offsets offer a host of options for easy creation of different intervals and even work with standalone DateTime objects to increase or decrease time.

## Offset classes and behaviour

The date offsets are contained in the Daru::Offsets sub-module. A number of classes are offered, each of which implements business logic for advancing or retracting date times by a specific interval.

To demonstrate with a quick example:

As you can see in the above example, an hour was added to the time specified by DateTime and returned. All the offset classes work in a similar manner. The following offset classes are available to users:

| Offset Class | Description |
|---|---|
| Daru::DateOffset | Generic offset class |
| Second | One second |
| Minute | One minute |
| Hour | One hour |
| Day | One day |
| Week | One week. Can be anchored on any day of the week. |
| Month | One month |
| MonthBegin | Calendar month begin |
| MonthEnd | Calendar month end |
| Year | One year |
| YearBegin | Calendar year begin |
| YearEnd | Calendar year end |
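
To picture how an offset object like the ones listed above can advance a time, here is a minimal plain-Ruby sketch. The class name SketchHourOffset is made up and this is not daru's implementation; it only mirrors the behaviour described earlier, where adding the offset to a DateTime returns a time one hour later.

```ruby
require 'date'

# Hypothetical stand-in for an hour offset; not daru's implementation.
class SketchHourOffset
  def initialize(n = 1)
    @n = n # how many hours to move forward
  end

  # offset + date_time returns a new DateTime shifted by @n hours.
  def +(date_time)
    date_time + Rational(@n, 24)
  end
end

shifted = SketchHourOffset.new + DateTime.new(2012, 4, 5, 4, 4)
# shifted is 2012-04-05 05:04
```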

The generic Daru::DateOffset class is used for creating a generic offset by passing the number of intervals you want as the value for a key that describes the type of interval. For example, to create an offset of 3 days, you pass the option days: 3 into the Daru::DateOffset constructor.

On a similar note, the DateOffset class constructor can accept the options :secs, :mins, :hours, :days, :weeks, :months or :years. Optionally, specifying the :n option will tell DateOffset to apply a particular offset more than once. To elaborate:
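
The effect of an interval option combined with :n can be sketched in plain Ruby with the standard library. apply_offset is a hypothetical helper, not daru's DateOffset, and it handles only a few of the interval types to keep the sketch short.

```ruby
require 'date'

# Hypothetical helper mimicking a generic offset; not daru's implementation.
# Applies the given interval to date_time, repeated n times.
def apply_offset(date_time, n: 1, months: 0, days: 0, hours: 0)
  n.times do
    date_time = (date_time >> months) + days + Rational(hours, 24)
  end
  date_time
end

three_days_on   = apply_offset(DateTime.new(2012, 4, 5, 2), days: 3)
eight_months_on = apply_offset(DateTime.new(2012, 4, 5), months: 2, n: 4)
# three_days_on   is 2012-04-08 02:00
# eight_months_on is 2012-12-05 (a 2-month step applied 4 times)
```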

The specialized offset classes like MonthBegin, YearEnd, etc. all reside inside the Daru::Offsets namespace and can be used by simply calling .new on them. All accept an optional Integer argument that works like the :n option for Daru::DateOffset, i.e. it applies the offset multiple times.

To elaborate, consider the YearEnd offset. This offsets the date to the nearest year end after itself:
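
The idea can be sketched in plain Ruby as follows. next_year_end is a hypothetical helper, not daru's YearEnd class, and I'm assuming that a date already sitting on 31 December moves to the end of the following year.

```ruby
require 'date'

# Hypothetical helper mimicking a year-end offset; not daru's implementation.
# Moves date_time to the next 31 December, keeping the time of day; repeats n times.
def next_year_end(date_time, n = 1)
  n.times do
    already_at_end = date_time.month == 12 && date_time.day == 31
    year = already_at_end ? date_time.year + 1 : date_time.year
    date_time = DateTime.new(year, 12, 31, date_time.hour,
                             date_time.minute, date_time.second, date_time.offset)
  end
  date_time
end

once   = next_year_end(DateTime.new(2012, 5, 1, 5, 2, 1))
thrice = next_year_end(DateTime.new(2012, 5, 1, 5, 2, 1), 3)
# once   is 2012-12-31 05:02:01
# thrice is 2014-12-31 05:02:01
```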

Of special note is the Week offset. This offset can be ‘anchored’ to any day of the week that you specify. When this is done, the DateTime that is being offset will be moved to that day of the week.

For example, to anchor the Week offset to a Wednesday, pass ‘3’ as a value to the :weekday option:

Likewise, the Week offset can be anchored on any day of the week by simply specifying the :weekday option. Indexing for days of the week starts from 0 for Sunday and goes up to 6 for Saturday.
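
The anchoring arithmetic can be sketched with the standard library, whose DateTime#wday uses the same 0 (Sunday) to 6 (Saturday) numbering. next_weekday is a hypothetical helper, not a daru method, and I'm assuming that a date already on the anchor day moves forward a full week.

```ruby
require 'date'

# Hypothetical helper mimicking an anchored week offset; not daru's implementation.
# Moves date_time forward to the next occurrence of `weekday` (0 = Sunday).
def next_weekday(date_time, weekday)
  delta = (weekday - date_time.wday) % 7
  delta = 7 if delta.zero? # assumption: already on the anchor day, so skip a week
  date_time + delta
end

friday    = DateTime.new(2012, 1, 6)   # 6 January 2012 was a Friday
wednesday = next_weekday(friday, 3)    # anchor on Wednesday
# wednesday is 2012-01-11
```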

## Offset string aliases

The most obvious use of date offsets is for creating DateTimeIndex objects with a fixed time interval between each date index. To make creation of indexes easy, each of the offset classes has been linked to certain string aliases, which can be passed directly to the DateTimeIndex class.

For example, to create a DateTimeIndex of 100 periods with a frequency of 1 hour between each period:

Likewise, all of the offsets listed above have string aliases, which can be used for specifying the offset in a DateTimeIndex. The string aliases of each offset class are as follows:

| Alias String | Offset Class / Description |
|---|---|
| 'S' | Second |
| 'M' | Minute |
| 'H' | Hour |
| 'D' | Day |
| 'W' | Default Week. Anchored on Sunday. |
| 'W-SUN' | Week anchored on Sunday |
| 'W-MON' | Week anchored on Monday |
| 'W-TUE' | Week anchored on Tuesday |
| 'W-WED' | Week anchored on Wednesday |
| 'W-THU' | Week anchored on Thursday |
| 'W-FRI' | Week anchored on Friday |
| 'W-SAT' | Week anchored on Saturday |
| 'MONTH' | Month |
| 'MB' | MonthBegin |
| 'ME' | MonthEnd |
| 'YEAR' | Year |
| 'YB' | YearBegin |
| 'YE' | YearEnd |
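
To show how a string alias can drive index creation, here is a plain-Ruby sketch that maps a few of the aliases above to a step and generates a run of dates from it. Everything in it is hypothetical (SKETCH_STEPS, sketch_date_range); it is not how daru implements DateTimeIndex, and it leaves out the anchored aliases entirely.

```ruby
require 'date'

# Hypothetical alias table covering only the simple fixed-step aliases.
SKETCH_STEPS = {
  'S'     => ->(dt) { dt + Rational(1, 86_400) },
  'M'     => ->(dt) { dt + Rational(1, 1_440) },
  'H'     => ->(dt) { dt + Rational(1, 24) },
  'D'     => ->(dt) { dt + 1 },
  'MONTH' => ->(dt) { dt >> 1 },
  'YEAR'  => ->(dt) { dt >> 12 }
}.freeze

# Hypothetical helper: `periods` DateTimes starting at `start`, one step apart.
def sketch_date_range(start:, periods:, freq:)
  step  = SKETCH_STEPS.fetch(freq) # raises KeyError for an unknown alias
  dates = [start]
  (periods - 1).times { dates << step.call(dates.last) }
  dates
end

hourly = sketch_date_range(start: DateTime.new(2015, 4, 4), periods: 100, freq: 'H')
# 100 entries, from 2015-04-04 00:00 to 2015-04-08 03:00
```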

See this notebook on daru’s time series functions in order to get a good overview of daru’s time series manipulation functionality.