Thursday, 13 November 2014

Cyber Attacks for Beginners: Miscellaneous Attacks

Nothing to do with security, just a relaxing picture of Basel
Cross Site Scripting, Cross Site Request Forgery, SQL Injection and HTTP Response Splitting are all examples of injection attacks. However, almost anything can be, and has been, used to attack sites one way or another. Here a number of other attacks are outlined in order to highlight this fact.

Null Character Injection

In Java the null character (00 in hexadecimal) is a valid character, but in C/C++ it is used to terminate a string. Since C and C++ are used to write operating system interactions, this mismatch can lead to vulnerabilities, for example via OS injection.

LDAP Injection

The Lightweight Directory Access Protocol can be attacked in a manner similar to SQL Injection, to attack LDAP repositories. While such repositories may not reap a financial reward they may hold sensitive information which can be replaced with javascript snippets for example.

OS Injection
This happens when a user supplies malicious code that interacts with the operating system. If they can get root privileges they can wipe the machine clean with a command like cd /; rm -r * and it is not easy to restore or reinstall a system like this since the boot loader is not removed. An inexperienced systems administrator, or one pressured to restore service instantly, might simply bulk erase the disk, thus removing all possibility of using forensic tools to find the origin of the attack or, if the system has not been backed up recently, of recovering information. More subtly, files could be emptied or altered in the hope that the attack would not be discovered till they had been backed up a few times.

Log Injection

An attacker might enter \r\n into their input and, if this is logged unchanged, everything after it appears as a new, forged log entry. This could be used to damage a company's case in court or to damage their reputation. Such an attack could be classed as an Advanced Persistent Threat, since it would only target one company at a time and would require reconnaissance.
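As a minimal sketch of one possible defence, assuming the application logs through Python's standard logging module, the carriage return and line feed can be escaped before the value reaches the log. The function name sanitize_for_log and the sample input are hypothetical.

```python
import logging


def sanitize_for_log(value: str) -> str:
    """Escape CR and LF so user input cannot start a new log line."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


# Hypothetical attacker input that tries to append a forged entry.
user_input = "alice\r\n2014-11-13 INFO admin logged in"
safe = sanitize_for_log(user_input)

logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
# The whole input now stays on a single log line.
logging.getLogger("demo").warning("login failed for user %s", safe)
```

The escaped form keeps the evidence of the attempt in the log instead of silently dropping it.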

Directory Traversal

Here the site allows users to retrieve files and an attacker uses this to get arbitrary files, for example the list of user names and passwords. Even though these are stored encrypted, once downloaded the attacker can attack the file at leisure.
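One common defence is to resolve the requested path and refuse anything that ends up outside the directory meant for downloads. The sketch below assumes a hypothetical base directory /var/www/files and a hypothetical helper resolve_download; it is an illustration, not a complete file server.

```python
from pathlib import Path

BASE_DIR = Path("/var/www/files")  # hypothetical download directory


def resolve_download(requested: str, base: Path = BASE_DIR) -> Path:
    """Return the resolved path for a requested file, or raise if it escapes base."""
    base = base.resolve()
    # resolve() collapses ".." segments; an absolute request replaces base entirely.
    candidate = (base / requested).resolve()
    if base not in candidate.parents:
        raise PermissionError(f"refusing path outside {base}: {requested!r}")
    return candidate
```

A request for report.txt is accepted, while ../../etc/passwd or /etc/passwd raises PermissionError.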

XML Injection
Here XML elements contain malicious code, much as in LDAP and SQL injection. Again this is a specialised attack.

Buffer Overflow

If user input is not handled safely an attacker can input a string that exceeds the capacity of the buffer designed to hold it, thus overwriting other parts of the memory. With luck, skill and reconnaissance the attacker can thus inject their own code into the system or simply crash the application. It is rare in Java or other managed languages, but occurs in web applications written in other languages that do not handle buffers safely.

Random Input Attack

At one time smart cards could be attacked by stressing them and feeding them random data till an input caused them to output all the data on the card. This attack could also be combined with a buffer overflow attack but seems to have fallen out of fashion, probably because makers and web application designers have learned how to defend against this sort of attack. It is probably due for revival.

Insecure Direct Object Reference

Here a direct object reference is used insecurely, for example an account number or price is exposed to the client and the attacker can manipulate this to their benefit. The risk can be mitigated by storing the data in the user's session and using indirect references to map to the actual values. Although more complicated, another approach would be to keep the actual values on the server and map the values of indirect references sent by the browser to those on the server.
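The server side mapping can be sketched as below. This is only an illustration: the class IndirectReferenceMap and the account identifier are hypothetical, and a real application would keep one such map per user session.

```python
import secrets


class IndirectReferenceMap:
    """Per-session map from opaque tokens to real server-side identifiers."""

    def __init__(self):
        self._token_to_real = {}
        self._real_to_token = {}

    def to_indirect(self, real_id: str) -> str:
        """Return the opaque token to send to the browser for real_id."""
        if real_id not in self._real_to_token:
            token = secrets.token_urlsafe(16)
            self._real_to_token[real_id] = token
            self._token_to_real[token] = real_id
        return self._real_to_token[real_id]

    def to_direct(self, token: str) -> str:
        """Map a token from the browser back to the real value.

        Raises KeyError for tokens this session never issued.
        """
        return self._token_to_real[token]
```

The browser only ever sees the random token, so guessing a neighbouring account number no longer works.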

The Wrap

Any part of a system or web application can serve as an attack point. A general principle of defence is a zero trust policy. Since however this has performance implications a balance needs to be struck, with, say, some code relatively lightly protected, while other code, for example safety critical code, is heavily protected, and perhaps triplicated, with the accepted output being a majority vote of all copies of the code, so if one is compromised it becomes obvious.


The references here only describe some attacks. Googling on the attack names I give will reveal a ton of links. The OWASP links are good for someone with a moderate technical knowledge of security.

  1. This also gives an example of a directory traversal attack used to retrieve the password file. The Buffer overflow section involves C, since languages like Java make this attack hard
The following links point to earlier articles in this series.

Friday, 31 October 2014

Cyber Attacks for Beginners: Http Response Splitting

What is Response splitting?

Response Splitting is quite a bit harder to understand than Cross Site Scripting, Cross Site Request Forgery or SQL Injection. It relies on the fact that:

  1. The HTTP protocol on which the Web is based is a request response protocol, that is every request must have a matching response.
  2. The elements of a response are separated by CR-LF characters

In what follows I use CRLF to denote these characters, but in reality they are sent as URL encoded values, %0d%0a

The twist to this is that the response can come before the request. This sounds insane and hopefully will, if possible, be rectified in a future version of the protocol. I do not know enough to say why the protocol does not simply require dropping of a response with no prior request

Wikipedia puts this more formally:

The attack consists of making the server print a carriage return (CR,ASCII 0x0D) line feed (LF, ASCII 0x0A) sequence followed by content supplied by the attacker in the header section of its response, typically by including them in input fields sent to the application. Per the HTTP standard (RFC 2616), headers are separated by one CRLF and the response's headers are separated from its body by two. Therefore, the failure to remove CRs and LFs allows the attacker to set arbitrary headers, take control of the body, or break the response into two or more separate responses—hence the name.

Outline of an attack
The attacker sends the following

  1. A valid request
  2. A valid but empty response
  3. A second valid response that may (will) contain malicious code
  4. a second request, shortly after the first

1 and 2 pair up as the protocol demands

3 is left dangling till the second request (4)

After the second request (4) is sent the malicious response is sent

If the computer were human it would be thinking

Ah, a request. And a response. Good.
A second response but no request, hang on to it
Ah, a second request, send the second response, and cache it for all repetitions of this request.
Job done.

Why is it Dangerous

At first sight this looks insane: the attacker is sending malicious code to themselves. The attack gets really dangerous if the requests and responses are sent to a (proxy) server that caches responses. If the second request (4) is a common one, everyone who sends this request is sent the poisonous response. Reference (1) works through an attack in detail and shows how this attack can be used for Cross Site Scripting and Cross Site Request Forgery.

The following sequence is adapted from (1)


2. Content-Length: 0 CRLF

3. Content-Type: text/html CRLF
Content-Length: 35 CRLF
alert('Running JS on your machine')

4. Any valid request e.g
    GET /branches.html HTTP/1.1


The defence is simple

Use server side validation and disallow CRLF characters in all requests where user input is reflected in the response header.

The attacker may try to evade this with a double encoding attack that disguises the CRLF characters. If the defender scans for Encoded CRLF characters before decoding, they will be missed.
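A minimal sketch of this check, assuming the input arrives URL encoded: decode repeatedly until the value stops changing, and only then look for CR and LF. The helper names fully_decode and safe_header_value are hypothetical.

```python
from urllib.parse import unquote


def fully_decode(value: str, max_rounds: int = 5) -> str:
    """URL-decode repeatedly until the value stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            return decoded
        value = decoded
    raise ValueError("input is encoded too many times")


def safe_header_value(raw: str) -> str:
    """Reject input that would put CR or LF into a response header."""
    decoded = fully_decode(raw)
    if "\r" in decoded or "\n" in decoded:
        raise ValueError("CR/LF not allowed in header values")
    return decoded
```

Here both %0d%0a and the double encoded %250d%250a are rejected, while an ordinary value such as en-GB passes through unchanged.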

The Wrap

This attack relies on the properties of the HTTP protocol to stage an attack. It requires the requests and responses to be sent to a server that caches responses.


Tuesday, 14 October 2014

Cyber Attacks for Beginners: Cross Site Request Forgery (CSRF)

What is it
Cross site request forgery is a special case of session hijacking. When you log on to a site it starts a session for you. If you navigate away without signing off you may still have a session on that site. If you then visit a malicious site, that site can send you malicious code that impersonates you on the first site. For example, if you visit a bank, close the page without signing off and then visit a malicious site, it could send the bank a POST request to transfer money from your account to theirs without your knowledge. As I said in a previous post, they could also make it look like you are committing some sort of crime and then report it to the police.

Such an attack is not easy. It requires you to navigate away from a site without signing off and visit a malicious site before your session on the first site expires. It also requires getting past any confirmation pages the bank or other site put up. Which makes it a numbers game for the attacker. All they need is a couple of visits a day and they are in business. At no extra cost.

Why is it Dangerous
This attack becomes very dangerous when used with Javascript and AJAX, which let an attacker send asynchronous POST requests without your knowledge. When combined with Cross Site Scripting it risks your machine being turned into a zombie controlled by the attacker.

Signs of vulnerability to this include accepting HTTP requests from an authenticated user without having some control to verify that the HTTP request is unique to the user's session and very long session timeouts, which increase the chance an attack is made while the session is valid.
This section outlines some basic defences against CSRF. A full set of defences is given in (1) below.

One powerful defence is to send a random secret shared with the server on each request, something an attacker cannot access and cannot guess

More precisely

A request from a Web application should include a hidden input parameter (token), with a common name such as "CSRFToken", that has a random value (which should be long), generated by a cryptographically strong Random Number Generator, whenever a new session starts. An alternative might be to use a secure hash, Base64 encoded for transmission. As randomness and uniqueness must be used in the data that is hashed to generate the random token this has little advantage over a secure Random Number generator, though local considerations might make this attractive.
The token should only be sent via POST requests and server side actions (that change state) should respond only to POST requests (this is referred to as HTTP Method Scoping). More details (lots) in (1)
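The token scheme can be sketched with Python's standard library as follows. The session dictionary and the function names are hypothetical stand-ins for whatever the web framework provides; the parameter name "CSRFToken" is the one used above.

```python
import hmac
import secrets


def new_csrf_token() -> str:
    """Generate a long random token from a cryptographically strong source."""
    return secrets.token_urlsafe(32)


def is_valid_csrf(session_token: str, submitted_token: str) -> bool:
    """Compare the token stored in the session with the one in the POST body."""
    if not session_token or not submitted_token:
        return False
    # compare_digest takes the same time whether or not the values match.
    return hmac.compare_digest(session_token.encode(), submitted_token.encode())


# Hypothetical session store: the token is created when the session starts
# and embedded in each form as a hidden input named "CSRFToken".
session = {"CSRFToken": new_csrf_token()}
```

A state changing POST request would be processed only if is_valid_csrf returns True for the submitted hidden field.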

Another method is double cookie submission, where a random token is sent both as a request parameter and as a cookie. The server compares the two to check they are equal. By default the attacker cannot read any data sent by the server or modify cookie values. This is called the same origin policy and requires some skill and effort to disable in the browser (just don't do it, right?).

Note that any cross-site scripting (XSS) vulnerabilities (2) can be used to defeat the defences above, but there are defences XSS cannot evade, such as Captcha, reauthentication and one-off passwords such as those generated by RSA tokens.

Do it Yourself protection
  1. Logoff immediately after using a web application (especially online banking)
  2. Do not let your browser store user names and passwords (though the risk is less if the password is stored encrypted)
  3. Using the same browser for sensitive applications and general surfing is a bad idea and leaves you dependent on the security of the site you are visiting. This may be fairly safe for online banking, but not for watching porn.
  4. Use a plugin that disables JavaScript wherever possible so an attacker cannot submit an attack unless they persuade you to submit a form manually.
  5. The above recommendations come from (1). In addition I suggest using an Incognito window, if your browser allows it, so passwords etc do not hang around in your cache.

The Wrap
There is no sure defence against any attack, since attackers and defenders both evolve their techniques and every new advance in technology brings new weaknesses and strengths. The defences outlined here may change the economics of an attack.


  1. Attacks, Defences and how to review code with CSRF in mind

Thursday, 9 October 2014

Beginners Guide to Cyber Attacks: Cross Site Scripting

Cyber Attacks for Beginners: Cross Site Scripting
Keep your eyes open when developing applications

What is it
Cross Site Scripting (XSS) occurs when a website sends untrusted malicious data, for example HTML or Javascript, to a browser that then runs the code. A typical point of attack is when user input is reflected back to the user, for example when you input your name and the next page says “Welcome” followed by your name. (Examples are given in the links at the bottom of this article.)

There are three types of cross site scripting

  1. Reflected XSS, where malicious data is embedded in the page that is returned to the browser immediately following the request. One example of this is where an attacker tricks a victim into loading a URL containing malicious code into their browser. This is sent to a legitimate server and a response containing the malicious code is sent to the victim's browser where it is executed, perhaps sending the victim to the attacker's site where an effort may be made to rob them.
  2. Stored XSS, where a malicious script that an attacker previously managed to get stored on a server is sent to all users at some later time.
  3. DOM based XSS, where malicious code is injected into the page's DOM.

Reflected and stored XSS assume that the payload moves from the browser to the server and back: if it goes back to the same browser it is reflected, if it goes to different browsers it is stored. DOM based XSS does not have this limitation.

Why is it Dangerous
XSS is dangerous because the code injected into the browser could do almost anything. The response could contain malicious code invisible to the user (DOM based Cross Site Scripting) that is then executed. The possibilities are endless. The page could display an image which contains some form of malware which is triggered from the page. The code could redirect the user to a site that downloads malware, or it could send details of what the user does to an attacker, whether in government or private crime. Or it could frame the user, making it look like they were committing a crime. Fortunately almost all attackers are only in it for the money, which simplifies the defender's job immensely.

DOM Based Cross site Scripting
When a browser executes Javascript it makes a number of Javascript Objects available to the code. These represent the Document Object Model (DOM) which is what the browser experiences. The DOM is populated according to the browser's understanding of the model. For example document.URL and document.location are populated with the URL of the page, as the browser understands it. They are invisible to the user. Sometimes the page will hold Javascript that parses document.URL and decides on an action, for example writing the value of a parameter to the page.
The danger arises when the original request sends a parameter value that contains malicious code. If the malicious code is hidden behind a # (known technically as a fragment identifier), the code after the # is treated as a comment and may not even be sent to the server.

This section outlines some basic defences against XSS. A full set of defences is given in the cheat sheets below. Mostly they rely on escaping and encoding, which is best handled by a trusted third party library.

Some browsers provide some protection against cross site scripting, for example by encoding special characters that Javascript uses, such as “<” and “>” into safe forms such as “%3C” and “%3E” but these can be evaded by an attack that does not need the raw forms of these characters. Encoding provides a useful layer of defence and there are third party libraries that provide this function but it is not a silver bullet.
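As a minimal sketch of output encoding on the server, here is the “Welcome” example encoded for the HTML body context with Python's standard html.escape. The function welcome_page is hypothetical, and other contexts (attributes, URLs, Javascript) need their own encoders.

```python
import html


def welcome_page(name: str) -> str:
    """Build the greeting with the user-supplied name HTML-encoded."""
    # html.escape turns <, >, & and quotes into entities such as &lt; and &gt;.
    return "<p>Welcome " + html.escape(name, quote=True) + "</p>"


# A name containing a script tag is rendered as text instead of being executed.
page = welcome_page("<script>alert('XSS')</script>")
```

An ordinary name passes through unchanged, so the encoding can be applied to every value that is written into the page.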
Apart from encoding all input and output (and the encoding needed differs according to the context), two techniques are useful: Blacklisting, where the request is rejected if it contains dangerous characters, and white listing, where the request is rejected unless it contains only safe characters.

Generally speaking both black and whitelisting should be used. For example a request containing a name can be rejected if it contains “<” and accepted if it contains only alphanumeric characters. Of course whitelisting gets more complicated for languages like Chinese, Hebrew or Arabic and can be vetoed by budget conscious managers, but the principles remain valid.

In practice white and blacklisting tend to rely on regular expressions, and these need to be tested thoroughly before a product is released.
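A sketch of the two checks for a name field might look like this. The patterns and the function is_acceptable_name are hypothetical examples and would need adapting (and testing) for real data, in particular for names outside the ASCII range.

```python
import re

# Whitelist: 1 to 64 ASCII letters, digits or spaces (an assumption for this sketch).
NAME_WHITELIST = re.compile(r"[A-Za-z0-9 ]{1,64}")
# Blacklist: characters commonly used to break out of HTML or attribute context.
NAME_BLACKLIST = re.compile(r"[<>\"'&]")


def is_acceptable_name(name: str) -> bool:
    """Reject if any blacklisted character is present, then require a full whitelist match."""
    if NAME_BLACKLIST.search(name):
        return False
    return NAME_WHITELIST.fullmatch(name) is not None
```

Note that this whitelist rejects a legitimate name such as O'Brien, which shows why the patterns need thorough testing against real inputs.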

In brief, an effective defence against Stored and Reflected Cross Site Scripting is server side data validation. Both types can be detected by manual fault injection, e.g. typing an alert scriptlet into a field. An effective defence against DOM based XSS is client side validation of all DOM objects as they are used, or changing server side logic to avoid using DOM properties. Since client pages are often server supplied pages it is again the responsibility of the server to protect the user.

Another, perhaps weaker, defence is to include an anti-XSS header in the HTTP response supplied by the server. This can be done either programmatically in a servlet filter or in the web server configuration. Which to use is often a matter of taste: filters still need to be configured in the server or application configuration files, but if you have a number of filters already the extra cost is marginal. This option may not be supported by all browsers and is not supported by old browsers.

Dom Based XSS is a bit trickier to handle. Defences include
  • Avoiding parsing and manipulating DOM objects
  • Sanitising and handling references to DOM Objects carefully. 
The Wrap

The constant arms race between attackers and defenders means there is no sure defence against XSS, or any other attack. The defences here and in the references may however change the economics of an attack so that it is not worthwhile for the attacker who may decide other tactics, like setting up a “legitimate” server that provides a service with a bit of theft on the side, would be easier and more profitable. This seems to be why we have phishing sites.

If you want to try out XSS attacks it is best to do so on a site you own. Doing so on a site you do not own could lead to a knock on the door at 6am. This would tend to ruin your day.

References

  1. A clear explanation of how XSS works
  2. DOM Based XSS or XSS of the Third Kind
  3. DOM Based Cross Site Scripting Prevention Cheat Sheet
  4. Beginners guide to Cross Site Scripting (XSS)

Saturday, 24 August 2013

Simple Particle Motion Simulation using D3

D3 is a good library for data visualisation. I wanted to visualise an ideal particle in a rectangular 2 dimensional box bouncing off the walls then progress to including gravity in the solution. The mathematics for such a particle is simple. In the process I learned a few things about D3, for example use of map attribute to initialise data and the use of the timer function.

A particle moving in an otherwise empty two dimensional box has a velocity v = (vx,vy). The walls are taken to be parallel to the x and y axes. When the particle hits a wall parallel to the x axis vy → - vy.
When it hits a wall parallel to the y axis vx → - vx .

Clearly such a particle will bounce off the walls forever. In a package like Director this was easy to implement but proved a little tricky in D3

The effect of gravity is included by noting that for the y velocity

dv = g.dt and setting dt = 1. 

Computational Aspects

A real particle moves continuously. In the simulation the virtual particle moves in discrete steps and may never “hit” the virtual wall of the box. Suppose it moves vx pixels along the x axis per time step then if at the end of a step it is vx/2 pixels from the wall it will go through the wall unless told to stop.

So the particle must be told to reverse its motion when it is about to cross a wall. This risks the virtual particle bouncing before it touches the wall, which is unconvincing, though it can be prevented. Here the problem of non rectangular boxes is ignored.
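One way to prevent the early bounce, sketched here in Python rather than the Javascript used for the D3 code, is to let the particle overshoot and then fold the overshoot back inside the box. The function step is hypothetical and works in one dimension; it assumes the speed is smaller than the width of the box.

```python
def step(pos: float, vel: float, lower: float, upper: float):
    """Advance one time step (dt = 1) in one dimension, reflecting off the walls."""
    new_pos = pos + vel
    if new_pos > upper:
        new_pos = 2 * upper - new_pos  # fold the overshoot back inside the box
        vel = -vel
    elif new_pos < lower:
        new_pos = 2 * lower - new_pos
        vel = -vel
    return new_pos, vel
```

With walls at -5 and 5, a particle at 4.5 moving at 1.0 per step ends the step at 4.5 again with velocity -1.0, as if it had touched the wall halfway through the step.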

Some code details

The first step is to initialise the data.

// Map the range to an array of data of the same size
// For a large number of balls produces seemingly random motion instantly

var numballs=<your chosen value>;

var acceleration = <your chosen value>;

var data = d3.range(numballs).map(function() {
  // Assign random x and y velocities to each particle
  var vx = randomVelocity(1);
  var vy = randomVelocity(1);

  // make some velocity components negative
  if(Math.random()<0.5) vx = -vx;
  if(Math.random()<0.5) vy = -vy;

  // Assign random positions to the particles.
  var itsx = 10*Math.random() - 5;
  var itsy = 10*Math.random() - 5;
  return {xloc: itsx, yloc: itsy, xvel: vx, yvel: vy};
});

Scaling the data

var x = d3.scale.linear().domain([-5, 5]).range([0, width]);
var y = d3.scale.linear().domain([-5, 5]).range([0, height]);

This puts the origin at the centre of the screen.

The Timer Function
The key to this exercise was the timer function, which simply repeats what it is told to do forever.


// d3.timer calls this function repeatedly until it returns true
d3.timer(function() {
  data.forEach( function(d) { update(d); } );

  // Move each circle; define a coordinate transform
  // translate circle by scaled coordinates
  circle.attr("transform", function(d)
    { return "translate(" + x(d.xloc) + "," + y(d.yloc) + ")"; })
  // just for fun shrink circle radius near the origin
  .attr("r", function(d)
    { return Math.sqrt(d.xloc*d.xloc + d.yloc*d.yloc); });
});

Updating coordinates

function update(d) {
  // record old positions, calculate new positions, update positions.
  var oldx = d.xloc;
  var newx = oldx + d.xvel;
  d.xloc = newx;

  var oldy = d.yloc;
  var newy = oldy + d.yvel;
  d.yloc = newy;

  // The upper and lower limits were established empirically
  // to give a convincing bounce action
  var lowerlimit = -5.1;
  var upperlimit = 4.7;

  // iscrossing returns true if the ball is about to hit a wall.
  var xcrossing = iscrossing(oldx, newx, upperlimit, lowerlimit);
  var ycrossing = iscrossing(oldy, newy, upperlimit, lowerlimit);
  // reverse appropriate velocities if about to hit a wall.
  if (ycrossing) {
    d.yvel = -d.yvel;
    d.yloc += d.yvel;
  }

  if (xcrossing) {
    d.xvel = -d.xvel;
    d.xloc += d.xvel;
  }
}

Here is the function that decides if the particle is about to hit a wall.
Basically if it is about to cross a wall reverse the velocity component.

// "new" is a reserved word in Javascript, so the new position is called "next"
function iscrossing(old, next, upperlimit, lowerlimit) {
  var crossing = old < upperlimit && next > upperlimit;
  crossing = crossing || old > upperlimit && next < upperlimit;
  crossing = crossing || old < lowerlimit && next >= lowerlimit;
  crossing = crossing || old > lowerlimit && next <= lowerlimit;
  return crossing;
}

Including Gravity

The effect of gravity was included by updating the y velocity immediately before updating the position. Failure to do this resulted in unrealistic behaviour.

var oldy = d.yloc
// The velocity update needs to be applied before not after updating position
d.yvel += acceleration;
var newy= d.yloc + d.yvel;
d.yloc = newy;

The initial value of the acceleration was set at 0.01.


After some experimentation with the parameters I managed to get realistic behaviour. For an initially localised swarm of particles the swarm gradually diverged and appeared eventually to reach a state close to random motion, but with some sets of particles moving in unison.

The behaviour was improved, as would be expected, by assigning each particle a random initial velocity and position.

Introducing acceleration led to an apparent bunching of particles at “ground” level while high accelerations resulted in the swarm rising and falling together something like the motion of waves on the sea

The Wrap
The code snippets above were combined, together with standard boilerplate to create a virtual particle moving in a virtual box using d3. The best way to learn is to modify something that works and the code here was adapted from Bostock's SVG Swarm example 

The results so far show that an originally localised swarm would slowly tend to approximate particles in an ideal gas but assigning random initial positions resulted in almost instant approximation to an ideal gas.

They also showed that watching the swarm is hypnotic. 

Monday, 19 August 2013

Computing the mean and variance of a dataset from the mean and variance of two subsets.

The three aspects of Big Data are Volume, Velocity and Variety. Some types of data are adequately summarised by the mean and variance of the dataset. But if the volume is large enough computing these parameters could take too long, and if the data comes from multiple sources aggregating the data before computing the mean and variance could also exceed the computational deadline for the task. The mean and variance of multiple datasets can however be computed in parallel, possibly at different times, and the mean and variance of the total dataset, or of combinations of the component datasets, can then be obtained.

It is possible, given the means and variances of two datasets to calculate the mean and variance of the aggregate of the two sets. Here the method is shown for two datasets, but it is obvious how to extend it to multiple datasets. It is theoretically possible to compute this from the variances alone if the covariance of the datasets is known but this is not possible here.


We have two datasets, x of size N and y of size M, and we know the mean and variance of each set. Writing x+y for the aggregate of the two sets, we want its mean and variance.

We know the average:

<x> = sum(x)/N

so

sum(x) = N<x>

and the estimate of the combined average is

(N<x> + M<y>)/(M+N)

The variance is a bit harder. The variance is given by

var(x) = <x^2> - <x>^2

so

<x^2> = var(x) + <x>^2

with analogous results for y.

These results can be manipulated into a formula for the combined variance if desired. It is easier, however, to use the last result and say

N<x^2> = sum(x^2)

and combine the sums for the two datasets with the combined average:

var(x+y) = (N<x^2> + M<y^2>)/(M+N) - ((N<x> + M<y>)/(M+N))^2

This Python code returns the combined mean and variance

# given the mean and variance of two datasets returns
# mean and variance of a dataset made by combining the two
# datasets
# the variances are population variances, var(x) = <x^2> - <x>^2
# may suffer from numerical problems in some circumstances.
def combine(meanx, varx, N, meany, vary, M):
    totalsize = N + M
    combinedaverage = meanx*N + meany*M
    combinedaverage = combinedaverage/totalsize
    # N<x^2> = sum(x^2), and similarly for y
    sumxsq = (varx + meanx**2)
    sumxsq = sumxsq*N
    sumysq = vary + meany**2
    sumysq = sumysq*M
    combinedvariance = (sumxsq + sumysq)/(totalsize) - combinedaverage**2
    return combinedaverage, combinedvariance

The code was tested by taking two datasets and computing the mean and variance of the aggregated dataset directly and comparing it with the result returned by the method above. This is proof of concept code so issues of numerical stability and edge cases have not been addressed. Use at your own risk. 
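A check of this kind can be sketched as follows. The two small datasets are made up for illustration, combine is repeated in compact form so the snippet runs on its own, and the direct values come from Python's statistics module using the population variance.

```python
import statistics


def combine(meanx, varx, n, meany, vary, m):
    """Combined mean and population variance of two datasets from their summaries."""
    total = n + m
    mean = (meanx * n + meany * m) / total
    sum_sq = (varx + meanx**2) * n + (vary + meany**2) * m
    return mean, sum_sq / total - mean**2


# Made-up example data.
x = [3.0, 6.0, 2.0, 7.0]
y = [5.0, 2.0, 0.0, 3.0, 8.0, 9.0]

mean, var = combine(statistics.mean(x), statistics.pvariance(x), len(x),
                    statistics.mean(y), statistics.pvariance(y), len(y))

# Direct calculation on the aggregated dataset for comparison.
direct_mean = statistics.mean(x + y)
direct_var = statistics.pvariance(x + y)
print(mean, direct_mean, var, direct_var)
```

The comparison uses pvariance rather than variance because the formula var(x) = <x^2> - <x>^2 divides by N, not N - 1.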

If the size of the datasets used to compute the means and variances is not known this method cannot be used. Putting in random values showed that the result the method returns becomes highly inaccurate. 

The Wrap
A way of computing the mean and variance of a dataset given the mean and variance of two subsets of the dataset is presented. The extension to multiple subsets is obvious. If necessary datasets can be combined by recursive halving on a parallel architecture: lining up the pairs of means and variances and adding each even value to each odd value. This is repeated, each repetition halving the number of mean-variance pairs till only one is left. For large datasets this could represent a big speedup of the time needed to compute the mean and variance.
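The extension to several subsets can be sketched by carrying the size along with each mean and variance. The helper names combine_summaries and summarise and the chunks of data are hypothetical. For simplicity the sketch folds the summaries one after another with functools.reduce; since the order of combination does not matter, the same function could equally be applied pairwise in the halving scheme described above.

```python
from functools import reduce
import statistics


def combine_summaries(a, b):
    """Combine two (mean, population variance, size) summaries into one."""
    (meanx, varx, n), (meany, vary, m) = a, b
    total = n + m
    mean = (meanx * n + meany * m) / total
    mean_sq = ((varx + meanx**2) * n + (vary + meany**2) * m) / total
    return mean, mean_sq - mean**2, total


def summarise(chunk):
    """Summary of one subset; in practice these would be computed in parallel."""
    return statistics.mean(chunk), statistics.pvariance(chunk), len(chunk)


# Made-up subsets of a larger dataset.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
mean, var, size = reduce(combine_summaries, map(summarise, chunks))
```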

And it was fun working this out.

Sunday, 28 July 2013

Drawing a linegraph with D3

Current Specialties
I was first introduced to D3 late in 2011 and after some experimentation I decided it was a great way to visualise data, and it had the added virtue of allowing multiple representations of the same data. It was too late for the project I was on, migrating complex text documents to the web, which was about to finish, but I realised it could have simplified maintenance of tables and graphs in the migrated documents, since the data owners could be left to maintain spreadsheets of their data and D3 would then present the data. Decoupling of business and technical functions seems like an increasingly good idea. One problem I had with D3 was that the examples on the web were mainly write only code and even tutorial applications were hard to follow. I therefore spent a lot of time refactoring my code in an effort to ease maintenance. On the way I found a couple of issues that the code here addresses.

As always any code here comes with no warranty: It worked for me.

The Project

More recently a small web project required me to use D3 to display a history of Exchange rate data. I managed to get the example in
to do what I wanted but  eventually wanted a general purpose linegraph plotting routine. So once the project was essentially finished I returned to the web example and created something closer to what I wanted. In particular I wanted the routine to accept arbitrary x and y datasets, whereas the example above complicated matters by assuming the x values were simply the integers up to a maximum value.

The first step was to separate javascript and html. Following the rule that paranoia is never enough I created a simple function bridge() which initially only threw up an alert to say I had connected properly to the javascript library when the page was loaded. After that bridge was responsible for setting up the parameters of the file, creating data and calling the plot routine, in effect it was a test harness. In the real application bridge would be replaced by an AJAX method that got transformed data from a URL into a form D3 could use.

function bridge() {
  // create simple X and Y data arrays that we'll plot with a line
  var datay = [3, 6, 2, 7, 5, 2, 0, 3, 8, 9, 2, 5, 9, 3, 6, 3, 6, 2, 7, 5, 2, 1, 3, 8, 9, 2, 5, 9, 2, 7,0];
  var datax = d3.range(0, datay.length, 1);
  // define dimensions of graph
  var m = [80, 80, 80, 80]; // margins of graph: top, right, bottom, left
  var w = 1000 - m[1] - m[3]; // width of graph
  var h = 400 - m[0] - m[2]; // height of graph
  var adiv = "#graph"; // #id of the div that holds the graph. The # is vital
  drawgraph(datax, datay, m, w, h, adiv);
}

Now I could test the function drawgraph(). Drawing a graph involves

  • Creating/reserving an svg area inside the specified div
  • Scaling the data and drawing axes
  • Adding legends to the graph
  • drawing the trace.

Creating the Graph area

function creategraphbackground(adiv,aheight,awidth,margins) {
  // clear the div that will hold the graph. Without this it is not possible to show another
  // graph in the same div or update the existing graph with new data.
  // (assumed reconstruction of a garbled statement: empty the div)
  d3.select(adiv).html('');
  // Add an SVG element with the desired dimensions and margin.
  // (assumed reconstruction: the svg is appended to the div named by adiv)
  var svg = d3.select(adiv).append("svg");
  var fullwidth = awidth + margins[1] + margins[3];
  var fullheight = aheight + margins[0] + margins[2];
  // Data will be shown in this area
  var graph = svg
    .attr("width", fullwidth)
    .attr("height", fullheight)
    .append("svg:g") // append an svg group
    .attr("transform", "translate(" + margins[3] + "," + margins[0] + ")");

  // Put a green rectangle into the svg area
  // (assumed reconstruction: the append call was missing)
  graph.append("svg:rect")
    .attr('width', awidth).attr('height', aheight).attr('x', 0).attr('y', 0)
    .style('fill', 'lightgreen')
    .attr('stroke', 'black');
  // put legends on the X and y axes.
  graph.append('text').text(' X ☞ ')
    .attr('x', awidth/2)
    .attr('y', aheight + margins[3]/2);
  graph.append('text').text(' Y☝ ')
    .attr('x', 20 - margins[0])
    .attr('y', aheight - margins[3]);
  return graph;
}

At this point all that will be seen if the program pauses is an empty graph with axes and legends.
Note that the legends include unicode characters and this needs the line
<meta charset="utf-8">
in the <head> of the html file. I have not yet solved the general problem of positioning the x and y legends nicely.

Adding the axes
Drawing axes requires creating a d3 scale, which is a function that maps data values to pixel positions, and using it when drawing the axes. Creating the scales requires knowing the maximum and minimum values of the data in x and y. For this note, using D3's max and min functions is enough, but in the application one set of data seemed to throw these off and I had to write my own code to find them.

function addaxes(datax,datay,awidth, aheight, margins,graph) {
  var ymin = d3.min(datay);
  var ymax = d3.max(datay);
  // X scale will fit all values from datax[] within pixels 0-w
  // (uses the data's own x range so that arbitrary x values fit)
  var x = d3.scale.linear().domain([d3.min(datax), d3.max(datax)]).range([0, awidth]);
  // Y scale will fit values from ymin to ymax within pixels h-0
  // (Note the inverted domain for the y-scale: bigger is up!)
  var y = d3.scale.linear().domain([ymin, ymax]).range([aheight, 0]);
  // create xAxis
  var xAxis = d3.svg.axis().scale(x).tickSize(-aheight).tickSubdivide(true);
  // Add the x-axis.
  // (assumed reconstruction: the append and call steps were missing)
  graph.append("svg:g")
    .attr("class", "x axis")
    .attr("transform", "translate(0," + aheight + ")")
    .call(xAxis);
  // create yAxis to left
  var yAxisLeft = d3.svg.axis().scale(y).ticks(5).orient("left");
  // keep the y axis on the left edge of the plot area
  graph.append("svg:g")
    .attr("class", "y axis")
    .attr("transform", "translate(0,0)")
    .call(yAxisLeft);
  return { x : x, y : y };
}

There is still some “magic” here to internalise but the interesting point is the return statement which allows multiple results to be returned from the function.

Putting it together
The final result: so far

The final routine for drawing the graph uses the above methods, so the code should now be fairly clear.

function drawgraph(datax,datay, margins,awidth,aheight,adiv) {
  var graph = creategraphbackground(adiv,aheight,awidth,margins);
  var result = addaxes(datax,datay,awidth, aheight,margins,graph);
  var xscale = result.x;
  var yscale = result.y;
  // create a line function to convert dataset (merged datax and datay) into D3 format
  var line = d3.svg.line()
    .x(function(d) { return xscale(d[0]); }) // the scaled X coordinate where we want to plot this datapoint
    .y(function(d) { return yscale(d[1]); }); // the scaled Y coordinate where we want to plot this datapoint
  // D3 requires the dataset to be a list of lists of form [x,y] and merge converts datax and datay into this form
  var dataset = merge(datax,datay);
  // Add the line by appending an svg:path element with the line function created above
  // do this AFTER the axes above so that the line is above the tick-lines
  graph.append("svg:path").attr("d", line(dataset));
}

and that is it

In brief
The code above differs from the original code in that
  • It takes x and y data arrays rather than making assumptions about the data
  • It includes legends on the axes
  • It clears the html div before plotting the graph allowing redrawing the graph in the same div

The interesting point for those not too familiar with javascript is the returning of multiple parameters from a function.

The code is better structured than the original but I would have taken far longer to develop it without such a good starting point. From here I can move onto polar, logarithmic and other plots