LINQ: check for null return from function

One of my colleagues wrote a LINQ query similar to the following:
var q = from i in keys
        select GetTuplesOrPossibleNull(i)
        into kvp
        orderby kvp.Item2
        select kvp;
I noticed that for some data it throws a NullReferenceException on kvp.Item2 in the orderby clause. It turned out that the called function sometimes returns null instead of the expected key-value pair.

I wondered where to insert the null check. My first idea was to add it in a where clause before the select that calls the function, something like the following:

var q = from i in keys
        where GetTuplesOrPossibleNull(i) != null
        select GetTuplesOrPossibleNull(i)
        into kvp
        orderby kvp.Item2
        select kvp;

but it looks ugly, violates the DRY (don’t repeat yourself) principle, and can be almost twice as slow, because the same function is called twice for every key that passes the check.
Fortunately, the check can be done just before the orderby clause:
var q = from i in keys
        select GetTuplesOrPossibleNull(i)
        into kvp
        where kvp != null
        orderby kvp.Item2
        select kvp;

It looks much nicer and runs almost twice as fast. I ran some benchmarks in LINQPad:

void Main()
{
    TimeSpan ts = Benchmark(TestCheckingForNullInWhere);
    ts.Dump(); // 00:00:00.2343750

    ts = Benchmark(TestCheckingForNullFunctionResult);
    ts.Dump(); // 00:00:00.4062500
}

void TestCheckingForNullFunctionResult()
{
    int count = 20;
    //IEnumerable<Tuple<string, string>> tuples = NotCheckingForNull(count);
    IEnumerable<Tuple<string, string>> tuples = CheckingForNullFunctionResult(count);
    tuples.Dump();
}

void TestCheckingForNullInWhere()
{
    int count = 20;
    //IEnumerable<Tuple<string, string>> tuples = NotCheckingForNull(count);
    IEnumerable<Tuple<string, string>> tuples = CheckingForNullInWhere(count);
    tuples.Dump();
}

public delegate void TestProcedure();

//from http://stackoverflow.com/questions/626679/datatable-select-vs-datatable-rows-find-vs-foreach-vs-findpredicatet-lambda
public TimeSpan Benchmark(TestProcedure tp)
{
    int testBatchSize = 5;
    List<TimeSpan> results = new List<TimeSpan>();
    for (int i = 0; i < testBatchSize; i++)
    {
        DateTime start = DateTime.Now;
        tp();
        // Elapsed time of this run
        results.Add(DateTime.Now - start);
    }
    // Report the fastest of the runs
    return results.Min();
}

// Define other methods and classes here
IEnumerable<Tuple<string, string>> NotCheckingForNull(int count)
{
    List<int> keys = FillListOfInts(count);
    var q = from i in keys
            select GetTuplesOrPossibleNull(i)
            into kvp
            orderby kvp.Item2
            select kvp;
    return q;
}

IEnumerable<Tuple<string, string>> CheckingForNullFunctionResult(int count)
{
    List<int> keys = FillListOfInts(count);
    var q = from i in keys
            where GetTuplesOrPossibleNull(i) != null
            select GetTuplesOrPossibleNull(i)
            into kvp
            orderby kvp.Item2
            select kvp;
    return q;
}

IEnumerable<Tuple<string, string>> CheckingForNullInWhere(int count)
{
    List<int> keys = FillListOfInts(count);
    var q = from i in keys
            select GetTuplesOrPossibleNull(i)
            into kvp
            where kvp != null
            orderby kvp.Item2
            select kvp;
    return q;
}

List<int> FillListOfInts(int count)
{
    var keys = new List<int>();
    for (int i = 0; i < count; i++)
    {
        keys.Add(i);
    }
    return keys;
}

Tuple<string, string> GetTuplesOrPossibleNull(int i)
{
    int delay = 10;
    // Simulate a slow lookup (delay is in milliseconds)
    Thread.Sleep(delay);
    if (i % 4 == 0)
        return null;
    else
        return new Tuple<string, string>(i.ToString(), delay.ToString() + "ms delayed " + i.ToString());
}
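
The timings can be cross-checked by counting how often the function is called in each variant. The following is only a sketch: the class NullCheckCallCount, the counter calls and the helper GetOrNull are hypothetical names, not part of the benchmark, and GetOrNull merely mimics the null pattern of GetTuplesOrPossibleNull without the Thread.Sleep.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullCheckCallCount
{
    // Hypothetical counter, not part of the original benchmark.
    private static int calls;

    // Same null pattern as GetTuplesOrPossibleNull (null for every 4th key),
    // but without the delay.
    private static Tuple<string, string> GetOrNull(int i)
    {
        calls++;
        return i % 4 == 0 ? null : Tuple.Create(i.ToString(), "value " + i);
    }

    // Null check in a where clause before the select: the function runs once
    // in the where clause and again in the select for every key that passes.
    public static int CountCallsWhereBeforeSelect(int count)
    {
        calls = 0;
        var q = from i in Enumerable.Range(0, count)
                where GetOrNull(i) != null
                select GetOrNull(i)
                into kvp
                orderby kvp.Item2
                select kvp;
        q.ToList(); // force the deferred query to execute
        return calls;
    }

    // Null check after the select: the function runs exactly once per key.
    public static int CountCallsWhereAfterSelect(int count)
    {
        calls = 0;
        var q = from i in Enumerable.Range(0, count)
                select GetOrNull(i)
                into kvp
                where kvp != null
                orderby kvp.Item2
                select kvp;
        q.ToList(); // force the deferred query to execute
        return calls;
    }
}
```

For 20 keys, 5 of which yield null, CountCallsWhereBeforeSelect(20) returns 35 (20 calls in the where clause plus 15 in the select), while CountCallsWhereAfterSelect(20) returns 20. The ratio of 1.75 is close to the ratio of the two timings measured above.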

 

Declare a local variable within the loop instead of using the loop variable in LINQ methods

I noticed the ReSharper warning “Access to modified closure” in my LINQ code. A search showed that it points to a very confusing potential error.
If a for/foreach loop variable is used only in LINQ methods (more generally, only inside delegates) that run after the loop has finished, as deferred queries do, every call sees only the final value of the variable.
Always create a local variable inside the loop and use it in place of the loop variable.
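
A minimal sketch of the problem and the fix, using a for loop and deferred Where queries. The class and method names (ClosureDemo, CaptureLoopVariable, CaptureLocalCopy) are hypothetical and chosen only for illustration. Note that since C# 5 the foreach loop variable is fresh on each iteration, so there the issue remains only for for loops; in older compilers it affects foreach as well.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ClosureDemo
{
    // Problem: each lambda captures the variable i itself, not its value
    // at the time the query was created.
    public static List<int> CaptureLoopVariable(int count)
    {
        var numbers = Enumerable.Range(0, count).ToList();
        var queries = new List<IEnumerable<int>>();
        for (int i = 0; i < count; i++)
        {
            queries.Add(numbers.Where(n => n == i)); // deferred, not run yet
        }
        // The queries run only here, when i already equals count,
        // so none of them matches any element.
        return queries.SelectMany(q => q).ToList();
    }

    // Fix: copy the loop variable into a local declared inside the loop,
    // so each lambda captures its own variable.
    public static List<int> CaptureLocalCopy(int count)
    {
        var numbers = Enumerable.Range(0, count).ToList();
        var queries = new List<IEnumerable<int>>();
        for (int i = 0; i < count; i++)
        {
            int current = i;
            queries.Add(numbers.Where(n => n == current));
        }
        // Each query still sees the value its own copy had in its iteration.
        return queries.SelectMany(q => q).ToList();
    }
}
```

With count = 3, CaptureLoopVariable returns an empty list, because all three queries compare against the final value 3, while CaptureLocalCopy returns 0, 1, 2.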